System Design - basic - Sharding in horizontal scaling of databases

Sharding is a horizontal-scaling technique that distributes a database's data across multiple database servers to improve performance, scalability, and availability.

What is Sharding?

Sharding involves breaking up a large database into smaller, more manageable pieces called shards. Each shard holds a portion of the total data and runs on a separate database server. The shards work together to form the complete dataset.

Horizontal vs. Vertical Scaling

  • Vertical Scaling (Scaling Up): Adding more resources (CPU, RAM, storage) to a single server.
  • Horizontal Scaling (Scaling Out): Adding more servers to handle the load. Sharding is a form of horizontal scaling.

How Sharding Works

  1. Data Partitioning: Data is divided into shards based on a shard key. The shard key can be a specific column or set of columns that determines how data is distributed.
  2. Shard Key Selection: The choice of shard key is crucial because it determines how data and load are spread across the shards. Common sharding strategies include:
    • Range-based Sharding: Data is divided into ranges based on the shard key. For example, if sharding by user ID, user IDs 1-1000 might go to shard 1, 1001-2000 to shard 2, and so on.
    • Hash-based Sharding: A hash function is applied to the shard key, and data is distributed based on the hash value. This helps achieve more even data distribution.
    • Geographical Sharding: Data is divided based on geographic regions.
  3. Shard Management: Each shard operates independently but is part of the overall system. Data requests are routed to the appropriate shard based on the shard key.
  4. Query Routing: A middleware layer (such as a routing proxy) or the application's data-access code sends each query to the correct shard(s), so the rest of the application does not need to know how the data is sharded.
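The three strategies and the routing step can be sketched in Python. This is a minimal, hypothetical illustration, not any particular database's API: the range boundaries, the region-to-shard table, and the names `range_shard`, `hash_shard`, `geo_shard`, and `ShardRouter` are all assumptions. Each "server" is simulated with an in-memory dictionary, and shard indexes are 0-based.

```python
import hashlib
from bisect import bisect_left

# Hypothetical range boundaries: the highest user ID (inclusive) held by each shard.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000, 4000]

# Hypothetical region-to-shard table for geographical sharding.
REGION_TO_SHARD = {"eu": 0, "us": 1, "apac": 2}


def range_shard(user_id: int) -> int:
    """Range-based: shard i holds IDs up to RANGE_UPPER_BOUNDS[i]."""
    if user_id < 1:
        raise ValueError(f"user ID {user_id} is outside all ranges")
    idx = bisect_left(RANGE_UPPER_BOUNDS, user_id)  # first bound >= user_id
    if idx == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"user ID {user_id} is outside all ranges")
    return idx


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: a stable hash of the key, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def geo_shard(region: str) -> int:
    """Geographical: look the region up in a fixed table."""
    return REGION_TO_SHARD[region]


class ShardRouter:
    """Toy query router: callers use put/get and never choose a shard themselves."""

    def __init__(self, num_shards: int) -> None:
        # Each simulated shard is an in-memory dict standing in for a database server.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict:
        return self.shards[hash_shard(key, len(self.shards))]

    def put(self, key: str, value) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str):
        return self._shard_for(key).get(key)


router = ShardRouter(num_shards=4)
router.put("user:42", {"name": "Ada"})
print(router.get("user:42"))  # {'name': 'Ada'}
print(range_shard(1500))      # 1  (IDs 1001-2000 live in shard index 1)
```

The sketch hashes with SHA-256 from `hashlib` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process; separate application servers using it would disagree about which shard holds a key.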

Benefits of Sharding

  • Scalability: Adding more shards increases the total storage and throughput capacity of the database.
  • Performance: Distributing data across multiple servers can improve read and write performance by reducing the load on each server.
  • Availability: If one shard fails, only the data on that shard becomes unavailable; the rest of the dataset remains accessible.

Challenges of Sharding

  • Complexity: Deploying, monitoring, backing up, and rebalancing many shards takes more effort than running a single database.
  • Data Distribution: Uneven data distribution can lead to hotspots where some shards handle more load than others.
  • Cross-Shard Queries: A query that spans multiple shards must be sent to each of them and the results merged, which is more complicated and usually slower than a single-shard query.
  • Consistency: Ensuring data consistency across shards, especially in transactions, can be challenging.
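The difference between a single-shard query and a cross-shard query can be sketched as a toy example. Assumptions: orders are placed by `user_id % 3`, each shard is an in-memory list, and the names `shard_of`, `insert_order`, `orders_for_user`, and `revenue_for_product` are hypothetical. A query that filters on the shard key reads one shard; a query on any other column must be sent to every shard (scatter) and the partial results combined (gather).

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3

# Each simulated shard is an in-memory list of order rows.
shards = [[] for _ in range(NUM_SHARDS)]


def shard_of(user_id: int) -> int:
    # Simple modulo placement on the shard key (user_id); an assumption for illustration.
    return user_id % NUM_SHARDS


def insert_order(order: dict) -> None:
    shards[shard_of(order["user_id"])].append(order)


def orders_for_user(user_id: int) -> list:
    """Single-shard query: the shard key is in the filter, so only one shard is read."""
    return [o for o in shards[shard_of(user_id)] if o["user_id"] == user_id]


def revenue_for_product(product: str) -> float:
    """Cross-shard query: product is not the shard key, so every shard is scanned
    (scatter) and the per-shard partial sums are added together (gather)."""

    def partial(shard: list) -> float:
        return sum(o["amount"] for o in shard if o["product"] == product)

    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        return sum(pool.map(partial, shards))


insert_order({"order_id": 1, "user_id": 10, "product": "book", "amount": 12.5})
insert_order({"order_id": 2, "user_id": 11, "product": "book", "amount": 7.5})
insert_order({"order_id": 3, "user_id": 10, "product": "pen", "amount": 2.0})

print(len(orders_for_user(10)))     # 2  (read from shard 1 only)
print(revenue_for_product("book"))  # 20.0  (summed across shards 1 and 2)
```

In a real deployment the per-shard calls would be network requests, which is why they are issued concurrently; the thread pool here only mimics that pattern.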

Example Scenario

Consider an online store with millions of users and transactions:

  • Shard Key: User ID
  • Shards: 4 shards (each on a separate server)
    • Shard 1: User IDs 1-250,000
    • Shard 2: User IDs 250,001-500,000
    • Shard 3: User IDs 500,001-750,000
    • Shard 4: User IDs 750,001-1,000,000

When a user with ID 123,456 logs in, the system routes the request to Shard 1. If another user with ID 678,901 makes a purchase, the request is routed to Shard 3.
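The routing in this scenario can be written out directly. A minimal Python sketch; `SHARD_RANGES` and `shard_for_user` are illustrative names rather than part of any real system, and the ranges are copied from the list above.

```python
# Shard layout from the example: (shard number, lowest user ID, highest user ID), inclusive.
SHARD_RANGES = [
    (1, 1, 250_000),
    (2, 250_001, 500_000),
    (3, 500_001, 750_000),
    (4, 750_001, 1_000_000),
]


def shard_for_user(user_id: int) -> int:
    """Return the number of the shard whose ID range contains user_id."""
    for shard, low, high in SHARD_RANGES:
        if low <= user_id <= high:
            return shard
    raise ValueError(f"user ID {user_id} is not covered by any shard")


print(shard_for_user(123_456))  # 1  (login request goes to Shard 1)
print(shard_for_user(678_901))  # 3  (purchase request goes to Shard 3)
```

A linear scan is fine for four shards; with many shards, a binary search over the sorted boundaries (for example with Python's `bisect` module) keeps lookups fast. A user with ID 1,000,001 fits no range, so a range-sharded system has to add a shard or widen a range as IDs grow.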

Conclusion

Sharding is a powerful technique for horizontally scaling databases to handle large volumes of data and high traffic. By carefully selecting a shard key and managing shards effectively, organizations can achieve significant improvements in performance, scalability, and availability.

The name "sharding" derives from the word "shard," which means a fragment or piece of a whole. The term describes the process of dividing a database into smaller, more manageable pieces.

Why is it Called Sharding?

  1. Shard: In English, a shard refers to a small part or piece of a larger object, often broken off from the main body. Similarly, in database sharding, the entire database is divided into smaller parts called shards.
  2. Fragmentation: Sharding breaks the database into fragments, or shards. Each shard holds a distinct subset of the data and can serve requests for that subset on its own.
  3. Distributed Storage: By distributing these shards across multiple servers, the database can handle more load and store more data than a single server could manage on its own.

Key Concepts:

  • Shard Key: The column or set of columns that determines which shard each record belongs to. A well-chosen shard key spreads data evenly across the shards.
  • Shard: Each individual part of the larger database. Shards can reside on separate servers or even in different geographic locations.
  • Horizontal Scaling: Adding more servers (shards) to handle the increased load, as opposed to vertical scaling, which involves adding more resources (CPU, RAM) to a single server.

Example:

Imagine you have a large book and you tear it into smaller sections, distributing each section to different people to read. Each person has a shard of the book. Together, all the people represent the entire book, but each one holds only a part of it. This way, multiple people can read different sections at the same time, speeding up the process.

Conclusion:

Sharding is called sharding because it involves dividing a large database into smaller, manageable pieces called shards. These shards distribute the data and the load across multiple servers, improving performance and scalability. The term "shard" aptly describes these fragments of a larger whole.
