System Design - basic - Sharding in horizontal scaling of databases

Sharding is a horizontal-scaling technique that distributes data across multiple database servers to improve performance, scalability, and availability.

What is Sharding?

Sharding involves breaking up a large database into smaller, more manageable pieces called shards. Each shard holds a portion of the total data and runs on a separate database server. The shards work together to form the complete dataset.

Horizontal vs. Vertical Scaling

Vertical Scaling (Scaling Up): Adding more resources (CPU, RAM, storage) to a single server.
Horizontal Scaling (Scaling Out): Adding more servers to handle the load. Sharding is a form of horizontal scaling.

How Sharding Works

Data Partitioning: Data is divided into shards based on a shard key. The shard key can be a specific column or set of columns that determines how data is distributed.
Shard Key Selection: The choice of shard key is crucial because it determines how evenly data and load are spread across shards. Common strategies for mapping a shard key to a shard include:
- Range-based Sharding: Data is divided into ranges based on the shard key. For example, if sharding by user ID, user IDs 1-1000 might go to shard 1, 1001-2000 to shard 2, and so on.
- Hash-based Sharding: A hash function is applied to the shard key, and data is distributed based on the hash value. This helps achieve more even data distribution.
- Geographical Sharding: Data is divided based on geographic regions.
Shard Management: Each shard operates independently but is part of the overall system. Data requests are routed to the appropriate shard based on the shard key.
Query Routing: A middleware layer or application logic routes each query to the correct shard(s), so the database client doesn’t need to know the details of the underlying sharding.
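
The range-based, hash-based, and geographical strategies described above can be sketched as three small lookup functions. This is a minimal illustration, not a production router: the shard count, the range boundaries, the region names, and the function names `range_shard`, `hash_shard`, and `geo_shard` are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

NUM_SHARDS = 4  # assumed shard count for illustration

# Range-based: inclusive upper bound of each shard's user-ID range (assumed values).
RANGE_UPPER_BOUNDS = [1000, 2000, 3000, 4000]

# Geographical: hypothetical mapping from region code to shard index.
REGION_TO_SHARD = {"eu": 0, "us": 1, "apac": 2}


def range_shard(user_id: int) -> int:
    """Return the 0-based index of the shard whose ID range contains user_id."""
    index = bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if user_id < 1 or index == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} is outside every configured range")
    return index


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Return a 0-based shard index derived from a stable hash of the key."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and would route the same key differently.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def geo_shard(region: str) -> int:
    """Return the 0-based shard index assigned to a region."""
    return REGION_TO_SHARD[region]
```

With these assumed boundaries, `range_shard(1500)` returns index 1, the second shard. `hash_shard` returns the same index for the same key on every call, which is what lets a routing layer find a row again after writing it.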

Benefits of Sharding

Scalability: Adding more shards increases the database capacity.
Performance: Distributing data across multiple servers can improve read and write performance by reducing the load on each server.
Availability: In case of a failure, only the data on the failed shard is affected, not the entire dataset.

Challenges of Sharding

Complexity: Managing and maintaining multiple shards can be complex.
Data Distribution: Uneven data distribution can lead to hotspots where some shards handle more load than others.
Cross-Shard Queries: Queries that span multiple shards can be more complicated and less efficient.
Consistency: Ensuring data consistency across shards, especially in transactions, can be challenging.
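
To see why cross-shard queries cost more than single-shard ones, consider the following sketch. It uses in-memory dictionaries as stand-ins for database servers and places rows with a simple modulo on the user ID. Both choices, along with the `ShardedOrders` class and its method names, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class ShardedOrders:
    """Toy sharded store: each shard maps user_id -> order total in cents."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, user_id: int) -> dict:
        # Assumed placement rule: modulo on the integer shard key.
        return self.shards[user_id % len(self.shards)]

    def put(self, user_id: int, total_cents: int) -> None:
        self._shard_for(user_id)[user_id] = total_cents

    def get(self, user_id: int):
        # Single-shard query: the shard key identifies exactly one shard.
        return self._shard_for(user_id).get(user_id)

    def total_revenue(self) -> int:
        # Cross-shard query (scatter-gather): every shard computes a
        # partial sum, and the caller combines the partial results.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(lambda shard: sum(shard.values()), self.shards))
        return sum(partials)


store = ShardedOrders(num_shards=4)
for user_id, cents in [(1, 2000), (2, 1000), (3, 9990), (4, 4200), (5, 3550)]:
    store.put(user_id, cents)

print(store.get(3))           # 9990
print(store.total_revenue())  # 20740
```

`get` touches one shard, while `total_revenue` must contact all of them and wait for the slowest one. A real system would also have to handle a shard that is slow or unavailable during the gather step.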

Example Scenario

Consider an online store with millions of users and transactions:

Shard Key: User ID
Shards: 4 shards (each on a separate server)
- Shard 1: User IDs 1-250,000
- Shard 2: User IDs 250,001-500,000
- Shard 3: User IDs 500,001-750,000
- Shard 4: User IDs 750,001-1,000,000

When a user with ID 123,456 logs in, the system routes the request to Shard 1. If another user with ID 678,901 makes a purchase, the request is routed to Shard 3.
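
The routing in this scenario reduces to one integer division. The sketch below assumes the four equal ranges listed above; the function name `shard_for_user` is hypothetical.

```python
USERS_PER_SHARD = 250_000
NUM_SHARDS = 4


def shard_for_user(user_id: int) -> int:
    """Return the 1-based shard number for a user ID in 1..1,000,000."""
    if not 1 <= user_id <= USERS_PER_SHARD * NUM_SHARDS:
        raise ValueError(f"user_id {user_id} is outside the sharded range")
    # Subtract 1 so that ID 250,000 stays on Shard 1 and 250,001 starts Shard 2.
    return (user_id - 1) // USERS_PER_SHARD + 1


print(shard_for_user(123_456))  # 1
print(shard_for_user(678_901))  # 3
```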

Conclusion

Sharding is a powerful technique for horizontally scaling databases to handle large volumes of data and high traffic. By carefully selecting a shard key and managing shards effectively, organizations can achieve significant improvements in performance, scalability, and availability.

A note on spelling: the term is “sharding,” not “shading.” Sharding derives from the word “shard,” which means a fragment or piece of a whole. The term describes the process of dividing a database into smaller, more manageable pieces.

Why is it Called Sharding?

Shard: In English, a shard refers to a small part or piece of a larger object, often broken off from the main body. Similarly, in database sharding, the entire database is divided into smaller parts called shards.
Fragmentation: The concept of sharding involves breaking the database into fragments or shards. Each shard is a self-contained subset of the data that its server can store and serve on its own.
Distributed Storage: By distributing these shards across multiple servers, the database can handle more load and store more data than a single server could manage on its own.

Key Concepts:

Shard Key: A key that determines how data is divided into shards. A well-chosen shard key spreads data evenly across the shards; a poorly chosen one concentrates data and load on a few of them.
Shard: Each individual part of the larger database. Shards can reside on separate servers or even in different geographic locations.
Horizontal Scaling: Adding more servers (shards) to handle the increased load, as opposed to vertical scaling, which involves adding more resources (CPU, RAM) to a single server.
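
How much the shard key matters for balance can be seen in a small experiment. The sketch below hashes the same hypothetical set of 10,000 users onto four shards twice, once by user ID and once by country. The workload (90% of users in one country), the shard count, and the `hash_shard` helper are assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed shard count


def hash_shard(key: str) -> int:
    """Return a 0-based shard index derived from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Hypothetical workload: 10,000 users, 9,000 in "US" and 1,000 in "DE".
users = [(user_id, "US" if user_id % 10 else "DE") for user_id in range(1, 10_001)]

# High-cardinality key: every user hashes independently.
rows_by_user_id = Counter(hash_shard(str(user_id)) for user_id, _ in users)

# Low-cardinality key: all users of a country land on the same shard.
rows_by_country = Counter(hash_shard(country) for _, country in users)

print("sharded by user ID:", sorted(rows_by_user_id.items()))
print("sharded by country:", sorted(rows_by_country.items()))
```

Keyed by user ID, the four shards each hold roughly a quarter of the rows. Keyed by country, at most two shards receive any rows at all, and one of them holds at least 9,000, which is the kind of hotspot described under the challenges above.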

Example:

Imagine you have a large book and you tear it into smaller sections, distributing each section to different people to read. Each person has a shard of the book. Together, all the people represent the entire book, but each one holds only a part of it. This way, multiple people can read different sections at the same time, speeding up the process.

Conclusion:

Sharding is called sharding because it involves dividing a large database into smaller, manageable pieces called shards. These shards distribute the load and data across multiple servers, improving performance and scalability. The term “shard” aptly describes these fragments of the larger whole.
