[ $davids.sh ] — david shekunts blog

How Notion Sharded PostgreSQL

# [ $davids.sh ] · message #144

How Notion Sharded PostgreSQL

**https://www.notion.so/blog/sharding-postgres-at-notion

In brief:

  • Sharding is a technique for splitting one database instance into multiple (shards) with data divided among them to distribute the load
  • The biggest complexity is choosing an algorithm for dividing data among shards so that 1 query only affects 1 shard (ideally)
  • Notion decided to shard by workspace ID, because in fact, if a person does something, they only do it within one workspace ID
  • For them, the optimal size of 1 table is 500Gb, and of one instance is 10Tb (a note about PSQL capabilities)
  • They divided everything into 32 databases with 15 schemas each (480 schemas). This is convenient because 480 can be divided without a remainder into many numbers, allowing for proportional increases in volume in small steps (e.g., adding 4 databases at a time)
  • They chose a migration strategy through an audit log - a separate table is created to write operations, and then a separate script picks up these records from the table and applies them to the necessary shard instances
  • Next, they had a separate script for checking data validity, one of its functions was Dark read (in production, when reading data, we read from both the old and new databases and verify the data is the same)
  • The identifier in the tables is a composite key of the entity ID and workspace ID, but it would be much simpler to make 1 ID that combines both columns

All powerful pumping 💪

Article on Medium