- Published on
Database partitioning and sharding
- Authors
- Name
- sinhnt
- @sinhnt
Indexing
Heads up
The term partitioning (Postgres) used in this guidance refers to the process of dividing a large database table into smaller, more manageable pieces.
Docker compose
Docker-compose.yml file:
services:
postgres:
image: postgres:latest
container_name: pg_demo
ports:
- "5433:5432"
environment:
POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
POSTGRES_USER: ${DATABASE_USER}
POSTGRES_DB: ${DATABASE_NAME}
volumes:
- pgdata:/var/lib/postgresql/data
restart: always
volumes:
pgdata:
networks:
my-local-network:
name: my-local-network
To run:
docker compose up -d
To access the psql cli:
docker exec -it pg_demo psql -U postgres -d postgres
Create table
DROP TABLE IF EXISTS phonebook_1;
CREATE TABLE phonebook_1 (
id SERIAL PRIMARY KEY,
name TEXT,
phone_number TEXT,
city_code TEXT,
created_at DATE
);
Dump data
-- Insert 250k rows per year (2021 -> 2025)
INSERT INTO phonebook_1 (name, phone_number, city_code, created_at)
SELECT
'Name_' || g,
'0900' || LPAD((g % 10000)::TEXT, 4, '0'),
'CITY_' || (g % 100), -- 100 unique city codes
DATE '2021-01-01' + (g % 1825) -- spread across ~5 years
FROM generate_series(1, 1000000) g;
Search for Contact in specific location (Try before and after indexing)
EXPLAIN ANALYZE SELECT * FROM phonebook_1 WHERE city_code = 'CITY_42';
Create Index on city code
CREATE INDEX idx_city_code ON phonebook_1(city_code);
DROP INDEX idx_city_code;
Partitioning
Create Partition table (LIST by Country)
CREATE TABLE phonebook_2 (
id SERIAL,
name TEXT,
phone_number TEXT,
city_code TEXT,
created_at DATE
) PARTITION BY LIST (city_code);
-- Generate 100 partitions
DO $$
BEGIN
FOR i IN 0..99 LOOP
EXECUTE format('
CREATE TABLE phonebook_city_%s PARTITION OF phonebook_2
FOR VALUES IN (''CITY_%s'')', i, i);
END LOOP;
END $$;
Dump data again (For new table)
-- Insert 250k rows per year (2021 -> 2025)
INSERT INTO phonebook_2 (name, phone_number, city_code, created_at)
SELECT
'Name_' || g,
'0900' || LPAD((g % 10000)::TEXT, 4, '0'),
'CITY_' || (g % 100), -- 100 unique city codes
DATE '2021-01-01' + (g % 1825) -- spread across ~5 years
FROM generate_series(1, 1000000) g;
Query
Single partition query
EXPLAIN ANALYZE SELECT * FROM phonebook_2 WHERE city_code = 'CITY_42';
Cross partition query:
EXPLAIN ANALYZE SELECT * FROM phonebook_2 WHERE name = 'Name_1';
You can try to index to see the speed is double increasing
CREATE INDEX idx_city_code ON phonebook_1(city_code);
DROP INDEX idx_city_code;