Announcing Postgres to Doltgres Replication

DOLTGRESFEATURE RELEASE
5 min read

Introduction

Last month we published a deep dive on how to build a Postgres logical replication system in Go. Today we're excited to announce the release of this feature, available in the latest version of Doltgres.

doltgres

Why replicate from Postgres to Doltgres?

We built replication from Postgres to Doltgres because an early customer wanted to track how the data in his primary Postgres database was changing over time. This is a great use case for Doltgres in its pre-beta-release days: you can use it in addition to your primary Postgres server, instead of switching database products to one not quite ready for production. It's effectively free in terms of performance from the perspective of your existing Postgres installation, and you get a complete version history of every change made to your primary database that you can query with the standard set of Dolt functions, like DOLT_DIFF().

Let's see how to set it up.

Enabling Postgres to Doltgres replication

Getting Doltgres replicating from Postgres involves making configuration changes on both Postgres and the Doltgres sides. All of this is detailed in the official docs, but here's a quick overview.

First, turn on the correct level of replication in Postgres. This requires a server restart.

% echo "wal_level = logical" >> /var/lib/postgresql/data/postgresql.conf

Then run some commands as the postgres super user to create some replication slots on the primary:

% export DBNAME=postgres
% export TABLE=t1
% export SLOTNAME=doltgres_slot
% psql "dbname=$DBNAME replication=database" -c "CREATE PUBLICATION $SLOTNAME FOR TABLE $TABLE;"
% psql "dbname=$DBNAME replication=database" -c "CREATE_REPLICATION_SLOT $SLOTNAME LOGICAL pgoutput;"

Finally, start up doltgres with an appropriate config file that enables replication.

  log_level: info

  behavior:
    read_only: false
    autocommit: true
    persistence_behavior: load
    dolt_transaction_commit: true

  user:
    name: "doltgres"
    password: "password"

  listener:
    host: localhost
    port: 5433
    max_connections: 100
    read_timeout_millis: 28800000
    write_timeout_millis: 28800000

  postgres_replication:
    postgres_server_address: 127.0.0.1
    postgres_user: postgres
    postgres_password: password
    postgres_database: postgres
    postgres_port: 5432
    slot_name: doltgres_slot

Let's see a demo of this in action.

Demo

First I created a table t1 on both the primary and the replica:

create table t1 (a int primary key, b int);

Then I started replication on the primary for this table using the psql commands above. Finally, I started the Doltgres replica with the config file above. Then I connected to the primary Postgres server with psql and started issuing some queries.

postgres=# insert into t1 values (1, 2);
INSERT 0 1
postgres=# insert into t1 values (3, 4);
INSERT 0 1
postgres=# delete from t1 where a = 1;
DELETE 1
postgres=# update t1 set b = 10;
UPDATE 1

I see a bunch of replication information in the log for the Doltgres replica:

2024/04/22 16:31:12 BeginMessage: &{{66} 0/542E9468 2024-04-22 16:31:12.85403 -0700 PDT 1882880}
2024/04/22 16:31:12 replicating query: START TRANSACTION
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/542E9301
2024/04/22 16:31:12 XLogData (*pglogrepl.RelationMessageV2) => WALStart 0/0 ServerWALEnd 0/0 ServerTime 2024-04-22 16:31:12.856936 -0700 PDT
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/1
2024/04/22 16:31:12 XLogData (*pglogrepl.InsertMessageV2) => WALStart 0/542E9300 ServerWALEnd 0/542E9300 ServerTime 2024-04-22 16:31:12.856975 -0700 PDT
2024/04/22 16:31:12 replicating query: INSERT INTO public.t1 (a, b) VALUES (1, 2)
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/542E9301
2024/04/22 16:31:12 XLogData (*pglogrepl.CommitMessage) => WALStart 0/542E9498 ServerWALEnd 0/542E9498 ServerTime 2024-04-22 16:31:12.856988 -0700 PDT
2024/04/22 16:31:12 CommitMessage: &{{67} 0 0/542E9468 0/542E9498 2024-04-22 16:31:12.85403 -0700 PDT}
2024/04/22 16:31:12 replicating query: COMMIT
2024/04/22 16:31:12 Writing LSN 0/542E9468 to file

Now when I connect to the Doltgres replica with psql, I can see that the table has the expected contents:

doltgres=> select * from t1;
 a | b
---+----
 3 | 10
(1 row)

And I can query the history to see how the data changed on the primary. Here's the log of all changes:

doltgres=> select * from dolt_log;
           commit_hash            |   committer   |      email       |        date         |          message
----------------------------------+---------------+------------------+---------------------+----------------------------
 gdi6ioptbm6qda4uuqt38mnh01qv3bef | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:51 | Transaction commit
 79ap78nes15k6ta5dknp6cqgbglutk4d | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:35 | Transaction commit
 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:27 | Transaction commit
 nv6899loipsbm3955vnvo8g1o74gaqho | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:12 | Transaction commit
 65oq43mtpghr8376an0nhthad868nt74 | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:30:05 | Transaction commit
 54vfim5l5mccsskhlfl97l4egpl84087 | Zach Musgrave | zach@dolthub.com | 2024-03-18 22:17:49 | Initialize data repository
(6 rows)

And here's examining a few of the diffs in these transactions:

doltgres=> select * from dolt_diff('HEAD~', 'HEAD', 't1');
 to_a | to_b | to_commit |     to_commit_date      | from_a | from_b | from_commit |    from_commit_date     | diff_type
------+------+-----------+-------------------------+--------+--------+-------------+-------------------------+-----------
    3 |   10 | HEAD      | 2024-04-22 23:31:51.382 |      3 |      4 | HEAD~       | 2024-04-22 23:31:35.073 | modified
(1 row)

doltgres=> select * from dolt_diff('HEAD~2', 'HEAD', 't1');
 to_a | to_b | to_commit |     to_commit_date      | from_a | from_b | from_commit |    from_commit_date     | diff_type
------+------+-----------+-------------------------+--------+--------+-------------+-------------------------+-----------
      |      | HEAD      | 2024-04-22 23:31:51.382 |      1 |      2 | HEAD~2      | 2024-04-22 23:31:27.714 | removed
    3 |   10 | HEAD      | 2024-04-22 23:31:51.382 |      3 |      4 | HEAD~2      | 2024-04-22 23:31:27.714 | modified
(2 rows)

All of the various table functions and system tables to explore the history of the tables work as you would expect. For example, if I want to see the table as it existed at every commit in its history, I can use the dolt_history_t1 system table:

doltgres=> select * from dolt_history_t1;
 a | b  |           commit_hash            |   committer   |     commit_date
---+----+----------------------------------+---------------+---------------------
 3 | 10 | gdi6ioptbm6qda4uuqt38mnh01qv3bef | Zach Musgrave | 2024-04-22 23:31:51
 3 |  4 | 79ap78nes15k6ta5dknp6cqgbglutk4d | Zach Musgrave | 2024-04-22 23:31:35
 1 |  2 | 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | 2024-04-22 23:31:27
 3 |  4 | 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | 2024-04-22 23:31:27
 1 |  2 | nv6899loipsbm3955vnvo8g1o74gaqho | Zach Musgrave | 2024-04-22 23:31:12
(5 rows)

Conclusion

Doltgres is free and open source, so go check it out if you're interested in replicating your PostgreSQL primary to a data store with built-in diff capabilities. Doltgres isn't ready for production use yet, but because this feature works alongside your existing Postgres installation, it's zero risk to try it out. Doltgres can be a great way to understand and keep track of how your Postgres data is changing over time.

Have questions about DoltgreSQL or Postgres replication? Join us on Discord to talk to our engineering team and meet other Dolt users.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.