Dolt Reflog

DoltDB is the world's first fully-versioned relational database. You can branch, merge, diff, push, and pull your relational data in the same ways that Git allows you to work with your source code files.

We're passionate about making Dolt the safest place for your most important data. Having a versioned history of all your relational data means it's really hard to lose data. If an application bug or operator error deletes data it shouldn't have, you can diff your data to see exactly what was deleted and easily restore it. Dolt provides a lot of powerful tools for working with your databases and some of those tools can modify history or modify metadata that isn't versioned, such as branches and tags. Lately, we've been digging into some of these cases and seeing how we can make them even safer, too. For example, last month we talked about how we made drop database recoverable. Today, we're talking about how the Dolt reflog makes operations on named references, such as branches and tags, recoverable.

Dolt Reflog

If you're familiar with Git, you may have come across the Git reflog before. If not, don't worry, it's not a commonly used feature of Git, but it's an easy feature to understand and use. Before we explain what a reflog, or reference log, is, we first need to explain what a reference is.

In Git, and in Dolt, a reference (or "ref", for short) is similar to a variable in a program. A variable has a name that can be used to reference the value that variable holds, and that variable can be assigned different values. Similarly, a ref has a name that can be used to reference a commit that the reference points to, and that ref can be assigned to point at different commits. Branches and tags are the most common refs that you interact with in Git, but there are several other types of refs, too. When you make a commit in Dolt or in Git, the ref for the branch you currently have checked out is updated to point at that new commit.

So, refs are a pretty simple concept that you've already been using. A reflog is simply a history of what commits a ref has pointed to. We can look at a reflog and see how a ref has changed over time to point at different commits. That can be useful in situations such as when a commit we previously referenced is no longer reachable, or when a branch or tag has been deleted and we need to restore it.

The data from Dolt's reflog comes from Dolt's journaling chunk store. This data is local to a Dolt database and never included when pushing, pulling, or cloning a Dolt database. This means when you clone a Dolt database, it will not have any reflog data until you perform operations that change what commit branches or tags reference.

Dolt's reflog is very similar to Git's reflog, but there are a few differences to be aware of:

  • The initial version of Dolt's reflog only supports showing a log of named references, such as branches and tags, and not Git's special refs (e.g. HEAD, FETCH_HEAD, MERGE_HEAD).
  • The Dolt reflog can be queried for the log of references, even after those references have been deleted. In Git, once a branch or tag is deleted, the reflog for that ref is also deleted. To find the last commit a branch or tag pointed to, you have to search Git's reflog for the special HEAD ref, which can be challenging to sift through. Dolt makes this much easier by still showing the history of a deleted ref, so you can easily see the last commit a branch or tag pointed to before it was deleted.

Dolt's Reflog in Action! ⚡️

Okay, enough talk about Dolt's reflog, let's jump in and start using it!

The example below shows a few ways that Dolt's reflog can help you recover from changing branches and tags. The first section shows how to find a commit that is no longer reachable after dolt_reset() was used to move a branch to an earlier commit. The next section shows how to restore a branch that was deleted.

Let's start by creating some data and some Dolt commits to test with. You can follow along by creating a Dolt database locally – just run dolt init in a new directory and then launch the Dolt sql shell by running dolt sql.

-- Create a test table on the default, main branch 
create table t(pk int primary key);
call dolt_commit('-Am', 'adding table t');

-- Add a row and commit it on the main branch
insert into t values(1);
call dolt_commit('-Am', 'adding row 1');

-- Checkout a new branch named branch1 and create some commits 
call dolt_checkout('-b', 'branch1');
insert into t values(100);
call dolt_commit('-Am', 'adding row 100');
insert into t values(1000);
call dolt_commit('-Am', 'adding row 1000');
insert into t values(10000);
call dolt_commit('-Am', 'adding row 10000');

Now that we've got some sample data created and we've got the branch1 branch checked out, let's take a look at the dolt_log system table and the output of the dolt_reflog() table function. Note that the output is very similar, but there are some important differences. Notably, Dolt's reflog only shows the commits that a branch or tag pointed directly to. That means the commits for "adding table t" and "Initialize data repository" don't show up in the reflog for branch1 because branch1 never pointed directly at either of those commits. Another important difference is that the date field in dolt_log isn't the same concept as the ref_timestamp field returned by dolt_reflog() (even though they are the same in this example). The date field from dolt_log shows us when a commit was created, whereas the ref_timestamp field returned by dolt_reflog() shows us when a ref was set to point at that commit. We'll see a concrete example of this difference when we use dolt_reset() to change what the branch1 ref points to in just a moment.

-- dolt_log shows us all the commits that are reachable from our current branch (branch1)
select * from dolt_log;
+----------------------------------+-----------+----------------+---------------------+----------------------------+
| commit_hash                      | committer | email          | date                | message                    |
+----------------------------------+-----------+----------------+---------------------+----------------------------+
| ur97jo956pl54hvsvvev7hbrjncqfgb0 | root      | root@localhost | 2023-10-27 19:36:52 | adding row 10000           |
| e5okejmah5dqaccmllk1jgkqfidsmur2 | root      | root@localhost | 2023-10-27 19:36:48 | adding row 1000            |
| ecfl4l7j3uf8rbf1ro3l7v2po2ut5750 | root      | root@localhost | 2023-10-27 19:36:44 | adding row 100             |
| 1mgo4rvee5mqnnbj3a7vikv8m6ge2d0u | root      | root@localhost | 2023-10-27 19:36:32 | adding row 1               |
| iub6852sn9r9afjoh0s36o0h4it5ulc6 | root      | root@localhost | 2023-10-27 19:36:26 | adding table t             |
| dolt0vfrq699dv2l9jtrhvv5r9m4mnvv | root      | root@localhost | 2023-10-27 19:25:31 | Initialize data repository |
+----------------------------------+-----------+----------------+---------------------+----------------------------+

-- dolt_reflog('branch1') shows us all the commits that branch1 has directly referenced, which doesn't include
-- commits that branch1 never pointed at directly (e.g. "adding table t")
select * from dolt_reflog('branch1');
+--------------------+---------------------+----------------------------------+------------------+
| ref                | ref_timestamp       | commit_hash                      | commit_message   |
+--------------------+---------------------+----------------------------------+------------------+
| refs/heads/branch1 | 2023-10-27 19:36:52 | ur97jo956pl54hvsvvev7hbrjncqfgb0 | adding row 10000 |
| refs/heads/branch1 | 2023-10-27 19:36:52 | e5okejmah5dqaccmllk1jgkqfidsmur2 | adding row 1000  |
| refs/heads/branch1 | 2023-10-27 19:36:48 | ecfl4l7j3uf8rbf1ro3l7v2po2ut5750 | adding row 100   |
| refs/heads/branch1 | 2023-10-27 19:36:43 | 1mgo4rvee5mqnnbj3a7vikv8m6ge2d0u | adding row 1     |
+--------------------+---------------------+----------------------------------+------------------+

Restoring a Branch to an Unreachable Commit

The dolt_reset() stored procedure allows you to change the commit that a branch points to. After we use dolt_reset() to make branch1 point to an older commit, the descendant commits (i.e. "adding row 10000") are no longer reachable from our branch head. The data in those commits still lives in our database, but if they remain unreachable from any references, they'll be garbage collected and permanently removed the next time you run dolt gc.

You can see this difference clearly in the output of dolt_log and dolt_reflog() below. The dolt_log system table no longer mentions the "adding row 10000" commit, because it isn't reachable from branch1. However, we can still see that commit in the dolt_reflog() output, since it's part of the history of what commits that reference pointed to. Note also that the commit for "adding row 1000" appears twice in the output of dolt_reflog() since branch1 has pointed to that commit two separate times. You can see that the first time branch1 referenced that commit, the ref_timestamp field from dolt_reflog() matches the date field, but the second time branch1 references that commit it does not. This is because the ref_timestamp field shows when the reference was set to point at that commit, which is not necessarily the same timestamp as when the commit was created.

call dolt_reset('e5okejmah5dqaccmllk1jgkqfidsmur2');

-- dolt_log only shows the commits reachable from the commit our branch head currently references
select * from dolt_log;
+----------------------------------+-----------+----------------+---------------------+----------------------------+
| commit_hash                      | committer | email          | date                | message                    |
+----------------------------------+-----------+----------------+---------------------+----------------------------+
| e5okejmah5dqaccmllk1jgkqfidsmur2 | root      | root@localhost | 2023-10-27 19:36:48 | adding row 1000            |
| ecfl4l7j3uf8rbf1ro3l7v2po2ut5750 | root      | root@localhost | 2023-10-27 19:36:44 | adding row 100             |
| 1mgo4rvee5mqnnbj3a7vikv8m6ge2d0u | root      | root@localhost | 2023-10-27 19:36:32 | adding row 1               |
| iub6852sn9r9afjoh0s36o0h4it5ulc6 | root      | root@localhost | 2023-10-27 19:36:26 | adding table t             |
| dolt0vfrq699dv2l9jtrhvv5r9m4mnvv | root      | root@localhost | 2023-10-27 19:25:31 | Initialize data repository |
+----------------------------------+-----------+----------------+---------------------+----------------------------+

-- dolt_reflog('branch1') shows a history of what commits branch1 has ever pointed to, including a timestamp of when 
-- the ref was set to point at that commit
select * from dolt_reflog('branch1');
+--------------------+---------------------+----------------------------------+------------------+
| ref                | ref_timestamp       | commit_hash                      | commit_message   |
+--------------------+---------------------+----------------------------------+------------------+
| refs/heads/branch1 | 2023-10-27 19:42:46 | e5okejmah5dqaccmllk1jgkqfidsmur2 | adding row 1000  |
| refs/heads/branch1 | 2023-10-27 19:36:52 | ur97jo956pl54hvsvvev7hbrjncqfgb0 | adding row 10000 |
| refs/heads/branch1 | 2023-10-27 19:36:52 | e5okejmah5dqaccmllk1jgkqfidsmur2 | adding row 1000  |
| refs/heads/branch1 | 2023-10-27 19:36:48 | ecfl4l7j3uf8rbf1ro3l7v2po2ut5750 | adding row 100   |
| refs/heads/branch1 | 2023-10-27 19:36:43 | 1mgo4rvee5mqnnbj3a7vikv8m6ge2d0u | adding row 1     |
+--------------------+---------------------+----------------------------------+------------------+
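
One way to spot commits a branch used to point at but can no longer reach is to compare the two result sets directly. The query below is a minimal sketch, assuming branch1 is still the checked-out branch (so dolt_log lists its reachable commits) and that the dolt_reflog() table function can be filtered with a subquery like any other table. It uses only the columns shown in the output above.

```sql
-- Reflog entries for branch1 whose commits are no longer reachable
-- from the currently checked-out branch head.
select commit_hash, commit_message, ref_timestamp
from dolt_reflog('branch1')
where commit_hash not in (select commit_hash from dolt_log);
```

With the sample data from this walkthrough, the only commit in the reflog that is missing from dolt_log is "adding row 10000", so that is the row you would expect this query to surface.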

Before dolt_reflog(), if we had reset a branch back too far, it could have been hard to find those commits that weren't referenced any longer. With Dolt's reflog this becomes very easy. We can see the history of commits a reference has pointed to, and simply call dolt_reset() again to point to the commit we want.
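
As a sketch of that recovery step, using the commit hash that the reflog output above lists for "adding row 10000" (your hashes will differ if you are following along locally):

```sql
-- Move branch1 forward again to the commit the reflog remembered.
-- This assumes branch1 is the currently checked-out branch.
call dolt_reset('ur97jo956pl54hvsvvev7hbrjncqfgb0');

-- Passing '--hard' as the first argument also resets the working set to
-- that commit, which discards any uncommitted changes:
-- call dolt_reset('--hard', 'ur97jo956pl54hvsvvev7hbrjncqfgb0');

-- "adding row 10000" should be back at the top of the log.
select commit_hash, message from dolt_log limit 1;
```

The rest of this walkthrough assumes branch1 still points at "adding row 1000", so if you run this, reset back to e5okejmah5dqaccmllk1jgkqfidsmur2 before continuing.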

Restoring a Deleted Branch

Restoring a deleted branch works very similarly to resetting a branch to an unreachable commit. Let's delete our branch branch1 and show how we can use the reflog to recreate it at the last commit it was referencing.

-- Checkout the main branch and delete branch1
call dolt_checkout('main');
call dolt_branch('-D', 'branch1');

-- Oops! We've realized we deleted a branch we really need! To recreate it, we need to know what it was last pointing at.
-- We can use dolt_reflog('branch1') to see a history of the commits branch1 referenced and take the most recent commit.
select * from dolt_reflog('branch1');
+--------------------+---------------------+----------------------------------+------------------+
| ref                | ref_timestamp       | commit_hash                      | commit_message   |
+--------------------+---------------------+----------------------------------+------------------+
| refs/heads/branch1 | 2023-10-27 19:52:09 | e5okejmah5dqaccmllk1jgkqfidsmur2 | adding row 1000  |
| refs/heads/branch1 | 2023-10-27 19:36:52 | ur97jo956pl54hvsvvev7hbrjncqfgb0 | adding row 10000 |
| refs/heads/branch1 | 2023-10-27 19:36:52 | e5okejmah5dqaccmllk1jgkqfidsmur2 | adding row 1000  |
| refs/heads/branch1 | 2023-10-27 19:36:48 | ecfl4l7j3uf8rbf1ro3l7v2po2ut5750 | adding row 100   |
| refs/heads/branch1 | 2023-10-27 19:36:43 | 1mgo4rvee5mqnnbj3a7vikv8m6ge2d0u | adding row 1     |
+--------------------+---------------------+----------------------------------+------------------+

-- The Dolt reflog shows that the last commit branch1 referenced was: e5okejmah5dqaccmllk1jgkqfidsmur2
-- All we need to do is recreate a branch with the same name pointing to that commit.
call dolt_branch('branch1', 'e5okejmah5dqaccmllk1jgkqfidsmur2');

-- If we check out branch1 we'll see it has the rows we expect in our table 
call dolt_checkout('branch1');
select * from t;
+------+
| pk   |
+------+
| 1    |
| 100  |
| 1000 |
+------+

-- If we look at the commit history in dolt_log, we'll see it's all still there, too
select * from dolt_log;
+----------------------------------+-----------+----------------+---------------------+----------------------------+
| commit_hash                      | committer | email          | date                | message                    |
+----------------------------------+-----------+----------------+---------------------+----------------------------+
| e5okejmah5dqaccmllk1jgkqfidsmur2 | root      | root@localhost | 2023-10-27 19:36:48 | adding row 1000            |
| ecfl4l7j3uf8rbf1ro3l7v2po2ut5750 | root      | root@localhost | 2023-10-27 19:36:44 | adding row 100             |
| 1mgo4rvee5mqnnbj3a7vikv8m6ge2d0u | root      | root@localhost | 2023-10-27 19:36:32 | adding row 1               |
| iub6852sn9r9afjoh0s36o0h4it5ulc6 | root      | root@localhost | 2023-10-27 19:36:26 | adding table t             |
| dolt0vfrq699dv2l9jtrhvv5r9m4mnvv | root      | root@localhost | 2023-10-27 19:25:31 | Initialize data repository |
+----------------------------------+-----------+----------------+---------------------+----------------------------+
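
The same approach should carry over to tags, since the reflog tracks them as named references too. The sketch below is illustrative only: the tag name v1 is hypothetical, no output is shown because this flow was not run as part of the walkthrough above, and it assumes dolt_reflog() accepts a tag name the same way it accepts a branch name.

```sql
-- Create a tag (hypothetical name 'v1') at a known commit, then delete it.
call dolt_tag('v1', 'e5okejmah5dqaccmllk1jgkqfidsmur2');
call dolt_tag('-d', 'v1');

-- Look up the history of the deleted tag to find the commit it last pointed to.
select * from dolt_reflog('v1');

-- Recreate the tag at the commit_hash from the most recent reflog row.
call dolt_tag('v1', 'e5okejmah5dqaccmllk1jgkqfidsmur2');
```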

Wrap Up

The Dolt reflog allows you to inspect the history of named references, like branches and tags, and see what commits they have pointed to. This is useful when a commit becomes unreachable and needs to be restored, or when a branch or tag has been deleted and needs to be recreated. One of the things customers love about Dolt is the safety it provides: because every version of your data is tracked, it is VERY hard to lose data with Dolt and you always have a way to audit the history of data changes. The Dolt reflog improves on this by giving you more tools to see the history of references, even though references themselves aren't versioned the way your table data is.

If you have comments, questions, or feature request ideas, please swing by the DoltHub Discord and say hello! Our dev team hangs out on Discord every day while we're working and we're always happy to talk about databases, versioning, and data safety! 🤓
