Dolt Corruption Challenge


You want $1000? We want to give you $1000.

A couple of weeks ago, we announced support for dolt fsck, which lets our users verify that their Dolt database isn't corrupted. We're so confident in the data model that we're offering $1000 to anyone who can tamper with a Dolt database and avoid detection by dolt fsck. Read on for more details!

The Rules

The rules are pretty simple.

  1. Alter the dolt-mangle database such that a query on the altered database produces different results than on the unaltered database. The checked out HEAD must at least appear to be the commit 5c2gra4nvk9d9tv3k4b1c9jqio73e2ap. Note that this commit is an empty signed commit, indicating that I have given it my stamp of approval.
  2. Run dolt fsck on the repository and have it complete without finding any errors.
  3. Reward goes to the first person to find a given bug. Any number of unique submissions can be made, so long as they uncover new defects.

You can submit your entries by emailing security@dolthub.com. A zip of the .dolt directory should be sufficient. If you can get your changes into a pull request on DoltHub.com, that's next level and we will take that too. Finally, if you ever want to talk to Dolt developers, our Discord server is the best way to get our attention. If you aren't sure how to get your results to us, just ask for help!

Any submission that does not reside strictly within the .dolt contents will be disqualified. That is, if you send us a trojan horse which infects our computers, you won't get a reward. You'll get a lawsuit instead!

Background Information

Knowing a few things about where the data lives in your database may help you get started. First, you need to understand that Dolt data is stored in content-addressed objects. The documentation for Dolt covers the topic pretty deeply. Understanding the Prolly Tree will be essential if you want to modify user data, i.e., what you would typically think of as the data in your tables. If you want to alter the shape of the history, say the commit structure or contents, then you should delve into the Commit Graph. Finally, the on-disk format is covered here (and that applies to all chunks, which is everything).

There is also the Journal format, and the archive format. All of these would be places that you could attempt to insert corrupt/fraudulent data.
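To make the content-addressing idea concrete, here is a minimal Go sketch of why changing table data ripples up to the commit hash. It is not Dolt's code. It assumes an address is the first 20 bytes of a SHA-512 digest rendered in a lowercase base32 alphabet, and commitBytes is a hypothetical, heavily simplified stand-in for Dolt's real commit serialization.

```go
package main

import (
	"crypto/sha512"
	"encoding/base32"
	"fmt"
)

// Assumption: addresses are the first 20 bytes of a SHA-512 digest,
// written with a lowercase base32 alphabet, giving 32 characters.
var enc = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv")

// addr returns the content address of a chunk's bytes.
func addr(data []byte) string {
	sum := sha512.Sum512(data)
	return enc.EncodeToString(sum[:20])
}

// commitBytes is a hypothetical commit encoding. The only property that
// matters here is that a commit embeds the addresses of its root value
// and its parent.
func commitBytes(rootAddr, parentAddr, msg string) []byte {
	return []byte("root=" + rootAddr + ";parent=" + parentAddr + ";msg=" + msg)
}

func main() {
	rows := []byte("id=1,name=alice")
	head := addr(commitBytes(addr(rows), "", "initial"))

	// Changing the row data changes its address, which changes the
	// commit bytes, which changes the commit's address.
	tampered := []byte("id=1,name=mallory")
	tamperedHead := addr(commitBytes(addr(tampered), "", "initial"))

	fmt.Println("original HEAD:", head)
	fmt.Println("tampered HEAD:", tamperedHead)
	fmt.Println("same address: ", head == tamperedHead)
}
```

In this model, any honest rewrite of the data produces a new HEAD address, which is why the rules insist that HEAD still appear to be the signed commit.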

Hacking on Dolt

You are going to want to run Dolt code in a debugger, and to do that you are going to need to build Dolt from source. Dolt's source code is public on GitHub, and building from source is documented here.

In my example below, I'll give you some code to create a corrupt Table File. There are probably other ways to take on this challenge without the code, but at the very least you'll move more quickly if you look at it. In my opinion, this is the benefit of open source: we invite you to try to find the bugs by giving you the source.

Specifics for Our Database

Clone the database:

$ dolt clone dolthub/dolt-mangle

This will create the dolt-mangle directory, and within it will be a .dolt/noms directory, which contains Dolt data files.

$ cd dolt-mangle
dolt-mangle$ find .dolt/noms -type f
.dolt/noms/gben1ou6r8jt1sa6gtdg7igavsc46uhc
.dolt/noms/manifest
.dolt/noms/LOCK
.dolt/noms/b1co5d1h1teedcrp4aeujd6idjn4atru
.dolt/noms/fdn6sdb6rbb1efigfa39bp1p575p3ou9
.dolt/noms/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
.dolt/noms/journal.idx
.dolt/noms/3fnvllnrdns5lr343a2bfrj10o29s648

Dolt data files come in three forms currently, two of which you can see here.

  1. Journal: the vvvvv...vvvvv file is the journal file, which the database writes to for any local updates prior to garbage collection or pushing changes to another Dolt database. Since you just cloned the database, there should not be any useful data in the journal file. The journal.idx file is an optional file which exists to speed up loading of the journal file. The journal is generated here.
  2. Table Files: These are the files with 32-character names that look random. They are what is transported between Dolt databases, and you will have data in them as a result of cloning. The .dolt/noms/manifest file is important as well: it lists which Table Files exist in the current noms directory. If you open yours, you'll see text like b1co5d1h1teedcrp4aeujd6idjn4atru:132, which says that the b1co... file has 132 chunks. Altering this file is fair game. Table Files are generated here.
  3. Archive files: There are no examples of archives here, but they look like Table Files with a suffix of .darc. If you manage to break archive files, that's fair game. Archives are generated here.
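If you want to script against the noms directory, the naming rules in the list above can be sketched in a few lines of Go. This is an illustration based only on the file names shown in this post, not on Dolt's source: the classify helper is hypothetical, and the assumption that Table File names draw from the base32 alphabet 0-9 and a-v comes from the names in the listing.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumption: Table File names use this 32-symbol alphabet.
const alphabet = "0123456789abcdefghijklmnopqrstuv"

// journalName is the journal file name as it appears in the listing above.
const journalName = "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv"

// isAddressName reports whether s is 32 characters drawn from alphabet.
func isAddressName(s string) bool {
	if len(s) != 32 {
		return false
	}
	for _, r := range s {
		if !strings.ContainsRune(alphabet, r) {
			return false
		}
	}
	return true
}

// classify guesses what kind of file a name in .dolt/noms refers to.
func classify(name string) string {
	switch {
	case name == journalName:
		return "journal"
	case name == "journal.idx":
		return "journal index"
	case name == "manifest":
		return "manifest"
	case strings.HasSuffix(name, ".darc") && isAddressName(strings.TrimSuffix(name, ".darc")):
		return "archive"
	case isAddressName(name):
		return "table file"
	default:
		return "other"
	}
}

func main() {
	names := []string{
		"gben1ou6r8jt1sa6gtdg7igavsc46uhc",
		"manifest",
		"LOCK",
		journalName,
		"journal.idx",
	}
	for _, n := range names {
		fmt.Printf("%-34s %s\n", n, classify(n))
	}
}
```

The journal check has to come before the Table File check, because the all-v name is itself a valid 32-character name in that alphabet.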

If you dolt gc the database, it will take all of the Table Files and put them into the .dolt/noms/oldgen directory as a new Table File. You can alter your local repository in any way you see fit, but remember that the signed commit, 5c2gra4nvk9d9tv3k4b1c9jqio73e2ap, is the one we will check out before verifying results. If you want to hack on archives, run dolt archive after you run dolt gc.

My Failed Attempt

As part of developing dolt fsck, I had to create some corrupted files for testing. This is a fork of Dolt, and this single commit demonstrates how you could alter a Table File.

Expose nbs Package

The nbs package is where the Block Store code lives, and much of it is package private. For this reason, most of the code you care about is in that package, in mangle_hooks.go.

In this file you'll find some interesting hints at what we're doing.

On line 36 we loop through all the tables of the newGen chunk store (see comment to use oldGen). Each table is used to create an index which then allows us to loop over all chunks in the object store.

Line 50 is where we create a new persister which will write the hacked Table File. Note that this code is fairly blunt: it rewrites all Table Files. That's only required because I don't know offhand which Table File contains the commit chunk for 3pdd8aasraqh1tmuedjmcr5nr2fccud2.

Line 69 is where we determine that we've found the object we want to corrupt.

Finally, on lines 98 and 99 you can see where we alter the timestamp in the commit. The rest of the code is just writing and finalizing the Table File to disk.

Expose datas Package

The datas package is responsible for the serialization and deserialization of chunks. Similar to the nbs package, we add a very small amount of code to the datas package in order to get around package privacy. This is necessary because we don't want production code to do any of this!

main.go

The last piece this commit introduces is a new main.go, which is in the utils directory. To build it, you can run go install like so:

$ cd go/utils/mangle
go/utils/mangle$ go install

As is common with Go, that will build into your $HOME/go directory:

go/utils/mangle$ which mangle
{HOME}/go/bin/mangle

The newly created mangle command takes no arguments and uses the current directory for its data directory. If you run it in the directory where you've cloned the dolt-mangle database, you'll see this:

dolt-mangle$ mangle
----------------------- MANGLE -----------------------------
Found object 3pdd8aasraqh1tmuedjmcr5nr2fccud2 in Table File: b1co5d1h1teedcrp4aeujd6idjn4atru
{
        Name: macneale
        Desc: add another 10 entities
        Email: neil@dolthub.com
        Timestamp: 2024-10-21 10:01:51.111 -0700 PDT
        UserTimestamp: 2024-10-21 10:01:31.382 -0700 PDT
        Height: 6
        RootValue: {
                #6fh7126ajine4a51rcipd0cdvbv9u3ii
        }
        Parents: {
                #pv43nlp1t2gr9ph0hevtqjgji4k53fp4
        }
        ParentClosure: {
                #dfcf58640dmmtkiqthtgr4lfd769bkp1
        }
}
ALTERED TO:
{
        Name: macneale
        Desc: add another 10 entities
        Email: neil@dolthub.com
        Timestamp: 2024-10-21 09:56:51.111 -0700 PDT
        UserTimestamp: 2024-10-21 09:56:31.382 -0700 PDT
        Height: 6
        RootValue: {
                #6fh7126ajine4a51rcipd0cdvbv9u3ii
        }
        Parents: {
                #pv43nlp1t2gr9ph0hevtqjgji4k53fp4
        }
        ParentClosure: {
                #dfcf58640dmmtkiqthtgr4lfd769bkp1
        }
}
------------------------------------------------------------

Look carefully at Timestamp and UserTimestamp: in the altered version, both are five minutes earlier.

Also, at the top of the output, it states that the object of interest was found in the Table File b1co5d1h1teedcrp4aeujd6idjn4atru. The command writes all altered Table Files into your current directory, and you can see them here:

dolt-mangle$ ls -l
total 144
-rw-------@ 1 neil  staff    561 Oct 22 11:09 3fnvllnrdns5lr343a2bfrj10o29s648.hacked
-rw-------@ 1 neil  staff  58891 Oct 22 11:09 b1co5d1h1teedcrp4aeujd6idjn4atru.hacked
-rw-------@ 1 neil  staff   2973 Oct 22 11:09 fdn6sdb6rbb1efigfa39bp1p575p3ou9.hacked
-rw-------@ 1 neil  staff   1679 Oct 22 11:09 gben1ou6r8jt1sa6gtdg7igavsc46uhc.hacked

Given the output of the command, we know the b1co...hacked file is the one which contains the altered object. We now need to insert that into our database, and the hack is complete.

dolt-mangle$ cp b1co5d1h1teedcrp4aeujd6idjn4atru.hacked .dolt/noms/b1co5d1h1teedcrp4aeujd6idjn4atru

Testing the Results

The first criterion is that the same query produces different results on the two databases. On the unaltered database, looking at the commit shows the correct timestamp:

dolt-mangle/main> select * from dolt_log where commit_hash = '3pdd8aasraqh1tmuedjmcr5nr2fccud2';
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| commit_hash                      | committer | email            | date                | message                 |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| 3pdd8aasraqh1tmuedjmcr5nr2fccud2 | macneale  | neil@dolthub.com | 2024-10-21 17:01:31 | add another 10 entities |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
1 row in set (0.00 sec)

And if we run the same query on the hacked database, we see a different date:

dolt-mangle-hacked/main> select * from dolt_log where commit_hash = '3pdd8aasraqh1tmuedjmcr5nr2fccud2';
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| commit_hash                      | committer | email            | date                | message                 |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| 3pdd8aasraqh1tmuedjmcr5nr2fccud2 | macneale  | neil@dolthub.com | 2024-10-21 16:56:31 | add another 10 entities |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
1 row in set (0.00 sec)

Well, that's not good! One criterion down. What does dolt fsck do?

$ dolt fsck --quiet
Chunks Scanned: 154
------ Corruption Found ------
Chunk: 3pdd8aasraqh1tmuedjmcr5nr2fccud2 content hash mismatch: 7vgqvft52ia84to2bavsp0cbla8ddu0s
{
        Name: macneale
        Desc: add another 10 entities
        Email: neil@dolthub.com
        Timestamp: 2024-10-21 09:56:51.111 -0700 PDT
        UserTimestamp: 2024-10-21 09:56:31.382 -0700 PDT
        Height: 6
        RootValue: {
                #6fh7126ajine4a51rcipd0cdvbv9u3ii
        }
        Parents: {
                #pv43nlp1t2gr9ph0hevtqjgji4k53fp4
        }
        ParentClosure: {
                #dfcf58640dmmtkiqthtgr4lfd769bkp1
        }
}

Yay! dolt fsck determines that the database is corrupt! I guess I won't get $1000.

Challenge Accepted!

We really believe that Dolt's data model is tamper-resistant. So much so that we challenge you to break it. If you do, $1000 is yours. We're happy to answer any questions you have on your quest to break it. Come join us on Discord!
