Dolt Corruption Challenge
You want $1000? We want to give you $1000.
A couple of weeks ago, we announced support of dolt fsck to allow our users to ensure that their Dolt database isn't corrupted. We're so confident in the data model that we're offering $1000 to anyone who can tamper with a Dolt database and avoid detection with dolt fsck. Read on for more details!
The Rules
The rules are pretty simple.
- Alter the dolt-mangle database such that a query on the altered database produces different results than on the unaltered database. The checked out HEAD must at least appear to be the commit 5c2gra4nvk9d9tv3k4b1c9jqio73e2ap. Note that commit is an empty signed commit, indicating that I have given it my stamp of approval.
- Run dolt fsck on the repository and have it complete without finding any errors.
- Reward goes to the first person to find a given bug. Any number of unique submissions can be made, so long as they uncover new defects.
You can submit your entries by emailing security@dolthub.com. A zip of the .dolt directory should be sufficient. If you can get your changes into a pull request on DoltHub.com, that's next level and we will take that too. Finally, if you ever want to talk to Dolt developers, our Discord server is the best way to get our attention. If you aren't sure how to get your results to us, just ask for help!
Any submissions which do not reside strictly in the .dolt contents will be disqualified. That is, if you send us a trojan horse which infects our computers, you won't get a reward. You'll get a lawsuit instead!
Background Information
Knowing a few things about where the data is in your database may help you get started. First, you need to understand that Dolt data is stored in content-addressed objects. The documentation for Dolt covers the topic pretty deeply. Understanding the Prolly Tree will be essential if you want to modify user data, that is, what you would typically think of as the data in your tables. If you want to alter the shape of the history, say commit structure or contents, then you should delve into the Commit Graph. Finally, the format that is on disk is covered here (and that applies to all chunks, which is everything).
There is also the Journal format, and the archive format. All of these would be places that you could attempt to insert corrupt/fraudulent data.
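To make "content addressed" concrete, here is a minimal Python sketch. It assumes the Noms-style scheme that Dolt inherited: an address is the first 20 bytes of the SHA-512 of the chunk's bytes, written as 32 characters from the alphabet 0-9a-v. The helper names encode and chunk_address and the sample bytes are made up for illustration; they are not Dolt's API or its real serialization.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # assumed base32 alphabet


def encode(raw: bytes) -> str:
    """Encode 20 raw bytes as 32 characters, 5 bits per character."""
    n = int.from_bytes(raw, "big")
    return "".join(ALPHABET[(n >> shift) & 31] for shift in range(155, -1, -5))


def chunk_address(data: bytes) -> str:
    """Address a chunk by its content (assumed: SHA-512 truncated to 20 bytes)."""
    return encode(hashlib.sha512(data).digest()[:20])


# Stand-in bytes, not a real serialized Dolt commit.
original = b"commit: add another 10 entities @ 10:01:31"
tampered = b"commit: add another 10 entities @ 09:56:31"

print(chunk_address(original))
print(chunk_address(tampered))  # differs from the line above
print(encode(b"\xff" * 20))     # 32 'v' characters
```

Because the address is derived from the bytes, editing a chunk in place leaves it filed under an address it no longer hashes to. The last line also shows that twenty 0xff bytes encode to 32 v characters under this alphabet, which is the same shape as the journal file name you will see below.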
Hacking on Dolt
You are going to want to run Dolt code in a debugger, and to do that you are going to need to build Dolt from source. Dolt's source code is public on GitHub, and building from source is documented here.
In my example code below, I'll give you some code to create a corrupt Table File. There are probably other ways you can perform this challenge without the code, but at the very least you'll move more quickly if you look at it. IMO, this is the benefit of Open Source. We invite you to try and find the bugs by giving you the source.
Specifics for Our Database
Clone the database:
$ dolt clone dolthub/dolt-mangle
This will create the dolt-mangle directory, and within it will be a .dolt/noms directory, which contains Dolt data files.
$ cd dolt-mangle
dolt-mangle$ find .dolt/noms -type f
.dolt/noms/gben1ou6r8jt1sa6gtdg7igavsc46uhc
.dolt/noms/manifest
.dolt/noms/LOCK
.dolt/noms/b1co5d1h1teedcrp4aeujd6idjn4atru
.dolt/noms/fdn6sdb6rbb1efigfa39bp1p575p3ou9
.dolt/noms/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
.dolt/noms/journal.idx
.dolt/noms/3fnvllnrdns5lr343a2bfrj10o29s648
Dolt data files come in three forms currently, two of which you can see here.
- Journal: the vvvvv...vvvvv file is the journal file which the database writes to for any local updates prior to garbage collection or pushing changes to another Dolt database. Since you just cloned the database, there should not be any useful data in the journal file. The journal.idx file is an optional file which exists to speed loading of the journal file. Journal generated here.
- Table Files: These are the 32 character files which look random. These files are what are transported between Dolt databases, and you will have data in these files as a result of cloning. The .dolt/noms/manifest file is important as well. The manifest file lists which Table files exist in the current noms dir. If you open yours, you'll see text like b1co5d1h1teedcrp4aeujd6idjn4atru:132, which says that the b1co... file has 132 chunks. Altering this file is fair game. Table files are generated here.
- Archive files: No examples of archives here, but they look like Table Files with a suffix of .darc. If you manage to break archive files that's fair game. Generated here.
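As a quick way to keep the three forms straight, the naming rules described above can be sketched in Python. This only looks at file names, and the classify helper is a made-up illustration rather than anything Dolt ships; the names in the list are the ones from the find output above.

```python
import re

# 32 characters drawn from 0-9 and a-v, as seen in the listing above.
TABLE_FILE = re.compile(r"[0-9a-v]{32}")


def classify(name: str) -> str:
    """Guess the kind of a file in .dolt/noms from its name alone."""
    if name == "v" * 32:
        return "journal"
    if name == "journal.idx":
        return "journal index"
    if name.endswith(".darc"):
        return "archive"
    if TABLE_FILE.fullmatch(name):
        return "table file"
    return "other"  # for example manifest and LOCK


names = [
    "gben1ou6r8jt1sa6gtdg7igavsc46uhc",
    "manifest",
    "LOCK",
    "b1co5d1h1teedcrp4aeujd6idjn4atru",
    "fdn6sdb6rbb1efigfa39bp1p575p3ou9",
    "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv",
    "journal.idx",
    "3fnvllnrdns5lr343a2bfrj10o29s648",
]

for name in names:
    print(f"{name}: {classify(name)}")
```

The journal check has to come before the table file check, because 32 v characters would otherwise match the table file pattern too.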
If you dolt gc the database, it will take all of the Table Files and put them into the .dolt/noms/oldgen directory as a new Table file. You can alter your local repository in any way you see fit, but remember that the signed commit, 5c2gra4nvk9d9tv3k4b1c9jqio73e2ap, is the one we will check out before verifying results. If you want to hack on archives, run dolt archive after you run dolt gc.
My Failed Attempt
As part of developing dolt fsck, I had to create some corrupted files for testing. This is a fork of Dolt, and this single commit demonstrates how you could alter a Table File.
Expose nbs Package
The NBS package is where the Block Store code is, and much of it is package private. For this reason, most of the code you care about is in that package, in the mangle_hooks.go file.
In this file you'll find some interesting hints at what we're doing.
On line 36 we loop through all the tables of the newGen chunk store (see comment to use oldGen). Each table is used to create an index which then allows us to loop over all chunks in the object store.
Line 50 is where we create a new persister which will write the hacked Table File. Note that this code is fairly blunt: it rewrites all Table Files. That's only required because I don't know offhand which Table File contains the commit chunk for 3pdd8aasraqh1tmuedjmcr5nr2fccud2.
Line 69 is where we determine that we've found the object we want to corrupt.
Finally on lines 98 and 99 you can see where we alter the timestamp in the commit. The rest of the code is just writing and finalizing the Table File to disk.
Expose datas Package
The datas package is responsible for the serialization and deserialization of chunks. Similar to the nbs package, we add a very small amount of code to the datas package in order to get around package privacy. This is necessary because we don't want production code to do any of this!
main.go
The last piece this commit introduces is a new main.go, which is in the utils directory. In order to build it, you can run go install like so:
$ cd go/utils/mangle
go/utils/mangle$ go install
As is common with Go, that will build into your $HOME/go directory:
go/utils/mangle$ which mangle
{HOME}/go/bin/mangle
The newly created mangle command takes no arguments and uses the current directory for its data directory. If you run it in the directory where you've cloned the dolt-mangle database, you'll see this:
dolt-mangle$ mangle
----------------------- MANGLE -----------------------------
Found object 3pdd8aasraqh1tmuedjmcr5nr2fccud2 in Table File: b1co5d1h1teedcrp4aeujd6idjn4atru
{
Name: macneale
Desc: add another 10 entities
Email: neil@dolthub.com
Timestamp: 2024-10-21 10:01:51.111 -0700 PDT
UserTimestamp: 2024-10-21 10:01:31.382 -0700 PDT
Height: 6
RootValue: {
#6fh7126ajine4a51rcipd0cdvbv9u3ii
}
Parents: {
#pv43nlp1t2gr9ph0hevtqjgji4k53fp4
}
ParentClosure: {
#dfcf58640dmmtkiqthtgr4lfd769bkp1
}
}
ALTERED TO:
{
Name: macneale
Desc: add another 10 entities
Email: neil@dolthub.com
Timestamp: 2024-10-21 09:56:51.111 -0700 PDT
UserTimestamp: 2024-10-21 09:56:31.382 -0700 PDT
Height: 6
RootValue: {
#6fh7126ajine4a51rcipd0cdvbv9u3ii
}
Parents: {
#pv43nlp1t2gr9ph0hevtqjgji4k53fp4
}
ParentClosure: {
#dfcf58640dmmtkiqthtgr4lfd769bkp1
}
}
------------------------------------------------------------
Look carefully at the Timestamp and UserTimestamp: the altered version is five minutes earlier.
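If you want to double check the arithmetic, here is a small Python sketch using the two UserTimestamp values from the output above. It treats PDT as a fixed UTC-7 offset, and the parse helper is just for this illustration. The UTC conversions are included because the dolt_log dates shown later in Testing the Results appear to be the UserTimestamp values in UTC.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # fixed offset, as printed by mangle


def parse(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS.mmm' as a PDT (UTC-7) time."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=PDT)


original = parse("2024-10-21 10:01:31.382")  # UserTimestamp before mangling
altered = parse("2024-10-21 09:56:31.382")   # UserTimestamp after mangling

print(original - altered)                                       # 0:05:00
print(original.astimezone(timezone.utc).strftime("%H:%M:%S"))   # 17:01:31
print(altered.astimezone(timezone.utc).strftime("%H:%M:%S"))    # 16:56:31
```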
Also, at the top of the output, it states that the object of interest was found in the Table File b1co5d1h1teedcrp4aeujd6idjn4atru. The command writes all altered Table Files into your current directory, and you can see them here:
dolt-mangle$ ls -l
total 144
-rw-------@ 1 neil staff 561 Oct 22 11:09 3fnvllnrdns5lr343a2bfrj10o29s648.hacked
-rw-------@ 1 neil staff 58891 Oct 22 11:09 b1co5d1h1teedcrp4aeujd6idjn4atru.hacked
-rw-------@ 1 neil staff 2973 Oct 22 11:09 fdn6sdb6rbb1efigfa39bp1p575p3ou9.hacked
-rw-------@ 1 neil staff 1679 Oct 22 11:09 gben1ou6r8jt1sa6gtdg7igavsc46uhc.hacked
Given the output of the command, we know the b1co...hacked file is the one which contains the altered object. We now need to insert that into our database, and the hack is complete.
dolt-mangle$ cp b1co5d1h1teedcrp4aeujd6idjn4atru.hacked .dolt/noms/b1co5d1h1teedcrp4aeujd6idjn4atru
Testing the Results
The first criterion is that two identical queries produce different results. On the unaltered database, looking at the commit shows the correct timestamp:
dolt-mangle/main> select * from dolt_log where commit_hash = '3pdd8aasraqh1tmuedjmcr5nr2fccud2';
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| commit_hash | committer | email | date | message |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| 3pdd8aasraqh1tmuedjmcr5nr2fccud2 | macneale | neil@dolthub.com | 2024-10-21 17:01:31 | add another 10 entities |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
1 row in set (0.00 sec)
And if we run the same query on the hacked database, we see a different date:
dolt-mangle-hacked/main> select * from dolt_log where commit_hash = '3pdd8aasraqh1tmuedjmcr5nr2fccud2';
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| commit_hash | committer | email | date | message |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
| 3pdd8aasraqh1tmuedjmcr5nr2fccud2 | macneale | neil@dolthub.com | 2024-10-21 16:56:31 | add another 10 entities |
+----------------------------------+-----------+------------------+---------------------+-------------------------+
1 row in set (0.00 sec)
Well that's not good! One criterion down. What does dolt fsck do?
$ dolt fsck --quiet
Chunks Scanned: 154
------ Corruption Found ------
Chunk: 3pdd8aasraqh1tmuedjmcr5nr2fccud2 content hash mismatch: 7vgqvft52ia84to2bavsp0cbla8ddu0s
{
Name: macneale
Desc: add another 10 entities
Email: neil@dolthub.com
Timestamp: 2024-10-21 09:56:51.111 -0700 PDT
UserTimestamp: 2024-10-21 09:56:31.382 -0700 PDT
Height: 6
RootValue: {
#6fh7126ajine4a51rcipd0cdvbv9u3ii
}
Parents: {
#pv43nlp1t2gr9ph0hevtqjgji4k53fp4
}
ParentClosure: {
#dfcf58640dmmtkiqthtgr4lfd769bkp1
}
}
Yay! dolt fsck determines that the database is corrupt! I guess I won't get $1000.
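The idea behind the check that caught my change can be sketched in a few lines of Python. To be clear, this is not how dolt fsck is implemented. It is a toy model that assumes a store mapping addresses to chunk bytes, and it assumes Noms-style addressing (SHA-512 truncated to 20 bytes, written with the alphabet 0-9a-v). The names address_of and find_corruption and the sample chunks are made up for the illustration.

```python
import hashlib
from typing import Dict, List, Tuple

ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # assumed base32 alphabet


def address_of(data: bytes) -> str:
    """Content address: first 20 bytes of SHA-512, 5 bits per character."""
    n = int.from_bytes(hashlib.sha512(data).digest()[:20], "big")
    return "".join(ALPHABET[(n >> shift) & 31] for shift in range(155, -1, -5))


def find_corruption(store: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Return (stored address, recomputed address) for each mismatched chunk."""
    mismatches = []
    for stored, data in store.items():
        actual = address_of(data)
        if actual != stored:
            mismatches.append((stored, actual))
    return mismatches


# Build a tiny made-up store, then overwrite one chunk in place,
# which is roughly what copying the .hacked Table File did.
chunks = [b"root value", b"commit at 10:01:31", b"table rows"]
store = {address_of(c): c for c in chunks}
victim = address_of(b"commit at 10:01:31")
store[victim] = b"commit at 09:56:31"

for stored, actual in find_corruption(store):
    print(f"Chunk: {stored} content hash mismatch: {actual}")
```

To beat a check like this, the altered bytes would have to hash to the original address, or every chunk that refers to the old address would have to be rewritten all the way up to the signed commit.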
Challenge Accepted!
We really believe that Dolt's data model is tamper resistant. So much so that we challenge you to break it. If you do, $1000 is yours. We're happy to answer any questions you have on your quest to break it. Come join us on Discord!