So you want a Git Database?
Dolt is a "Git Database". So why do we rank rather low (around 10th) for the "git database" query on Google? Why are Liquibase, Planetscale, and TerminusDB, other products in the Git database space, not ranked at all? It makes no sense. This blog article sets out to right that injustice.
What do you mean by database?
When people search for git, the term has a pretty specific meaning. "Git" means "version control in the distributed Git fashion". So, the whole question for the "git database" query revolves around the word "database". Are people referring to Git's internal database? Do people want to use Git as a database? Or do they want products that bring Git semantics (i.e., version control) to the database?
We're biased but we think they want the last one.
Git has a database
In the official Git documentation, the metadata pertaining to the files stored in Git is referred to as Git's database. I'm pretty sure when people are searching for "git database" they don't want to go to the Git Getting Started page despite these gems:
"When you do actions in Git, nearly all of them only add data to the Git database."
"Committed means that the data is safely stored in your local database."
"The Git directory is where Git stores the metadata and object database for your project."
GitHub also provides access to a git database stored on GitHub via an API, aptly named "Git database".
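To make "object database" a little more concrete, here is a minimal sketch of how Git names a file's contents (a "blob") in that database. It assumes a repository using Git's default SHA-1 object format, and the function name `git_blob_id` is ours for illustration, not part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob with this content (SHA-1 object format)."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID in every SHA-1 Git repository.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID is derived from the content, identical content always maps to the same object, which fits the quote above about actions mostly adding data to the database.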
Using Git as a Database
What if I could write and read data from Git using a query engine? There's a Stack Overflow post discussing pros and cons.
Some clever folks have used Git as a NoSQL database:
- GitRows for json and csv files.
- Kenneth Truyers
- nede.dev
This seems like a popular exploratory area, but other than GitRows, the internet kind of agrees using Git as a database is a bad idea. You have to build your own query language and the read/write throughput is not great.
Bringing Git semantics to the database
I think this is what people really want when they search for "git database". They want to know if anyone has built a version controlled database.
As discussed in this blog article on database version control, this question breaks into two categories:
- git semantics for database migrations
- git databases
Git Database Migrations
Liquibase
- Tagline: Version Control for Databases
- Initial Release: April 2012
- GitHub: https://github.com/liquibase/liquibase
Liquibase formalizes your database migrations with a configuration language for database schema and alterations. The examples lean into XML, but further digging says it supports SQL, JSON, and YAML as well. Using something other than SQL makes Liquibase cross-platform. You can use the same Liquibase XML descriptors to migrate from, say, PostgreSQL to MySQL, which is nice.
The Liquibase magic is how it applies these changes to your database. This is where the git semantics come in. Liquibase supports branching, rollbacks, and preview.
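To illustrate the general idea of tracked, reversible schema changes, here is a minimal sketch using Python's built-in `sqlite3` module. This is not Liquibase's API or file format. The changeset IDs, the `changelog` table, and the function names are all hypothetical, chosen only to show how preview, update, and rollback can hang off an ordered list of changes.

```python
import sqlite3

# Hypothetical changesets: (id, forward SQL, rollback SQL), applied in order.
CHANGESETS = [
    ("001-create-users",
     "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
     "DROP TABLE users"),
    ("002-create-orders",
     "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)",
     "DROP TABLE orders"),
]


def applied_ids(conn):
    """Return the IDs of changesets already recorded in the tracking table."""
    conn.execute("CREATE TABLE IF NOT EXISTS changelog (id TEXT PRIMARY KEY)")
    return {row[0] for row in conn.execute("SELECT id FROM changelog")}


def pending(conn):
    """Preview: list the changesets that would run, without running them."""
    done = applied_ids(conn)
    return [cs for cs in CHANGESETS if cs[0] not in done]


def update(conn):
    """Apply every pending changeset and record it as applied."""
    for cs_id, forward, _ in pending(conn):
        conn.execute(forward)
        conn.execute("INSERT INTO changelog (id) VALUES (?)", (cs_id,))
    conn.commit()


def rollback(conn, count=1):
    """Undo the most recently applied `count` changesets, newest first."""
    if count <= 0:
        return
    done = applied_ids(conn)
    applied = [cs for cs in CHANGESETS if cs[0] in done]
    for cs_id, _, undo in reversed(applied[-count:]):
        conn.execute(undo)
        conn.execute("DELETE FROM changelog WHERE id = ?", (cs_id,))
    conn.commit()


def table_names(conn):
    """Helper for inspection: names of all tables in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print([cs[0] for cs in pending(conn)])  # preview before applying
    update(conn)
    rollback(conn, count=1)                 # undo only the newest changeset
    print(sorted(table_names(conn)))
```

The tracking table is what makes this feel like version control: the database itself remembers which changes it has seen, so the same changeset list can be replayed safely against any environment.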
Planetscale
- Tagline: The database for developers
- Initial Release: March 2016
- GitHub: https://github.com/planetscale
Planetscale is awesome and takes database migrations to the next level. Imagine if someone ran the database for you and controlled how schema patches were applied. What would be possible?
Planetscale is run by the good folks who wrote Vitess. Vitess is an open source "database clustering system for horizontal scaling of MySQL". Dolt is a heavy user of Vitess' MySQL dialect parsing code. We wouldn't be here without them.
So, on Planetscale, you get all the schema branch/merge functionality of Liquibase. You also get a world-class, modern deployment environment for your changes. As far as Git goes, you can change and deploy your schema on branches. On the downside, if this is a downside for you, Planetscale runs your database for you. Planetscale is also MySQL only, so if you have some aversion to that database, that's a downside too.
Note, for the release date, I went with the first release of Planetscale's fork of Vitess. I'm not exactly sure when Planetscale in its current form launched and my cursory research didn't turn it up.
Git databases
TerminusDB
- Tagline: Making Data Collaboration Easy
- Initial Release: October 2019
- GitHub: https://github.com/terminusdb/terminusdb
TerminusDB is a "git graph database". TerminusDB has full schema and data versioning capability but offers a graph database interface using a custom query language called Web Object Query Language (WOQL). WOQL is schema optional. TerminusDB just released the option to query JSON directly, similar to MongoDB, giving users a more document database style interface.
The versioning syntax is exposed via the TerminusDB Console or a command line interface. The versioning metaphors are similar to Git. You branch, push, and pull. See their how-to documentation for more information.
TerminusDB is new but we like what we see. The company is very responsive, has an active Discord, and is well funded. If you think your git database makes more sense in graph or document form, check them out.
Dolt
- Tagline: It's Git for Data
- Initial Release: August 2019
- GitHub: https://github.com/dolthub/dolt
Dolt takes “Git database” rather literally. Dolt implements the Git command line and associated operations on table rows instead of files. Data and schema are modified in the working set using SQL. When you want to permanently store a version of the working set, you make a commit. In SQL, Dolt implements Git read operations (e.g., diff, log) as system tables and write operations (e.g., commit, merge) as functions. Dolt produces cell-wise diffs and merges, making data debugging between versions tractable. That makes Dolt the only SQL database on the market that has branches and merges. You can run Dolt offline, treating data and schema like source code. Or you can run Dolt online, like you would PostgreSQL or MySQL.
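If "cell-wise diff" sounds abstract, here is a minimal sketch of the idea in Python. It is not how Dolt is implemented and it does not use Dolt's interface. It assumes each table version is a plain dictionary keyed by primary key, and the `cell_diff` function and the sample data are hypothetical.

```python
def cell_diff(old, new):
    """Compare two table versions, each a dict of primary key -> row dict.

    Returns a list of (kind, primary_key, column, before, after) tuples.
    Added and removed rows are reported whole; changed rows are reported
    one cell at a time.
    """
    changes = []
    for pk in sorted(old.keys() | new.keys()):
        before, after = old.get(pk), new.get(pk)
        if before is None:
            changes.append(("added", pk, None, None, after))
        elif after is None:
            changes.append(("removed", pk, None, before, None))
        else:
            for col in sorted(before.keys() | after.keys()):
                if before.get(col) != after.get(col):
                    changes.append(
                        ("modified", pk, col, before.get(col), after.get(col))
                    )
    return changes


if __name__ == "__main__":
    v1 = {
        1: {"name": "Ada", "city": "London"},
        2: {"name": "Bob", "city": "Paris"},
    }
    v2 = {
        1: {"name": "Ada", "city": "Cambridge"},
        3: {"name": "Cy", "city": "Oslo"},
    }
    for change in cell_diff(v1, v2):
        print(change)
```

The point of working at the cell level is that a reviewer sees only that row 1's `city` changed, not that the whole row or the whole file is different, which is what makes debugging between versions tractable.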
We are biased but we think if you want a Git database, there is only one product that fits that label and that's Dolt. Interested? Come chat with us on our Discord.