- REFERENCE · 9 min read
So you want Database Versioning?
Here at DoltHub, we've had a lot of success with our "So you want..." series of blog posts helping people find Dolt when they are looking for it. Dolt is a lot of things. Dolt is a version controlled database, a Git database, Git for data, data…
Read More
- FEATURE RELEASE, WEB · 6 min read
Data CI with DoltHub Webhooks
Dolt and DoltHub are Git and GitHub for data. The same way that GitHub enables collaboration on source code repositories in Git format, DoltHub enables collaboration on data repositories in Dolt format. A very common workflow on GitHub i...
Read More
- 5 min read
Tracking SQL Correctness and Performance Regressions in Dolt
Tracking Dolt's SQL regressions: As part of our journey to make Dolt a great SQL database, we set out to track the correctness of Dolt’s SQL engine against a suite of SQL tests called the sqllogictests. These tests are what we use to measur...
Read More
- REFERENCE · 14 min read
Dolt for Git Noobs
TL;DR: Dolt is a SQL database with built-in Git versioning, branching, and distribution semantics that makes collaborating on and distributing data effortless. What Git does for files, Dolt does for data. Where Git versions files, a...
Read More
- REFERENCE · 7 min read
How Dolt Stores Table Data
Dolt is Git for data. It's a SQL database that lets you clone, branch, diff, merge, and fork your data just like you can with a filesystem tree in Git. This blog post explores one of the fundamental data structures that underlie Dolt's impleme...
Read More
- USE CASE · 5 min read
Dolt Use Cases
Dolt is Git for data. Instead of versioning files, Dolt versions tables. DoltHub is a place on the internet to share Dolt repositories. As far as we can tell, Dolt is the only database with branches. How would you use such a thing? One o...
Read More
- DATASET · 14 min read
Who's at Risk of COVID-19 in the US Congress?
Overview: In this blog post, we discuss an approach for simulating an outbreak of COVID-19 in the US Congress. This is a long technical article about data sets, epidemiology, and simulation. Feel free to jump straight to the results of ...
Read More
- REFERENCE, WEB · 8 min read
How We Built DoltHub: Front-End Architecture
In the previous article in this series, we took a deep look at the overall system architecture of DoltHub, the online data community powered by the Dolt version-controlled database. In this article, we'll zoom in on the front end and see h...
Read More
- 4 min read
Testing Dolt using Bats
We adopted Bash Automated Testing System (Bats) to test the Dolt command-line. As of March 10, 2020 we are up to 473 tests, though 55 are skipped because they currently fail. The tests define desired behavior so we're constantly working to ge...
Read More
- FEATURE RELEASE, SQL · 6 min read
Querying Historical Data with AS OF Queries
Dolt is Git for data. It's a SQL database that lets you branch, merge, and fork your data just like you would a Git repository. In previous blog posts we announced how you can use special system tables to query the history of your database ...
Read More
- DATASET · 4 min read
Novel Coronavirus Dataset in Dolt: A Case for Branches
Here at DoltHub, we've been working on COVID-19 data since February 5, 2020. First, we started importing Johns Hopkins data and then we worked on assembling the largest open, regularly-updated set of case details from Singapore, Hong Kong an...
Read More
- DATASET · 3 min read
Scraping a JavaScript-enabled Website in 2020
As part of our effort to track data related to the Novel Coronavirus (COVID-19), we wanted to scrape a JavaScript-enabled website on Coronavirus from Hong Kong. Moreover, you'll notice that the website from Hong Kong uses lazy loading based o...
Read More
- DATASET · 5 min read
Novel Coronavirus Dataset in Dolt: Case Details
On Saturday, February 29, this transpired in our company chat room: A project was born. We had time series data for confirmed cases, deaths, and recoveries segmented by location sourced from Johns Hopkins but we d...
Read More
- REFERENCE, WEB · 5 min read
How We Built DoltHub: Stack and Architecture
In our introductory article for this series, we took a high-level look at the technology stack and architecture behind DoltHub, the online home for Dolt data repositories. In this article, we'll delve a little deeper and discuss how the pi...
Read More
- REFERENCE · 6 min read
Optimizing Sorted Map Iteration
In this blog post I want to give an introduction to some core concepts used to implement fast querying of databases. These techniques were implemented in Dolt and produced significant performance improvements. Database internals: The B-Tr...
Read More
- REFERENCE · 8 min read
So You Want Git for Data?
An updated version of this blog was published in September 2024: So you want Git for Data? 2024 Edition. People have been asking for a Git and GitHub for data for a while. That thread on Stack Exchange is almost seven years old and...
Read More
- DATASET · 6 min read
Visualizing Temperature Changes Over Time
In the first part of this two-part blog I covered NOAA's "Global Hourly Surface Data" dataset and how it is modeled in Dolt. Dolt is Git for data, and for this dataset we model a day of observations as a single commit in the commit grap...
Read More
- DATASET · 5 min read
NOAA Global Hourly Surface Data
The National Oceanic and Atmospheric Administration, NOAA, publishes weather measurements taken from stations around the world. It started in 1901 with a handful of stations, and there are more than 35,000 stations today. Most of these stations…
Read More
- FEATURE RELEASE · 7 min read
Announcing Saved Queries
Dolt is Git for data. We built Dolt to help teams collaborate on data sets using the forking, branching, and merging workflows that Git popularized. These workflows are what enable software engineers to collaborate on source code, and they...
Read More
- 3 min read
Copyrightable Material
In our previous blog post we examined some freely available licensing tools for open data from Creative Commons. To briefly recap, a license specifies the terms under which copyrightable material is made available for public access, sharply dis...
Read More
- 2 min read
Data Licensing
Introduction: Dolt is a data format. DoltHub is a collaboration platform for data stored in the Dolt format. When sharing copyrighted content, the terms of that sharing are governed by a license. In this post we highlight some common licen...
Read More