Blog
In a previous blog I showed how the history of a dataset can be queried using the Dolt history tables, and in the first part of this two-part blog I covered the IRS SOI data. In this second part I use the IR… (5 min read)
Every year the IRS publishes a treasure trove of data. It contains over a hundred different metrics which provide insight into the finances of American taxpayers. Even more compelling is they provide this inf… (6 min read)
Since its launch in 2008, GitHub has catalyzed the open source software world and accelerated the culture of software collaboration. Source control was an old idea at that point, but GitHub offered a central… (2 min read)
When we started developing Dolt our vision was to deliver git functionality for data. Where git versions files, Dolt versions tables. We implemented table based diff and conflict logic and shipped the init… (6 min read)
Redesigning DoltHub. Dolt is a database and a data format. DoltHub is a way of hosting and collaborating on Dolt databases. We decided to redesign DoltHub to make it more user friendly. We are excited to ann… (5 min read)
A few months ago we finally settled on a good way to measure the correctness of Dolt's SQL engine: the sqllogictest package, first developed for SQLite and since used as a benchmark for lots of other datab… (6 min read)
IBM and General Electric invented the first databases in the early 1960s. It was only by the early 1970s that enough data had accumulated in databases that the need to transfer data between databases emerged. … (5 min read)
Wikipedia is the largest and most popular general reference work on the internet, making it a powerful tool for predictive language modeling. Wikipedia releases a dump of all its articles and pages twice a mon… (6 min read)
Since releasing Dolt, we have often been asked how it scales. How many rows and how many gigs can you get into a Dolt dataset before things start breaking badly? Answering this question in practice is kind… (5 min read)
I have been a huge Econtalk fan for over ten years. On his podcast with Sebastian Junger, Russ Roberts brought up what he called a Chinese proverb. No food, one problem. Have food, many problems. The wisdom of… (3 min read)
ImageNet is a dataset maintained by the Stanford Vision Lab. It seems to have fallen into disrepair. The links to download the image labels are broken. We have managed to procure all four released versions of … (4 min read)
Ever look at some data and wonder where a particular value came from, how long it's been there, or what the reason for changing it was? This is important information, but current data storage formats don't tra… (7 min read)
As we discussed in the Where Is the Data Catalog? blog post, Dolt is a database designed for internet-scale collaboration. There are databases with differences, history, rollback, and audit logging. We think t… (3 min read)
When we first started writing Dolt, we weren’t thinking about SQL functionality. We just knew we wanted a way to package data sets to make them easy to share, collaborate and merge -- to do for data what … (5 min read)
The Princeton WordNet database is on DoltHub. This blog entry will be about how it got there and how to use it. WordNet is distributed natively from Princeton as a compilable custom database. You can also d… (4 min read)
When Dolt and DoltHub first went into private beta, we were surprised that the Iris dataset was the dataset people first tried to put in Dolt. If you are looking for that dataset, we have uploaded it to DoltHu… (4 min read)
Why is there no place on the internet to get useful, maintained data? This question has puzzled me since 2013. We can rent a server. We can rent a database. Why can't we rent the data in the database? Somethin… (4 min read)