Dolt + Google Sheets
Dolt, DoltHub, and Hosted Dolt can be used for Spreadsheet Version Control. Many of our customers migrated to Dolt from a really complicated, heavily edited spreadsheet.
I first tested Dolt against Google Sheets in fall of 2020 working from home during the pandemic. I produced this documentation. Dolt has changed a lot since then. We launched an API on DoltHub to make SQL reads and writes. We also launched Hosted Dolt, a managed Dolt instance you connect to over the internet. Integration with Google Sheets has never been better. It's time for a documentation update.
Why Dolt and Google Sheets
There is collaborative editing and history in Google Sheets but as I explain in my Spreadsheet Version Control article, collaborative editing and history is not version control.
Version control allows for branch and merge, the ability to asynchronously collaborate on multiple versions in an organized way. Instead of everyone editing the same Google Sheet, you can create you own copy, make edits, see the differences (ie. diffs) between your copy and another copy, and then merge two copies together. That workflow is true version control. No spreadsheet has those features.
Moreover, by adding DoltHub or Hosted Dolt to the mix, you add a Pull Request workflow. Pull Requests allow for human review of edits. Human review helps ensure data quality of changes to your spreadsheets.
Goals
To show off how Dolt can work with Google sheets, I built a simple demonstration sheet. The demo shows off a basic read, edit on a branch, and pull request workflow as well as some advanced features like buttons. The basic and advanced goals are enumerated below.
Basic
- Read a Dolt table from the main branch into a Google Sheet
- Write a Dolt table from another Google Sheet to a branch
- Review differences in a Google Sheet
- Create a pull request on DoltHub or Hosted Dolt from branch to main
- Perform a human review of the pull request
- Merge the changes into main branch
- Read the new merged Dolt table into a Google Sheet
Advanced
These actually turn out to be not so advanced.
- Trigger the above functionality using Google Sheet Buttons
- Trigger the above functionality using Google Sheet formulas
How it works
Apps Script
Google Sheets exposes programmatic functionality via Apps Script, a Javascript-based language.
Apps Script supports Web APIs via the UrlFetchApp
and JSON.parse
functions. We use these functions to work with a DoltHub database via the DoltHub API.
Apps Script supports MySQL connectivity via JDBC. Dolt is MySQL-compatible. Hosted Dolt exposes a MySQL compatible server over the internet you can connect to with any MySQL client. We use the built-in Apps Script MySQL client to work with a Hosted Dolt database.
Both approaches are standard, out-of-the-box functionality for Apps Script. No special Dolt plug-ins are required.
Set Up
Dolt
For this demo, we created a very simple Dolt database. The database has one table named test
. The test
table has two columns, id
and words
. id
is an integer and the primary key. Words is a string of max size 255 characters.
Additionally, we create two branches main
and branch
.
We create this database on both DoltHub and Hosted Dolt so we can demonstrate how to connect to each of those products using Google Sheets. We'll use these products to implement the Pull Request part of our goals.
DoltHub
You can find the DoltHub database named google_sheets
here under my personal timsehn
DoltHub account.
With DoltHub, you access Dolt via a web API. This has a few limitations. You can only run one write query at a time. Read query results are limited to 1000 rows. That being said, DoltHub feels a lot like GitHub. Non-technical users can manage it. So, if you have non-technical users using your version-controlled Google Sheet, DoltHub may be a better option for you.
To use the write web API I also needed to create a token using our new fine-grained API permissions system.
Hosted Dolt
You can find the DoltHub database named google_sheets
here. You will need to be granted permissions to see it. Either email me at tim@dolthub.com or come by our Discord and I'll give you access. Screenshots of the experience can be found below.
Hosted Dolt is a running Online Transaction Protocol (OLTP) database. You connect to it over the internet using any MySQL client. In general, queries to Hosted Dolt are going to return results faster. There is no limit to result set size. However, Hosted Dolt feels like a SQL database. Non-technical users may be a bit intimidated by the user interface.
The Google Sheet
The Google Sheet I created can be found here. It has global edit permissions. Don't delete it!
The Google sheet has three sheets:
- test-read: reflects the main branch of the database
- test-write: the content you would like to insert on a branch
- test-diff: the difference between what is in main and what is on branch
Depending on what other users have done, you'll see different content in each of the sheets.
Apps Script
In the sheet click on Extensions
> Apps Script
. This will bring up a new table with an Apps Script code editor named "Dolt Connector".
The Hosted Dolt connection code is in HostedDolt.gs
. There are three public functions and one helper private function. Private functions in Apps Script end with an _
.
We start by defining the Hosted Dolt connection information.
var server = 'timsehn-google-sheets.dbs.hosted.doltdb.com';
var dbName = 'google_sheets';
var username = 'kicu458f9bdnaq56';
var password = 'FN4pY87wFwoDHroNUePBsceMS3pB7AWX';
var port = 3306;
Then, the readHostedDolt()
function is a simple select * from test
followed by a call to the displayResultsHostedDolt()
function. Notice how in the connection string you can specify a branch to connect to by appending /<branch>
. This is one of the ways to use branches in Dolt.
function readHostedDolt() {
var branch = 'main'
var url = 'jdbc:mysql://' + server + ':' + port + '/' + dbName + '/' + branch;
conn = Jdbc.getConnection(url, username, password);
var query = 'SELECT * FROM test';
var results = conn.createStatement().executeQuery(query);
var sheetName = 'test-read';
displayResultsHostedDolt_(sheetName, results)
conn.close();
}
For the writeHostedDolt()
function, we iterate through the data on the sheet and batch up a prepared insert statement. We do this while connected to a branch named branch
. At the end of our writes we make sure to call dolt_commit()
. The --skip-empty
flag allows the commit to be skipped if there is nothing to commit. Without skip-empty
, call dolt_commit()
will fail if the user tries to write the same data that is already there.
This function could be more robust to some errors like extra columns or a missing header row. I wanted to keep the example as simple as possible.
function writeHostedDolt() {
var sheetName = 'test-write'
var spreadsheet = SpreadsheetApp.getActive();
var sheet = spreadsheet.getSheetByName(sheetName);
var data = sheet.getDataRange().getValues();
var branch = 'branch'
var url = "jdbc:mysql://" + server + ":" + port + "/" + dbName + '/' + branch;
var conn = Jdbc.getConnection(url, username, password);
// Clear the table so we mirror the sheet
var query = "delete from test";
conn.createStatement().execute(query);
// Prepare a batch query
query = conn.prepareStatement("insert into test values(?, ?)");
// ignore header row by starting index at 1
for (var i = 1; i < data.length; i++) {
for (var j = 0; j < data[i].length; j++) {
query.setObject(j+1, data[i][j]);
}
query.addBatch();
}
query.executeBatch();
query.close();
var query = "call dolt_commit('--skip-empty', '-am', 'New commit from Google Sheets edits')"
conn.createStatement().executeQuery(query);
conn.close();
}
For the showDiffHostedDolt()
, we call the dolt_diff()
table function and display the results.
function showDiffHostedDolt() {
var url = "jdbc:mysql://" + server + ":" + port + "/" + dbName;
var conn = Jdbc.getConnection(url, username, password);
var query = "SELECT * FROM dolt_diff('main', 'branch', 'test')";
var results = conn.createStatement().executeQuery(query);
var sheetName = 'test-diff';
displayResultsHostedDolt_(sheetName, results);
conn.close();
}
This helper function takes the results of a query and prints them in a sheet.
function displayResultsHostedDolt_ (sheetName, results) {
var spreadsheet = SpreadsheetApp.getActive();
var sheet = spreadsheet.getSheetByName(sheetName);
// Start the sheet fresh
sheet.clearContents();
var metaData = results.getMetaData();
var numCols = metaData.getColumnCount();
var insert = [];
// Insert the column names as a header row
for (var col = 0; col < numCols; col++) {
insert.push(metaData.getColumnName(col + 1));
}
sheet.appendRow(insert);
while (results.next()) {
insert = [];
for (var col = 0; col < numCols; col++) {
insert.push(results.getString(col + 1));
}
sheet.appendRow(insert);
}
}
The DoltHub connection code is in DoltHub.gs
.
The readHostedDolt()
function is a call to the DoltHub SQL API with a simple select * from test
query. We then call the display results helper function. Because DoltHub API results come back unordered, we then sort by the ID column using the spreadsheet sort function.
function readDoltHub() {
// This will only scale to 200 rows
var query = 'select * from test';
var url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/main?q=' + encodeURIComponent(query);
var response = UrlFetchApp.fetch(url, {'muteHttpExceptions': true});
data = JSON.parse(response);
var sheetName = 'test-read';
displayResultsDoltHub_(sheetName, data);
// Workaround for https://github.com/dolthub/dolthub-issues/issues/497
var spreadsheet = SpreadsheetApp.getActive();
var sheet = spreadsheet.getSheetByName(sheetName);
sheet.getRange(2, 1, sheet.getLastRow() - 1, sheet.getLastColumn()).sort(1);
}
For the writeDoltHub()
function, we iterate through the data on the sheet and make an individual insert query for each row. Unlike Hosted Dolt, this requires we know whether we are inserting a string or a number so we can quote correctly. The we use the callWriteAPI_()
helper function to run each query on DoltHub.
function writeDoltHub() {
var sheetName = 'test-write';
var spreadsheet = SpreadsheetApp.getActive();
var sheet = spreadsheet.getSheetByName(sheetName);
var data = sheet.getDataRange().getValues();
var branch = 'branch' ;
// Clear the table so we mirror the sheet
var query = "delete from test";
callWriteAPI_(query)
var data = sheet.getDataRange().getValues();
// ignore header row by starting index at 1
for (var i = 1; i < data.length; i++) {
// Need to know the data shape here because we need to know what to quote
var id = data[i][0];
var words = data[i][1];
query = 'insert into test values (' + id + ',"' + words + '")';
callWriteAPI_(query);
}
}
For showDiffDoltHub()
we also use the DoltHub SQL API ro run the dolt_diff()
table function and display the results.
function showDiffDoltHub() {
var query = "SELECT * FROM dolt_diff('main', 'branch', 'test')";
var url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/main?q=' + encodeURIComponent(query);
var response = UrlFetchApp.fetch(url, {'muteHttpExceptions': true});
data = JSON.parse(response);
var sheetName = 'test-diff';
displayResultsDoltHub_(sheetName, data);
}
This helper function takes the results if a query and prints them in a sheet. Notice with DoltHub API versus Hosted Dolt, you have to take a lot more care in getting the correct order.
function displayResultsDoltHub_(sheetName, results) {
var spreadsheet = SpreadsheetApp.getActive();
var sheet = spreadsheet.getSheetByName(sheetName);
// Start the sheet fresh
sheet.clearContents();
var insert = [];
// Column Names
var numCols = data.schema.length
for (var col = 0; col < numCols; col++) {
insert.push(data.schema[col].columnName);
}
sheet.appendRow(insert);
// Data
var numRows = data.rows.length;
for (var row = 0; row < numRows; row++) {
insert = []
// Row data comes back unordered so use the column names which are
// ordered to put the row data in the right order
for (var col = 0; col < numCols; col++) {
var colName = data.schema[col].columnName;
var rowValue = data.rows[row];
var colValue = rowValue[colName];
insert.push(colValue);
}
sheet.appendRow(insert);
}
}
This helper function abstracts the heavy lifting of writing to the DoltHub API away from the main write function. The DoltHub write API is asynchronous so you have to make a request and then poll an endpoint and wait until the job is marked complete. Only one write can be processed at a time.
function callWriteAPI_(query) {
var url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/write/main/branch?q='
+ encodeURIComponent(query);
var token = 'dhat.v1.a0d6k79cnr88q7pvc6lmjpva7olqff3ioufjb6qnem59kkksdc4g';
var headers = {
"authorization": token
};
var options = {
"method": "post",
"headers": headers
};
var response = UrlFetchApp.fetch(url, options);
var responseJSON = JSON.parse(response);
var operationName = responseJSON.operation_name;
options = {
"headers": headers
};
url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/write?operationName=' +
encodeURIComponent(operationName);
var isDone = false;
var maxAttempts = 10;
var attempts = 1;
while ( !isDone ) {
if (attempts > maxAttempts) {
throw new Error('Could not write to DoltHub');
}
response = UrlFetchApp.fetch(url, options);
responseJSON = JSON.parse(response)
isDone = responseJSON.done;
Utilities.sleep(100);
attempts++;
}
}
Buttons and Formulas
Once you have Apps Script functions accessing them via buttons of formulas is pretty straight forward. For buttons, you make an image and assign the image an Apps Script function. For formulas, every Apps Script function defined is available via standard formula syntax. In the demo, we could replace the "Show Current Diff" button with =showDiffDoltHub()
in cell A1.
The Demo
Time for a demo. We're going to show off the DoltHub version.
Load data from Hosted Dolt
Go to the Google Sheet, navigate to the "test-read" sheet, and click the "Load from DoltHub" button. This button runs the readDoltHub()
function.
When I ran it, the database on DoltHub looked like this.
After I clicked the button, the Google Sheet matches.
So far so good.
Copy data to test-write sheet and make some changes
Now I copy the data I loaded to the "test-write" sheet and make some changes. This database is really down on Tim's abilities. I'm going to cheer that happy fellow on a bit more.
After I'm done, I click the "Write to DoltHub" button which calls the writeDoltHub()
function.
View the diff
The data will now be on the branch
branch on the DoltHub database. I select branch
from the branch selector to view the data on DoltHub.
Back on the Google Sheet, I can see the diff by loading it into the "test-diff" sheet using the "Show Current Diff" Button.
Much more positive. You can see all but the row with ID 3 changed.
Make a Pull Request
Now I go back to the DoltHub database to make a pull request to send to my teammate for review.
I click the "Pull Requests" tab and the "Create Pull Request" button. I select the "base branch" main
and the "from branch" branch
.
I click the "Create Pull Request Button" and voila, I have an open Pull Request to send to my teammate for review.
My teammate can now review the Pull Request. The most useful tool for Pull Request review is the diff page.
This all looks exactly like I intended. I leave a nice PR comment to indicate I reviewed.
Merge
Now, I want to merge my changes to the main
branch so everyone can work off the new copy. I can click the "Merge" button above to have my changes reflected in the main
branch. This can take a while for large databases so it kicks off an asynchronous job. But, for this database, it should be very quick.
Once it's done. I can see my new changes in main.
Reload in Google sheet
Finally, I reload the new data in the Google Sheet.
Finally, I confirm main is what I edited and loaded on DoltHub by making sure there is an empty diff.
Conclusion
As you can see Google Sheets and Dolt make a natural pairing for your complicated spreadsheets. Dolt allows Google sheets to be asynchronously edited and human reviewed, much like code in Git and GitHub. Interested? Come ask me for a live demo on our Discord or just play for yourself!