Announcing automatic merging of JSON changes

FEATURE RELEASE
3 min read

Here's something that's true: Dolt makes it easy to collaborate and merge independent changes to your SQL database.

Here's something else that's true: People really like JSON. It's flexible, human readable, and there's lots of existing tools for reading and writing it.

That's why all the major relational databases today have support for JSON as a first-class data type: all the benefits of a relational database, plus the flexibility of JSON-style data when you needed it. It's the best of both worlds.

Dolt supports JSON as a first-class data type too. So it stands to reason that you should be able to easily merge independent changes to a JSON document stored within Dolt, just like you can with all the other data in your Dolt repo.

And as of today, you can! It works exactly the same as merging other types of changes, no additional work required.

Suppose you set up your main branch like this:

CREATE TABLE characters (name varchar(20) primary key, stats json);
INSERT INTO characters VALUES ('Ozymandias', '{"Charisma": 10, "Intelligence": 16}'),
                            ('Millicent', '{"Strength": 12, "Wisdom": 8}');
                            
CALL DOLT_ADD('characters');
CALL DOLT_COMMIT('-m', 'created table');

And then you have a dev branch like this:

CALL DOLT_CHECKOUT('-b', 'nick-dev', 'main');

-- Add new "Best Friend" stat
UPDATE characters SET stats = JSON_INSERT(stats, '$."Best Friend"', 'Millicent') where name = 'Ozymandias';
UPDATE characters SET stats = JSON_INSERT(stats, '$."Best Friend"', 'Ozymandias') where name = 'Millicent';

CALL DOLT_ADD('characters');
CALL DOLT_COMMIT('-m', 'added friendship');

And your coworker has a dev branch like this:

CALL DOLT_CHECKOUT('-b', 'max-dev', 'main');

-- Rename "Charisma" stat to "Rizz"
How does join update work?
UPDATE characters SET stats = JSON_INSERT(
    JSON_REMOVE(stats, '$.Charisma'),
    '$.Rizz',
    JSON_EXTRACT(stats, '$.Charisma' )
) where JSON_CONTAINS_PATH(stats, 'one', '$.Charisma');

CALL DOLT_ADD('characters');
CALL DOLT_COMMIT();
');

It's now possible to merge both of these changes seamlessly with CALL DOLT_MERGE. The workflow is exactly the same as it would be for any other merge. For example:

CALL DOLT_CHECKOUT('main')
CALL DOLT_MERGE('max-dev');
CALL DOLT_MERGE('nick-dev');

And you'll end up with a table that looks like this:

> select * from characters;
+------------+------------------------------------------------------------------+
| name       | stats                                                            |
+------------+------------------------------------------------------------------+
| Millicent  | {"Wisdom": 8, "Strength": 12, "Best Friend": "Ozymandias"}       |
| Ozymandias | {"Rizz": 10, "Best Friend": "Millicent", "Intelligence": 16} |
+------------+------------------------------------------------------------------+

Before now, we only did automatic merges at the level of individual table cells. If the cell changed on both branches, the merge would need to be manually resolved. But now, when a cell containing JSON is modified independently on two different branches, dolt will intelligently determine how to combine the changes. It even works with nested objects.

There are some caveats:

  • Depending on your exact use case, it's possible that you would rather have an error if two branches try to modify the same JSON document. In that case, you can set the @@dolt_dont_merge_json system variable if using a SQL shell, or pass the --dont_merge_json command line flag if using a CLI command like dolt merge. If you set the flag, then merges like this will return a merge conflict, like they did before.
  • One of Dolt's killer features is that the time it takes to merge a table scales with the size of the only the parts of the table that have changed, not the entire table. We are currently in the process of improving how JSON is stored internally so that merging large JSON documents can also benefit from this. But in the meantime, merging changes to large JSON documents may be slower than you would like. Again, the @@dolt_dont_merge_json system variable and the --dont_merge_json command line flag can restore the previous behavior.
  • When merging changes to JSON documents, Dolt looks at each key in the document independently. If the same key is modified on both branches, then Dolt will still report a conflict. However, if the value of that key is itself a JSON object, then Dolt will recursively merge those objects.
  • As a result of the previous bullet point, renaming a key in a JSON object will be seen by Dolt as removing one property, and adding a new property with the same value. This is consistent with how Dolt merges changes to SQL tables where the primary key has changed.
  • Dolt cannot currently merge concurrent modifications to a JSON array. These changes will still need to be manually resolved.

This is only the beginning when it comes to making JSON first-class in Dolt. Our next goals are improved support for generated table indexes on JSON columns, and improved on-disk representation of JSON documents for more efficient diffing, reading and updating.

We're always working to make Dolt faster and easier for users. But to help us prioritize, we need to understand how Dolt is being used in the real world. If you have an interesting use case, or if you think there's some killer feature that would make Dolt faster, better, or stronger, let us know! You can always reach us on our Discord!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.