PRODUCTS

KEYWORDS

Announcing Dumbo Transaction Support

DumboDB Logo

DumboDB, the database that combines the best of MongoDB and Git, now supports transactions! This means you can perform multiple operations atomically, ensuring data integrity and consistency across your database. Let’s dive in!

What Are Transactions?#

In databases, a transaction is a sequence of operations performed as a single logical unit of work. If any operation within the transaction fails, the entire transaction is rolled back, ensuring that the database remains in a consistent state. The classic example of a transaction is transferring money from one bank account to another. You want to ensure that either both the withdraw and deposit operations happen together or not at all.

In SQL databases, this looks like:

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;

In order to move money from one account to another, you need to ensure that both updates happen together. If one update fails, you don’t want the other to succeed, leaving the database in an inconsistent state.

MongoDB has transactions as well. The syntax is a bit different, but the concept is the same:

const session = client.startSession();
session.startTransaction();
try {
  accounts.updateOne({ account_id: 1 }, { $inc: { balance: -100 } }, { session });
  accounts.updateOne({ account_id: 2 }, { $inc: { balance: 100 } }, { session });

  session.commitTransaction();
} catch (error) {
  session.abortTransaction();
} finally {
  session.endSession();
}

The forms are different, but the idea is the same: you want to ensure that both updates happen together or not at all.

Transactions (and Sessions): Now Available in DumboDB 0.1.2#

Originally I had planned to implement transactions a little later in the DumboDB roadmap, but I changed my mind because, in order to implement garbage collection, DumboDB needed sessions. In particular, we needed sessions with deferred writes to storage. So I bit the bullet and implemented sessions and transactions together.

Previously, every document creation, update, or deletion in DumboDB was an atomic operation. There wasn’t a session at all - we just had a global lock that ensured that only one operation could happen at a time. This was simple, but it meant that you couldn’t perform multiple operations atomically. This is how MongoDB behaves by default. If two separate operations update the same field in the same document, the last one wins. MongoDB gets a bad rap for this, but it’s actually a common pattern in many databases. If you want to perform multiple operations atomically, you need to use transactions.

In order to perform a transaction, you need to have a session. Sessions in MongoDB are maintained with a session identifier called the lsid. It’s a field that is entirely hidden from the user and managed by the MongoDB drivers. The idea is that if a connection gets cut off, the subsequent reconnection can resume the session by providing the same lsid. This is important for transactions, because if a transaction is interrupted, you want to be able to resume it.

While Dumbo previously depended only on the TCP connection for the duration of the operation, we now have sessions that are independent of the connection.

One thing to call out is that in MongoDB, transactions are only supported on replica sets. Honestly, I don’t know the reason for this. Maybe they just want to prompt users to use their online offering. In DumboDB, we don’t have replica sets and probably never will, so transactions are supported in any instance of a DumboDB server.

Example#

First, you’ll need to install DumboDB 0.1.2. Then, you can start a DumboDB server:

dumbodb --data-dir /path/to/data

Run that in its own terminal. We’ll use two additional terminals to run the client operations. This is necessary because you want to see what happens when two clients are attempting to update the same document.

We are going to use Python to talk to the DumboDB server, but you can use any MongoDB driver. To install the MongoDB Python driver, run:

pip install pymongo

Then you can create a setup.py file to initialize the database with some starting data. In this example, we’ll create a simple bank database with an accounts collection. We’ll insert two documents representing two bank accounts: one for Alice with a balance of $500, and one for Bob with a balance of $0.

from pymongo import MongoClient

# Connect to your database
client = MongoClient("mongodb://127.0.0.1:27017")
db = client["bank"]

print("Initializing accounts collection...")
# Insert starting records for Alice and Bob
db.accounts.insert_many([
    {"_id": 1, "name": "Alice", "balance": 500},
    {"_id": 2, "name": "Bob", "balance": 0}
])

print("Database initialized successfully!")

Run the script once to set up the database:

$ python setup.py
Initializing accounts collection...
Database initialized successfully!

If you don’t see that output, then something went wrong. The Dumbo server isn’t running, or your Python installation is using a clashing environment. Don’t proceed until you see that output.

We’ll create two Python scripts to demonstrate transactions. The first script will be called transfer.py, and it will transfer $100 from Alice’s account to Bob’s account. The second script will be called withdraw.py, and it will attempt to withdraw $450 from Alice’s account.

We’ll also add a delay before committing the transaction to simulate a long-running transaction and to show how locks are held during the transaction. Put the following in transfer.py:

import time
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017")
db = client["bank"]

with client.start_session() as sessionA:
    with sessionA.start_transaction():
        # 1. Verify Alice has enough in her account
        alice = db.accounts.find_one({"_id": 1}, session=sessionA)
        if not alice or alice["balance"] < 100:
            raise Exception("Insufficient funds")

        # 2. Transfer $100 from Alice to Bob
        db.accounts.update_one({"_id": 1}, {"$inc": {"balance": -100}}, session=sessionA)
        db.accounts.update_one({"_id": 2}, {"$inc": {"balance": 100}}, session=sessionA)

        # 3. Sleep to hold the locks for a while before committing
        print("Client 1 holding locks for 10 seconds...")
        time.sleep(10)

Next, put the following in withdraw.py:

from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017")
db = client["bank"]

with client.start_session() as sessionB:
    with sessionB.start_transaction():
        # 1. Load Alice inside the transaction
        alice = db.accounts.find_one({"_id": 1}, session=sessionB)

        if not alice or alice["balance"] < 450:
            raise Exception("Insufficient funds")

        # 2. Attempt update -> This will instantly trigger your WriteConflict!
        print("Client 2 attempting withdrawal...")
        db.accounts.update_one({"_id": 1}, {"$inc": {"balance": -450}}, session=sessionB)

Note that in Python, the with statement is used to manage the session and transaction lifecycle. The transaction will automatically be committed when the block is exited, unless an exception is raised, in which case it will be aborted.

Now, we can run each script. First, run the transfer.py script in one terminal:

$ python transfer.py
Client 1 holding locks for 10 seconds...

Before it completes, run the withdraw.py script in another terminal:

$ python withdraw.py
Client 2 attempting withdrawal...
Traceback (most recent call last):
  File "/private/tmp/trans/./withdraw.py", line 16, in <module>
    db.accounts.update_one({"_id": 1}, {"$inc": {"balance": -450}}, session=sessionB)
  File "/opt/homebrew/lib/python3.11/site-packages/pymongo/synchronous/collection.py", line 1335, in update_one
    self._update_retryable(
  File "/opt/homebrew/lib/python3.11/site-packages/pymongo/synchronous/collection.py", line 1117, in _update_retryable
  [... SNIP ...]
    File "/opt/homebrew/lib/python3.11/site-packages/pymongo/helpers_shared.py", line 284, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: [update.go:80 common.UpdateDocument.func1] ErrorCodeWriteConflict: write conflict: document locked by another transaction, full error: {'ok': 0.0, 'errmsg': '[update.go:80 common.UpdateDocument.func1] ErrorCodeWriteConflict: write conflict: document locked by another transaction', 'code': 112, 'codeName': 'ErrorCode(112)'}

ErrorCodeWriteConflict is the error that MongoDB (and now DumboDB) throws when a transaction attempts to update a document that is currently locked by another transaction. Thankfully, the withdrawal transaction was able to detect the conflict and fail gracefully, rather than leaving the database in an inconsistent state.

Gotchas#

As part of investigating transactions in MongoDB, I came across a few gotchas that are worth mentioning. First, if you attempt to perform an update against a document that is currently locked by another transaction when your client is not in a transaction, the update will block until the lock is released, and then the update will happen anyway. This means you need to be careful to never perform updates outside of transactions if you want to avoid this surprising behavior.

It gets even worse if you are attempting to insert documents. In MongoDB, if you have a transactional insert that is currently holding locks, and another client attempts to insert a document with the same _id value outside of a transaction, the non-transactional insert will succeed. The transaction will fail with a duplicate key error when it attempts to commit. This also applies to upserts.

For better or worse, DumboDB behaves the same way. Not because we think it’s correct, but because we want to be consistent with MongoDB. This is a critical piece of our methodology: we want to be as consistent with MongoDB as possible, even if we think MongoDB is doing something wrong. We want to make it easy for users to switch between MongoDB and DumboDB without having to learn a new set of behaviors.

Conclusion#

All of the demo code above would have failed if you ran it in DumboDB yesterday. Any client attempting to start a transaction would have received an error saying that transactions are not supported. Now, with DumboDB 0.1.2, you can use transactions to ensure data integrity and consistency across your database.

Want to learn more about Dolt and Dumbo? Hop on our Discord to ask questions and nerd out about version-controlled databases!