Re-introducing Dolt Functions in Doltgres

July 30, 2024

5 min read

Introduction

We're hard at work building Doltgres, a Postgres-compatible database with git-inspired version control features. Before we built Doltgres we built Dolt, which is MySQL-compatible instead.

But there are many differences between MySQL and Postgres, which we're learning about as we go. This week I want to talk about a relatively obscure difference which nonetheless had ramifications for our product: the difference between functions and procedures in the two dialects.

Functions v. Procedures

SQL databases have two basic ways to perform user-defined computation: functions and stored procedures. You're probably most familiar with functions, since there are lots of built-in ones you probably use.

SELECT rand();
+--------------------+
| rand()             |
+--------------------+
| 0.8348555218752991 |
+--------------------+

Stored procedures, on the other hand, are a bit more esoteric. They let you execute arbitrary SQL statements according to parameters you pass in. You can find lots of examples in an earlier blog if you're interested. You invoke them with the CALL keyword, which is its own statement type. This means you can't use them in a SELECT statement. They look like this (using one of Dolt's built-in procedures).

CALL DOLT_CHECKOUT('-b', 'my-new-branch');
+--------+------------------------------------+
| status | message                            |
+--------+------------------------------------+
|      0 | Switched to branch 'my-new-branch' |
+--------+------------------------------------+

In MySQL, stored procedures can return results, like above. In Postgres, they can't. More on that in a bit.

Dolt lore: from functions to stored procedures

When we first implemented Dolt's SQL version control functionality way back in 2020, we hadn't implemented stored procedures in the engine yet. So our first draft was to use functions for things like checkout, like this:

SELECT DOLT_CHECKOUT('-b', 'my-new-branch');

This made some people uncomfortable, for the simple reason that functions like this aren't supposed to have side-effects, and these functions do. Imagine running a SELECT statement and having it make changes to your database! And because they're valid anywhere a normal function is, you could do wacky things like this:

SELECT DOLT_BRANCH('-b', customer_name) from customers;

This would invoke DOLT_BRANCH for every row in the customers table. This is kind of neat, but also kind of horrifying, considering what some of these functions are capable of.

So as soon as we had support in the engine for stored procedures, we implemented procedures for all these functions and deprecated the functional versions.

Case closed, the world is a better place, right? Well, not quite.

Building out Doltgres's test coverage

For the past several weeks I've been working on getting Dolt's suite of integration tests running on Doltgres. Dolt has tens of thousands of such hand-written tests that verify the SQL engine returns correct results, and that the procedures and other version control features work as intended. We generally define them in Go structs that look like this:

	{
		Name: "dolt_hashof_table tests",
		SetUpScript: []string{
			"CREATE TABLE t1 (pk int primary key);",
		},
		Assertions: []queries.ScriptTestAssertion{
			{
				Query: "SHOW TABLES;",
				Expected: []sql.Row{
					{"t1"},
				},
			},
			{
				Query:    "SELECT dolt_hashof_table('t1');",
				Expected: []sql.Row{{"0lvgnnqah2lj1p6ilvfg0ssaec1v0jgk"}},
			},
			{
				Query:    "INSERT INTO t1 VALUES (1);",
				Expected: []sql.Row{{types.OkResult{RowsAffected: 1}}},
			},
			{
				Query:    "SELECT dolt_hashof_table('t1');",
				Expected: []sql.Row{{"a2vkt9d1mtuhd90opbcseo5gqjae7tv6"}},
			},
			{
				Query:          "SELECT dolt_hashof_table('noexist');",
				ExpectedErrStr: "table not found: noexist",
			},
		},
	},

One of my biggest reasons for wanting to get these tests running on Doltgres was to find bugs in the version control operations in the new Postgres context. And right away, I found a big problem. We have lots of tests that check the results of our built-in stored procedures, like this:

	{
		Name: "dolt-tag: SQL use a tag as a ref for merge",
		SetUpScript: []string{
			"CREATE TABLE test(pk int primary key);",
			"CALL DOLT_ADD('.')",
			"INSERT INTO test VALUES (0),(1),(2);",
			"CALL DOLT_COMMIT('-am','created table test')",
			"DELETE FROM test WHERE pk = 0",
			"INSERT INTO test VALUES (3)",
			"CALL DOLT_COMMIT('-am','made changes')",
		},
		Assertions: []queries.ScriptTestAssertion{
			{
				Query:    "CALL DOLT_TAG('v1','HEAD')",
				Expected: []sql.Row{{0}},
			},
			{
				Query:    "CALL DOLT_CHECKOUT('-b','other','HEAD^')",
				Expected: []sql.Row{{0, "Switched to branch 'other'"}},
			},
			{
				Query:    "INSERT INTO test VALUES (8), (9)",
				Expected: []sql.Row{{types.OkResult{RowsAffected: 2}}},
			},
			{
				Query:    "CALL DOLT_COMMIT('-am','made changes in other')",
				Expected: []sql.Row{{doltCommit}},
			},
			{
				Query:    "CALL DOLT_MERGE('v1')",
				Expected: []sql.Row{{doltCommit, 0, 0, "merge successful"}},
			},
			{
				Query:    "SELECT * FROM test",
				Expected: []sql.Row{{1}, {2}, {3}, {8}, {9}},
			},
		},
	},
}

The problem: in Postgres, stored procedures can't return result rows like MySQL can. These tests weren't going to work. And even worse, our customers couldn't examine the results of their DOLT_MERGE operation and similar version-control features, which is a requirement for writing version controlled application.

What to do?

From functions to procedures back to functions again

The solution is straightforward, although it struck us as a little bit gross: travel back in time and create function versions of all the Dolt stored procedures, just like we had originally. We don't even need to do this manually -- a little bit of clever metaprogramming at server startup time can transform all the Dolt procedures into functions automatically, and handles any new ones as we add them to Dolt.

But isn't this wrong? Aren't functions with side effects, and SELECT statements that alter the state of your database, a huge no-no?

Well, Postgres doesn't think so. This pattern is standard practice in the Postgres world:

SELECT NEXTVAL('mySequence');

This is a SELECT statement that permanently alters the value of the sequence given. And this isn't a fluke, either. Here's the official way that you manually set the next value in a sequence:

SELECT setval('myseq', 42);

(In MySQL, the way to change the next value in an AUTO_INCREMENT column is with an ALTER TABLE statement).

There are lots of different examples to choose from. Even administrative functionality is accessed this way, e.g.:

SELECT pg_log_backend_memory_contexts(pg_backend_pid());

Yes, that's a SELECT statement that alters the logging behavior of a running server. Working as intended!

But we aren't here to pass judgment on Postgres's design decisions, we're here to enable our customers who expect a drop-in replacement for Postgres. So now, in Doltgres only, you can call the Dolt procedures as functions, just like in the old days.

SELECT DOLT_CHECKOUT('-b', 'my-new-branch');

As for Dolt's integration tests, it's a simple matter to use regular expressions to replace CALL DOLT_ with SELECT DOLT_ in Doltgres's test harness.

This is a work in progress: we still need to enable this syntax, which is valid in Postgres:

SELECT * FROM DOLT_CHECKOUT('-b', 'my-new-branch');

Stay tuned!

Conclusion

Doltgres is free and open source, so go check it out if you're curious about a Postgres compatible database with version control features. Doltgres isn't ready for production use yet, but we're looking for potential customers to guide our roadmap and let us know what we should be building first.

Have questions about Doltgres or Postgres procedures? Join us on Discord to talk to our engineering team and meet other Dolt users.

Blog