Re-introducing Dolt Functions in Doltgres
Introduction
We're hard at work building Doltgres, a Postgres-compatible database with git-inspired version control features. Before we built Doltgres we built Dolt, which is MySQL-compatible instead.
But there are many differences between MySQL and Postgres, which we're learning about as we go. This week I want to talk about a relatively obscure difference which nonetheless had ramifications for our product: the difference between functions and procedures in the two dialects.
Functions v. Procedures
SQL databases have two basic ways to perform user-defined computation: functions and stored procedures. You're probably most familiar with functions, since there are lots of built-in ones you probably use.
SELECT rand();
+--------------------+
| rand() |
+--------------------+
| 0.8348555218752991 |
+--------------------+
Stored procedures, on the other hand, are a bit more esoteric. They let you execute arbitrary SQL
statements according to parameters you pass in. You can find lots of examples in an earlier
blog if you're interested. You
invoke them with the CALL
keyword, which is its own statement type. This means you can't use them
in a SELECT
statement. They look like this (using one of Dolt's built-in
procedures).
CALL DOLT_CHECKOUT('-b', 'my-new-branch');
+--------+------------------------------------+
| status | message |
+--------+------------------------------------+
| 0 | Switched to branch 'my-new-branch' |
+--------+------------------------------------+
In MySQL, stored procedures can return results, like above. In Postgres, they can't. More on that in a bit.
Dolt lore: from functions to stored procedures
When we first implemented Dolt's SQL version control functionality way back in 2020, we hadn't
implemented stored procedures in the engine yet. So our first draft was to use functions for things
like checkout
, like this:
SELECT DOLT_CHECKOUT('-b', 'my-new-branch');
This made some people uncomfortable, for the simple reason that functions like this aren't supposed
to have side-effects, and these functions do. Imagine running a SELECT
statement and having it
make changes to your database! And because they're valid anywhere a normal function is, you could do
wacky things like this:
SELECT DOLT_BRANCH('-b', customer_name) from customers;
This would invoke DOLT_BRANCH
for every row in the customers
table. This is kind of neat, but
also kind of horrifying, considering what some of these functions are capable of.
So as soon as we had support in the engine for stored procedures, we implemented procedures for all these functions and deprecated the functional versions.
Case closed, the world is a better place, right? Well, not quite.
Building out Doltgres's test coverage
For the past several weeks I've been working on getting Dolt's suite of integration tests running on Doltgres. Dolt has tens of thousands of such hand-written tests that verify the SQL engine returns correct results, and that the procedures and other version control features work as intended. We generally define them in Go structs that look like this:
{
Name: "dolt_hashof_table tests",
SetUpScript: []string{
"CREATE TABLE t1 (pk int primary key);",
},
Assertions: []queries.ScriptTestAssertion{
{
Query: "SHOW TABLES;",
Expected: []sql.Row{
{"t1"},
},
},
{
Query: "SELECT dolt_hashof_table('t1');",
Expected: []sql.Row{{"0lvgnnqah2lj1p6ilvfg0ssaec1v0jgk"}},
},
{
Query: "INSERT INTO t1 VALUES (1);",
Expected: []sql.Row{{types.OkResult{RowsAffected: 1}}},
},
{
Query: "SELECT dolt_hashof_table('t1');",
Expected: []sql.Row{{"a2vkt9d1mtuhd90opbcseo5gqjae7tv6"}},
},
{
Query: "SELECT dolt_hashof_table('noexist');",
ExpectedErrStr: "table not found: noexist",
},
},
},
One of my biggest reasons for wanting to get these tests running on Doltgres was to find bugs in the version control operations in the new Postgres context. And right away, I found a big problem. We have lots of tests that check the results of our built-in stored procedures, like this:
{
Name: "dolt-tag: SQL use a tag as a ref for merge",
SetUpScript: []string{
"CREATE TABLE test(pk int primary key);",
"CALL DOLT_ADD('.')",
"INSERT INTO test VALUES (0),(1),(2);",
"CALL DOLT_COMMIT('-am','created table test')",
"DELETE FROM test WHERE pk = 0",
"INSERT INTO test VALUES (3)",
"CALL DOLT_COMMIT('-am','made changes')",
},
Assertions: []queries.ScriptTestAssertion{
{
Query: "CALL DOLT_TAG('v1','HEAD')",
Expected: []sql.Row{{0}},
},
{
Query: "CALL DOLT_CHECKOUT('-b','other','HEAD^')",
Expected: []sql.Row{{0, "Switched to branch 'other'"}},
},
{
Query: "INSERT INTO test VALUES (8), (9)",
Expected: []sql.Row{{types.OkResult{RowsAffected: 2}}},
},
{
Query: "CALL DOLT_COMMIT('-am','made changes in other')",
Expected: []sql.Row{{doltCommit}},
},
{
Query: "CALL DOLT_MERGE('v1')",
Expected: []sql.Row{{doltCommit, 0, 0, "merge successful"}},
},
{
Query: "SELECT * FROM test",
Expected: []sql.Row{{1}, {2}, {3}, {8}, {9}},
},
},
},
}
The problem: in Postgres, stored procedures can't return result rows like MySQL can. These tests
weren't going to work. And even worse, our customers couldn't examine the results of their
DOLT_MERGE
operation and similar version-control features, which is a requirement for writing
version controlled application.
What to do?
From functions to procedures back to functions again
The solution is straightforward, although it struck us as a little bit gross: travel back in time and create function versions of all the Dolt stored procedures, just like we had originally. We don't even need to do this manually -- a little bit of clever metaprogramming at server startup time can transform all the Dolt procedures into functions automatically, and handles any new ones as we add them to Dolt.
But isn't this wrong? Aren't functions with side effects, and SELECT
statements that alter the
state of your database, a huge no-no?
Well, Postgres doesn't think so. This pattern is standard practice in the Postgres world:
SELECT NEXTVAL('mySequence');
This is a SELECT
statement that permanently alters the value of the sequence given. And this isn't
a fluke, either. Here's the official
way that you manually set the next
value in a sequence:
SELECT setval('myseq', 42);
(In MySQL, the way to change the next value in an AUTO_INCREMENT
column is with an ALTER TABLE
statement).
There are lots of different examples to choose from. Even administrative functionality is accessed this way, e.g.:
SELECT pg_log_backend_memory_contexts(pg_backend_pid());
Yes, that's a SELECT
statement that alters the logging behavior of a running server. Working as
intended!
But we aren't here to pass judgment on Postgres's design decisions, we're here to enable our customers who expect a drop-in replacement for Postgres. So now, in Doltgres only, you can call the Dolt procedures as functions, just like in the old days.
SELECT DOLT_CHECKOUT('-b', 'my-new-branch');
As for Dolt's integration tests, it's a simple matter to use regular expressions to replace CALL
DOLT_
with SELECT DOLT_
in Doltgres's test
harness.
This is a work in progress: we still need to enable this syntax, which is valid in Postgres:
SELECT * FROM DOLT_CHECKOUT('-b', 'my-new-branch');
Stay tuned!
Conclusion
Doltgres is free and open source, so go check it out if you're curious about a Postgres compatible database with version control features. Doltgres isn't ready for production use yet, but we're looking for potential customers to guide our roadmap and let us know what we should be building first.
Have questions about Doltgres or Postgres procedures? Join us on Discord to talk to our engineering team and meet other Dolt users.