Automatically translating MySQL statements into Postgres statements
Introduction
We're hard at work building Doltgres, a Postgres-compatible database with git-inspired version control features. Doltgres is currently in alpha release, meaning lots of functionality doesn't work yet. We're committed to a beta release by Q1 of next year, and are running as fast as we can to reach that goal.
Before we built Doltgres we built Dolt, which is MySQL-compatible instead. Dolt has tens of thousands of tests of its SQL functionality, and we'd love to run all those tests on Doltgres as well. The only problem: Dolt's tests are written in the MySQL dialect that Dolt uses, which means many of those tests won't run on Doltgres, which needs the Postgres dialect. We don't want to rewrite all these tests, and we want new Dolt tests, written in the MySQL dialect, to automatically run on Doltgres too.
How do we do that?
Examples of differences in dialects
Most differences in dialect between MySQL and Postgres are pretty minor. For things like most
SELECT
or INSERT
statements, all you really need to do is handle differences in quoting
syntax. For example, this MySQL query:
INSERT INTO `mydb`.`mytable` VALUES ('hello', "goodbye");
Needs to be translated into this Postgres query:
INSERT INTO "mydb"."mytable" VALUES ('hello', 'goodbye');
That's pretty easy to do, and we unlocked a whole set of tests when we started doing this automatically a few weeks ago.
But there are a class of statements where it's not easy to automatically convert between dialects
via textual replacements: DDL statements. These include things like CREATE TABLE
and ALTER
TABLE
, and a ton of other variants for other schema elements. Consider this valid MySQL statement:
CREATE TABLE foo (a INT primary key, b int, key (b));
If you run this in Postgres, you get an error:
# CREATE TABLE foo (a INT primary key, b int, key (b));
ERROR: syntax error at or near "("
LINE 1: CREATE TABLE foo (a INT primary key, b int, key (b));
In MySQL, it's valid to create non-primary keys in a CREATE TABLE
statement. In Postgres, you have
to split these two concerns into two separate statements, like this:
CREATE TABLE foo (a INTEGER NOT NULL PRIMARY KEY, b INTEGER NULL);
CREATE INDEX ON foo ( b ASC ) NULLS NOT DISTINCT;
Now, obviously, Doltgres needs its own suite of hand-written tests of CREATE TABLE
statements and
other DDL operations, there's no way around that. But there are very many places in Dolt's
integration tests that aren't testing CREATE TABLE
functionality itself, but are just using it to
test some other capability, like the version control procedures. These test are defined in scripts
that look like this:
{
Name: "test null filtering in secondary indexes (https://github.com/dolthub/dolt/issues/4199)",
SetUpScript: []string{
"create table t (pk int primary key auto_increment, d datetime, index index1 (d));",
"insert into t (d) values (NOW()), (NOW());",
"insert into t (d) values (NULL), (NULL);",
},
Assertions: []queries.ScriptTestAssertion{
{
Query: "select count(*) from t where d is not null",
Expected: []sql.Row{{2}},
},
...
What we want is a system that will rewrite these queries into the equivalent Postgres queries for us automatically. How do we do that?
Translating MySQL into Postgres using AST transformations
To handle these translations, we'll leverage the abstract syntax tree (AST) of the underlying query. The process will look like this:
We'll use a couple different open source libraries:
- Our fork of Vitess has a MySQL parser, which will transform our original query into an AST
- Then we write some functions to inspect that AST and produce an equivalent AST in the Postgres
dialect, using the Postgres
parser we forked from Cockroach
DB. This is a 1-to-many transformation, since a single MySQL
CREATE TABLE
statement might be represented by multiple Postgres statements. - Finally, call
Format()
on the Postgres AST to convert it to a Postgres query string, which we can then feed into the server for a test.
Note that this is similar to the process Doltgres uses to execute queries, where we parse a Postgres query into an AST, then turn that into an equivalent MySQL AST for execution by Dolt's query engine.
Here's the entry point for transforming a MySQL query into a Postgres query. Our function returns a
[]string
with the results, and a boolean value with whether transformation was possible. We're
starting small, with only some CREATE TABLE
statements transformed, and adding more over time.
func transformAST(query string) ([]string, bool) {
parser := sql.NewMysqlParser()
stmt, err := parser.ParseSimple(query)
if err != nil {
return nil, false
}
switch stmt := stmt.(type) {
case *sqlparser.DDL:
if stmt.Action == "create" {
return transformCreateTable(query, stmt)
}
}
return nil, false
}
Most of the interesting logic happens here. Again, this is a work in progress that we'll continue fleshing out over time.
func transformCreateTable(query string, stmt *sqlparser.DDL) ([]string, bool) {
if stmt.TableSpec == nil {
return nil, false
}
createTable := tree.CreateTable{
IfNotExists: stmt.IfNotExists,
Table: tree.MakeTableNameWithSchema("", "", tree.Name(stmt.Table.Name.String())), // TODO: qualified names
}
var queries []string
for _, col := range stmt.TableSpec.Columns {
createTable.Defs = append(createTable.Defs, &tree.ColumnTableDef{
Name: tree.Name(col.Name.String()),
Type: convertTypeDef(col.Type),
Collation: "", // TODO
Nullable: struct {
Nullability tree.Nullability
ConstraintName tree.Name
}{
Nullability: convertNullability(col.Type),
},
PrimaryKey: struct {
IsPrimaryKey bool
}{
IsPrimaryKey: col.Type.KeyOpt == 1,
},
Unique: col.Type.KeyOpt == 3,
UniqueConstraintName: "", // TODO
DefaultExpr: struct {
Expr tree.Expr
ConstraintName tree.Name
}{
Expr: nil, // TODO
ConstraintName: "", // TODO
},
CheckExprs: nil, // TODO
})
}
ctx := formatNodeWithUnqualifiedTableNames(&createTable)
queries = append(queries, ctx.String())
// If there are additional (non-primary key) indexes defined, each one gets its own additional statement
if len(stmt.TableSpec.Indexes) > 0 {
for _, index := range stmt.TableSpec.Indexes {
if index.Info.Primary {
continue
}
createIndex := tree.CreateIndex{
Name: tree.Name(index.Info.Name.String()),
Table: tree.MakeTableNameWithSchema("", "", tree.Name(stmt.Table.Name.String())), // TODO: qualified
Unique: index.Info.Unique,
Columns: make(tree.IndexElemList, len(index.Columns)),
}
for i, col := range index.Columns {
createIndex.Columns[i] = tree.IndexElem{
Column: tree.Name(col.Column.String()),
Direction: tree.Ascending,
}
}
ctx := formatNodeWithUnqualifiedTableNames(&createIndex)
queries = append(queries, ctx.String())
}
}
return queries, true
}
One of the tedious parts of this transformation is manually converting each MySQL type to a Postgres type. That looks like this, in part (lots of omitted types):
func convertTypeDef(columnType sqlparser.ColumnType) tree.ResolvableTypeReference {
switch strings.ToLower(columnType.Type) {
case "int", "mediumint", "integer":
return &types.T{
InternalType: types.InternalType{
Family: types.IntFamily,
Width: 32,
Oid: oid.T_int4,
},
}
case "double precision", "double":
return &types.T{
InternalType: types.InternalType{
Family: types.FloatFamily,
Oid: oid.T_float8,
},
}
case "decimal":
return &types.T{
InternalType: types.InternalType{
Family: types.DecimalFamily,
Oid: oid.T_numeric,
},
}
case "varchar":
return &types.T{
InternalType: types.InternalType{
Family: types.StringFamily,
Width: int32FromSqlVal(columnType.Length),
Oid: oid.T_varchar,
},
}
case "datetime", "timestamp":
return &types.T{
InternalType: types.InternalType{
Family: types.TimestampFamily,
Width: int32FromSqlVal(columnType.Length),
Oid: oid.T_timestamp,
},
}
default:
panic(fmt.Sprintf("unhandled type: %s", columnType.Type))
}
}
And that's all there is to it! When you run this function with a MySQL CREATE TABLE
statement, you
get one or more equivalent CREATE TABLE .. CREATE INDEX
Postgres statements out the other side. In
addition to this process making many of Dolt's integration tests work on Doltgres, we also wrote a
bunch of unit tests of the query translation itself.
{
input: "CREATE TABLE foo (a INT primary key, b int, c int not null, d int, key (c), key (b), key (b,c))",
expected: []string{
"CREATE TABLE foo (a INTEGER NOT NULL PRIMARY KEY, b INTEGER NULL, c INTEGER NOT NULL, d INTEGER NULL)",
"CREATE INDEX ON foo ( c ASC ) NULLS NOT DISTINCT ",
"CREATE INDEX ON foo ( b ASC ) NULLS NOT DISTINCT ",
"CREATE INDEX ON foo ( b ASC, c ASC ) NULLS NOT DISTINCT ",
},
},
Future work
This is a work in progress, and there's still a lot of desirable functionality missing:
- Column defaults
- Constraint definitions including
CHECK
- Foreign keys
- Automatically translate
AUTO_INCREMENT
to sequences - Other statement types, not just
CREATE TABLE
We'll be adding these features and others over time as we need them to make additional tests run. You can check out the converter's full source here.
Conclusion
Doltgres is free and open source, so go check it out if you're curious about a Postgres compatible database with version control features. Doltgres isn't ready for production use yet, but we're looking for potential customers to guide our roadmap and let us know what we should be building first.
Have questions about Doltgres or translating MySQL to Postgres? Join us on Discord to talk to our engineering team and meet other Dolt users.