Creating a Postgres Foreign Data Wrapper
Here at DoltHub some of us have been working with PostgreSQL extensions recently. This is an introductory tutorial on how to get started building a PostgreSQL foreign data wrapper. We introduce the basics around setting up a project for building and installing the extension and how to implement a very basic read only scan.
Overview
PostgreSQL is a powerful open-source SQL engine with a ton of compelling features and great stability. It's also very extensible. It's extension architecture allows shared libraries developed in C to add new data types and functions, index and table storage formats, and query planning and execution functionality.
One common type of postgres extension is a foreign data wrapper. Originally designed for exposing tables and relations in a foreign SQL server so that they could be queried and processed on a different postgres server, the architecture allows for developing extensions to expose many foreign data sources as tables to the postgres backend. They can then be queried and processed using the full power of the postgres query engine, and can be easily combined with local data or other remote data using powerful query capabilities like joins and aggregations.
Postgres itself ships with two foreign data wrappers:
-
postgres_fdw
— which will connect to a remote PostgreSQL server and expose tables from it. -
file_fdw
— which will allow for accessing local files and programs as if they were tables, exposing functionality similar to the COPY statement, but without importing the data into postgres tables first.
There are a multitude of third-party foreign data wrappers for connecting to other SQL databases, like MySQL, Oracle and MSSQL, and also for connecting to other data sources entirely, such as Redis and DynamoDB.
The postgres documentation is fantastic, but the surface area of the of the API is decently large. This is a guided tour for getting started with a new foreign data wrapper implementation.
Prerequisites and Some Scaffolding
To get started, you will need the following installed:
- GNU Make
- GCC or Clang
- Postgres
The instructions here have been tested on macOS and Linux. On Windows, they will definitely require Cygwin or MinGW, and may require further changes as well.
We're going to create a foreign data wrapper named tutorial_fdw
. To start
with, we will create a directory for our project and a Makefile
for building
it.
$ mkdir tutorial_fdw
$ cd tutorial_fdw
And we create the following Makefile
:
MODULE_big = tutorial_fdw
OBJS = tutorial_fdw.o
EXTENSION = tutorial_fdw
DATA = tutorial_fdw--1.0.sql
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
This Makefile
uses a build system that ships with Postgres called
PGXS. To use it, we
set some variables describing what artifacts we need to build and install, and
we include build system makefile which we find using pg_config
.
If we now run make
, it will complain that there is no rule to make
tutorial_fdw.o
. We can
$ touch tutorial_fdw.c
to make a bit of progress. Eventually that file will contain our implementation.
Running make
again complains about a missing tutorial_fdw.control
file.
The control
file
describes a proper Postgres extension, and includes metadata about the
extension itself.
$ cat > tutorial_fdw.control <<EOF
comment = 'Tutorial FDW.'
default_version = '1.0'
module_pathname = '$libdir/tutorial_fdw'
relocatable = true
Now our extension will build successfully. make install
will fail with:
install: .//tutorial_fdw--1.0.sql: No such file or directory
Our Makefile
references this file as a DATA
component, and PostgreSQL's
extension mechanism will attempt to load it when someone runs CREATE EXTENSION
tutorial_fdw;
. Its purpose is to run the lower-level SQL commands that will
load our shared library and point PostgreSQL at its entry points. A foreign
data wrapper in postgres has one mandatory and one optional entry point:
-
A
handler
entry point, which returns a struct of function pointers that will implement the foreign data wrapper API. These function pointers will be called by postgres to participate in query planning and execution. They will do things like estimate the number of rows that match a given restrict clause, enumerate and estimate the costs of the different ways of accessing a given set of rows (through an index, table scan, covering index scan, etc.), and implement the table access itself where each row is returned to the execution engine. This is the meat of a FDW implementation and what we are going to build. -
A
validator
entry point, which is an optional entry point that will be called with the options which have been set on the foreign server, table, user or wrapper itself.
We will leave the validator as a future addition and just implement handler
for now.
$ cat > tutorial_fdw--1.0.sql <<EOF
CREATE FUNCTION tutorial_fdw_handler()
RETURNS fdw_handler
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;
CREATE FOREIGN DATA WRAPPER tutorial_fdw
HANDLER tutorial_fdw_handler;
EOF
A Small Smoke Test
Now our extension can build and install as a proper extension accessible from our postgres installation. We will now be iterating on an implementation to actually return result tuples as part of a table scan. Let's define exactly what we want our tutorial foreign data wrapper to do, and how people will interface with it.
Our FDW will be quite silly. We will implement it to support single column
tables where the column is an integer. By default, we will return single
element tuples sequentially from 0 to 63 inclusive. Then we will add the
ability for options
on the table to control where we start and where we end.
Interacting with our extension should look like:
> CREATE EXTENSION tutorial_fdw;
CREATE EXTENSION
> CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE SERVER
> CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
CREATE FOREGIN TABLE
> SELECT * FROM sequential_ints;
val
-----
0
1
2
3
4
5
...
(64 rows)
As we iterate on our implementation, we will want to test it out. So we create
a small smoke test which does the above in a standalone (and newly created)
postgres database. smoke_test.sh
looks like:
#!/bin/bash
set -eo pipefail
make install
PGDATA=`mktemp -d -t tfdw-XXXXXXXXXXX`
trap "PGDATA=\"$PGDATA\" pg_ctl stop >/dev/null || true; rm -rf \"$PGDATA\"" EXIT
PGDATA="$PGDATA" pg_ctl initdb > /dev/null
PGDATA="$PGDATA" pg_ctl start
psql postgres -f smoke_test.sql
And smoke_test.sql
is:
CREATE EXTENSION tutorial_fdw;
CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
SELECT * FROM sequential_ints;
Now after ./smoke_test.sh
, we will see the server logs, which will include error log lines similar to:
ERROR: incompatible library "/usr/local/lib/postgresql/tutorial_fdw.so": missing magic block
HINT: Extension libraries are required to use the PG_MODULE_MAGIC macro.
That means our extension is installed and postgres attempted to load it. Great progress. Now we just need to write the implementation.
Iterative Implementation
Let's continue on our path of addressing the immediate failures we see.
PG_MODULE_MAGIC
is documented
here and we
will need to include it in our shared library. From now on, we will be making
changes to tutorial_fdw.c
, which is currently empty.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -0,0 +1,3 @@
+#include "postgres.h"
+#include "fmgr.h"
+PG_MODULE_MAGIC;
Changes our first error upon running smoke_test.sh
to:
psql:smoke_test.sql:1: ERROR: could not find function "tutorial_fdw_handler" in file "/usr/local/lib/postgresql/tutorial_fdw.so"
So we will need to implement the function we told postgres would exist.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -1,3 +1,14 @@
#include "postgres.h"
#include "fmgr.h"
+#include "foreign/fdwapi.h"
+Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
+
+Datum
+tutorial_fdw_handler(PG_FUNCTION_ARGS)
+{
+ FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+ PG_RETURN_POINTER(fdwroutine);
+}
+
PG_MODULE_MAGIC;
This is an extension function that is callable from SQL, so it has a particular
calling convention it needs to follow. We use preprocessor macros to declare
its arguments, generate its return value and expose it in our shared library.
You can read about these calling
conventions and
how to work with them further. For the purposes of a foreign data wrapper, we
have a parameter-less function that always returns a pointer to a struct
FdwRoutine
.
When we run our smoke_test.sh
now, you will see that our client gets a closed
connection error and the postgres server logs that the child process died with
a segmentation fault. Our implementation returned a pointer to a newly
allocated and zeroed FdwRoutine struct, but that struct is meant to contain
function pointers to our implementation. Our handler function did not populate
any. A small number of the function pointers are necessary, but most are
optional. You can find documentation on the
members and see the
definition of the
struct.
For now, let's populate the necessary members with some stubs.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -4,11 +4,51 @@
Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+ ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan);
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags);
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node);
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node);
+void tutorial_fdw_EndForeignScan(ForeignScanState *node);
+
Datum
tutorial_fdw_handler(PG_FUNCTION_ARGS)
{
FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+ fdwroutine->GetForeignRelSize = tutorial_fdw_GetForeignRelSize;
+ fdwroutine->GetForeignPaths = tutorial_fdw_GetForeignPaths;
+ fdwroutine->GetForeignPlan = tutorial_fdw_GetForeignPlan;
+ fdwroutine->BeginForeignScan = tutorial_fdw_BeginForeignScan;
+ fdwroutine->IterateForeignScan = tutorial_fdw_IterateForeignScan;
+ fdwroutine->ReScanForeignScan = tutorial_fdw_ReScanForeignScan;
+ fdwroutine->EndForeignScan = tutorial_fdw_EndForeignScan;
PG_RETURN_POINTER(fdwroutine);
}
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+ ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
+ return NULL;
+}
+
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
+}
+
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
+ return NULL;
+}
+
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {
+}
+
+void tutorial_fdw_EndForeignScan(ForeignScanState *node) {
+}
+
Our stubs all do absolutely nothing and, when they have to return a pointer, return NULL
. Now if we run smoke_test, we get:
ERROR: could not devise a query plan for the given query
It's time to flesh out the implementation and actually participate in query planning.
Paths and Plans
Postgres documentation describes the path of a
query. After a query has
been parsed and rewritten, our FDW will participate in planning and optimizing
to come up with a final query plan that can be executed. The postgres query
planner is going to call GetForeignRelSize
, GetForeignPaths
and
GetForeignPlan
, in that order, for each table access in a query. The purpose
is as follows:
-
GetForeignRelSize
— updatebaserel->rows
, and possiblybaserel->width
andbaserel->tuples
, with an estimated result set size for a scan ofbaserel
, accounting for the filtering done by restriction qualifiers. Planning can proceed without an accurate implementation here, but the potential for misoptimizations is high. -
GetForeignPaths
— updatebaserel->pathlist
to include ways of accessingbaserel
. Typically addsForeignPath *
s created withcreate_foreignscan_path
tobaserel
usingadd_path
. Each added path will include a cost estimate, a rows estimate and potentially outer dependencies. The optimizer will work to choose a plan that minimizes costs. The error we see currently fromsmoke_test
is indicating that this method did not add a path tobaserel
and thus the query cannot be planned. -
GetForeignPlan
— responsible for creating aForeignScan *
for the givenForeignPath *
.base_path
was created byGetForeignPaths
and has been chosen by the planner as the access path for the query.
Because it is not strictly necessary, we will ignore GetForeignRelSize
for
now. Let's implement GetForeignPaths
so that we always add a single path,
representing a full enumeration of all the integers we return.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -1,6 +1,7 @@
#include "postgres.h"
#include "fmgr.h"
#include "foreign/fdwapi.h"
+#include "optimizer/pathnode.h"
Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
@@ -31,6 +32,16 @@ void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid
}
void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+ Path *path = (Path *)create_foreignscan_path(root, baserel,
+ NULL, /* default pathtarget */
+ baserel->rows, /* rows */
+ 1, /* startup cost */
+ 1 + baserel->rows, /* total cost */
+ NIL, /* no pathkeys */
+ NULL, /* no required outer relids */
+ NULL, /* no fdw_outerpath */
+ NIL); /* no fdw_private */
+ add_path(baserel, path);
}
ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
After this, our smoke test is back to segfaulting. Our returned ForeignPath *
told it to use our GetForeignPlan
to access that relation, but instead of
returning a Plan node that could be executed, tutorial_fdw_GetForeignPlan
turned NULL
. No problem, now we can just implement
tutorial_fdw_GetForeignPlan
.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -2,6 +2,8 @@
#include "fmgr.h"
#include "foreign/fdwapi.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/planmain.h"
Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
@@ -46,7 +48,15 @@ void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid fo
ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
- return NULL;
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+ return make_foreignscan(tlist,
+ scan_clauses,
+ baserel->relid,
+ NIL, /* no expressions we will evaluate */
+ NIL, /* no private data */
+ NIL, /* no custom tlist; our scan tuple looks like tlist */
+ NIL, /* no quals we will recheck */
+ outer_plan);
}
void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
The extract_actual_clauses
builds a List *
of the clause
field of the
passed in scan_clauses
, which are a list of RestrictInfo *
nodes. By adding
them all in the second parameter to make_foreignscan
, we are telling the
executor that we are not responsible for enforcing any of those clauses, and it
is responsible for evaluating them on the tuples we return and excluding the
appropriate ones.
And now we've reached a major milestone. Because if we run smoke_test.sh
, we
don't crash and we don't get an error! We get the following output from
smoke_test.sql
:
CREATE EXTENSION
CREATE SERVER
CREATE FOREIGN TABLE
val
-----
(0 rows)
Now we just need to generate some rows.
Begin, Iterate, End
The next time postgres calls into our FDW, it's going to be completely done
planning the query and it's going to be in the execution phase. It will start
one or more scans, using the ForeignScan *
nodes we returned to it to construct
ForeignScanState *
nodes and passing them into
tutorial_fdw_BeginForeignScan
. Then it will call
tutorial_fdw_IterateForeignScan
one or more times, until either it no longer
needs results from the scan or the scan is complete. Each call to
tutorial_fdw_IterateForeignScan
will return one new result tuple. Finally,
the engine will call tutorial_fdw_EndForeignScan
to allow our FDW to clean up
anything it needs to.
We won't need to clean up anything for our implementation, so we only need to implement Begin and Iterate. Both of them are quite straightforward.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -59,11 +59,28 @@ ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel,
outer_plan);
}
+typedef struct tutorial_fdw_state {
+ int current;
+} tutorial_fdw_state;
+
void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
+ tutorial_fdw_state *state = palloc0(sizeof(tutorial_fdw_state));
+ node->fdw_state = state;
}
TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
- return NULL;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ ExecClearTuple(slot);
+
+ tutorial_fdw_state *state = node->fdw_state;
+ if (state->current < 64) {
+ slot->tts_isnull[0] = false;
+ slot->tts_values[0] = Int32GetDatum(state->current);
+ ExecStoreVirtualTuple(slot);
+ state->current++;
+ }
+
+ return slot;
}
void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {
ForeignScanState
includes a void *fdw_state
field as a place for FDW
implementations to store state that is used in generating the scan results. We
store our scan state there and Iterate accesses and mutates it in order to
return the desired results.
On the provided ForeignScanState node we also have a TupleTableSlot which is
set up for taking the values of the columns we are returning. We can store a
virtual tuple directly to the slot as shown above. If we have more columns to
populate, we could do it with slot->tts_values[1] = ...
for example.
At this point our FDW passes our smoke test. We still have one unimplemented
function to implement: tutorial_fdw_ReScanForeignScan
. Because our
implementation is so simple, restarting the scan from the beginning is also
straightforward.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -84,6 +84,8 @@ TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
}
void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {
+ tutorial_fdw_state *state = node->fdw_state;
+ state->current = 0;
}
void tutorial_fdw_EndForeignScan(ForeignScanState *node) {
Some Simple Error Checking
Our current implementation is not robust regarding the schema of the created
foreign table. Try changing smoke_test.sql
to create a table with a text
column, or a float
column, or multiple columns, or no columns, and see what
happens. We definitely don't want unallocated memory access and potential
memory corruption in our postgres extension.
To address this, we will add some simple error checking in the planner phase so
we can return an error if the schema doesn't match what we expect. The first
time our plugin sees the remote table definition is in
tutorial_fdw_GetForeignRelSize
, so that seems like a natural place to check
things.
We can go about this sanity check in a few ways, but here we're just going to do something simple and strict, making assertions against the attribute descriptions of the foreign table relation definition. Doing so looks like this:
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -4,6 +4,8 @@
#include "optimizer/pathnode.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/planmain.h"
+#include "utils/rel.h"
+#include "access/table.h"
Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
@@ -31,6 +33,19 @@ tutorial_fdw_handler(PG_FUNCTION_ARGS)
}
void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+ Relation rel = table_open(foreigntableid, NoLock);
+ if (rel->rd_att->natts != 1) {
+ ereport(ERROR,
+ errcode(ERRCODE_FDW_INVALID_COLUMN_NUMBER),
+ errmsg("incorrect schema for tutorial_fdw table %s: table must have exactly one column", NameStr(rel->rd_rel->relname)));
+ }
+ Oid typid = rel->rd_att->attrs[0].atttypid;
+ if (typid != INT4OID) {
+ ereport(ERROR,
+ errcode(ERRCODE_FDW_INVALID_DATA_TYPE),
+ errmsg("incorrect schema for tutorial_fdw table %s: table column must have type int", NameStr(rel->rd_rel->relname)));
+ }
+ table_close(rel, NoLock);
}
void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
This demonstrates a few new constructs. First, we can inspect the description
of the foreign table definition in the system catalog using table_open
. The
RelationData
struct which it
returns has a number of fields related to the implementation of various
postgres features, but it notably has the TupleDesc rd_att
field containing
the description of the relation's tuple. Above we assert that there is exactly
one attribute in this tuple and that it is an int4
, which corresponds to the
type with which we're populating the virtual tuple stored in TupleTableSlot in
our Iterate function (Int32GetDatum
up above).
We also see how to report errors up above. There is a list of predefined
errcodes
and errmsg
accepts a formatting string and arguments. Reporting an error of
severity ERROR
will abort the current query, and even the current C function
execution, using setjmp
/longjmp
machinery. See
elog.h
for more information, including information about try
/catch
/finally
functionality.
Adding Options
As our last piece of functionality, we will change our FDW so that tables can be annotated with options regarding the start and end of their sequences. This will let us learn about threading some state information from the planning phase to the execution phase.
The CREATE FOREIGN TABLE
syntax in postgres supports arbitrary string-valued
options. We're going to support the following:
CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server OPTIONS ( start '0', end '64' )
Both options will be optional. start
will default to 0. end
will default to
64. They will specify a range [start, end)
. If present, an option will be
required to be parse as a base 10 non-negative integer. After options are
parsed and defaults are applied, we will throw an error if end < start
.
We will access our options from a ForeignTable
struct that is accessible
through the foreigntableoid
. GetRelSize
is a perfect place to initially
parse and validate these options. Let's do that first.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -6,6 +6,8 @@
#include "optimizer/planmain.h"
#include "utils/rel.h"
#include "access/table.h"
+#include "foreign/foreign.h"
+#include "commands/defrem.h"
Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
@@ -47,6 +49,51 @@ void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid
}
table_close(rel, NoLock);
+ int start = 0, end = 64;
+
+ ForeignTable *ft = GetForeignTable(foreigntableid);
+ ListCell *cell;
+ foreach(cell, ft->options) {
+ DefElem *def = lfirst_node(DefElem, cell);
+
+ if (strcmp("start", def->defname) == 0) {
+ char *val = defGetString(def);
+ if (sscanf(val, "%d", &start) != 1) {
+ ereport(ERROR,
+ errcode(ERRCODE_FDW_ERROR),
+ errmsg("invalid value for option \"start\": \"%s\" must be a decimal integer", val));
+ }
+ } else if (strcmp("end", def->defname) == 0) {
+ char *val = defGetString(def);
+ if (sscanf(val, "%d", &end) != 1) {
+ ereport(ERROR,
+ errcode(ERRCODE_FDW_ERROR),
+ errmsg("invalid value for option \"end\": \"%s\" must be a decimal integer", val));
+ }
+ } else {
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
+ errmsg("invalid option \"%s\"", def->defname),
+ errhint("Valid table options for tutorial_fdw are \"start\" and \"end\"")));
+ }
+ }
+ if (start < 0) {
+ ereport(ERROR,
+ errcode(ERRCODE_FDW_ERROR),
+ errmsg("invalid value for option \"start\": must be non-negative"));
+ }
+ if (end < 0) {
+ ereport(ERROR,
+ errcode(ERRCODE_FDW_ERROR),
+ errmsg("invalid value for option \"end\": must be non-negative"));
+ }
+ if (end < start) {
+ ereport(ERROR,
+ errcode(ERRCODE_FDW_ERROR),
+ errmsg("invalid values for option \"start\" and \"end\": end must be >= start"));
+ }
+
+ baserel->rows = end - start;
}
void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
That's a bunch of code, but a lot of it is standard C or things we've seen
before, such as sscanf
and ereport
. Take a moment to look at the foreach
macro which iterates through the List *options
field. In this List, the Cells
all have DefElem *
values, so we use lfirst_node
to work on each value from
within our loop. See
pg_list.h for more
context on the List type in postgres and a lot of convenience functions for
working with it.
Now we have two populated local variables with our start and end options. We
will thread those through baserel
so that they are available to
tutorial_fdw_GetForeignPlan
and can be put on the ForeignScan node itself.
Postgres provides a void *fdw_private
field for exactly this purpose. It will
not be touched by anything else, and so we are free to store whatever we need
to in it.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -34,6 +34,10 @@ tutorial_fdw_handler(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(fdwroutine);
}
+typedef struct tutorial_fdw_TableOptions {
+ int start, end;
+} tutorial_fdw_TableOptions;
+
void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
Relation rel = table_open(foreigntableid, NoLock);
if (rel->rd_att->natts != 1) {
@@ -94,6 +98,11 @@ void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid
}
baserel->rows = end - start;
+
+ tutorial_fdw_TableOptions *opts = palloc(sizeof(tutorial_fdw_TableOptions));
+ opts->start = start;
+ opts->end = end;
+ baserel->fdw_private = opts;
}
void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
Now they will arrive at GetForeignPlan without a problem. We need to get them
from GetForeignPlan to Begin/Iterate. But Begin/Iterate don't have access to
baserel
; it is a planning construct and no longer exists during execution.
The ForeignScan *
plan node has a fdw_private
field as well, but it is of
type List *
, not void *
. And, in fact, because of how execution nodes are
used, there are constraints on what we can put into the fdw_private
field of
the ForeignScan *
node. It must be a decently proper node
which
copyObject
will work on.
You can read about nodes in the README and the various source files adjacent to it. It's possible to make our own node type within an extension by using an ExtensibleNode. But that is a bit overkill for this tutorial. Instead, we will just populate a list with two value nodes, having the start and end integers respectively.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -8,6 +8,7 @@
#include "access/table.h"
#include "foreign/foreign.h"
#include "commands/defrem.h"
+#include "nodes/pg_list.h"
Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
@@ -120,12 +121,14 @@ void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid fo
ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
+ tutorial_fdw_TableOptions *opts = baserel->fdw_private;
+ List *fdw_private = list_make2(makeInteger(opts->start), makeInteger(opts->end));
scan_clauses = extract_actual_clauses(scan_clauses, false);
return make_foreignscan(tlist,
scan_clauses,
baserel->relid,
NIL, /* no expressions we will evaluate */
- NIL, /* no private data */
+ fdw_private, /* pass along our start and end */
NIL, /* no custom tlist; our scan tuple looks like tlist */
NIL, /* no quals we will recheck */
outer_plan);
list_make2
makes a two element list, and makeInteger
is creating value
nodes which respect copyObject
, and thus the contract. Now we can use what
has been passed along by accessing the plan node's fdw_private from our
execution node and using it to populate our fdw_state
.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -136,10 +136,14 @@ ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel,
typedef struct tutorial_fdw_state {
int current;
+ int end;
} tutorial_fdw_state;
void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
tutorial_fdw_state *state = palloc0(sizeof(tutorial_fdw_state));
+ ForeignScan *fs = (ForeignScan *)node->ss.ps.plan;
+ state->current = intVal(linitial(fs->fdw_private));
+ state->end = intVal(lsecond(fs->fdw_private));
node->fdw_state = state;
}
@@ -148,7 +152,7 @@ TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
ExecClearTuple(slot);
tutorial_fdw_state *state = node->fdw_state;
- if (state->current < 64) {
+ if (state->current < state->end) {
slot->tts_isnull[0] = false;
slot->tts_values[0] = Int32GetDatum(state->current);
ExecStoreVirtualTuple(slot);
The only new stuff here is linitial
and lsecond
, extracting the first and
second pointer value from the cells of a List; intVal
, extracting the
integer of a value node; and the chain of field accesses to get from the
ForeignScanState
to the ForeignScan
plan node that created it.
Wrapping Up and Next Steps
If you've made it this far, congratulations. Try updating the smoke_test.sql
script to make use of the new functionality, and to exercise the functionality
of the query engine further. Here are some fun near-at-hand exercises:
-
Update
smoke_test.sql
to create multiple tables with different ranges and query them in various ways — joins, unions, where clauses, subselects. -
Add another table option,
step
, defaulting to1
. Requirestep
to be non-zero and positive. -
Remove the restriction of
start
andend
being non-negative. -
Remove the restriction on
step
being positive andend >= start
. Still throw an error if step does not move from start to end. -
Extend
Iterate
and the validation logic to allow for more numeric column types.bigint
(INT8OID
) andsmallint
(INT2OID
) are natural extensions.bigint
might require careful handling on 32-bit platforms and it's easy to trick yourself into thinking it's working if you're only testing on 64-bit architectures. Can you extend the validation logic to avoid overflow and underflow? Can you support float types as well?
After gaining familiarity with what a bare-bones postgres FDW extension looks like, the fantastic postgres documentation might seem a lot more accessible. Here are some good resources to follow up with:
-
Slides from the presentation Implementing your first PostgreSQL extension
-
The postgresql source code doxygen. Some constants, particularly catalog data like
INT4OID
and errcode constants likeERRCODE_FDW_ERROR
are generated into header files at build time, so don't be surprised by some missing stuff. -
Check out resources for developing FDWs in other languages. multicorn will make the above exposition seem a little silly in comparison.
And, of course, check out the implementations of the existing foreign data wrappers linked above.
PostgreSQL and Dolt
Dolt is a SQL database that supports Git-like functionality, including branch, merge, diff and clone. Dolt is written in Go and it implements its own SQL engine that aims for MySQL compatibility. So why are we blogging about how to implement PostgreSQL foreign data wrappers? Well, a lot of our customers are interested in PostgreSQL or already have a substantial investment in PostgreSQL. We think it would be great if people could use PostgreSQL to access Dolt tables and all of their related functionality such as the commit graph.
Because Dolt is compatible with MySQL, the existing mysql_fdw
provides an
avenue for integrating from PostgreSQL with Dolt. But we think there is also
opportunity for native extensions that expose the full power of Dolt without the
constraints and overhead of separate servers, network protocols, mismatched
type systems, etc.
We're still exploring exactly what may be possible here, but we're definitely excited about the future. We hope you'll keep an eye on this space and let us know your thoughts on Twitter, Discord and GitHub.