psql's Hidden Queries
Introduction
We're hard at work building Doltgres, a Postgres-compatible database with git-inspired version control features. Before we built Doltgres we built Dolt, which is MySQL-compatible instead. One of the first things we needed to understand and get working was how Postgres handles informational queries from its shell.
This blog post discusses what the informational commands from psql
do under the hood and how we
support them in Doltgres.
What are informational queries?
Informational queries are queries of database metadata to examine what databases, tables, columns, and other schema entities exist. They're a quick way to figure out the shape of the data and how to start querying it.
For example, in MySQL you can use the DESCRIBE
keyword to get a simplified view of a table's
schema:
mysql> describe t1;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| a | int | NO | | NULL | |
| b | int | YES | | NULL | |
+-------+------+------+-----+---------+-------+
2 rows in set (0.00 sec)
MySQL has a bunch of different useful, easy-to-use informational queries built right into the MySQL dialect, things like this:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| common |
| test |
| test1 |
| world |
+--------------------+
mysql> show tables;
+---------------------+
| Tables_in_test |
+---------------------+
| a |
| about |
| ai |
| ... |
| t1 |
+---------------------+
These work in the MySQL shell, but there's nothing special about the MySQL shell here: it just sends the queries to the server, which interprets them as normal statements and executes them. This means the same syntax works on any MySQL connection or application, not just the MySQL shell.
Postgres, meanwhile, doesn't have this built in syntactic sugar. But it does ship with a shell
called psql
, which supports a bunch of informational commands you can view with the \?
command:
Informational
(options: S = show system objects, + = additional detail)
\d[S+] list tables, views, and sequences
\d[S+] NAME describe table, view, sequence, or index
\da[S] [PATTERN] list aggregates
\dA[+] [PATTERN] list access methods
\dAc[+] [AMPTRN [TYPEPTRN]] list operator classes
\dAf[+] [AMPTRN [TYPEPTRN]] list operator families
\dAo[+] [AMPTRN [OPFPTRN]] list operators of operator families
\dAp[+] [AMPTRN [OPFPTRN]] list support functions of operator families
\db[+] [PATTERN] list tablespaces
\dc[S+] [PATTERN] list conversions
...
You run these commands in the shell and get the results. For example, this one is roughly equivalent
to MySQL's SHOW TABLES
.
doltgres-> \d
List of relations
Schema | Name | Type | Owner
--------+------------+-------+----------
public | test_table | table | postgres
(1 row)
The problem is that these commands only work in the psql
shell. If you want this functionality in
your application, or in another shell, you're out of luck.
But not to worry. We are here to peel back the curtain and show you what these commands are actually doing so you can use them in other contexts as needed.
psql commands under the hood: Listing tables
Behind the scenes, when you run one of the informational commands supported by psql
, the shell
intercepts it and translates it to a query on the pg_catalog
tables. Let's look at the example
above, which lists all relations.
doltgres-> \d
List of relations
Schema | Name | Type | Owner
--------+------------+-------+----------
public | test_table | table | postgres
(1 row)
To get this result, psql
is sending this SQL query to the Postgres server:
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind
WHEN 'r' THEN 'table'
WHEN 'v' THEN 'view'
WHEN 'm' THEN 'materialized view'
WHEN 'i' THEN 'index'
WHEN 'S' THEN 'sequence'
WHEN 't' THEN 'TOAST table'
WHEN 'f' THEN 'foreign table'
WHEN 'p' THEN 'partitioned table'
WHEN 'I' THEN 'partitioned index'
END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam
WHERE c.relkind IN ('r','p','v','m','S','f','')
AND n.nspname <> 'pg_catalog'
AND n.nspname !~ '^pg_toast'
AND n.nspname <> 'information_schema'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
That's quite a bit more complicated than \d
! But as you can see, it resolves into a relatively
straightforward query on the pg_catalog.pg_class
table and a few others. pg_catalog
is a special
built-in schema that Postgres uses to store information about types, tables, and other database
schema entities. Unlike the Information schema,
which is part of the SQL standard, the pg_catalog
schema is custom to Postgres and reflects
internal architectural decisions particular to how Postgres organizes its data.
Let's look at a few other examples.
Listing indexes
MySQL has the query SHOW INDEXES from t1
, which tells you what indexes exist on the table
named. The closest equivalent from psql
is \di
, which tells you all indexes in the current
database.
postgres=# \di
List of relations
Schema | Name | Type | Owner | Table
----------+----------------------------+-------+----------+-----------------------
postgres | test_schema_pkey | index | postgres | test_schema
public | a.b.c_pkey | index | postgres | a.b.c
public | t10_pkey | index | postgres | t10
public | t1_pkey | index | postgres | t1
public | t2_pkey | index | postgres | t2
public | t_bit_pkey | index | postgres | t_bit
public | t_bit_varying_pkey | index | postgres | t_bit_varying
public | t_box_pkey | index | postgres | t_box
public | t_bytea_pkey | index | postgres | t_bytea
public | t_cidr_pkey | index | postgres | t_cidr
public | t_circle_pkey | index | postgres | t_circle
public | t_inet_pkey | index | postgres | t_inet
public | t_interval_pkey | index | postgres | t_interval
public | t_line_pkey | index | postgres | t_line
public | t_lseg_pkey | index | postgres | t_lseg
public | t_macaddr_pkey | index | postgres | t_macaddr
public | t_money_pkey | index | postgres | t_money
public | t_path_pkey | index | postgres | t_path
public | t_pg_lsn_pkey | index | postgres | t_pg_lsn
public | t_pkey | index | postgres | t
public | t_point_pkey | index | postgres | t_point
public | t_polygon_pkey | index | postgres | t_polygon
public | t_smallserial_pkey | index | postgres | t_smallserial
public | t_timestamp_with_zone_pkey | index | postgres | t_timestamp_with_zone
public | t_tsquery_pkey | index | postgres | t_tsquery
public | t_tsvector_pkey | index | postgres | t_tsvector
public | t_uuid_pkey | index | postgres | t_uuid
public | t_xml_pkey | index | postgres | t_xml
public | test_pkey | index | postgres | test
You can filter this down to a single index by naming it:
# \di t_pkey
List of relations
Schema | Name | Type | Owner | Table
--------+--------+-------+----------+-------
public | t_pkey | index | postgres | t
(1 row)
Behind the scenes, psql
is running this query:
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind
WHEN 'r' THEN 'table'
WHEN 'v' THEN 'view'
WHEN 'm' THEN 'materialized view'
WHEN 'i' THEN 'index'
WHEN 'S' THEN 'sequence'
WHEN 't' THEN 'TOAST table'
WHEN 'f' THEN 'foreign table'
WHEN 'p' THEN 'partitioned table'
WHEN 'I' THEN 'partitioned index'
END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam
WHERE c.relkind IN ('r','p','v','m','S','f','')
AND n.nspname <> 'pg_catalog'
AND n.nspname !~ '^pg_toast'
AND n.nspname <> 'information_schema'
AND pg_catalog.pg_table_is_visible(c.oid);
What's interesting is that most of the CASE
statement is rendered useless by the WHERE
clause,
but it's still in there, presumably copied from another similar query. Note also that the
pg_catalog
tables themselves are filtered from the result, as are the information_schema
tables
and other Postgres internal tables.
Listing databases
MySQL has SHOW DATABASES
, which is equivalent to the \l
command in psql
:
doltgres=> \l
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
--------------------+----------+----------+----------------------------+----------------------------+------------+-----------------+-------------------
doltgres | postgres | UTF8 | English_United States.1252 | English_United States.1252 | | libc |
information_schema | postgres | UTF8 | English_United States.1252 | English_United States.1252 | | libc |
(2 rows)
Under the hood, psql
is executing this query:
SELECT
d.datname as "Name",
pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS "Locale Provider",
d.datcollate as "Collate",
d.datctype as "Ctype",
d.daticulocale as "ICU Locale",
NULL as "ICU Rules",
pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;
What's curious is that altough the pg_catalog
schema is technically local to the current database,
it obviously does contain information about other databases than the one currently connected.
Describing a table
psql
's built-in way to describe a table is arguably more useful than MySQL's DESCRIBE
keyword. You just use \d
with the name of the table you want to examine:
postgres=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | not null |
name | text | | |
Indexes:
"t1_pkey" PRIMARY KEY, btree (id)
Publications:
"doltgres_slot"
"pglogrepl_demo"
Here I'm getting not just the columns of the table, but also indexes on it, as well as some related info like which publications refer to this table. Pretty neat!
If you look at the queries psql
is sending to the server, you'll see there's a lot of
them. Unlike other psql
commands, the command to describe a table is not a straightforward query
and clearly involves a lot of back and forth and interpretation between shell and server. The psql
shell is smart enough to omit any elements that aren't present and only show you relevant
results. Here's what it's doing under the hood, annotated with comments to help you understand
what's happening.
-- Getting the table's OID for use in further queries
SELECT c.oid,
n.nspname,
c.relname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(t1)$' COLLATE pg_catalog.default
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
-- Getting which kind of schema elements are in this table (using the OID from the first query)
SELECT c.relchecks,
c.relkind,
c.relhasindex,
c.relhasrules,
c.relhastriggers,
c.relrowsecurity,
c.relforcerowsecurity,
false AS relhasoids,
c.relispartition, '',
c.reltablespace,
CASE
WHEN c.reloftype = 0 THEN ''
ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text
END, c.relpersistence, c.relreplident, am.amname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)
LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)
WHERE c.oid = '62222';
-- Getting column attributes like name, collation, nullability
SELECT a.attname,
pg_catalog.format_type(a.atttypid, a.atttypmod),
(SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid, true)
FROM pg_catalog.pg_attrdef d
WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
a.attnotnull,
(SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
WHERE c.oid = a.attcollation AND t.oid = a.atttypid AND a.attcollation <> t.typcollation) AS attcollation,
a.attidentity,
a.attgenerated
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '62222' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;
-- Getting index information
SELECT c2.relname,
i.indisprimary,
i.indisunique,
i.indisclustered,
i.indisvalid,
pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),
pg_catalog.pg_get_constraintdef(con.oid, true),
contype,
condeferrable,
condeferred,
i.indisreplident,
c2.reltablespace
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND conindid = i.indexrelid AND contype IN ('p','u','x'))
WHERE c.oid = '62222' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
ORDER BY i.indisprimary DESC, c2.relname;
-- Getting policy and permission information (not relevant to results above)
SELECT pol.polname, pol.polpermissive,
CASE WHEN pol.polroles = '{0}' THEN NULL ELSE pg_catalog.array_to_string(array(select rolname from pg_catalog.pg_roles where oid = any (pol.polroles) order by 1),',') END,
pg_catalog.pg_get_expr(pol.polqual, pol.polrelid),
pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid),
CASE pol.polcmd
WHEN 'r' THEN 'SELECT'
WHEN 'a' THEN 'INSERT'
WHEN 'w' THEN 'UPDATE'
WHEN 'd' THEN 'DELETE'
END AS cmd
FROM pg_catalog.pg_policy pol
WHERE pol.polrelid = '62222' ORDER BY 1;
-- Getting column statistics (not relevant to results above)
SELECT oid,
stxrelid::pg_catalog.regclass,
stxnamespace::pg_catalog.regnamespace::pg_catalog.text AS nsp,
stxname,
pg_catalog.pg_get_statisticsobjdef_columns(oid) AS columns,
'd' = any(stxkind) AS ndist_enabled,
'f' = any(stxkind) AS deps_enabled,
'm' = any(stxkind) AS mcv_enabled,
stxstattarget
FROM pg_catalog.pg_statistic_ext
WHERE stxrelid = '62222'
ORDER BY nsp, stxname;
-- Getting publication information
SELECT pubname
, NULL
, NULL
FROM pg_catalog.pg_publication p
JOIN pg_catalog.pg_publication_namespace pn ON p.oid = pn.pnpubid
JOIN pg_catalog.pg_class pc ON pc.relnamespace = pn.pnnspid
WHERE pc.oid ='62222' and pg_catalog.pg_relation_is_publishable('62222')
UNION
SELECT pubname
, pg_get_expr(pr.prqual, c.oid)
, (CASE WHEN pr.prattrs IS NOT NULL THEN
(SELECT string_agg(attname, ', ')
FROM pg_catalog.generate_series(0, pg_catalog.array_upper(pr.prattrs::pg_catalog.int2[], 1)) s,
pg_catalog.pg_attribute
WHERE attrelid = pr.prrelid AND attnum = prattrs[s])
ELSE NULL END) FROM pg_catalog.pg_publication p
JOIN pg_catalog.pg_publication_rel pr ON p.oid = pr.prpubid
JOIN pg_catalog.pg_class c ON c.oid = pr.prrelid
WHERE pr.prrelid = '62222'
UNION
SELECT pubname
, NULL
, NULL
FROM pg_catalog.pg_publication p
WHERE p.puballtables AND pg_catalog.pg_relation_is_publishable('62222')
ORDER BY 1;
-- Getting inherited tables (not present in above output)
SELECT c.oid::pg_catalog.regclass
FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
WHERE c.oid = i.inhparent AND i.inhrelid = '62222'
AND c.relkind != 'p' AND c.relkind != 'I'
ORDER BY inhseqno;
-- More inheritance information
SELECT c.oid::pg_catalog.regclass, c.relkind, inhdetachpending, pg_catalog.pg_get_expr(c.relpartbound, c.oid)
FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
WHERE c.oid = i.inhrelid AND i.inhparent = '62222'
ORDER BY pg_catalog.pg_get_expr(c.relpartbound, c.oid) = 'DEFAULT', c.oid::pg_catalog.regclass::pg_catalog.text;
This example is much, much more complex than the other ones, and writing these queries yourself (or
interpreting the results the way that psql
does) requires a pretty deep understanding of
postgres's internal storage details. Personally, I would take DESCRIBE t1
any day of the week.
Revealing the hidden queries
I obtained these hidden queries by connecting psql
to a running Doltgres server, where I turned on
debug logging to print every query received. Then I can just pull the queries out of the log:
DEBU[0005] Received message: Query { SELECT n.nspname as "Schema",
c.relname as "Name", ...
You can also use a packet-sniffing tool like Wireshark for this, but make sure to disable SSL on your connection or you won't be able to read the contents of the TCP packets.
% PGSSLMODE=disable PGPASSWORD=password psql -h 127.0.0.1 -U postgres
Conclusion
We're building Doltgres, the Postgres version of our version-controlled SQL database Dolt. Learning the ins and outs of how Postgres organizes its data is critical to Doltgres becoming a drop-in replacement for Postgres, which is our long-term goal. We're learning more and getting closer every day.
Have questions on psql
queries? Or maybe you are curious about the world's first
version-controlled SQL database? Join us on Discord to talk to our
engineering team and other Dolt users.