Common Table Expressions in MySQL: How and Why with Examples
In this guide, learn how to use common table expressions (CTEs) in MySQL, what they're good for, and try out some simple examples.
What's a common table expression?
Common table expressions (CTEs) are also known by their keyword, WITH
. They are a way to refer to
a subquery with a single name in a query that uses it multiple times. Using a CTE makes it less
likely that you'll make an error by changing only one instance of an identical subquery, and makes
it much simpler to iterate when developing a query that contains a large subquery that is used more
than once.
This is best illustrated with an example. Let's create a table with cities.
CREATE TABLE cities(
id int primary key,
name varchar(255),
state varchar(2),
zip varchar(5)
);
Let's say that we want to get a list of all the cities that share a zipcode and start with the letter A. We could write this query like this:
select * from
(select name, zip from cities where lower(name) like 'a%') as a_cities1
join (select name, zip from cities where lower(name) like 'a%') as a_cities2
on c_cities1.zip = a_cities2.zip;
Using a common table expression allows us to avoid repeating the same subquery twice.
WITH a_cities AS
(select name, zip from cities where lower(name) like 'a%')
select * from a_cities cities1 join a_cities cities2
on cities1.zip = cities2.zip
The WITH
keyword introduces common table expressions to use for the query and precedes the
SELECT
keyword. A query can define any number of common table expressions this way.
WITH a_cities AS
(select name, zip from cities where lower(name) like 'a%'),
b_cities AS
(select name, zip from cities where lower(name) like 'b%')
select * from a_cities join b_cities
on a_cities.zip = b_cities.zip;
CTEs can even be nested to refer to other CTEs. This makes them harder to read and understand, but it can be the most compact way to express certain queries.
WITH first_100_a_cities AS
(select name, zip from cities where lower(name) like 'a%' order by name limit 100),
first_ab_city as (select name from first_100_a_cities where name like 'ab' limit 1)
SELECT ...
Defining column names for CTEs
By default, CTEs are like any other subqueries, which means the column names they return are derived from the selected expressions, or an alias you choose. Also like subqueries, you can name the columns to refer to in other parts of the query.
WITH a_cities (city_state, zip) AS
(select concat(name, concat(',', state)), zip from cities where lower(name) like 'a%')
select city_state from a_cities cities1 join a_cities cities2
on cities1.zip = cities2.zip
Here we've renamed the output column for a complex expression so that we can refer to it by name.
CTEs in other kinds of statements
So far we've looked at SELECT
statements, where it's most common to find CTEs used. But they're
also legal in UPDATE
, DELETE
, and INSERT
statements, as well as anywhere that a SELECT
clause can appear, such a CREATE TABLE AS ...
.
For example, we can update multiple fields of a table with the results of a subquery like this.
WITH tallahassee AS
(SELECT name, zip from cities WHERE name = 'Tallahassee')
UPDATE populations
SET name = (select name from tallahassee),
zip = (select zip from tallahassee)
WHERE name = 'Tallhassee'; -- We had a typo here
This technique can be a more compact way of expressing certain UPDATE
statements. Make sure in
this case that the subquery of the CTE returns only a single row, or this statement will error.
The same thing works with DELETE
:
WITH tallahassee AS
(SELECT name, zip from cities WHERE name = 'Tallahassee')
DELETE FROM populations
WHERE name = (select name from tallahassee),
AND zip = (select zip from tallahassee);
Again, the subquery must return a single row for this to work. LIMIT 1
is appropriate in some
cases.
Recursive CTEs
Using the RECURSIVE
keyword, you can define CTEs that refer to themselves. This lets you define a
subquery whose results are computed rather than enumerated. For example, here is a CTE that includes
the numbers 1 through 5.
WITH RECURSIVE cte (n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM cte WHERE n < 5
)
SELECT * FROM cte;
The first part of the CTE SELECT 1
defines the recursive base case. Then you must use UNION ALL
or UNION DISTINCT
followed by the recursive part of the CTE with a SELECT
expression on the CTE
being defined.
The main reason you would want to use a recursive CTE is to traverse recursive data structures, such as graphs or trees. For example, let's say we have two tables that model a tree structure, created like this.
CREATE TABLE tree(
node_id int primary key,
value varchar(100)
);
CREATE TABLE children(
parent_id int,
child_id int
);
If we want to find all the values in the tree starting at some node, we can do that with this recursive CTE:
WITH RECURSIVE walk(i, v) AS
(
SELECT node_id, value FROM tree WHERE node_id = 1
UNION ALL
SELECT node_id, value FROM walk
JOIN children c ON c.parent_id = i
JOIN tree t ON t.node_id = c.child_id
) SELECT * FROM walk;
For example, let's say we have a tree that looks like this:
a
├─ b
│ ├─ d
│ └─ e
└─ c
We can insert these values into our tables like so:
mysql> insert into tree values (1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e");
mysql> insert into children values (1,2), (1,3), (2,4), (2,5);
Now we can walk the entire tree with node_id = 1
:
mysql> WITH RECURSIVE walk(i, v) AS
-> (
-> SELECT node_id, value FROM tree WHERE node_id = 1
-> UNION ALL
-> SELECT node_id, value FROM walk
-> JOIN children c ON c.parent_id = i
-> JOIN tree t ON t.node_id = c.child_id
-> ) SELECT * FROM walk;
+------+------+
| i | v |
+------+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
| 5 | e |
+------+------+
5 rows in set (0.00 sec)
Or the subtree rooted a b
with node_id = 2
:
mysql> WITH RECURSIVE walk(i, v) AS (
SELECT node_id, value FROM tree WHERE node_id = 2
UNION ALL
SELECT node_id, value FROM walk
JOIN children c ON c.parent_id = i
JOIN tree t ON t.node_id = c.child_id
) SELECT * FROM walk;
+------+------+
| i | v |
+------+------+
| 2 | b |
| 4 | d |
| 5 | e |
+------+------+
3 rows in set (0.00 sec)
This can get tricky fast, but it's a good tool to have in your belt. Refer to the MySQL manual for additional examples and caveats with this advanced use case.
What are CTEs good for?
Unlike some other SQL features, there's nothing you can do with non-recursive CTEs that you can't do
without them: they're just subqueries that you don't have to type out more than once to use more
than once. But for people who spend hours creating long, complicated queries, this is a big
deal. Even if your subquery is only used a single time rather than multiple times, using WITH
can
make it easier to read the query by putting it first, so it has a definition like any other table.
For example, here's a very large query from a customer of ours where the table and column names have
been obfuscated to protect their trade secrets. The large common table expression AX7FV
is only
used once, but defining it at the start of the query like this reduces the nesting and makes it
easier to reason about (but still hard -- this is a very complex query). As a fun exercise, see if
you can see where the main SELECT
statement begins.
WITH AX7FV AS
(SELECT
bs.T4IBQ AS T4IBQ,
pa.DZLIM AS ECUWU,
pga.DZLIM AS GSTQA,
pog.B5OUF,
fc.OZTQF,
F26ZW.YHYLK,
nd.TW55N AS TW55N
FROM
SZQWJ ms
INNER JOIN XOAOP pa
ON ms.CH3FR = pa.id
LEFT JOIN NPCYY pog
ON pa.id = pog.CH3FR
INNER JOIN PG27A pga
ON pog.XVSBH = pga.id
INNER JOIN FEIOE GZ7Z4
ON pog.id = GZ7Z4.GMSGA
INNER JOIN E2I7U nd
ON GZ7Z4.LUEVY = nd.id
RIGHT JOIN (
SELECT
THNTS.id,
YK2GW.FTQLQ AS T4IBQ
FROM THNTS
INNER JOIN YK2GW
ON IXUXU = YK2GW.id
) bs
ON ms.GXLUB = bs.id
LEFT JOIN AMYXQ fc
ON bs.id = fc.GXLUB AND nd.id = fc.LUEVY
LEFT JOIN (
SELECT
iq.T4IBQ,
iq.BRQP2,
iq.Z7CP5,
CASE
WHEN iq.FSDY2 IN ('SRARY','UBQWG') AND vc.ZNP4P = 'L5Q44' AND iq.IDWIO = 'KAOAS'
THEN 0
WHEN iq.FSDY2 IN ('SRARY','UBQWG') AND vc.ZNP4P = 'L5Q44' AND iq.IDWIO = 'OG'
THEN 0
WHEN iq.FSDY2 IN ('SRARY','UBQWG') AND vc.ZNP4P = 'L5Q44' AND iq.IDWIO = 'TSG'
THEN 0
WHEN iq.FSDY2 IN ('SRARY','UBQWG') AND vc.ZNP4P <> 'L5Q44' AND iq.IDWIO = 'W6W24'
THEN 1
WHEN iq.FSDY2 IN ('SRARY','UBQWG') AND vc.ZNP4P <> 'L5Q44' AND iq.IDWIO = 'OG'
THEN 1
WHEN iq.FSDY2 IN ('SRARY','UBQWG') AND vc.ZNP4P <> 'L5Q44' AND iq.IDWIO = 'TSG'
THEN 0
ELSE NULL
END AS YHYLK
FROM (
SELECT /*+ JOIN_ORDER( cla, bs, mf, nd, nma, sn ) */
cla.FTQLQ AS T4IBQ,
sn.BRQP2,
mf.id AS Z7CP5,
mf.FSDY2,
nma.DZLIM AS IDWIO
FROM
HGMQ6 mf
INNER JOIN THNTS bs
ON mf.GXLUB = bs.id
INNER JOIN YK2GW cla
ON bs.IXUXU = cla.id
INNER JOIN E2I7U nd
ON mf.LUEVY = nd.id
INNER JOIN TNMXI nma
ON nd.HPCMS = nma.id
INNER JOIN NOXN3 sn
ON sn.BRQP2 = nd.id
WHERE cla.FTQLQ IN ('SQ1')
) iq
LEFT JOIN SEQS3 W2MAO
ON iq.Z7CP5 = W2MAO.Z7CP5
LEFT JOIN D34QP vc
ON W2MAO.YH4XB = vc.id
) F26ZW
ON F26ZW.T4IBQ = bs.T4IBQ AND F26ZW.BRQP2 = nd.id
LEFT JOIN TNMXI nma
ON nd.HPCMS = nma.id
WHERE bs.T4IBQ IN ('SQ1') AND ms.D237E = TRUE)
SELECT
XPRW6.T4IBQ AS T4IBQ,
XPRW6.ECUWU AS ECUWU,
SUM(XPRW6.B5OUF) AS B5OUF,
SUM(XPRW6.SP4SI) AS SP4SI
FROM (
SELECT
NRFJ3.T4IBQ AS T4IBQ,
NRFJ3.ECUWU AS ECUWU,
NRFJ3.GSTQA AS GSTQA,
NRFJ3.B5OUF AS B5OUF,
SUM(CASE
WHEN NRFJ3.OZTQF < 0.5 OR NRFJ3.YHYLK = 0 THEN 1
ELSE 0
END) AS SP4SI
FROM (
SELECT DISTINCT
T4IBQ,
ECUWU,
GSTQA,
B5OUF,
TW55N,
OZTQF,
YHYLK
FROM
AX7FV) NRFJ3
GROUP BY T4IBQ, ECUWU, GSTQA, B5OUF
) XPRW6
GROUP BY T4IBQ, ECUWU;
Conclusion
Have a question about MySQL common table expressions? Come by our Discord and talk to our engineering team. We hope to see you there.