Migrating our Monorepo to Yarn 2
DoltHub is a web-based UI built in React to share, discover, and collaborate on Dolt databases. We recently migrated our monorepo to Yarn 2 (or Yarn Modern). It took us some extra steps to make Yarn 2 work with our monorepo and other infrastructure. We thought sharing them could be useful for others looking to adopt Yarn 2.
Why migrate to Yarn 2
Yarn summarizes the reasons to upgrade to Yarn Modern on their website. In short, they include:
- New features
- Efficiency
- Extensibility
- Stability
- Future proof
I initially decided to upgrade our repository when I came across one of Yarn 2's new commands while researching a solution to a dependency issue we were having. I started looking into it and got excited about some of the new commands, workspace tools, and potential performance gains, so I decided to try it out.
Our architecture
Our monorepo (named ld, short for Liquidata, our former company name) houses all of our back-end, front-end, deployment configuration, and other related code. Our front-end code lives in a directory within ld called web, which is split up into packages managed with Yarn workspaces. This means that each package is just a regular NPM package with its own package.json. We can add our packages as dependencies in one another's package.json files exactly as with any other package, but they are resolved locally and share a single node_modules and yarn.lock.
This is an abbreviated version of what our web directory looks like:
- web
  - node_modules
  - packages
    - blog
      - package.json
    - dolthub
      - package.json
    - graphql-server
      - package.json
    - shared-components
      - package.json
    - tailwind-config
      - package.json
  - package.json
  - yarn.lock
You can read more about our front-end architecture here.
How we migrated our monorepo
In the end, the actual changes required to migrate to Yarn 2 were not that many. However, I did try and fail a few times before landing the change. At first, it was a dependency that was breaking our website build. To resolve it we either needed to downgrade to webpack 4 or upgrade to React 18 (which was in its beta phase at the time). We wanted React 18 to be more stable before upgrading and webpack 4 was not working with Yarn 2.
Once we upgraded React to its release candidate, I tried Yarn 2 again. I gave up on Zero-Installs pretty early because too few of the tools we use seemed compatible (specifically ESLint and Dependabot), and there still seemed to be enough benefits without it to continue with the migration. I needed to make a few extra changes beyond the four main documented ones, which I discuss below, and I was eventually able to land the migration.
Yarn has a step-by-step guide to migrate your repository. Here are all the steps we needed to migrate our monorepo.
1. Install yarn
web % npm install -g yarn
web % yarn set version berry # I forgot this step initially and it was a pain to switch between branches with different versions
2. Add .yarnrc.yml to web
nodeLinker: node-modules
yarnPath: .yarn/releases/yarn-3.1.1.cjs
3. Commit changes and run yarn install
web % yarn install
4. Add to web/.gitignore
.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/sdks
!.yarn/versions
5. Add TypeScript plugin
We also added Yarn's TypeScript plugin by running yarn plugin import typescript. This automatically adds @types/ packages into your dependencies when you add a package that doesn't include its own types.
6. package.json updates
We were already using workspaces with Classic Yarn, but there were a few changes we needed to make to our package.json scripts to get them to work with Yarn 2.
In our root web/package.json, we have scripts that run commands for individual packages, as well as multiple packages at once. We were originally using --cwd (which specifies the working directory) to run a script for a specific package. Yarn 2 no longer supports this argument, so we used yarn workspace instead:
{
"scripts": {
- "test:blog": "yarn --cwd 'packages/blog' test",
+ "test:blog": "yarn workspace @dolthub/blog test"
}
}
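For a root package.json with many such scripts, the replacement can be done mechanically. The following is a minimal sketch rather than what we actually ran: the function name cwd_to_workspace is hypothetical, and it assumes every package under packages/<dir> is published under the name @dolthub/<dir>, as in the example above.

```shell
# Hypothetical helper: rewrite Yarn Classic `--cwd` scripts into `yarn workspace` scripts.
# Assumption: the package in packages/<dir> is named @dolthub/<dir>.
# Reads text on stdin and writes the rewritten text to stdout.
cwd_to_workspace() {
  sed -E "s#yarn --cwd 'packages/([^']+)'#yarn workspace @dolthub/\\1#g"
}
```

It could be used as cwd_to_workspace < package.json > package.json.new, followed by a manual review of the diff before replacing the original file.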
We use npm-run-all to run a command in more than one package. We can still use npm-run-all with some small changes, like so:
{
"scripts": {
- "compile": "npm-run-all compile:*",
- "compile:fakers": "yarn --cwd 'packages/fakers' compile",
- "compile:utils": "yarn --cwd 'packages/utils' compile",
- "compile:resource-utils": "yarn --cwd 'packages/resource-utils' compile",
- "compile:shared-components": "yarn --cwd 'packages/shared-components' compile",
- "compile:tailwind-config": "yarn --cwd 'packages/tailwind-config' compile",
- "compile:blog": "yarn --cwd 'packages/blog' compile",
- "compile:graphql-server": "yarn --cwd 'packages/graphql-server' compile",
- "compile:dolthub": "yarn --cwd 'packages/dolthub' compile",
- "check:graphql-server": "yarn --cwd 'packages/graphql-server' run check-server",
+ "compile": "npm-run-all 'compile:*'",
+ "compile:fakers": "yarn workspace @dolthub/fakers compile",
+ "compile:utils": "yarn workspace @dolthub/utils compile",
+ "compile:resource-utils": "yarn workspace @dolthub/resource-utils compile",
+ "compile:shared-components": "yarn workspace @dolthub/shared-components compile",
+ "compile:tailwind-config": "yarn workspace @dolthub/tailwind-config compile",
+ "compile:blog": "yarn workspace @dolthub/blog compile",
+ "compile:graphql-server": "yarn workspace @dolthub/graphql-server compile",
+ "compile:dolthub": "yarn workspace @dolthub/dolthub compile",
+ "check:graphql-server": "yarn workspace @dolthub/graphql-server run check-server",
}
}
Yarn 2 has a new workspaces foreach command that accomplishes the same thing (don't forget to install the workspace-tools plugin first).
web % yarn workspaces foreach run compile
Notice we also now need single quotes around compile:* in the line "compile": "npm-run-all 'compile:*'". We had a few scripts we needed to add single quotes to, including our clean scripts, which use rimraf.
Without the single quotes around file paths, rimraf would error whenever a file was not found instead of skipping it and moving on.
{
"scripts": {
- "clean": "npm-run-all clean:*",
- "clean:blog": "yarn --cwd 'packages/blog' clean",
- "clean:misc": "rimraf node_modules packages/*/node_modules packages/*/.eslintcache packages/*/*.tsbuildinfo packages/*/dist packages/*/.rts2_cache* packages/dolthub/.next",
+ "clean": "npm-run-all 'clean:*'",
+ "clean:blog": "yarn workspace @dolthub/blog clean",
+ "clean:misc": "rimraf node_modules 'packages/*/node_modules' 'packages/*/.eslintcache' 'packages/*/*.tsbuildinfo' 'packages/*/dist' packages/dolthub/.next",
}
}
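The quoting matters because an unquoted pattern is expanded by whichever shell runs the script, while a quoted pattern reaches rimraf or npm-run-all untouched so the tool can do its own matching. How Yarn 2's script shell treats a pattern with no matches is not shown here. The sketch below only illustrates the general difference in a POSIX shell, using hypothetical helper names and a throwaway directory.

```shell
# Count how many arguments a command actually receives.
count_args() { echo "$#"; }

# Subshell body, so the cd below does not leak into the caller.
glob_demo() (
  demo=$(mktemp -d) || exit 1
  mkdir -p "$demo/packages/blog" "$demo/packages/dolthub"
  cd "$demo" || exit 1

  # Unquoted: the shell expands the pattern before the command runs.
  echo "unquoted: $(count_args packages/*)"
  # Quoted: the pattern arrives as one literal argument for the tool to expand.
  echo "quoted: $(count_args 'packages/*')"

  # Clean up the throwaway directory.
  cd / && rm -rf "$demo"
)
```

Running glob_demo prints "unquoted: 2" and then "quoted: 1": the command sees two directory names in the first case and the raw pattern in the second.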
7. Upgrade docker-node
We use Docker to deploy our DoltHub services. After all of the above, everything was building and running smoothly. I then tried to deploy and could not do so successfully. After an embarrassing amount of time trying to figure out why, I finally realized the issue was the docker-node version we were using in our Dockerfiles. Upgrading from 14.17.4 to 16.14.0 solved the issue.
At this point everything was working and I was able to land the Yarn migration. But I had forgotten to check one thing.
8. Handling Dependabot incompatibilities
Every month Dependabot bumps our web dependencies. This keeps our packages up-to-date and helps prevent dependency hell if we do need to upgrade something.
When that time arrived, I realized that not only does Dependabot not support Plug'n'Play, it doesn't work with Yarn 2 at all! When it upgrades a dependency, the change either comes without yarn.lock changes or converts the yarn.lock file to the Yarn Classic format, which breaks everything. This Dependabot issue requesting Yarn 2 support has been open for over two years and has hundreds of supporters, and still nothing.
In this issue I found a comment with a workaround GitHub Actions workflow. When I tried it out, it worked when there were no yarn.lock changes (upgrading dependencies in an individual workspace package.json), but not when the yarn.lock file was wrong (updating a dependency in the root package.json).
I changed the workflow a bit so that instead of just running yarn install and committing the yarn.lock file, it soft resets the last commit, undoes any yarn.lock changes, and then runs yarn install and commits. You can view the workflow we use here.
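The sequence described above can be sketched as a shell function. This illustrates those steps and is not the workflow file itself: the function name fix_dependabot_lockfile is hypothetical, and it assumes it runs in the directory that contains yarn.lock, on a branch whose last commit is Dependabot's, with a Git identity already configured. Pushing the result is left out.

```shell
# Hypothetical sketch of the steps described above, not the actual workflow.
fix_dependabot_lockfile() {
  # Remember Dependabot's commit message so it can be reused.
  msg=$(git log -1 --pretty=%B) || return 1

  # Soft reset: drop the last commit but keep its changes staged.
  git reset --soft HEAD~1 || return 1

  # Undo any yarn.lock changes by restoring the file from the new HEAD.
  git checkout HEAD -- yarn.lock || return 1

  # Regenerate the lockfile with the repository's own Yarn version.
  yarn install || return 1

  # Commit the package.json bumps together with the regenerated lockfile.
  git add -u || return 1
  git commit -q -m "$msg"
}
```

Because the lockfile is always restored before yarn install runs, the same steps cover both cases: a bump that arrived with no yarn.lock changes and one that arrived with a Yarn Classic lockfile.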
All in all not too much harm done, but it would be great if Dependabot could support Yarn 2 as it continues to be adopted by more and more people.
Is it worth it?
There are some benefits to migrating to Yarn 2 without Zero-Installs, but after some comparison we didn't really see any performance gains, and it was even worse than Yarn Classic at times.
Here's a little comparison for our repository*:
| | Yarn Classic | Yarn Modern |
|---|---|---|
| node_modules size** | 1.3G | 1.4G |
| web directory size | 2.0G | 2.2G |
| yarn.lock lines | 19517 | 27960 |
| time yarn install, no cache | 79.44s user 146.53s system 205% cpu 1:49.86 total | 133.45s user 121.33s system 151% cpu 2:47.98 total |
| time yarn install, with Y2 cache | 79.44s user 146.53s system 205% cpu 1:49.86 total | 99.87s user 93.42s system 172% cpu 1:52.02 total |
| time yarn add [dep] | 17.67s user 22.31s system 183% cpu 21.779 total | 10.81s user 1.57s system 121% cpu 10.213 total |
| time yarn remove [dep] | 4.09s user 1.45s system 127% cpu 4.353 total | 8.76s user 1.38s system 121% cpu 8.344 total |
* This comparison would be more accurate if we averaged the results from many runs because there's a lot of variation between runs.
** There was a typo in an earlier version of this blog that mistakenly listed the node_modules size as the web directory size. It has now been fixed.
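As the first footnote says, averaging over several runs would give steadier numbers. A minimal sketch of one way to do that is shown here. The helper name avg_seconds is hypothetical, it was not used to produce the table above, and it assumes at least one run and that one-second resolution is good enough for commands that take tens of seconds.

```shell
# Hypothetical helper: run a command N times and print the mean wall-clock seconds.
# Usage: avg_seconds N command [args...]
avg_seconds() {
  runs=$1
  shift
  total=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s)
    # Discard the command's own output so only the average is printed.
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    total=$((total + end - start))
    i=$((i + 1))
  done
  # awk handles the fractional division that POSIX shell arithmetic lacks.
  awk -v t="$total" -v n="$runs" 'BEGIN { printf "%.1f\n", t / n }'
}
```

For example, avg_seconds 5 yarn install would average five installs. For the no-cache rows, each run would also need to clear node_modules and the relevant cache before timing.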
Yarn also maintains performance benchmarks that compare different versions of Yarn as well as other package managers like NPM for Next.js and Gatsby apps (both of which we use!). You can check that out here.
It would be interesting to see how performance for our repository would compare to Yarn 2 with Zero-Installs. In theory it would be possible to reach zero second installs, even for large repositories like ours. I'm looking forward to the day more tools are compatible.
In the meantime, there are some useful Yarn 2 features that could make migrating worth it, like types support and improved readability and usability of logs and commands.
One underrated feature, especially for those using workspaces, is automatic resolution of different versions of the same dependency in different packages. With Yarn Classic we needed to add a resolutions field to our web/package.json like this:
"resolutions": {
"**/react": "^18.0.0-rc.0",
"**/react-dom": "^18.0.0-rc.0"
}
It looks for different versions of react and react-dom and resolves them in our yarn.lock after running yarn install. This prevents our React build from breaking with the dreaded "Invalid hook call. Hooks can only be called inside of the body of a function component" error.
With Yarn Modern we no longer need the resolutions field. If I upgrade React in just one of our workspaces (like shared-components), it actually updates React in all workspaces that have React as a dependency.
shared-components % yarn up react
web % git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: packages/blog/package.json
modified: packages/dolthub/package.json
modified: packages/shared-components/package.json
modified: yarn.lock
This saves a lot of headache, especially as we continue to develop and share code between different packages.
We waited over two years since its first stable release to adopt Yarn 2. It seems like many of the kinks have been worked out since then, and it will continue to improve with time.
If you're interested in learning more about Dolt or DoltHub or have found better solutions to any of the above, feel free to reach out to me on Discord in the #dolthub channel.
A performance improvement for Yarn 2 without Zero-Installs
Updated 03/21/2022
After posting this blog in a few places on the Internet, people had some feedback and insights on Zero-Installs performance. It seems like people who have tried PnP with Zero-Installs did have improved performance as expected and recommended it despite some frustration with the migration process.
It also turns out that there is additional .yarnrc.yml configuration that improves Yarn 2 performance without Zero-Installs. Victor Vlasenko (larixer), one of the Yarn 2+ maintainers, who has written most of the code specific to node_modules support for Yarn 2 and also works at SysGears, joined our Discord to offer advice on how to improve our performance for Yarn 2 without Zero-Installs.
As recommended, we added these three lines to our .yarnrc.yml:
compressionLevel: 0
nmMode: hardlinks-local
enableGlobalCache: true
And we did see some improvements. Here's how it compares to the performance metrics we used above:
| | Yarn Modern with updated .yarnrc.yml |
|---|---|
| node_modules size | 1.2G |
| web directory size | 1.7G |
| yarn.lock lines | 27960 |
| time yarn install, no cache | 116.74s user 106.38s system 159% cpu 2:20.11 total |
| time yarn install, with Y2 cache | 52.59s user 77.35s system 188% cpu 1:09.04 total |
| time yarn add [dep] | 10.81s user 1.57s system 121% cpu 10.213 total |
| time yarn remove [dep] | 8.36s user 1.52s system 119% cpu 8.242 total |
While some performance gains are less significant, compared to Yarn Classic almost every item has now shown some kind of improvement. Most significantly, the time to install was cut almost in half!