A Spooky Performance Regression in AWS EBS Volumes
For every Dolt release, we run a suite of Sysbench tests that measure the median latency of Dolt's reads and writes.
Since Dolt is a drop-in replacement for MySQL, and soon to be a drop-in replacement for PostgreSQL, we compare Dolt's results to MySQL's to understand how much slower Dolt is than its counterpart.
We publish these benchmarks on our documentation site and also post them in Dolt's release notes. Internally, to catch any significant changes in latency that might have been committed during the day, we run these tests against nightly builds of Dolt as well.
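If you haven't seen Sysbench before, each test in the suite boils down to an invocation roughly like the one below. This is a minimal sketch, not our exact harness; the table count, table size, duration, and credentials are assumptions.

```bash
# Illustrative only: prepare a test table, run one of the write workloads for
# two minutes, and report the 50th-percentile (median) latency.
sysbench oltp_update_index --db-driver=mysql --mysql-host=127.0.0.1 \
  --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest \
  --tables=1 --table-size=10000 --time=120 --percentile=50 prepare

sysbench oltp_update_index --db-driver=mysql --mysql-host=127.0.0.1 \
  --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest \
  --tables=1 --table-size=10000 --time=120 --percentile=50 run
```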
Curiously, on October 31st, 2023, All Hallows' Eve, our nightly Sysbench run against Dolt and MySQL showed that write latency had increased for both databases, meaning writes got slower.
Even more curious still, MySQL's latency on certain write tests had degraded far more than Dolt's, and Dolt was suddenly reporting as faster than MySQL on writes where it used to be slower.
Here's what we saw. The nightly write benchmarks from the night before, October 30th, looked like this:
| write_tests | server_name | server_version | latency_median | server_name | server_version | latency_median | multiplier |
|---|---|---|---|---|---|---|---|
| oltp_delete_insert | mysql | 8.0.35 | 4.49 | dolt | c1790e8c | 5.57 | 1.2 |
| oltp_insert | mysql | 8.0.35 | 2.26 | dolt | c1790e8c | 2.66 | 1.2 |
| oltp_read_write | mysql | 8.0.35 | 6.79 | dolt | c1790e8c | 13.95 | 2.1 |
| oltp_update_index | mysql | 8.0.35 | 2.3 | dolt | c1790e8c | 2.76 | 1.2 |
| oltp_update_non_index | mysql | 8.0.35 | 2.26 | dolt | c1790e8c | 2.66 | 1.2 |
| oltp_write_only | mysql | 8.0.35 | 3.43 | dolt | c1790e8c | 6.91 | 2.0 |
| types_delete_insert | mysql | 8.0.35 | 4.49 | dolt | c1790e8c | 5.88 | 1.3 |
The `latency_median` columns show the median latency in milliseconds for each server, and the `multiplier` column shows how many times slower Dolt is than MySQL, i.e. Dolt's median latency divided by MySQL's (for `oltp_delete_insert`, 5.57 ms / 4.49 ms ≈ 1.2).
Then, on Halloween, our Sysbench report showed the following:
| write_tests | server_name | server_version | latency_median | server_name | server_version | latency_median | multiplier |
|---|---|---|---|---|---|---|---|
| oltp_delete_insert | mysql | 8.0.35 | 7.98 | dolt | c7b85e8c | 6.79 | 0.9 |
| oltp_insert | mysql | 8.0.35 | 3.75 | dolt | c7b85e8c | 3.36 | 0.9 |
| oltp_read_write | mysql | 8.0.35 | 8.43 | dolt | c7b85e8c | 14.46 | 2.1 |
| oltp_update_index | mysql | 8.0.35 | 3.82 | dolt | c7b85e8c | 3.36 | 0.9 |
| oltp_update_non_index | mysql | 8.0.35 | 3.82 | dolt | c7b85e8c | 3.3 | 0.9 |
| oltp_write_only | mysql | 8.0.35 | 5.37 | dolt | c7b85e8c | 7.56 | 1.4 |
| types_delete_insert | mysql | 8.0.35 | 7.7 | dolt | c7b85e8c | 7.3 | 1.0 |
MySQL and Dolt had both gotten slower on every benchmark, but MySQL had slowed down far more than Dolt had compared to the previous night's run.
Additionally, Dolt was now reporting as faster than MySQL on four benchmarks, with a multiplier of only `0.9`, where previously it had been slower with a multiplier of `1.2`.
Once we saw this anomaly, we started an investigation to find out why our benchmark results changed so significantly overnight.
We started by looking at changes in Dolt to see if we'd altered anything in the Dolt code that might explain the regression. But, because both Dolt and MySQL were showing a performance regression, we knew pretty quickly that something in our benchmarking infrastructure had changed, resulting in the slower results for both databases.
Coincidentally, on October 31st during the day, we had made a change that affected our benchmarking infrastructure.
We upgraded the AWS EKS cluster where our benchmarks run from version `1.27` to `1.28`. This also required us to upgrade the EC2 AMIs used for the benchmarking hosts, as well as some EKS Add-ons. Benchmarking hosts are provisioned when a benchmarking Job is scheduled and deprovisioned when it completes.
This upgrade process is nothing new for us, and until now, we've never seen it affect our benchmarking results. That made the performance regression all the more odd, but we went ahead and started downgrading the AMIs and Add-ons to see if the regression went away.
It did not!
At this point we were confident that upgrading the EKS cluster did not actually change write performance on the benchmarking hosts. We were also sure that upgrading the hosts' AMIs was not responsible for the slowdown, since we downgraded to an AMI released in May of 2023 and still saw the regression.
Finally, we decided to strip our benchmarking down to the bare essentials and run the Sysbench tests on an EC2 host outside of our EKS cluster to try to isolate and confirm the source of the regression.
We set up the host and ran the Sysbench tests by hand from the terminal. When we did this, we still observed the regression in write tests, and this led us to our hypothesis that something must have changed with EC2's EBS volumes.
We've always benchmarked MySQL and Dolt on hosts using EBS volumes since modern applications often use similar cloud architecture, and bare metal hosts with disks attached are becoming less and less common in application stacks.
To test our hypothesis that the regression might be a result of the disk being used, we created an in-memory filesystem on the EC2 host and benchmarked MySQL against it instead.
As expected, we saw no regression in these in-memory results, and most of the write test median latencies were sub-millisecond.
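Roughly speaking, that experiment looks like this; a minimal sketch assuming a tmpfs mount and a throwaway MySQL data directory, with paths and sizes that are illustrative rather than our exact setup.

```bash
# Mount a RAM-backed filesystem and point a fresh MySQL data directory at it.
# The mount point, size, and socket path are illustrative assumptions.
sudo mkdir -p /mnt/mysql-ram
sudo mount -t tmpfs -o size=8g tmpfs /mnt/mysql-ram
sudo chown "$(whoami)" /mnt/mysql-ram

# Initialize and start MySQL with its datadir on the tmpfs, then point the
# same Sysbench write tests at it.
mysqld --initialize-insecure --datadir=/mnt/mysql-ram/data
mysqld --datadir=/mnt/mysql-ram/data --socket=/mnt/mysql-ram/mysql.sock &
```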
My coworker Aaron then suggested we run the benchmarks again against the EBS volume, only this time with `fsync` disabled in MySQL.
Disabling `fsync` in MySQL changes the flush behavior of writes, allowing the operating system to do its best to buffer, order, and delay writes, which can improve performance, but can also corrupt the database if the system crashes while transactional writes are not yet completely written to disk.
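We're glossing over the exact knob here; one common way to relax MySQL's flush-to-disk behavior for a benchmark run, shown purely as an illustration and not necessarily the exact flags we used, looks like this:

```bash
# Relax MySQL's durable-flush behavior for benchmarking. These settings are
# illustrative, not necessarily the flags from our run. Never run a production
# database this way: a crash can lose or corrupt data.
mysqld --datadir=/var/lib/mysql \
  --innodb-flush-log-at-trx-commit=0 \
  --sync-binlog=0 &
```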
And boom. With `fsync` disabled, Sysbench was once again reporting the numbers we'd seen for MySQL before October 31st!
As far as we can tell, AWS changed something about their EBS `gp2` volumes that causes MySQL's (and, to a lesser degree, Dolt's) write performance with `fsync` enabled to be slower than it used to be.
We aren't sure what exactly they've changed, but it has been consistent in all of our benchmarking tests... ever... since... that... night...
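If you want to poke at this yourself, one crude way to measure raw `fsync`-heavy write latency on a volume, independent of either database, is a small fio run like the one below; the directory, block size, and runtime are placeholders, not something from our pipeline.

```bash
# Rough fsync-latency probe for a volume: small random writes with an fsync
# after every write. Compare the reported sync latencies across volumes.
fio --name=fsync-probe --directory=/mnt/ebs-test \
  --rw=randwrite --bs=4k --size=256m --ioengine=sync --fsync=1 \
  --runtime=60 --time_based
```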
Conclusion
In the end, we decided to keep running and publishing benchmarks with `fsync` enabled. As far as we can tell, this reflects real-world performance, and Dolt now compares favorably to MySQL on writes with the current EBS network-attached storage.
However, this saga shows the danger of publishing benchmark comparisons of databases. Small changes in the setup can create large differences when comparing results.
It's safe to say that Dolt is approximately 2X slower than MySQL. Making a more precise comparison requires further specifying the problem. Are we using network-attached storage or local SSDs? Is `fsync` on or off? As you can see, these choices really matter.
Have you noticed any difference in EBS performance recently? Let us know by hopping into our Discord.
Don't forget to check out each of our different product offerings below, to find which ones are right for you:
- Dolt—it's Git for data.
- DoltHub—it's GitHub for Dolt.
- DoltLab—it's GitLab for data.
- Hosted Dolt—it's RDS for Dolt databases.