Productionizing Dolt
So you've been using Dolt for a while, and you are ready to take the next step and use it in production. Whether you are running your production environment on prem, or in the cloud, there are several steps you should take to ensure that your Dolt database is production ready. Any production system should be monitored, backed up, and have a plan for disaster recovery. Dolt is no different. In this post, I'll cover some of the things you should consider when taking Dolt to production.
Configuring Your Server
Dolt sql-server can be configured using command line arguments or with a yaml config file. Command line configuration only provides access to a subset of all the options so we will launch Dolt like so:
dolt sql-server --config config.yaml
Initially we will configure the network host and port that our server will listen on. We will add more to our config file as we go, but our initial file looks like:
listener:
host: 0.0.0.0
port: 3306
This will start our server listening on port 3306
on all network interfaces. You can change this to be whatever is
appropriate for your environment.
Users and Grants
The first thing you will want to do is lock down who has access to your databases. Dolt uses standard MySQL users and grants statements to control user access, and user privileges. Additionally, we will want to change the root account username and password. The root account is the default superuser account that has full access to the database. We will setup the location where users and grants get stored and set our root user at the same time by adding the following entries to our server's config.yaml file.
privilege_file: "privs.json"
user:
name: "my-root-user-name"
password: "my-strong-password"
Setting the privilege_file
will store our user and grants in a file called privs.json
in the working directory of the server.
You may change the files storage location, but having a known location will help us later when we need to back up our database server.
Monitoring and Alerting
Prerequisites
Monitoring is a critical part of any production system. It allows you to track the health of your system and catch issues before they become problems. Before you can monitor your Dolt database you need to choose a monitoring solution for both your metrics and logs, and implement a way to access and view your data. Some solutions, like AWS CloudWatch, provide both metrics and logs in a single solution, and provide a means of building dashboards to view your data all using Amazon's cloud console to provide secure access. Other solutions like the ELK Stack (Elasticsearch, Logstash, Kibana) give you more control of your data, but require more setup and maintenance. Additionally you will need to handle auth and access control to ensure that only authorized users can access your monitoring data. Other popular monitoring solutions include:
Metrics
Now that you have a place to store your metrics and logs, you will want to track both system and application metrics. I will leave system metric tracking to you, as cloud instances, instances in your datacenter, and kubernetes deployments may have different metrics that you will want to track, as does the operating system you are running on.
Dolt provides a single http endpoint that must be configured to expose metrics. To do this we
will add a metrics
section to the config. This section will contain both a host
and a port
entry and should look something like:
metrics:
host: "0.0.0.0"
port: 8000
Again, you should change these values to be whatever is appropriate for your environment. Our server here will listen on
port 8000 for requests to /metrics
. This endpoint will return a prometheus compatible
response that can be scraped by your monitoring solution (Prometheus metrics format is very popular and is compatible
with most systems). You can inspect the metrics by simply sending a command line request or using a browser to access the endpoint.
~>curl http://127.0.0.1:8000/metrics
# HELP dss_concurrent_connections Number of clients concurrently connected to this instance of dolt sql server
# TYPE dss_concurrent_connections gauge
dss_concurrent_connections 0
# HELP dss_concurrent_queries Number of queries concurrently being run on this instance of dolt sql server
# TYPE dss_concurrent_queries gauge
dss_concurrent_queries 0
# HELP dss_connects Count of server connects
# TYPE dss_connects counter
dss_connects 0
# HELP dss_disconnects Count of server disconnects
# TYPE dss_disconnects counter
dss_disconnects 0
# HELP dss_dolt_version The version of dolt currently running on the machine
# TYPE dss_dolt_version gauge
dss_dolt_version 1.9529745e+07
# HELP dss_query_duration Histogram of dolt sql server query runtimes
# TYPE dss_query_duration histogram
dss_query_duration_bucket{le="0.01"} 0
dss_query_duration_bucket{le="0.1"} 0
dss_query_duration_bucket{le="1"} 0
dss_query_duration_bucket{le="10"} 0
dss_query_duration_bucket{le="100"} 0
dss_query_duration_bucket{le="1000"} 0
dss_query_duration_bucket{le="+Inf"} 0
dss_query_duration_sum 0
dss_query_duration_count 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.8962e-05
go_gc_duration_seconds{quantile="0.25"} 0.000175573
go_gc_duration_seconds{quantile="0.5"} 0.000192408
go_gc_duration_seconds{quantile="0.75"} 0.000298023
go_gc_duration_seconds{quantile="1"} 0.000298023
go_gc_duration_seconds_sum 0.000684966
go_gc_duration_seconds_count 4
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 27
# ... and many more metrics
Something to note is that there is no auth on the metrics endpoint. This means that anyone who can access the endpoint can see the metrics so you will want to make sure that the endpoint is not exposed to the public internet.
Inspecting the metrics that get returned will show metrics beginning with the prefix dss_
which are our Dolt
SQL Server application metrics. The metrics beginning with go_
are Go runtime metrics.
Alerting
Having metrics is great, but you aren't going to sit and watch them all day. You need a way to be alerted when something goes wrong. For this you will want to integrate with an alerting system. Many monitoring solutions provide alerting as part of their offering, but you can also use a separate system like:
Alerting systems allow you to set up rules that will trigger alerts when certain conditions are met. For example, you could set up an alert that triggers when the query latency exceeds a certain threshold, or when CPU usage is too high. These alerting systems generally provide ways to setup on call rotations, and to escalate alerts if they are not acknowledged in a certain amount of time.
Logging
So now you have metrics and alerts, but what do you do when you get an alert? You need logs. Logs are a critical part of
troubleshooting and debugging issues in your system. By default, Dolt logs to stdout
and stderr
,
but you can configure it to log to a file as well using the command line flag --out-and-err
. This will log both stdout
and stderr
to a file. Additionally you can configure the log level in your dolt sql-server config file. The log_level
entry can be set to one of debug
, info
, warning
, error
, or fatal
. The default is info
.
log_level: "info"
If we are writing our logs to a file we will need to setup log rotation to prevent the log file from growing too large. Most logging solutions provide log rotation as part of their offering, but you can also use a separate system like:
Once we are happy with the way our logs are being written and rotated we can send them to our log aggregation system. This system will store and index our logs so that we can search and filter them. The specifics of how to do this will depend on the system you are using, but most systems provide a way to send logs over http or tcp. Like our metrics, we will need to make sure we have set up auth and access control to ensure that only authorized users can read the logs as it is possible that sensitive information could be logged.
Disaster Recovery
No matter how good your database is, things can go wrong. Disks can fail, data can be corrupted, and servers can go down. You need to be prepared for these events, and have a plan in place to recover from them.
Backups
The first step in being able to recover from a disaster, is to have a backup of your data. With a backup we can restore our data to a point in time before the disaster occurred. There are a couple of strategies that we can use to backup our data:
call dolt_push
The simplest way to back up your data is to set up a Dolt Remote for
each database and use dolt_push
to push your data to the remote. This
works well for backing up one branch of a database at a time, but if you want to inspect and push all your branches of all
databases, you'll need to write a script to do so. dolt_push
will push incremental updates of your database to the remote
which is more efficient than storing the entire database each time. Additionally, it doesn't backup changes to the working
set of a branch, only committed data.
If disaster does strike, and you want to stand up a new Dolt sql-server instance with the data from the remotes, you can simply clone the remote for each database.
call dolt_backup
The alternative is to use the dolt_backup
.
stored procedure. Unlike dolt_push
, dolt_backup
will back up the entire database, including all branches and their working
sets, tags, and remote tracking refs. You will still need to back up each database individually.
Executing the dolt_backup
can be executed in two ways. The first is to configure a named url for the backup, and then
executing CALL dolt_backup('sync', 'named-url');
to back up the database to the named url. You can also skip the url
naming step and execute CALL dolt_backup('sync-url', <DESTINATION>);
to back up the database to the url directly. The
backup url format supports aws, gs, and file url schemas. Examples of each are:
aws://[DYNAMO_TABLE:/S3_BUCKET_NAME]/path/to/backup/to
gs://BUCKET_NAME/path/to/backup/to
file://path/to/backup/to
Because dolt_backup
is done via a stored procedure run in a SQL session, we want to set up credentials
to access the backup storage location on a per user basis. This is currently only supported for AWS backups, and is done
by providing an AWS credential file and an AWS region in the server's config.yaml file.
user_session_vars:
name: "SQL_USER_NAME",
vars:
aws_credentials_file: "/path/to/aws/credentials/file",
aws_credentials_region: "AWS_REGION_WHERE_BACKUP_WILL_BE_STORED"
This will allow only the user named SQL_USER_NAME
to use the credentials stored in the file /path/to/aws/credentials/file
to access aws resources like the S3 bucket where the backup will be stored, and the DynamoDB table where the backup metadata
will be stored. An example of the credentials file is:
[default]
aws_access_key_id=ASIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE
Additional Things to Backup
In addition to backing up your data, you will want to backup your server configuration, and user and grants. This is as
simple as copying our config.yaml
and privs.json
files to our backup storage location. Your script that takes periodic
backups of all of your databases should also copy these files to your backup storage location.
Replication
Backups are great, but what if you could recover from disaster with little to no downtime. With one or more read replicas you can do just that. Read replicas are copies of your database that are kept in sync with the primary database. If the primary database goes down, you can promote a read replica to be the primary database and continue serving traffic.
Replication is a complex topic that is covered in depth in our replication documentation.
Conclusion
Taking any system to production is a big step, and Dolt is no different. There are numerous considerations and steps that you need to take to ensure that your system is production ready. If you have a lot of experience with taking systems to production, and you already have a monitoring, alerting, and backup storage location in place, then this isn't to be anything you haven't done before. However, if you don't have experience with these things, or if you just don't want to deal with them yourself, Hosted Dolt is a great option. We take care of all of these things for you, so you can focus on building your application. Additionally, if an alarm does sound, or if you are having production issues we have a team of experienced Dolt engineers that will respond to alerts and user created incident reports. Check out https://hosted.doltdb.com/ for more information, or come and talk to us on Discord.