MediaWiki works with Dolt
We're on a mission to show that Dolt, the world's first version controlled SQL database, works with all your favorite tools in all your favorite languages.
Today, we're going to show how to set up a MediaWiki backed by Dolt, import an English-language Wikipedia dump, and share the resulting database on DoltHub.
Download MediaWiki
MediaWiki is distributed as a zip file. You can get it from MediaWiki's official download page. MediaWiki is written in PHP, so you need to unpack the zip file and stick the contents in the document root of a web server that can execute PHP.
Set up Apache HTTPD
For the web server, I chose Apache HTTPD because I'm old school. I'm installing this on a Mac so I used Homebrew to install Apache HTTPD.
$ brew install apache-httpd
==> Downloading https://ghcr.io/v2/homebrew/core/httpd/manifests/2.4.58
######################################################################### 100.0%
==> Fetching httpd
==> Downloading https://ghcr.io/v2/homebrew/core/httpd/blobs/sha256:b9af089ded42
######################################################################### 100.0%
==> Pouring httpd--2.4.58.arm64_sonoma.bottle.tar.gz
==> Caveats
DocumentRoot is /opt/homebrew/var/www.
The default ports have been set in /opt/homebrew/etc/httpd/httpd.conf to 8080 and in
/opt/homebrew/etc/httpd/extra/httpd-ssl.conf to 8443 so that httpd can run without sudo.
To start httpd now and restart at login:
brew services start httpd
Or, if you don't want/need a background service you can just run:
/opt/homebrew/opt/httpd/bin/httpd -D FOREGROUND
==> Summary
🍺 /opt/homebrew/Cellar/httpd/2.4.58: 1,663 files, 32MB
==> Running `brew cleanup httpd`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
Some important paths noted in the install notes are the Document Root at /opt/homebrew/var/www and the httpd.conf located at /opt/homebrew/etc/httpd/httpd.conf.
The first thing we must do is change the httpd.conf to execute PHP. I already had PHP installed on my machine using Homebrew, so I load that module in the modules section of my httpd.conf.
+ LoadModule php_module /opt/homebrew/opt/php/lib/httpd/modules/libphp.so
And then at the end of the httpd.conf, I add a Handler.
+ <FilesMatch \.php$>
+ SetHandler application/x-httpd-php
+ </FilesMatch>
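Before starting the server, it can help to confirm that both edits actually landed in the file. The snippet below is a minimal sketch, not part of the original setup: the has_php_config function name and the HTTPD_CONF variable are made up for illustration, and the default path is the one reported in the Homebrew install notes.

```shell
# Return success if the given httpd.conf has both PHP changes described above.
# has_php_config is a hypothetical helper name, not part of Apache or Homebrew.
has_php_config() {
  grep -Eq '^[[:space:]]*LoadModule[[:space:]]+php_module[[:space:]]' "$1" &&
    grep -Eq '^[[:space:]]*SetHandler[[:space:]]+application/x-httpd-php' "$1"
}

# HTTPD_CONF is an assumed override; the default is the Homebrew path from the install notes.
conf="${HTTPD_CONF:-/opt/homebrew/etc/httpd/httpd.conf}"

if [ ! -f "$conf" ]; then
  echo "No httpd.conf found at $conf"
elif has_php_config "$conf"; then
  echo "PHP module and handler are configured in $conf"
else
  echo "PHP module or handler is missing from $conf"
fi
```

Apache can also validate the file's syntax itself with httpd -t, which catches typos that this simple text check would not.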
Now run the web server using:
$ brew services start httpd
Finally, I unpack MediaWiki, copy the directory to my Document Root, and rename it to w.
$ unzip mediawiki-1.41.0.zip
$ cp -r mediawiki-1.41.0 /opt/homebrew/var/www/
$ mv /opt/homebrew/var/www/mediawiki-1.41.0 /opt/homebrew/var/www/w
Now when you hit http://localhost:8080/w/index.php, you should see:
It's working!
Set Up Dolt as Your Database
Now, you need to set up Dolt as the database that backs MediaWiki. Dolt is MySQL-compatible and MediaWiki works with MySQL so it also works with Dolt.
Install Dolt
If you have Dolt installed you can skip ahead to the Run Dolt section.
Dolt is a single ~103 megabyte program.
$ du -h ~/go/bin/dolt
103M /Users/timsehn/go/bin/dolt
It's really easy to install. Download it and put it on your PATH. We have a bunch of ways to make this even easier for most platforms.
Here is a convenience script that does that for *NIX platforms. Open a terminal and run it.
sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | sudo bash'
Run Dolt
Navigate to the place you want to store your Dolt database. Make a media_wiki directory and call dolt init to create your database.
$ cd ~/dolt
$ mkdir media_wiki
$ cd media_wiki
$ dolt init
Now start a Dolt SQL Server. It is a MySQL-compatible database server that listens on port 3306 by default. Any tool that can connect to a MySQL database, like MediaWiki, can connect to it.
$ dolt sql-server
Leave that shell open. Any errors from Dolt will be printed there.
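To confirm the server is accepting connections, you can query it from a second terminal with any MySQL client. The sketch below assumes the stock mysql command-line client is installed, which the setup above does not otherwise require. The helper name server_query is hypothetical, and the host, port, and user are the defaults described above.

```shell
# Run one SQL statement against the local Dolt SQL server.
# server_query is a hypothetical helper; it assumes the `mysql` client is installed.
# Using 127.0.0.1 (not localhost) makes the client connect over TCP.
server_query() {
  mysql --host=127.0.0.1 --port=3306 --user=root --connect-timeout=5 --execute="$1"
}

if ! command -v mysql >/dev/null 2>&1; then
  status="no-client"
elif server_query "SHOW DATABASES;"; then
  status="reachable"
else
  status="unreachable"
fi
echo "Dolt SQL server check: $status"
```

If the server is up, the database list should include media_wiki.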
Run MediaWiki Installation
Now, we need to generate a LocalSettings.php using the setup pages that come packaged with MediaWiki. Go back to http://localhost:8080/w/index.php and click the "set up the wiki" link.
I worked through the steps outlined, noting that my database name was media_wiki and that it could be accessed as root with no password. I named my wiki test and created a user for myself. At the end of the process, I was prompted to download a LocalSettings.php file and put it in my MediaWiki root. After the download completed, I ran:
$ cp ~/Downloads/LocalSettings.php /opt/homebrew/var/www/w/
And lo and behold, it works!
Import a Wikipedia Dump
Now, we're going to import the latest Wikipedia dump into our local Dolt database for fun and science.
First, download the latest dump. The file you want will be named something like enwiki-20240301-pages-articles-multistream.xml.bz2. It's about 20GB compressed and 90GB uncompressed. Decompress it (it is a bzip2 archive, so a tool such as bunzip2 works).
Then, we need to run the importDump.php script.
$ cd /opt/homebrew/var/www/w/maintenance/
$ php ./run.php importDump.php ~/Downloads/enwiki-20240301-pages-articles-multistream.xml
This import will take a long time. On my beefy MacBook Pro I was getting about 2 pages per second. There are 6.7M pages. I keep making progress each day. I'm sharing the database on DoltHub if you want to check it out.
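For a rough sense of scale, the two figures above give a back-of-the-envelope estimate of the total import time. This is plain arithmetic on the numbers from the text, assuming the rate stays constant, which a real import may not.

```shell
pages=6700000   # total pages, from the text above
rate=2          # pages imported per second, from the text above

seconds=$((pages / rate))
days=$((seconds / 86400))   # 86400 seconds per day; integer division rounds down

echo "Estimated import time: $seconds seconds, about $days days"
# prints: Estimated import time: 3350000 seconds, about 38 days
```

That works out to more than five weeks of continuous importing.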
As the import progresses, you'll start to be able to view random Wikipedia articles locally!
Running Your MediaWiki from the DoltHub Wikipedia Clone
To run MediaWiki with Wikipedia imported, first clone the timsehn/media_wiki database from DoltHub and start a SQL server.
$ cd ~/dolt
$ dolt clone timsehn/media_wiki
$ cd media_wiki
$ dolt sql-server
Then, on a fresh MediaWiki install (or just delete your LocalSettings.php), run the install program and set the database to media_wiki with root as the user and no password. When you load the main page, you should be able to hit the "Random Page" link and see a random Wikipedia page.
Pretty easy, right? Dolt is a database built for sharing.
Decentralized Data Sharing
You saw a prime example of how convenient Dolt is for sharing Wikipedia, the world's biggest encyclopedia. Instead of taking weeks to import Wikipedia from a dump, you simply download a clone and you are off.
Let's say you also want to allow edits to your clone. You want to use a Pull Request workflow to the main copy stored on DoltHub. Let's show you how to do this.
Set up a Local Branch to Edit
We're going to make a new branch using the Dolt CLI. Navigate to the directory you started your database in, in our case ~/dolt/, and go into the directory called media_wiki. Then use the dolt branch command to create a branch, just like you would in Git.
$ pwd
/Users/timsehn/dolt/media_wiki
$ dolt branch local
$ dolt branch
local
* main
Now, we have a new branch called local to connect to.
Point your MediaWiki at the Branch
Navigate to the root of your MediaWiki install and edit LocalSettings.php to point at your new branch.
$ cd /opt/homebrew/var/www/w/
In the database section of LocalSettings.php, you just add the branch name at the end of the database name, separated by a slash.
## Database settings
$wgDBtype = "mysql";
$wgDBserver = "127.0.0.1";
- $wgDBname = "media_wiki";
+ $wgDBname = "media_wiki/local";
$wgDBuser = "root";
$wgDBpassword = "";
Now, you are connecting to the new branch called local.
I can confirm this in the debug logs of the running SQL server.
DEBU[0251] Starting query connectTime="2024-04-03 11:39:25.297596 -0700 PDT m=+9.767598584" connectionDb=media_wiki/local connectionID=1 query="..."
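Another way to check is to ask the server directly. The sketch below builds the branch-qualified database name used in LocalSettings.php and, if a mysql client is available, queries Dolt's active_branch() function over that connection. The helper names branch_db_name and show_active_branch are made up for illustration, and the mysql client is an assumption that nothing else in this walkthrough requires.

```shell
# Compose the name that selects a branch on connect: <database>/<branch>.
# branch_db_name is a hypothetical helper name.
branch_db_name() {
  printf '%s/%s\n' "$1" "$2"
}

# Ask the server which branch a connection to the given database name lands on.
# active_branch() is a Dolt SQL function; the `mysql` client is an assumption.
show_active_branch() {
  mysql --host=127.0.0.1 --port=3306 --user=root --connect-timeout=5 \
    --database="$1" --execute="SELECT active_branch();"
}

db=$(branch_db_name media_wiki local)
echo "Connecting to: $db"   # prints: Connecting to: media_wiki/local

if command -v mysql >/dev/null 2>&1; then
  show_active_branch "$db" || echo "Could not reach the Dolt SQL server"
fi
```

With the server running and the branch created, the query should report local.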
Make a Commit and Push
Now let's make a new page on our branch and then make a Pull Request on DoltHub.
Click "Create this page".
Create your article and you'll end up with something like this:
Stop the SQL server and check out the local branch.
$ dolt checkout local
You can see what you've created by examining the diff. Take a moment to marvel at how cool Dolt is. Unlike other databases, you can see what you changed!
$ dolt status
On branch local
Changes not staged for commit:
(use "dolt add <table>" to update what will be committed)
(use "dolt checkout <table>" to discard changes in working directory)
modified: watchlist
modified: slots
modified: module_deps
modified: recentchanges
modified: log_search
modified: content
modified: user
modified: job
modified: revision
modified: objectcache
modified: page
modified: comment
modified: searchindex
modified: logging
modified: text
$ dolt diff text
diff --dolt a/text b/text
--- a/text
+++ b/text
+---+---------+----------------------------------------------------+-----------+
| | old_id | old_text | old_flags |
+---+---------+----------------------------------------------------+-----------+
| + | 1209461 | The world's first version controlled SQL database. | utf-8 |
+---+---------+----------------------------------------------------+-----------+
$ dolt diff page
diff --dolt a/page b/page
--- a/page
+++ b/page
+---+---------+----------------+---------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| | page_id | page_namespace | page_title | page_is_redirect | page_is_new | page_random | page_touched | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---+---------+----------------+---------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| + | 1209460 | 0 | Dolt_Database | 0 | 1 | 0.651856670903 | 20240403191437 | 20240403191437 | 1211506 | 50 | wikitext | NULL |
+---+---------+----------------+---------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
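The working-set changes are also visible from SQL. As a sketch, the function below reads Dolt's dolt_status system table through dolt sql -q. The helper name list_changed_tables is hypothetical, and the snippet assumes you run it from the database directory (here ~/dolt/media_wiki).

```shell
# List tables with uncommitted changes, using Dolt's dolt_status system table.
# list_changed_tables is a hypothetical helper name.
list_changed_tables() {
  dolt sql -q "SELECT table_name, staged, status FROM dolt_status ORDER BY table_name;"
}

# Only run when dolt is installed and the current directory is a Dolt database.
if command -v dolt >/dev/null 2>&1 && [ -d .dolt ]; then
  list_changed_tables
else
  echo "Run this from a Dolt database directory with dolt on your PATH"
fi
```

This should list the same tables that dolt status reported above, in a form you can filter or join like any other query result.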
Now we'll make a Dolt commit on our branch so we can send the changes to DoltHub.
$ dolt commit -am "Added Dolt database page"
commit n8v7boeva91c198ip4h1uichrl0thssp (HEAD -> local)
Author: timsehn <tim@dolthub.com>
Date: Wed Apr 03 11:59:44 -0700 2024
Added Dolt database page
Finally, we push our changes to DoltHub.
$ dolt push origin local
/ Uploading...
To https://doltremoteapi.dolthub.com/timsehn/media_wiki
* [new branch] local -> local
Open a PR on DoltHub
Now, we want our changes reviewed and merged into the main copy of Wikipedia. Our local copy continues to have our new article and users of it can continue to enjoy our version. This is the beauty of decentralized collaboration. There can be multiple competing Wikipedias!
So, we open a Pull Request on DoltHub.
After submitting the form, I am greeted by the Pull Request page. I can send reviewers to this page to review and comment on my changes.
The reviewers can even review a diff.
We're biased but we think this decentralized collaboration workflow has a lot of promise for data like Wikipedia. Can we get a decentralized encyclopedia with many competing versions? Dolt is here to help make that a reality.
Conclusion
As you can see, Dolt works with MediaWiki and the Wikipedia import. Dolt can also add decentralized collaboration to the MediaWiki environment. Curious to learn more? Stop by our Discord and let's chat.