MediaWiki works with Dolt

We're on a mission to show that Dolt, the world's first version controlled SQL database, works with all your favorite tools in all your favorite languages.

Today, we're going to show how to set up a MediaWiki backed by Dolt, import an English Language Wikipedia dump, and share the resulting database on DoltHub.

Dolt + MediaWiki + Wikipedia

Download MediaWiki

MediaWiki is written in PHP and distributed as a zip file, which you can get from MediaWiki's official download page. Unpack the zip file and put the contents in the document root of a web server that can execute PHP.

Set up Apache HTTPD

For the web server, I chose Apache HTTPD because I'm old school. I'm installing this on a Mac, so I used Homebrew to install it.

$ brew install apache-httpd
==> Downloading https://ghcr.io/v2/homebrew/core/httpd/manifests/2.4.58
######################################################################### 100.0%
==> Fetching httpd
==> Downloading https://ghcr.io/v2/homebrew/core/httpd/blobs/sha256:b9af089ded42
######################################################################### 100.0%
==> Pouring httpd--2.4.58.arm64_sonoma.bottle.tar.gz
==> Caveats
DocumentRoot is /opt/homebrew/var/www.

The default ports have been set in /opt/homebrew/etc/httpd/httpd.conf to 8080 and in
/opt/homebrew/etc/httpd/extra/httpd-ssl.conf to 8443 so that httpd can run without sudo.

To start httpd now and restart at login:
  brew services start httpd
Or, if you don't want/need a background service you can just run:
  /opt/homebrew/opt/httpd/bin/httpd -D FOREGROUND
==> Summary
🍺  /opt/homebrew/Cellar/httpd/2.4.58: 1,663 files, 32MB
==> Running `brew cleanup httpd`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).

Some important paths noted in the install notes are the Document Root at /opt/homebrew/var/www and the httpd.conf located at /opt/homebrew/etc/httpd/httpd.conf.

The first thing we must do is configure httpd.conf to execute PHP. I already had PHP installed on my machine through Homebrew, so I load its Apache module in the modules section of my httpd.conf.

+ LoadModule php_module /opt/homebrew/opt/php/lib/httpd/modules/libphp.so

And then at the end of the httpd.conf, I add a Handler.

+ <FilesMatch \.php$>
+     SetHandler application/x-httpd-php
+ </FilesMatch>
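
The two httpd.conf edits above can also be scripted. What follows is a minimal sketch, not part of my original setup: enable_php is a hypothetical helper that appends both snippets to a config file only when the LoadModule line is not already there. It puts the LoadModule line at the end of the file rather than in the modules section, on the assumption that Apache accepts LoadModule anywhere in the main server config. The sketch runs against a scratch file, not the real httpd.conf.

```shell
# Hypothetical helper: add the PHP module and handler to a config file once.
enable_php() {
  conf=$1
  # Skip the append if the module line is already present.
  if ! grep -q '^LoadModule php_module' "$conf"; then
    cat >> "$conf" <<'EOF'
LoadModule php_module /opt/homebrew/opt/php/lib/httpd/modules/libphp.so
<FilesMatch \.php$>
    SetHandler application/x-httpd-php
</FilesMatch>
EOF
  fi
}

# Exercise the helper on a scratch file instead of the real httpd.conf.
scratch=$(mktemp)
enable_php "$scratch"
enable_php "$scratch"   # second call changes nothing
cat "$scratch"
```

To apply it for real, pass /opt/homebrew/etc/httpd/httpd.conf instead of the scratch file and check the result with httpd -t before starting the server.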

Now run the web server using:

$ brew services start httpd

Finally I unpack MediaWiki, copy the directory to my Document Root, and rename it to w.

$ unzip mediawiki-1.41.0.zip
$ cp -r mediawiki-1.41.0 /opt/homebrew/var/www/
$ mv /opt/homebrew/var/www/mediawiki-1.41.0 /opt/homebrew/var/www/w

Now when you hit http://localhost:8080/w/index.php, you should see:

No Local Settings page

It's working!

Set Up Dolt as Your Database

Now, you need to set up Dolt as the database that backs MediaWiki. Dolt is MySQL-compatible and MediaWiki works with MySQL so it also works with Dolt.

Install Dolt

If you have Dolt installed you can skip ahead to the Run Dolt section.

Dolt is a single ~103 megabyte program.

$ du -h ~/go/bin/dolt
103M	/Users/timsehn/go/bin/dolt

It's really easy to install. Download it and put it on your PATH. We have a bunch of ways to make this even easier for most platforms.

Here is a convenience script that does that for *NIX platforms. Open a terminal and run it.

sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | sudo bash'

Run Dolt

Navigate to the place you want to store your Dolt database. Make a media_wiki directory and call dolt init to create your database.

$ cd ~/dolt
$ mkdir media_wiki
$ cd media_wiki
$ dolt init

Now start a Dolt SQL Server. This starts a MySQL-compatible database server listening on port 3306, the default. Any tool that can connect to a MySQL database, like MediaWiki, can connect to it.

$ dolt sql-server

Leave that shell open. Any errors from Dolt will be printed there.

Run MediaWiki Installation

Now, we need to generate a LocalSettings.php using the setup pages that come packaged with MediaWiki. Go back to http://localhost:8080/w/index.php and click the "set up the wiki" link.

Install Page

I worked through the steps, noting that my database was named media_wiki and that it could be accessed as root with no password. I named my wiki test and created a user for myself. At the end of the process, I was prompted to download a LocalSettings.php file and put it in my MediaWiki root. After the download completed, I ran:

$ cp ~/Downloads/LocalSettings.php /opt/homebrew/var/www/w/

And lo and behold, it works!

It Works!

Import a Wikipedia Dump

Now, we're going to install the latest Wikipedia dump into our local Dolt database for fun and science.

First, download the latest dump. The file you want will be named something like enwiki-20240301-pages-articles-multistream.xml.bz2. It's about 20GB compressed and 90GB uncompressed. Decompress it to get the .xml file.
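
Since the dump is a bzip2 file that grows to roughly 90GB, it is worth checking free disk space first. This is a rough sketch rather than the exact commands I ran: enough_space is a hypothetical helper built on df, the dump path assumes the file landed in ~/Downloads, and the decompression step is skipped if the file is not there.

```shell
# Hypothetical helper: succeed if DIR has at least NEED_GB gigabytes free.
enough_space() {
  dir=$1
  need_gb=$2
  # df -Pk prints POSIX-format output in 1K blocks; column 4 is available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Assumed location of the downloaded dump.
dump="$HOME/Downloads/enwiki-20240301-pages-articles-multistream.xml.bz2"

if [ -f "$dump" ] && enough_space "$HOME/Downloads" 90; then
  bzip2 -dk "$dump"   # -d decompresses, -k keeps the compressed original
fi
```

Keeping the compressed original with -k costs another 20GB, but it saves a second download if the import has to be restarted from scratch.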

Then, we need to run the importDump.php script.

$ cd /opt/homebrew/var/www/w/maintenance/
$ php ./run.php importDump.php ~/Downloads/enwiki-20240301-pages-articles-multistream.xml

This import will take a long time. On my beefy MacBook Pro I was getting about 2 pages per second, and there are 6.7M pages, so a full import takes weeks. Mine keeps making progress each day. I'm sharing the database on DoltHub if you want to check it out.
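
To put a number on "a long time", here is the back-of-the-envelope arithmetic using the two figures above. It assumes the rate stays constant and the import runs around the clock, neither of which is guaranteed.

```shell
pages=6700000                 # total pages, from the text
rate=2                        # pages per second, from the text
seconds=$(( pages / rate ))   # total import time in seconds
days=$(( seconds / 86400 ))   # 86400 seconds per day, integer division
echo "about $days days"
```

Integer division rounds down, so this prints 38. The unrounded figure is closer to 39 days, which is where the "weeks" comes from.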

As the import progresses, you'll start to be able to view random Wikipedia articles locally!

Random Wikipedia Article

Running Your MediaWiki from the DoltHub Wikipedia Clone

To run MediaWiki with Wikipedia imported, first clone the timsehn/media_wiki database from DoltHub and start a SQL server.

$ cd ~/dolt
$ dolt clone timsehn/media_wiki
$ cd media_wiki
$ dolt sql-server

Then, on a fresh MediaWiki install (or after deleting your LocalSettings.php), run the install program and set the database to media_wiki with root as the user and no password. When you load the main page, you should be able to hit the "Random Page" link and see a random Wikipedia page.

Pretty easy, right? Dolt is a database built for sharing.

Decentralized Data Sharing

You saw a prime example of how convenient Dolt is for sharing Wikipedia, the world's biggest encyclopedia. Instead of taking weeks to import Wikipedia from a dump, you simply download a clone and you are off.

Let's say you also want to allow edits to your clone. You want to use a Pull Request workflow to the main copy stored on DoltHub. Let's show you how to do this.

Set up a Local Branch to Edit

We're going to make a new branch using the Dolt CLI. Navigate to the directory you started your database in, in our case ~/dolt/, and go into the directory called media_wiki. Then use the dolt branch command to create a branch, just like you would in Git.

$ pwd
/Users/timsehn/dolt/media_wiki
$ dolt branch local
$ dolt branch
  local
* main

Now, we have a new branch called local to connect to.

Point your MediaWiki at the Branch

Navigate to the root of your MediaWiki install and edit LocalSettings.php to point at your new branch.

$ cd /opt/homebrew/var/www/w/

In the database section of LocalSettings.php, append a slash and the branch name to the database name.

## Database settings
$wgDBtype = "mysql";
$wgDBserver = "127.0.0.1";
- $wgDBname = "media_wiki";
+ $wgDBname = "media_wiki/local";
$wgDBuser = "root";
$wgDBpassword = "";

Now, you are connecting to the new branch called local.
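
If you switch branches often, the same edit can be scripted. This is a minimal sketch under a few assumptions: use_branch is a hypothetical helper, the database is named media_wiki as in this post, and the $wgDBname line is formatted exactly as the installer wrote it. It is demonstrated on a scratch file rather than the real LocalSettings.php.

```shell
# Hypothetical helper: point $wgDBname in a settings file at a Dolt branch.
use_branch() {
  settings=$1
  branch=$2
  tmp=$(mktemp)
  # Rewrite the line whether or not it already names a branch.
  sed 's|^\$wgDBname = "media_wiki[^"]*";|$wgDBname = "media_wiki/'"$branch"'";|' \
    "$settings" > "$tmp" && cat "$tmp" > "$settings"
  rm -f "$tmp"
}

# Demonstrate on a scratch file with two lines from the database section.
demo=$(mktemp)
printf '%s\n' '$wgDBtype = "mysql";' '$wgDBname = "media_wiki";' > "$demo"
use_branch "$demo" local
cat "$demo"
```

Writing through a temporary file avoids sed -i, which behaves differently on macOS and Linux.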

With the server logging at debug level, I can confirm this in the logs of the running SQL server.

DEBU[0251] Starting query connectTime="2024-04-03 11:39:25.297596 -0700 PDT m=+9.767598584" connectionDb=media_wiki/local connectionID=1 query="..."
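
Those log lines are long, so it can help to pull out just the database name. The sketch below is illustrative only: connection_db is a hypothetical helper, and the sample line is the one shown above with its query text elided.

```shell
# Sample line copied from the server log above; the query text is elided there too.
line='DEBU[0251] Starting query connectTime="2024-04-03 11:39:25.297596 -0700 PDT m=+9.767598584" connectionDb=media_wiki/local connectionID=1 query="..."'

# Hypothetical helper: print the connectionDb value of each log line on stdin.
connection_db() {
  sed -n 's/.*connectionDb=\([^ ]*\).*/\1/p'
}

db=$(printf '%s\n' "$line" | connection_db)
echo "$db"
```

In practice you would pipe the server's log output into connection_db. If no such lines appear, the server is probably not logging at debug level. I'd expect dolt sql-server --loglevel debug to turn that on, but check dolt sql-server --help for your version.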

Make a Commit and Push

Now let's make a new page on our branch and then make a Pull Request on DoltHub.

Empty page

Click Create this page.

Create page

Create your article and you'll end up with something like this:

New Page

Stop the SQL server and check out the local branch.

$ dolt checkout local

You can see what you've created by examining the diff. Take a moment to marvel at how cool Dolt is. Unlike other databases, you can see what you changed!

$ dolt status
On branch local

Changes not staged for commit:
  (use "dolt add <table>" to update what will be committed)
  (use "dolt checkout <table>" to discard changes in working directory)
	modified:         watchlist
	modified:         slots
	modified:         module_deps
	modified:         recentchanges
	modified:         log_search
	modified:         content
	modified:         user
	modified:         job
	modified:         revision
	modified:         objectcache
	modified:         page
	modified:         comment
	modified:         searchindex
	modified:         logging
	modified:         text
$ dolt diff text
diff --dolt a/text b/text
--- a/text
+++ b/text
+---+---------+----------------------------------------------------+-----------+
|   | old_id  | old_text                                           | old_flags |
+---+---------+----------------------------------------------------+-----------+
| + | 1209461 | The world's first version controlled SQL database. | utf-8     |
+---+---------+----------------------------------------------------+-----------+
$ dolt diff page
diff --dolt a/page b/page
--- a/page
+++ b/page
+---+---------+----------------+---------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|   | page_id | page_namespace | page_title    | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---+---------+----------------+---------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| + | 1209460 | 0              | Dolt_Database | 0                | 1           | 0.651856670903 | 20240403191437 | 20240403191437     | 1211506     | 50       | wikitext           | NULL      |
+---+---------+----------------+---------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+

Now we'll make a Dolt commit on our branch so we can send the changes to DoltHub.

$ dolt commit -am "Added Dolt database page"
commit n8v7boeva91c198ip4h1uichrl0thssp (HEAD -> local)
Author: timsehn <tim@dolthub.com>
Date:  Wed Apr 03 11:59:44 -0700 2024

        Added Dolt database page

Finally, we push our changes to DoltHub.

$ dolt push origin local
/ Uploading...
To https://doltremoteapi.dolthub.com/timsehn/media_wiki
 * [new branch]          local -> local

Open a PR on DoltHub

Now, we want our changes reviewed and merged into the main copy of Wikipedia. Our local copy continues to have our new article and users of it can continue to enjoy our version. This is the beauty of decentralized collaboration. There can be multiple competing Wikipedias!

So, we open a Pull Request on DoltHub.

New Pull Request

After submitting the form, I am greeted by the Pull Request page. I can send reviewers to this page to review and comment on my changes.

Pull Request

The reviewers can even review a diff.

Pull Request Diff

We're biased but we think this decentralized collaboration workflow has a lot of promise for data like Wikipedia. Can we get a decentralized encyclopedia with many competing versions? Dolt is here to help make that a reality.

Conclusion

As you can see, Dolt works with MediaWiki and the Wikipedia import. Dolt can also add decentralized collaboration to the MediaWiki environment. Curious to learn more? Stop by our Discord and let's chat.
