MySQL

Drupal in the cloud

It is not always easy to scale Drupal -- not because Drupal sucks, but simply because scaling the LAMP stack (including Drupal) takes no small amount of skill. You need to buy the right hardware, install load balancers, setup MySQL servers in master-slave mode, setup static file servers, setup web servers, get PHP working with an opcode cacher, tie in a distributed memory object caching system like memcached, integrate with a content delivery network, watch security advisories for every component in your system and configure and tune the hell out of everything.

Either you can do all of the above yourself, or you outsource it to a company that knows how to do this for you. Both are non-trivial and I can count the number of truly qualified companies on one hand. Tag1 Consulting is one of the few Drupal companies that excel at this, in case you're wondering.

My experience is that MySQL takes the most skill and effort to scale. While proxy-based solutions like MySQL Proxy look promising, I don't see strong signals about it becoming fundamentally easier for mere mortals to scale MySQL.

It is not unlikely that in the future, scaling a Drupal site is done using a radically different model. Amazon EC2, Google App Engine and even Sun Caroline are examples of the hosting revolution that is ahead of us. What is interesting is how these systems already seem to evolve: Amazon EC2 allows you to launch any number of servers but you are pretty much on your own to take advantage of them. Like, you still have to pick the operating system, install and configure MySQL, Apache, PHP and Drupal. Not to mention the fact that you don't have access to a good persistent storage mechanism. No, Amazon S3 doesn't qualify, and yes, they are working to fix this by adding Elastic IP addresses and Availability Zones. Either way, Amazon doesn't make it easier to scale Drupal. Frankly, all it does is making capacity planning a bit easier ...

Then comes along Amazon SimpleDB, Google App Engine and Sun Caroline. Just like Amazon EC2/S3 they provide instant scalability, only they moved things up the stack a level. They provide a managed application environment on top of a managed hosting environment. Google App Engine provides APIs that allow you to do user management, e-mail communication, persistent storage, etc. You no longer have to worry about server management or all of the scale-out configuration. Sun Caroline seems to be positioned somewhere in the middle -- they provide APIs to provision lower level concepts such as processes, disk, network, etc.

Unfortunately for Drupal, Google App Engine is Python-only, but more importantly, a lot of the concepts and APIs don't map onto Drupal. Also, the more I dabble with tools like Hadoop (MapReduce) and CouchDB, the more excited I get, but the more it feels like everything that we do to scale the LAMP stack is suddenly wrong. I'm trying hard to think beyond the relational database model, but I can't figure out how to map Drupal onto this completely different paradigm.

So while the center of gravity may be shifting, I've decided to keep an eye on Amazon's EC2/S3 and Sun's Caroline as they are "relational database friendly". Tools like Elastra are showing a lot of promise. Elastra claims to be the world's first infinitely scalable solution for running standard relational databases in an on-demand computing cloud. If they deliver what they promise, we can instantly scale Drupal without having to embrace a different computing model and without having to do all of the heavy lifting. Specifically exciting is the fact that Elastra teamed up with EnterpriseDB to make their version of PostgreSQL virtually expand across multiple Amazon EC2 nodes. I've already reached out to Elastra, EnterpriseDB and Sun to keep tabs on what is happening.

Hopefully, companies like Elastra, EnterpriseDB, Amazon and Sun will move fast because I can't wait to see relational databases live in the cloud ...

Drupal and MySQL hIgh-availability

While I invited Kris Buytaert to present at the Drupal developer room at FOSDEM, I couldn't make it to his presentation as it was scheduled for the last presentation of the day. Oops! Fortunately, Luc Van Braekel recorded almost all Drupal presentations at FOSDEM so I'm sharing the video of Kris's presentation to make up for it. It's is well worth a look, and no, we are not family.

Database replication lag

As explained in an earlier blog post, we recently started using MySQL master-slave replication on drupal.org in order to provide the scalability necessary to accommodate our growing demands. With one or more replicas of our database, we can instruct Drupal to distribute or load balance the SQL workload among different database servers.

MySQL's master-slave replication is an asynchronous replication model. Typically, all the mutator queries (like INSERT, UPDATE, DELETE) go to a single master, and the master propagates all updates to the slave servers without synchronization or communication. While the asynchronous nature has its advantages, it is also means that the slaves might be (slightly) out of sync.

Consider the following pseudo-code:

$nid = node_save($data);
$node = node_load($nid);

Because node_save() executes a mutator query (an INSERT or UPDATE statement) is has to be executed on the master, so the master can propagate the changes to the slaves. Because node_load() uses a read-only query, it can go to the master or any of the available slaves. Because of the lack of synchronization between master and slaves, there is one obvious caveat: when we execute node_load() the slaves might not have been updated. In other words, unless we force node_load() to query the master, we risk not being able to present the visitor the data that he just saved. In other cases, we risk introducing data inconsistencies due to the race conditions.

So what is the best way to fix this?

  1. Our current solution on drupal.org is to execute all queries on the master, except for those that we know can't introduce race conditions. In our running example, this means that we'd chose to execute all node_load()s on the master, even in absence of a node_save(). This limits our scalability so this is nothing but a temporary solution until we have a good solution in place.
  2. One way to fix this is to switch to a synchronous replication model. In such a model, all database changes will be synchronized across all servers to ensure that all replicas are in a consistent state. MySQL provides a synchronous replication model through the NDB cluster storage engine. Stability issues aside, MySQL's cluster technology works best when you avoid JOINs and sub-queries. Because Drupal is highly relational, we might have to rewrite large parts of our code base to get the most out of it.
  3. Replication and load balancing can be left to some of the available proxy layers, most notably Continuent's Sequoia and MySQL Proxy. Drupal connects to the proxy as if it was the actual database, and the proxy talks to the underlying databases. The proxy parses all the queries and propagates mutator queries to all the underlying databases to make sure they remain in a consistent state. Reads are only distributed among the servers that are up-to-date. This solution is transparent to Drupal, and should work with older versions of Drupal. The only downside is that it not trivial to setup, and based on my testing, it requires quite a bit of memory.
  4. We could use database partitioning, and assign data to different shards in one way or another. This would reduce the replica lag to zero but as we don't have that much data or database tables with millions of rows, I don't think partitioning will buy drupal.org much.
  5. Another solution is to rewrite large parts of Drupal so it is "replication lag"-aware. In its most naive form, the node_load() function in our running example would get a second parameter that specifies whether the query should be executed on the master or not. The call should then be changed to node_load($nid, TRUE) when proceeded by a node_save().
    I already concluded through research that this is not commonly done; probably because such a solution still doesn't provide any guarantees across page request.
  6. A notable exception is MediaWiki, the software behind Wikipedia which has some documented best practices to deal with replication lag. Specifically, they recommend to query the master (using a very fast query) to see what version of the data they have to retrieve from the slave. If the specified version is not yet available on the slave due to replication lag, they simply wait for it to become available. In our running example, each node should get a version number and node_load() would first retrieve the latest version number from the master and then use that version number to make sure it gets an up-to-date copy from the slave. If the right version isn't yet available, node_load() will try loading the data again until it becomes available.

Scaling with MySQL replication

To deal with Drupal's growth, we're adding a second database server to drupal.org which is useful for at least two reasons. First, we'll be able to handle more SQL queries as we can distribute them between multiple database servers (load balancing). Secondly, this new server can act as a "hot spare" that can immediately take over if the other database server fails (high availability / fail-over).

The current plan is to configure both database servers in master-slave configuration, which is the most common replication model for websites. This model provides scalability, but not necessarily fail-over. With a master-slave configuration, all data modification queries (like INSERT, UPDATE and DELETE queries) are sent to the master. The master writes updates to a binary log file, and serves this log file to the slaves. The slaves read the queries from the binary log, and execute them against their local copy of the data. While all data modification queries go to the master, all the read-only queries (most notably the SELECT queries) can be distributed among the slaves. By following this model, we can spread the workload amongst multiple database servers. And as Drupal.org's traffic continues to grow, we can scale horizontally by adding more slaves.

While MySQL does the database replication work, it doesn't do the actual load balancing work. That is up to the application or the database abstraction layer to implement. To be able to distribute queries among multiple database servers, the application needs to distinguish between data modification queries and read-only queries.

Care needs to be taken, as the data on the slaves might be slightly out of sync. It may or may not be practical to guarantee a low-latency environment. In those cases, the application might want to require that certain read-only queries go to the master.

There are different ways to accomplish this:

  • Drupal executes all SQL queries through db_query(). Traditionally, big Drupal sites manually patched db_query() to use query parsing (regular expression foo) to separate read queries from write queries. This is not convenient and it doesn't provide a good solution to deal with lag. Fortunately, work is being done to provide better support for database replication in Drupal 6. It's our intend to backport this to Drupal 5 so we can use it on drupal.org until Drupal 6 has been released and drupal.org has been upgraded to use Drupal 6.
  • MediaWiki, the software behind Wikipedia uses $db->select() and $db->insert(). They have some documented best practices to deal with lag.
  • Neither the Pear DB database abstraction layer or its successor Pear MDB2 seem to support database replication.
  • Wordpress uses HyperDB, a drop-in replacement for Wordpress' default database abstraction layer that provides support for replication. It was developed for use on Wordpress.com, a mass hosting provider for Wordpress blogs. Because HyperDB is a drop-in replacement, they don't have a clean API and just like Drupal 5, they have to use query parsing to separate read queries from write queries. It's not clear how they deal with lag.
  • Joomla! 1.0 does not separate read queries from write queries, but Joomla! 1.5 will use functions like $db->insertObject() and $db->updateObject(). Joomla! 1.5 won't support replication out of the box, but their API allows a clean drop-in replacement to be developed. It's not clear how they would deal with lag.
  • PostNuke uses the ADOdb database abstraction library, which at first glance does not support database replication either.
  • Java applications use the statement.executeQuery() and statement.executeUpdate() that are part of the standard class libraries. It's not clear how they deal with lag.

What other popular applications support database replication, and what do their APIs look like? I'd like to find out what the ideal API looks like so we can still push that for inclusion in Drupal 6.

Based on the research above, I think we should get the best of all worlds by introducing these three functions (and deprecating db_query()):

  1. db_select_slave(); // results might be slightly out of date
  2. db_select_master(); // results always up to date
  3. db_update(); // for UPDATE, INSERT, DELETE, etc

Even if they don't actually do anything useful in Drupal 6, and just map onto the deprecated db_query(), they set a better standard, and they allow for a drop-in replacement to be developed during Drupal 6's lifetime. Ideally, however, Drupal 6 would ship with a working implementation.

Drupal's database interaction

I used XDebug to profile the behavior of Drupal, and to study the interaction with the database server. I aggregated the profile information of 100 requests to the main page using the "Apache, mod_php, PHP4, APC" configuration used for previous benchmark experiments. More information about my experimental setup is available at that page. XDebug generates a trace file with all the profile information which I visualized using KCacheGrind.

Drupal has a page cache mechanism which stores dynamically generated web pages in the database. By caching a web page, Drupal does not have to create the page each time it is requested. Only pages requested by anonymous visitors (users that have not logged on) are cached. Once users have logged on, caching is disabled for them since the pages are personalized in various ways. Because this represents two different modes of operation, I investigated Drupal's behavior with and without page caching.

With page caching

Critical functions (with page caching)

This figure shows how much time is spent in each function when page caching is enabled. The functions are sorted by the 'Self' column, which shows the time spent in each function without the time spent in its children. The second column shows how often the function was called to serve 100 requests to the main page.

We observe that more than 45% (14.37% + 14.18% + 8.9% + 5.54% + 2.9% + ...) of the execution time is spent in the database related functions. Sending the SQL queries to the database server and waiting for the results takes 14% of the total execution time. Drupal preparing the queries (eg. database prefixing, sanitizing the input to prevent SQL query injection, etc) takes more than 31% of the total execution time. We should look into optimizing the functions that prepare the queries (db_query(), _db_query and _db_query_callback()).

The figure above depicts that PHP's mysql_query() function is called 1401 times. This means that we need 14 SQL queries to serve a cached page. This is where these SQL queries come from:

Queries (with page caching)

This figure shows the Drupal functions responsible for querying the database when serving 100 cached pages. The first column shows how much time is spent in the calls to db_query(). The second column shows how many times each function queried the database and the last column shows the functions' names and source files.

Without page caching

Critical functions (without page caching)

This figure shows how much time is spent in each function when page caching is disabled. The functions are sorted by the 'Self' column, which shows the time spent in each function without the time spent in its children. The second column shows how often the function was called to serve 100 requests to the main page.

When the page cache is disabled, we see that on average we need 144 SQL queries to generate the main page. That is a lot. We also observe that 25% of the total execution time is spent in database related functions: 13% of the total execution time is spent executing queries, and 12% of the total execution time is spent preparing the queries. This is where the SQL queries come from:
Queries (without page caching)

This figure shows the Drupal functions responsible for querying the database when serving 100 pages. The first column shows how much time is spent in the calls to db_query(). The second column shows how many times each function queried the database and the last column shows the functions' names and source files.

© 1999-2007 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.