Jay Janssen's Bloggidy Blog
Playing around with Galera 0.8pre

Currently I’m loading 100 TPCC warehouses into a 2-node Galera cluster that I got up and running in maybe 5 minutes. Kudos to the Codership guys for making it super easy to download a single tgz and start evaluating quickly.

Galera, for those not in the know, is a “cluster of MySQL instances”, unlike MySQL Cluster, which is really just a storage engine backed by clustered data storage nodes. Galera is hacked into both the main MySQL server and InnoDB (the only storage engine it supports). Galera is fully synchronous, so commits on any MySQL node are applied everywhere in the cluster at commit time (and before the client can proceed). Writes are actually scalable in Galera, to a point, by adding more data nodes: replication between nodes is row-based (RBR), so each node doesn’t pay the CPU cost of processing the SQL every time; it just has to apply the raw row changes. Obviously this can only scale so far, since each node still has to apply all the transactions, but in theory it should offer something better than 1x.

I ran a quick TPCC test during a slower session at the MySQL conference last week, and while it worked, Galera does lead to a relatively high number of deadlock rollbacks on client transactions, depending on the workload. In theory this isn’t a problem as long as the client handles them correctly. As Alexey from Codership told me, this particular workload focuses on a relatively small number of rows. Increasing the number of warehouses (I was using 1, I think), i.e., using a bigger database, may reduce the number of deadlocks.
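To make "handle it correctly" concrete, here is a minimal Python sketch of a client-side retry loop. It assumes the driver surfaces MySQL's deadlock error code 1213 (ER_LOCK_DEADLOCK) on an exception; the `DatabaseError` class and the `run_with_retry` helper are hypothetical names invented for illustration, not part of Galera or of any particular driver.

```python
import random
import time

ER_LOCK_DEADLOCK = 1213  # MySQL error code reported when a transaction is rolled back by a deadlock


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries a MySQL error code."""

    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno


def run_with_retry(txn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn() and retry it when it fails with a deadlock rollback.

    txn is assumed to run one complete transaction (begin ... commit) and to
    raise DatabaseError on failure. Any other error, or a deadlock on the
    final attempt, is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DatabaseError as exc:
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Small randomized backoff so competing clients don't collide again in lockstep.
            sleep(base_delay * attempt * random.random())
```

The important part is that the whole transaction is re-run from the start, since a deadlock rollback discards everything the transaction had done so far.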

Benefits (IMHO):

  • It’s just MySQL and InnoDB, so you don’t need to throw out a lot of knowledge or worry about compatibility issues with mainline MySQL.
  • Some amount of write scaling beyond a single master
  • Synchronous replication, baby! - relax innodb_flush_log_at_trx_commit (i.e., set it to something other than 1); it doesn’t matter if a commit is locally durable anymore.  Of course, this doesn’t prevent (small) data loss in the event that a power failure suddenly shuts down every node.

Drawbacks (IMHO):

  • Extra deadlocks - applications may not be designed to cope with that (though I’d argue that isn’t Galera’s fault)
  • No infinite scalability (but what does have it?) - In theory at least, you can’t keep throwing nodes at a cluster and expect performance to grow linearly (at least for writes).
  • ??  - Waiting to see

Some criteria I’m evaluating:

  • Just how far will it scale within colo?
  • How well would it work across a WAN (and scale)?
  • How well would it fit into Yahoo!’s dual-master, single-writer scheme?  (i.e., two clusters in two colos using regular replication, only writing to one of the clusters at a time, just like you can do with MySQL Cluster)
  • How nasty is the patch, and how much does it vary from mainline?

Quick update:

I just ran my initial tpcc test and got the following results (100 warehouses, 60 second warmup, 60 second test, 10 clients):

  • 2 nodes, only running the test against 1: 4864 TpmC
  • 2 nodes, 10 clients testing each node simultaneously: 6164 TpmC - though that’s not fair, since there are now 20 clients collectively (there be deadlocks here)
  • 2 nodes, 5 clients each (more fair): 5690 TpmC (still deadlocks here)

So, a reasonable performance gain from one to two nodes.  I just grokked the tpcc source and it looks like the test client handles deadlock errors correctly by retrying.  I am making an assumption that the TpmC results correctly reflect actual completed transactions only.
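For a sense of scale, the gain can be worked out directly from the three TpmC figures above. A quick Python sketch (the `speedup` helper is just an illustrative name, and these are single 60-second runs, so treat the ratios as rough):

```python
# TpmC figures from the runs above (100 warehouses, 60 s warmup, 60 s test).
baseline_tpmc = 4864  # 2 nodes, test run against only 1 node (10 clients)

two_node_runs = {
    "10 clients per node (20 total)": 6164,
    "5 clients per node (10 total)": 5690,
}


def speedup(tpmc, baseline):
    """Throughput relative to the single-writer-node baseline."""
    return tpmc / baseline


for label, tpmc in two_node_runs.items():
    print(f"{label}: {speedup(tpmc, baseline_tpmc):.2f}x")

# prints:
# 10 clients per node (20 total): 1.27x
# 5 clients per node (10 total): 1.17x
```

So the like-for-like comparison (10 clients total in both cases) comes out at roughly 17% more throughput from writing to both nodes.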
