
Archive for October 1st, 2009

Keeping an eye on binary log growth

Thursday, October 1st, 2009

Recently I got a report that some pages on the site were extremely slow. The web server metrics didn't show anything new; the MySQL DB metrics, however, showed a definite change:

[Graph: MySQL server CPU utilization]

That is, CPU utilization went up by nearly 60% at the end of Week 38. Interestingly enough, a new software release went out at the end of Week 38, which pointed to either a bug or a new feature. Luckily I had been collecting MySQL metrics using this gmetric script, which led me to these two graphs:

[Graph: mysqlupdate]

[Graph: mysqlinsert]

So nearly double the number of inserts and nearly triple the number of updates. Using mysqlbinlog I analyzed the update and insert statements, identified the two culprit INSERT and UPDATE statements, and sent them off to the developers.
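For anyone wanting to repeat this kind of analysis, here is a minimal sketch of one way to tally writes per table from the text that mysqlbinlog prints. It is not the exact procedure I used. It assumes statement-based logging (so the SQL appears as plain text), and the function name `count_writes` and the sample table names are made up for illustration.

```python
import re
from collections import Counter

# Leading verb plus target table of a write statement. Optional modifiers
# (IGNORE, DELAYED, ...) and the INTO keyword are skipped over.
_WRITE_RE = re.compile(
    r"^\s*(INSERT|UPDATE)\s+"
    r"(?:(?:LOW_PRIORITY|DELAYED|HIGH_PRIORITY|IGNORE)\s+)*"
    r"(?:INTO\s+)?([\w.`]+)",
    re.IGNORECASE,
)

def count_writes(lines):
    """Count INSERT/UPDATE statements per table in decoded binlog text.

    `lines` is any iterable of strings, e.g. sys.stdin when the output
    of mysqlbinlog is piped into the script.
    """
    counts = Counter()
    for line in lines:
        match = _WRITE_RE.match(line)
        if match:
            verb = match.group(1).upper()
            table = match.group(2).replace("`", "").lower()
            counts[(verb, table)] += 1
    return counts

# Hypothetical sample of decoded binlog output.
sample = [
    "# at 106",
    "SET TIMESTAMP=1254400000/*!*/;",
    "INSERT INTO page_views (id) VALUES (1)",
    "insert into page_views (id) values (2)",
    "UPDATE `shop`.`users` SET last_seen = NOW() WHERE id = 7",
    "SELECT 1",
]
for (verb, table), n in count_writes(sample).most_common():
    print(verb, table, n)
```

Running the same count on a binary log from before the release and one from after makes the statements whose frequency jumped stand out.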

I also realized that had I been watching the binary log growth I might have identified this earlier, since there were a lot more binary logs for the period since the release. Thus the MySQL average binary log growth rate gmetric was born :-). Now all I need to do is find out what the normal growth rate is and, if it goes outside of that norm, have Nagios send me a non-urgent alert.
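The idea can be sketched roughly as follows. This is not the actual gmetric script, just an illustration. The file pattern is whatever your binary logs are called (the one in the comment is hypothetical), the sketch assumes no logs are purged between samples, and the "normal" band of mean plus or minus `k` standard deviations is my assumption for how to define the norm.

```python
import glob
import os
import statistics

def total_binlog_bytes(pattern):
    """Sum the sizes of all files matching `pattern`,
    e.g. a hypothetical "/var/lib/mysql/mysql-bin.[0-9]*"."""
    return sum(os.path.getsize(path) for path in glob.glob(pattern))

def growth_rate(samples):
    """Average growth in bytes/second between the first and last sample.

    `samples` is a time-ordered list of (unix_timestamp, total_bytes).
    """
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (b1 - b0) / (t1 - t0)

def is_abnormal(rate, baseline_rates, k=3.0):
    """True if `rate` is more than k standard deviations from the baseline mean."""
    mean = statistics.mean(baseline_rates)
    stdev = statistics.pstdev(baseline_rates)
    return abs(rate - mean) > k * stdev

# Made-up numbers: three samples a minute apart, and a baseline of past rates.
samples = [(0, 1000), (60, 4000), (120, 7000)]
baseline = [48.0, 50.0, 52.0, 50.0]
rate = growth_rate(samples)
print(rate, is_abnormal(rate, baseline))
```

The rate is the number a metric collector would publish, and the `is_abnormal` check is the sort of test a Nagios plugin could run to raise the non-urgent alert.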

How can you make clouds better

Thursday, October 1st, 2009

I have perhaps been overly critical of clouds. I do not think clouds are useless. They certainly have their place and usefulness. I just dislike the hype around clouds, since I find it completely misplaced. I see clouds primarily as a way of easily "creating" and "disposing" of hardware, i.e. if you need a couple of extra machines you can create them at the press of a button, and when you are done you can dispose of them. However, they also have some major drawbacks, which I have alluded to in the past in Cloud Computing's Achilles Heel and Trouble with Cloud Computing.

That said, there are a number of ways clouds can be improved. Some may be impractical, some may be expensive and some may be overly complicated, but they are certainly options. Shall we go through the list? :-)

  1. Don't use virtualization - you could certainly implement a cloud where the instance you get runs on raw hardware. That way you are guaranteed a dedicated piece of hardware that is not affected by other users' (mis)use. This would be a lot more expensive, but some people may be willing to pay for it.
  2. Intelligent web traffic load balancing - one of the big issues in general is that load balancing methods such as round-robin or least connections are imperfect, since a server may get slow for a number of reasons and a portion of your clients may receive a "substandard" response. The chances of that happening in shared environments are greater. The fix would be to devise a load balancer which can "intelligently" figure out which server is slow and deal with it accordingly, by either taking it out of the pool or sending less traffic to it.
  3. Similar to 2, for your relational DB traffic you would have to use a DB cluster and devise a way of preferring "faster" DB servers. This part in some ways "scares" me the most, since if you start using synchronous replication you have to wait until all members of the cluster have committed the change. If you start doing asynchronous replication you run the risk of DB inconsistencies, which you will have to resolve.
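To make item 2 a bit more concrete, here is a minimal sketch of one possible approach, not a description of any existing load balancer. It keeps a moving average of each server's response time, sends less traffic to slower servers, and takes a server out of the pool once it is several times slower than the fastest one. The class name, the smoothing factor `alpha` and the `eject_factor` threshold are all assumptions chosen for illustration.

```python
import random

class LatencyAwareBalancer:
    def __init__(self, servers, alpha=0.2, eject_factor=3.0):
        self.alpha = alpha                # weight given to the newest sample
        self.eject_factor = eject_factor  # "too slow" = this many times the best
        self.latency = {s: None for s in servers}  # moving average, seconds

    def record(self, server, seconds):
        """Fold one observed response time into the server's moving average."""
        seconds = max(seconds, 1e-6)  # avoid division by zero later on
        prev = self.latency[server]
        if prev is None:
            self.latency[server] = seconds
        else:
            self.latency[server] = self.alpha * seconds + (1 - self.alpha) * prev

    def weights(self):
        """Relative share of traffic per server; 0.0 means out of the pool."""
        known = [v for v in self.latency.values() if v is not None]
        if not known:
            return {s: 1.0 for s in self.latency}
        best = min(known)
        result = {}
        for server, avg in self.latency.items():
            if avg is None:
                result[server] = 1.0         # not measured yet: full share
            elif avg > self.eject_factor * best:
                result[server] = 0.0         # far slower than the best: eject
            else:
                result[server] = best / avg  # slower server, smaller share
        return result

    def pick(self, rng=random):
        """Choose a server at random, in proportion to its weight."""
        w = self.weights()
        servers = list(w)
        return rng.choices(servers, weights=[w[s] for s in servers], k=1)[0]

lb = LatencyAwareBalancer(["a", "b", "c"])
lb.record("a", 0.1)
lb.record("b", 0.2)
lb.record("c", 1.0)
print(lb.weights())
```

Note that in this sketch an ejected server never receives traffic again, so its average never improves. A real implementation would need some kind of periodic probe to let it back into the pool, which is part of why this gets complicated.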

There are likely other options, but you can see from the above that this gets real complicated, real quick. It can be done; it is just a lot of work and a lot of QA. Hopefully someone comes up with a generic solution for problems 2 and 3 and those become non-issues.