<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Vladimir Vuksan&#039;s blog &#187; Monitoring</title>
	<atom:link href="http://vuksan.com/blog/category/monitoring/feed/" rel="self" type="application/rss+xml" />
	<link>http://vuksan.com/blog</link>
	<description>Documenting the systems and network infrastructure madness</description>
	<lastBuildDate>Wed, 01 Sep 2010 12:26:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Analyzing your backend web page response times</title>
		<link>http://vuksan.com/blog/2010/07/15/analyzing-your-web-page-response-times/</link>
		<comments>http://vuksan.com/blog/2010/07/15/analyzing-your-web-page-response-times/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 00:59:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[web performance optimization]]></category>

		<guid isPermaLink="false">http://vuksan.com/blog/?p=253</guid>
		<description><![CDATA[I have blogged in the past about some of the ways you can monitor your web site performance, e.g. how to monitor your site using 90th percentile response times, beauty of aggregate line graphs and tracking web clients in real time. Most recently we wanted to get better insight into how our site and [...]]]></description>
			<content:encoded><![CDATA[<p>I have blogged in the past about some of the ways you can monitor your web site performance, e.g. how to <a href="http://vuksan.com/blog/2010/01/15/monitoring-your-site-via-90th-percentile-response-time/">monitor your site using 90th percentile response times</a>, the <a href="http://vuksan.com/blog/2010/06/05/beauty-of-aggregate-line-graphs/">beauty of aggregate line graphs</a> and <a href="http://vuksan.com/blog/2010/04/20/tracking-web-clients-in-real-time/">tracking web clients in real time</a>.</p>
<p>Most recently we wanted to get better insight into how our site, and more specifically the backend, is performing. We wanted a tool that could provide us with per URL/page metrics such as</p>
<ul>
<li>total number of requests</li>
<li>aggregate compute time</li>
<li>average request time</li>
<li>90th percentile time (you can find more explanation of what it means at <a href="http://vuksan.com/blog/2010/01/15/monitoring-your-site-via-90th-percentile-response-time/">monitor your site using 90th percentile response times</a>) - this excludes the slowest 10% of responses, which can otherwise heavily skew your averages</li>
</ul>
<p>The initial plan was to build a basic set of reports to tell us which pages have excessive response times or large total (aggregate) compute times. The next portion, yet to be implemented, is to analyze the data in real time so that we'd have another data point to use in troubleshooting in case there is a site slowdown.</p>
<p>Basic requirements for the tool were these</p>
<ul>
<li>Capable of crunching 100+ million daily entries</li>
<li>Real-time analysis</li>
<li>Produce multiple metrics with potential to add more down the line</li>
<li>Low footprint</li>
</ul>
<p>An obvious way to do this is to store all data in a heavy duty data store like a relational/SQL database or something MapReduce capable. Trouble is we may be logging in excess of 3,000 hits per second (all dynamic content, as static assets are served from the CDN). Doing that many inserts per second on a SQL-type database will be tricky unless you have powerful hardware. The next obvious problem is that scanning through hundreds of millions or billions of rows will be slow, even with MapReduce, unless of course you throw tons of hardware at it. We wanted a low footprint, remember.</p>
<p>Instead we decided to go with a key/value store. Major pluses were that the footprint is relatively low and it performs very fast. The downside was that I would not be able to run any sophisticated queries. Since we already have an app that uses memcached to give us a <a href="http://vuksan.com/blog/2010/04/20/tracking-web-clients-in-real-time/">real-time view of the number of accesses per IP</a>, we ended up using it for this purpose as well.</p>
<h3>Implementation</h3>
<p>I have been working for a while now with <a href="http://bitbucket.org/maplebed/ganglia-logtailer/">ganglia-logtailer</a>, which is a Python framework to crunch log data and submit it to <a href="http://ganglia.info/">Ganglia</a>. There are a number of good pieces from it we could reuse and we did. What we ended up with is a two-part tool: a Python-based log parsing piece and a PHP-based web GUI and computation part. Division of "labor" was roughly this</p>
<ul>
<li>Python part parses the logs and creates entries/keys where the value in each key represents all the response times observed on a particular server and URL in a particular time period, i.e. one hour</li>
<li>PHP part takes the list once the time period has ended, calculates total time, average time and 90th percentile times and stores computed values in memcache so that retrieval later can be quicker.</li>
</ul>
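<p>The division of labor above can be sketched in a few lines of Python. This is only an illustration, not the code from the tool: the key layout in <code>bucket_key</code>, the sample entries and the nearest-rank percentile are all assumptions, and a plain dictionary stands in for memcached.</p>
<pre>from collections import defaultdict

def bucket_key(server, url, epoch_seconds):
    # Hypothetical key layout: server, URL and the hour the request fell in.
    hour = int(epoch_seconds) // 3600
    return "%s|%s|%d" % (server, url, hour)

def summarize(times):
    # Request count, total, average and 90th percentile (nearest rank)
    # for a non-empty list of response times in seconds.
    ordered = sorted(times)
    total = sum(ordered)
    rank = (9 * len(ordered) + 9) // 10  # ceiling of 0.9 * n in integer math
    return {
        "requests": len(ordered),
        "total": total,
        "average": total / len(ordered),
        "p90": ordered[rank - 1],
    }

# Sample entries: (server, url, unix timestamp, response time in seconds).
entries = [
    ("web1", "/home", 1279238400, 0.10),
    ("web1", "/home", 1279238460, 0.30),
    ("web1", "/home", 1279240000, 0.20),
    ("web2", "/home", 1279238400, 0.50),
]

# "Python part": collect every response time under its server/URL/hour key.
buckets = defaultdict(list)
for server, url, ts, elapsed in entries:
    buckets[bucket_key(server, url, ts)].append(elapsed)

# "PHP part": once the hour is over, reduce each list to a few numbers.
for key in sorted(buckets):
    print(key, summarize(buckets[key]))
</pre>
<p>Storing only the computed summary once the hour has ended is what keeps later lookups cheap, since the full list of response times never has to be re-read.</p>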
<p>Graphing is achieved using simple CSS graphs while time based series are done using <a href="http://sourceforge.net/projects/openflashchart">OpenFlashChart</a>. I did look at <a href="http://www.danvk.org/dygraphs/">Dygraphs</a> for Javascript/DHTML based graphing, however I couldn't figure out how to plot hourly values. I could only do daily values.</p>
<p>Tool is operational and so far it has led us to the realization that our mobile web pages are overall much slower than their corresponding web pages. This is due to the way we handle mobile ads since most feature phones don't support Javascript so we have to download the ad which introduces a slight delay. We did figure out that we could use Javascript on Webkit browsers similar to what we do for regular browsers so that should help a bit. We are also chasing some of the other "leads" regarding inconsistent performance for particular pages on some of the servers.</p>
<p>Next steps are to adapt parsing code to work with ganglia-logtailer which would give us real-time reporting. I don't expect too many problems with that. Also graphing could use some more love. Perhaps I'll even do standard deviation calculations <img src='http://vuksan.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<p>Anyways you can download source code from here</p>
<p><a href="http://github.com/vvuksan/pagetime-analyzer">http://github.com/vvuksan/pagetime-analyzer</a></p>
<p>You know what to do <img src='http://vuksan.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<h2>Obligatory screenshots</h2>
<p>Hourly overview sorted by aggregate time in seconds (you can sort by any column)</p>
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/07/pt_overview.png"><img title="PageTime Analyzer hourly overview" src="http://vuksan.com/blog/wp-content/uploads/2010/07/pt_overview.png" alt="" width="700" /></a></p>
<p>This is the average response time (over an hour) for a particular URL on separate server instances</p>
<p style="text-align: center;"><a href="http://vuksan.com/blog/wp-content/uploads/2010/07/pt_url_breakdown.png"><img class="size-full wp-image-257  aligncenter" title="PageTime Analyzer URL server breakdown" src="http://vuksan.com/blog/wp-content/uploads/2010/07/pt_url_breakdown.png" alt="" width="626" height="405" /></a></p>
<p>Daily view of performance for a particular URL</p>
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/07/pt_graph.png"><img class="alignright size-full wp-image-261" title="PageTime Analyzer average/90th percentile graph" src="http://vuksan.com/blog/wp-content/uploads/2010/07/pt_graph.png" alt="" width="713" height="363" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://vuksan.com/blog/2010/07/15/analyzing-your-web-page-response-times/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Store your cron output for analysis and correlation with cronologger</title>
		<link>http://vuksan.com/blog/2010/07/06/store-your-cron-output-with-cronologger/</link>
		<comments>http://vuksan.com/blog/2010/07/06/store-your-cron-output-with-cronologger/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 12:32:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Cron Linux CouchDB]]></category>

		<guid isPermaLink="false">http://vuksan.com/blog/?p=244</guid>
		<description><![CDATA[For the longest time I have wanted to get rid of the dozen or so cron messages I receive every morning about things like DB backups, DB cleanups/vacuums, reporting etc. There are a number of solutions out there to help you manage the cron spam such as cronic, shush and cronwrap. They help by e-mailing you [...]]]></description>
			<content:encoded><![CDATA[<p>For the longest time I have wanted to get rid of the dozen or so cron messages I receive every morning about things like DB backups, DB cleanups/vacuums, reporting etc. There are a number of solutions out there to help you manage the cron spam such as <a href="http://habilis.net/cronic/">cronic</a>, <a href="http://web.taranis.org/shush/">shush</a> and <a href="http://www.uow.edu.au/~sah/cronwrap.html">cronwrap</a>. They help by e-mailing you only if there is a problem, however they don't store the cron output itself. To get around that issue I have developed cronologger which can be downloaded from</p>
<p><a href="http://github.com/vvuksan/cronologger">http://github.com/vvuksan/cronologger</a></p>
<p>Cronologger is a BASH script that stores all the cron output into a database. I am using <a href="http://couchdb.apache.org/">CouchDB</a> since it is a great document oriented database that allows me to add attachments (blobs) to a document. I assume it would not be hard to use MongoDB, Riak and others.</p>
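<p>To give a feel for what such a document could look like, here is a rough Python sketch. It is not how cronologger itself is written (that is a BASH script): apart from <code>job_duration</code>, the field names are made up for illustration, and the job being run is just a stand-in. The <code>_attachments</code> layout with base64 data and a content type is CouchDB's inline attachment format.</p>
<pre>import base64
import json
import subprocess
import sys
import time

def run_and_record(command):
    # Run the job and capture stdout and stderr together instead of
    # letting cron e-mail them.
    started = time.time()
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return {
        "command": " ".join(command),      # hypothetical field name
        "start_time": int(started),        # hypothetical field name
        "job_duration": round(time.time() - started, 3),
        "return_code": result.returncode,  # hypothetical field name
        # Inline attachment holding the raw job output.
        "_attachments": {
            "output.txt": {
                "content_type": "text/plain",
                "data": base64.b64encode(result.stdout).decode("ascii"),
            }
        },
    }

# Stand-in for a real cron job.
doc = run_and_record([sys.executable, "-c", "print('backup done')"])
print(json.dumps(doc, indent=2))
</pre>
<p>Saving it would then be a matter of sending this JSON to the database with an HTTP POST, which is left out here so the sketch runs without a CouchDB server.</p>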
<p>Some of the benefits of this utility are</p>
<ul>
<li>Reduce cron spam</li>
<li>Provide the ability to correlate adverse effects by overlaying cron events on e.g. Ganglia graphs</li>
<li>Provide a better report of all the batch jobs that ran, diff them with past jobs if they should look the same, etc.</li>
<li>Provide the ability to easily view what is currently running on the whole infrastructure, i.e. job_duration &lt; 0</li>
<li>Review historical output</li>
</ul>
<p>I am still working on a web GUI for most of these things. I will gladly accept patches and new contributions.</p>
<p>Tip: To view a list of documents in a CouchDB database you can use the _utils page, e.g. http://localhost:5984/_utils/</p>
]]></content:encoded>
			<wfw:commentRss>http://vuksan.com/blog/2010/07/06/store-your-cron-output-with-cronologger/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Overlay deploy timeline on Ganglia graphs</title>
		<link>http://vuksan.com/blog/2010/06/28/overlay-deploy-timeline-on-your-ganglia-graphs/</link>
		<comments>http://vuksan.com/blog/2010/06/28/overlay-deploy-timeline-on-your-ganglia-graphs/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 15:55:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Ganglia RRDtool]]></category>

		<guid isPermaLink="false">http://vuksan.com/blog/?p=232</guid>
		<description><![CDATA[Don't you sometimes wish you could have a visual indicator of when code has been deployed in production. Something like this This is how you can add deploy timeline to your Ganglia graphs or for that matter to any tool that uses RRDs such as Cacti, Munin, Collectd etc. Background RRDtool supports so called VRULEs [...]]]></description>
			<content:encoded><![CDATA[<p>Don't you sometimes wish you could have a visual indicator of when code has been deployed in production? Something like this <img src='http://vuksan.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/06/deploy_timeline.png"><img class="alignnone size-full wp-image-233" title="Deploy time line on the load graph" src="http://vuksan.com/blog/wp-content/uploads/2010/06/deploy_timeline.png" alt="Shows deploy time line on a load graph" width="577" height="224" /></a></p>
<p>This is how you can add a deploy timeline to your Ganglia graphs or, for that matter, to any tool that uses RRDs such as Cacti, Munin, Collectd etc.</p>
<h3>Background</h3>
<p>RRDtool supports so called <a href="http://oss.oetiker.ch/rrdtool/doc/rrdgraph_graph.en.html">VRULEs</a> which are</p>
<h4 style="padding-left: 30px;"><a id="IVRULE_time_color__legend___dashes__on_s__off_s__on_s_off_s________dash_offset_offset__" title="click to go to top of document" href="http://oss.oetiker.ch/rrdtool/doc/rrdgraph_graph.en.html#___top"><strong>VRULE</strong><strong>:</strong><em>time</em><strong>#</strong><em>color</em>[<strong>:</strong><em>legend</em>][<strong>:dashes</strong>[<strong>=</strong><em>on_s</em>[,<em>off_s</em>[,<em>on_s</em>,<em>off_s</em>]...]][<strong>:dash-offset=</strong><em>offset</em>]]</a></h4>
<p style="padding-left: 30px;">Draw a vertical line at <em>time</em>. Its color is composed from three hexadecimal numbers specifying the rgb color components (00 is off, FF is maximum) red, green and blue followed by an optional alpha. Optionally, a legend box and string is printed in the legend section. <em>time</em> may be a number or a variable from a <strong>VDEF</strong>. It is an error to use <em>vname</em>s from <strong>DEF</strong> or <strong>CDEF</strong> here. Dashed lines can be drawn using the <strong>dashes</strong> modifier. See <strong>LINE</strong> for more details.</p>
<p>What we want to do is add a VRULE for each deployment. For example, the three lines in the graph above have been generated using these VRULEs</p>
<div id="_mcePaste" style="padding-left: 30px;">VRULE:1277731886#FF00FF:"Deploys" VRULE:1277721886#FF00FF VRULE:1277711886#FF00FF</div>
<h3>Implementation</h3>
<p>The easiest way to add these to Ganglia is to modify graph.php in Ganglia Web. You need to look for the following two lines at the end of the file</p>
<pre>$command .=  array_key_exists('extras', $rrdtool_graph) ? ' '.$rrdtool_graph['extras'].' ' : '';
$command .=  " $rrdtool_graph[series]";</pre>
<p>Then append your own VRULEs, i.e.</p>
<pre>$command .= " VRULE:" . $time . "#FF00FF:\"Deploys\"";</pre>
<p>Obviously you have to pull in the $time info from wherever you keep track of your deploy times. You can also get creative by using different colors for different deploys, changing legend labels, or adding VRULEs to only certain graphs, e.g. load, CPU etc. This is a quick and dirty way to do it</p>
<pre>$deploy_times = array(1278082860,1279393200);
foreach ( $deploy_times as $key =&gt; $time ) {
  # Put deploys label only once.
  if ( $key == 0 )
     $command .= " VRULE:" . $time . "#FF00FF:\"Deploys\"";
  else
     $command .= " VRULE:" . $time . "#FF00FF";
}
</pre>
<p>Now you just have to make sure you append deploy times to the array.</p>
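<p>If you are generating RRDtool graphs outside of Ganglia Web, the same trick is easy to reproduce elsewhere. Here is a small Python sketch of it; the function name and its defaults are mine, not part of any tool, and the timestamps are the two from the PHP snippet.</p>
<pre>def deploy_vrules(deploy_times, color="FF00FF", legend="Deploys"):
    # One VRULE per deploy. Only the first one carries the legend so the
    # label shows up once in the legend section of the graph.
    args = []
    for index, when in enumerate(sorted(deploy_times)):
        rule = "VRULE:%d#%s" % (when, color)
        if index == 0:
            rule += ":%s" % legend
        args.append(rule)
    return args

print(" ".join(deploy_vrules([1279393200, 1278082860])))
</pre>
<p>The returned strings are meant to be appended to the argument list of an <code>rrdtool graph</code> call. The legend is not wrapped in quotes here because the arguments are assumed to be passed as a list rather than through a shell; a legend containing a colon would need escaping.</p>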
<h3>Alternate implementations</h3>
<p>An alternate implementation is to create an RRD file whenever you do deploys, then overlay that graph on top of an existing graph. Trouble is you have to worry about scaling the graph. I never could get it quite right.</p>
<h3>Credit</h3>
<p>Thanks goes to the <a href="http://circonus.com/">Circonus</a> guys <img src='http://vuksan.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  since they made me think of vertical lines instead of trying the RRD overlay. Also thanks to <a href="https://twitter.com/toredash">@toredash</a> for pointing me in the right RRDtool direction by suggesting HRULE.</p>
]]></content:encoded>
			<wfw:commentRss>http://vuksan.com/blog/2010/06/28/overlay-deploy-timeline-on-your-ganglia-graphs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Beauty of aggregate line graphs</title>
		<link>http://vuksan.com/blog/2010/06/05/beauty-of-aggregate-line-graphs/</link>
		<comments>http://vuksan.com/blog/2010/06/05/beauty-of-aggregate-line-graphs/#comments</comments>
		<pubDate>Sat, 05 Jun 2010 15:27:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Systems Management]]></category>
		<category><![CDATA[Ganglia RRDtool]]></category>

		<guid isPermaLink="false">http://vuksan.com/blog/?p=207</guid>
		<description><![CDATA[If you saw a graph like this Would it mean anything to you ? First time I was introduced to it I thought they were pointless since you couldn't really see much. That was until I saw something like this This was post release. Can you spot something wrong ? Obviously color scheme is [...]]]></description>
			<content:encoded><![CDATA[<p>If you saw a graph like this</p>
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/06/90thpercentile-consolidated-graph.png"><img class="alignnone size-full wp-image-208" title="90thpercentile-consolidated-graph" src="http://vuksan.com/blog/wp-content/uploads/2010/06/90thpercentile-consolidated-graph.png" alt="90th percentile response time consolidated line graph" width="872" height="479" /></a></p>
<p>Would it mean anything to you <img src='http://vuksan.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ? The first time I was introduced to graphs like this I thought they were pointless since you couldn't really see much. That was until I saw something like this</p>
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/06/netstat-conn.png"><img class="alignnone size-full wp-image-209" title="netstat-conn" src="http://vuksan.com/blog/wp-content/uploads/2010/06/netstat-conn.png" alt="Netstat consolidated line graph" width="981" height="574" /></a></p>
<p>This was post release. Can you spot something wrong <img src='http://vuksan.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ? Obviously the color scheme is somewhat off in the last graph, which we later reworked (visible in the top graph). We then have another set of graphs where you can drill down into per-host aggregations, as we are running multiple Resin instances on the same machine, so you can find the misbehaving instance.</p>
<p>You can make these graphs pretty easily by using Ganglia's custom report graphs. I will try to post some of the ones we use in the next couple of days.</p>
<p>For those wondering what 90th percentile response time is, you can read my <a href="http://vuksan.com/blog/2010/01/15/monitoring-your-site-via-90th-percentile-response-time/">Monitoring your website performance via 90th percentile response time</a> post.</p>
]]></content:encoded>
			<wfw:commentRss>http://vuksan.com/blog/2010/06/05/beauty-of-aggregate-line-graphs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tracking web clients in real time</title>
		<link>http://vuksan.com/blog/2010/04/20/tracking-web-clients-in-real-time/</link>
		<comments>http://vuksan.com/blog/2010/04/20/tracking-web-clients-in-real-time/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 01:35:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Systems Management]]></category>
		<category><![CDATA[Memcached]]></category>

		<guid isPermaLink="false">http://vuksan.com/blog/?p=170</guid>
		<description><![CDATA[Most recently I have been working on being able to more quickly identify abusers of our service ie. spammers, crawlers etc. We already have a process that rotates web logs on all web servers hourly then processes them extracting per IP access info. On occasion abusers get quite aggressive and cause some of our alarms [...]]]></description>
			<content:encoded><![CDATA[<p>Most recently I have been working on being able to more quickly identify abusers of our service, i.e. spammers, crawlers etc. We already have a process that rotates web logs on all web servers hourly then processes them, extracting per IP access info. On occasion abusers get quite aggressive and cause some of our alarms to go off by causing an excessive number of log errors etc. Trouble is that due to logs being processed on the hour there is a window of time where we may spend extra time trying to track down the cause of log errors. I figured it would help if the IP tracker was real-time. Luckily we have already been using a package called Ganglia Logtailer</p>
<p><a href="http://bitbucket.org/maplebed/ganglia-logtailer/">http://bitbucket.org/maplebed/ganglia-logtailer/</a></p>
<p>which processes our web logs every minute and publishes metrics such as number of HTTP 200/300/400/500 hits, average and 90th percentile response time. All I had to do was send the IP data to a storage engine of my choice. Initially I thought I could use MySQL, however I decided against it for the following reasons</p>
<ol>
<li>Currently we can get up to 2500 hits/sec so processing them on the minute would result in roughly 150k inserts, which MySQL may have some trouble processing in a short amount of time.</li>
<li>I don't need this data after a couple of hours.</li>
</ol>
<p>I looked at Redis which has some interesting features around sets however I decided to use memcached since we were already using it and if I ever wanted to use a more persistent storage engine I could replace it with memcachedb or Tokyo Cabinet with no changes to the code.</p>
<p><strong>Implementation</strong></p>
<p>Implementation consists of two pieces</p>
<p>1. Modified Ganglia Logtailer class that inserts data into memcached. You can find a VarnishMemcacheLogtailer class on the Bitbucket logtailer site which implements this. All you have to do is modify the location of the memcached server (set to localhost). Current implementation aggregates data per hour, i.e. all the numbers are hourly numbers. It would be trivial to do it for 10 minute or 1 minute periods.</p>
<p>2. Client application that displays data from memcached. I wrote a PHP interface that shows top 20 IPs from the web servers that can be downloaded from here</p>
<p><a href="http://bitbucket.org/vvuksan/realtime-iptracker">http://bitbucket.org/vvuksan/realtime-iptracker</a></p>
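<p>To make the two pieces a bit more concrete, here is a rough Python sketch of the hourly per-IP aggregation and the top 20 lookup. It is not the VarnishMemcacheLogtailer code: the hour label format and function names are assumptions, the IPs are documentation addresses, and an in-memory counter stands in for memcached, where each count would be an increment on a per-hour, per-IP key.</p>
<pre>import time
from collections import Counter

def hour_bucket(epoch_seconds):
    # Label for the hour a request fell in (UTC), e.g. "2010042101".
    return time.strftime("%Y%m%d%H", time.gmtime(epoch_seconds))

def count_hits(hits):
    # hits: iterable of (unix timestamp, client IP).
    # Returns a counter keyed by (hour label, IP).
    counts = Counter()
    for ts, ip in hits:
        counts[(hour_bucket(ts), ip)] += 1
    return counts

def top_ips(counts, hour, limit=20):
    # Busiest IPs for one hour, most active first.
    per_ip = Counter()
    for (bucket, ip), n in counts.items():
        if bucket == hour:
            per_ip[ip] += n
    return per_ip.most_common(limit)

hits = [
    (1271813742, "192.0.2.1"),
    (1271813800, "192.0.2.1"),
    (1271813800, "192.0.2.2"),
    (1271817600, "192.0.2.1"),  # falls into the next hour
]
counts = count_hits(hits)
print(top_ips(counts, "2010042101"))
</pre>
<p>Switching to 10 minute or 1 minute periods would only mean changing how <code>hour_bucket</code> builds its label.</p>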
<p>Tracker looks something like this</p>
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/04/iptracker.png"><img class="alignnone size-full wp-image-173" title="Page view for the IP Tracker" src="http://vuksan.com/blog/wp-content/uploads/2010/04/iptracker.png" alt="" width="900" height="699" /></a></p>
<p><strong>Update: </strong>I do realize Splunk would be great for this kind of purpose. Trouble is that for the amount of logs we create we'd have to get a really large Splunk license and those are quite expensive.</p>
]]></content:encoded>
			<wfw:commentRss>http://vuksan.com/blog/2010/04/20/tracking-web-clients-in-real-time/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Monitoring your website performance via 90th percentile response time</title>
		<link>http://vuksan.com/blog/2010/01/15/monitoring-your-site-via-90th-percentile-response-time/</link>
		<comments>http://vuksan.com/blog/2010/01/15/monitoring-your-site-via-90th-percentile-response-time/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 13:20:32 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Systems Management]]></category>

		<guid isPermaLink="false">http://vuksan.com/blog/?p=102</guid>
		<description><![CDATA[There are numerous ways to monitor the health and performance of your web site. Some of the popular ways are measure response time of a particular URL on your site. If it exceeds a threshold (which is site dependent) it is time to investigate compare pertinent metrics such as the number of created sessions, http [...]]]></description>
			<content:encoded><![CDATA[<p>There are numerous ways to monitor the health and performance of your web site. Some of the popular ways are</p>
<ul>
<li>measure response time of a particular URL on your site. If it exceeds a threshold (which is site dependent) it is time to investigate</li>
<li>compare pertinent metrics such as the number of created sessions, http connections, etc.</li>
<li>watch CPU utilization/load of the machine</li>
</ul>
<p>Unfortunately most of these are flawed since they don't provide you with the most important metric, and that is how fast the site is for your customers. The above metrics are not useless and do help paint the picture, but they may give you a false sense of how fast your site is: the URL you are checking may be quite fast while some other part of the site, e.g. a newly introduced feature, may be behaving terribly. <span style="background-color: #ffffff;">I have found one of the best metrics to watch is the 90th percentile request response time. Basically, you take every request passing through your web servers, log the time it takes to serve them, sort them from fastest to slowest then take the 90th percentile time. Therefore if your 90th percentile is 1 second it means that 90% of the requests have been served in under a second and 10% in more than a second. You may be asking yourself "so what?". Here is why:</span></p>
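<p>If you want to try the calculation on your own numbers, here is a minimal Python sketch of "sort, then take the 90th percentile". It uses the nearest-rank definition, which is one common choice and an assumption on my part; the tools mentioned on this blog may compute it slightly differently. The sample response times are made up.</p>
<pre>def percentile(values, pct):
    # Nearest-rank percentile of a non-empty list; pct is a whole number
    # from 1 to 100. Returns the smallest value that at least pct percent
    # of the values are at or below.
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct/100 * n
    return ordered[rank - 1]

# 17 fast requests and 3 very slow ones, in seconds.
times = [0.2] * 17 + [17.0, 18.0, 20.0]

print("average:", sum(times) / len(times))
print("median: ", percentile(times, 50))
print("90th:   ", percentile(times, 90))
</pre>
<p>With this sample the median is still 0.2 seconds while the 90th percentile jumps to 17 seconds, which is exactly the kind of slow tail a spot check of a single fast URL would never show you.</p>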
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/01/response_90th_percentile.png"><img class="alignnone size-full wp-image-104" title="response_90th_percentile" src="http://vuksan.com/blog/wp-content/uploads/2010/01/response_90th_percentile.png" alt="" width="577" height="229" /></a></p>
<p>So for at least a couple of minutes 10% of your visitors/requests were waiting for more than 17 seconds to have their requests served. That can't be good for business and you may want to investigate the cause.</p>
<p>You could also consolidate response times from different web servers on one graph and you get this.</p>
<p><a href="http://vuksan.com/blog/wp-content/uploads/2010/01/response_90th_percentile1.png"><img class="alignnone size-full wp-image-109" title="response_90th_percentile" src="http://vuksan.com/blog/wp-content/uploads/2010/01/response_90th_percentile1.png" alt="" width="577" height="229" /></a></p>
<p>It may not look like much but it is pretty clear if an individual web server starts acting up.</p>
<p>How do you get in on the fun? The steps for adding Apache real-time metrics, which also cover the 90th percentile response time, are at this URL</p>
<p><a href="http://vuksan.com/linux/ganglia/#Apache_Traffic_Stats">http://vuksan.com/linux/ganglia/#Apache_Traffic_Stats</a></p>
<p>I want to thank Ben Hartshorne (@<a href="http://twitter.com/maplebed">maplebed</a>) for making me aware of this metric.</p>
]]></content:encoded>
			<wfw:commentRss>http://vuksan.com/blog/2010/01/15/monitoring-your-site-via-90th-percentile-response-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nagios alerts based on Ganglia metrics</title>
		<link>http://vuksan.com/blog/2009/09/14/nagios-alerts-based-on-ganglia-metrics/</link>
		<comments>http://vuksan.com/blog/2009/09/14/nagios-alerts-based-on-ganglia-metrics/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 18:21:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Systems Management]]></category>

		<guid isPermaLink="false">http://vuksan.com/blog/?p=53</guid>
		<description><![CDATA[Have you ever wanted to alert based on Ganglia metrics? Well, you can. You can find the source code for the plug-in here. Instructions on how to set it up are here.]]></description>
			<content:encoded><![CDATA[<p>Have you ever wanted to alert based on Ganglia metrics? Well, you can <img src='http://vuksan.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>You can find the source code for the plug-in <a href="http://vuksan.com/linux/ganglia/check_ganglia_metric.phps">here</a>.</p>
<p>Instructions on how to set it up are <a href="http://vuksan.com/linux/nagios_scripts.html#check_ganglia_metrics">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://vuksan.com/blog/2009/09/14/nagios-alerts-based-on-ganglia-metrics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
