Using rrdtool to graph performance stats

RRDtool stands for Round Robin Database tool. A round robin database has a fixed size and holds a fixed amount of data. When the database fills up, writing starts again at the beginning and overwrites the oldest contents. This can be visualised as a circle onto which data is written continuously, without end. The benefit of a Round Robin Database is that it never grows, so it never needs pruning or other space maintenance.

I use RRDtool to store system performance statistics and then display those statistics as graphs.

Collecting data

First we need a file of values to store. The values can represent anything meaningful; here we collect performance statistics such as page-ins and page-outs, free memory and CPU utilization.

We can collect these statistics with vmstat and pipe the output to awk so that we keep only the values we want.

For instance, the following shell script collects data from vmstat every minute for just under an hour (59 samples, 60 seconds apart). If we run this script from cron every hour, we will have a continuous supply of data for later insertion into RRDtool.

 

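exec >> /var/logstat    # Append all standard output to the log file

# Epoch time (seconds since 1st January 1970) at the start of this run
TIME=`date +%s`

# Print TYPE,TIME,PAGE_IN,PAGE_OUT,FREE_MEM,KERNEL_TIME,USER_TIME,IO_TIME
# vmstat reports every 60 seconds, so sample n is stamped TIME + 60*n;
# rrdtool update rejects a timestamp that is not later than the previous one.
vmstat 60 59 | awk -v time="$TIME" '/^.[0-9]/ {
    printf "VM,%d,%s,%s,%s,%s,%s,%s\n", time + 60 * n++, $6, $7, $4, $15, $14, $17
}'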
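
To check which columns the awk filter picks out, the same field selection can be run against a single captured line. This is only a sketch: the sample line is made up, the function name extract is hypothetical, and the column layout is assumed to be the 17-column one shown in the comment. Column positions differ between platforms, so compare against the header of your own vmstat output first.

```shell
#!/bin/sh
# Hypothetical vmstat data line. Assumed column layout:
# r b avm fre re pi po fr sr cy in sy cs us sy id wa
sample=' 1  0 250000 12000  0  3  5  0  0  0 400 900 300 12  7 78  3'

# Same field selection as the collection script, with a fixed timestamp
# passed as the first argument. Header lines do not match the pattern.
extract() {
    awk -v time="$1" '/^.[0-9]/ {
        printf "VM,%d,%s,%s,%s,%s,%s,%s\n", time, $6, $7, $4, $15, $14, $17
    }'
}

echo "$sample" | extract 1213621203
# prints VM,1213621203,3,5,12000,7,12,3
```

Under the assumed layout the output fields are page-in, page-out, free memory, kernel time, user time and I/O wait, in that order.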

 

Creating the RRD database

To create the RRDtool database, use the rrdtool create command:

    for i in host1 host2 host3 host4 host5 host6 host7
    do
      rrdtool create vm$i.rrd \
       -b 931225537 -s 60 \
       DS:page_in:GAUGE:90:0:99999999 \
       DS:page_out:GAUGE:90:0:99999999 \
       DS:free_mem:GAUGE:90:0:99999999 \
       DS:kernel:GAUGE:90:0:100 \
       DS:user:GAUGE:90:0:100 \
       DS:io:GAUGE:90:0:100 \
       RRA:AVERAGE:0.5:1:267840 \
       RRA:AVERAGE:0.5:60:4464 \
       RRA:AVERAGE:0.5:720:372 \
       RRA:AVERAGE:0.5:1440:186
    done
			 

One round robin database is created for each computer whose statistics are collected (host1, host2 etc), named vmhost1.rrd, vmhost2.rrd and so on.

-b defines the start time in seconds since January 1st 1970 UTC (it can also be written as --start). RRDtool will not accept any data timed at or before this point.

-s specifies the base interval in seconds at which data will be fed into the RRD. The default is 300 seconds; here it is set to 60 seconds to match the vmstat sampling interval.

DS stands for Data Source. We have created six data sources with user-defined names (page_in, page_out etc) to store the different types of performance data gathered by vmstat.

GAUGE is the data source type (DST). A GAUGE value is stored as it is, without being converted to a rate, which suits readings such as temperature or the number of people in a room.

The next field, 90, is the heartbeat: the maximum number of seconds that may pass between two updates of this data source before its value is assumed to be UNKNOWN.

The next two values, 0 and 99999999 or 0 and 100, are the minimum and maximum expected values. A value outside this range is stored as UNKNOWN.
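
The heartbeat and the minimum/maximum range can be illustrated with a small awk filter. This is a simplified imitation of the two rules, not how RRDtool works internally (RRDtool also aligns samples to step boundaries, which is ignored here). The function name classify and the sample readings are made up for the example.

```shell
#!/bin/sh
# Read "timestamp value" lines and report, for every sample after the
# first, whether it would count as known under the given heartbeat,
# minimum and maximum.
classify() {
    awk -v hb="$1" -v min="$2" -v max="$3" '
    NR > 1 {
        gap = $1 - prev
        if (gap > hb)                  state = "UNKNOWN (heartbeat exceeded)"
        else if ($2 < min || $2 > max) state = "UNKNOWN (out of range)"
        else                           state = "ok"
        print $1, $2, state
    }
    { prev = $1 }'
}

# Hypothetical percentage readings: the third is out of range and the
# fourth arrives 120 seconds after the previous one.
printf '%s\n' \
    '1213621203 5' \
    '1213621263 7' \
    '1213621323 140' \
    '1213621443 6' |
classify 90 0 100
```

With a 60-second step and a 90-second heartbeat, one late sample is tolerated as long as it arrives within 90 seconds of the previous update.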

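RRA stands for Round Robin Archive. AVERAGE is the consolidation function (CF): when several samples are combined into one archived value, they are averaged.

The next value, 0.5, is the xfiles factor (xff), which ranges from 0 to 1. It is the fraction of the samples in a consolidation interval that may be UNKNOWN while the consolidated value is still regarded as known. You rarely need to change it.

The next field is the steps field, which states how many samples are consolidated into each archived value: 1 means store every sample unaveraged and 60 means store the average of every 60 samples.

The last field is the rows field, which states how many consolidated values each RRA keeps.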
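
The steps and rows fields together determine how much history each archive holds: step x steps x rows seconds. As a quick check, the arithmetic for the four RRAs in the create command can be done in the shell. The helper name rra_days is made up for this sketch.

```shell
#!/bin/sh
# Days of history held by one RRA: step * steps * rows seconds, in days.
rra_days() {
    step=$1; steps=$2; rows=$3
    echo $(( step * steps * rows / 86400 ))
}

# The four RRAs from the create command, using the 60-second step.
for rra in "1 267840" "60 4464" "720 372" "1440 186"
do
    set -- $rra
    echo "steps=$1 rows=$2 resolution=$(( 60 * $1 ))s days=$(rra_days 60 "$1" "$2")"
done
```

Every archive works out at 186 days, roughly six months, at resolutions of one minute, one hour, twelve hours and one day.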

Once the RRD is created we can start adding data to it. The following script updates the RRD for each host with the values collected by vmstat. It could be run daily. Each archive defined in the create script above holds at most 186 days of data, roughly six months.

#!/usr/bin/ksh
#Set the field separator (IFS) to a comma so that each comma separated value
#becomes one array element
typeset IFS=,
udate=`date +%d%m%y`
basedir=/opt/rrd
#Cycle through all the servers, updating the rrd for each server
for a in host1 host2 host3 host4 host5 host6 host7
do
   #Files in $basedir contain the raw data gathered by vmstat on each server,
   #unzip each file in turn ready for insertion into the array
   gunzip $basedir/system.$a.$udate.stat.gz
   #Empty the array and initialise each array index variable each time through
   #the loop, so that values left over from the previous server are not reused
   unset array
   j=0
   k=1
   m=2
   n=3
   o=4
   p=5
   q=6
   t=7
   #For each comma separated value in the vmstat output file, insert into the array
   #at the next index (incremented by one each time)
   #Replace the newline character at the end of each line with a comma
   #because each value is separated by a comma, see IFS initialised above
   for i in `grep VM $basedir/system.$a.$udate.stat | tr '\n' ','`
   do
      array[j]=$i
      j=$(($j + 1))
   done
   #Assign the length of the array to the variable var so when we update the rrd
   #with values we know when to stop
   var=${#array[*]}
   #While the value of the index of the array for each data element to be inserted
   #is less than the length of the array insert
   #data into each DS (data source) in turn. Increment each index for each type
   #of data (ie page in) by 8 each time as the same
   #data type is repeated every 8 stored values in array.
   while (($t < $var))
   do
      rrdtool update $basedir/vm$a.rrd \
      ${array[$k]}:${array[$m]}:${array[$n]}:${array[$o]}:${array[$p]}:${array[$q]}:${array[$t]}
      k=$(($k + 8))
      m=$(($m + 8))
      n=$(($n + 8))
      o=$(($o + 8))
      p=$(($p + 8))
      q=$(($q + 8))
      t=$(($t + 8))
   done
   #Zip up the vmstat data source again to save space
   gzip $basedir/system.$a.$udate.stat
done
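
The array bookkeeping can be avoided by letting awk turn each log line into an rrdtool update argument. What follows is a minimal sketch of that alternative, assuming every log line has exactly the eight comma-separated fields written by the collection script. The function name to_update_args and the two log lines are made up for illustration, and the rrdtool command is only echoed here, not executed.

```shell
#!/bin/sh
# Convert "VM,time,pi,po,free,sys,usr,wio" lines into
# "time:pi:po:free:sys:usr:wio"; other lines are ignored.
to_update_args() {
    awk -F, '$1 == "VM" && NF == 8 {
        printf "%s:%s:%s:%s:%s:%s:%s\n", $2, $3, $4, $5, $6, $7, $8
    }'
}

# Two hypothetical log lines, one minute apart.
printf '%s\n' \
    'VM,1213621203,3,5,12000,7,12,3' \
    'VM,1213621263,0,2,11800,5,10,1' |
to_update_args |
while read -r args
do
    # Dry run: print the command instead of running it.
    echo rrdtool update vmhost1.rrd "$args"
done
```

Removing the echo would run one update per sample; the values appear in the same order as the data sources in the create command.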

Creating graphs from the data within RRD

The following script will create a basic graph from the data gathered so far.

Later on in this tutorial we will go on to describe more advanced graphing parameters, allowing us to create better looking and more informative graphs.

rrdtool graph vm.png                        \
   --start 1213621203 --end 1214211603      \
   DEF:myvm=vmhost1.rrd:free_mem:AVERAGE    \
   LINE2:myvm#FF0000

This command creates vm.png, displaying data that starts at 1213621203 and ends at 1214211603 seconds from 1st January 1970; converted to date strings, that is from 13:00:03 on the 16th June 2008 to 09:00:03 on the 23rd June 2008 (UTC). The DEF line defines a variable named myvm that takes its data from the DS (Data Source) free_mem in the file vmhost1.rrd, using the AVERAGE consolidation function. The LINE2 line draws myvm as a line 2 pixels wide in red (the rgb representation of red is #FF0000).

And that's it: you now have a graph that you can view in your browser. Just put the file vm.png in the document root of your webserver, e.g. /www/htdocs/vm.png, and point your browser at a URL like the following.
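
As a sanity check on the --start and --end values, the length of the graphed window can be worked out with shell arithmetic. This is only a sketch; the 60-second step is the one chosen in the create command.

```shell
#!/bin/sh
start=1213621203
end=1214211603
step=60

# Length of the window in seconds, then broken into days and hours.
span=$(( end - start ))
echo "span: $span seconds ($(( span / 86400 )) days $(( span % 86400 / 3600 )) hours)"
echo "one-minute samples in range: $(( span / step ))"
```

The window is just under a week and contains 9840 one-minute samples.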

http://webserver-ip-address/vm.png

We can customize our graphs to make them more informative and easier to read. There are quite a few options that can be used with rrdtool graph. Some of the more popular ones are shown below.

 rrdtool graph vm.png                        \
    --start 1213621203 --end 1214211603      \
    -w 2000 -h 800                           \
    --title "Host1 Wait I/O. 29th June 2008" \
    --vertical-label "Percentage Wait I/O"   \
    DEF:myvm=vmhost1.rrd:io:AVERAGE          \
    LINE2:myvm#FF0000
    

The line -w 2000 -h 800 sets the size of the graph's plotting area: -w 2000 is the width in pixels and -h 800 is the height in pixels. The finished image is slightly larger, because the title, labels and axes are drawn around this area.

--title displays a title on the graph.

--vertical-label displays a label on the vertical axis.
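
Several data sources can be drawn on one graph by repeating the DEF and LINE arguments. The sketch below builds those arguments for the three CPU data sources and prints the resulting command as a dry run. The helper name graph_args, the output file cpu.png and the colours are arbitrary choices for this example; the text after the colour in each LINE2 argument is the legend.

```shell
#!/bin/sh
# Print one DEF and one LINE2 argument per data source.
# Usage: graph_args rrdfile ds-name:colour [ds-name:colour ...]
graph_args() {
    rrd=$1; shift
    for item in "$@"
    do
        ds=${item%%:*}        # part before the colon
        colour=${item#*:}     # part after the colon
        printf 'DEF:%s=%s:%s:AVERAGE\nLINE2:%s#%s:%s\n' \
            "$ds" "$rrd" "$ds" "$ds" "$colour" "$ds"
    done
}

# Dry run: print the command rather than executing it.
echo rrdtool graph cpu.png --start 1213621203 --end 1214211603 \
    $(graph_args vmhost1.rrd kernel:FF0000 user:00AA00 io:0000FF)
```

Removing the leading echo would run the command against vmhost1.rrd.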