MeasureIT

Guerrilla Unix Monitoring using the Orca and Open Source Tools
May, 2006
by James Yaple

About the Author
James Yaple, Austin Automation Center

James Yaple works as an IT Specialist for the Austin Automation Center, an enterprise hosting facility for the U.S. Department of Veterans Affairs. His focus is on Unix systems architecture, availability and performance. James is the AAC's technical representative to the Standard Performance Evaluation Corporation (SPEC) and the Storage Performance Council (SPC).


In the summer of 2003, Dr. Neil Gunther used several CMG MeasureIT articles to introduce what he called "Guerrilla Capacity Planning." Since many of the factors involved in developing a performance model are unknown or uncertain, an accurate model can be difficult to construct.  A "guerrilla" approach attempts to provide management a direction as opposed to a precise compass bearing. According to Gunther’s first article, performance monitoring typically gets the most attention because it’s the easiest challenge to address with commercial tools or scripts developed by administrators.  [Gunther 2003]

But what about those situations where monitoring is required, but commercial tools have not been implemented and consistent homegrown scripts are not deployed?  How can measurements be made consistently across dozens of production systems?  Can a "guerrilla" approach be applied to monitoring? 

The Austin Automation Center (AAC, http://www.aac.va.gov/) realized that "if you don’t measure it, you can’t manage it."  The center was involved in a large-scale requirements analysis, competition and procurement to obtain and implement a commercial monitoring toolset.  Unfortunately, the procurement cycle was long, and it would take months before the vendor was known and monitoring was implemented and suitable for use.  A different strategy was needed to obtain capacity planning data in the short term.

This paper describes how an enterprise application-hosting center implemented a "guerrilla" approach to monitoring its UNIX systems.  In accordance with Gunther’s description, a "guerrilla" approach to monitoring uses an opportunistic scope, tiny tools and little or no budget. The resulting solution, including data collection, aggregation and alerting, will soon be replaced by a commercial version.  However, by sharing the AAC’s experiences, other organizations may improve their own attempts at "guerrilla" monitoring.

Orca

The core of the solution implemented was an open source effort known as Orca, originally developed by Dr. Blair Zajac [Zajac 1999]. Zajac developed Orca based on the following requirements:

  • The ability to monitor many systems.
  • Measure and display short (daily) and long-term (yearly) trends.
  • Allow easy comparison of the same type of measurement between different systems.
  • Allow easy viewing of all system measurements on different time scales.
  • Ensure plots are always up to date and always available.
  • The act of measuring a system should not adversely affect it, e.g. by placing a large additional load on it or impacting the TCP stack throughput.

Generically, an Orca implementation consists of several functions.  The first is a data collector that typically accesses the /proc table and other UNIX kernel data structures.  Aggregation of collected data into a single location is necessary.  Another component takes raw data and generates graphical output.  Viewing of results is typically via a browser and web page.

Implementing the basic solution

The AAC started with a relatively generic Orca configuration.  Since the target environment was mostly Solaris, data collection is accomplished with orcallator.se [1], which Zajac developed from the Solaris SE toolkit [2] component percollator.se [3].  Data from multiple clients is moved to a central server using rsync [4].  Orca, with embedded Round Robin Database (RRD) [5] library functions, processes the aggregated data and produces the files for display.  An Apache HTTP Server 1.3 installation provides a suitable web display.

Orcallator.se is written in the SymbEL language.  The language traces its lineage back to the SE toolkit [6] written by Rich Pettit and Adrian Cockcroft and documented in Cockcroft’s book Sun Performance and Tuning: Java and the Internet [Cockcroft/Pettit 1998].  Orcallator.se is an enhanced version of the toolkit’s percollator.se that collects most of the measurements shown in the kit’s zoom.se script.

Data collected by orcallator.se is appended as a single line to a text file every five minutes for later processing and viewing.   The data is columnar, with varying numbers of columns based on the system configuration, i.e. a system with more disks or network interfaces would have more columns of data, but each observation is contained on a single line.
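To make this file layout concrete, here is a minimal Python sketch of a parser for collector output of this shape. It assumes that the file starts with a header line naming the columns, that fields are separated by whitespace, that data rows start with a numeric timestamp, and that a new header line appears whenever the set of columns changes. The column names and sample values below are made up for illustration and are not copied from a real orcallator.se file.

```python
def parse_observations(lines):
    """Yield one dict per observation line, keyed by the most recent header."""
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not fields[0].isdigit():
            # Assumption: a non-numeric first field marks a (new) header line.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # header-less or malformed row; ignored in this sketch
        yield dict(zip(header, fields))


# Hypothetical sample: the second header adds a column for an extra disk.
sample = [
    "timestamp usr% sys% disk_rd/s_c0t0d0",
    "1146470400 12.5 3.1 40.0",
    "timestamp usr% sys% disk_rd/s_c0t0d0 disk_rd/s_c0t1d0",
    "1146470700 11.0 2.9 38.5 7.2",
]
rows = list(parse_observations(sample))
for row in rows:
    print(row["timestamp"], len(row), "columns")
```

Keying each row by the header in effect at that point lets a consumer cope with systems whose column count differs, or changes over time, without hard-coding column positions.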

Data is transferred from hosts running the data collector to a central host using rsync.  Rsync is an open source utility that provides fast incremental file transfer and is freely available under the GNU General Public License version 2. The use of rsync is described in "Capacity Planning for the Masses -- Using the SE Toolkit and Orca" [Buhler/Cockcroft 2003].  At the AAC, the files are transferred using a secure shell (ssh) configuration for encryption of the data stream.   A diagram of the configuration is shown in figure 1.
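As an illustration of this transfer step, the Python sketch below assembles an rsync command line that uses ssh as the transport. The host name, directory paths and the choice to push from the monitored host are assumptions made for the example; they are not the AAC's actual configuration. Only the standard rsync options -a (archive), -z (compress) and -e (remote shell) are used.

```python
import shlex


def build_rsync_command(source_dir, central_host, dest_dir):
    """Return an argv list that copies collector output to the central host."""
    return [
        "rsync",
        "-a",         # archive mode: recurse, preserve times and permissions
        "-z",         # compress data in transit
        "-e", "ssh",  # run over ssh so the data stream is encrypted
        source_dir.rstrip("/") + "/",  # trailing slash: copy directory contents
        f"{central_host}:{dest_dir}",
    ]


# Hypothetical host and paths, for illustration only.
cmd = build_rsync_command(
    "/var/orca/orcallator/mars", "orca-server", "/data/orca/mars"
)
print(shlex.join(cmd))
```

The function only builds the command. A scheduled job on each client could pass the list to subprocess.run(cmd, check=True), with key-based ssh authentication configured so that no password prompt is needed.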

Figure 1 - AAC Monitoring Infrastructure

The final component is the Orca script itself.  Orca is a Perl script that reads a configuration file, orcallator.cfg [8], describing the location of its input text data files, the general format of the input data files, the destination for its RRD data files and the root of the HTML tree which must be generated. 

The orcallator.cfg file contains an informational link to the Orcaware site.  The link appears as a hyperlink in the resulting web presentation, describing the data being plotted. It also offers some guidelines as to what constitutes a good or bad result. The AAC has found the explanations from the link to be somewhat generic, but it provides a good starting point for analysis and historical information on the evolution of the orcallator.se counters.

Orca output examples

Below are several graphical examples which show the versatility of the Orca toolset. 

Daily mars CPU Usage

Figure 2 - CPU usage

Figure 2 is a CPU usage graph of the system collecting Orca data. Notice the regular bursts of processing, which reflect the five-minute data collection and processing cycle.

Figure 3 - Free memory  (Solaris)

Figure 3 is a free memory graph of an application server. On Saturday, the server was brought down for patching (note the reporting gap) and then restarted.  Following restart, additional memory was recovered.

Monthly pluto Disk System Wide Reads/Writes Per Second

Figure 4 - system-wide disk activity

In figure 4, a database server shows the impact of a faulty software install in week 14.  This problem was detected by an automated alert added into the Orca solution by AAC staff.  Read the full CMG 2004 paper for the details.

Figure 5 - disk usage

The example of a disk usage plot in figure 5 shows the percent used for a database server file layout.  This type of data collected from Orca is also analyzed for alerting of low file system space conditions and establishing trends for disk usage.  This particular data is from an AIX server.
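The kind of analysis mentioned above can be sketched in a few lines of Python. This is a generic illustration, not the AAC's alerting logic, which is described in the full paper: it assumes a fixed alert threshold and a straight-line (least-squares) trend through daily percent-used samples. The threshold and the sample values are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, pct_used, full_pct=100.0):
    """Extrapolate the trend; None means usage is flat or shrinking."""
    slope, intercept = fit_line(days, pct_used)
    if slope <= 0:
        return None
    return (full_pct - intercept) / slope - days[-1]


ALERT_THRESHOLD_PCT = 90.0  # hypothetical threshold

# Made-up daily samples of percent used for one file system.
days = [0, 1, 2, 3, 4]
pct_used = [80.0, 81.0, 82.0, 83.0, 84.0]

alert = pct_used[-1] >= ALERT_THRESHOLD_PCT
remaining = days_until_full(days, pct_used)
print("alert:", alert, "days until full:", remaining)
```

A linear fit is the simplest possible trend model. Real file systems often grow in steps, so a projection like this is better read as a direction than as a precise date.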

Round Robin Database (RRD)

The ability to read the collected data files and generate GIF plots is enabled because Orca uses a library written by Tobias Oetiker that provides a Round Robin Database (RRD).  Some users of tools such as the Multi Router Traffic Grapher (MRTG) may be familiar with RRD.  It provides a flexible binary format for the storage of numerical data measured over time.

A convenient function provided by RRD is data consolidation.  Consolidation of input data reduces the amount of disk space required for long-term data storage.  The consolidated data is used when Orca plots longer-term views, such as the yearly plots. Consolidation is one of the key features of RRD: the data files do not grow significantly over time. In Orca's case, 5-minute data is kept for 200 hours, 30-minute averaged data is kept for 31 days, 2-hour averaged data is kept for 100 days and daily averaged data is kept for 3 years. Such a data file is typically about 50 Kbytes. RRD reads an arbitrary number of RRD files and generates GIF plots. Each plot shows a daily, weekly, monthly, or yearly view of the data in question.
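The retention figures above can be checked with some simple arithmetic. The Python sketch below counts the rows each archive holds and estimates the file size. It assumes one measurement per file, 8 bytes per stored value, 365-day years and no header overhead; those are simplifying assumptions for the estimate, not a description of the exact on-disk format. It also shows consolidation by averaging, the function the text describes.

```python
from datetime import timedelta

# (archive name, time step per row, retention period), taken from the text.
ARCHIVES = [
    ("5-minute", timedelta(minutes=5), timedelta(hours=200)),
    ("30-minute average", timedelta(minutes=30), timedelta(days=31)),
    ("2-hour average", timedelta(hours=2), timedelta(days=100)),
    ("daily average", timedelta(days=1), timedelta(days=3 * 365)),
]
BYTES_PER_VALUE = 8  # assumption: each value is stored as a double


def rows_per_archive(archives):
    """Number of rows needed to cover each retention period."""
    return {name: retention // step for name, step, retention in archives}


def consolidate(samples, factor):
    """Average each complete group of `factor` consecutive samples."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


rows = rows_per_archive(ARCHIVES)
total_rows = sum(rows.values())
approx_bytes = total_rows * BYTES_PER_VALUE

for name, count in rows.items():
    print(f"{name}: {count} rows")
print(f"total: {total_rows} rows, about {approx_bytes / 1000:.1f} KB")

# Six 5-minute samples collapse into one 30-minute average.
print(consolidate([10, 20, 30, 40, 50, 60], 6))
```

Under these assumptions the four archives hold 2400, 1488, 1200 and 1095 rows, or 6183 rows in total. At 8 bytes each that is 49,464 bytes, consistent with the "about 50 Kbytes" quoted above. Because each archive has a fixed number of rows, the file size stays constant no matter how long the system is monitored.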

In normal mode, Orca runs continuously, sleeping until new data is placed by orcallator.se into the output data files. Once new data is written to a file by orcallator.se, Orca updates the RRD data files and recreates any GIFs that need to be updated.
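The "sleep until new data arrives" behavior can be illustrated with a small polling loop. Orca itself is a Perl script, and the Python sketch below is not its implementation; it only shows the general idea of remembering a file offset, waking periodically, and processing whatever complete lines have been appended since the last pass. The function names, the polling interval and the callback are hypothetical.

```python
import os
import time


def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset)."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() < offset:
            offset = 0  # file was truncated or replaced; start over
        handle.seek(offset)
        chunk = handle.read()
    # Keep only complete lines; a partially written last line waits for later.
    complete, newline, _partial = chunk.rpartition(b"\n")
    if not newline:
        return [], offset
    lines = complete.decode("ascii", errors="replace").split("\n")
    return lines, offset + len(complete) + 1


def watch(path, handle_line, interval_seconds=60, max_polls=None):
    """Poll `path` and pass each newly appended line to `handle_line`."""
    offset = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            handle_line(line)  # e.g. update stored data and redraw plots
        polls += 1
        time.sleep(interval_seconds)
```

Tracking the byte offset means each observation is processed once, and ignoring an unterminated final line avoids acting on a row that the collector is still writing.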

By starting with a basic installation, the AAC realized several advantages quickly.  Orca is freely available in terms of cost and access to the source.  Open source allows sites to adjust parameters in response to local needs, and provides access to other users’ experiences via email support lists and contributions to the project.  The orcallator.se script runs as a single process on each system and does not fork off any processes, extracting performance data from the system without becoming one of the performance problems that needs to be investigated.   Orca is able to work with almost any text data file. 

In the initial implementation, the AAC accepted the constraint that the data collector, orcallator.se, and the SE toolkit were only available on SPARC and x86 Solaris platforms. To support Orca monitoring on new platforms, a new data collection tool was required.

Future development of Orca monitoring at AAC

Recently, several customers have initiated plans to develop and launch projects on platforms other than Sun Solaris.  Deployments include variations of Linux and AIX.

Fortunately for Linux platforms, the Orcaware site also provides a Perl data collector, procallator.pl, as part of its distribution.  Community input and contributions are a common benefit of many open source projects.  When a need arises, one or more people contribute a solution.  For Linux data collection, Guilherme Carvalho Chehab provided the solution.  The AAC is very close to integrating the Linux collector into the existing Orca data collection, storage, presentation and alerting infrastructure.

On the AIX platform, a similar solution was provided by Jason D. Kelleher and Rajesh Verma.  The current version of orca-aix-stat.pl supports AIX 4.3 and 5.x. [April 2006 Update:  Dave Michaels recently provided an update patch/rewrite of orca-aix-stat.pl]

Conclusion

Orca-based "guerrilla" monitoring is not for everyone. It requires some level of analysis for a successful implementation.  In its basic form, it can provide a significant amount of information on certain aspects of system health.  With additional needs analysis, data collection can support alerting and capacity projections.  It will not put any of the commercial monitoring companies out of business, because their tools provide richer data collection agents, analysis tools and modeling capabilities. However, if your organization is interested in using an open-source approach to gather capacity planning data, Orca can be a component of that effort.

In addition, the AAC can argue that using Orca for "guerrilla" monitoring improved the process of selecting and implementing a commercial monitoring product.  By knowing what data elements are commonly collected, Orca made it easier to distinguish when a tool vendor was really adding value with their solution.  The effort and discussions involved in identifying key metrics and thresholds for alerting can be transferred into implementation of the commercial product.  In addition, the ease of manipulating historical data using familiar tools required vendors to better justify the value of proprietary components of their solutions.


Acknowledgements

Paul Robinson, Greg Haines and Lonnie Wilson made key contributions in implementing this approach at AAC.

References

[Gunther 2003]

Gunther, N. J., "Guerrilla Capacity Planning: Hit-and-Run Tactics for Website Scalability," CMG MeasureIT. 2003.

(http://www.cmg.org/measureit/issues/mit02/m_2_2.html)

Gunther, N. J., "Guerrilla Capacity Planning: Weapons of Mass Instruction," CMG MeasureIT. 2003.

(http://www.cmg.org/measureit/issues/mit04/m_4_7.html)

[Zajac 1999]

Zajac, B., "Watching All Your Systems In Real-Time," SunWorld 1999

(http://www.orcaware.com/articles/1999_07_01_sunworld.html)

[Cockcroft/Pettit 1998]

Cockcroft, Adrian and Pettit, Richard, "Sun Performance and Tuning: Java and the Internet," Sun Microsystems Press, 1998.

[Buhler/Cockcroft 2003]

Buhler, Justin and Cockcroft, Adrian, "Capacity Planning for the Masses -- Using the SE Toolkit and Orca" (http://www.samag.com/documents/s=8965/sam0314a/0314a.htm), 2003

[1] Source file: orcallator.se  (http://svn.orcaware.com:8000/repos/tags/orca/0.27/orcallator/orcallator.se).

[2] SE Toolkit home page link: (http://www.setoolkit.com).

[3] Source file: percollator.se: (http://svn.orcaware.com:8000/repos/tags/orca/0.10/percollator/percollator.se)

[4] rsync home page link: (http://samba.anu.edu.au/rsync).

[5] Round Robin Database home page link: (http://rrfw.sourceforge.net).

[6] About the SE toolkit link: (http://www.setoolkit.com/aboutse.html).

[7] Source file: orca.pl: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/src/orca.pl.in).

[8] Source file: orcallator.cfg: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/orcallator/orcallator.cfg.in).

[9] Source file: orcallator_column.pl: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/orcallator/orcallator_column.pl).

[10] Link to Orca home page: http://www.orcaware.com/orca/

[11] Source file: procallator.pl: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/contrib/procallator/procallator.pl.in).

[12] Source file: orca-aix-stat.pl (http://svn.orcaware.com:8000/repos/trunk/orca/data_gatherers/aix/orca-aix-stat.pl.in).

 

Last Updated 05/11/06



Computer Measurement Group