Guerrilla Unix Monitoring Using Orca and Open Source Tools
May, 2006
by James Yaple
In the summer of 2003, Dr. Neil Gunther used several CMG
MeasureIT articles to introduce what he called "Guerrilla Capacity Planning."
Since many of the factors involved in developing a performance model are
unknown or uncertain, an accurate model can be difficult to construct. A "guerrilla"
approach attempts to provide management a direction as opposed to a precise
compass bearing. According to Gunther’s first article, performance monitoring
typically gets the most attention because it’s the easiest challenge to address
with commercial tools or scripts developed by administrators. [Gunther 2003]
But what about those situations where monitoring is
required, but commercial tools have not been implemented and consistent
homegrown scripts are not deployed? How can measurements be made consistently
across dozens of production systems? Can a "guerrilla" approach be applied to
monitoring?
The Austin Automation Center (AAC, http://www.aac.va.gov/) realized that "if you don’t
measure it, you can’t manage it." The center was involved in a large-scale
requirements analysis, competition and procurement to obtain and implement a
commercial monitoring toolset. Unfortunately, the procurement cycle was long
and it would take months before the vendor was known and monitoring implemented
and suitable for use. A different strategy was needed to obtain capacity
planning data in the short-term.
This paper describes how an enterprise
application-hosting center implemented a "guerrilla" approach to monitoring its
UNIX systems. In accordance with Gunther’s description, a "guerrilla" approach
to monitoring uses an opportunistic scope, tiny tools and little or no budget.
The resulting solution, including data collection, aggregation and alerting, will
soon be replaced by a commercial version. However, the AAC’s experiences may
help other organizations improve their own attempts at "guerrilla"
monitoring.
Orca
The core of the solution implemented was an open source
effort known as Orca, originally developed by Dr. Blair Zajac [Zajac 1999].
Zajac developed Orca based on the following requirements:
- The ability to monitor many systems.
- Measure and display short (daily) and long-term (yearly)
trends.
- Allow easy comparison of the same type of measurement
between different systems.
- Allow easy viewing of all system measurements on different
time scales.
- Ensure plots are always up to date and always available.
- The act of measuring a system should not adversely affect
it, e.g. by placing a large additional load on it or impacting TCP stack
throughput.
Generically, an Orca implementation consists of several
functions. The first is a data collector that typically accesses the /proc
table and other UNIX kernel data structures. The second aggregates the collected
data in a single location. A third takes the raw data and generates
graphical output. Results are typically viewed as web pages in a browser.
Implementing the basic solution
The AAC started with a relatively generic Orca
configuration. Since the target environment was mostly Solaris, data
collection is accomplished with orcallator.se [1], which Zajac developed from
percollator.se [3], a component of the Solaris SE toolkit [2]. Data from multiple
clients is moved to a central server using rsync [4]. Orca, with embedded
Round Robin Database (RRD) [5] library functions, processes the aggregated data
and produces the files for display. Apache HTTP Server 1.3 provides a suitable
web display.
Orcallator.se
is written in the SymbEL language. The language traces its lineage back to the
SE toolkit [6] written by Rich Pettit and Adrian Cockcroft and documented in
Cockcroft’s book Sun Performance and Tuning: Java and the Internet [Cockcroft/Pettit
1998]. Orcallator is an enhanced version of the toolkit’s percollator.se that collects most of
the measurements shown in the kit’s zoom.se
script.
Data collected by orcallator.se
is appended as a single line to a text file every five minutes for later
processing and viewing. The data is columnar, and the number of columns varies
with the system configuration: a system with more disks or network
interfaces has more columns of data. Each observation is still contained
on a single line.
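The file layout described above can be read with a few lines of Python. This is a minimal sketch, not part of Orca itself. It assumes that the first non-blank line of the file names the columns and that fields are separated by whitespace. The column names and values in the sample are made up for illustration.

```python
import io


def parse_observations(lines):
    """Yield one dict per observation line, keyed by column name.

    Assumption: the first non-blank line is a header naming the columns.
    """
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line names the columns
            continue
        if len(fields) != len(header):
            continue  # ignore rows that do not match the header width
        yield dict(zip(header, fields))


# Hypothetical sample data; real files have many more columns.
SAMPLE = """\
timestamp usr% sys% disk_rd/s
1146500000 12.0 3.5 40.1
1146500300 15.5 4.0 38.7
"""

rows = list(parse_observations(io.StringIO(SAMPLE)))
print(rows[-1]["usr%"])  # prints 15.5
```

Keying each row by column name rather than by position keeps the reader independent of how many disks or network interfaces a given system reports.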
Data is transferred from hosts running the data collector to
a central host using rsync. Rsync is an open source utility that provides fast
incremental file transfer and is freely available under the GNU General Public
License version 2. The use of rsync is described in "Capacity Planning for the
Masses -- Using the SE Toolkit and Orca" [Buhler/Cockcroft 2003]. At the AAC, the
files are transferred using a secure shell (ssh) configuration for encryption
of the data stream. A diagram of the configuration is shown in figure 1.
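As a rough illustration of the transfer step, the sketch below builds an rsync command line that uses ssh as the transport. It assumes the central host pulls the files from each client. The host name and directory paths are hypothetical, and the AAC's actual options are not documented here. The rsync options used (-a for archive mode, -z for compression, -e to select the remote shell) are standard.

```python
import shlex


def build_rsync_command(host, remote_dir, local_root):
    """Return an argv list that pulls one host's collector output over ssh."""
    source = f"{host}:{remote_dir.rstrip('/')}/"
    # One subdirectory per client keeps each system's data separate.
    destination = f"{local_root.rstrip('/')}/{host}/"
    return ["rsync", "-az", "-e", "ssh", source, destination]


# Hypothetical host and paths, for illustration only.
cmd = build_rsync_command("app01", "/var/orca/orcallator", "/data/orca/incoming")
print(shlex.join(cmd))
# prints: rsync -az -e ssh app01:/var/orca/orcallator/ /data/orca/incoming/app01/
```

The list could be passed to subprocess.run(cmd, check=True) from a scheduled job. Because rsync transfers only the changed portion of each file, repeating the command every few minutes stays inexpensive.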

Figure 1 - AAC Monitoring Infrastructure
The final component is the Orca script itself. Orca is a
Perl script that reads a configuration file, orcallator.cfg
[8], describing the location of its input text data files, the general
format of the input data files, the destination for its RRD data files and the
root of the HTML tree which must be generated.
The orcallator.cfg file
contains an informational link to the Orcaware site. The link appears as a
hyperlink in the resulting web presentation, describing the data being plotted.
It also offers some guidelines as to what constitutes a good or bad result.
The AAC has found the explanations from the link to be somewhat generic, but they
provide a good starting point for analysis and historical information on the
evolution of the orcallator.se counters.
Orca output examples
Below are several graphical examples which show the
versatility of the Orca toolset.

Figure 2 - CPU usage
Figure 2 is a CPU usage graph of the system collecting Orca data.
Notice the regular bursts of activity that reflect the five-minute data collection
and processing cycle.

Figure 3 - Free memory (Solaris)
Figure 3 is a free memory graph of an application server.
On Saturday, the server was brought down for patching (note the reporting gap)
and then restarted. Following restart, additional memory was recovered.

Figure 4 - system-wide disk activity
In figure 4, a database server shows the impact of a faulty
software install in week 14. This problem was detected by an automated alert
added into the Orca solution by AAC staff. Read the full CMG 2004 paper for
the details.

Figure 5 - disk usage
The example disk usage plot in figure 5 shows the
percent used for a database server file layout. This type of data collected
from Orca is also analyzed to alert on low file system space conditions and to
establish trends for disk usage. This particular data is from an AIX
server.
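The kind of analysis mentioned above, a threshold alert plus a usage trend, can be sketched as follows. This is not the AAC's alerting code. The threshold, the sample readings and the function names are hypothetical, and a simple least-squares line is assumed to be an adequate trend model.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for (day, percent_used) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # requires at least two distinct day values
    return slope, mean_y - slope * mean_x


def days_until(samples, limit_pct):
    """Days after the last sample until the fitted line reaches limit_pct.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    last_day = samples[-1][0]
    return max(0.0, (limit_pct - intercept) / slope - last_day)


# Hypothetical daily readings: (day number, percent used).
usage = [(0, 70.0), (1, 71.0), (2, 72.0), (3, 73.0)]
ALERT_PCT = 90.0  # illustrative threshold, not an AAC value

if usage[-1][1] >= ALERT_PCT:
    print("ALERT: file system above threshold")
print(days_until(usage, ALERT_PCT))  # prints 17.0
```

With growth of one percentage point per day from 73 percent, the fitted line reaches 90 percent 17 days after the last reading.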
Round Robin Database (RRD)
Orca can read the collected data files and generate
GIF plots because it uses a library, written by Tobias Oetiker, that
provides a Round Robin Database (RRD). Users of tools such as the Multi
Router Traffic Grapher (MRTG) may be familiar with RRD. It provides a flexible
binary format for the storage of numerical data measured over time.
A convenient function provided by the RRD is data
consolidation. Consolidation of input data reduces the amount of disk space
required for long-term data storage. The consolidated data is used when Orca
plots longer-term views, such as the yearly plots of data. Consolidation is one of
the key features of RRD: the data files do not grow significantly over time. In
Orca's case, 5-minute data is kept for 200 hours, 30-minute averaged data is
kept for 31 days, 2-hour averaged data is kept for 100 days and daily averaged
data is kept for 3 years. Such a data file is typically about 50 Kbytes. The RRD
library reads an arbitrary number of RRD files and generates GIF plots. Each plot shows a
daily, weekly, monthly, or yearly view of the data in question.
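The retention figures above can be checked with a little arithmetic. The sketch below counts the rows each archive needs and estimates the storage. It assumes one measurement per file and 8 bytes (one double-precision value) per row, and it ignores file header overhead. The consolidate function is a simplified stand-in for averaging, not the RRD library's implementation.

```python
HOUR, DAY = 3600, 86400

# (step in seconds, retention in seconds) for each archive described above.
ARCHIVES = [
    (5 * 60, 200 * HOUR),   # 5-minute data kept for 200 hours
    (30 * 60, 31 * DAY),    # 30-minute averages kept for 31 days
    (2 * HOUR, 100 * DAY),  # 2-hour averages kept for 100 days
    (DAY, 3 * 365 * DAY),   # daily averages kept for 3 years
]
BYTES_PER_VALUE = 8  # assumption: one double-precision value per row

rows = [retention // step for step, retention in ARCHIVES]
total_rows = sum(rows)
print(rows, total_rows, total_rows * BYTES_PER_VALUE)
# prints: [2400, 1488, 1200, 1095] 6183 49464


def consolidate(values, factor):
    """Average each complete run of `factor` consecutive samples."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]


# Six 5-minute samples collapse into one 30-minute average.
print(consolidate([1, 2, 3, 4, 5, 6], 6))  # prints [3.5]
```

Under these assumptions the four archives hold 6,183 values in just under 50,000 bytes, which agrees with the file size quoted above. The total is fixed when the file is created, which is why the files do not grow.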
In normal mode, Orca runs continuously, sleeping until new
data is placed by orcallator.se into
the output data files. Once new data is written to a file by orcallator.se, Orca updates the RRD
data files and recreates any GIFs that need to be updated.
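The sleep-until-new-data behavior can be pictured as a simple polling loop. The sketch below is an assumption about one way to detect new data, by comparing file sizes between polls. It does not reproduce Orca's own logic, and the function names and polling interval are hypothetical.

```python
import os
import time


def files_with_new_data(paths, last_sizes):
    """Return paths whose size changed since the previous poll.

    Updates last_sizes in place so the next call sees the new baseline.
    """
    changed = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # file does not exist yet
        if last_sizes.get(path) != size:
            last_sizes[path] = size
            changed.append(path)
    return changed


def watch(paths, on_change, interval_s=60, polls=None):
    """Poll `paths`, calling on_change(path) for each file with new data.

    Runs forever when polls is None; otherwise stops after `polls` passes.
    """
    last_sizes = {}
    done = 0
    while polls is None or done < polls:
        for path in files_with_new_data(paths, last_sizes):
            on_change(path)  # e.g. update the RRD file and redraw plots
        done += 1
        if polls is None or done < polls:
            time.sleep(interval_s)
```

Because the collector appends one line every five minutes, a polling interval shorter than that is enough to pick up each new observation soon after it is written.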
By starting with a basic installation, the AAC realized
several advantages quickly. Orca is freely available in terms of cost and
access to the source. Open source allows sites to adjust parameters in
response to local needs, and provides access to other users’ experiences via
email support lists and contributions to the project. The orcallator.se script runs as a single
process on each system and does not fork any child processes, so it extracts
performance data from the system without becoming one of the performance
problems that need to be investigated. Orca is able to work with almost any
text data file.
In the initial implementation, AAC accepted the constraint that
the data collector, orcallator.se, and
the SE toolkit were only available on SPARC and x86 Solaris platforms. To
support Orca monitoring on new platforms, a new data collection tool was
required.
Future development of Orca monitoring at AAC
Recently, several customers have initiated plans to develop
and launch projects on platforms other than Sun Solaris. Deployments include
variations of Linux and AIX.
Fortunately for Linux platforms, the Orcaware site also provides a Perl
data collector, procallator.pl,
as part of its distribution. Community input and contributions are a common
benefit of many open source projects. When a need arises, one or more people
contribute a solution. For Linux data collection, Guilherme Carvalho Chehab provided
the solution. The AAC is very close to integrating the Linux collector into
the existing Orca data collection, storage, presentation and alerting
infrastructure.
On the AIX platform, a similar solution was provided by
Jason D. Kelleher and Rajesh Verma. The current version of orca-aix-stat.pl
supports AIX 4.3 and 5.x. [April 2006 Update: Dave Michaels recently provided
an update patch/rewrite of orca-aix-stat.pl]
Conclusion
Orca-based "guerrilla" monitoring is not for everyone. It
requires some level of analysis for a successful implementation. In its basic
form, it can provide a significant amount of information on certain aspects of
system health. With additional needs analysis, data collection can support alerting
and capacity projections. It will not put any of the commercial monitoring
companies out of business, because their tools provide richer data collection
agents, analysis tools and modeling capabilities. However, if your organization
is interested in using an open-source approach to gather capacity planning
data, Orca can be a component of that effort.
In addition, the AAC can argue that using Orca for "guerrilla"
monitoring improved the process of selecting and implementing a commercial
monitoring product. Because staff already knew what data elements are commonly
collected, it was easier to distinguish when a tool vendor was really adding value with
its solution. The effort and discussions involved in identifying key metrics
and thresholds for alerting can be transferred into implementation of the
commercial product. Finally, the ease of manipulating historical data
using familiar tools required vendors to better justify the value of
proprietary components of their solutions.
Acknowledgements
Paul Robinson, Greg Haines and Lonnie Wilson made key contributions in implementing this
approach at AAC.
References
[Gunther 2003]
Gunther, N. J., "Guerrilla Capacity Planning: Hit-and-Run
Tactics for Website Scalability," CMG MeasureIT. 2003.
(http://www.cmg.org/measureit/issues/mit02/m_2_2.html)
Gunther, N. J., "Guerrilla Capacity Planning: Weapons of
Mass Instruction," CMG MeasureIT. 2003.
(http://www.cmg.org/measureit/issues/mit04/m_4_7.html)
[Zajac 1999]
Zajac, B., "Watching All Your Systems In Real-Time,"
SunWorld 1999
(http://www.orcaware.com/articles/1999_07_01_sunworld.html)
[Cockcroft/Pettit 1998]
Cockcroft, Adrian and Pettit, Richard, "Sun Performance
and Tuning: Java and the Internet," Sun Microsystems Press, 1998.
[Buhler/Cockcroft 2003]
Buhler, Justin and Cockcroft, Adrian, "Capacity Planning for
the Masses -- Using the SE Toolkit and Orca" (http://www.samag.com/documents/s=8965/sam0314a/0314a.htm), 2003
[1] Source file: orcallator.se (http://svn.orcaware.com:8000/repos/tags/orca/0.27/orcallator/orcallator.se).
[2] SE Toolkit home page link: (http://www.setoolkit.com).
[3] Source file: percollator.se: (http://svn.orcaware.com:8000/repos/tags/orca/0.10/percollator/percollator.se)
[4] rsync home page link: (http://samba.anu.edu.au/rsync).
[5] Round Robin Database home page link: (http://rrfw.sourceforge.net).
[6] About the SE toolkit link: (http://www.setoolkit.com/aboutse.html).
[7] Source file: orca.pl: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/src/orca.pl.in).
[8] Source file: orcallator.cfg: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/orcallator/orcallator.cfg.in).
[9] Source file: orcallator_column.pl: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/orcallator/orcallator_column.pl).
[10] Link to Orca home page: http://www.orcaware.com/orca/
[11] Source file: procallator.pl: (http://svn.orcaware.com:8000/repos/tags/orca/0.27/contrib/procallator/procallator.pl.in).
[12] Source file: orca-aix-stat.pl (http://svn.orcaware.com:8000/repos/trunk/orca/data_gatherers/aix/orca-aix-stat.pl.in).
Last Updated 05/11/06