On February 5th, CMG will release its first Journal of 2018. This members-only issue will feature four articles and a book review. We’re also posting about the journal on our blog here.
In this issue:
- System Utilization: Keeping the Glass Half Full by Bruce McNutt, IntelliMagic, Inc.
- Machine Learning for Predictive Performance Monitoring by Tim Browning, IT Consultant, Kimberly-Clark Corporation
- Achieving CPU (& MLC) Savings through Optimizing Processor Cache by Todd Havekost, IntelliMagic
- FICON CUP Diagnostics and the IBM Health Checker for z/OS by Stephen R. Guendert, Ph.D.
- Book review of Greg Schulz’s Software-Defined Data Infrastructure Essentials by Stephen R. Guendert, Ph.D.
From Machine Learning for Predictive Performance Monitoring by Tim Browning, IT Consultant, Kimberly-Clark Corporation:
Predictive monitoring is becoming an important component of data center management to support operational stability. A key element is understanding whether a system will be CPU constrained in the near future or operating under more optimal conditions. If the system is expected to have a predictable range of CPU activity, mitigating operational actions can be implemented, such as deferring scheduled CPU-intensive work during times of high CPU usage or submitting work during predicted low-utilization time periods.
This paper presents techniques based on statistical machine learning for predicting near-future CPU utilization based on the current system state as defined by selected features, feature extractions, and time classifications. After assessment of several linear models, the proposed solution uses the Least Angle Regression (LAR) technique within SAS/STAT® software and provides significant accuracy for short-term predictions. In addition, the application has the ability to learn and adapt to changing conditions by reiterating the model build process.
In a series of experiments, many predictive algorithms were applied to production data in a SAP z/OS-based operating environment to: (1) select the appropriate features and extracted features so as to predict CPU utilization in future 5-minute intervals, and (2) utilize cross validation to reduce overfitting and improve accuracy when using new data. Using error estimates provided by the modeling and validation process, we select the linear regression model and features that provide the highest accuracy for out-of-sample data.
Results showed highly accurate short-term predictions as well as acceptable levels of accuracy for some longer-term predictions.
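To give a feel for the workflow the abstract describes (choose features, then use cross validation to pick the linear model with the lowest out-of-sample error), here is a minimal Python sketch. It is not the paper's method: it swaps in ordinary least squares over a few candidate lag sets instead of Least Angle Regression in SAS/STAT, and it runs on synthetic data rather than production measurements. The function names, the candidate lag sets, and the synthetic series are all assumptions made for illustration.

```python
import math
import random


def make_series(n=600, seed=0):
    """Synthetic CPU utilization (%) at 5-minute intervals (288 samples per day)."""
    rng = random.Random(seed)
    level, series = 50.0, []
    for t in range(n):
        daily = 20.0 * math.sin(2 * math.pi * t / 288)  # assumed daily cycle
        level = 0.8 * level + 0.2 * (50.0 + daily) + rng.gauss(0, 2.0)
        series.append(min(100.0, max(0.0, level)))
    return series


def build_rows(series, lags, start):
    """Row for time t: intercept plus lagged values; target is the value at t."""
    X = [[1.0] + [series[t - k] for k in lags] for t in range(start, len(series))]
    y = [series[t] for t in range(start, len(series))]
    return X, y


def fit_ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def cv_rmse(X, y, k=5):
    """Blocked k-fold cross validation: each contiguous block is held out once."""
    n, fold = len(y), len(y) // k
    sq_err = 0.0
    for i in range(k):
        lo = i * fold
        hi = n if i == k - 1 else (i + 1) * fold
        beta = fit_ols(X[:lo] + X[hi:], y[:lo] + y[hi:])
        for row, target in zip(X[lo:hi], y[lo:hi]):
            pred = sum(b * v for b, v in zip(beta, row))
            sq_err += (pred - target) ** 2
    return math.sqrt(sq_err / n)


series = make_series()
# Hypothetical candidate feature sets: which past 5-minute intervals to use.
candidates = {"lag1": [1], "lag1-2": [1, 2], "lag1-3": [1, 2, 3]}
start = max(max(lags) for lags in candidates.values())  # same targets for all models

scores = {}
for name, lags in candidates.items():
    X, y = build_rows(series, lags, start)
    scores[name] = cv_rmse(X, y)

best = min(scores, key=scores.get)
for name, rmse in scores.items():
    print(f"{name}: cross-validated RMSE = {rmse:.2f}")
print("selected feature set:", best)
```

Rerunning the selection loop on fresh data would be one simple way to mimic the "reiterating the model build process" idea mentioned in the abstract, though the paper's actual retraining procedure is not described here.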