IMPACT Session Spotlight: Catching Anomaly and Normality in Cloud by Neural Net and Entropy Calculation

Part 1. The Neural Network (NN) is not a new machine learning method. About 12 years ago I was involved as a Capacity Planning resource in a project to build the server infrastructure for running an NN-based fraud detection application. Now NN gets much more attention and popularity as a part of AI, mostly because computing power has increased dramatically and, accordingly, many more tasks can be done using NN.

The goal of the presentation is to demystify the technique in simple terms and with examples, showing what it actually is and how it can be used for Capacity and Demand management. That is done by developing R code to recognize typical workload patterns, such as OLTP and others, in the daily profiles of time series performance data.
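To make the idea concrete, here is a minimal R sketch (not the presentation's actual code) that trains a small single-hidden-layer network from the nnet package to label synthetic 24-hour utilization profiles as OLTP-like or batch-like; the profile shapes, labels, and network size are assumptions chosen only for illustration.

library(nnet)

hours <- 0:23

# Synthetic daily profiles (illustrative assumption): "oltp" peaks during
# business hours, "batch" peaks overnight. Each row is one 24-hour CPU profile.
make_profiles <- function(type, n = 50) {
  if (type == "oltp") {
    base <- 40 + 40 * sin(pi * (hours - 6) / 12) * (hours >= 6 & hours <= 18)
  } else {
    base <- 20 + 60 * (hours < 6 | hours > 21)
  }
  m <- t(replicate(n, pmax(0, base + rnorm(24, sd = 5))))
  colnames(m) <- paste0("h", hours)
  m
}

train <- data.frame(rbind(make_profiles("oltp"), make_profiles("batch")),
                    workload = factor(rep(c("oltp", "batch"), each = 50)))

# A small single-hidden-layer network that maps a daily profile to a workload label.
set.seed(1)
fit <- nnet(workload ~ ., data = train, size = 5, decay = 0.01,
            maxit = 200, trace = FALSE)

# Classify a new, unseen daily profile.
new_day <- data.frame(make_profiles("oltp", n = 1))
predict(fit, new_day, type = "class")

The same pattern applies to real daily profiles: replace the synthetic rows with hourly averages of a metric such as CPU utilization and label a training subset by hand or from an existing classification.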

Part 2. A typical concern is detecting anomalies in short-lived objects, or in objects with a very small number of measurements. Why? There can be thousands and thousands of such objects, so it is important to separate the exceptional ones with anomalies for further investigation. These could be servers or customers that have just started being monitored, or public cloud objects (EC2s, ASGs) that usually have a very short lifespan. The suggested approach to detecting anomalous behavior in this type of object is to estimate the entropy of each object. If the entropy is low, everything should be in order and is most likely OK. If not, there is possible disorder or mess there, and someone needs to check what is going on with the object. The method is implemented in a cloud-based application written in R that scans all cloud Auto Scaling Groups (ASGs) every hour to detect those that are imbalanced in terms of the number of EC2 instances in the group. That allows a couple of hundred ASGs to be separated out of hundreds of thousands of them.
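As a rough sketch of the entropy idea (assumptions only, not the production scanner), the R code below computes a normalized Shannon entropy over each ASG's hourly EC2 instance-count history: a steady group scores near zero, while an erratic one scores high and is flagged for investigation. The toy data, column names and the 0.5 threshold are illustrative.

# Normalized Shannon entropy of a vector of observed values:
# 0 for a constant history, close to 1 for a widely fluctuating one.
shannon_entropy <- function(counts) {
  p <- table(counts) / length(counts)   # empirical distribution of observed counts
  h <- -sum(p * log2(p))
  h / log2(max(length(p), 2))           # normalize to [0, 1] so ASGs are comparable
}

# Toy hourly scan results: one row per ASG per hourly scan.
set.seed(1)
scans <- data.frame(
  asg_id = rep(c("asg-stable", "asg-noisy"), each = 24),
  instance_count = c(rep(4, 24),                        # steady group
                     sample(1:10, 24, replace = TRUE))  # erratic group
)

entropy_by_asg <- aggregate(instance_count ~ asg_id, data = scans,
                            FUN = shannon_entropy)
names(entropy_by_asg)[2] <- "entropy"

# Flag the "messy" ASGs for further investigation.
subset(entropy_by_asg, entropy > 0.5)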

This entropy-based method is well known and is described in detail in the following www.Trub.in blog post:

“Quantifying Imbalance in Computer Systems,” which was written based on a CMG’12 paper.

Register for IMPACT

Presented by Igor Trubin

I started my career in 1979 as an IBM/370 system engineer. In 1986 I got my PhD in Robotics at St. Petersburg Technical University (Russia) and then worked there as a professor, teaching CAD/CAM, Robotics and Computer Science for about 12 years. I published 30 papers and gave several presentations at international conferences related to the Robotics, Artificial Intelligence and Computer fields. In 1999 I moved to the US and worked at Capital One bank in Richmond as a Capacity Planner. My first CMG paper was written and presented in 2001. The next one, “Global and Application Level Exception Detection System Based on MASF Technique,” won a Best Paper award at CMG 2002 and was presented again at UKCMG 2003 in Oxford, England. My CMG 2004 paper about applying the MASF technique to mainframe performance data was republished at the IBM z/Series Expo. I have also presented my papers at the Central Europe CMG conference and at numerous US regional meetings. I continue to enhance my exception detection methodologies. After working more than 2 years as the Capacity Management team lead for IBM, I worked for SunTrust Bank for 3 years and then returned to IBM, holding a Sr. IT Architect position for 2+ years. Currently I work for Capital One bank as IT Manager for the IT Capacity Management group. In 2015 I was elected to the CMG (http://www.cmg.org) board of directors.
