Published Spring 2016 in MeasureIT 16.1
By Murthuja Shaik (Infosys, Hyderabad) and Madhu Tanikella (Infosys, Hyderabad)
For one of the enterprise applications hosted on a cluster of IIS servers, very high CPU utilization was observed for a business process, even when there were no requests to be serviced. Analysis of performance counters such as CPU, memory, and paging on the IIS servers did not reveal the cause. In this article, we describe a simple but effective technique for analyzing an IIS crash dump with the Debug Diagnostic Tool, which helped us examine the IIS W3WP process in depth and quickly identify the root cause of the high CPU utilization.
As part of a performance tuning exercise for a business-critical application developed and maintained by a large manufacturing company in the USA, a Load Test was carried out in the Performance Test Environment (PTE) to understand the application's performance characteristics under Peak Load. The application architecture has 5 IIS servers load balanced through 3 DNS, 1 database server, and a few other applications as integration points. During the Load Test, we observed that the W3WP process on one particular IIS server was continuously using 100% of all available CPUs while all 5 IIS servers were being hit at the peak concurrent user load. The test was stopped as soon as 100% CPU utilization was observed on that server; however, the server remained at 100% CPU utilization even after the Load Test ended (i.e., when no more requests were being sent from the load testing tool), while the other 4 IIS instances were fine.
To identify the root cause of the issue, we captured and analyzed IIS access logs, error logs, Web Service logs, and Windows events on the specific IIS server; however, we could not find any evidence that explained the high CPU utilization. As a last resort, we decided to capture a Thread/Crash Dump of the IIS server process to understand the state of its threads and whether any specific thread was keeping the CPU continuously at 100%.
Hence, the same Peak Load Test was repeated and the Thread/Crash Dump was captured from the IIS server that showed 100% CPU utilization during and after the Load Test. We initially started analyzing the dump file with WinDbg; however, that task was time-consuming because considerable manual effort was required to set up and use symbols. We therefore looked for alternative tools that could analyze an IIS Thread/Crash Dump quickly and efficiently, and found the Debug Diagnostic Tool (DebugDiag) from Microsoft, which can quickly pinpoint thread-level details in a Thread/Crash Dump. In the next section, we introduce the DebugDiag tool, the options it offers for analysis, and the steps to follow to analyze a Thread/Crash Dump from an IIS server so that the root cause of very high CPU utilization can be uncovered quickly and effectively.
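The reasoning behind looking at thread-level state can be illustrated with a minimal sketch. It assumes two snapshots of cumulative per-thread CPU time (in seconds) taken a known interval apart; the thread IDs, the values, and the function name `rank_threads_by_cpu` are hypothetical and are not DebugDiag output or part of the original analysis. A thread whose CPU time grows by nearly the full interval is a likely candidate for the one keeping a CPU busy.

```python
def rank_threads_by_cpu(before, after, interval_s):
    """Return (thread_id, cpu_delta_s, pct_of_one_core) tuples, busiest first.

    before/after: dicts mapping thread id -> cumulative CPU seconds.
    Threads that do not appear in both snapshots are skipped, since no
    delta can be computed for them.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rows = []
    for tid, end in after.items():
        if tid not in before:
            continue
        delta = end - before[tid]
        rows.append((tid, delta, 100.0 * delta / interval_s))
    # Sort by CPU consumed during the interval, largest first.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


# Hypothetical snapshots taken 10 seconds apart (illustrative values only).
snap_1 = {12: 4.0, 27: 50.0, 31: 1.5}
snap_2 = {12: 4.5, 27: 60.0, 31: 1.5, 40: 0.1}
ranking = rank_threads_by_cpu(snap_1, snap_2, interval_s=10.0)
for tid, delta, pct in ranking:
    print(f"thread {tid}: +{delta:.1f}s CPU ({pct:.0f}% of one core)")
```

In this made-up data, thread 27 consumed a full core for the whole interval, which is the kind of signal one would then follow up by inspecting that thread's call stack in the dump.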
The Debug Diagnostic Tool (DebugDiag) is designed to assist in troubleshooting issues such as hangs, slow performance, memory leaks or fragmentation, and crashes in any user-mode process. The tool includes additional debugging scripts focused on Internet Information Services (IIS) applications, web data access components, COM+ and related Microsoft technologies [1].
The Debug Diagnostic Tool comes with the following 3 analyzer components along with a set of built-in rules:
1) Crash / Hang Analyzers
√ CrashHangAnalysis
√ SharePointAnalysis
2) Memory Pressure Analyzers
√ DotNetMemoryAnalysis
√ MemoryAnalysis
3) Performance Analyzers
√ PerfAnalysis
Since the problem at hand was to find the root cause of high CPU utilization on an IIS server, the Performance Analyzer and its PerfAnalysis component were used. Here, we would like to highlight the process of setting up the tool and using it on a given W3WP Thread/Crash Dump to uncover the root cause quickly and effectively.
Here, we would like to highlight the key aspects to look at in the report generated by the tool in order to identify the root cause of high CPU utilization (refer to Figure 2).
Analyzing the above key aspects and correlating them helped to uncover the single service (note there are around 40 services running in this one W3WP process) which was causing 100% CPU utilization on the IIS server.
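One way to picture this correlation step is the sketch below. It is an illustration under stated assumptions, not the authors' procedure or a DebugDiag feature: the thread records, the service names, the namespace prefixes (`Acme.Orders.` and so on), and the function name `cpu_by_service` are all hypothetical. The sketch attributes each thread's CPU time to the first service whose namespace prefix appears in its call stack, then sums per service.

```python
from collections import defaultdict


def cpu_by_service(threads, service_prefixes):
    """Sum CPU seconds per service.

    threads: list of dicts with 'cpu_s' (float) and 'stack' (list of frame
             strings, top frame first).
    service_prefixes: dict mapping service name -> namespace prefix.
    A thread is attributed to the first service whose prefix matches a frame,
    scanning from the top frame down; otherwise it counts as 'unattributed'.
    """
    totals = defaultdict(float)
    for t in threads:
        owner = "unattributed"
        for frame in t["stack"]:
            match = next(
                (name for name, prefix in service_prefixes.items()
                 if frame.startswith(prefix)),
                None,
            )
            if match is not None:
                owner = match
                break
        totals[owner] += t["cpu_s"]
    return dict(totals)


# Hypothetical data: names and numbers are invented for illustration.
prefixes = {"OrderService": "Acme.Orders.", "PricingService": "Acme.Pricing."}
threads = [
    {"cpu_s": 42.0, "stack": ["System.String.Concat", "Acme.Pricing.Calc.Run"]},
    {"cpu_s": 0.5, "stack": ["Acme.Orders.Api.Get"]},
    {"cpu_s": 0.25, "stack": ["System.Threading.Monitor.Wait"]},
]
totals = cpu_by_service(threads, prefixes)
hottest = max(totals, key=totals.get)
print(hottest, totals[hottest])
```

With many services sharing one W3WP process, a roll-up of this kind makes it easier to see which service owns the busy threads, although the report itself remains the source of the thread and stack details.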
Analyzing a Thread/Crash Dump of the IIS W3WP process to identify the root cause of high CPU utilization is tedious, but the Debug Diagnostic Tool helps pinpoint specific thread-level details quickly and effectively. The tool reduces the turnaround time needed to identify the root cause of performance issues such as 100% CPU utilization, very high response times, hung threads, and memory issues across multiple .NET components, applications, and services hosted on an IIS server.
Murthuja Shaik is working as a Senior Technical Architect at Infosys, Hyderabad. He has 18+ years of experience in software development, performance testing and engineering. His core expertise is in Performance and Scalability Assessment of applications ranging from client/server to Web-based to Enterprise Distributed systems, Performance Bottleneck Identification, Performance Tuning, Suggesting Tuning Solutions, Setting up PE CoEs for clients in domains such as Financial Services, Insurance, Manufacturing, Retail, Healthcare. Murthuja can be reached at [email protected]
Madhu Tanikella is a Senior Technology Architect at Infosys, Hyderabad. He has 14+ years of experience in providing delivery and consulting services in Performance Engineering & Performance Tuning for enterprise software systems that fall under client-server, multi-tier (Java/J2EE), SOA, Messaging Middleware architectures, Virtualization and Cloud models and custom-built applications with multiple COTS products. He has specialized expertise in the areas of NFR & SLA Validation, Workload Modeling, Performance Modeling, Bottleneck Identification and Performance Tuning (of Java/J2EE, Oracle DB, TIBCO Business Works and IBM MQ), Capacity Planning, Hardware Assessment and Sizing for distributed software systems. He also has expertise in assessing performance practices in an organization and setting up Performance Engineering CoEs for various clients in different Domains such as Financial Services, Manufacturing, Insurance, Healthcare, Retail etc. Madhu Tanikella is certified in IBM Cloud Computing Architecture and IBM SOA Solution Designer. He can be reached at [email protected]