What I Learned This Month: Counting Pages
By Scott Chapman
American Electric Power
Last time I talked a little about testing Flash Express on our test system. As we started rolling out z/OS 1.13 to our production systems, I increased the amount of memory set aside for 1M pages. The plan was that, with z/OS 1.13's pageable 1M pages and Flash Express, we would expand our use of large pages, particularly for our Java workloads. (DB2 was already using a fair number of large pages for its buffer pools.)
The system will break up unused 1M pages into 4K pages if there's demand for 4K pages that the system can't meet, but the reverse is not true: it won't compose 1M pages from 4K pages. So it seems sensible to have more 1M pages than you immediately need.
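To make that one-way conversion concrete, here is a toy Python model. It is an illustrative sketch only, not how the z/OS real storage manager actually manages frames, and the class and method names (`FramePool`, `allocate_4k`, `free_4k_frames`) are hypothetical.

```python
# Toy model of the one-way conversion: free 1M frames can be broken up to
# satisfy 4K demand, but 4K frames are never recombined into 1M frames.
# Hypothetical names; NOT how z/OS actually implements frame management.

FRAMES_4K_PER_1M = 256  # 1 MB / 4 KB


class FramePool:
    def __init__(self, free_4k: int, free_1m: int) -> None:
        self.free_4k = free_4k
        self.free_1m = free_1m

    def allocate_4k(self, count: int) -> int:
        """Allocate up to `count` 4K frames and return how many were granted."""
        granted = 0
        while granted < count:
            if self.free_4k == 0:
                if self.free_1m == 0:
                    break  # nothing left to break up; a real system would page
                # Break one unused 1M frame into 256 free 4K frames.
                self.free_1m -= 1
                self.free_4k += FRAMES_4K_PER_1M
            self.free_4k -= 1
            granted += 1
        return granted

    def free_4k_frames(self, count: int) -> None:
        # Returned 4K frames stay 4K frames; there is no path back to 1M.
        self.free_4k += count
```

For example, a pool with 100 free 4K frames and two free 1M frames can satisfy a 300-frame 4K request by breaking up one 1M frame, leaving 56 free 4K frames and one free 1M frame. Returning those 300 frames afterwards grows the 4K count to 356, but never restores the 1M frame.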
However, after one of our last production systems moved to z/OS 1.13, I found that we were seeing strange, very brief paging spikes. These spikes would last less than a minute, but often reached several thousand pages per second. They showed up very clearly in my graphs: even averaged over a 15-minute RMF interval, the paging rate was substantially more than the zero we usually saw on z/OS 1.12.
RMF III immediately showed that the paging was always in DUMPSRV and always coincided with our CICS regions taking dumps.¹ I opened a question with IBM, and they said this was expected: when DUMPSRV is processing a dump, it puts that memory at the top of the list of pages to be paged out. So if you have limited free memory, you would expect to see DUMPSRV paging heavily during dump processing. But it shouldn't cause other address spaces to suffer paging delays, and in my case that was true: DUMPSRV was the only address space that appeared to be affected.
But I was still curious, because we hadn't seen this behavior before, and the application was certainly taking those same dumps on z/OS 1.12. The RMF III STORR panel during problem intervals showed a very generous amount of available storage, typically on the order of 20 to 30% at the time the dumps were happening. So if we had multiple GBs of available memory, why page at all?
It turns out that IBM had already discovered this issue: SRM was not counting available 1M frames in its available frame count, whereas RMF III STORR does include 1M frames in its available percentage. So in our case, because I had increased the large frame area, we had relatively few free 4K frames but a whole bunch of free 1M frames. Since the 1M frames weren't on the available frame queue, SRM wasn't doing a good job of breaking them up into 4K frames, and so DUMPSRV caused these high, but brief, paging rates.
The fix is a new INCLUDE1MAFC option on the LFAREA parameter in IEASYSxx. It tells SRM to count the available 1M frames as available for 4K requests. After adding it on our test system, checking RCEAFC² confirmed that SRM was in fact including 1M frames in the available frame count. (Each available 1M frame is counted as 256 4K frames.) And a "D VS,LFAREA" a few hours after the IPL showed that more of the 1M frames had been broken up into 4K frames. We'll roll this change out to the production systems with their next IPLs, but since it isn't causing a significant problem, we're not scheduling IPLs specifically for it.
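As a back-of-the-envelope illustration of the 256-to-1 counting, the following Python sketch models the count with and without INCLUDE1MAFC. It is a simplification under stated assumptions: the function name is hypothetical, and it treats the count as nothing more than free 4K frames plus, optionally, 256 times the free 1M frames, ignoring everything else SRM takes into account.

```python
FRAMES_4K_PER_1M = 256  # each available 1M frame counts as 256 4K frames


def srm_available_frame_count(free_4k: int, free_1m: int,
                              include_1m_afc: bool) -> int:
    """Hypothetical, simplified model of the count SRM works from (RCEAFC).

    Without INCLUDE1MAFC only free 4K frames are counted; with it, each
    free 1M frame adds 256 to the count.
    """
    count = free_4k
    if include_1m_afc:
        count += free_1m * FRAMES_4K_PER_1M
    return count
```

With a hypothetical 100,000 free 4K frames (roughly 390 MB) and 6,000 free 1M frames (nearly 6 GB), the model gives a count of 100,000 without the option and 1,636,000 with it. The first number makes the system look short on storage even though several GB are sitting free in 1M frames.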
However, if you use INCLUDE1MAFC and you track your available memory via the RMF SMF type 71 records,³ you need to be aware of RMF APAR OA42510, which changes what RMF stores in the "available" and "unused" memory fields of the SMF data. Previously those fields simply reflected the value of RCEAFC and RCEAFC less SRM's "ok" threshold, respectively. With OA42510, RMF checks whether INCLUDE1MAFC is on. If it is, the available 1M frame count (times 256, to convert to 4K frames) is deducted from those numbers before they are written to the SMF71 record.
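The OA42510 adjustment described above can be written out as a small Python function. This is a literal reading of the text, not RMF's actual code: the function and parameter names are hypothetical, and it leaves out interval sampling and any other processing RMF does, so it should not be used to reason about the SMF71MNF/SMF71CAM threshold calculation.

```python
FRAMES_4K_PER_1M = 256  # 1M frame expressed in 4K frames


def smf71_available_and_unused(rceafc: int, ok_threshold: int,
                               free_1m: int, include_1m_afc: bool,
                               has_oa42510: bool) -> tuple[int, int]:
    """Hypothetical model of the 'available' and 'unused' values RMF records.

    Before OA42510: available = RCEAFC, unused = RCEAFC - ok threshold.
    With OA42510 and INCLUDE1MAFC: the free 1M frames (x256) are first
    deducted, so the fields reflect 4K frames only.
    """
    available = rceafc
    if include_1m_afc and has_oa42510:
        available -= free_1m * FRAMES_4K_PER_1M
    unused = available - ok_threshold
    return available, unused
```

Continuing with hypothetical numbers, if SRM's count is 1,636,000 (100,000 free 4K frames plus 6,000 free 1M frames) and the "ok" threshold is 50,000, the record would show 1,636,000 available and 1,586,000 unused without OA42510, but only 100,000 and 50,000 with it, even though SRM is still working from 1,636,000. That gap is the mismatch between the SMF71 data and SRM's view.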
I've been told the reasoning is that the SMF71 fields previously represented the available 4K frames, and with this change they still represent only the 4K frames, even when the system is including 1M frames in the available frame count.
My feeling is that the SMF71 fields previously represented SRM's view of the system and so reflected how SRM was managing it. With INCLUDE1MAFC and OA42510, the SMF71 fields no longer represent how SRM is managing the system. Furthermore, if you previously calculated the "ok" threshold as the difference between SMF71MNF and SMF71CAM, that calculation no longer necessarily holds.⁴ (It may still hold, but that depends on how many of the free frames are 1M vs. 4K.)
I opened an RFE to suggest that they include sufficient data in the SMF71 records to indicate what numbers SRM is using to manage the system. If you agree, you can vote on that RFE here:
However, despite my issues with the SMF71 data, I believe you should consider using INCLUDE1MAFC if you're over-allocating your 1M frames in preparation for them being used more in the future.
On a personal note, after over 20 years at American Electric Power, I'm leaving to join forces with Peter Enrico at Enterprise Performance Strategies. I'm really excited and looking forward to the opportunity to talk to more people about their mainframe performance challenges. As I transition into my new role, these WILTM columns might take a brief hiatus, but I hope to continue them in the not-too-distant future. By the time you read this, I should be reachable at [email protected].
¹Yes, your application team should fix the cause of recurring dumps, because they cause system overhead and indicate that something is broken. But business priorities for developer time don't always favor reducing system overhead.
²This is the field in the RCE control block that shows SRM's available frame count.
³Fields SMF71CAM, SMF71MNF, et al.
⁴See also this great SHARE presentation from Z. Meral Temel: http://proceedings.share.org/client_files/SHARE_in_Atlanta/Session_10592_handout_2563_0.pdf