Capacity Planning and Performance Software Procurement Strategies

By Dave Halbig

Performance IT Architect, First Data Corporation

Every organization has faced the prospect of buying IT tools. While this paper uses capacity planning and performance management as examples, the principles apply broadly to all buying efforts. The effort can be time-consuming and contentious, with vendors sometimes driving the process. The author proposes a solution-agnostic method based on Use Case Methodology that has proven very effective over the 5 times he has either driven or advised its use. The presentation is divided into a) requirements gathering, b) requirements prioritization, c) conversion to Use Cases and d) managing the vendor review and selection process.

Background

At one time or another in your career, you will either run a procurement or participate in one as a vendor or key staff member. On the face of it, it seems simple, right? Just put a bunch of requirements into an email, send them to a bunch of vendors, and wait for bids to show up. Not so fast. Just about anything that can go wrong with a procurement will go wrong. Your ‘requirements’ might just be a regurgitation of features from a particular vendor, in which case you might as well hold a coronation for the anointed vendor and not waste everyone’s time.

On the other side, if the process does not fully acknowledge and engage all the stakeholders, those who feel ‘their’ vendor lost because of biases in the selection team will forever plague you with passive resistance during the eventual rollout and use of the winning product. In government circles, losing vendors make a sport out of challenging awards to get yet another bite at the apple. And, miracle of miracles, the product you selected might not do what you thought it would... and you live with the gnawing pain of buyer’s remorse and an expensive white elephant.

So, what to do?

Ultimately, products you select and use are intended to solve a known or presumed set of problems. The author’s Use Case driven approach exploits a main characteristic of Use Case methodology: defining a Change In State. In the case of Performance and Capacity Planning tools, the Change in State is from No or Little Knowledge of a performance root cause or needed capacity, to Specific Information about one or the other.

Let’s visit a couple of examples:

Capacity Planning

A simple Capacity Planning Use Case is “predict the necessary hardware (CPU, Memory, Disk, Network) to support an accounting system with 50 users”.

With the Use Case-driven approach, you provide basic information about the usage scenarios of the users (1,000 acct queries/hr, 500 acct updates/hr, 200 invoice prints/hr, etc.) and then stipulate the Success End Condition (Publish the overall hardware profile (CPU, memory, disk, network) to support the workload). You MAY offer to provide various utilization stats and performance stats, but you may also be intentionally vague, forcing the vendor to demand certain data collection from you. Don’t presume you know which metrics need to be collected to develop useful estimates.

Performance

A simple Performance Use Case is ‘Discover Source of Response Time Delay in a Distributed Architecture’, where you state various pre-conditions (locked DB row, high disk response time, saturated CPU, serialization delays in a JVM due to lack of threads, etc.) and again stipulate the Success End Condition (Correct identification of the source of delay). Each of the pre-conditions is assumed to be unknown to the analyst / account when the Use Case is initiated (hence the importance of the ‘Discover’ objective for the vendors).

In neither case do you tell the vendors how to do it. That is for them to fill out in the body of the Use Case (Success Scenario). And you give them a score card of how their response will be measured. Examples used by the author include Total Cost of Ownership (of the solution), Clarity, Comprehensiveness, and Simplicity. At the end of the day, this is the best approach for all parties, since it gives the creative/innovative vendors a chance to spread their wings, and it removes (actually, forbids) any burden on your side to be creative. Your job is to demand results/change of state; theirs is to solve the problem in the most creative/cost-effective way. Don’t stand in their way.

Requirements Gathering

The requirements gathering and ranking are the toughest part of this process, since the rules of a Use Case driven approach force the stakeholders to do something very (very) hard: Don’t discuss how something is to be done. That is, it is quite natural for engineers, with years of training and experience, to exploit that background in communication with the vendor community. Examples are: ‘provide a time-series of CPU busy with curve-fitted function allowing forecasting…<blah, blah, blah>’. Wow...you just told the vendors how you want it done.

Don’t.

Just Don’t.

Here’s why:

Several very negative effects occur if you allow the stakeholders to get wound up in the ‘how’. In any specialized industry (certainly performance tools and capacity planning tools qualify here), all the players know each other and the implementation strategies of their products. If you state ‘how’, you’re showing your hand as limiting your choices, intentionally or not, and denying certain products/vendors. This is a great way to drive away vendors and reduce the competitiveness of your process. In one case, this author had to beg some of the vendors to come to the bargaining table when they felt the contract was ‘wired’. The ‘how’ language (inadvertently, it turns out) showed our hand (the offending language was subsequently removed).

Stakeholders in your organization who propose ‘how’ become wedded to that particular way of doing business. This is the genesis of the Religious Wars that pervade unsuccessful procurements. If you want to keep peace in the family during the procurement and avoid the passive-aggressive behavior common among ‘sore losers’ after contract award, do not let them state ‘how’. These sore losers will make your subsequent implementation and use of products a pure hell.

Instead, force the stakeholders to state their requirements in End-State conditions:

Capacity Planning

Capacity Planning is the fine art of estimating and projection, with an objective of least-cost solutions for maintaining service levels (throughput, response time, batch deadline, availability). So, the stakeholders might reasonably ask for a Use Case Success End Condition of “List of hardware specifications (CPU, memory, Disk, network) to sustain the workload listed in the Pre-Condition at a response time target of ½ second”.

This last example is simple and easy to understand. A more difficult and chronic one exists in executive presentations of capacity planning data. Consider a Use Case Success End Condition that requires: “Credible presentation of hardware requirements for peak processing based on Utility Demand Model”. This Success End Condition sounds almost lame, or even a bit hard to understand. An explanation is in order. Executives dependent on, but not versed in, IT often misunderstand why so much idle capacity seems to exist on their machine room floors while they are pressed for hardware upgrades every quarter or so.

The disconnect lies with how online workloads must be accommodated. Consider the electric utility industry, which is the genesis of the Utility Demand Model. It must provide sufficient generation capacity for that single peak hour at, say, 5PM on July 24 when everyone (in the Northern Hemisphere) turns on their home air conditioners at the same time. The lion’s share of the utility’s planning and capital expense is for that single hour. It turns out IT is exactly the same. Typical online peak workloads for retail financial transactions are for 2PM on the Friday after Thanksgiving (Black Friday) or the following Monday (Cyber Monday) when shoppers are bringing their purchases for settlement at cash registers and web sites. The rest of the time, the IT capacity sits largely idle.
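To make the peak-versus-idle argument concrete, here is a minimal sketch of the arithmetic behind the Utility Demand story. The hourly transaction figures and the function name `peak_sizing_summary` are invented for illustration and do not come from any real workload or tool.

```python
from statistics import mean

def peak_sizing_summary(hourly_load):
    """Summarize a load profile that must be sized for its single busiest hour.

    hourly_load: list of transactions per hour (illustrative numbers only).
    """
    peak = max(hourly_load)
    average = mean(hourly_load)
    return {
        "peak": peak,
        "average": average,
        # Fraction of peak-sized capacity that is actually used on average.
        "average_utilization": average / peak,
        # Zero-based hour of day at which the peak occurs.
        "peak_hour_index": hourly_load.index(peak),
    }

# Hypothetical 24-hour profile with a sharp early-afternoon peak.
profile = [100] * 13 + [400, 1000, 400] + [100] * 8
summary = peak_sizing_summary(profile)
print(f"Peak hour {summary['peak_hour_index']}: {summary['peak']} tx/hr")
print(f"Average use of peak-sized capacity: {summary['average_utilization']:.0%}")
```

A profile like this one is what an executive sees as "idle capacity": the hardware is sized for one hour and lightly used for the other twenty-three.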

Having the capacity planning tool highlight peak demand periods and help tell the Utility Demand story is worth any amount of money you can pay for that capacity planning tool. Don’t tell the vendor how to solve the presentation problem. Let him tell you. Let him be creative.

Performance

Performance is the separate fine art of avoiding, or if not avoiding, then isolating and remediating slowdowns (and more than occasionally, outright outages) in computer systems.

As an interesting aside, by avoiding the urge to specify ‘how’, you can give serious consideration to vendors’ creative and useful suggestions. In one case, one of the vendors was so far out in left field we would not otherwise have understood, much less considered their approach. However, with Use Case methodology forcing them to describe and eventually physically demonstrate how they would solve a key Use Case, we became owners of, and fierce, if belated, believers in, their solution. Typical prescriptive (‘do this/do that’) procurement language, would have excluded them in the first round.

Another guideline in the Requirements Gathering Session is to not restrict or constrain requirements proposed by the stakeholders. If someone proposes a Success End Condition of “Comfortable retirement for IT workers”, let it stand (for now). You know it’s a nonsense/non-relevant requirement. The important part is not constraining what is essentially a brainstorming effort. Avoid making judgments on the proposed requirement. The only filter is to ensure the requirement is not somehow specifying ‘how’.

Requirements Ranking

When you’re done with a successful requirements gathering session, you’ll have waaay too many requirements for managing or going to the vendors. You’ll have nonsense / protest requirements from elements of the stakeholder community. You also don’t know which ones are important to your stakeholders and which are not.

How to choose/rank?

The most effective is the ‘bucket of points’ or ‘bucket of orange dollars’ approach. In this approach, each voting stakeholder is given a fixed number of points or ‘orange dollars’. For sake of example, let’s use 40 per stakeholder. If a stakeholder is ferocious about a particular requirement, the stakeholder can assign a lot of points or orange dollars to that requirement. This assignment of additional points essentially guarantees the requirement makes it to the final list. If a stakeholder is not particularly interested in a requirement, that requirement gets zero points or orange dollars. But the stakeholder cannot ‘spend’ more than the 40 points/’dollars’ that he/she was given.
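The bookkeeping for this vote is simple enough to run in a spreadsheet, but a short sketch makes the rules explicit. The stakeholder names, requirement titles, and the alphabetical tie-break below are assumptions made for the example, not part of the author's method.

```python
from collections import defaultdict

BUDGET = 40  # points ("orange dollars") per stakeholder, as in the example above

def tally(ballots, budget=BUDGET):
    """Sum points per requirement, rejecting any ballot that overspends.

    ballots: {stakeholder: {requirement: points}}
    Returns a list of (requirement, total_points), highest total first.
    """
    totals = defaultdict(int)
    for stakeholder, allocation in ballots.items():
        if any(points < 0 for points in allocation.values()):
            raise ValueError(f"{stakeholder} assigned negative points")
        spent = sum(allocation.values())
        if spent > budget:
            raise ValueError(f"{stakeholder} spent {spent}, budget is {budget}")
        for requirement, points in allocation.items():
            totals[requirement] += points
    # Rank by score; ties are broken alphabetically (an assumption) for repeatability.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical ballots; all names are made up.
ballots = {
    "alice": {"peak-hour hardware forecast": 30, "executive-ready presentation": 10},
    "bob": {"isolate response-time delay": 25, "peak-hour hardware forecast": 15},
    "carol": {"isolate response-time delay": 20, "comfortable retirement": 0,
              "executive-ready presentation": 20},
}
for requirement, points in tally(ballots):
    print(f"{points:3d}  {requirement}")
```

Note that the nonsense requirement is never rejected up front; it simply attracts no points and sinks to the bottom of the list.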

I also suggest this voting occur in a public forum, such as a meeting room with someone filling out a spreadsheet projected on a common wall (digital projector). All of this forces some interesting group dynamics:

The Religious Warriors can vote their convictions and ensure survival of requirements that are important to them. This goes a long way towards defusing future passive/aggressive behaviors for things not to their liking.

The same passionate stakeholders cannot browbeat or dominate outcomes. You (as a stakeholder) spend your 40 points/’dollars’; that is the end of your influence. Sit down.

Nonsense/protest requirements (see above) fall away naturally, but without having restricted the brainstorming that is so crucial to the collection of valuable requirements. When all stakeholders realize there’s no net underneath them, and that they’re being expected to act like adults, they generally do.

Of course, not all votes, and certainly not all early votes, go smoothly. So this author suggests 2-3 rounds of voting, with the final vote being definitive. The multiple rounds allow first-time participants in the voting process to see group dynamics in action; specifically, how their vote stacks up against others on the team. There are group effects that then affect later voting rounds. After 3 rounds, you and the other team members can be pretty sure everyone has voted their conscience / beliefs and will not be up for second-guessing themselves or their peers.

One dynamic this author has seen at least once is clique voting by individual stakeholder communities. This occurs when the procurement spans several stakeholder communities such as application development, capacity planning, performance, and Service Level Reporting. While this author does not have completely satisfactory answers for this, you must at least demand the parties voting be physically present (that is, no voting by proxy) in order for their votes to count. Proxy voting is a no-no because of its power to distort the voting process and place exceptional power in the hands of those presumed to hold the proxies (they don’t always).

Once everyone has voted their points/’dollars’, then it is a simple matter of force-ranking the requirements by score and getting them ready for the procurement process.

In rough summary, the process is open and democratic and therefore is considered ‘fair’ by all, winners and losers alike. This author remembers one gratifying comment from an otherwise opinionated, very intelligent and active stakeholder after this and other later phases were over. He said his initially favored solution/vendor did not win, but he felt he had been treated very fairly, felt a competent solution had been chosen, and would work with us to implement it. Success!

Conversion to Use Cases

Types of Requirements

Before we get into Use Cases in particular, we need to understand various categories of requirements and why we care to make the distinction.

The requirements put in front of solution advocates/defenders (essentially: vendors) are in two broad categories:

  • Non-functional requirements, and
  • Use Cases

First, let’s define these types of requirements:

Non-Functional Requirements: “Non-functional requirements are requirements which specify criteria that can be used to judge the operation of a system, rather than specific behaviors. This should be contrasted with functional requirements that specify specific behavior or functions (which will be defined as Use Cases). Non-functional requirements are often called qualities of a system. Other terms for non-functional requirements are "constraints", "quality attributes", "quality goals" and "quality of service requirements". Qualities, a.k.a. non-functional requirements, can be divided into two main categories.

(1) Execution qualities, such as security and usability, are observable at run time.

(2) Evolution qualities, such as extensibility and scalability, embody (sic) in the static structure of the software system.” [Wiki2008][1]

To describe your provider capabilities on the requested non-functional requirements we will ask you to respond with yes/no or narrative answers, or both, depending on the question format. 

Use Cases: In traditional RFPs, this category would be known as functional requirements.

Use Cases are essentially highly structured stories that allow stakeholders to communicate requirements and solutions in non-technical jargon.

This author has used them partly for that non-technical-jargon advantage. But their real strength is carrying forward the requirements from the earlier phases in this paper while isolating the ‘how’ of the solution. While we avoided the ‘how’ during the requirements gathering/ranking phase mainly to avoid the religious wars, avoiding the ‘how’ here has slightly different purposes.

When stating requirements to solution advocates/defenders (read: vendors), you want them to know a) that you have not already decided on a solution and are not merely going through the motions (mentioned earlier in the paper), and b) they are expected to use the most appropriate / creative solution possible to meet the requirement. To this last point, giving vendors complete freedom to meet the requirement has in the past created, in the strong opinion of this author, entirely unexpected and break-through solutions to problems.

While fully-developed Use Case examples suitable for a procurement document are outside the scope of this paper, such a sample RFP document and supporting artifacts are available from the author at: [email protected]. That said, we’ll discuss enough of the Use Case artifacts to allow an understanding of how to translate functional requirements into Use Case formats.

Each of the attributes below needs to be completed for each Use Case. This first group is filled out by the organization holding the requirements (RFP team, as an example).

Since a Use Case is all about changing states of a system, let’s organize this by Beginning State and End State:

Beginning State attributes – before the Use Case is executed. 

Preconditions: This would represent the Operating Environment for the data collection (in the case of capacity planning and performance reporting). An example might be ‘Linux OS environment with all performance/ availability monitoring tools in place and reporting’.

Trigger: This is the set of circumstances that causes the Use Case to launch. In the case of capacity planning, it might be a certain transaction rate during peak hour (example: planning was for 40 transactions per second for a certain system, and that system reached 35 TPS during peak hour last Thursday – time to consider upgrades; in the case of performance management, response time exceeded the threshold of 2 seconds at the 90th percentile for more than a 5 minute period – time to get the performance team involved). 
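Both example triggers reduce to simple threshold checks. The sketch below is one possible reading of them: the 0.85 alert fraction is an assumption (the text only says that 35 TPS against a plan of 40 is close enough to act on), and the per-minute 90th-percentile samples are invented.

```python
def capacity_trigger(observed_peak_tps, planned_tps, alert_fraction=0.85):
    """True when the peak-hour rate reaches an assumed fraction of the planned rate."""
    return observed_peak_tps >= alert_fraction * planned_tps

def response_time_trigger(p90_per_minute, threshold_s=2.0, min_minutes=5):
    """True when the per-minute 90th-percentile response time stays above
    threshold_s for more than min_minutes consecutive minutes."""
    run = 0  # length of the current streak of over-threshold minutes
    for p90 in p90_per_minute:
        run = run + 1 if p90 > threshold_s else 0
        if run > min_minutes:
            return True
    return False

# Capacity example from the text: planned for 40 TPS, reached 35 TPS at peak.
print(capacity_trigger(35, 40))

# Hypothetical per-minute p90 response times in seconds.
samples = [1.2, 2.4, 2.6, 2.5, 2.9, 3.1, 2.2, 1.0]
print(response_time_trigger(samples))
```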

End State attributes – after the Use Case has executed. 

Success End Condition: The successful end condition depends on the Use Case. Take a typical Capacity Planning Use Case: at a high level, a Success End Condition would give capacity planning decision makers clear and compelling evidence for your analysis and recommendations (examples: specific CPU, Memory upgrades). If you are dealing with Performance Management, you must provide clear and convincing evidence for your fault isolation or root cause analysis (example: sluggish DB response time – Oracle DB instance shows Buffer Hit rate is below 90%, and recommendation is increase in SGA size).

Failed End Condition: In Capacity Planning Use Cases, this is sizing a system with too little capacity (business operations risk) or too much capacity (unnecessary use of funds), or making no recommendation at all when a capacity increase is, in fact, required. In Performance Use Cases, this is the inability to isolate the component causing delays or outage, naming too many possible at-fault components, or incorrectly identifying the at-fault component.
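The attributes above can be captured in a small record so that every Use Case sent out has the same shape. This is only a sketch: the class name, field names, and the `ready_to_issue` check are illustrative assumptions, and the scenario field is deliberately left empty because the ‘how’ belongs to the vendor.

```python
from dataclasses import dataclass, field

@dataclass
class UseCase:
    """One Use Case as sent to solution advocates/defenders."""
    title: str
    # Filled out by the requirements owner (for example, the RFP team).
    preconditions: str
    trigger: str
    success_end_condition: str
    failed_end_condition: str
    # Filled out only by the vendor; the issuer leaves this empty.
    main_success_scenario: list = field(default_factory=list)

    def ready_to_issue(self):
        """All issuer fields are present and the 'how' has been left blank."""
        issuer_fields = (self.preconditions, self.trigger,
                         self.success_end_condition, self.failed_end_condition)
        return all(f.strip() for f in issuer_fields) and not self.main_success_scenario

uc = UseCase(
    title="Discover Source of Response Time Delay in a Distributed Architecture",
    preconditions="Linux OS environment with all performance/availability "
                  "monitoring tools in place and reporting",
    trigger="Response time exceeded 2 seconds at the 90th percentile "
            "for more than 5 minutes",
    success_end_condition="Correct identification of the source of delay",
    failed_end_condition="At-fault component not isolated, or incorrectly identified",
)
print(uc.ready_to_issue())
```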

Main Success Scenario – what actually happens when the Use Case is executed

Just as a reminder, this part of the Use Case is filled out by the solution defender (example: incumbent vendor) or solution advocate (example: competing vendor); it is NEVER (ever) filled out by the requirements owners (example: issuer of RFP). That said, here are some basic rules for those that do, in fact, fill out the main success scenario (solution advocates/defenders). Here are their instructions:

While traditional Use Case responses are narratives / text-based, <ABC Company> is encouraging you to go beyond text and use screen shots from your solution to complete this section. Narrative can then be added to the scenario to make any additional points you wish to the reviewers.

So, for example, if you are dealing with a Capacity Planning Use Case, such as projection of hardware for a specific transaction mix at, say, 2PM on Thursdays, you need to present / explain CPU, memory, disk and (optionally) network information.

As a separate example, if you are dealing with a Response Time Degradation Use Case, you need to explain how you recommended a particular component as causing the delay or being the root cause.

Your individual Use Case responses will be graded on:

Clarity – how well can the response be understood by someone viewing the response for the first time.

Completeness – how well can someone viewing the response for the first time understand the progression of events / underlying thought process (if necessary to make progress through the scenario)?

Simplicity – how many steps and use of help files (if required) are necessary to get to the Success End Condition?

Total Cost of Ownership – how much use is made of your product to get to the Success End Condition, and how much depends on other non-product services and commands? What labor, hardware and other 3rd party software are necessary to configure and operate your solution? (This is over and above what <ABC Company> has already invested as sunk costs.)

Final Steps in preparing the RFI/RFP

So, at the end of this phase, you’ll have a list of non-functional requirements and a bunch of Use Cases. As a (first) Rule of Thumb, try to make the non-functional requirements response list a Yes/No response (example: must provide GUI interface via Web Browser – Yes / No). As a (second) Rule of Thumb, try to keep the number of Use Cases below 20; otherwise it takes the solution advocates/defenders too long to fill them out and it takes your internal teams too long to evaluate the results.

So… which vendors / solution advocates do you send the RFP/RFI to? Obviously, if you are a long-standing member of CMG <ah…hem>, you’ll have a list of vendor sponsors you can review. Second, professional evaluator organizations such as IDC, Forrester and Gartner can provide lists. A third, but less reliable/consistent, venue is personal contacts. In all prior engagements, this author has used a blended model of all three. As a general rule, if you can stand the deluge of responses, more participation is better.

Do not forget the incumbent Solution Defender… you may simply be using their product the wrong way… you need to let them play and be creative (or, Heaven Forfend, use the current version of their product if you happen to currently be back-level).

Give the solution advocates a decent interval in which to respond, up to a month. Expect to hold at least two solution advocate Q&A sessions to allow the vendors to ask clarifying questions or to propose changes to the RFP. This is deeply to your advantage. The key is making sure the solution advocate Q&A process is anonymous among them. Also, given that responses, or communications generally, can be quite large because of the insistence on graphics/screen shots, set up a secure upload site for each solution provider’s responses.

Managing the Vendor Review and Selection Process

This part requires as much discipline as the earlier parts, but a lot less creativity. There are some broad guidelines:

Ask your executives not to interfere or, if they are really looking for a positive result, not even to ask. (This is tough to enforce, since the vendors’ sales guys will subject them to saturation bombing of invitations to lunch, dinner, golf, and sporting events.) It’s a lot to ask, but it does materially affect the outcome.

Tell the vendors to work through a single point of contact in your organization. Your contracts people may already enforce this, but it is crucial to enforce Project Silence during this period. As a personal experience, this author has not had to pull a vendor from a procurement, but be prepared to do so if their behavior (going around the Point of Contact, button-holing executives) is so egregious that they put your procurement credibility at risk.

Keep your vendors well-informed through BCC: copies of activities that affect them all. And do allow vendor questions. This often exposes things that the procurement team has not thought through, or would add to the success of your effort. Private communications with individual vendors are of course off limits.

Make sure the vendor responses are partitioned into ‘technical’ and ‘business’, where ‘business’ is the financials of the bid. The stakeholder team should not review or even be aware of the financial components of a vendor response. If you’re a governmental organization, this rule is generally well-enforced.

With these guidelines in place, the formal evaluation of responses can start.

First, if there is a pricing proposal embedded in the response, make sure it is set aside and handed to someone from the procurement organization. Technical evaluations should not be clouded by financial considerations. The blended evaluation of technical and cost will occur later.

Second, make all the responses available to the evaluation team on a secure common-access site, such as SharePoint.

Third, set up an evaluation matrix which includes all the solution providers with approximately the following format:

[Table 1: evaluation sheet (table image not reproduced)]

In this model, each person (no proxy votes, please. See above comments) fills out an evaluation sheet such as is shown in Table 1. The scores are summed and the numerically high scoring evaluations take the day.

There is a variation on this: Most procurements can weight the scores evenly (that is, the Clarity score has the same impact as Total Cost of Ownership (TCO)). However, the evaluation team can decide ahead of time to weight certain criteria more heavily than others. At the end of the day, it then becomes a simple mathematical exercise to find the high-scoring technical solution.
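The summing and optional weighting can be sketched as follows. The criteria names come from the grading list earlier in this paper; the vendor names, the 1-to-5 scoring scale, and the example weights are assumptions for illustration only.

```python
CRITERIA = ("Clarity", "Completeness", "Simplicity", "Total Cost of Ownership")

def rank_vendors(sheets, weights=None):
    """Sum every evaluator's scores per vendor, optionally weighting criteria.

    sheets: list of {vendor: {criterion: score}}, one dict per evaluator.
    weights: {criterion: weight}; defaults to equal weighting.
    """
    weights = weights or {c: 1.0 for c in CRITERIA}
    totals = {}
    for sheet in sheets:
        for vendor, scores in sheet.items():
            weighted = sum(weights[c] * scores[c] for c in CRITERIA)
            totals[vendor] = totals.get(vendor, 0.0) + weighted
    # Highest total first.
    return sorted(totals.items(), key=lambda item: -item[1])

def sheet(a, b):
    """Build one evaluator's sheet from two tuples of scores in CRITERIA order."""
    return {"Vendor A": dict(zip(CRITERIA, a)), "Vendor B": dict(zip(CRITERIA, b))}

# Two hypothetical evaluators scoring two hypothetical vendors on a 1-5 scale.
sheets = [sheet((5, 4, 5, 2), (3, 4, 3, 4)),
          sheet((5, 4, 4, 2), (3, 4, 4, 5))]

print(rank_vendors(sheets))  # equal weights
tco_heavy = {"Clarity": 1, "Completeness": 1, "Simplicity": 1,
             "Total Cost of Ownership": 2}
print(rank_vendors(sheets, tco_heavy))  # weights agreed ahead of time
```

With these made-up scores the leader changes once Total Cost of Ownership is weighted double, which is why the weights need to be agreed before anyone sees the responses.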

Summary and Conclusion

The author’s opinion of running procurements is that it compares favorably to herding cats. Getting a lot of head-strong, intelligent, and already-stressed stakeholders to work with you for a period of weeks or months is tough. But unequivocally, the process outlined above will in fact allow you to select very worthy vendors. Most importantly, it retains team cohesiveness and cooperation well after the award has been made. Avoiding even the hint of favoritism and emphasizing transparency within the team pays huge dividends.

[1] [Wiki2008] Wikipedia, “non-functional requirements”, accessed at: http://en.wikipedia.org/wiki/Non-functional_requirement on 11Feb2008.