What I Learned This Month: IDAA Update

What I Learned This Month: IDAA Update

Author by Scott Chapman

American Electric Power

I first wrote about our IBM DB2 Analytics Accelerator (IDAA[1]) back in the November MeasureIT[2]. We still are working with it and are approaching a production implementation. We've learned a good bit in the process.

First, one thing to keep in mind that I didn't mention in November, perhaps because I didn't fully appreciate it: IDAA is not DB2. When trying to understand how the IDAA works or performs, the first thing you have to do is forget everything you know about DB2. The IDAA does work with DB2, but it's not DB2. If you're trying to understand IDAA and are thinking about how DB2 does a similar function, you're almost certainly thinking incorrectly. The IDAA almost certainly does it differently.

The second very important lesson is that if you're going to enable replication from DB2 into the IDAA to keep the data in the IDAA current with what's in DB2, you need to size the IDAA with replication in mind. We did not do that; we only considered our query performance and data size. When we did the initial study with IBM, their analysis indicated that all the queries we sent them would run at least 10x, and maybe 100x, faster in even the smallest of IDAAs. And the smallest configuration contained far more disk storage than we'd need for all of our data. The sales team suggested that we should consider a larger one because once it was in place people would probably start doing a lot more queries than they are today. But we weren't convinced of the business value for a larger accelerator, given the expected performance improvements in just the smallest box. Unfortunately, we didn't discuss replication performance.

As it turns out, replication can be a very significant workload in the IDAA. And if you have a requirement to keep operational tables updated with the changes that are happening in DB2, you need to carefully consider how much IDAA capacity you might need to meet your replication goals. Because we see or hear about the IDAA executing fantastically complex queries 100x faster (or sometimes even more) than DB2, it's easy to think of the IDAA as some magic box that can handle anything we can throw at it. It is, of course, not magic. In fact, it can take the IDAA minutes to apply changes that took DB2 seconds.

This has a couple of implications. First, if you want to run scheduled reporting queries in the IDAA immediately after the data was updated in DB2, you need to take some extra steps to make sure that the data was updated in the IDAA before you start your queries. Secondly, if you want to run operational queries against the current operational data, you need to understand how far out of date the data in the IDAA might be.

There are a couple of solutions for the first problem. One answer is to run the ACCEL_CONTROL_ACCELERATOR stored procedure with the "waitForReplication" command. The stored procedure will wait and not return until replication has committed changes in the IDAA that were committed in DB2 at the time the stored procedure was started. This has worked well enough in my testing, but I've heard that there may be a problem with it in certain situations.[3] The other possible answer is that if you know a particular update was applied to DB2, you can force a query to run in the IDAA[4] to look for that update. If you don't find it in the IDAA, then you need to wait and check again in a few minutes.

The second situation is a little more interesting. Data Studio will indicate a replication latency in generic terms such as "low" or "high", which doesn't provide the level of detail that most of us would prefer. The "getAcceleratorInfo" command for the ACCEL_CONTROL_ACCELERATOR stored procedure returns a value for "latency in seconds", which sounds great. Unfortunately, it's not obvious what this really means.

One might assume that the reported latency might be the average latency across all tables that are being replicated. Or one might assume it is the maximum latency recently experienced. However, it's more appropriate to think of it as the minimum latency value!

There isn't room here to explain all the details of how replication works, but there is a setting in the IDAA for the target latency. The important thing is that this latency target sets how much time the IDAA allows to pass before it starts to apply and commit the changes from DB2. So if you have it set to 5 minutes, the IDAA will allow 5 minutes to expire before starting to apply the changes from DB2. Latency is calculated when the IDAA has finished applying the changes as the difference between the IDAA commit time (after all changes have been applied) and the DB2 log timestamp of the last change that was committed. By definition, that ignores the time of the earlier changes.

If it takes the IDAA 150 seconds to apply the changes that occurred in DB2 across the previous 5 minutes then latency will be reported as 150 seconds. However, if there were changes committed at 1 second into the 5 minute interval, then those changes had to wait 449 seconds to be committed in the IDAA: 299 seconds for the latency period to expire, then 150 seconds for all the changes to be applied.

There are nuances that I'm not detailing here and likely even more details I don't understand, but the net result is that it is best to think of the reported latency as the minimum latency. The maximum latency is the reported latency plus the target latency time.

So why does the IDAA take longer to apply changes than DB2 does? The short answer is that while inserts into the IDAA aren't very expensive, updates and deletes are because the IDAA needs to find the affected rows. Unlike DB2, the IDAA does not have direct indexes that point to an exact storage location for a given key. Effectively, deletes and updates trigger the IDAA to scan the entire table to find the affected rows. How long it takes to apply deletes and updates is therefore a function of how large the tables are and how much hardware you have in the IDAA to handle the scanning. The combination of a "small" IDAA and "large" tables can result in substantial time to handle what is trivial in DB2.

Organizing keys appear to be critical to reducing the time it takes to apply updates in the IDAA. IBM has told me distribution keys can help significantly as well, but in our testing we have not seen a significant improvement with distribution keys.

In our case, I believe we will be able to usually keep the maximum latency to less than 10 minutes, but I don't believe that we will be able to keep it under 5 minutes during the normal course of the business day. Currently our calculated maximum latency is normally running about 8 or 9 minutes, but I have hope we can shave a few minutes off from that.

However, we haven't been running real large query workloads. We know that replication and queries will affect the performance of the other, but we're not yet sure how much.

So if you're considering implementing an IDAA for its query accelerating benefits and need to keep the data up to date with DB2, carefully consider how much load the replication might place on the IDAA. Likely you will want to size it larger than you would have just to satisfy your query work.

Finally, if you already have an IDAA, carefully review the results of queries that involve joins that use the columns in the organizing keys. We have seen cases of incorrect results in this situation. We had thought that this was Netezza issue 78814 which was supposed to be fixed in NPS 7.0.4.4, with IDAA v4.1 PTF2. However, as I write this it appears that the problem remains. The Netezza issue description indicates the problem only occurs with EBCDIC data, but I expect most z/OS DB2 shops are still storing application data in EBCDIC. While I don't know what triggers the bug, I do know that we've seen it across multiple queries, and I've heard from another customer that also saw the problem. So at the moment, I suggest taking a close look at the results of any queries with joins that involve the columns that are also in organizing keys. Unfortunately, organizing keys appear to have a significant positive impact on replication performance for our workloads, so we can't easily just remove all the organizing keys. But if you aren't using replication or have a larger IDAA, maybe you won't need them.

As always, if you have questions or comments, you can reach me via email at [email protected].

[1] "IDAA" is apparently no longer IBM's preferred abbreviation, but many people understand what we mean (at least in this context) and continue to use it, so I will too.

[2] See http://www.cmg.org/publications/measureit/2013-2/measureit-november-2013/ for background information on what the IDAA is.

[3] I believe the situation was related to running in a DB2 that is effectively dormant, not processing any further updates at the time the stored procedure was run, but I'm not sure of that. I suggest testing your use case.

[4] Do a "set current query acceleration = all" to force the query to run in the IDAA, or return an SQL error.