Keep It Simple Stupid
I’ve been spending a lot of time recently working on performance tuning projects. Sometimes the BI apps are slow, sometimes it’s custom, sometimes it’s a mix. I’ve gotten the chance to see what works in both Oracle and SQL Server.
My conclusion about both of these databases is that they are like a cat or dog that gets fooled when you play hide the ball; they aren’t very smart sometimes. The only way you can really truly ensure database engines, even modern advanced ones, do things the right way to is to make it as simple and easy for them to understand as possible. I guess this is nothing new; the KISS principle comes from Kelly Johnson of The Lockheed Skunk Works, the guys who created the SR-71 Blackbird spy plane. I think he knew a few things about complexity in systems and how they tend to break or become difficult to maintain. (BTW more on that topic as it pertains to OBI and the BI Apps at a later date.)
As I’ve been looking at performance tuning many reports and queries over these days, I find that a lot of time is spent trying to get the database to do the smart thing. Too much time in fact. Usually this is due to some small piece of non-simple SQL that causes the problem. In more unusual cases I’ve seen something on one table completely break down the query plan, even something that should be trivial and very innocuous.
<Vent>For example, not being able to use an index on a table with 2,000 records should not radically alter the query plan, but in fact it will do that on you. After you spend hours upon hours with it, after you’ve called up the DBA for help to dig into the extreme nitty-gritty details of the query plan, you make a change it works for that one query but not any others. Then you decide to write this article because you’ve spend 10 hours on something so minuscule that in the end doesn’t even work consistently. If only the query had been clean to begin with…</Vent>
In this brief article I’m just going to lay out a few things to consider to help make you system simpler for the database engine to understand and therefore do a better, faster job in answering a query.
Star Schema not Snowflake
The first and probably most important one is to make sure you use a simple, clean star schema model. Oracle and SQL server both have the ability to recognize such queries and models, and once they do that they have the ability to do some pretty specialized performance tricks called Star Transformations. If your model breaks down from this, it is no longer simple and whether or not the database optimizer will know how best to handle it or not is up in the air. Yes, sometimes it will do it correctly, but mostly it won’t.
Things that will break a simple star query pattern include:
- Snowflake data model, as with Financial Analytics. I’ve written on how to remodel the data model into a cleaner star here; it’s a lot of work to rewire the model but in the end the performance benefit is excellent.
- Complex views/Opaque views in your SQL: If your view does any sort of joins or has even a mild level of complexity to it, consider ETLing it into your tables properly
- Bridge Tables – model them at the lowest grain (the bridge table grain) for the majority of scenarios
- OBI Time Series – I just wrote about this one – for performance OBI’s time series SQL is terrible. If you think you may have a large data set and are worried about performance from day 1, avoid these like the plague and use the old fashioned technique
- Two Fact tables in a query: This is nearly always a sign that the data model is not working properly and/or OBI was not configured correctly. However I have seen cases where this happens, the results are correct. Unfortunately there is little change that database optimizer will get this one right. One fact table per query please.
- Circular Joins: These 99% of the time mean there is a modeling problem in the RPD. But if you are lucky and the results are correct even with the circular join, find a way to break it up so again you have a clean star schema join.
Map straight to fields
Anytime you start doing logic on a field in the OBI RPD or reporting layer, especially a dimensional attribute, your odds of using an index begin to diminish. There are two examples of how this might occur:
In the RPD you have the classic Full Name = “LAST_NAME” || ‘, ‘ || “FIRST_NAME” which is so commonly used as an example. However, suppose now this Full Name field is used in a prompt and/or a filter of a report. The database engine may not be able to use in index cleanly on it. For this simple example it seems overkill to pre-build it into the data model, but it will help our friend the database out by keeping it nice and simple.
Another example I just saw was functions on actual columns, such as:
WHERE IfNull(Region, ‘Unspecified’) <> ‘Unspecified’
By putting a function on a field, you are not going to be able to use a lookup index on it unless you have the exact function built into a function based index on the table. The above filter should be recorded to better use indexes:
WHERE Region <> ‘Unspecified’ AND Region is not null
Additionally, try to perform your calculations on the right hand side of the operator such that the database field is compared to a result instead of a function on the field is compared to some value. An example:
Date.Day > Cast((Cast(Left(VAR_TODAY,4) as INTEGER) - 1 *1000) + Cast(Right(VAR_TODAY),2 as INTEGER) as DATE)
Is better than:
Cast((Cast(Left(Date.Day,4) as INTEGER) + 1 *1000) + Cast(Right(Date.Day),2 as INTEGER) as DATE) < VAR_TODAY
The second example has no chance at all of using an index; furthermore it has to be calculated for each row as opposed to resolving to a constant early in the query.
Use Good Date Joins
Please stop doing bizarre joins with your date table. Stop with the math in the join, casting date to number to string and back to dates. Stop casting MCAL_WIDs into regular Calendar _WIDs with functions. Stop joining on date fields that you’ve added to the fact table.
Simply extend your calendar table with the appropriate fields as needed. For example, build in a MROW_WID into your W_DAY_D table to use the Multi-Calendar format – pre-compute it it the small dimension table and you can use a nice simple join on compatible indexed fields. (The Multi Calendar format has a prefix and suffix surrounding the YYYYMMDD format.)
Along with that, if you need to display a new format in a report, add the format to the table instead of doing it at query time. The last thing the database want to do is group by your monster case statement or function converting ‘YYYY / MM’ to ’YY – MM’. Filters on it have the same problem as above. Add a new field and throw an index on it please.
This also goes if you need to do some sort of range of x months based on a user selected prompt. Do not use subqueries to return the list of values that make up that range; be more clever about it. And when doing so, again make sure the calculation is on the right of the comparison so that the date table does not have any functions on it.
Finally, make sure all of your queries use the Fact Foreign key that is your partitioning key; do not use BALANCE_DT_WID for one set of queries and BALANCE_DT for another set. Of all of the foreign key joins, the one on time is the most critical. As partition elimination does not happen when you are not using the partitioning key, you must make sure to use the partitioning key at all times.
Outer Joins Should Be Outta Here
An outer join in your data model should be something that is done .001% of the time – for very rare exceptions. Outer joins not only cause OBI to choke to some degree by adding in more tables than are needed, they also hurt your database query performance.
In general, make sure you have full RI and can use equi-joins. If you think you have to use one, really take a hard look at it; can the records be made to be 1:M instead of 1:0,M? Can you use an OBI technique to simulate the outer join using level based metrics? These are all worthwhile things to look into.
If you are forced to use one, put it in a separate LTS and map only the metric you need to it so that it is used as an exception and not used in a general capacity.
So those were just a few simple yet effective things you can do from the beginning to make sure you aren’t sending overly complex queries to your poor database engine. Although you will receive no thanks from Oracle or SQL Server, you will from your users!