Minimalist Design for the Criteria Tab
The picture above is an example of minimalist design. It features an uncluttered and simplistic visual scheme that is almost relaxing to the eye and mind. Optional or unnecessary elements are removed. It focuses on only the bare essential elements – the essence of the building or room’s function. Notice there isn’t even a towel rack.
It’s not really for me – it reminds me too much of a concrete future dystopia that I’ve seen too many times in sci-fi movies. It also doesn’t look comfortable at all. But hey that’s just me; I could be wrong.
However when it comes to OBI report design, minimalist is the only way to go when it comes to the Criteria Tab. Whereas architecture and design are subjective, when it comes to keeping the criteria tab minimalist there is a strong, demonstrable benefit to this philosophy. It’s so important that this is the very first thing I look for when I am called in for a performance healthcheck review.
This post will take a quick look into why.
I’m going to demonstrate the concept using a very simple case that in all reality would probably not cause much of an issue in the real world. What you need to do is to extrapolate this concept to other situations that you might encounter and identify how it could become a problem on your dataset.
When you look at this chart (based off of the OOB Usage Tracking Subject Area), how many rows would you expect to have been returned from the database?
If you said 7 you are correct.
We expect the core of the query to look like this:
Select YEAR_MONTH, COUNT(*) FROM FACTS, DATE_TABLE GROUP BY YEAR_MONTH;
If however the criteria tab happened to look like this:
We actually see this in the query log:
select count(T374106.QUERY_TEXT) as c1, T374138.DAY_DT as c2, T374138.PER_NAME_MONTH as c3, T374138.CAL_YEAR as c4, T374138.MONTH_NAME as c5 from S_ETL_DAY T374138, S_NQ_ACCT T374106 where ( T374106.START_DT = T374138.DAY_DT ) group by T374138.DAY_DT, T374138.MONTH_NAME, T374138.CAL_YEAR, T374138.PER_NAME_MONTH) Rows 166, bytes 65072 retrieved from database query id: <<2830774>> [[
Woa. Expected 7 rows, actually returned 166 rows. Why?
The query includes the day date in the group by is the reason why. As there are roughly 30X more days than months (give and take based on actual usage records), the resultant dataset is that much larger. We’ve asked OBI to run the query at a lower level of grain than we actually need.
Think about that for a minute – what impacts will this have to your performance? Here are some impacts that this causes:
- The Database has to do a group by operation on a larger data set
- Any sorting on that larger dataset or other subsequent calcs or outer selects will be slower
- All of this work impacts overall database load and I/O
- If you are using database result cache it fills it up more quickly
- The larger dataset has to be transmitted over the network to the BI server
- Since the report is at the Month level but the data is at the day level, the BI server now has to do its own group by aggregation to get to the correct grain of the query
- This of course adds delays and load to the BI server
- It also fills up the BI cache faster
Now with this tiny example the differences between the correct query and the bad one are tiny. However, what if you had a 4,000 X difference in grain? Or a very large dataset? Or had to bring back many wide, descriptive fields?
Even worse, what if there were other dimensions that were on the criteria tab and cause the database to needlessly join to them? This in particular can be the largest impact to overall performance.
I have a real world example of how bad this can get: One customer had a report on an Exalytics that was taking over one minute to run. The reason? It was returning 1.2 Million rows from the database when it only needed a few hundred.
So, once we remove the extra stuff from the criteria tab, namely the Day column, the query is what we expect it to be:
select count(T374106.QUERY_TEXT) as c1, T374138.PER_NAME_MONTH as c2, T374138.CAL_YEAR as c3, T374138.MONTH_NAME as c4 from S_ETL_DAY T374138, S_NQ_ACCT T374106 where ( T374106.START_DT = T374138.DAY_DT ) group by T374138.MONTH_NAME, T374138.CAL_YEAR, T374138.PER_NAME_MONTH) Close Row Count = 7, Row Width = 232 bytes [[
Even with this simple example, the data volume returned to OBI is 232 bytes vs 65,072 bytes. That’s a 280X improvement! Again, think about some of your real world examples and how they can be much worse.
Know What OBI Does Before It Does It
This whole post gets down to 2 fundamental OBIEE design and development concepts I have been preaching about for years:
- Know what the SQL is supposed to be before you even look at it
- The SQL is the Truth
How do you know OBI is doing the right thing unless a) you know what the right thing is and b) you actually look at what its doing? Performance is the least of your concerns here – correctness and accuracy are the primary concerns – performance comes in after correctness. Without these two principles, your OBI system is nothing but a black box and you really don’t know what its doing at all.
For performance checking, always look at the group by clauses in the SQL – if the don’t match your output views then you very likely have a Criteria Tab that needs to go on a diet.