Achieving Good Performance With OBIEE
Originally posted 1/30/2007
How do you extract good performance from Oracle Business Intelligence (OBIEE)? Are there any tool-specific features that can be used to achieve better performance, or is it all about the backend? This post will discuss some things to consider to make a page or report perform well; scalability is not covered here.
Division of Labor
The first thing to recognize with OBI/SA is where the heavy lifting gets done. OBIEE is a ROLAP engine, which means it is a SQL generation engine. As such, the backend database does the vast majority of the work in most normal environments. Take a look at the Usage Tracking feature of OBI or the log file, and you’ll notice that for longer queries, a greater percentage of the total time is spent waiting on the database. There are a few other things that may slow down your system that have nothing to do with the database, which we will discuss in a moment. But for the most part, the best bang for your buck will come from tuning the backend and making sure the correct SQL is generated.
The term Back End here refers to the database(s), flat files and Excel files: their design and how they all work together.
Consolidate Data Sources Although being able to seamlessly query multiple databases or files simultaneously is a great feature, it should only be used in small-scale scenarios or prototypes. The laws of physics simply dictate that bringing multiple datasets across the wire to the Analytics server, where the joins are then executed, will be slower than having the join occur in a single database. Some old-school nQuire fans tend to ignore this reality in order to showcase one of the tool’s primary technological advantages over its competition.
Proper Design There is simply no substitute for a proper data model design. In Computer Science terms this is roughly equivalent to the algorithm used. No matter how fast you code and tune a Bubble Sort, a Quicksort will blow it away even if poorly coded. For those of you who remember, this is called the Big O, and describes the order of magnitude of an algorithm’s efficiency (i.e., n² vs. n log n).
The same thing goes for data model designs. Are you having to join a large table to another large table before reaching a dimension table, when a simple Dimension-to-Fact join should suffice? I recently saw a model in which Order Line Item joined to Order Header, which then joined to the various dimensions. In a case like this, the Order Header table should be removed from the query, with its FKs denormalized into the Fact table to allow for a single join from Dimension to Fact. Follow the rules of Dimensional Modeling and optimize your data model for query performance; this is what Dimensional Modeling is all about to begin with.
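To make the denormalization concrete, here is a sketch of the two query shapes using hypothetical table and column names (order_line_item_f, order_header_f, customer_d and so on are illustrative, not from any particular schema):

```sql
-- Snowflaked: two large-table joins before reaching the dimension
SELECT d.customer_name, SUM(li.line_amount)
FROM   order_line_item_f li
JOIN   order_header_f    oh ON li.order_header_id = oh.order_header_id
JOIN   customer_d        d  ON oh.customer_id     = d.customer_id
GROUP BY d.customer_name;

-- Denormalized: customer_id carried down onto the line-item fact,
-- so the query is a single Dimension-to-Fact join
SELECT d.customer_name, SUM(li.line_amount)
FROM   order_line_item_f li
JOIN   customer_d        d ON li.customer_id = d.customer_id
GROUP BY d.customer_name;
```

The second form eliminates the large-to-large join entirely; the cost is carrying a few extra FK columns on the fact table during ETL.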
Move Complex Logic to ETL In some cases, the structure of the Data Model is sound, but some calculations require very complex statements or even multiple passes. In a Data Mart environment where additional back-end work can be used to solve complex analytical application issues, consider letting the ETL do the work instead of a user’s query. Refer to a prior post about RPD vs. ETL for more information.
Aggregates Dovetailing nicely with the idea behind letting the ETL do the hard work are aggregate tables. Aggregate as necessary, but only after you have exhausted database specific tuning features, such as Partitions, Indexes, Database parameters, etc. Keep in mind that there are costs associated with Aggregate tables that must be weighed.
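As a sketch of what an aggregate table might look like (sales_day_f, time_d and the column names are hypothetical), a daily sales fact could be rolled up to month and product during the ETL window:

```sql
-- Hypothetical monthly aggregate rolled up from a daily sales fact
CREATE TABLE sales_month_agg AS
SELECT t.time_month_id,
       f.product_id,
       SUM(f.sales_amount) AS sales_amount,
       SUM(f.units_sold)   AS units_sold
FROM   sales_day_f f
JOIN   time_d      t ON f.time_day_id = t.time_day_id
GROUP BY t.time_month_id, f.product_id;
```

Mapped into the business model as an additional logical table source at the month grain, OBI will pick it over the detail fact whenever the query allows. The cost to weigh is that this table must now be refreshed every load cycle and kept in sync with the detail.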
Use Database Performance Features Every database has at least indexes, which can be used to improve certain queries with minimal effort. The major databases go much further with advanced features and parameters – use them. In Oracle, this means making sure your database instance and its tablespaces are optimized for Data Warehousing with parameters such as BLOCK_SIZE and DB_BLOCK_BUFFERS. It also means that you are applying Oracle features appropriately, such as proper indexing, table partitioning and the use of Star Transformations where applicable. I find it somewhat incredible that even in this day and age many companies simply don’t use the database features they have available to them that would have a dramatic effect on performance. Partitioning and Star Transformations in the Oracle world combined will deliver incredible performance advantages without adding a single ETL routine.
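For the Oracle case, the combination looks roughly like this (table, column and partition names are made up for illustration; the feature syntax is standard Oracle):

```sql
-- Range-partition the fact so queries prune to the relevant partitions
CREATE TABLE sales_f (
  time_day_id  NUMBER NOT NULL,
  product_id   NUMBER NOT NULL,
  sales_amount NUMBER
)
PARTITION BY RANGE (time_day_id) (
  PARTITION p2006 VALUES LESS THAN (20070101),
  PARTITION p2007 VALUES LESS THAN (20080101)
);

-- Local bitmap indexes on the FK columns are what enable
-- the optimizer to perform a Star Transformation
CREATE BITMAP INDEX sales_f_time_bx ON sales_f (time_day_id) LOCAL;
CREATE BITMAP INDEX sales_f_prod_bx ON sales_f (product_id)  LOCAL;

-- Turn the feature on (can also be set at the instance level)
ALTER SESSION SET star_transformation_enabled = TRUE;
```

With this in place, a dimensionally filtered query touches only the partitions and bitmap-index rowids it needs, rather than scanning the whole fact table.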
The items above will, for the most part, get you what you need. However, there are some specific cases you may encounter for which there is an OBI-side fix.
Caching First and foremost, caching is not a performance tuning solution. There are many scenarios where caching simply will not be sufficient, such as real-time data sources, ad-hoc environments, large volumes of reports & dashboards, or environments with significant drill down options. If you have done the backend steps correctly, your need for caching is greatly reduced or even eliminated. Caching should never be considered a replacement for proper tuning activities.
That said, caching can improve the performance of your dashboards considerably, particularly if resources are not available to get the back-end design right. Use cache-seeding iBots to pre-cache key dashboards or problem reports as needed.
Proper SQL Generation An efficient back-end data design is worthless if OBI isn’t using it correctly because it issues the wrong SQL. In more than one system I have seen outrageous performance steps undertaken when the real cause was incorrect SQL. You must know the SQL that will be generated as if you were writing it by hand; if OBI doesn’t generate it correctly, figure out why. I recently saw a client doing a TopN() query where the OBI server was performing the Rank on the detailed records instead of the database. The difference was tens of thousands of records returned versus 25.
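When the rank is pushed down correctly, the generated SQL looks something like the following sketch (table and column names are hypothetical) – a derived table does the ranking inside the database, and only the top rows cross the wire:

```sql
-- Pushed-down Top 25: the database ranks and filters,
-- returning 25 rows instead of the full detail set
SELECT product_name, revenue
FROM (
  SELECT d.product_name,
         SUM(f.sales_amount)                             AS revenue,
         RANK() OVER (ORDER BY SUM(f.sales_amount) DESC) AS rnk
  FROM   sales_f   f
  JOIN   product_d d ON f.product_id = d.product_id
  GROUP BY d.product_name
)
WHERE rnk <= 25;
```

If instead you see a plain GROUP BY with no ranking in the physical SQL, the OBI server is pulling every grouped row back and ranking them itself – the symptom described above.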
There are two main things to check here aside from just basic correct mapping:
- Function Shipping In some cases, the manner in which you have mapped a field causes OBI not to generate the proper SQL, instead relying on the OBI server to perform the calculation itself. This is a killer for performance. It occurs quite frequently with Date functions applied to Date/Time fields, so be sure to check them. Make sure the SQL generated from OBI functions includes the proper database function. If not, see the next item:
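The difference is easy to spot in the query log. A shipped date function shows up inside the physical SQL, for example (hypothetical table and column names):

```sql
-- Shipped: the truncation happens in Oracle, so filtering and
-- aggregation both occur in the database
SELECT TRUNC(order_date, 'MM') AS order_month,
       SUM(sales_amount)
FROM   sales_f
WHERE  TRUNC(order_date, 'MM') = DATE '2007-01-01'
GROUP BY TRUNC(order_date, 'MM');
```

When the function is not shipped, the physical SQL selects the raw order_date with no truncation, and the log shows the OBI server fetching the detail rows and applying the function and grouping itself.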
- DB Features Tab Perhaps the reason for improper SQL is a misconfiguration in the DB Features tab. Verify that your features are correct. Two common ones to check are COUNT_DISTINCT_SUPPORTED and DERIVED_TABLES_SUPPORTED; the latter helps Rank() and TopN() queries operate efficiently.
Simplify the UI This boils down to having fewer reports on a dashboard page – don’t put 15 on a page! Beware: Guided Navigation sections always execute the reports in the section, even if they are not visible!
Use fewer reports If possible, use one report with multiple views instead of multiple reports that contain similar data. This will reduce the number of queries executed, but beware – it could increase the complexity of the queries sent to the database, which may negate any benefit. The same trade-off applies to having many levels of sub-totaling on a report.
Reduce Record Volumes Wherever possible, try to reduce the number of records that any report returns. This means eliminating data dumps, but it also includes having default values in prompts and reports, so that when a user first enters a page a minimal record set is returned instead of the result of an unrestricted query.
Consolidate InitBlocks Slow login times and RPD start-up times can sometimes be attributed to simply having too many init blocks fire. Be smart about the SQL you put into your init blocks, shooting for a few broad-reaching blocks rather than a bunch of simple ones. Ever wonder why your BI Apps repository takes forever to load, or why the Windows Services applet says the OBI Server didn’t start when it is still loading? It’s the init blocks – they keep on firing, and the RPD isn’t officially loaded until they have all completed. With the BI Apps, clean out (or disable) any unused init blocks as if they were viruses (exaggeration intended).
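One way to consolidate is a single row-wise initialization block that populates several session variables from one query, instead of a separate block (and a separate database round trip) per variable. A sketch, assuming a hypothetical user_profile table – row-wise blocks expect (variable name, value) pairs, and ':USER' is the standard quoted session-variable reference in init block SQL:

```sql
-- One row-wise init block setting three session variables at once
SELECT 'USER_REGION',   region   FROM user_profile WHERE login = ':USER'
UNION ALL
SELECT 'USER_DIVISION', division FROM user_profile WHERE login = ':USER'
UNION ALL
SELECT 'USER_CURRENCY', currency FROM user_profile WHERE login = ':USER';
```

Three variables, one block, one query at login instead of three.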
What is not listed here? The size of the RPD and the size of the WebCat really have no bearing on performance, aside from the startup times of the server components.
Although each of these can be discussed in greater detail, this post should get you going in the right direction for things to look at.
Posted on July 10, 2009, in Data Warehousing, OBI EE Development, Performance, Repository and tagged Dimensional Model, Performance, Repository, RPD, Star Transformations.