Over Reliance on the RPD
Originally posted 12/15/2006
This is one of my more popular old posts from IT Toolbox, and it still holds perfectly true today. I even added a few more sections on it to freshen it up a bit
One of the more common causes of problems with projects I see as a consultant is the over reliance on OBIEE instead of the Data Model ETL Over the years I’ve been brought into a few red accounts when I was at Siebel and Metricsphere where the RPD configuration was way too complicated. We must remember that Analytics is a tool, and all tools have limitations. Push too hard on the tool, and it will break, or almost as bad, it will become an unmanageable, unmaintainable mess that will break eventually.
What kinds of things am I referring to? The easiest one is the use of a normalized schema, one that requires lots and lots of joins to solve the query. If the database table doesn’t have the field you need in the dimension, then just join to the table that has it by adding it into your logical table source (LTS) is the way the thinking goes. But, there are problems relying on this admittedly simple solution.
First, what is the alternative solution? If you need an additional field from another table, then do so in the data model, and populate it with ETL. That’s the classic Dimensional Modeling approach – denormalize, denormalize, denormalize (your dimensions that is).
Why is that? Well, its real simple: Joins slow most databases down. So, one of your goals should be to reduce joins as much as possible but be realistic – big flat tables are no good either! OBI will have to let the database execute the join every time a query hits it. By doing it in the ETL, it is done once at night, and does not require a run-time join.
But there is another consideration here, one that is real world in nature. Sure its great to say to always do let the Data Model and ETL code handle the extra work, but the reality is that doing ETL takes more time and effort than simply making a join in the BI tool.
This is a common tradeoff that an architect or PM needs to make on any project – higher effort, enhanced performance, possible more storage on one side vs. lower effort, reduced performance and a more complex BI Tool implementation. Consider wisely. You know what the right solution is, but may be forced to take a short-cut by letting OBI do all the work at run time vs. the overnight ETL. But be forewarned!
Ideally speaking, your Business Model and your Physical Models should look as similar as possible. Just doing so will entail limited to no Model-to-Model translation overhead which is sure to not only slow your queries down but will also be harder to maintain.
If you find that you need Opaque views to solve a tricky RPD problem, strongly consider building them as aggregates. If not, maybe the Opaque views can use some aggregates that already exist.
The classic scenario where this comes up all ot often is when a customer needs to map OBI into multiple data sources. An Oracle Sales Consultant will tell you ‘sure – Fragmentation is a great feature of our tool that the others don’t have or are playing catch-up on. Just map your sources straight in an its all seamless to the user!’
The reality is much different however. Data Integration is where you will get your bang (for more buck unfortunately) – The tighter the data can be integrated the greater benefits your organization can reap, not to mention a simpler RPD and a fast user query response time. This is one of those features a-kind to an action movie, where the hero gives the heroine a gun and tells her to use it as needed. The Heroine’s goal should be to get out of the situation without ever using the gun and not just start blasting her way through all the time.