OOW MetaData Presentation Sunday @ 3pm
As part of the Sunday ODTUG OBI day, I’m presenting on Advanced Metadata topics. This is the same presentation I gave at the Rittman-Mead forum in April of this year. Its useful if you’ve been working on OBI for a while and are starting to feel comfortable with modeling the basics. I’ll cover the following topics:
Prototyping with OBI
I’ve always felt that traditional approaches to developing BI content left much to be desired. Going through a traditional requirements approach with users to determine what reports should do, then spending a significant amount of time documenting them, only to finally show them to users after development and find out they aren’t happy with the result is something that I suspect most of you have encountered. I believe a better way to come to a solution that makes users happy is by starting right away with a Prototype. In this article I’ll discuss the generals of prototyping, and specifically how OBI is well suited to this approach.
What is Prototyping?
First off, what do I mean when I say Prototyping? How is it different than say a Pilot or a Proof-of-Concept? Well, I’m not going to answer the latter as I misuse these terms all the time and honestly it’s not that important. Nor am I going to cite a formal definition from some book.
What I am referring to is really more of a process/approach than a piece of software. I think there are a few key attributes on what I think a prototype is with respect to OBI:
- It is used to determine specific reporting requirements, techniques, and designs
- It is used instead of a traditional Requirements and Design approach
- You can begin prototyping almost immediately on your project
- It covers RPD, Reports and Dashboards
- It is throw-away
Let me briefly describe a prototyping experience I recently had with a customer on a large custom OBI project. My main OBI developer on the team and I went to a customer site with the knowledge of 8 reports that they mocked up in wireframes. The OBI developer set up a prototype on his notebook, and proceeded to build out a sample data set and then an OBI Prototype. As we went through the reports and began to discuss what the customer really wanted, we were able to push them towards much more of a BI solution. Instead of 8 self-contained reports, we were able to jointly craft a multi paged BI structure while working with the customer to determine exactly how the reports work and what the overall user experience would be. I was able to get at the root of what they were trying to accomplish with this system.
As it turned out, we increased the number of reports substantially, but due to the fact that we had been working on them for so long, I knew we could still complete them in the same amount of time. In the end we delivered a solution that was far better than what the customer had initially envisioned. And we did so knowing very well that all of the functionality could be accurately delivered in OBI as the customer had expected.
How Does Prototyping Work?
Let’s get into how you actually go about running a Prototyping effort before discussing the benefits and why you should do it.
Start with whatever input you have from your business customer, whether it’s an Excel mock-up, Wireframes, a PPT, a whiteboard diagram, or even an existing report in a legacy BI system. If you have none of these (pretty rare), you can still continue – it’s just a bit more difficult to get going.
Next, open up MS Access and start making some tables and put some data in them. Start thinking along the lines of your main entities and how they might fit into a Star Schema model. I prefer Access for a variety of reasons, but primarily because the product is everywhere, is lightweight, allows for easy table design and data entry, can do some data manipulation, and is SQL based. Others prefer Excel, but to me Access is far faster to use and debug. I wouldn’t bother too much with a more powerful database server – they are too much effort to make changes to tables and edit data. You simply can’t match Access’s speed for all of the changes you are going to make, and trust me you will be making a lot of them. Plan on this being a highly dynamic process where the tables you start with barely resemble the final tables.
When you think you have some of the base data in Access, then map it to OBI. Don’t go crazy here – get the basics going that you think you need to start producing something. Then move on to Answers and even dashboards and start building some reports. Your initial goal is to get something back to the end users as soon as possible – you want a straw-man to show them.
Once you have something to review with them, you’ll get into a repeatable cycle until you’ve fleshed out nearly everything you need:
- Show users what you’ve done and how the reports/dashboards work during an interactive session. Get their immediate feedback. If they suggest a different way, then make note of it, and tell them you’ll get back to them with the next version of the prototype. You might be able to make the change in front of their eyes for them.
- Go off and work on what you discussed – do what you have to do in order to make the adjustments they asked for. In some cases you may wish to present multiple options for their decision. Clearly lay out the pros and cons of each, and actually show them how it works and the limitations. If you have an idea that they haven’t thought about, work on it and present it as an option.
- Document your decisions in a lightweight Requirements and Design document. Any global decisions made are well suited for such a document.
- Go back to #1 and repeat until done.
Ideally you can have these workshops every day or two. They can be relatively short, say 1 hour or so, allowing you to be respectful of their time. While doing this for a fixed number of weeks, say 4, keep in mind the following:
- The reason this technique is effective is that it allows users to see what they are getting immediately, and they are actively involved with deciding how it will work in the tool; it’s not all up to you so you don’t take the heat for it. Visual feedback facilitates expectation setting.
- One thing you must convey to the customer, regardless of their level of familiarity with OBI: This is a tool, and you’ll have to see how well what you asked for can be done. Even if you are a super expert, you need to “see how well it works”. If OBI can’t do it, just say “the tool doesn’t do that, but here is another way – is this ok?”
- As you are getting into the weeds with the customer on what their dashboards and reports should look like, take time to discuss with them the real business purpose of the report. You may find that one report becomes three, or three reports become one. Ask “What are we really trying to show here?”, and then come up with an option for them. The difference between a Report Developer and a BI Consultant is the BI Consultant thinks about other and better ways to solve the customer’s problems as opposed to implementing to a spec.
- Think about ways to enhance the overall experience and dashboarding environment. Most likely the customer is expecting Just a Bunch of Reports (JBOR). Use this opportunity to show them drill downs and navigations from summary to detail actually look and feel. Sprinkle in some exception based BI or conditional formatting.
- Along the way a dashboard structure may unfold. Make sure to prototype this as well and fully discuss it. Ask about plans 2 years from now to make sure you don’t box yourself in with a short-sighted Dashboard structure.
- Make sure someone is driving the prototype, even making small changes to it in front of their eyes, while someone else keeps track of notes and decisions. It is also useful to have someone else driving the conversation if possible.
- Don’t over-do it with the prototype; everything doesn’t have to be perfect and fully features. Constant re-adjustment of the Access tables, their data and the RPD model is a given. If you need to do something 10 times then just do it once to show the users, then document where else it should be done. Shortcuts are essential.
- Nail down names for reports and fields and document them.
At some point near the end, when everyone feels like you’ve captured all of their needs, you can start to solidify the design, particularly the RPD and the Data Model. If you’ve done the prototype efficiently, you will have taken a lot of shortcuts that you’ll need to discuss or complete the full design for.
You will also make sure up front that the users know that this effort is timeboxed and cannot go on forever. For this reason I prefer to keep the prototype on a developer’s notebook and not on a shared server; if users have access to it they will never stop asking for changes and tweeks. Furthermore, it’s simply faster to develop on a notebook than a shared server somewhere. Keeping things lightweight for the prototyping process is essential; accept no slow-downs!
Why Do It?
There are several key reasons why Prototyping has a distinct advantage over a more traditional approach.
Expectations Setting
One of the biggest problems with traditional, strict report layout requirements is that you will bump into unforeseen problems when you get to development. I don’t know how many times a user asks for a specific format or UI feature, and during the requirements and design phase the fact that it can’t be done like that is overlooked. I have seen it happen on nearly every non-prototyped project so far, and I’ve been a part of quite a few. No matter how amazing you are with any tool, there will be solutions which you simply are unprepared for or overlooked. With a more traditional approach, it happens rather late in the project when you find out that you can’t deliver a signed-off feature in the manner in which you said you could. This leads to a disconnect between users and the project team, and the project team will lose the argument. Prototyping’s main focus is to eliminate this surprise to end users by addressing it up front from the very beginning.
Enhancement
Along the way you have the option of improving what the customer asked for by showing them different solutions and having in depth discussions with them. A rigid requirements approach has little opportunity for improvement, as there is nothing to look at from a holistic perspective. I cannot stress this enough – seeing the solution develop allows for a much better big-picture view of the solution, thereby allowing users to also focus on the overall structure and user experience as opposed to only the details of a report.
Customer Stake in Design
Instead of simply throwing a wireframe over to IT to work on, by bring this users into the discussion and working side-by-side with them to solve problems, they are more active in the outcome. The users feel as if the outcome is more of their work than a design document would be, as they were there crafting how things should work. For IT or the consultant, this also means every part of the solution is jointly owned and not just your responsibility that you screwed up. In other words, if something is less than optimal, they can’t blame you for it – they were part of the decision making process from the beginning. (Sorry for the cynical view, but it is a reality that consultants and IT face all the time.)
Risk Mitigation
Waiting until report build time to determine that a feature can’t be done or a report actually requires a whole new set of tables is a disaster on a BI project. Prototyping early on allows the development team to substantially increase their confidence that they can deliver a specific report format. If they cannot, they will know at the beginning of the project as opposed to the end.
Additionally, UAT no longer becomes a surprise to the users; they are getting exactly what they knew they were getting because you showed them (and proved to yourself) that it can be done.
Better BI Designs
As I mentioned earlier, I believe your outcomes will be better with a Prototype approach that by rigorously focusing on writing detailed report specs. The system has the ability to come alive for the users very early on. They’ll be able to see what they wanted, plus even get the juices flowing on even better designs. Think of it of getting them to UI 2.0 right away instead of being stuck on UI 1.0.
Work out Difficult Metrics and Config
Although I think Prototyping is mostly about the UI and Reports, there are times when a prototype can help you determine a complex metric definition. Sometimes metric calculations simply are not known and need to be tried out first to confirm the cases and logic is correct. Additionally a particularly complex piece of RPD config may be needed; with a Prototype you have some time to work on the solution and try it out.
Faster
Yes faster. Instead of spending weeks or months writing down what you are going to do, and then ultimately finding that some of it didn’t work as expected, you can start of building the reports right away. All of those weeks of prototyping are effectively like a trial development cycle. When it comes time to actually develop the real reports and dashboards, the development team will have already built them once before – this will allow your developers to go much faster through the final solution than ever before. In fact if they forget a specific technique, they can always refer back to the prototype.
Ultimately, detailed report specs I think are a waste of effort. Writing down every sort, filter, column format, etc. just isn’t as compelling or useful as doing it. Reports will change over time; will the team go back and update report design specs? Probably not. I have seen customers that spend more time documenting the report than actually working on it; I feel like this is a massive waste of effort. Ask yourself this: do power users do extensive documentation when building a report? Why not?
Why Not Just Make the Prototype the Final Production Solution?
Most likely your prototype behind the scenes is a pure mess. If you’ve done it right, you’ve taken a lot of shortcuts, cheated quite a bit, and most certainly ended up with a different folder structure than before. You will have all sorts of junk reports, prompts and pages littered throughout your WebCatalog. Your RPD will likewise have many shortcuts that you’ve made – perhaps you needed 5 metrics but you cheated and used the same one with 5 different names. In reality, the RPD stands a better chance of becoming production config than the UI does, but I won’t go so far as to say you can use it. If you are lucky you might be able to. I think it really would be the holy grail to simply “swap out the physical layer” in the RPD and suddenly your development is done, but I don’t think this is realistic.
Summary
Prototyping, much like the Agile Methodology, focuses on getting things in front of customers as soon as possible. With this in mind, they both are designed to adapt to change – in fact that is the reason they both exist. Far from shying away from change, a good prototyping process can actually increase the amount of changes to your UI solution. But not all change is bad; as I’ve seen happen before change can lead you down a better path that anyone had intended at the beginning. Furthermore it allows the team to focus on the outcomes as opposed to writing about what the outcomes should be. Finally I believe it engages the customer to a greater extent and minimizes any user expectation discrepancies.
Let’s apply part of our BI craft to how we build BI systems: A large part of BI is about effective communication via appropriate visual representation. I submit to you that we follow what we preach during a project; visual communication tools such as a stop light or a colored trend indicator (the prototype) is far more effective than a large list report full of text and numbers (spec documents).
Let me know how your prototyping experiences go!
Making Dims & Facts Work Together
This is a very common RPD modeling question on IT Toolbox – it comes up every week it seems. The problem is stated something like this:
I have 2 fact tables and 3 dimension tables. One of the dimension tables doesn’t work with Fact #2 while the other 2 dimension tables work with both facts. When I make a report with all 3 dimensions and both facts, Fact #2 is incorrect or missing.
Let’s restate the problem a bit using good BI language. 2 of the dimensions are conformed across both of your fact tables, but one is non-conformed. Of course conformed is a fancy word meaning can be used across multiple facts, also known as shared. So the question then becomes simply: How do I make a non-conformed dimension work?
I addressed this topic in a recent presentation, so I’m simply going to use some of those slides and annotate a bit more. In fact, I will discuss two different solutions that you can have in your toolkit. As is usual with me, its not the example so much but the concepts behind the example.
So when OBI gives you the wrong answer, in this case it’s because you never told it how to answer it. (its your fault!) OBI needs to know how to handle Order Amt by SR Status as right now it doesn’t. We will have to give it a little bit more smarts to do this.
With this solution we are essentially tricking OBI into thinking there is a real join between them and the dimension is perfectly conformed. Of course with every trick there can be consequences:
If you look into the 1st query (the one we’re interested in), you’ll see that a filter was indeed applied to the dimension table. Unfortunately for this example the dimension table also happens to be a fact table (aka a Combo table). In a normal star model, this really wouldn’t present much of a problem, as the W_SR_STATUS_D table might have only 1 record in it with the value of ‘Open’. However in many cases this will not be true; with a fact table it is certainly not true. We will have many records that are Open, and each will return from its portion of the SQL. Note also that the W_SR_F doesn’t join to any other of the tables aside from W_Customer_D. What is going to happen most likely in this query is it will be over-counted due to these multiple records coming back and being joined with the rest of the query. Although indeed that is a bad scenario, keep in mind your specific setup. If you can guarantee that only record will come back all of the time, then this technique works quite well.
A different solution is shown below, one with less trickery but more effort:
Here we are solving the problem by using Level Based Metrics. Specifically, when we set a field to the Totals level, we are telling OBI to ignore that dimension when considering this fact. In a weird sense, Total = All = None at All! One reminder: setting totals level on the Content tab does absolutely nothing – it is assumed to be at totals level if left blank. Unfortunately we have to do the config on the metrics themselves and not globally on the LTS.
The big downside to this technique is that you have to do it for every metric that you might possibly conform. This can be a lot of work, easy to make mistakes, and difficult as all heck to QA. This is the reason I don’t like this approach – the first one is easier but potentially dangerous or not even possible. Lets look a bit further:
When you run it now, things look a bit better. Note anything you do with the SR Status dimension will be totally ignored by the Orders half of the query, so you may need additional filtering to clean the report up to your liking.
To summarize:
I use both of these techniques frequently. Sometimes in the same Business Model. I’ve even had times where I converted from one to another.
Best of luck!
Link to OBI Performance Webinar is Up
KPI Partners just posted my Webinar on High Performance BI. Its free to watch, so take a look. Or if you prefer, here is the slide deck called Planning for High Performance OBI – June 2011. Comment on this blog as opposed you YouTube if you want me to respond! – Jeff M.
Still room for next week’s Performance Webinar
There are still slots open for my KPI Partners webinar on High Performance OBI next Tuesday at 1pm EST. I hope to see you there!
Performance Tuning and Financial Analytics
In the past I’ve written and presented on OBI performance from the ‘before perspective’: before you begin development, what are the things you should plan on and how should you implement them. This paper however is with the ‘after perspective’; what to do if you are stuck with a performance problem that you need to solve quickly. It will use the Financial Analytics (All versions) application from Oracle’s BI Applications (aka the BI Apps) to walk through a plan of attack and demonstrate specific changes using the Oracle Database 10x. Thus, this paper has two purposes:
- Specifically to document actual changes for Financial Analytics
- Generally to walk through a performance tuning effort that you yourself may undertake
Note: You do not need to be working on Financial Analytics or even the BI Apps for the concepts in this article to apply. It merely uses Financial Analytics as its example, and where appropriate I will explain the model.
I’m going to do something a bit different with this article in that I will tell the story of a recent performance tuning project for a client. A previous integrator had delivered Financial Analytics and it was up and running in a production environment, but the performance was terrible. Many queries were over 8 minutes. We were asked to tune the GL Transactions star, but the lessons learned here will work for all modules of Financial Analytics, regardless of version. In fact, implementing them for only one star actually boosted the performance of the other Financial Analytics Subject Areas.
I hope to impart the many aspects of performance tuning to highlight areas which you should also look at when needed. As usual, I will focus on general concepts and key takeaways. Additionally, some knowledge of GL structure and Financials will be useful but not mandatory. Shown below are the basics of a Financial Analytics Star. Of course there are many more dimensions, but I removed many of them so we can focus on the real heart of the solution.
Assessment
After a quick review of the customer’s BI Apps implementation, what they had done, how bad the performance was and their key pain points, it was time to get our hands around the problem and open up the hood. Myself and a small team gathered a large list of about 40 reports that were not performing up to the report author’s expectations. We made sure we had good coverage from a variety of key ad-hoc power user/report developers – focusing on breadth was more important than depth. It turns out that all of the reports in question were developed by these power users; none were Out-of-Box (OOB) reports or modified versions of them.
Next, I did a review of each report and cataloged it. We ran the reports, looked at their logic and how they were built, and tried to determine a few things about each one:
- Was there heavy logic in the report?
- Did it have unions?
- Was it filtered properly or at all (e.g., did it have a time filter)?
- Did it do anything unusual?
- Was it limited by not having a good field to use?
SQL Catalog
Next, I needed to gain some detailed information about what they were doing and how they were doing it. Once I had done this, then I could look for patterns and ultimately recommend an aggregation strategy (more on that later). To do this I cataloged each one using their generated SQL from the OBI Query log. The result of this helped me to understand the following:
- Tables used
- Joins
- Which tables had filters and selects
- Based on the above, determine each report’s granularity
- Baseline each report’s performance to measure improvement.
Understanding the grain was the ultimate goal of this exercise – it will become useful for identifying aggregate candidates later on. Note I was not interested in the metric calculations or other report logic – this is more about tables than fields. Complex logic will be dealt with later after the bases are covered.
Baselining
We had to demonstrate gains and understand if the changes were helping or not, so baselining performance was important as well. For consistent baselining, we broke each report out into its own separated dashboard and had only 1 report per page so that its execution could be isolated and results more consistent. Make sure to run each report multiple times and take an average. I also recommend to clear out Oracle Buffer Cache (and result cache if you use that in 11g (which you should!)) between each execution to as not to taint the results:
alter system flush shared_pool; alter system flush buffer_cache; BEGIN dbms_result_cache.bypass(FALSE); dbms_result_cache.flush; END; /
Note also we ran these reports as both Administrator and as a user account to see the effects of Data Security on the reports.
Key Take-Aways:
1) Isolate and capture accurate performance numbers right away
2) Use the SQL to gain a big-picture understanding of what is happening. Focus on Breadth.
Report Layer
Normally, BI performance is all about the data model and database features. You’ll look at partitioning, star transformations, bitmapped indexes, etc. to improve performance, and we’ll get to that in a short bit. But that assumes that your reports are built correctly and that your data model is used properly within the BI tool. It’s a waste of time to tune a model if you are looking at the wrong queries.
I had noticed that some of the queries were returning thousands of rows from the database, but the reports only showed a few dozen rows, fewer if a pivoted report. This happens when the grain of the views (charts, Pivots, etc) is different than the grain of the Select portion of the criteria tab. Generally speaking, anything that is not a metric will appear as a group by in the SQL. If there are extra columns in the group by list, the resulting grain will be lower than what is actually needed.
It was clear that we had to do a report grain clean-up pass. Essentially, this amounted to removing things from the criteria tab that were not needed in the output. Usually these columns get there as a developer is building the report, they put a filter on those columns and never remove them. To demonstrate the problem:
GL Account Example:
The GL contains a series of complex account codes which are actually glued together segments in the form Seg1-Seg2-Seg3-Seg4-Seg5… . The number of segments is different for each customer; my customer was only using 3. Each of these segments has a large hierarchy associated with it, with many codes and descriptions.
Consider a report were we show dollars by Segment 1’s hierarchy. If there are 10 Segment1 values, 20 Segment 2 values, and 15 Segment3 values, we expect only 10 records to come back. But if the developer left Seg2 and Seg3 on the select list, we can expect about 10x20x15 records to return (3,000 rows). We’ll have less if Seg2 and Seg3 were filtered on in a prompt, but the point still remains. 3,000 rows will take much longer to process inside of the database than 10 will, will take longer over the network, and OBI will then still have to aggregate them further to get to the final 10.
Thus, we compared the selected grain with displayed grain and made some deletions which resulted in a very nice performance gain. Plus, we could now worry about analyzing and tuning proper queries instead of bogus ones. Don’t forget to do a proper regression test of the reports to ensure you haven’t broken anything!
Key Take-Aways:
1) Report construction matters too – start with it as it affects the entire stack.
The Data Model
Armed with proper SQL, I then drew the picture of what was happening in order to see the Data Model. Financial Analytics is designed as a multi-level snowflake schema instead of a clean star. When you look further at the joins between the segment tables, you’ll notice that they happen to be the same fields:
In the diagram above, W_HIERARCHY_D could just as easily been joined to W_GL_ACCOUNT_D as W_SEGMENT_D as they join on the same fields. W_SEGMENT_D actually provides no value to these queries and in fact just gets in the way! But why stop there? The W_GL_ACCOUNT_D table holds the linkages that quite frankly should be in the fact table. Since W_GL_ACCOUNT_D is usually a relatively large table as compared to the segment and hierarchy tables, the database optimizer will have to process a 3 step linkage to use W_HIERARCHY_D and the Facts together. This causes problems for two reasons:
- Its extra join work by the database that is not needed
- It is more complex for the Oracle database to handle
The result of both of these problems is poor or ineffective use of Star Transformations. Star Transforms are a powerful feature that Oracle has to process a typical DW type query on a Star Schema. If Oracle is able to properly deploy a Star Transform, it can lend to significant performance gains. More on Star Transformations in the next section.
The solution ultimately breaks up this snowflake model and forces a clean star model:
This of course means that the Fact tables need to be extended with the necessary fields for each segment used. Some of the OOB tables already have these FKs added to them, especially the aggregates. In my customer’s case, I needed 3 segments total, and each join was a 2 part composite FK consisting of ACCOUNT_SEGn_ATTRIB and ACCOUNT_SEGn_CODE. These fields are easily obtainable in the ETL during the lookup to W_GL_ACCOUNT_D that gets the GL_ACCOUNT_WID for the Fact table.
A bit about Star Transformations
Star Transformations are a highly optimized feature of the Oracle Database since version 8. They are the best way to handle a selective query that hits a very large fact table. In the case of Financial Analytics, many of the queries are in fact highly selective – they filter on segment hierarchies, codes, Account numbers, Ledgers, Orgs, Time and some others as well. I suggest you read a very popular document by Jeff Maresh about Star Transformations called “Supercharging Star Transformations”.
I do want to discuss when and why star transforms are good and when they are not. Recently I’ve been on a different project where we tried Star Transforms, but they turned out to be slower than a more traditional approach, so don’t assume that they will always be suitable for you.
Star Transforms are best when:
- You have a clean star model and not a snowflake or bridge table
- Your fact table is very large (when you need them most!)
- There are several dimensions that are commonly used together to access it
- You are ultimately looking for a small piece of the fact table
In other words, if you have a clean model with highly selective queries identified by many dimensions, then Star transformations are for you. Selecting all of the records for a given month and adding them up is an example of where a star will not work well. With that in mind, Star Transforms may not you’re your more summarized reports, but should provide a better benefit for more detailed ones. Luckily, Financial Analytics suits these requirements very nicely.
Key Takeaway:
1) Star Transforms are best for highly selective queries which use many pieces of information to select a small subset of a much larger table.
RPD Modeling
Merely drawing a picture of how you’d like the joins to work does not make it work that way of course. To change the joins you will have to open up the RPD and begin some remodeling work. I will overview the basics of what needs to be done in order to implement this change. This section is very specific to Financial Analytics, so if you’re not interested in it or the RPD in general then skip ahead.
We want to do two main things: 1) remodel things that are in our scope – Fact tables and Segments, and 2) leave everything else alone so that it still works properly. #2 on that list will cause us to do some things that may not be needed if we were only doing #1 – this is the BI Apps after all and we have to play nice with other pieces. We also want to do only the work that is needed, so if you are only using 4 segments then ignore the other 6 that are unused. Since much of our work is on conformed dimensions, there will be some cases where these changes are only partially implemented.
With that in mind, we want to have the following 4 types of join paths:
- For in-scope fact tables and used segments: This is the optimal path.
- For in-scope fact tables and not-used segments: (Same path as OOB)
- For out-of-scope fact tables and used segments: (Slightly altered OOB model – still uses GL Account though)
- For out-of-scope fact tables and not-used segments: (OOB Join path)
The reality is that in order to support all of the above, the W_GL_ACCOUNT_D table will need to be aliased to support two different join paths – New and OOB.
| Fact Table | Segment/Hier | Join Path |
| In Scope | Used |
|
| In Scope | Not-used |
|
| Out of Scope | Used |
|
| Out of Scope | Not-used |
|
The main steps for the RPD work are as follows:
- Split up the W_GL_ACCOUNT_D to support the two join paths:
- Make another Alias for W_GL_Account_D and signify it as “standalone” or “Star”. Remove joins to any of the in-scope segments and hierarchies.
- Add it to the Dim GL Account Logical Table as a new LTS :
- Replace it in any in-scope Fact LTS. In many of these LTSs the W_GL_ACCOUNT_D table is included in the source for calculations and also for reaching to the W_SEGMENT_D table. The Replace Table wizard will help you here. You should end up with:
- Extend the Fact table with the SegN_Code and SegN_Attrib column pairs and join straight to W_HIERARCHY_D from the in scope fact tables.
- The final step is important to ensure proper usage in an ad-hoc environment and for other reports. It will ensure that one can use W_GL_SEGMENT_D directly without having to filter on W_HIERARCHY_D first, which a user may not always do. Break apart the logical Segment tables so that W_GL_SEGMENT_D and W_HIERARCHY_D are their own standalone logical tables:
- Make a segment logical table with only the W_GL_SEGMENT_D table in it. Use the same LTS filter from the original.
- Clean out the OOB GL Segment tables by removing the W_GL_SEGMENT_D table and switching the LTS filter to use HIER_CODE. Rename these tables to indicate Hierarchy as opposed to Segment.
- Link in these new segment-only tables to the in scope fact tables (Physical, Business Model and especially the Content Tab). Note the content tab & Business Model joins will have to be done for all other stars as well that use GL Hierarchies as GL Segment no longer lives inside the logical hierarchy table. Although easy to make this work, it can be mildly tedious.
- Remove the circular join by breaking the join from W_GL_SEGMENT_D to W_HIERARCHY_D.
- Replace the old segment columns in the Presentation layer with the new segment columns from the new Segment Logical table
- Remove the segment column from the Hierarchy logical tables
At the end of this involved re-wiring of the RPD, the join paths should look as in the diagram below. Note that Segment4 is a stand-in for all unused segments. Note the joins for the two GL Account D tables.
Thus:
1) This new model has Segment and Hierarchy completely separated and models them as different dimensions.
2) It maintains a link to unused segments in the same way as OOB, but using a different W_GL_ACCOUNT_D table.
3) The used segment & hierarchy tables are forever split, and as such they operate with other fact tables in a slightly different join path. Note they are still separate, but the work of going straight to the fact is not done.
4) The OOB GL Account table maintains the links for all other stars to ensure the model still works. Note that a lin
At this point, the main RPD modeling rework is completed. You should notice a nice performance boost right away. In looking at the Explain plans for the SQL, you can see how it is employing a good and complete Star Transformation. In fact, we’ve added more selectivity to the Fact table, which we know is a good thing for stars. Oh, don’t forget to regression test as much as you can at this point.
Partitioning
Partitioning of fact tables is pretty much something you should always do if you have even mild data sets. Financial Analysis tends to do lots of month (also known as Period) level analysis, along with some quarterly and yearly work. Partitioning by month makes perfect sense.
Although partitioning has benefits for administration, backup, even ETL, what we’re interested in is performance! Specifically, we’re interested in Partition Elimination. A partitioned table is really just a bunch of physical tables glued together logically. With a table partitioned by month, all data for a given month only goes into that one underlying table called a partition. If a query asks only for that one month, Oracle will only use that one partition and ignore all of the others. Hence, you can potentially ignore most of the data your table has. Queries that need more than one partition will include only those it needs.
In the newer version of the BI Apps 7.9.6+, Oracle has moved to a multi-calendar date table, W_MCAL_DAY_D. You will have to pick the appropriate date field in the fact table on which to base your partitioning. In some cases this may be difficult. For the GL Detailed Transactions Star which I focused on, it was quite easy: ACCT_PERIOD_END_DT_WID. Thus, partition your fact table on this field as follows:
PARTITION BY range (acct_period_end_dt_wid) ( partition P1 values less than (100120061231000), partition P2 values less than (100120070101000), … partition P61 values less than (100120111231000), partition P62 values less than (100120120131000) )
Oracle 11g introduces Interval partitioning to make this task a bit easier, so look into that if you have it.
Key Takeaways:
1) Partition fact tables in an effort to remove most of the raw data from your query.
Indexing
Next, I looked at indexing. Generally speaking, you want to have the following bitmapped indexes:
- On all FKs in your fact table. For us that means a) reviewing to make sure none are missing from OOB (there was one missing on DOC_STATUS_WID for my customer) and b) adding new bitmaps on the new fields added to the fact table. Note these should be local bitmaps, meaning they are also partitioned along with the underlying table. Thus a query that hits one partition only needs to look at a much smaller bitmap as opposed to a much larger one for the whole table (called global). Note for composite column FKs like our new segment code & attribute pairs, Oracle preferred single column bitmaps instead of a single composite bitmap of both code & attrib..
- On any fields used in the dimensions for filters. For Financial Analytics, this may mean:
- Any HIERx_code field in W_HIERARCHY_D. OOB only has a few of these, so this is a must.
- GL_ACCOUNT_NUM on W_GL_ACCOUNT_D. This is so common that I’m surprised it is not in OOB.
- Anything else such as ORG_NUM in W_INT_ORG_D.
The cataloging effort of the reports in scope will help determine which are needed:
Here I would build bitmapped indexes on Hier2_Code and Hier5_code (hierarchy_ID already exists).
At this point your performance may be good enough. For my customer, implementing these changes by themselves made massive improvements. Here is the summary of what we had achieved to this point without aggregates:
| 17 Reports: | Orig | No Aggs |
| Avg Report Time | 2:36s | 48s |
| # < 30 seconds | 4 | 13 |
| # < 10 seconds | 0 | 4 |
| # Meeting Customer Expectations | 1 | 14 |
The numbers above are slightly skewed unfortunately. I would love to be able to simply show you how impressive just doing the basic things above truly are, but unfortunately there are a few reports which skew the results due to the manner in which they were built (more on those towards the end). These problem reports need special attention.
Key Takeaway:
1) Index dimensional lookup fields with bitmap indexes
2) Index Fact FKs with local bitmap indexes
Aggregates
I like to say that aggregates are the last option when performance tuning for a good reason. Until you’ve done the things above, why both going through the effort of adding aggregate tables? You may not even need them. In other words, make sure your bases are covered with proper design and physical implementation before adding more work to your plate. Also recall that while Stars are good for selective (detailed) queries, they are not designed to be overly effective at summary queries.
I of course did not want to stop at only 4 reports being < 10 seconds, so I pushed on for even better numbers to completely surpass customer expectations. In any event, there is some very real benefit to be gained from Aggregates. Generally speaking, the rule of thumb for years has been to make aggregates that offer at least a 10:1 compression ratio. Unfortunately for us, the aggregates we built were not quite this good as far as the # of rows goes. But when you factor in column width into what you call ‘compression’, the numbers look much better:
| Aspect | W_GL_OTHERS_F | Agg Candidate #1 | Agg Candidate #2 |
| # Rows | 16M | 9M | 4M |
| Avg Row Length | 503 | 121 | 115 |
| Total Size | 8Gb | 1Gb | ~0.5 |
| Total Compression | N/A | 8:1 | 16:1 |
This is one of the very detailed downsides to anything generic – it is not tuned at all. It is impossible to do so and still be generic. The data models for the BI Apps contain an enormous number of fields in them, many of which are unused or populated with defaults (W_GL_OTHERS_F has about 125 fields!). They need to do this in order to support as many customers as possible. Unfortunately many of these fields do take up storage space, and as a result more I/O reads are needed to get a set of rows. When building our aggregates, we also threw out anything we didn’t need, resulting in a much narrower table, one where more rows can be crammed into a single I/O operation. Looking at the numbers above, although the rowset compression is not very good, the row length compression is about 4.5:1. When taken together, the resulting aggregates are very effective.
How to Determine Aggregates
Although there is some art to it, there is a good deal of science as well. First and foremost, you have to understand the grains of your queries. (Protip: Know this right after requirements!) Going back to the beginning where I cataloged the grains of all of our queries a pattern emerged that drove the design of our two aggregates.
Examine the grains of the selects and the filters in these queries. If a table or known level is used by either, then identify it. Do this for each of the queries and a pattern emerges: (The actual analysis had 20 queries in it, but I’m showing only 5).
It appeared as if maybe we should have two different ones. The base one would have all of the dimensions shown above, and the higher one removed the GL Account dimension. For our sample, 2/3rd of the reports did not need the GL Account dimension. I went back and cross referenced the proposed aggregates to each report to determine if it would be used or not.
But did we need Agg1, Agg2, both, or none? The only way to be sure was to create the aggs, map them in and benchmark them. I recommend prototyping this as easily as you can with a CREATE TABLE AS SELECT statement. Although the lowest agg in our case may have been sufficient, the higher level agg impacted the performance of 2/3rds of our reports, and most of them in a meaningful way.
In the end, aggs are all about cost vs. benefit. If our higher agg#2 had only provided a nominal benefit (as measured by a small gain on a lot of reports or a larger gain on only a few reports), it would not have been worth the effort. Typical costs to assess include:
- More ETL and DAC coding
- More Data Model work
- More RPD work
- More QA
- Longer ETL Load times
- More objects to manage.
Ultimately, we elected to go with both as the benefit was there. Both aggregates will be added to the ETL load stream and integrated into the DAC for incremental processing. The RPD mapping is straightforward, and they directly link to the W_HIERARCHY_D and W_GL_SEGMENT_D tables just as with the base fact. Note: These tables were also partitioned by Month, on MCAL_PERIOD_WID.
Results
The results of the performance tuning effort were very good but not yet excellent:
| 17 Reports: | Orig | No Aggs | With Both Aggs |
| Avg Report Time | 2:36s | 48s | 31s |
| # < 30 seconds | 4 | 13 | 14 |
| # < 10 seconds | 0 | 4 | 10 |
| # Meeting Customer Expectations | 1 | 14 | 15 |
Back to the Reports
At this point we had met our performance objectives. However there were a few cases where the reports were still running slower than ideal. Going back to the reports, some of them had very complex logic in them. The reports showed the following problems:
- Hierarchy remapping via massive CASE statements. Some of these were monstrous, and resulted in queries totaling 11 pages of SQL! The obvious solution is to push these down into the ETL layer and provide the fields the users actually need to use. This will allow for much simpler SQL, less errors, greater consistency and better performance.
- Complex data logic. The subject area provided barely had any date fields in it to support what the users needed. Horribly painful logic persisted through many of the reports: CASE WHEN “Fiscal Period Num” between 1 and 3 Then 12 ELSE CASE WHEN “Fiscal Period Num” between 4 and 6 Then 3 ELSE …. Even after all of the changes above, this one report took 4 minutes to run. In looking at it in OEM, it was all CPU bound, churning through the complex logic. In some cases all that was needed was a “Fiscal Quarter End Date” and much of the complex logic can be eliminated. We will be sure to provide the tools the users need to get the job done optimally.
- Unnecessary Unions – some of the reports no longer needed their union queries. These are to be re-written, cutting the database load in half.
At this point we were forced to take on the problem reports one by one and see if there was a better way to build them. And in fact there was. There was still one report which was taking 4 minutes to run, and 2 were taking over 1 minute. These 3 were skewing our #’s wildly, and they simply had to be fixed!
Although we did not look at all three of the reports due to time, we did in fact look at the 4 minute report. The report author had been forced to build in a tremendous amount of manual logic to get the report to work. We were able to provide 2 new fields in the RPD which allowed us to re-write the report. We added a Quarter End Date to the Date dimension and a Dollar amount YTD to the metrics. With these in place, the report became exceedingly simple to build, and now it runs in 3 seconds.
Key Take-Aways:
1) Modeling and model tuning do have limits – they can’t do the impossible.
2) Make sure the report authors have the right tools to do the job. In our case, adding a more robust calendar table and a few time series metrics were all that was needed.
3) Once the tools are there, a report re-write can have enormous benefits.
Database
One thing that hasn’t been mentioned is database tuning. We did not have the ability to do this on our test database server as it was determined that DBA work should be done directly on Production. However it turns out the production database was already in good shape, and only one parameter was changed.
Key Take Away:
1) Database parameter tuning is not always the problem! Look at what your application is doing and how it is using the database first.
Summary
With addition time to rewrite the other reports, even those that were running in 12 seconds, plus pushing down complex logic into the ETL layer, I am confident that every report can run in less than 8 seconds and most likely in 5. We were getting decent result on reports with tremendously complex logic in them; simplify them and the results will only improve. And this is on a test database with lighter hardware power than production! Unfortunately the tuning project has ended before I could complete the final steps to really show you the end results.
| 17 Reports: | Orig | After Worst 4 Report Fixes |
|
| Avg Report Time | 2:36s | 7s | |
| # < 30 seconds | 4 | 15/15 | |
| # < 10 seconds | 0 | 12/15 | |
| # Meeting Customer Expectations | 1 | 17 |
Again, there is still work to be done on these reports!
I hope I’ve met what I set out to do – be both generic about tuning approach but yet specific about Financial Analytics. The steps and concepts outlined here can be used on any project, and I hope I get to do so again soon!
Webinar on Performance, June 7th, 1pm EST
I’ll be doing a webinar called Planning for High Performance OBI. I’ve given this preso a few times before, but it’ll be good to do it on the web for a change. I discuss some of the basics you should have when beginning your designs. It is not a performance tuning workshop, but rather what to do first to get it done properly the first time. Register through KPI Partners website.
Presenting at Rittman-Mead Forum in Atlanta this May
Just a quick note -
I’ll be presenting this year at Rittman-Mead’s first US based forum. This year its in Atlanta May 26th-27th. There will be a whole host of different speakers there – I’m looking forward to attending all of the sessions.
One cool thing about this forum is that it is designed for architects and developers who’ve been using OBI for a while. That means no marketing or roadmap fluff, and no intro to OBI sessions.
Space is limited so sign up now!
Thanks
Jeff M.
Oracle’s Data Warehouse Reference Architecture
Last year Oracle published a white paper called “Enabling Pervasive BI Through a practical Data Warehouse Reference Architecture”. I took a read through this paper and have some comments on it (so maybe this is a book report). For your convenience, I’ve uploaded a copy of it here Oracle DW Reference Architecture Feb 2010. I also believe the architecture is a little vague (on purpose I suppose). Depending on how you look at it, you might be looking at how the BI Apps are Architected, or you might be looking at how a real-time Enterprise Data Warehouse is built.
Overview
The document lays out a generic DW architecture to support a variety of functions. It discusses each of the primary layers in the architecture and some basic characteristics of them. Unfortunately, it goes into very little detail, and at times brushes off a large topic with a short sentence. Also, the document is not without its Marketing focus, so I’ll try and call out some of them. Remember, this is a software company with a lot of different products; this paper needs to align with them. Fortunately, I’m more focused on getting things done right and less on being consistent with all of the software my company sells.
So, what does it look like? Below is my version of it. Note I’ve used different terms in some cases from the whitepaper to help explain it a bit better. I’ve also re-organized it to make it a little more tangible.
Briefly:
- The Staging layer is the entry point, mirrors the source system and is temporary.
- The DW layer is a generic 3NF model abstracted away from the staging layer. It is designed to store data in flexible manner.
- The Performance layer is all about optimizing data access by whatever tools needed, and therefore frequently is a Star Schema model.
- Having source, generic 3NF and Star allows each layer to be tailored to its purpose and you can leverage the strengths of each.
I’ll go through the layers one at a time and summarize what is in the document as well as ad my own thoughts. As I have just recently implemented this architecture over the last year, I am in a good position to comment on the feasibility and reality of implementing it.
The Staging Layer
Oracle calls this layer the Staging Data Layer, but Stage is much shorter and gets the point across so I’ll use it instead. Oracle’s statements are shown first in italics followed by my commentary.
- This is the only entry point into the DW. All data must go through this layer. I will not be so heavy-handed here. The reality is that some data should not go into your Data Warehouse, and instead should bypass it all and go straight to the Reporting layer. This can happen for at least two reasons: 1 – it is one off data for a specific reporting or analytics need (such as a small spreadsheet maintained by one individual for reporting) and 2 – your organization may be split along EDW and BI teams, and the EDW team is too busy/understaffed to get your piece of data loaded in. The real world tends to make clean pictures much messier.
- Use whichever loading technique is applicable – Real Time CDC, Traditional Batch ETL, file loads, messaging, etc. However, if you restrict yourself to just batch, you are missing out on a possible very powerful benefit – the real time data warehouse. Clearly each piece of data cannot have the exact same access mechanism – some may be real-time database log reads, others may need to be batch, external files, etc. Use what is appropriate for your business needs of course, but if I were to lay out a DW architecture from scratch, I would want this layer to be as much real time as possible.
- It is temporary in nature. If you are going with a Batch-only approach, then yes this layer will contain only incremental data sets and therefore won’t be of much value on its own. However, if your staging is a real-time layer that brings data in from multiple sources, this layer can have a huge impact to your overall data infrastructure. My current project is doing precisely this – other front end applications are directly querying our stage layer as it offers them a place where multiple data sets exist in the same location. More on this in the next few bullets.
- Data is cleansed, records are rejected and data is in better shape than in the source system. This is record by record cleaning to make them more usable. For example, transformation fixed width fields via UPPER(RTRIM())logic to make them more user friendly, or perform some null-value replacement.
- It closely mirrors the source system in structure – minimal transformation are performed here. Although it doesn’t come out and say that, that is the conclusion you should draw as they do not mention integration, conforming and consolidation in this layer.
- It’s in the same database as the rest of the EDW. For temporary staging tables, yes. For what I propose below – a real-time, integrated replica of sources, I think you should put it on a separate box.
When you look at what document says about the Stage layer, you can come away with very different perspectives on it. On one hand it seems like nothing more than bunch of temp tables, as with the BI Apps and the _FS and _DS tables. However this approach is not very interesting and is so last year darling. If it’s just temp tables then honestly I don’t think it needs a write-up.
If you instead think of the stage layer as a real-time integrated and cleansed version of the sources you have something much more powerful. Limited reporting can occur directly off of it for operational reasons. Systems can use it directly in their applications, as is occurring on my current project. Various source applications are presently and are planning to come to our staging area instead of going to the actual source systems themselves. Soon we will have created a clearing house for all application data across the enterprise.
As we’re still in the beginning phases of doing this, and quite frankly the design never had intended this, we’ve noticed some improvements we can make. Our initial approach was to mirror the source as much as possible. We made only 1 exception: the creation of Oracle DATE fields for DB2 iOS5 which store dates as a numeric field in seconds. However, as we were working with our first stage layer consumer, a few things became clear about the philosophy of the Stage layer:
1. Perform more rigorous cleansing to ensure it is more easily usable. Doing joins on UPPER(RTRIM()) fields does not work well. Assume the data is going to be used directly.
2. Enhance the data mode to make it more useable. Perhaps this is a simple denormalization to eliminate a query join. This is the ultimate challenge however; to perform potentially very complex integrations in a real-time manner. It may be difficult to impossible to accomplish this; on our systems clean integration and consolidation of key entities is simply impossible. However there are areas that we can integrate more, and provide benefits for those areas. Ultimately, solving tough integration challenges is not a back-end DW job, it’s a front end job to make sure things are integrated from the beginning.
3. Break the stage layer out on a separate HW environment. The nature of our real time, transaction oriented stage layer is totally different than a batch oriented DW or DM layer. Our block sizes have been globally set at 32K, which is terrible for essentially an OLTP layer. Furthermore, the staging layer will have uptime requirements that are greater than the rest of the solution, so we plan on running it in a clustered, HA system with Oracle RAC.
If you think of the stage layer as another application, it changes things around as opposed to being a collection of temporary landing tables with no reuse in mind.
Data Warehouse Layer
Oracle calls this the Foundation Layer, but it is in fact a Data Warehouse. I prefer this term, as the overall architecture does in fact include both a Data Warehouse and a collection of Data Marts. Explicitly calling them by name helps to cut down on confusion on the most mis-used terms in all of DW & BI. BTW, I wrote a lot more on this topic a while ago here. I was pleased that the document discusses 3NF vs. Stars in the beginning – it lays it very nicely.
The DW layer is where it all comes together – where all data is integrated, consolidated and conformed into common entities.
- Data Management is the main goal and as such uses 3NF. This layer aims to capture and represent data in a flexible and dynamic format. It is not designed for high-speed queries or end-user usability in mind; that is for the Performance Layer. Efficient storage and supporting unknown requirements are its goals.
- The model is generic. This means that abstraction from the actual source data structures is needed to help insulate from source changes and business changes. For example, one would not make a table for each level of a hierarchy; if the hierarchy changes the tables would not work any longer. Instead, design in generic structures to allow for change. In other words, let the structure be maintained in the data not in the data model. So in our example, you would model a recursive parent-child relationship with any number of levels indicated by a level type. Adding or removing a level would therefore have no impact to the data model, and possibly not even to the code
- All Data is Granular. Aggregates and summaries are not needed here; it keeps transaction as they are at the atomic level.
- They recommend starting with a 3rd party industry model. I don’t have much experience with Industry standard models, but they may be a nice jumpstart for you.
Again the document doesn’t discuss much about how it is built. The assumption then is that it is a combination of real-time and batch as needed.
Performance Layer
The final layer in the EDW Architecture is the Access and Performance Layer.
- This layer is tailored for both end user ease of use and query performance. This layer should be done in a top-down approach, as that is the only way to know what model will perform best for your specific requirements. A bottom up approach will always miss out on how to best solve a given problem using the BI Tool and data model working together.
- There are multiple embedded dependent Data Marts. Embedded means they are on the same hardware platform, and dependent means they are derived from the Data Warehouse. I will lighten up on the definition here somewhat. Oracle is perhaps forgetting about some of their own products such as Essbase cubes which are external. However, as performance allows, it is much easier to build the Data Marts in the same system as the DW with views or CTAS operations than moving it outside the database.
Additionally, I will allow for other data elements to come from external sources (bypassing the DW) such as a user maintained Spreadsheet. Typically this kind of information doesn’t belong in your Enterprise EDW, so make allowances for it. I have had a customer who shot themselves in the foot with the conflicting mandate: “All Data Marts are 100% dependent on the EDW, but we don’t want all data in the EDW”. So what do you do when there is data that you don’t want in there?
Finally, there is more than one DM. As BI Applications projects come along, they are their own separate entity requiring their own data model to suit their specific needs. Even though they may have some overlapping data, as it is all coming from the same source one can ensure that there is a single version of the truth.
- Regular Data Marts are Star Schema for performance and ease of use. Almost self-explanatory. This is where you use the appropriate technique for the job. Stars simply are easier to use and are designed with performance in mind.
- Other mart formats may be used as well – cubes, flat structures, etc. Again, suit the technology and format to the specific needs of a system.
- Subject oriented Subsets. This is where specific business needs are solved. A mature system may have several data marts, each performing a specific set of capabilities on a subset of the overall EDW.
- Implemented with Views or Aggregations. Whenever I hear “Views for the Data Mart” I get worried. Views are logic encapsulation devices and do nothing whatsoever for performance. This is the way TeraData looks at the world – views on top of 3NF. I am not convinced it works well (I’ve seen it fail all 3 times I’ve been on a TD project). Perhaps views can work for very small and simple structures, but do not plan on having your Data Mart be all views. What the Oracle document does not mention is the complexity of metric calculations that can take place here, something far too complex for views, possibly even MVs. Additionally, solutions such as snapshots or SCDs may be implemented here which cannot be done with Views. Plan on persisting the data via Materialized views or even regular ETL code.
- User or Process created data from this layer can be fed back to the Stage layer. I’ll add if needed. Also, for some reason they specifically remove EPM from the discussion; not sure why.
BI Layer
The Architecture talks a bit out the BI layer as well, but honestly this is where the document breaks down into a pure marketing document. Essentially it says: OBI is awesome and can handle anything you throw at it and all other BI tools should kneel before it:
- Use it for metric calculation on top of the Performance Layer and before other BI tools. So model everything in OBI, then put BO or Cognos on top of OBI for their pretty user iterfaces. This is a pretty silly recommendation honestly, one that has been around for a long time. Technically yes it is possible. But it is additional work, layers, and support, not to mention additional cost.
- Use OBI to prototype directly against the source systems and then swap out the physical layer. Honestly this pure marketing recommendation is ridiculous. First off modeling OLTP sources is much harder to do. Secondly it will take significantly more effort for the BI developer to figure out the source model. Third, who wants a prototype model and reporting system being migrated into production with all of the short cuts, temporary folder, mock-ups and general junk that happen during a prototyping effort? Again, although in theory it is possible, reality says you will end up with complete garbage, and your prototype will move at a snail’s pace. Built your prototype off of MS Access.
- Data in the stage can be combined with data in the performance layer. I’ve been seeing this option come up a bit more recently. Basically, it’s using OBI fragmentation to map in both the OLTP and the DM into the same model so that both data sets can be queried seamlessly. I’ve seen this promoted by Siebel/Oracle for years, and quite frankly it doesn’t work except for very specific cases. There are problems mixing data in different states together at query time that chokes this solution.
So in the end you end up with a picture like the following:
The BI Apps
So, how does Oracle reconcile the BI Apps with this architecture? They mention it a few times in a somewhat confusing manner, but ultimately they get to it: They exist in parallel with the EDW. Therefore there are two extracts from the same source system. In fact, they are positioning them as light analysis reporting systems when compared to the much more sophisticated EDW solution. In many ways this is true; the benefit of the BI apps is time to market; it is impossible to develop the same content by hand in as little time as you could by buying the solution.
This is a tricky topic. As they correctly point out, pre-built reporting marts are here to stay, so you’ll need to account for them somehow. I think Oracle’s recommendation is probably a good one, even if it sounds a bit weird. Just like you leverage the benefits of 3NF and Stars for specific purposes, you leverage the BI Apps for time to market, while you leverage the EDW for a more complete and sophisticated solution.
One option that I would also consider is to simply use the BI Apps as the EDW. Of course this is completely against the Reference architecture, but it is an option. The BI Apps can be extended until you get blue in the face; why not use it as a foundation?
Summary
The document is a bit confusing at times, but it makes you think about many things. It’s basic structure is a good one, but keep in mind it is not the only way to do Enterprise Data Warehousing. I’d say that as there is little firm design solutions presented (i.e., Real-time vs. batch), so it turns out to be either highly adaptable or almost meaningless (choose your phrase based on your job title). To put some more meat on it, and take it forward a bit so that you are building for the future, here is one way I would modify the architecture:
1. Real-time CDC capture into a Real-Time ODS layer
2. Real-time layer is cleansed and integrated as much as possible based on source system realities
3. Other applications leverage the Real-Time ODS layer
4. A Generic 3NF EDW model is loaded by batch
5. Data Marts built as needed by batch
6. OBI reports on both the DM layers and the Real-Time ODS layer as needed
Of course one could take this further into everything real-time, but that’s a bit of a stretch for now.

















