OOW BI Imp Panel Questions #3: Can Exadata Replace a DW?

Click here for the first post in the series and here for the second.

Question #3: With the power of Exadata, are Data Warehouses even needed anymore?  Can’t you just map your BI tools to the transactional system without all of the back end ETL work?

Short Answer: For most systems a DW and ETL are necessary.  However, a simple reporting system can leverage this option for a while.

Long Answer: Data Warehouses started out as a means to offload the processing of large data set queries from online transactional systems – basically breaking up the workload.

However, that was a long time ago.  For a while now companies have realized huge benefits by bringing data together and integrating it.  Companies have also realized that when you start to bring data together you encounter all sorts of problems – problems that a data warehouse really isn’t the best place to solve, though DW teams are usually the initiators and key consumers of the solutions.  These problems then spawn solutions and areas where a company can turn its data into assets.  Broadly put, this encompasses the whole set of fields spawned from DW and BI systems, such as Master Data Management, Data Governance, Data Stewardship and corporate KPIs.

Can you simply put EBS or PeopleSoft on an Exadata and have at it with OBI?  In a very limited capacity, yes.  Forget about the Exadata portion of this question for a moment and focus on the key part: modeling OBI on top of a 3NF source system.  That is what this post is really about.  All Exadata does is increase performance and capacity – which addresses none of the difficulties of mapping to real-time sources.

Disclaimer: This post in no way suggests Exadata is a bad choice for those with large data warehouses.  I would love to put my current dimensionally modeled OBI system on one and see what it can do!

Mapping to the 3NF transactional system will work for you if:

  • You have a real-time data need, and the other solutions won’t get you there quickly or easily enough.  Making some reports off of the 3NF is fine, but keep in mind that not all of your requirements are real-time – only some of them are.  Exadata will help offset the performance hit.
  • For some basic operational reports, sure, go ahead and use OBI.  All ERP and CRM packages have an integrated reporting component to them – why not use the BI tool to do this?  Towards the end of the Siebel days, they began replacing the Actuate reporting tool in the CRM suite with Analytics reports.  But make no mistake: you are building operational reports; you are not replacing a Data Warehouse, nor are you doing Business Intelligence.  Besides, you may be able to leverage the same tool for both kinds of reporting – operational and BI.
  • You do not plan on using Answers.  I guess if you had something really simple you could, but I’m not sure there is enough value with such a simple model to warrant using the ad-hoc tool.
  • You are ok with a single system reporting solution, and don’t need to integrate any data.  Somehow you’ve reconciled this solution with whatever information comes out of other marts and DWs.

So those are the situations when it’s OK or even a good idea.  But let me list out some of the problems and issues with this approach, and when it’s not a good idea:

  • Complexity is a Real Challenge: Modeling OBI on a normalized transactional system is significantly more difficult than on a nice, clean dimensional model.  And by difficult I don’t mean more effort – ETL is of course a huge amount of effort.  I mean difficult in the sense that you’ll be modeling at a 10-out-of-10 degree of difficulty.  That in and of itself is a huge risk – sometimes things just don’t work right in a tool and you have no code to help you out.

If all you are doing is some basic reporting then most likely it will be no great challenge.  However, once your requirements gain in complexity you will hit a wall that you’ll only get past with view on top of view on top of view.  At that point you are sort of writing ETL – the logic is all there – you’re just not instantiating the data.  On several occasions now I’ve seen OBI modeled on top of a view-heavy 3NF transactional system because the client believed the hype about how amazing Teradata was, and the results were horrible.  I also know of several systems I’ve built where the complexity of the metrics was so great you would be foolish to try to compute it in real time – I don’t care how much horsepower you have.  Finally, things like snapshot tables you simply will not be able to do at all, as the data is never instantiated at a point in time and therefore never retained after updates have wiped out the old state (see the sketch after this list).  Computing power contributes nothing to the development degree of difficulty.

  • Isolation is the Opposite of Power: This can work if you don’t need to integrate with any other systems, and that is the real killer.  Without integrating with other systems, all you have is a reporting copy of your transactions – nothing more.  Cross-system integration is one of the main reasons for bringing your data together in a single place – the integration is oftentimes very difficult and processing intensive.  “But OBI does federated queries!” – and I can build your next house, too.  Good luck on that one.  Computing power contributes nothing when some of the data is on a different platform.
  • Unused Capacity Will Soon be Used Capacity: This question has been posed time and time again with each jump in hardware power.  I don’t care how big your hardware is, you will always use it up.  And you will use it up faster than the hardware improves or your budgets allow.  Then you will be stuck with no data warehouse and will have to frantically put one together on a limited budget.  For their own sake, I hope the VP who led you down this path is long gone by this time. Computing power is a limited resource; usage is ever growing.
  • Software > Hardware: A college professor taught us long ago that ultimately, great hardware will never overcome bad software.  He was speaking about the scalability of algorithms on ever larger data sets.  I’m pretty sure I can write some queries that can bring the box to its knees.  I’ve seen it happen on other boxes such as Teradata and large Oracle databases, and Exadata is no different.  If your queries are junk, you’ll just be using that expensive hardware to overcome them, and eventually it won’t be able to.  I hope the budget owner asks you this question: have you done all you can with your existing software before you give up and have me plunk down a large sum on a Ferrari?  Computing power is a weak excuse for poor software design.
  • High Upfront Costs: A friend whose company was considering purchasing an Exadata said it costs a ballpark $2 million to purchase and get provisioned in a data center.  Two million dollars.  I think there are a lot of smart people out there who can achieve similar or better benefits with a lot less than $2 million.  Oh, and with the cheaper route you are left with something afterwards – not a bunch of spaghetti OBI mappings and database views.  My friend was able to dissuade his company from making this mistake.  Computing power is expensive.
  • Data Warehouses > OBI: A Data Warehouse is much more than a collection of reports.  A Data Warehouse has uses and functions far beyond OBIEE and a set of dashboards.  Other applications can use its data, including other reporting tools.  Perhaps you supply other systems with data (like a Customer Hub concept).  Perhaps you plug Cognos and OBI onto the same Data Warehouse.  What about providing a single version of the truth?  None of these things are possible without a flexible Data Warehouse layer.  Perhaps we’d be better off asking this question solely about a Data Mart or an OBI reporting system.  Computing power doesn’t solve your need to use the same data in different ways.
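
To make the snapshot point from the complexity bullet concrete, here is a minimal sketch of the kind of table an ETL process instantiates nightly.  The table and column names are hypothetical, but the pattern is the whole point: once the source row is updated in place, no stack of views over the live 3NF table – on any hardware – can recover yesterday’s state.

    -- Hypothetical nightly snapshot, instantiated by ETL.
    -- Once the live ACCOUNT row is updated in place, yesterday's
    -- balance exists only here; no view stack can reconstruct it.
    CREATE TABLE account_balance_snapshot (
      snapshot_date DATE         NOT NULL,
      account_id    NUMBER       NOT NULL,
      balance       NUMBER(18,2) NOT NULL,
      CONSTRAINT account_balance_snapshot_pk
        PRIMARY KEY (snapshot_date, account_id)
    );

    -- Run once per day by the ETL scheduler.
    INSERT INTO account_balance_snapshot (snapshot_date, account_id, balance)
    SELECT TRUNC(SYSDATE), account_id, balance
    FROM   account;   -- the live, updated-in-place 3NF source table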

I’m sure I could come up with a few more points about what a Data Warehouse is and why you need it, but I hope by now the point is clear.

The Counter Argument

But what about the perceived benefits of forgoing all of the modeling and ETL work, and particularly its maintenance?  Isn’t it cheaper to forgo all of the ETL development & support cost and time and map directly to the source?  In other words, isn’t it less expensive to buy the big box and have a smaller development team?  Won’t there be less code to go through and maintain?

When a piece of logic changes, it needs to change regardless of what tool it is implemented in.  You’ll of course have to make the change in ETL, but I think people tend to write the change off as zero effort when the BI tool is mapped directly to the source.  This is far from true, especially as your complexity goes up – someone will have to go through the analysis, redesign and re-QA of a view or BI tool change just as an ETL developer would.  Perhaps the turnaround can be faster without ETL, since with ETL you need to test your data movement logic in addition to the BI tool/end-to-end logic.  Also, the simpler the change, the more the direct-map option comes out ahead of ETL, just due to the QA overhead.

Add it all up, and it probably is less costly to forgo the ETL team on a direct year-to-year basis.  Is it enough to overcome the added cost of the hardware?  That’s up to you to decide for your own system.  I caution you not to underestimate the complexity that you will have to deal with one way or another, even with the direct-to-3NF option – it is far from zero.

Case Study

My current system uses Oracle’s DW Reference Architecture.  Essentially, it has three layers: 1) a real-time, table-by-table copy of the source tables, 2) an integrated 3NF ODS layer, and 3) a performance/aggregation layer called the Data Mart.  In our system, 80% of the logic is used to merge 8 different systems into a single integrated ODS layer.  The DM layer is really just some denormalizations and aggregations, for performance reasons and for mapping into OBI more easily.
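
As an illustration of what that DM layer does, here is a hedged sketch – the table and column names are invented, not our actual schema – of the sort of denormalized, pre-aggregated structure we build from the ODS so the OBI model stays simple:

    -- Hypothetical DM-layer aggregate: denormalize and pre-aggregate
    -- the integrated ODS so OBI can map to it easily and query it fast.
    CREATE MATERIALIZED VIEW dm_customer_month_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS
    SELECT c.customer_id,
           c.customer_name,                        -- denormalized from the ODS
           TRUNC(o.order_date, 'MM') AS order_month,
           SUM(o.order_amount)       AS total_order_amount,
           COUNT(*)                  AS order_count
    FROM   ods_customer     c
    JOIN   ods_order_header o
      ON   o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.customer_name, TRUNC(o.order_date, 'MM');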

Although the ODS is a 3NF model, not all 3NF models are created equal.  Our model abstracts out the various event types and places them all in a clean and tightly integrated model.  Our sources, however, are only somewhat 3NF, and quite frankly they are all over the map in how they were designed.

If we were to map OBI to our real-time layer, we would be halted right away by the very first thing we’d need to know – what is our distinct list of customers?  Since customers come from many different systems with only weak links between them, some of our more complex ETL logic went into that integration effort.  We have even more complex integration logic elsewhere, but I don’t know how we’d ever get past this very first challenge using views and OBI mappings.  This is the deal breaker, in other words.
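
To give a sense of why that is ETL work and not view work, here is a minimal, hypothetical sketch of just a first matching pass between two of the source systems (all table and column names invented).  Real customer integration takes several passes like this – email, phone, standardized address – plus survivorship rules to pick a master record, and the results have to be instantiated and curated, not recomputed in a view at query time:

    -- Hypothetical first-pass match between two source systems that
    -- share no common customer key, only weak natural-key links.
    CREATE TABLE customer_match_pass1 AS
    SELECT s1.customer_id AS src1_customer_id,
           s2.cust_no     AS src2_customer_id
    FROM   stg_src1_customer s1
    JOIN   stg_src2_customer s2
      ON   UPPER(TRIM(s1.last_name)) = UPPER(TRIM(s2.lname))
     AND   s1.postal_code            = s2.zip;
    -- Later passes match on other attributes, and survivorship rules
    -- decide which source wins each column of the master record.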

Conclusion

There are a lot of reasons not to map directly to your 3NF, and there are some good reasons to do it.  I think when you add up all of these direct-to-3NF negatives, what you end up with is a system with a low ceiling.  If you intend to use this as your primary BI platform for the foreseeable future, the integration problem, the complexity problem or the capacity problem will eventually make this option a dead end.  When that happens, you will be completely stuck, with nothing to build from to take you to the next level.  However, for a simple, stand-alone operational reporting system with no data integration needs, it works perfectly fine.

Note that the power of the hardware really doesn’t factor into the equation much.  It will overcome some percentage of the performance hit compared to a well-designed and ETL’d system, but maybe not enough – it depends on your scenario.

Posted on December 1, 2010, in Architecture, BI Theory and Best Practices, Data Warehousing.