Blog Archives

Oracle’s Data Warehouse Reference Architecture

Last year Oracle published a white paper called “Enabling Pervasive BI Through a Practical Data Warehouse Reference Architecture”.  I read through the paper and have some comments on it (so maybe this is a book report).  For your convenience, I’ve uploaded a copy of it here: Oracle DW Reference Architecture Feb 2010.  I also believe the architecture is a little vague (on purpose, I suppose).  Depending on how you look at it, you might be looking at how the BI Apps are architected, or you might be looking at how a real-time Enterprise Data Warehouse is built.

Informatica CDC for Real Time Data Capture

Thanks to my colleague Sobhan Surapaneni for helping me out with some of the details of CDC.  Sobhan was the guy who made it all happen on the project and really knows this stuff cold.

Introduction

Traditional Data Warehousing and BI systems rely on batch data extraction and load routines (ETL or ELT) to acquire their data from sources.  Depending on data timeliness needs, this may work for intervals down to perhaps one hour for small data sets.  However, as data timeliness needs go below what is reasonable with a batch process, a complete shift in technology is required.  There are tools that acquire data in a “real-time” manner by reading changes directly from the database log files and sending individual transactions over the wire to a target database.
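To make the contrast concrete, here is a minimal sketch in Python.  It is not how Informatica's product is implemented or configured; the `ChangeRecord` type, the `modified_at` column, and both function names are hypothetical, made up purely for illustration.  The batch routine rescans the source for rows changed since the last run, while the log-based routine applies individual change records in the order they were written.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ChangeRecord:
    """One hypothetical change event read from a database log."""
    op: str               # "INSERT", "UPDATE", or "DELETE"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row image; None for a DELETE


def batch_extract(source: Dict[int, dict], last_run: int) -> Dict[int, dict]:
    """Batch style: rescan the source for rows modified since the last run.

    Assumes every row carries a hypothetical 'modified_at' value.
    """
    return {k: r for k, r in source.items() if r["modified_at"] > last_run}


def apply_changes(target: Dict[int, dict],
                  changes: Iterable[ChangeRecord]) -> Dict[int, dict]:
    """Log style: apply change records to the target one at a time, in log order."""
    for change in changes:
        if change.op in ("INSERT", "UPDATE"):
            target[change.key] = change.row
        elif change.op == "DELETE":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return target
```

One thing the sketch shows is that a timestamp-based rescan only sees rows that still exist in the source, so a deleted row never shows up in the extract.  A stream of change records carries the delete explicitly.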

This post is about Informatica’s CDC product, but the lessons and the manner in which it works are similar for another popular product, GoldenGate from Oracle.  Note that the name Change Data Capture is not the best: every DW has to implement change data capture as part of its Data Acquisition approach, so what sets this product apart is really that it is a real-time solution.

SSDs – A Short Story

Many of you are aware that there is a relatively new kind of storage out there called a Solid State Disk.  Instead of spinning platters with moving heads, SSDs are flash memory chips.  Think of them as a super fast, large thumb drive.

There are 2 main kinds of SSDs, SLC and MLC.  They roughly define the difference between enterprise (SLC) and consumer (MLC), based on both performance and reliability (which of course means price!).  Then there are 2 main interfaces out there: a traditional SATA II/SATA III (just coming out these days) or one that plugs into the bus directly (PCI for desktop PCs and most Intel servers).  The ones that plug into the PCI bus are really only for businesses, as they are very expensive.  But by removing the SATA interface bottlenecks, they are unbelievably fast.

Approaches to Building BI Applications and Data Warehouses

Original Post date 2/27/09

I’d like to comment a bit on the differences between BI and Data Warehousing as I believe they are commonly confused and used interchangeably.  Having just come from a project which mixed the two together with poor results, I think some discussion on the topic is a good idea and one that is fresh in my head.  This is a long post, so go get some coffee (or Red Bull) first.

Before I get into this, I fully understand and accept that there are varied opinions about this topic.  I am not going to propose a one-size-fits-all philosophy.  There also may be some technical features out there which some may claim make my comments obsolete.  I am going to focus on the purpose of the two and how that drives how you build a BI system.  If you are hoping I’ll get into a Kimball vs. Inmon data warehouse architecture or modeling discussion you will be disappointed.  I recommend you read their books and articles for the details of their solutions.

I am going to approach this by defining two very distinct items at different ends of a spectrum, and then discuss how they can and cannot be merged into the same thing.  The point is to contrast the two needs so conflicts become apparent.