Last year Oracle published a white paper called “Enabling Pervasive BI Through a practical Data Warehouse Reference Architecture”. I read through the paper and have some comments on it (so maybe this is a book report). For your convenience, I’ve uploaded a copy of it here: Oracle DW Reference Architecture Feb 2010. I also believe the architecture is a little vague (on purpose, I suppose). Depending on how you look at it, you might be looking at how the BI Apps are architected, or you might be looking at how a real-time Enterprise Data Warehouse is built.
Thanks to my colleague Sobhan Surapaneni for helping me out with some of the details of CDC. Sobhan was the guy who made it all happen on the project and really knows this stuff cold.
Traditional Data Warehousing and BI systems rely on batch data extraction and load routines (ETL or ELT) to acquire their data from sources. Depending on data timeliness needs, this may work for intervals down to perhaps one hour for small data sets. However, once the required latency drops below what a batch process can reasonably deliver, a complete shift in technology is required. There are tools that acquire data in a “real-time” manner by reading changes directly from the database log files and sending individual transactions over the wire to a target database.
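To make the log-based idea concrete, here is a minimal sketch in Python of what the target side of such a pipeline does: it takes change records in commit order and applies them one at a time. This is not how Informatica or any other product implements it. The `ChangeRecord` type, the `I`/`U`/`D` operation codes, and `apply_changes` are all made up for illustration, and the "target database" is just a dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ChangeRecord:
    """One change read from a source log (hypothetical shape, for illustration)."""
    op: str                     # "I" = insert, "U" = update, "D" = delete
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; not needed for deletes


def apply_changes(target: dict, changes: Iterable[ChangeRecord]) -> dict:
    """Apply change records to the target in the order they were committed."""
    for change in changes:
        if change.op in ("I", "U"):
            # Inserts and updates both leave the latest row image in the target.
            target[change.key] = change.row
        elif change.op == "D":
            # Remove the row; ignore a delete for a key we never saw.
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return target


# A tiny stand-in for a stream of changes captured from the source log.
log = [
    ChangeRecord("I", 1, {"name": "Acme", "status": "new"}),
    ChangeRecord("U", 1, {"name": "Acme", "status": "active"}),
    ChangeRecord("I", 2, {"name": "Globex", "status": "new"}),
    ChangeRecord("D", 2),
]

target = apply_changes({}, log)
```

The contrast with batch is that nothing here re-reads the source tables: the target only ever sees the individual changes, so latency is bounded by how quickly the records arrive rather than by a load schedule.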
This post is about Informatica’s CDC product, but the lessons and the manner in which it works are similar for another popular product, GoldenGate from Oracle. Note that the name Change Data Capture is not the best: every DW has to implement change data capture as part of its data acquisition approach, so what sets this product apart is really that it is a real-time solution.
Many of you are aware that there is a relatively new kind of storage out there called a Solid State Disk. Instead of spinning platters with moving heads, SSDs are flash memory chips. Think of them as a super fast, large thumb drive.
There are two main kinds of SSDs, SLC and MLC. They roughly mark the difference between enterprise (SLC) and consumer (MLC) drives, based on both performance and reliability (which of course means price!). Then there are two main interfaces out there: a traditional SATA II/SATA III (just coming out these days) or one that plugs into the bus directly (PCI for desktop PCs and most Intel servers). The ones that plug into the PCI bus are really only for businesses, as they are very expensive. But by removing the SATA interface bottlenecks, they are unbelievably fast.