Category Archives: Other Technologies & Tools
The term Big Data has existed in some form or another for years but recently has taken on a new and more official meaning. In today’s world of massive internet applications, digital instruments streaming non-stop data, scientific data collection and fraud detection, Big Data has grown far beyond what even a large company used to consider large – into the hundreds of terabytes or even petabytes. Furthermore, Big Data has a large unstructured component to it, whether comments on websites, blog data, internet usage, images or documents. This kind of information typically does not map well to traditional database technologies which rely on a very structured table/column arrangement.
Considering the high volume and great variability of the data, along with the very high uptime and extremely short response times needed, traditional RDBMSs simply won’t work: they cannot scale out to provide one-second response times when a Facebook user posts a picture or visits a friend’s wall while millions of users are looking at petabytes of data. Thus, completely different kinds of data access and storage technologies are needed, ones designed to scale far beyond even very powerful systems such as Oracle Exadata.
This article discusses Oracle’s view of Big Data and in particular how it pertains to Data Warehousing and Business Intelligence. Keep in mind there are many offerings and capabilities pertaining to the acquisition and use of Big Data which are well beyond the scope of Data Warehousing and BI systems; I’m going to focus on just a slice of it here.
Thanks to my colleague Sobhan Surapaneni for helping me out with some of the details of CDC. Sobhan was the guy who made it all happen on the project and really knows this stuff cold.
Traditional Data Warehousing and BI systems rely on batch data extraction and load routines (ETL or ELT) to acquire their data from sources. Depending on data timeliness needs, this may work for intervals down to perhaps one hour for small data sets. However as data timeliness needs go below what is reasonable with a batch process, a complete shift in technology is required. There are tools out there that acquire data in a “real-time” manner directly from the database log files and send individual transactions out over the wire to a target database.
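To make the contrast with batch loads concrete, here is a minimal Python sketch of the idea: instead of re-extracting whole tables on a schedule, the target receives individual change records and applies them one by one. Everything in it is an assumption made for illustration. The `ChangeRecord` type, the `apply_changes` function and the in-memory `target` dictionary are hypothetical and are not part of Informatica’s or Oracle’s products, which read the real database log files and write to a real target database.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ChangeRecord:
    """One committed row change, as a hypothetical log reader might emit it."""
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # changed column values; None for DELETE


def apply_changes(target: Dict[int, dict], changes: Iterable[ChangeRecord]) -> None:
    """Apply change records to the target in the order they were committed."""
    for change in changes:
        if change.op == "INSERT":
            target[change.key] = dict(change.row)
        elif change.op == "UPDATE":
            # Only the changed columns travel over the wire; merge them in.
            target.setdefault(change.key, {}).update(change.row)
        elif change.op == "DELETE":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")


# Stand-in for the target table, keyed by primary key.
target: Dict[int, dict] = {}

# Stand-in for the stream of transactions read from the source log.
log = [
    ChangeRecord("INSERT", 1, {"name": "Ann", "city": "Austin"}),
    ChangeRecord("INSERT", 2, {"name": "Bob", "city": "Boston"}),
    ChangeRecord("UPDATE", 1, {"city": "Dallas"}),
    ChangeRecord("DELETE", 2),
]

apply_changes(target, log)
print(target)
```

The point of the sketch is that order matters and each record is small: the update to row 1 carries only the changed column, and the target stays current without ever rescanning the source.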
This post is about Informatica’s CDC product, but the lessons and the way it works are similar for another popular product, Oracle’s GoldenGate. Note that the name Change Data Capture is not the best: any DW has to implement change data capture as part of its data acquisition approach, so what really sets these products apart is that they are real-time solutions.
Many of you are aware that there is a relatively new kind of storage out there called a Solid State Disk. Instead of spinning platters with moving heads, SSDs are flash memory chips. Think of them as a super fast, large thumb drive.
There are 2 main kinds of SSDs, SLC and MLC. They roughly mark the difference between enterprise (SLC) and consumer (MLC) drives, based on both performance and reliability (which of course means price!). Then there are 2 main interfaces out there: a traditional SATA II/SATA III connection (SATA III is just coming out these days) or one that plugs into the bus directly (PCI for desktop PCs and most Intel servers). The ones that plug into the PCI bus are really only for businesses, as they are very expensive. But by removing the SATA interface bottlenecks, they are unbelievably fast.
Originally posted 2/19/2007.
Having recently attended one of Oracle’s coming out parties for its new Data Mining software, I thought I’d take the opportunity while it’s still fresh in my head to talk about how the two work together in Oracle’s vision of BI.