Modeling Header & Details

Originally posted 3/16/2007

A common question I hear is about how to model & configure a Header-Detail scenario.  By saying header-details, we are referring to a transaction that is split over more than one granularity.  This is common modeling scenario such as Order & Order Line Item, Invoice & Invoice Detail, Agreement & Agreement Line Item, etc.

First, let’s describe the scenario using the classic Order Header & Detail as an example.  Think of going to the supermarket and buying a bunch of products.  Think about the receipt that you get.  You will have a list of products, with Quantities and Unit Prices.  At the bottom, in addition to the total, you will also have additional values such as Taxes, Discounts, Points Earned, and Coupons (i.e., Promotions) that apply to the whole of the transaction, not the individual line items.  In a transactional system, the Order Header would contain most of the information that pertains to the whole transaction, such as location, time, employee, and customer (if known).  The list of line item details would contain all of the above, plus the addition of Product.  This is what is meant by different granularities – they are the exact same, but the line item details also contains the product dimension, and is thus at a lower level of granularity.

So how should such a transaction be modeled?  The reality is that there is no one right answer.

The Problem

First, think a bit about how they might be used from a pure data modeling/SQL perspective.  Consider joining Order Header and Order Line Item together as follows:

Select
    Order_Header.OrderID,
    Order_Header.Tax_Amt,
    Order_Line.Sale_Amt
From
    Order_Header,
    Order_Line
Where
    Order.OrderID = Order_Line.OrderID;

What will happen if an Order has 5 line items, totaling $200 is sales, and a total Tax Amount of $10?  5 records would return in the query:

OrderID Tax_Amt Sale_Amt
1093A-23 $10.00 $50.00
1093A-23 $10.00 $40.00
1093A-23 $10.00 $30.00
1093A-23 $10.00 $10.00
1093A-23 $10.00 $70.00

There are 5 records returned, each one is for a different product, but we just happened not to show them.

Now what if you aggregated them to the Order Header level?

Select
    Order_Header.OrderID,
    Sum(Order_Header.Tax_Amt),
    Sum(Order_Line.Sale_Amt)
From
    Order_Header,
    Order_Line
Where
    Order_Header.OrderID = Order_Line.OrderID
Group by Order_Header.OrderID;
OrderID Tax_Amt Sale_Amt
1093a-23 $50.00 $200.00

Here, although the Sale_Amt is correct, the Tax_Amt has been overstated.  Whenever you mix granularities, you will ALWAYS encounter this problem.  In Dimensional Modeling, Facts must be Normalized.  This means that following solutions by themselves will not work, as they all produce the same resulting data set :

  1. Physical Table: Making one table (W_ORDER_LINEITEM_F) that holds both values from Header and Details.
  2. View: Making 2 separate fact tables, but making a view that joins between the two
  3. LTS View: Joining the 2 tables in a single Logical Table Source in OBI

If you are still learning SQL, it is imperative to understand that the joins and WHERE clauses will determine the base data set, which is then aggregated as was done with the above queries.  You can kind of think of it like the query below, where the inner portion runs first to get the dataset, then the outer portion runs:

Select
    BASE_DATA_SET.OrderID,
    Sum(BASE_DATA_SET.Tax_Amt),
    Sum(BASE_DATA_SET.Sale_Amt)
FROM
(Select
    Order_Header.OrderID,
    Order_Header.Tax_Amt,
    Order_Line.Sale_Amt
From
    Order_Header,
    Order_Line
Where Order.OrderID = Order_Line.OrderID) DATA_SET
Group by BASE_DATA_SET.OrderID;

In fact, that is how the database will actually go about executing the query internally.

So, what is the solution if we want Tax_Amt on this query to be $10.00 and not $50.00?  There are two main categories to choose from.

#1 Lower the grain with a pro-ration

Essentially, divide the header facts and pro-rate them down to the details level based on some business rule(s).  Thus if a header has $10 in tax and 5 detail records, each would get $2.

Or would it?  Maybe there is a better more sophisticated way than assigning them equal prorated tax amounts, perhaps based on value, quantity, or some other business rule.  The point here is that you are essentially creating a new definition for a new metric, one that begins as a prorated tax amount.  Sometimes this is difficult to do, and your user community may not like this idea.  This is the primary solution proposed by Ralph Kimball.

This concept can be implemented using the three techniques identified above (Table, View or LTS Join), with the difference being that the pro-ration rule is applied to take a header value and pro-rate it over multiple child records:

1.      Physical Table:  Use the ETL to calculate the rule and store the prorated values in one table.  The main benefit here is that there is only one fact table which contains everything, meaning good performance and simpler BI Tool mapping.  Downsides are primarily the additional effort required, and I guess you could throw in additional data storage which I don’t see as a problem for most deployments.

2.      Database View: The calculation is contained in the view SQL, but still requires two fact tables.  If the business rule to prorate is difficult, then this may not be realistic

3.      LTS View: Pretty much the same as the Database view option, except the business rule has been moved into the BI tool and out of the database

Of the three examples above, the first one will be your best bet in most cases, as it is simpler and will outperform the other two due to not having a Header-Details join.  With large tables, it will significantly outperform the options with joins due to simply the smaller dataset.  In Oracle systems it also aligns with Oracle’s Star Transformation feature better, which is a major performance booster.

There is another variation, one that is essentially the same technique used when displaying metrics with a multi-valued dimension.  In the multi-valued situation, a single fact record applied to multiple dimension records, such as Sales Amt and a set of Sales Rep who worked on the deal.  In this scenario, a weighting factor is recommended to be added to a bridge or relationship table that captures the M:M relationship.  The metric can then be computed by multiplying the header fact with the dimension’s weighting factor for that fact to arrive at a prorated amount.

If you think about it, that’s really what the Order Line Item Detail table is: it holds the M:M relationship between Order Header and Product, plus some additional information that happens to be on the Line Item Detail record.  The grain of either solution is Order-Product whether you have a normalized solution using a bridge table or a denormalized Line Item table or a full denormalized one where you merge Header and Line together.  Thus, if so inclined you may substitute a pro-rated value in the details fact table for a weighting factor, which can then be multiplied by the header base fact to arrive at a prorated factor.  Make sure your weighting factors add up to 1.00000 for each Header record however!

Note that all of the techniques discussed above are essentially the same thing; they are forcing a higher level fact to a lower level by prorating it to a lower level.  They simply differ in their techniques, effort, performance characteristics and storage needs.

Key Benefits:

  • Eliminates an extra fact table (storage, ETL processing, DB Management, Less BI tool mapping)
  • Can perform exceptionally well
  • Mapping into the BI tool is simpler

Key Drawbacks:

  • An acceptable pro-ration rule must be identified
  • Creates a piece of data that does not exist (A header level metrics pro-rated to a lower level)
  • A header-only query must use the details table all of the time; can be slower if the ratio between header and details becomes large (> 10 details per Header)

#2 Use multiple queries and merge the results together

This technique puts more of the work and sophistication into the BI Server.  With this solution, two separate fact tables are used, one for each grain.  This refers to both Physical tables and also in the Business Model layer.  When using a metric from Header and a Metric from Details, BI EE will issue two queries and merge the result set.  Thus,

Select
    Order_Header.OrderID,
    Sum(Order_Header.Tax_Amt),
    Sum(Order_Line.Sale_Amt)
From Order_Header, Order_Line
Where
    Order.OrderID = Order_Line.OrderID
Group by Order.OrderID;

Will be essentially re-written as:

Select D1.OrderID, D1.TTL_TAX_AMT , D2.TTL_SALES_AMT
FROM (
Select
    Order_Header.OrderID,
    Sum(Order_Header.Tax_Amt) TTL_TAX_AMT,
From Order_Header
Group by Order.OrderID ) as D1,
(
Select Order_Header.OrderID,
    Sum(Order_Line.Sale_Amt) TTL_SALES_AMT
From Order_Line
Group by OrderLine.OrderID) AS D2
WHERE D1.OrderID = D2.OrderID;

In reality, the BI server will most commonly issue two separate queries to the database and do the join on the server, but the SQL above illustrates how that work from a data set and logic perspective.  With this solution, each inner query maintains its proper grain, most importantly the header query does not get lower to the details level and thus maintains its accuracy.

Key Benefits

  • Simpler (although more) ETL, as tables map more 1:1 with a source system
  • May be the only option when mapping to some data sources such as a transactional one
  • Header only queries may be faster as line items are not scanned in the query

Key Drawbacks

  • BI Tool configuration becomes more complex and there is more of it
  • For queries that use both header and detail, longer query times as 2 queries are executed
  • Requires 2 fact tables instead of 1 (storage, ETL processing, DB Management, Less BI tool mapping)

Summary

If you can identify a workable and agreeable pro-ration rule, then go for the single, lower level fact table.  However this is usually its downfall, in which case use the BI tool solution.

Advertisements

Posted on July 16, 2009, in Data Modeling, OBI EE Development and tagged , , , , , . Bookmark the permalink. 2 Comments.

  1. Dimitry Faynerman

    Great article, Jeff. Thank you very much for the info.
    Just one small correction. The where clause in the first query should be
    Where Order_Header.OrderID = Order_Line.OrderID;

  1. Pingback: OBIEE – Modeling Header & Details | My WordPress

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: