Build Your Own Time Hierarchy
Every single BI system has a time/date/period/calendar hierarchy in it. If you use the BI Apps, it actually comes with 2 different tables for this purpose alone, along with a ton of OOB capability. However, when you are faced with a pure custom system from scratch, how do you build one? What does it look like? What sort of data should it have in it?
I’m going to lay out what a decent Time Hierarchy looks like from an OBI and data model perspective. From there I’ll then demonstrate how to actually build one yourself instead of relying on complex code. Along the way we’ll mix in some best practices and theory as well.
Your mileage may vary depending on the specifics of your system, but my goal is to make the design and building of your hierarchy take only 3 hours and last the lifetime of your project (or career).
There Are Multiple Hierarchies to be Aware of
The clearest and most common way to demonstrate this is via the Fiscal Calendar and the day-today calendar hanging on our wall called the Gregorian Calendar. In such a scenario, the definitions of weeks, months, quarters and years can be very different from each other, and both exist at the same time.
For example, January 31, 2014 will always be in January 2014 in the Gregorian calendar, but depending on your company’s Fiscal calendar it might be in 2013, or it might be in February 2014 or even February 2013. The year depends what month your calendar begins in, usually based on when the company first incorporated and began tracking its financials. For example, the US Federal Government’s Fiscal calendar starts in October not January.
To further add to the confusion, some companies use a 4-4-5 system that ensures that their Quarters always have 13 weeks in them to make sure Q/Q analysis is highly accurate. With this system, Month 1 & 2 of a quarter always have exactly 4 weeks and the 3rd month always has exactly 5 weeks. Doing so of course completely ignores Gregorian Month definitions and even years. Note some fiscal calendars have 52 weeks while others have 53.
This type of system makes rollups from Day to Week to Month to Quarter to Year very clean and balanced. But note if this is not done like this a given week can cross months. If drill down is required from Month to week, then some sort of business rule is needed. For example, one way is to simply assign the whole week to whichever month the Sunday or Monday falls into. Another way is to leave it split across – a M:M scenario between Month and Week. That is up to the business customer how they wish to have numbers allocated and drill downs operate, so make sure to ask these questions.
Both of these two hierarchies though are nothing more than higher level classifications of days or dates. There is only 1 January 31st, 2014 in the real world; how you choose to roll up and classify those dates does not change the unique set of dates. Think of it then that Week, Month, Quarter, etc. are all just grouping fields, class fields, category fields, type fields, etc. The hierarchy then simply becomes a custom roll-up all sharing the same bottom grain of day/date. Once you make that abstraction, dealing with multiple hierarchies at the same time becomes a bit easier.
In some cases you may even encounter 3 or more time hierarchies. A current customer has just such a scenario – they have their own Division’s Fiscal Calendar, but also the “Mother Ship” has a different one as well; no one is really concerned about Gregorian Calendar (for the moment that is). This is no big deal however; they are simply 3 different roll-ups of Dates.
Given that any date hierarchy we’re interested in is a roll-up of dates, you can abstractly think about it as in this visual:
Here, each hierarchy is independent of the others – a key point. Think of them as independent moving parts. Along with that the roll-up paths can also be different. The Yellow path represents one way to build a Gregorian calendar (weeks do not roll up to months but do to Quarters), while the others represent a different path.
A Closer Look at a Level
The picture above represents the first two parts of building a calendar Hierarchy:
- Identify the different calendars
- Identify the relationships in those calendars
Next we’ll want to look at what goes into a level as far as fields go. There are a few things to keep in mind when planning on a set of fields to have at each level:
- Have a user friendly short display of the level across years, meaning within a year or Intra-Year. For example, for Month this would be Jan, Feb, Mar, etc., for year it might say FY14, or Q3 for Quarter
- For all text fields, always have a numeric value alongside it to be used for sorting. In this case, to sort the month display fields, you would have a number from 1-12 and for Quarter it would be values from 1-4. You would use this column in the RPD on the Logical column for the text fields in its Sort Column setting. This holds true for year as well, although in simple cases this isn’t necessary.
- Have a user friendly long display of the level that includes the year, meaning Inter-Year. The format of this is customer specific, but the point is that the year is included in the text so that cross year analysis is possible. This field does not have to have year at the beginning to make it sortable. Some examples: “Jan-14”, “14 / Jan”, “Jan 2014”, “2014 / 01”, “FY14-01”, “FY14 Q3”. This is a field and format that the business owns; make sure to discuss the formats with them early on so that it can be built early and used for reporting without late arriving data value changes.
- To facilitate sorting of the year inclusive long names, a numeric value is also needed. This field will take the format of YYYYMM or YYYYQ, etc. This field is critical for OBI Time Series functions to work; without these that feature will not work. It will serve as the Level’s Chronological key in the Dimensional Hierarchy in OBI. Secondarily this field is used for the sort order of the above text field.
Those are the 4 “mandatory” fields at each level (doesn’t apply to the Year level completely). For the lowest level, Day, you would also have a date field truncated and mapped in OBI as a DATE not a DATETIME.
Primary Key: Note also that the Year inclusive numeric column #4 at the day level will be in the format of YYYYMMDD. This is the primary key of the table, and is the only exception I can think of for Surrogate key usage. The reason? Debugging, fix, testing, and general working with the data becomes much easier if you can quickly identify that the load for 20140319 (March 19, 2014) has double the records it should. I do not like formats of YYYYDDD (3 digit day of the year as they are too difficult for humans to work with). Furthermore Oracle Database understands this format to some degree and can be used for clean table Partitioning of Fact tables.
Although those 4 fields are the basics at each level of each hierarchy, you may wish to add a whole variety of supporting fields to the table. These fields frequently help ETL jobs perform lookups, contain current period flags useful for reports and variables, date ranges, date math, holidays, weekend flags, etc. Use as many as you feel you need. Some examples:
- If desired (usually optional and less important for pure BI work), a long display name to correspond with #1 above. So January, February, March, etc.
- Begin and End dates. Identify the Period/Month/Quarter being and end dates and capture as both a date field and a numeric field in the same format as the numeric column. The _WID columns do serve a purpose, especially for monthly snapshots where all data for a month is tagged to the last day of the period. So for example, Month might have the following 4 fields & values for March 2014:
- MONTH_BEGIN_DT ’01-MAR-2014’
- MONTH_BEGIN_WID 20140301
- MONTH_END_DT ’31-MAR-2014’
- MONTH_END_WID 20140331
- Current flags, updated by ETL each night:
- MONTH_CURRENT_FLG Y|N
- In some cases other marker flags might be useful, such as a FIRST_WEEK_OF_QUARTER_FLG. Of course if you use _WIDs for these types of things you can always find that row by filtering on FIRST_WEEK_OF_QUARTER_WID = ROW_WID.
- Other Day classification flags:
- Legacy fields: Useful for doing ETL lookups depending on your data set. If there is a specific field format used in a source file, then have a custom prepared field to make the lookup easy and fast.
- Julian Date Numbers are sometimes used (and are part of the BI Apps). I don’t find them overly useful however.
I’m a big fan of a good naming convention. It helps organize things and makes development easier and less error prone, and makes life easier for users. I’m proposing a naming format of: <Hierarchy>_<Level>_<Field Type>
- Prefix the column with the Hierarchy Name. In most places that boils down to: FISC_ and CAL_
- Next is the level name, such as YEAR, QTR, MTH, PERIOD, WEEK, etc.
- Field Type suffixes come at the end and help keep like columns (described in the sections above) together.
- IntraLevel Display Name (e.g., Mar): _NAME
- IntraLevel Long Display Names: e.g., (March): _LONG_NAME
- IntraLevel Sort (e.g., 3): _NUM or _ORD for Ordinal
- InterLevel Display Names (e.g., Mar-14): _FULL_NAME
- InterLevel Sort (e.g., 201403): _KEY or _SORT or _WID
- Dates: _DT
Putting the Table Design Together
With all of the above items, shown below is a sample of what the structure and formats of such a table might look like. Your formats and hierarchies of course will vary. Note that the template also has some OBI configuration details as well.
There are only a few things of interest on the OBI side. Recall this is one logical table and one hierarchy with two or more drill paths; you do not usually need to create two physical or logical tables. First is the Hierarchy itself which follows the structure laid out in the above spreadsheet. There are a few things to keep in mind:
- Set up your display columns with proper sort columns underneath them as in the 5th column in the XLS. This will ensure data is always sorted properly.
- Make sure to drag every column to its appropriate level. Not doing this will 100% cause you errors. Only the display key and sort key should be set up as level keys, but all of the fields need to be on the right level. The only exception to this are items at the day level (Details level) – If a column is not on a level then it is assumed to be at the details level.
- Make sure the Hierarchy is setup as a Time Hierarchy with the checkbox on the General Tab of the Hierarchy.
- Recall we are setting up one Hierarchy with one Totals level but 2 tops that end up at the same bottom. Graphically the hierarchy looks like this:
- Set up 2 Level keys for each level:
- For any sort of interesting aggregate navigation, enter the # of elements at each level. Keep in mind that for Month the value is not 12, but 12* # years you have in the table.
- Make sure your Logical Table Sources have the appropriate levels set on the Content tab (as always).
One thing I recommend is to use the table you’ve just built as the basis for any Repository variables related to date that you use. Why bother reproducing all of the logic in a second place (bad design)? What if the format needs to change?
In one easy step, based off of today’s date you can look up a whole army of date variables with one Initialization block. Fast, efficient, and isolated from some logic and data changes.
Building Calendar Data
OK great, I’ve defined everything, but how do I get the data in the table?
You could write some complex ETL job with lots of logic that needs to be debugged like the BI Apps do. You might generate it almost entirely in code, or you might pull it from a source calendar table. I advise against these due to the simple fact that they take too much time and are overkill.
You can build a fantastic, rich and robust table using MS Excel in just a few hours. I recently built one with 2 different Fiscal Calendars, a Gregorian calendar, and a special twist for support of an Adjustment Period (think of it is period 13) in about 2 hours. That’s 40 columns for 32 years, all with a very high degree of certainty of accuracy and no code to write or debug.
I’ll show you a few techniques and formulas to help speed the process along. You may have some data that you wish to retain from either another DW calendar or a source system, so modify the steps below to suit your needs.
- Start with Column A as your Date column. Enter in 1/1/2010, then drag that value down the rows until you reach the end of the time rang you need. For everyone at the customer who is using the calendar I built to 2030, we’ll all pretty much be retired by the time we get that far along.
- Start decomposing the date into Years, Months, Day of week, etc. using Excel functions. Once you get the first few rows correct, again drag the formula all the way to the bottom. Use date functions such as Year() and Month() to get those values.
- Sometimes it might be simpler to simply type a few values in, select the block and repeatedly copy and paste. I do this with Jan – Dec fields if I don’t feel like messing with Excel functions.
- A few functions to help you out along the way:
- Get the last 2 digits of a 4 digit # (for getting the 2 digit year): = MOD(B2,100)
- Convert number to text format : Text(B2,”00”) – take a look at the Excel help for different codes in the “00” portion
- Text concatenation: Concatenate function or &. Ex: to build “FY98” from 1998, I used: =CONCATENATE(“FY”,TEXT(MOD(B2,100),”00”)) or, or “FY98 Q4” as =C3 &” “&E3
- Use intermediate columns to help you through multiple layers of logic
- Use basic math to build various _WIDs: FISCAL_QTR_WID (YYYYQ)= =FISCAL_YEAR_NUM*10+FISCAL_QUARTER_NUM
- For certain logic use the If() statement. In this example, Fiscal Period of 13 was and Adjustment period so it was flagged as such: =IF(L9=13,”Y”,”N”)
- Another example is assigning Quarter numbers based off of Month #s: =IF(MONTH_NUM<4, 1,IF(MONTH_NUM<7,2,IF(MONTH_NUM<10,3,4)))
The reality is that if anything needs to be dynamic (like current flags), you can apply this logic to the base structure defined here in Excel. Trickier items such as holidays might require a synch up with another source for such holidays, but the core data set is a one-time build. There is no real reason that I have seen yet to have the whole table be dynamic in nature.
That said, build it in Excel and consider it Seed data to be imported as part of your deployment process.
Sorry you are not done until you’ve tested the hierarchy. Take a simple base metric such as Sales $, and build versions of it along each of your levels in you hierarchy. Do this for AGO and YTD type metrics. The goal here is simply unit test, but also it gives you and your team confidence that it works and you need not be too concerned with it again.
Build a simple report of Month and Sales $ for a reasonable data range, run it, look at SQL and Results.
Then add in Sales $ MAGO and look at the results. From this point forward the SQL will be unreadable so we simply have to rely on the results.
Continue adding different metrics to the mix, changing the levels, etc. until you have a degree of confidence that it works.
A few final wrap up items.
- Put bitmap indexes on all of your filter value fields, usually the _FULL_NAME fields.
- As the table is static, why not build the DDL with the COMPRESS option in the Oracle database?
- Time Series metrics must be defined based off of a specific hierarchy path and level (if there are more than one). So a metric cannot be called simply “Sales QAGO” if there are 2 Data hierarchy trees, it would need to be called “Sales Fiscal QAGO” or “Sales Cal QAGO” – the configuration of the time series metric forces you to select a unique level. Thus, be precise when capturing the requirement and naming the derived metric.
- If you need aggregate tables at certain months, it will be a straightforward task. The table can be built using a SELECT DISTINCT off of the base day table, selecting the columns that are at the desired level and above. So for a Monthly Fiscal Calendar aggregate W_FSCL_MONTH_DA table, you might issue something like:
SELECT DISTINCT FISCAL_MONTH_WID, FISCAL_MONTH_FULL_NAME, … FISCAL_QUARTER_WID, FISCAL_QUARTER_FULL_NAME, …, FISCAL_YEAR_NUM, FISCAL_YEAR_NAME
ORDER BY FISCAL_MONTH_WID ASC;
- If you end up with lots of aggregations, you will encounter a problem if you need aggregation by both hierarchies your Logical model will not be able to handle it easily. Why? Think of the content tab – there is room for only one level to be selected. For example, even if your Fiscal Months and Calendar months align perfectly with each other, you still have defined a Fiscal Month level and a Calendar Month level in the Hierarchy. When you map in the Logical Table Source, you can only pick one not both. The best thing to do here is to make 2 Aliases and have 2 LTSs, pegging each to it specific hierarchy level.
- Time does not belong in the Date hierarchy! Solve that with a time or clock table – do not mix the two together.
A good time hierarchy should be very robust and flexible for users. We really shouldn’t ever need to build special RPD or report logic to concatenate or do funky things with the calendar dimension – build them into the table. Hopefully you’ve seen how easy it actually is, and adding a new column isn’t that daunting. I know I was lucky enough to be on an OBI project 11 years ago where we had to build the calendar out 40 years into the future. Once that was built I took a copy of it and have been using it for side things and demos ever since – I’ll never have to worry about it again for as long as I do this kind of work!
Posted on August 28, 2014, in BI Theory and Best Practices, OBI EE Development, Repository, Data Modeling and tagged RPD, OBI, OBIEE, Dimensional, Time Series, Hierarchy, date, time, Data Model, Oracle BI, Oracle Business INtelligence. Bookmark the permalink. 2 Comments.