The Importance of Naming Standards
Originally posted 5/2/2007
Ah, Naming Standards. IT loves to talk about naming standards, particularly for databases or code modules. In that world, naming standards let other developers pick up a piece of code or a database design and become productive with it more quickly. They can also help you when you revisit a piece of old code or a database that you worked on 3 years ago.
Years ago when I was developing C and PL/SQL, I was all about “self documenting code” – the code was so easy to understand because it was very structured and well named. If you’ve heard of Hungarian Notation, then you know what I’m talking about.
Standards are all about reducing confusion and becoming fluent and comfortable more quickly. Sure, without a standard one could eventually figure it out, but if by following a simple protocol you can eliminate wasted effort, why not do it?
Of course, these concepts carry over to Data Warehousing in the form of Data Modeling Standards, ETL Standards, even Repository Standards. Inside the RPD there are probably only a few to consider. In particular, I like to use about 4: Physical Layer Tables, Logical Fact Tables, Dimensional Hierarchies, and Initialization Blocks. You could probably come up with some more for Logical Table Sources and variables as well.
But this post isn’t about RPD, Database or ETL Naming Standards; it’s about Business Model Naming standards for logical tables and columns.
Why We Need Them
First, why do we need to create naming standards for our ad-hoc user community? The answer is: for the same reason that IT needs them. Their purpose is to make an overwhelmingly large and complex thing a bit easier to understand. With consistency comes ease of use, and we all would like to make an easy-to-use deployment.
Although a given project may start out small, it may not stay that way; it may grow over time to encompass more and more tables and objects. The larger something becomes, the more upfront effort will be needed to determine naming standards for end users. Think of a system with hundreds or thousands of columns; organization and consistency will be paramount. Think of an IT developer or a power-user who has to navigate through all of these items inside dozens of Subject Areas. If the names of objects are pure chaos, the power-user will struggle to achieve results quickly, and may well make an error. If such an ad-hoc environment were neatly organized, one can see the benefits of reduced confusion.
Generally speaking, end-users own the names of objects on the UI; IT can own the hidden things. Since they own the names, they must be involved in the construction of their own naming standard to follow. Leave the details of what the naming standard for US Currency should be up to them – it doesn’t matter whether they choose Name $, Name Amt USD or $ Name USD. The key points are that a) they are involved in making the decision, and b) they are bound by their decision when it comes time to name columns.
A couple of items to consider:
- Date vs Date/Time vs Time fields
- Currency Codes
- How to account for Avg, Min, Max, Totals, etc in the name of the metric
- How should derived groups / categories / bins be named
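Once the user community has settled on answers to items like these, the agreed patterns can be checked mechanically. The following is only a minimal sketch in Python: the rule set, the suffixes ("Amt USD", "Date", "Date/Time"), the aggregate prefixes, and the function names are all assumptions made up for illustration, not a recommended standard and not something any BI tool provides.

```python
import re

# Hypothetical standard: every pattern below is an illustrative assumption.
# Your users would decide the real conventions.
RULES = {
    "currency": re.compile(r".+ Amt [A-Z]{3}"),          # e.g. "Revenue Amt USD"
    "date": re.compile(r".+ Date"),                      # e.g. "Order Date"
    "datetime": re.compile(r".+ Date/Time"),             # e.g. "Ship Date/Time"
    "aggregate": re.compile(r"(Avg|Min|Max|Total) .+"),  # e.g. "Avg Order Amt USD"
}


def check_name(name, kind):
    """Return True if the whole column name follows the pattern for its kind."""
    return RULES[kind].fullmatch(name) is not None


def violations(columns):
    """Return the (name, kind) pairs that break the agreed standard."""
    return [(name, kind) for name, kind in columns if not check_name(name, kind)]


if __name__ == "__main__":
    columns = [
        ("Revenue Amt USD", "currency"),
        ("Revenue $", "currency"),
        ("Order Date", "date"),
        ("Avg Order Amt USD", "aggregate"),
    ]
    for name, kind in violations(columns):
        print(f"Does not follow the {kind} standard: {name}")
```

A check like this is only useful after the users have chosen the patterns; it enforces their decision rather than making it for them.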
When to Decide
If you are following good BI project methodology using a top-down approach to your design, your requirements will drive the logical model that supports those requirements. The OBI Business Model (or whatever tool you use) will in turn support the logical model. Since you are going to build the logical model in the BI tool, and the logical model comes from requirements, your BI naming standards must be in place before you begin requirements gathering. (OK, call it the first step if you like.) Part of the requirements gathering process is to determine not only the definition of every single field and metric in the system, but also what its name should be. Throughout the subsequent gathering and definition process, enforce the naming standards that the user community themselves have developed.
This will most likely not be a smooth process, however, particularly if you have one or more of the following scenarios:
- Your project is doing JBOR: Just a Bunch Of Reports
- Your project is very small – it is difficult to explain why you should be thinking BIG
- You encounter fields or metrics that have an obscure acronym that everyone has used forever. Also, certain business functions all work in the same way (think Financial Metrics)
- Your names are too long to fit into a tight report size on a dashboard (common)
- You have specific names for external (Customer or Partner) users
In these scenarios, the name shown in the report may be different from the real name that you have derived and built into the RPD. This is a fact of life. Try then to come up with not only a good, standards-abiding field name, but also an abbreviation or alternate. Capture it in your requirements documents and, if you publish a metadata dictionary, include it there. It is unfortunate, but Analytics does not have capabilities to use alternate names pulled from metadata. The downside of using an alternate name is that not only does it circumvent the good naming standards you have come up with, but now there are two names for the same piece of information, making a switch from a report to ad-hoc more difficult.
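If you do publish a metadata dictionary, pairing each standard name with its report abbreviation at least makes the two names traceable to each other. Here is a rough sketch of what such a dictionary could look like in Python. The entry fields, the sample names and labels, and the helper functions are all hypothetical; this is a standalone illustration, not a feature of Analytics or the RPD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    standard_name: str  # the name built into the business model
    report_label: str   # shorter alternate used on tight dashboards
    definition: str     # business definition agreed during requirements


# Hypothetical sample entries, for illustration only.
ENTRIES = [
    DictionaryEntry("Gross Margin Amt USD", "GM $",
                    "Revenue minus cost of goods sold, in US dollars."),
    DictionaryEntry("Order Date", "Ord Dt",
                    "Calendar date on which the order was placed."),
]


def build_lookup(entries):
    """Index entries by both names, case-insensitively, rejecting clashes."""
    lookup = {}
    for entry in entries:
        for key in (entry.standard_name, entry.report_label):
            key = key.lower()
            if key in lookup and lookup[key] is not entry:
                raise ValueError(f"Name used for two different fields: {key}")
            lookup[key] = entry
    return lookup


def standard_name_for(label, lookup):
    """Translate a report label (or standard name) to the standard name."""
    entry = lookup.get(label.strip().lower())
    return entry.standard_name if entry else None


if __name__ == "__main__":
    lookup = build_lookup(ENTRIES)
    print(standard_name_for("GM $", lookup))
```

The clash check matters more than the lookup itself: an abbreviation that maps to two different fields is exactly the kind of confusion the standard was meant to remove.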
While You’re At It
This belongs in a whole different post, but it is on a similar topic. In addition to naming standards, there are a lot of things to consider when arriving at a good name. In fact, making good metric and dimensional attribute names can be an exercise in balancing directly opposing objectives: be descriptive but concise.
Obtaining the buy-in for even having naming standards will be your biggest challenge; coming up with the actual standard will be relatively easy. There may be some arguing among different users on the details, but you’ll get past it. If you are able to arrive at a tight naming standard, your result will be a consistent, meaningful ad-hoc area, and your users will feel included.