As a response to my previous post, one of my readers asked me this question – “ why are you hell-bent on making a star schema? You yourself have told that a snowflake schema would give the best performance during cube processing and once the cube is processed, it doesn’t matter which schema was used – the query performance would be the same. So make the designer’s life easier by making all the dimension attributes in 3NF, obviously nothing can go wrong there! “ Well, I have been extremely lucky to work along with a highly technical support & maintenance team at the client site, who challenges and makes me give appropriate reasons for every design decision I take. We as a vendor develop our applications and pass it on to the support team for the final code review and acceptance (this is in addition to the UAT or data testing part which would be done along with the business users). This made me think of them as two separate customers with two different priorities for the same application. If I have to sell my application to them, I need to appease both of them and the only way I can do is by doing a fine balancing act. The Business Users would want the best performing system (both in terms of cube processing as well as query performance) while the Support Team would want a system which is easier to support and maintain. If I designed all my attributes in 3NF, the support team would straightaway reject the application as they wouldn’t make sense what is happening in the DSV with the hundreds of attributes and relations. Since my cubes are usually less than 50GB, the processing time difference between the two schemas is less than 15 minutes which my business users would not even notice. In this case, it makes sense to use a star schema and appease my other customer, the Support Team. Your support team would love you for the additional work you do in your DSV’s diagram organizer when you neatly show them the star schemas for every fact. In my last post, I had quickly summarized some of the best practices that I apply in the design of my dimension sources and also explained the first scenario in which I would snowflake the dimension sources for the DSV. The three scenarios as well as the example product dimension are repeated below for quick reference
Scenario 2 – Multiple facts at multiple granularity Consider the example in the above figure where there are facts coming in at multiple levels of the same dimension (at Prod Group, Brand Pack, Local SKU and Nielsen SKU levels). In this scenario, all the members of the levels at which the fact comes are needed and hence, the following views are made a) Individual views for each of the level at which fact granularity comes – For each level at which the fact comes, a view is made which will contain all the attributes / entities till the next entity where one of the three scenarios’ happen. Since there are 4 levels at which fact comes in this example, the following views are made // View for Local SKU and attributes SELECT // Scenario 3 at Retail SKU Local_SKU, RSKU#FK FROM <table> // View for Nielsen SKU and attributes SELECT // Scenario 3 at Retail SKU Nielsen_SKU, RSKU#FK FROM <table> // View for Brand Pack attributes SELECT // Scenario 2 at Product Group BrandPack, BrandSF, LocalBrandSF, BrandFamily, RPCT, RPCTGrp, PG#FK FROM <table> // View for Product Group attributes SELECT ProductGroup, ReportingProductGroup FROM <table> Some things to note here
Meanwhile, as usual, let me know till what level you agree or disagree on these points.