SSAS

When and How to Snowflake Dimension Sources : SSAS Design Part 1

When and How to Snowflake Dimension Sources : SSAS Design Part 1

May 31, 2011

One of the best practices while designing an SSAS solution is to decouple it from your source database (which, in ideal cases, is the data warehouse or the data mart but could also be an OLTP database) with the help of a semantic layer or view layer. Even though the Data Source View (DSV) of Analysis Services does provide a semantic layer, it is best to do all your intended operations/calculations within a SQL view and then use that view as the source of your dimensions and facts in the DSV. This makes sure that if the query for a dimension or fact needs to be changed at a later point in time, the SSAS solution need not be opened, which makes life easier for the DBA who is maintaining the system. It also has some performance benefits for cube processing in the case of complex queries, which is beyond the scope of this post.

This post discusses a scenario in which you would want to snowflake your dimensions in the view layer (and not the scenarios in which you would want to snowflake your dimensions in the relational database or in SSAS). Before all the Kimball enthusiasts come to bash me up, hear me out when I say that my preferred option would be to use denormalized dimensions to create the popular star schemas. Even if a dimension is snowflaked in the relational database, I usually end up making a view on top of the normalized tables which denormalizes them into a single dimension, giving a star schema look to the dimensions in the DSV. I am completely for star schemas because of their simple, easy-to-understand and easy-to-maintain structure. With this, I would also like to clear up a popular misconception that the star schema is the best model for SSAS because of performance benefits due to fewer joins. Actually, for large dimensions, the star schema can take longer to process than the snowflake schema, and you can get a taste of this from a demo in MVP Ashwani Roy's (blog | twitter) presentation on Analysis Services Best Practices.
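For example, a thin view of this sort could sit between the (possibly snowflaked) warehouse tables and the DSV; the object and column names below are purely illustrative:

-- Illustrative semantic-layer view that flattens a snowflaked product dimension
-- into a single star-style dimension source for the DSV
CREATE VIEW dbo.vwDimProduct
AS
SELECT
    p.ProductKey,
    p.ProductName,
    s.SubcategoryName,
    c.CategoryName
FROM dbo.DimProduct            AS p
JOIN dbo.DimProductSubcategory AS s ON s.SubcategoryKey = p.SubcategoryKey
JOIN dbo.DimProductCategory    AS c ON c.CategoryKey    = s.CategoryKey;

If the dimension query ever needs to change, only this view has to be altered; the SSAS project and the DSV stay untouched.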

Star to Snowflake

Vincent Rainardi (blog) has written an excellent post on when to snowflake your dimensions on the data warehouse side, which would be a good read prior to this post. If your data modeller generally follows a star schema approach and has already taken care of the concepts outlined in the above article by snowflaking appropriately, you would be lucky enough to just create the view layer as an exact copy of your data warehouse. Otherwise, have a look at the scenarios below while making your view layer:

  • Multiple lower levels for the dimension (Scenario 1)
  • Multiple facts coming at multiple granularity (Scenario 2)
  • Multiple levels joining to the same parent (Scenario 3)

We will use the following Product dimension as an example for explaining the three scenarios.

Product dimension linked to facts

Scenario 1 – Multiple Lower Levels

Multiple Lower Levels

Consider the above figure, where there are multiple lower levels (Nielsen SKU and Local SKU) in the dimension. Even though SSAS can handle multiple lower levels, it is always better to create a surrogate key which can be used as the dimension key (read this post to learn how to create a dimension with a multiple-column key). For this scenario, the following views would be created in the view layer and then used as such in the DSV:

  a. View for the dummy level – For the dummy level key, a surrogate id or sequence id can be used as it would be unique. The view would be of the following format:

SELECT
   -- SID acts as the dummy level key and the other two columns are foreign keys
   SID,
   LocalSKU_ID,   -- FK to the Local SKU view
   NielsenSKU_ID  -- FK to the Nielsen SKU view
FROM <table>

  b. Individual views for each lowest level – For each of the lowest levels, an individual view needs to be created which would contain all the attributes / entities up to the next entity where one of the three scenarios occurs. Since there are two lower levels, two views need to be created as shown below.

-- View for Local SKU and its attributes
SELECT
   Local_SKU,
   RSKU  -- FK to the Retail SKU level
FROM <table>

-- View for Nielsen SKU and its attributes
SELECT
   Nielsen_SKU,
   RSKU  -- FK to the Retail SKU level
FROM <table>

It is to be noted that the attributes / entities are taken only up to the level of Retail SKU because Scenario 3 (multiple levels joining to the same parent) applies at the Retail SKU level. Now, why do we have to snowflake in this scenario? Let us see how the denormalized table would look for the same three levels:

SID  LSKU  NSKU  RSKU
1    L1    NULL  R1
2    L2    NULL  R2
3    NULL  N1    R1
4    NULL  N2    R2
5    NULL  N3    R2

Usually, I replace the NULLs with -1 in my design to avoid using the Unknown Member feature of SSAS. If this table is taken as the dimension source for SSAS and the appropriate relations are specified as SID -> LSKU -> RSKU and SID -> NSKU -> RSKU, there would be errors during dimension processing saying that there are multiple parents for the same child (for example, -1 for NSKU would point to both the R1 and R2 values of Retail SKU). Snowflaking the source views as shown above solves this problem (a small T-SQL sketch of the -1 handling is given at the end of this post). Please note that the SSAS Product dimension would still be a single dimension containing all the attributes from the snowflaked source views. The other scenarios will be covered in subsequent posts. Meanwhile, it would be great to know how much you agree or disagree with these points.
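To make the -1 handling concrete, here is a hedged T-SQL sketch of how the dummy-level view could replace the NULL keys (the table name is a placeholder, not from an actual schema):

-- Dummy-level view: SID is the surrogate dimension key; NULL foreign keys become -1
CREATE VIEW dbo.vwProductDummyLevel
AS
SELECT
    SID,
    COALESCE(LocalSKU_ID, -1)   AS LocalSKU_ID,
    COALESCE(NielsenSKU_ID, -1) AS NielsenSKU_ID
FROM dbo.ProductLowestLevel;  -- hypothetical source table

Naturally, a matching -1 member would also have to exist in the Local SKU and Nielsen SKU views so that these keys resolve during dimension processing.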

Posted by SQLJason, 0 comments
Performance problems with Dynamic Named Sets

Performance problems with Dynamic Named Sets

April 19, 2011

For the last couple of days, I have been fiddling around with an interesting MDX query passed on to me by Hrvoje Piasevoli (blog | twitter | MSDN). I won't be explaining the query itself, as Hrvoje would blog about it soon (once he gets off the SSAS forums and twitter, which could be never ;)), but while trying to recreate the scenario in the cube I ended up using a lot of dynamic sets. That is when I hit upon the problem – queries which were running in split seconds started to drag, and I am not even speaking of the measures which involved those dynamic named sets. If you are a beginner to named sets, I would suggest having a quick glance through the post – Static Named Sets v/s Dynamic Named Sets – before reading further.

[Image: Dynamic Set Performance]

The steps to demonstrate the issues are given below.

Problem

1) Before illustrating the problem, let us take the average time of a simple query on a cold cache. For the demo, the query shown below is used.

SELECT [Measures].[Internet Sales Amount] ON 0
FROM [Adventure Works]
WHERE {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2001]}

The average time on a cold cache was around 63ms on my laptop.

[Image: 1 Query before set creation]

2) Now a simple dynamic set is created in the cube calculated member script as shown below.

CREATE DYNAMIC SET CURRENTCUBE.[ExSet]
AS (EXISTING [Date].[Calendar].[Calendar Year]);

The average time of the previous query is again checked on a cold cache and is found to be around 63ms still. All well and good so far.

3) Now create a measure which refers to the dynamic set, as shown below.

CREATE MEMBER CURRENTCUBE.[Measures].SetString
AS SetToStr(ExSet),
VISIBLE = 1;

The average time of the previous query is checked again, and suddenly it has gone up to 1.1s – a massive increase over the original 63ms. – Issue 1

[Image: 2 Query after set creation]

4) Let us also note down the time taken to display the SetString measure which was created.

[Image: 3 Query for SetString measure]

Even this takes around 1.1s on a cold cache, which is a lot for such a simple operation. – Issue 2

Reason

On closely checking the traces from Profiler, it is found that when the SetString measure referring to the dynamic set is not created, only the query from step 1 gets executed (after the calculated member script is loaded into memory). However, after the SetString measure is created, a series of Query Dimension events and Calculate Non Empty Current events are generated in addition to the above events, which indicates that the dynamic set is being evaluated even though it is not being used in the query.

[Image: 4 Reason for performance degradation]

Workaround

A workaround for this issue is to trick the engine into thinking that the dynamic set is not being referred to. This can be done by replacing the set reference with StrToSet, e.g. the dynamic set ExSet will be replaced by StrToSet('ExSet'). Hence, the calculated member definition in the script will be modified as shown below.

CREATE MEMBER CURRENTCUBE.[Measures].SetString
AS SetToStr(StrToSet('ExSet')),
VISIBLE = 1;

This solves the issue and brings the performance back to normal.

[Image: 5 Solution]

Note

This issue was raised on Connect almost 20 months ago and Microsoft had responded saying that they would investigate it in a future release. As of now (SQL Server 2008 R2 – 10.50.1600.1), there have been no improvements, and part of it could be because of the very low number of votes. Considering that the performance could get a lot worse in the case of complex dynamic sets, it is a serious issue for me and needs to be fixed. If you also feel the same, please vote for this Connect issue – Calculated member with a reference to dynamic named set Kills the cube's performance.
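For reference, putting the pieces together, the relevant fragment of the calculated member script with the workaround applied would look like this (simply combining the definitions shown above):

CREATE DYNAMIC SET CURRENTCUBE.[ExSet]
AS (EXISTING [Date].[Calendar].[Calendar Year]);

CREATE MEMBER CURRENTCUBE.[Measures].SetString
AS SetToStr(StrToSet('ExSet')),
VISIBLE = 1;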

Posted by SQLJason, 1 comment
Where to store calculated measures – SSAS cube or SSRS reports?

Where to store calculated measures – SSAS cube or SSRS reports?

March 15, 2011

As I mentioned in one of my previous posts, nowadays I am busy preparing the design document of my new project and sending it around for reviews. If there is one question I could pick from the list that keeps coming back to me frequently, it would be this – why are you recommending storing all the report level calculations in the cube rather than in the reports? Usually I get away by just answering that it is better for query performance and the reports would come out faster. Of course, I didn't see any value in explaining the technical behind-the-scenes part of why the reports would come out faster to my reviewers (ok, I admit, I am just lazy). But this time, one of my over-zealous peers was adamant on knowing the reason and I decided to finally demonstrate it to him.

[Image: Store calculated measure in cube demo]

Before the actual demonstration, it is necessary to understand the importance of caching in SSAS. Designing a system which will optimally utilize the caching feature is very critical for query performance and user experience. To quote the SSAS 2008 Performance Guide:

“During the execution of an MDX query, the query processor stores calculation results in the query processor cache. The primary benefits of the cache are to optimize the evaluation of calculations and to support reuse of calculation results across users. To understand how the query processor uses caching during query execution, consider the following example. You have a calculated member called Profit Margin. When an MDX query requests Profit Margin by Sales Territory, the query processor caches the non null Profit Margin values for each Sales Territory. To manage the reuse of the cached results across users, the query processor distinguishes different contexts in the cache:

  • Query Context – contains the result of any calculations created by using the WITH keyword within a query. The query context is created on demand and terminates when the query is over. Therefore, the cache of the query context is not shared across queries in a session.
  • Session Context – contains the result of any calculations created by using the CREATE statement within a given session. The cache of the session context is reused from request to request in the same session, but it is not shared across sessions.
  • Global Context – contains the result of any calculations that are shared among users. The cache of the global context can be shared across sessions if the sessions share the same security roles. The contexts are tiered in terms of their level of reusage. At the top, the query context can be reused only within the query. At the bottom, the global context has the greatest potential for reusage across multiple sessions and users.”

So, in short, calculations defined in the query context can be reused only within the query, while calculations defined in the global context can be reused across sessions. This means that measures defined within the global scope will come out faster after the first time they have been run (since the results are already in the cache) compared to query scope measures. Now let us see what happens when we execute an MDX query with the same calculation defined within a report and within the cube. For the demonstration, I will be using the Adventure Works cube and the following calculation – ([Measures].[Internet Sales Amount] / 2) + 1

Option 1 – Make calculated member within SSRS Report

1) Make a simple report dataset with the following query.

WITH MEMBER [measures].[non cache] as
[Measures].[Internet Sales Amount]/2 +1
SELECT NON EMPTY { [measures].[non cache] } ON COLUMNS,
NON EMPTY { ([Customer].[Customer Geography].[State-Province].ALLMEMBERS ) } ON ROWS
FROM [Adventure Works]

non cache is the name of the new calculated measure.

2) Use the fields from the query to make a simple table in the layout.

[Image: Report Layout]

3) Before previewing the report, clear the cache of the database by running the following XMLA query in Management Studio.

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW</DatabaseID>
  </Object>
</ClearCache>

4) Keep SQL Server Profiler ready and preview the report now. You would get the below activity in the trace.

[Image: 1 query scope measure - after cache clearing]

We can see some Progress Report Begin and End events, which indicate that some I/O is happening to fetch the data. The EventSubClass column of the Query Subcube event clearly says that the data being fetched is non-cache data, as expected.

5) Now refresh the report once again and check out the profiler activity.

[Image: 2 query scope measure - using cache]

We can see that there is a Get Data from Cache event. But on further observation, we can see that this data is from the measure group cache, which means that it is from the cache of the storage engine. The data would still need to be passed back to the formula engine to execute any possible calculations, as shown by the following AS querying architecture diagram.

[Image: MDX Query execution architecture]

The Formula Engine processes MDX queries, figures out what data is needed to cover them, and forwards this information to the storage engine for retrieval. It then performs all calculations needed for the query. The Storage Engine handles all the reading and writing of data. It also retrieves and aggregates the data the formula engine requested at query run time. Now let us see what happens when we define the calculation inside the cube and use it in our reports.

Option 2 – Make calculated member within SSAS Cube

1) Make the calculation in the calculated member script of the cube as shown below.

[Image: Create measure in cube]

cached is the name of the new calculated measure (a sketch of this definition is given just before the Conclusion at the end of this post).

2) Edit the dataset query in the report to use the new measure.

SELECT NON EMPTY { [measures].[cached] } ON COLUMNS,
NON EMPTY { ([Customer].[Customer Geography].[State-Province].ALLMEMBERS ) } ON ROWS
FROM [Adventure Works]

3) Make the appropriate changes in the table also, to include the cached measure instead of the non cache measure.

[Image: Report Layout for cached measure]

4) Before previewing the report, clear the cache of the database by running the following XMLA query in Management Studio.

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW</DatabaseID>
  </Object>
</ClearCache>

5) Keep SQL Server Profiler ready and preview the report now. You would get the below activity in the trace.

[Image: 3 global scope measure - after cache clearing]

We can see that the activities and time are pretty much the same as in the first approach when the cache was cleared.

6) Now refresh the report one more time and have a look at the profiler activity.

[Image: 4 global scope measure - using cache]

Voila! Now we can see that there is a Get Data from Cache event, and the TextData clearly says that it is of Global scope. The description in the EventSubClass column says that the data is from the calculation cache, which means it is from the cache of the formula engine.

Note: For the more inquisitive minds, the three main EventSubClass values of the Get Data from Cache event are

  • 1 – Get data from measure group cache: this is the storage engine cache, so data from here would still need to be passed back to the formula engine to have any calculations executed. This could mean that something in your query is not allowing the calculations to be cached in the formula engine.
  • 2 – Get data from flat cache: this is a formula engine cache, but it is used to store individual cell values and is only used when a query is executing in cell-by-cell mode. So if you see this, it is an indication that your query is not executing in block mode.
  • 3 – Get data from calculation cache: this is also a formula engine cache and is probably the best type of cache, because it is used in block mode and stores the final results of any calculation.
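For reference, the cube-level calculated member added in Option 2 step 1 would look something along these lines (a sketch based on the calculation stated at the start of the demonstration):

CREATE MEMBER CURRENTCUBE.[Measures].[cached]
AS [Measures].[Internet Sales Amount] / 2 + 1,
VISIBLE = 1;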

Conclusion

Calculated measures defined within the report use the query context for caching, while those defined within the cube use the global context. Hence, from a query performance angle, it is always better to define measures within the cube than in the reports, as the global context will be much faster after the first time the report has been run and its cache can be re-used across sessions.

References

This is an advanced topic and at least some of you would have found it a bit confusing (now you know why I don't explain such stuff! ;)). You might want to take your time and go through the following references to know more about the Storage Engine & Formula Engine or MDX performance tuning in general.

1) Microsoft SQL Server 2008 Analysis Services Unleashed (Book)
2) SSAS 2008 Performance Guide (Whitepaper)
3) Identifying and Resolving MDX Bottlenecks (Whitepaper)
4) Blogs by Mosha Pasumansky particularly
      a) Inside OLAP Engine: SE cache sharing
      b) Inside OLAP Engine: Cache Prefetching
      c) Take advantage of FE caching to optimize MDX performance
5) Blogs by Chris Webb particularly
      a) Formula Caching and Query Scope
      b) Reporting Services-generated MDX, Subselects and Formula Caching
      c) Missing Members and the Formula Engine Cache
      d) Now() and the Formula Cache
      e) Arbitrary-shaped sets and the Storage Engine cache

Posted by SQLJason, 1 comment
Is it the end of the road for SSAS?

Is it the end of the road for SSAS?

November 12, 2010

If you have been following the PASS summit notes coming from the attendees, you would have noticed an unmatched sense of pandemonium in the SSAS community. The reason is none other than the announcement of Microsoft's decision to move from the traditional multidimensional SSAS database to the new in-memory VertiPaq BISM. My first source of this news was Chris Webb's blog – PASS Summit Day 2 – and after reading it, I was in a state of shock for quite some time. It seemed hard to digest that Microsoft could be doing this to one of their most successful tools in the market. This could have been aggravated by the realization that one of my core skills was going to become obsolete, and by the hesitation or resistance to change from my comfort zone. Even Teo Lachev seemed to mirror the moderate disappointment floating around in Chris' writeup in his own blog – The Battle of the Models. It wasn't disappointment everywhere, as experts like Boyan Penev (Thoughts on BISM, SSAS and MDX) and Siddharth Mehta (SQL Server Denali – Boost to SSIS career, Start of End to SSAS and MDX career?) have tried to find reason in this development, and at least my understanding is that they are more on the happier side.

After this feeling had finally sunk in, I decided to write a premature obituary for one of my favourite tools on my company's technology page, as given below:

“Is it the end of the road for Microsoft SQL Server Analysis Services? It certainly seems so. Microsoft has dealt a deathblow to the multidimensional approach of OLAP and has announced in today's PASS summit that its focus is on a newer and less multi-dimensional approach – the BI Semantic Model. The BISM – BI Semantic Model – is the name for the new type of Analysis Services database that gets created when you publish a PowerPivot model up to the server. This is clearly a strategy to bring in more ROLAP experts to the tool, as the popular perception is that the learning curve for SSAS is pretty steep and many people are put off by it. What it means is that while the MOLAP SSAS won't be deprecated, the efforts of the SSAS dev team are concentrated on BISM and PowerPivot and we shouldn't expect any radical new changes in the future. This could mean a gradual death for SSAS by 2012 once Project Denali (which is expected to be renamed SQL Server 2011, and rumored to be released in the last quarter of 2011) is released and established commercially. So, welcome DAX and bye MDX.”

[Image: dead-end-sign]
The reason I termed this premature is that a new comment has appeared on Chris Webb's blog from Amir Netz (Lead Architect for Microsoft's BI offering), and it seems promising. Some excerpts from it:

  • BISM and VertiPaq are all new features of SSAS. SSAS is gaining massive new capabilities in Denali. Nothing is taken away, nothing is deprecated. It is all net positive. I hope you can not only internalize it but also communicate it to others.
  • The best way I think of the relationship of the “MOLAP UDM” to the “VertiPaq BISM” is akin to the relationship between C++ and C# in the year 2000. While C++ will still stay with us forever, C# is advancing rapidly and is able to take on broader and broader workloads. And the most important thing, Visual Studio – offering both C++ and C# – is a much better product than the one offering only C++. It offers developers the option of choosing the right tool for the right task. Now – replace C++ with MOLAP UDM, C# with “VertiPaq BISM”, and Visual Studio with “SSAS” and you get the exact situation of today.
  • Even with VertiPaq BISM introduced, MOLAP UDM is still used for all the high end heavy lifting and it is just as important as it had always been.
  • As for the roadmap – MOLAP is here to stay. It will have new features every release (just like we have new important MOLAP features in Denali). Yes – BISM being less mature will see a faster innovation pace and being based on a more innovative foundation it will likely be the one creating exciting breakthroughs as we move forward.
  • We worked hard to preserve the investments you made in the UDM and MDX. For example, the BISM supports ODBO and MDX. In fact – this is the only way Excel connects to it. All of the MDX lovers can still send MDX queries and create calculated members in the BISM. This is how Panorama works with the PowerPivot model. AMO works with the BISM as well as with the UDM. etc. Make no mistake about it – MOLAP is still the bread and butter basis of SSAS, now and for a very long time. MDX is mature, functional and will stay with us forever.

Yippee! Again, if I can trust my sense of understanding things, it would mean all is not lost and SSAS is here to stay. I guess I will have to wait for another blog from Chris to completely enlighten common people like me and allay my fears. Till then, I would recommend you go and read Amir's comments in full on Chris' blog. And definitely don't miss out on the other great blog links that I have given.

Update (13/11/2010)

Microsoft has come out with an official reply now on the SQL Server Team Blog – Analysis Services – Roadmap for SQL Server “Denali” and Beyond. The blog tries to allay the fears by underlining that the new BISM model does not replace the traditional UDM model. This is also evident from the model diagram they have posted (given below), where we can see the relational and multidimensional data models existing side by side.

[Image: BISM (BI Semantic Model)]

Some salient points of the blog are:

  • While PowerPivot is targeted at business users to build Personal and Team BI applications, Analysis Services is the platform for building Professional BI applications.
  • With the introduction of the BI Semantic Model, there are two flavors of Analysis Services – one that runs the UDM (OLAP) model and one that runs the BISM model. This is a side-by-side offering – you can choose to run one instance of Analysis Services hosting a UDM model and another instance hosting a BISM model on the same server.
  • You might ask – Why do we have two types of models in Analysis Services? Which one should I use? Is the UDM going to be deprecated now that we have the BISM? The answer is NO! Analysis Services just got a lot better in SQL Server “Denali” with the BISM and VertiPaq and DAX! However the UDM is a mature and industry leading technology and is here to stay.
  • Some existing UDM applications that are undergoing a major overhaul might be migrated to run on the BISM if appropriate. For new BI applications, the choice between UDM and BISM will depend on the complexity of the application. For BI applications that need the power of the multidimensional model and sophisticated calculation capabilities such as scoped assignments, the UDM is the way to go. For a broad majority of BI applications that don't need that level of complexity, the BISM will offer a rich, high performance platform with faster time to solution.
  • The Analysis Services team is committed to staying on the cutting edge of BI innovation and leading the industry with breakthrough technologies such as VertiPaq. At the same time, we recognize our rich heritage in OLAP and continue to be humbled by success of our product and the vast ecosystem of customers and partners that it has helped us build.

So this blog from Microsoft should answer the question we asked, and the answer is – SSAS is here to STAY!

Posted by SQLJason, 1 comment
The Curious Case of Joins while Cube Processing

The Curious Case of Joins while Cube Processing

November 5, 2010

One of those days, I was just chatting with a colleague of mine about some database designs and he was showing me some exquisite piece of work in the database. That is when I happened to glance upon a stored procedure which apparently materialized an already existing account dimension view as a table. This stoked my inherent sense of curiosity and I unleashed a barrage of questions at him regarding its relevance. My colleague explained that somehow the dimensions were getting joined with all the fact tables during cube processing, and hence the processing took a very long time to complete if the dimension view was used as such. To add some justice to his words, the dimension view indeed was doing some complex logic, and it looked natural that the cube processing would take time if the view was used in the joins instead of a table. But what didn't look natural to me was that the dimension views were being used in the joins at all. After all, I was under the assumption that it was just a 'select from fact table' that gets executed or, at most, the fact views if some logic is used. So I decided to open this curious case of joins for investigation. It wasn't that I didn't trust my colleague; I just had to see it with my own eyes to believe it, the doubting Thomas that I am. So I asked him to process a particular fact table and I looked up the query that was being used.

[Image: With materialized Reference relation]

Sure enough, the dimension view was being used in the joins. I was dumbfounded and decided to take a break to think. Adding tobacco to my pipe, I put on my thinking cap and stared aimlessly at the setting sun. Random thoughts started racking my brain and I even ventured so far as to think that it could be a bug in Analysis Services. Suddenly, something clicked in my mind and I started running towards my colleague and delivered in true Sherlock Holmes style – “Come, Watson, come! The game is afoot. Not a word! Into your clothes and come!”. My colleague decided to humour me and tagged along.

Me: Chance has put in our way a most singular and whimsical problem, and its solution is its own reward.

Colleague: Can you please cut the crap and let me know why this happens?

Me: Watson, you know I can't resist a touch of the dramatic. I have no desire to make mysteries, but it is impossible at the moment of action not to enter into long and complex explanations. Let's go step by step and have a look at the query once again.

Colleague: Duh, ok!

Me: Perhaps when a man has special knowledge and special powers like my own, it rather encourages him to seek a complex explanation when a simpler one is at hand. The importance of careful observation can never be emphasized enough. If you remember the way you explained the problem to me, you mentioned that all the dimension views were getting joined to the fact table. But if you look at the query (especially the highlighted part), what do you observe?

Colleague: Hmmm, just the account dimension is getting joined.

Me: How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth? We know now that all the dimensions are not getting joined. We also know that the account dimension is getting joined for some reason. What could be the reason?

Colleague: <light dawning on him> It could only mean that we have defined the usage of this dimension differently.

Me: Elementary, my dear Watson! So let us open the dimension usage tab and check.

[Image: Dimension Usage]

Colleague: Yes!!! There is a reference relationship between the Geography dimension and the fact table through Account – that should be the reason!

Me: To let the brain work without sufficient material is like racing an engine. It racks itself to pieces. Why should a reference relationship induce a join? After all, it could also just be resolved at run time like a regular relationship. To understand more, click to view the relationship details.

[Image: materialized option]

Colleague: Finally! It is because the Materialize option is ticked.

Me: Elementary, my dear Watson. When a reference relationship is materialized, the joining across dimensions is performed during processing and not during querying. Also, the attributes in materialized reference dimensions follow the aggregation rules of standard dimensions. If you remove the tick in the Materialize checkbox, you will notice that the joins are not present anymore.

[Image: without Materialize]

Colleague: OK, so this was it. So simple once we know the reason.

Me: There, Watson, this infernal case had haunted me for days. I hereby banish it completely from my presence.

Colleague: Could you please cut out the Sherlock Holmes part, it's freaking me out!

So, here rests the curious case of joins while cube processing in my chronicles.

Note: Special mention to eQuotes for help with the Sherlock Holmes quotes. Also, to read on how ticking the Materialize option can affect your data, have a look at the blog below by Alberto Ferrari – SSAS: Reference materialized dimension might produce incorrect results

Posted by SQLJason, 1 comment
SSAS Dimension Attribute Properties : Part 2

SSAS Dimension Attribute Properties : Part 2

June 23, 2010

Yesterday, I was reading through some articles and happened to glance through a very memorable quote by Berthold Auerbach – “The little dissatisfaction which every artist feels at the completion of a work forms the germ of a new work”. That was when I realized I hadn't yet completed my dimension attribute article, and so here I am with the second part. In SSAS Dimension Attribute Properties : Part 1, we reviewed the Advanced and Basic properties of dimension attributes. This article continues with the Misc, Parent-Child and Source properties.

Dimension Attribute Property

The properties are explained below:-

MISC

1) AttributeHierarchyOrdered : This property specifies whether the members of the attribute are ordered or not. The value of this property can be either True or False. If the order of the members does not matter, setting this property to False for attributes whose attribute hierarchy is enabled, or for high-cardinality attributes, can significantly improve processing performance.

2) GroupingBehavior : This property is used to give a hint to the client application on whether to encourage or discourage users to group on this attribute, and it does not affect any of the structures on disk. The possible values are EncourageGrouping and DiscourageGrouping.

3) InstanceSelection : This property also is used to give a hint to the client application’s UI on the recommended means of selection of a member from the attribute. The options available are

  • None – (Default) no selection used.
  • DropDown – Appropriate for situations where the number of items is small enough to display within a dropdown list.
  • List – Appropriate for situations where the number of items is too large for a dropdown list, but not large enough to require filtering.
  • Filtered List – Most useful in scenarios where the number of items is large enough to require users to filter the items to be displayed.
  • Mandatory Filter – Appropriate for situations where the number of items is so large that the display must always be filtered.

4) MemberNamesUnique : This property indicates whether the member names are unique across the attribute hierarchy and this property affects the way member unique names are generated. The available options are True or False.

Parent-Child

1) MembersWithData : In a parent-child hierarchy, some of the non-leaf members may also have data associated with them (unlike normal hierarchies, where a non-leaf member has a value equal to the sum of its leaf values). These members are called data members and are present only in parent-child hierarchies. This property is used to set the visibility of the data members in parent-child hierarchies, and the available options are NonLeafDataHidden and NonLeafDataVisible. The MSDN article – Working with Attributes in Parent-Child Hierarchies – does a very nice job of explaining this property with an example.

2) MembersWithDataCaption : This property is used as a naming template for the system-generated data members. For example, if we have the MembersWithData property set to NonLeafDataVisible, then a leaf member representation of the data member is added: if Jason is a data member, with Thomas and Tom as his leaf members, then an additional Jason is added as a leaf member. It would now be difficult to differentiate between the leaf member and the data member (in this case, both are Jason), and hence we can use a template like * (leaf member) as the value of this property. The asterisk symbol is a placeholder for the original name. So our example would show Jason for the data member and Jason (leaf member) for the system-generated leaf member.

3) NamingTemplate : This property specifies how the levels in a particular parent-child hierarchy are named. Click the ellipsis button (…) in this property's cell and you should see a popup window as shown below:

Level Naming Template

You can specify a name for the level by clicking on the Name column of the second row and entering, for example, Employee *. This ensures that instead of Level 02, Level 03, etc., you will get Employee 02, Employee 03 and so on. For more details, refer to the MSDN article – Defining Parent Attribute Properties in a Parent-Child Hierarchy.

4) RootMemberIf : This property is used to specify the criteria by which we can identify the members of the highest level (excluding the ALL level). Again quoting from an article of William Pearson – “ The four selection options include the following:

  • ParentIsBlankSelfOrMissing – (Default) Only members that meet one or more of the conditions described for ParentIsBlank, ParentIsSelf, or ParentIsMissing are treated as root members.
  • ParentIsBlank – Only members with a null, a zero, or an empty string in the key column or columns are treated as root members.
  • ParentIsSelf – Only members with themselves as parents are treated as root members.
  • ParentIsMissing – Only members with parents that cannot be found are treated as root members.

The behavior of the RootMemberIf property in determining how the root or topmost members of a parent-child hierarchy are identified is, therefore, dependent upon which of the selections above is made. The default, as noted above, is ParentIsBlankSelfOrMissing.”

5) UnaryOperatorColumn : We can control how members of a hierarchy (usually parent-child, but not necessarily) aggregate up to their parents by defining unary operators. To do this, you must create a new column in your dimension table to hold the unary operator values and then set the attribute's UnaryOperatorColumn property to point to it; in other words, this property specifies the column which holds the unary operator. You will find a very good example under the Unary Operators and Weights heading of this article – Measures and Measure Groups in Microsoft Analysis Services: Part 1. The values for this property are (none) and (new). On clicking New, a dialog box opens which prompts us to select the binding type, the source table and the source column (a small illustrative sketch of such a column follows below).
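To picture this, here is a small hedged T-SQL sketch of a parent-child dimension table carrying a unary operator column (the table, column names and rows are made up purely for illustration):

-- Hypothetical parent-child account dimension with a unary operator column
CREATE TABLE dbo.DimAccount (
    AccountKey       int          NOT NULL PRIMARY KEY,
    ParentAccountKey int          NULL,       -- self-referencing parent-child key
    AccountName      nvarchar(50) NOT NULL,
    UnaryOperator    nchar(1)     NOT NULL    -- '+', '-', '~', '*' or '/'
);

-- Net Profit = Revenue (+) minus Expenses (-)
INSERT INTO dbo.DimAccount (AccountKey, ParentAccountKey, AccountName, UnaryOperator)
VALUES (1, NULL, N'Net Profit', N'+'),
       (2, 1,    N'Revenue',    N'+'),
       (3, 1,    N'Expenses',   N'-');

The attribute's UnaryOperatorColumn property would then be pointed at the UnaryOperator column.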

Source

1) CustomRollupColumn : Unary operators do not give enough flexibility for rollup, and in such cases, we can write our own rollup formulas as MDX expressions. This property is used to specify a column which will contain the custom rollup formula. A valid expression will ensure that the aggregation logic defined in the AggregateFunction property of the measure would be overridden for this attribute.

2) CustomRollupPropertiesColumn : This property is used to contain the properties of a custom rollup column. Refer Custom Member Formulas heading of this article – Measures and Measure Groups in Microsoft Analysis Services: Part 1 to learn more about the above two properties.

3) KeyColumns : This property contains the column/columns that constitute the key for the attribute, which is the column in the underlying relational table in the data source view to which the attribute is bound. If the NameColumn property is not defined, the value of KeyColumns property would be used to display the attribute members.

4) NameColumn : In most cases, the key of the attribute would be an integer value, and this would not make any sense to the user who is viewing the attribute members. For this, we can specify a column in this property which holds the user-friendly name of the attribute member.

5) ValueColumn : This property identifies the column that provides the value of the attribute. If the NameColumn element of the attribute is specified, the same DataItem values are used as default values for the ValueColumn element. If the NameColumn element of the attribute is not specified and the KeyColumns collection of the attribute contains a single KeyColumn element representing a key column with a string data type, the same DataItem values are used as default values for the ValueColumn element.

For the KeyColumns and NameColumn properties, there are additional properties which can be seen by expanding the plus symbol on the left. This article gives a pretty good overview of them. With this info, you should be all set and ready to give that killer answer in your next interview. Godspeed and good luck to you! 🙂

Posted by SQLJason, 6 comments
SSAS Dimension Attribute Properties : Part 1

SSAS Dimension Attribute Properties : Part 1

June 13, 2010

There could not be any SSAS interview where at least one question is not asked about the dimension, attribute or measure properties. Many a time, I myself have kept searching the web for more information about a particular property, only to find that the information is pretty scattered and you really have to spend some effort to consolidate all of it. Through this post, I will consolidate all relevant information regarding the attribute properties and try to put it under one roof.

A dimension attribute contains a limited list of key values. Each unique key is associated with a dimension member, and these dimension members included in the attribute are called attribute members. In the multidimensional model, the attribute also contains properties that define different characteristics of the attribute members, called attribute properties. All attribute members have the same properties. To see all the properties of an attribute, click on a particular attribute and press F4. On doing so, you would be able to see the property panel as shown below.

SSAS Dimension Properties

Before you go forward and read, you might want to understand some of the terms that are going to keep coming in the explanations below:-

  • Discrete Attributes and Contiguous Attributes : Quoted from Introduction to Attribute Discretization (William Pearson) – “ Whenever we work with attributes, we can expect to encounter two general types of values:
    • Discrete attributes: Discrete values stand apart distinctly, and have clearly defined logical “boundaries” between themselves. The possible values are naturally discrete for the lion’s share of attributes occurring in the business world.
      Example: The Gender attribute, within the Customer dimension of the Adventure Works sample UDM, is (at least for purposes of the sample cube) considered to have only one of two discrete values, female or male.
    • Contiguous attributes: Contiguous values do not stand apart distinctly, but flow along, as if in a continuous line. Contiguous values, especially within large populations, can have very large numbers of possible values. Information consumers can find it difficult to work effectively and efficiently within such wide ranges of values.
      Example: the Vacation Hours attribute, within the Employee dimension of the Adventure Works sample UDM, can have a wide range of possible values, depending upon how many employees are involved, whether there are limits on how many vacation days they can accumulate, and considerations of this nature. The member values are based directly upon the unique values contained within the VacationHours column of the DimEmployee table (with many of the unique values shared among multiple employees). The sheer number of values might make working with them cumbersome for information consumers, if they are simply made available in their existing state.

The properties are explained below:-

ADVANCED

1) AttributeHierarchyDisplayFolder : This property is used to group attribute hierarchies into a particular folder. For example, if you set this property to Stocking for the attributes Class and Color in the Product dimension, you would see the following when you browse the dimension:

AttributeHierarchyDisplayFolder

2) AttributeHierarchyEnabled : The values of this property can be True or False, and it determines the absence or presence of the attribute hierarchy. By default, all the attributes in the dimension are assigned a value of True, which means that attribute hierarchies are enabled by default. From a performance point of view, this property is pretty important, which is quite evident from the quote below from OLAP Design Best Practices for Analysis Services 2005:

“Attributes add to the complexity and storage requirements of a dimension, and the number of attributes in a dimension can significantly affect performance. This is especially true of attributes which have AttributeHierarchyEnabled set to True. Although SQL Server 2005 Analysis Services can support many attributes in a dimension, having more attributes than are actually used decreases performance unnecessarily and can make the end-user experience more difficult. It is usually not necessary to create an attribute for every column in a table. Even though the wizards do this by default in Analysis Services 2005, a better design approach is to start with the attributes you know you'll need, and later add more attributes. Adding attributes as you discover they are needed is generally a better practice than adding everything and then removing attributes.”

3) AttributeHierarchyOptimizedState : The values of this property can be FullyOptimized or NotOptimized (beats me why it shouldn't have been a True or False instead), and it is used to enable or disable optimization of the attribute hierarchy. Setting this property to NotOptimized saves resources when the hierarchy is being built during processing, and hence it can be used for attributes which aren't needed frequently for analysis. Turning this off will, however, slow down requests that reference this attribute hierarchy, which is why we should only turn it off for less frequently used attributes.

4) AttributeHierarchyVisible : The values of this property can be True or False, and it sets the visibility of the attribute hierarchy to a client application. If it is set to False, the client application will not be able to see this attribute hierarchy (which means it is invisible when the dimension is browsed through the application), but the attribute hierarchy can still be used in queries. It is a best practice to set this property to False for attributes which are also used in user-defined hierarchies, since you can access the attribute from the user-defined hierarchy anyway and you eliminate redundancy.

5) DefaultMember : This property can be used to specify the default member of the attribute. If no value is specified, then by default the ALL member is used. MDX expressions can also be used to set the default member of the attribute to any attribute member by clicking on the button beside the property, which opens the pop-up window shown below:

DefaultMember
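For illustration, the value can be a simple member reference or any MDX expression that resolves to a single member. A hedged sketch that would default the Adventure Works Calendar Year attribute to its latest member:

Tail([Date].[Calendar Year].[Calendar Year].Members, 1).Item(0)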

6) DiscretizationBucketCount : Discretization is basically the process of grouping contiguous values into sets of discrete values.

Attribute Discretization

Analysis Services supports several variations of attribute discretization based on different algorithms, and DiscretizationBucketCount is one of the two properties that you have to set to enable it. As the name suggests, this property defines the number of groups that the values will be placed in.

7) DiscretizationMethod : This property basically determines which discretization algorithm should be applied. The values of this property are

  • None – Discretization would not be done.
  • Automatic – Automatically chooses the best grouping technique among the available values.
  • EqualArea – If the distribution of the contiguous values is plotted as a curve, the areas under the curve covered by each range would be equal.
  • Cluster – Uses the K-Means algorithm to find ranges on the input values.

Apart from the above algorithms present in the Dimension Editor, it is also possible to specify a user defined discretization method through Data Definition Language (DDL) by defining the boundaries of every group.

8) EstimatedCount : This property specifies the estimated number of members in the attribute. It is usually populated when aggregations are designed (which means that when you change environments, the counts could still be those of the previous environment unless you count again) or it can be specified by the user. This property is used when aggregations are being built and helps the server make decisions on how to build the aggregations.

9) IsAggregatable : This property is used to determine whether the data associated with the attribute members can be aggregated. If True, the system defines an ALL level containing the aggregated data, which is used as the default member for queries that do not reference a particular member of this attribute. For the same reason, if this property is set to False, a default member should be specified.

10) OrderBy : This property specifies the method by which the attribute members should be sorted and the options are Name, Key, AttributeName and AttributeKey.

11) OrderByAttribute : This property is used to select the attribute on which the ordering should happen if AttributeKey or AttributeName has been used in the OrderBy property.

BASIC

1) Description : This property specifies the description of the attribute and any free text can be entered.

2) ID : This property specifies the unique identifier of the attribute and is usually non-editable in the dimension editor.

3) Name : This property specifies the name of the attribute.

4) Type : I found the best explanation of this property in an article by William Pearson, which is quoted below: “The value of the Type property for an attribute determines the attribute type – and specifies the type of information contained by – that attribute. Within Analysis Services 2005, attribute types help to classify an attribute based upon its business utility or functionality. Many of the available options represent types which are used by client applications to display or support an attribute. However, some attribute types also have specific meaning to Analysis Services. Some attribute types identify attributes that represent time periods in various calendars for time dimensions. Many attribute types for dimensions or attributes are set via the associated wizard that we use when defining these objects. Attribute types can also be set when we employ wizards, such as the Business Intelligence Wizard, to add functionality to dimensions within Analysis Services. A good example is the application of various attribute types to attributes in a dimension when we use the Business Intelligence Wizard to add Account Intelligence to a given dimension: the wizard applies several attribute types to attributes in the affected dimension, for example, to identify attributes that contain the names, codes, numbers, and structure of accounts within the dimension. Attribute types in Analysis Services fall into five categorized groups. These attribute type groups include:

  • General: These values are available to all attributes, and exist only to enable classification of attributes for client application purposes.
  • Account Dimension: These values identify an attribute that belongs to an account dimension.
  • Currency Dimension: These values identify an attribute that belongs to a currency dimension.
  • Slowly Changing: These values identify an attribute that belongs to a slowly changing dimension.
  • Time Dimension: These values identify an attribute that belongs to a time dimension. “

5) Usage : This property defines whether the attribute is a key attribute, an additional attribute for the dimension or a parent attribute. (The rest of the properties would be continued in SSAS Dimension Attribute Properties : Part 2)

Posted by SQLJason, 5 comments
Setting default member in role playing dimensions

Setting default member in role playing dimensions

April 29, 2010

Today I got a call from one of my colleagues asking how to set a default member for a dimension attribute. I was a little surprised, as I thought setting a default member was quite straightforward, and was even more surprised to learn that the dimension had processed successfully but the cube had failed with errors.

P.S.: If you are an SSAS beginner, please read this MSDN article on how to set the default member of a cube to grasp the contents of this post completely – Specifying the Default Member

I went straight to my colleague's desk, as I clearly saw it as an opportunity to learn something new. The first thing I did was check the dimension in the dimension designer and verify that the default member was set up correctly.

[Image: Set default member]

The default member looked correct, so I processed the cube to see the error message.

[Image: error message]

The error message clearly mentions that the level [Brand Family] (which is the name of the dimension) was not found in the cube when the default member string was parsed. So this could mean one of two things:

a) the default member has been misspelt
b) there is no dimension / level called [Brand Family]

But as I could see, there really was a dimension called [Brand Family] and the default member was not misspelt either. That is when it struck me to have a look at the dimension usage tab.

[Image: Dimension Usage tab]

As you can see from the above screenshot, the Brand Family dimension was set up as a role playing dimension, and I later understood from my colleague that the requirement was to set up a common default member in all five of those role playing dimensions. Now it made sense why the error was coming – the [Brand Family] dimension does not exist in the cube. Instead, it is referenced by the role playing dimensions [Main Brand], [Dualist TMC Main Brand], [Other Brand Family], [Awareness Brand] and [Seg ClassF Brand]. So I suggested setting the default member in the format <level>.value instead of <dimension>.<level>.value. Since the level names are unique across dimensions, I presumed it would get resolved properly.

[Image: Setting the edited Default member]

As expected, with this workaround the cube processed successfully and we were able to get the same default member set correctly in all the role playing dimensions. After this, I also chanced upon a line from Teo Lachev's book – Applied Microsoft Analysis Services 2005:

“You cannot set the DefaultMember property of a role playing dimension in the Dimension designer because it is not clear which specific role version it is intended for. Instead, the only way to set up the default member is to do it programmatically by using the ALTER CUBE statement.”

The syntax of ALTER CUBE is

ALTER CUBE CurrentCube | YourCubeName
UPDATE DIMENSION <dimension name>,
DEFAULT_MEMBER = '<default member>';

Eg:

ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [DATE].[CALENDAR DATE],
DEFAULT_MEMBER = [DATE].[CALENDAR DATE].&[729]

The default member is set at run time if the ALTER CUBE statement is used, and hence the dimension designer will not show the value of the default member. Even though I haven't tried it, I think it should be possible to set a different default member for each role playing dimension through the ALTER CUBE script. This would not have been possible with the particular workaround that I suggested in the beginning.
Update

I found an old thread in the forums in which Deepak Puri (MVP) confirms that it is possible to set a different default member for each role playing dimension using the ALTER CUBE statement (a sketch of what that could look like is given below): http://social.msdn.microsoft.com/forums/en-US/sqlanalysisservices/thread/74feffe2-4706-4a5e-9e56-97d1fcf3f6ba/
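For completeness, a hedged sketch of such statements for two of the role playing dimensions in this post (the attribute name and the member key are assumptions, purely for illustration):

ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [Main Brand].[Brand Family],
DEFAULT_MEMBER = [Main Brand].[Brand Family].&[1];

ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [Awareness Brand].[Brand Family],
DEFAULT_MEMBER = [Awareness Brand].[Brand Family].&[1];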

Posted by SQLJason, 0 comments
Blog Review

Blog Review

April 24, 2010

It's been almost a month of steady posting and I thought it would be good to get some feedback on my blogs. So, the other day I got my blog reviewed by the team at BeyondRelational and here is the review. Taking their advice, there is some good news and some bad news that I would like to convey to you. The bad news is that I am going to split my posts between the 2 blogs, which means you will not find all my posts here. The good news is that I will be posting links / updates here whenever I post something unique on my blog at BeyondRelational, so that you guys don't miss anything 🙂 By the way, I just blogged on Improving cell security performance in SSAS, feel free to have a look and comment. I would love to hear your views on the topic.

Posted by SQLJason, 0 comments
Creating solution file from a SSAS database

Creating solution file from a SSAS database

April 22, 2010
In many of my projects, I have heard my developers asking each other whether it is possible to get the solution file of an SSAS database, and I have turned a deaf ear in almost all cases in spite of knowing the answer. Now before you give me that flabbergasted look, let me try to justify my position. As an IT company, we do have certain processes and one of them is to keep your code in VSS (used for version management). Unless you follow this rule, there is every chance that you might miss a particular Change Request / functionality and end up implementing a new fix on an outdated version. And it is not uncommon to see people doing their development work on the online version of SSAS (which is the SSAS database) because it is much easier – you just have to save your work to see the changes, while if you make the same changes in the offline version (which is the solution file), you will need to save it and then deploy to see the changes (talking about changes like modifying the calculated members script; for most other changes, the cube has to be processed for the changes to be reflected). So I would rather not let them know the answer, so that they are extra careful in the future and avoid making changes in the online version.
But sometimes, in support scenarios, it is necessary to know this technique. Let's say you have been supporting an application which has been live for the past 5 years and now a change request has come in. Let us face the ground reality: the application has changed hands so many times that you neither know how many vendors have been involved nor where the latest source code is. At this point, the best option is to retrieve the source files from the online version, and this post will show you exactly how to do that.
1) Go to Start–>Programs–>Microsoft Visual Studio 2005 and click on Microsoft Visual Studio 2005 which will launch the IDE (If your SSAS database is 2008 version, use Microsoft Visual Studio 2008 instead)
2) Click on File–>New Project
Select the Import Analysis Services option and after giving the required Name, Location and Solution Name, click OK.
3) A Wizard message will pop up prompting you to continue by clicking Next. Click Next and then fill in the server name as well as the database which you need to import. Click on Next after that
4) The Wizard will start importing the source files from the database and on completion, the Finish button will be activated. Click on Finish.
5) Presto! You have the source files right in front of you.
Disclaimer: Use with caution, Tech Lead advice recommended. I don’t want companies coming after my neck just because my blog meddled with their processes 🙂
Posted by SQLJason, 2 comments