Aggregations: Unlocking Large Scale Analytics
What Does it Mean to Use Aggregations in Power BI: What Makes Them so Special?
Have you ever struggled finding an optimal balance in Power BI between speed and data size? If you’ve worked with really large datasets, then chances are you have faced this challenge.
Aggregations in Power BI is possibly the biggest leap in scalability within Power BI thus far: for me this was a mystery at first. How does this work? Why is this such a powerful feature? My hope today is to demystify Aggregations for you, and help you understand their simplicity and purpose.
Before I can explain what Aggregations are in Power BI, we need to get some background sorted.
Background Check:
Power BI has two distinct types of data connectivity, Import, and Direct Query. They each have their ups and downs. Before we can understand Aggregations we have to know the difference between these two connection types.
Import:
Import is Power BI’s typical way of connection to data. With this method Power BI stores your data in an internal cache, readily available for Power BI to consume.
The upside to Import is that all your data is always ready. This means that anytime you make a cross selection in your report, Power BI is ready. It can sent the EVALUATE statements generated by your selection straight to its internal engine and display your data.
This is very effective in smaller sized solutions (roughly less than 2M rows). However, the toll on your reports and machine from caching larger amounts of data will quickly begin to be noticeable.
Direct Query:
The other option of connecting forgoes caching all your data at once and rather retrieves the data you need when you need it. It does so by sending the EVALUATE statement to your database engine and then displays the returned results.
This method allows you to report on far larger datasets than Importing can.
The downside however is that your reporting will always contain a bit more lag between selections. This is because Power BI needs to issue a new query for each selection. Additionally, direct query is only available on with certain sources, typically only sources with modeling engines.
What to Take Away From This:
The key piece here is the trade off between size and speed. Import works incredibly fast on relatively small datasets, but becomes unusable on big data sets. In opposition, direct query is not optimal on small datasets, but has the ability to handle massive datasets (1B+ rows), since we only grab what we need.
Aggregations give you an easy tool to work conveniently with the best of both worlds.
When is Direct Query Slow?
Before we get into details about aggregations we’ve got to understand the cases that direct query suffers in performance. The types of datasets that direct query is typically applied to are big, and detailed. I’ll emphasize again that using direct query, Power BI sends a query to your database engine for every visual every time a report change is made. That means your engine will have to summarize over your big detailed tables multiple times. This is computationally expensive, and will cause major performance issues.
Enter Aggregated Tables:
These repetitive, expensive summations, are the reason for the common practice of aggregated tables. An aggregated table is a table with a few specific metrics aggregated over a few specific dimensions. Typically a 1B row table can be crunched down into a 1M row aggregate table with a few dimensions. Even further, these aggregate tables can always be made small enough that we can Import them!
Now we’ve got two different tables with the “same” data. For many this is all that is required for a viable solution, but Power BI is renown for its interactive and user friendly sandbox environment. We can’t trust all users to use the right table in the right place, and further more, even experienced developers may not utilize the right table at the right time while writing complex measures.
Now for the big unveil.
Aggregations:
Aggregations in Power BI ensure that your aggregated table will always be used at the right time, and in the right place. This is what it all boils down to.
Setting up aggregations in Power BI adds an additional layer of logic to your data model which replaces items in your queries to your detail table with with items querying your aggregated table whenever possible. You work with your aggregate table hidden, Power BI manages it for you. It takes the guessing game out of which tables to use; there will only be one to choose from.
Why Aggregations?
Work Flow:
As I’ve highlighted above, Aggregations in Power BI are built to improve the sandbox side of Power BI. They give your users no guess in which table they should utilize for their visuals. Always use the detail table. Power BI will handle the rest.
Its Applied at the Lowest Level:
Undoubtedly you will have many measures in your report. When a change is made and Power BI generates its EVALUATE statement, it will utilize your aggregated table when ever possible. This means that the aggregated table isn’t only utilized for basic calculations. If any items in complex measures can use the aggregated table they will.
Speed:
When Aggregations are set up properly, ad-hoc reporting on big data sets will be faster that ever before.
How to Use Aggregations:
I’m not going to go into to much detail on how to use aggregations, but I’ll go into a quick example to get you started.
Step 1: You’ve Gotta Have some Data:
In order to work with Aggregations you need your detailed data table along with any dimension tables that you want to work with. Your detailed data table must be in Direct Query mode.
Step 2: Create your Aggregate Table:
You can do this out of Power BI: Or you can use Power Query’s Group By functionality to do this. Here’s how:
-Create a duplicate of your table:
-In the Home tab click Group By.
-In the given interface choose the columns to group by along with the aggregations you want to see. In my example I have three dimensions, so these are what I group by.
Once you are done with this you can rename this table.
Close and Apply:
Step 3: Setup Aggregations:
Now we’ve got to do some work in the relationship tab. First ensure all your relationships are set up between your dimensions and your data tables.
-Right click your aggregate table header and click manage aggregations.
-Now fill in the boxes for every field as to how they correspond to your detail table.
-Lastly we want to change our aggregate table to be set to import mode. Click on our aggregate table in the Fields pane, open advanced under Properties, and choose storage mode: Import.
And we’re done, that is really all it takes.
Conclusion:
Aggregations in Power BI sound complicated at first, but once you understand their purpose the mystery vanishes fast. The combination of improved workflow, redundancy, speed and quality assurance all make aggregations something you should seriously consider if you are working with large datasets.
Our Latest Blogs
About FreshBI
Operating throughout USA and Canada, FreshBI is a boutique Microsoft Partner using Power BI to unlock the value trapped in your data so that your business can succeed.