Search This Blog


Sunday, July 31, 2016

Power BI: If you have a Cloud of Words, let them Fly!

[2016-July-31] Last week was my 2nd time participating at the local Toronto Open Data meetup ( The audience of this meetup is quite diverse and covers people with various occupations, such as: data enthusiasts, students, public service workers and other citizens who are united under one idea to explore publicly available open data sets and discuss ideas on how to improve public services of the city of Toronto.

This time we were looking at the 211 Youth Services Data data set that lists all social services available for youth in Toronto. 

Before I begin to share my experience working with this data set and building my new Power BI visualization, here is a link to a live Power BI report 211 Youth Services (Word Cloud) that I had a chance to present there

The data set itself was an Excel file with 24 worksheets representing different categories of services for 1447 agencies, where each of the agency had the following attributes:
  • AgencyName
  • LegalStatus
  • OfficePhone
  • EMail
  • Website
  • Address
  • Neighbourhood
  • Accessibility
  • Hours
  • Eligibility
  • Languages
  • Application
  • DescriptionService
  • DateUpdated

And I thought to myself, what if I can retrieve more insights out of this data in order to get better understanding of what actual work each of them do? And is there an efficient way to work work with such unstructured text data. So, I decided to specifically look at the DescriptionService column which is a free text attribute that describes different activities and programs for each of the agencies. 

And I think, that I've found a perfect visualization instrument that works really well with text data; it's the Word Cloud visual control was introduced for Power BI in November of 2015 (Visual Awesomeness Unlocked - The Word Cloud). Basically this visualization shows a set of words in a form of a cloud where font size of each word is defined by how frequently it is used within a given data set.

Usually they demonstrate a use case with Twitter data being analyzed, however I decided to challenge this with a text field from 211 data set where column size varied from 17 to 4197 characters for different agencies. Could Power BI Word Cloud handle that much of text data in a single row and could I work with at least a thousand records to build my visualization? It actually worked!

The beauty of the existent Power BI reporting approach is that your can build a report where each of the visualizations can interact with each other. So if I click the word "employment" in my Word Cloud visualization then both Bing Map and Agency count with this word are affected as well:

It's amazing that such interaction can drive further data analysis and possibilities are limitless: from better understanding of high frequently activities and outlier analysis within a set of programs that each agency provides.

A few additional thoughts if you want to work with this Power BI Word Cloud visualization:
- Spend more time cleaning and preparing your data, you wouldn't want to let your cloud of words to be filled with some incorrect text sentiments.
- Keep the "Rotate Text" option Off, otherwise the Word Cloud looks a bit messy.
- And definitely set "Stop Words" option to On; it helps to hide some common words (to, from, or, and, etc.) from your cloud, also you can add some other exceptional phrases to your stop list.

In overall, I've enjoyed working with this Word Cloud visualization and would highly recommend it for any unstructured text data analysis. 

So, If you have a Cloud of Words, let them Fly! :-)

Thursday, June 30, 2016

Power BI Pulse Chart vs. Dancing bars in Excel

[2016-Jun-30] Recently Microsoft Power BI has been enhanced with a new Pulse Chart custom visualization (Visual Awesomeness Unlocked: Pulse Chart) which is extremely good and useful for a visual data story telling. I believe that the less we have people explaining their data visualization and more we let those charts and graphs talk for themselves is better for end users. Those self explanatory visualizations work as proxies for data that stays behind them.

Basically the Power BI Pulse chart lets you build a line chart based on a time series where additional columns could describe certain events and their descriptions during a range of timestamps of your data set. And the beauty of this custom visualization is that it has an autoplay feature that starts playing and builds your line chart along with showing your defined event descriptions; I think it's very fascinating and it also attract people's attention to your dataset.

I used an Ottawa transit company open data set that shows bus routes, bus stop locations and bus stop times of different dates within a year to build my Power BI Pulse chart ( So I created a data model that contained all bus stop times within a single day. Then I added some custom events with descriptions to briefly describe various Ups & Downs for the overall time stops. So here is what I was able to create, a Pulse chart based on regular day of the OC Transpo company.

It's the same dataset that I had already used in the past for my PowerMap demo in Excel; at that time I called it as Dancing bars of OC Transpo with basically showed the same busy day with bus stop times as a fact base.

and here is a YouTube link to the complete PowerMap video: Dancing bars of OC Transpo – Busy day 3D Map of transit service for Ottawa

There is one limitation though that I had experienced building my Pulse Chart. Current version of this visualization supports only a Date hierarchy (Year-Quarter-Month-Day) based on your timestamp dataset column. However, my OC Transpo time series was only within a single day, and my expectation was to see Hours on the chart X-axis, which it didn't have.  

I sent a note to Microsoft Power BI team and they were prompt enough to suggest to add this request into their pool of other Power BI ideas which then get further reviewed and used as a base for future improvement and corrections.  So perhaps, in a near future, my wish will turn into reality :-)

I hope you will find this blog post interesting and helpful. Happy data adventure!

Tuesday, May 31, 2016

Analyzing David and Goliath datasets on a same chart; Dual Axis in Excel, Power BI and SSRS

[2016-May-31] Two different metrics, two different datasets, one may represent a very high value and the other one is very low. What's interesting is that both metrics could be very close related; like Total Product Sales vs. Product Price or Average Monthly Temperature vs. Rainfall over a period of time. 

One way to analyse how such metrics may influence each other is to look at their numbers, the other way is to visually place them on a bar or a line chart and let those graphical elements communicate to you how correlated or not those metrics are. However there is one downside of using graphical tools to analyse both very low and high data values on the same Y-axis; it's just visually difficult. Therefore many analytical tool vendors allow to portray differently scaled data with the use of Primary and Secondary Y-axes (so called Dual Axes).

Let's take a simple data set and try to check 3 different visualization tools that can show data across different data axes. There is an open dataset of National Pollutant Release Inventory (NPRI) provided by Government of Canada ( that shows different industry related pollutant releases to air, water and land for the past five years. 

I've chosen only 2 industries' air pollution data: (1) Oil and Gas Extraction and combined (2) Pipeline Transportation of Crude Oil and Natural Gas, with an idea to find if these 2 metrics' results are correlated over a period of time.

Here is an Excel version of this dataset:

Just looking at the data itself, it's quite obvious that Oil and Gas extraction generate far more air pollution than these natural resources transportation by pipelines. Let's see how Excel chart could show this data:


Excel is a good analytical tool and it even gave me an option to see linear trend-lines for each of the metrics (however it's a bit difficult to understand why with the air pollution decrease for the oil and gas extraction, the air got more polluted when this oil and gas had been in a pipeline transition; we'll talk about this later).

Power BI:

The Power BI currently doesn't allow creating a bar chart for a secondary Y-axis; thus we can use a line to show the Pipeline transportation in our case. However all other settings including good data labeling provides us with a very interactive experience for big & small data analysis.

SSRS (SQL Server Reporting Services):

It's a bit different story with the SSRS: from one hand it has a very solid dataset extraction mechanism that is shared across all other SQL Server BI tools, however its visualization part to show both Primary and Secondary axes is not very good: you will have to additionally adjust each of axis top limits in order to sync horizontal lines and I also wish SSRS data label positioning to be more flexible.

I will let you to decide on your own what tool you would prefer to use in order to analyze 2 different metrics data on dual axes. I personally like the Power BI version, then I would go with the Excel way to analyze Primary and Secondary axes data and use of the SSRS dual axes chart would only be considered if the technology is specifically requested by a customer.

As far as the early question about the pollution decrease of oil and gas extractions vs. pollution increase to transport those natural resources using pipelines, one could suggest that during the period of time of 2010 and 2014 resource company have learned to apply modern technologies to produce less air pollution. However the existing pipeline system in Canada may have slightly deteriorated and need more maintenance and modernization. But whatever the reasons are for the air pollution decrease and increase, there have to be some some facts to prove such hypothesis, which may be a part of another blog post or discussion.

Happy data adventures!

Saturday, April 30, 2016

Power BI Box and Whisker visualization vs. Box Plot chart in SSRS

[2016-Apr-30] I don't work with box plot charts very often, however when I do then it becomes a very interesting experience. First, it starts from data extraction and then playing around with various visual settings to make those boxes and whiskers to look good.

I assume that sometimes we tend to work with more familiar chart types to analyze our data, like lines and vertical or horizontal charts; however box plot charts could be very well used to review how your data is distributed within your set of metric values; and besides, it looks different as well :-)

About 3 months ago Microsoft made this type of visualization available in Power BI (Visual Awesomeness Unlocked – Box-and-Whisker Plots), so I decided to compare it with the same box plot chart that I can create using SQL Reporting Services.

My data comes from Berkely Earth ( that I had already used for one of my previous PowerMap visualization (Earth Surface Temperature on YouTube). And I took 4 years data of May monthly average temperature from all Canadian Geo stations (1997 - 2000) and showed them both in Power BI and SSRS.

Power BI:
Currently there are two version of the Box & Whisker charts: (1) created by Brad Sarsfield and (2) another created by Jan Pieter Posthuma.

(1) Brad Sarsfield box and whisker chart
- Customization for chart quantile value is available 
- Outliers visualization
- Simple data set required Values/Group (values and groups those values are divided to)
- Margins for quantiles are rounded to zero decimal places

(2) Jan Pieter Posthuma box and whisker chart
- Boxes are colored 
Margins for quantiles are not rounded to zero decimal places
- No customization for quantile value (it's preset to Min, 0.25, Mean, Median, 0.75, Max)
- Along with Values/Group settings for dataset, Samples has to be specified (which is basically an attribute for lowest granularity you want to analyze your data)


- The only SSRS chart for box plot visualization
- More settings can be adjusted compared with similar Power BI visualization
- Chart dataset has to be prepared in advance for all 6 data elements (Min, Low, Mean, Median, High, Max)
The only SSRS chart for box plot visualization.

I personally like Power BI Brad Sarsfield box & whiskers visualization with the way it looks and its customization level, however I also like that margin values for quantiles are not rounded to zero decimal places at Jan Pieter Posthuma visualization.

Thanks for reading this blog post, and happy data adventures!

Tuesday, February 2, 2016

Excel Power Map vs. Power BI Globe Map visualization

[2016-02-02] PowerMap in Excel is a great tool with good design and comprehensive user experience. I’ve started using it about a year ago when our team participated at first Canadian Microsoft Big Data hackathon (

So I gradually developed a taste to this Geo analytical tool that helps to map data points and describe insightful and informative stories. Later on in November of 2015, visually similar tool has been added to the Power BI ( And this gave me an opportunity to compare and analyse both tools’ functionality.

Basically they’re not competing with each other, but rather live in two different worlds: one is a client application in Excel, the other one is a custom Power BI visualizations. They’re friends and they’re siblings :-)

Things that Excel Power Map and Power BI Globe Map have in common:
  • Mapping data points to a 3D Globe visualization using geo coordinates (Latitude/Longitude and regular address coordinates)
  • Data facts explorations being attached to associated geo coordinates.

Things that Power BI Globe Map doesn’t have compared with the Excel Power Map:

o   Ability to change globe themes
o   Ability to create visual tours and save them as video files
o   Change the chart shapes
o   Change the height and thickness scale of visuals
o   Change color for a series
o   Find location feature
o   Add an annotation with image
o   Legends
o   Bubble charts, pie charts and region charts
o   Power Map graphics options

Things that Excel Power Map doesn’t have compared with Power BI Globe Map:
  • Geo filtering based on additional visualizations that are sourced from the same data model (this is one of my favorites).
  • Power Map has an option to add a 2D chart to a globe visualization too, but Power BI integration of multiple visualizations is on a different level (great way for a real data exploration!).
  • Power BI Globe visualization is free compared with a slightly cumbersome licensing aspect to enable a Power Map option in Excel. 

Few suggestions if you want to build a data visualization in Excel Power Map:
  • Power Map requires really powerful machine to process and create video (go with 64 bit Office; more memory, especially if you try to work with very complicated shape data files and attempting to add images as annotations).
  • When you add a new layer to existent scene, make it invisible first (do all the adjustments) and then make it visible, otherwise as soon you place a field name to Location then your scene will be turned.
  • No Save button and Undo/Redo buttons' behavior are difficult to understand.
  • A few times I've had a glitch with a text box when later you try to change its text or background color (no changes after pressing OK  button); only by recreating this text box all necessary adjustments were possible.
  • Scene effect (difficult to predict, without actually trying them), but when you master your skills changing different effects between different scenes, then you get all the enjoyment of creating nice Map visualizations :-)

[2016-02-06] Update:
Three days ago Microsoft announced Power BI "publish to web" functionality ( and I've just tried to publish my latest Power BI report of with public Berkeley Earth data of over land temperatures (

And now I can really rotate a Power BI Globe inside my blogger post, and you can do this as well; plus, you can interact with all other controls of the reports and check how the Globe rotate itself based on your data criteria selections (Country or Temperature range).

It works! Great feeling!

Happy data adventures!

Thursday, December 24, 2015

Specifying Paths to External Items (Report Builder and SSRS) in SharePoint with PowerShell

[2015-Dec-24] My story: two SSRS items (main report and its sub-report) need to be deployed to SharePoint. Trivial task, and even if I need to locate sub-report to a different SharePoint document library or folder I can easily do that during a development phase or using Report Builder after deployment by specifing relative or absolute path for sub-report within a definition of the main report. And here is a good MSDN resource about this - Specifying Paths to External Items (Report Builder and SSRS).

But with multiple environments to deploy the very same set of SSRS items to adjust will be a time consuming task. However we can prepare and adjust SSRS XML files during the time of deployment and PowerShell is one of the many ways to achieve that.

Here is a top section of the main SSRS RDL file:

And here how this very same section may look like with the absolute path for its sub-report in a targeted deployment SharePoint environment:

And portion of a PowerShell script that does it all is very simple:

So, at the end of the day, it's a win-win situation both for Development and Deployment teams:  less time for deployment and more time to develop other things.

[2015-Dec-30]  A brief update for the PowerShell script. I needed to make it more flexible to support sub-report paths update wherever they are located within their main SSRS report. So I had to change my XML way of dealing with the RDL report file to a text file way (XML file is still a text file :-).

And I changed this part of the code:
# update of the connection string to the found sourcing Excel files
[xml] $xdoc = get-content $rptReference

# Getting to the $rsr_environment XML block
$SubReports = $xdoc.Report.ReportSections.ReportSection.body.ReportItems.Subreport
foreach($SubReport in $SubReports)
$SubReport_New = $SubReport
$SubReport_New.ReportName = $targetDataSources+"/"+$SubReportFolder+"/"+$SubReport_New.ReportName+".rdl"
$xdoc.Report.ReportSections.ReportSection.body.ReportItems.ReplaceChild($SubReport_New, $SubReport)


To this:
(Get-Content $rptReference).replace('<ReportName>', '<ReportName>'+$targetDataSources+"/"+$SubReportFolder+"/") | Set-Content $rptReference

(Get-Content $rptReference).replace('</ReportName>', '.rdl</ReportName>') | Set-Content $rptReference 

And it worked like a magic, regardless of how many sub SSRS reports a main report could have and their locations in there!

Wednesday, November 18, 2015

PowerMap with Custom Region shapefiles - Demystified

[2015-Nov-18]  Today is a global GIS Day ( where people across the globe celebrate and promote the use of various geo locating technologies. I used to work for one company ( about 5 years ago as a data analyst and I saw their GIS specialists worked very hard with different Geo data sets, so I respect this type of work.

Recently I had a chance to participate in the Toronto Public Library Open Data Hackathon ( to brainstorm and come up with different ideas that would improve library services. Our team designed a prototype of web application that would enhance library users experience with book clubs. And I proposed to build a reporting layer for library staff who may analyse a potential impact of improved book club experience to a number of new visits and new registrations to the library.

I also wanted to explore Microsoft PowerMap technology in its way of portraying geo layers based on custom region KLM and SHM files which contain custom polygons (Power Map for Excel–May update for Office 365). Before I share how easy or not easy use of shape files for PowerMap was, let me show you a final version of our team PowerMap video that was created for this hackathon:

So after pulling some historical library datasets that they provided to us into my model; I created my initial PowerMap tour graphics. Which I further wanted to enhance with custom region geo files form the (City of Toronto Open Data Catalogue):
- Regional Municipal Boundary
- Neighbourhoods

Currently PowerMap custom regions support only County/Region, State/Province, Postal Code and Custom Region based on geo shape files:

However I was able to overcome this limitation by just renaming the City field (column) name to something else (e.g. CityName or CustomRegion, it’s your choice); and once it wasn’t longer as City, then I was able to map it to the Toronto Regional Municipal Boundary shape file attribute.

The other more difficult thing was to portray a set a polygons from another shapefile which contained information of 140 Toronto neighbourhood regions. I had a list of neighborhood regions in my data model and no matter how hard I tried, I couldn't make them visible in my PowerMap visualization.

So I started to research this a little bit more and found one hint in one of the PoweMap examples that Microsoft provided. For one of their models their provided this note that their shape files was slightly adjusted by a GIS software [QGIS tool]. And I installed this software on my machine because I didn’t know other way of what was inside of downloaded shape files.

So using the QGIS tool I was able to see how that shape file was constructed and I also saw the reason I wasn’t able to map my data model data sets with attributes from the shape file. 

I didn’t dare to change geo file and thought it would be easier to adjust my model data set, and that change made all the difference. Now all 140 regions became visible in my PowerMap visualization! Good for Microsoft and thanks to GIS and Toronto Open Data Catlog for providing software and necessary data sets.

My key learning in this PowerMap with custom regions (shape files) exercise are:
- Don’t trust region shape file metadata (use GIS tool to check shape file content, especially attribute table);
- I can easily overcome existent PowerMap limitation of mapping only County/Region, State/Province, Postal Code to geo location points, it’s your choice how to name them;
- This leads to my 3rd finding that my model data sets can be mapped not only by names but ID or other fields as well. Because when you import a region shape file it will ask to pick a region name from a list; and I was able to map my polygon and model data sets by their reference IDs as well. So it's like a table to table relationship by reference fields that you choose.

I’m was very pleased with how Microsoft improved PowerMap with the new Custom Region feature based on external geo shape files. And I am even more happy to learn that they plan to enable this PowerMap visualization in the PoweBI! How soon, I don't know.

Please leave your comments or if you have some further questions, I'd be glad to share all that I have learned.

Happy data adventures!

[2015-Nov-24] Just a brief update: I’ve recently contacted both PowerMap and Power BI teams in terms of enabling the Excel PowerMap visualization in Power BI. The PowerMap team informed me that there was no plan to move their client application to Power BI; however they had also mentioned that 3D Map would be available in a form of custom visualization. And today Power BI team has announced availability of the Globe Map visualization (

So synergy is in the air! :-)