Friday, April 10, 2015

Big Data Hackathon in Microsoft Canada: Power Map will find your way!

Big Data, what a term! Some people instantly think of a huge, sometimes cumbersome volume of information; others define it with three or four Vs (volume, velocity, variety, and variability). Debates and other social and technical events are organized to promote various technologies that work with big data sets.

One of these events, the Big Data Hackathon in Mississauga on March 13-14, 2015, was organized by Microsoft Canada, which called on data scientists, developers and simply data enthusiasts to select a specific data science problem and then solve it with the help of Microsoft Big Data and BI tools, competing in two categories (Data Modeling and Data Visualization). And our team won in the Data Visualization category (http://blogs.technet.com/b/cansql/archive/2015/01/28/big-data-hackathon-in-mississauga-bdh.aspx)!
I didn’t stay for the final demo; I was a bit tired and left early Saturday afternoon after working all night on a PowerPivot model and a Power Map tour.

Freedom through responsibility could be one of the phrases that characterize the approach taken during this Big Data Hackathon. All the teams were offered a choice to select any available public data (http://blogs.technet.com/b/cansql/archive/2015/03/12/microsoft-big-data-hackathon-resources.aspx) and utilize the HDInsight & Power BI tool set for a winning data science scenario.

Our team decided to work with the Toronto Parking Tickets data set and combine it with the Toronto Green P Parking locations data set, to try to prove that the number of parking tickets would be lower in areas close to public parking locations:
http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=ca20256c54ea4310VgnVCM1000003dd60f89RCRD http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=b0993228eb22a310VgnVCM1000003dd60f89RCRD 

The Parking Tickets data files were moved to our Azure Blob storage, an HDInsight cluster was created in Azure, and all the data files were organized into Hive tables to be queried further for our data insights.
Power Queries were built to bring in both the Hive tables' data and the JSON files with the parking locations; the datasets were then moved to a PowerPivot model and used as a source for Power BI and Power Map visualizations. During a preliminary comparison of the parking ticket data with the parking locations, we discovered a few “hot” spots with a high volume of parking infractions at the end of December 2014, mostly downtown, and one such hot spot with the street address “34 Little Norway Cres” was right across from the Billy Bishop Airport.



I spent a few hours searching for any flight-related information available and finally found it in the
NAV Canada monthly statistics. I just needed to build additional Power Queries, filter the data for the Billy Bishop Airport only, and combine it into one data set for a whole year:
i. http://www.statcan.gc.ca/pub/51-007-x/2013001/t001-eng.htm
ii. http://www.statcan.gc.ca/pub/51-007-x/2013002/t001-eng.htm
iii. http://www.statcan.gc.ca/pub/51-007-x/2013003/t001-eng.htm
iv. http://www.statcan.gc.ca/pub/51-007-x/2013004/t001-eng.htm
v. http://www.statcan.gc.ca/pub/51-007-x/2013005/t001-eng.htm
vi. http://www.statcan.gc.ca/pub/51-007-x/2013006/t001-eng.htm
vii. http://www.statcan.gc.ca/pub/51-007-x/2013007/t001-eng.htm
viii. http://www.statcan.gc.ca/pub/51-007-x/2013008/t001-eng.htm
ix. http://www.statcan.gc.ca/pub/51-007-x/2013009/t001-eng.htm
x. http://www.statcan.gc.ca/pub/51-007-x/2013010/t001-eng.htm
xi. http://www.statcan.gc.ca/pub/51-007-x/2013011/t001-eng.htm
xii. http://www.statcan.gc.ca/pub/51-007-x/2013012/t001-eng.htm
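
The twelve monthly table URLs above follow a simple numeric pattern (the issue number runs from 2013001 for January to 2013012 for December), so they can be generated instead of typed by hand. A small Python sketch of that pattern:

```python
# Build the 12 monthly Statistics Canada table URLs listed above.
# The issue number runs from 2013001 (January) to 2013012 (December).
monthly_urls = [
    f"http://www.statcan.gc.ca/pub/51-007-x/2013{month:03d}/t001-eng.htm"
    for month in range(1, 13)
]

for url in monthly_urls:
    print(url)
```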

The final Power Map tour video showed all the steps of the data analysis and our attempt to find a correlation between parking infractions and flights. To my big disappointment, the parking infractions within the area of the airport were not much correlated at all with the number of flights (or perhaps there were some data quality issues); but this whole data journey was worth a try!
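
The correlation check itself is a small calculation. Here is a minimal Python sketch of a Pearson correlation between two monthly series; the numbers below are hypothetical placeholders, not the actual ticket or flight counts from the hackathon:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly counts (NOT the real hackathon data):
tickets = [120, 135, 150, 160, 170, 180, 175, 165, 155, 145, 140, 200]
flights = [900, 880, 950, 1000, 1020, 1100, 1150, 1120, 1050, 980, 940, 930]

print(round(pearson(tickets, flights), 2))
```

A value close to +1 or -1 would suggest a strong relationship; a value near 0 would match the disappointing result described above.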




The Power Map tour video was also posted to a YouTube channel:

https://www.youtube.com/watch?v=3UMtaYoYHiY

My key learnings from this Big Data Hackathon were:
- The amazing volume of publicly available data sets could be used for helpful and meaningful insights; I need to work with the HDInsight technology more.
- Power Query is an awesome tool that lets you make X steps forward in your data set modification and Y steps backward like nothing ever happened; and the set of Power Query functions is a whole new world to explore.
- Power Map, being powered by Bing Maps, can geo-locate data points using more than just latitude, longitude or street address information; I liked how the “Other” category worked as well, placing a point at a particular geo location by its name. That's how I located the Billy Bishop Airport, just by putting that phrase in quotation marks.


I really enjoyed this Microsoft Big Data Hackathon event: a great time to learn and practice the Azure HDInsight and Power BI technologies. Additionally, we won a prize, and you don't get to spend every night at a Microsoft office working with those technologies :-)



Tuesday, March 31, 2015

SSRS report footer is NOT lost in Excel

With all the recent changes and updates within the Microsoft Power BI universe, the SSRS core technology gets less and less attention. However, a recent experience with one piece of SSRS report functionality triggered my interest to share it.

I always thought that the WYSIWYG principle for SSRS report export formats only worked for the PDF, Word and HTML file types, and that the rest of the export formats only cared about the actual report data (such as the CSV & XML types). I counted the Excel format among them: yes, the report header and body were fitted nicely, or almost nicely, into Excel worksheet columns; however, I never saw a report footer being exported, and I just assumed that this was by design (only the data was important).

However, in a recent case a customer raised an issue about a missing SSRS report footer when the report was exported to an Excel file. So I thought that I would either need to find proof of such “by design” functionality or find a way to show the SSRS report footer in the exported Excel file.

And I discovered that SQL Server Reporting Services actually does render the report footer to an Excel file; the footer is just not shown in the Normal view mode (the default), but it can be seen in the Page Layout view.

Normal SSRS report footer layout:


Exported SSRS report in Excel (Normal View), footer is not visible:


Exported SSRS report in Excel (Page Layout View), footer is at the bottom of the page:


You can also see the actual report footer by going to Insert => Header & Footer, and voila!.. our original SSRS report footer is shown there:



So that means the Excel rendering extension of SSRS does bring all the elements of an SSRS report into the exported Excel file; the WYSIWYG principle is preserved and the report footer is not lost :-)

Have a great day and happy adventures!

Wednesday, December 24, 2014

PowerPivot Saga 2: DAX OR function (OR there is something better)

I didn't realize that the DAX OR function only accepts 2 arguments, compared with the similar function in Excel, where the limit is 255 arguments. This difference only occurred to me when I received an error in my PowerPivot data model after an attempt to list more than 2 arguments for the OR function; only then did I decide to check this DAX function's syntax:


DAX: OR(<logical1>,<logical2>) vs. Excel: OR(logical1,logical2,...)

The MSDN documentation has a little remark (http://msdn.microsoft.com/en-us/library/ee634951.aspx):
The OR function in DAX accepts only two (2) arguments. If you need to perform an OR operation on multiple expressions, you can create a series of calculations or, better, use the OR operator (||) to join all of them in a simpler expression.
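
Spelled out for more than two conditions, the two options from that remark look like this (a DAX sketch with hypothetical values in an [Education] column):

```
-- A "series of calculations": nesting the two-argument OR
= IF ( OR ( OR ( [Education] = "A", [Education] = "B" ), [Education] = "C" ), "Yes", "No" )

-- The || operator chains any number of conditions in one expression
= IF ( ( [Education] = "A" ) || ( [Education] = "B" ) || ( [Education] = "C" ), "Yes", "No" )
```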

I liked the use of the word “better” in this remark, so I made a little test with data from the AdventureWorksDW2014 database. I pulled a small set of data from the dbo.DimCustomer table into my PowerPivot table:


SELECT [CustomerKey]
      ,[FirstName]
      ,[LastName]
      ,[EnglishEducation]
  FROM [AdventureWorksDW2014].[dbo].[DimCustomer]

Also, for this list of customers I added a column that would show me whether a customer has finished school. Actually, I added two columns that showed the same thing using two different types of OR logic (one with the OR function and the second one with pipes):

Finished School A:=IF(OR([Education]="Partial College", [Education]="Partial High School"), "No", "Yes")
Finished School B:=IF(([Education]="Bachelors") || ([Education]="High School") || ([Education]="Graduate Degree"), "Yes", "No")



Then I used the [Finished School B] column to build a sample Power View report with the customer count data:




So basically, this whole “number of arguments” limit for the DAX OR function was a lesson for me to always check function syntax; however, why this limit differs from the Excel version of the function is still a mystery to me :-)

Happy data adventures!

Monday, November 3, 2014

PowerPivot Saga: Scheduled Refresh more than once a day

PowerPivot data models in Excel are a great way to quickly pull data from various sources, build relationships between different tabs, and create different calculations. Once a model is built, it can be further utilized in reporting.

Users want their reports to show updated information, which means the PowerPivot data model that supports those analytics should hold updated data, i.e. all data tabs sourced from different data sources need to be refreshed.




We can refresh data in PowerPivot in two ways: 
Manually, or interactively, by refreshing all connections or a specific data connection.




Or we can schedule a data refresh if a PowerPivot model is deployed to SharePoint.



That's where a slight disappointment starts developing, when we realize that the most frequent data refresh cycle can be scheduled only once a day. Some people have tried to interfere and tweak the tables in the PowerPivot SQL Server database in SharePoint that are responsible for the data model refresh schedules: http://smithicus.wordpress.com/2011/08/09/using-a-custom-data-refresh-schedule-in-powerpivot-for-sharepoint/


A request was made at the Microsoft Connect website to enhance this functionality and add more granularity to how often a data model could be refreshed. However, this request was closed as “by design” in December 2012: https://connect.microsoft.com/SQLServer/feedback/details/739949/powerpivot-2008-2012-for-sharepoint-data-refresh-more-than-once-a-day

At the recent Power BI Saturday event at the Mississauga Microsoft office, I questioned a program manager from the SSRS team about this very aspect of the data refresh frequency, and he confirmed that there was no plan to bring that functionality to the PowerPivot model in SharePoint; however, he suggested a workaround. The very same approach (also suggested by one of my colleagues) is to store the tabular model in SQL Server, where I have more control.

So this is what I needed to do:
1) Import the PowerPivot data model from the Excel file into a Tabular Model solution
2) then deploy this solution to an SSAS instance
3) create a SQL Server Agent job to process the newly deployed SSAS tabular database on my own schedule (hours, minutes, seconds... anything you want)
4) create a BISM data connection in SharePoint for the newly deployed SSAS tabular database
5) and finally, change the source of (or start developing) all my Power View reports based on that new BISM data connection in SharePoint.
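
For step 3, the SQL Server Agent job step can be of type “SQL Server Analysis Services Command” and run an XMLA command against the SSAS instance. A minimal sketch, assuming the deployed tabular database is named MyTabularModel (a hypothetical name):

```xml
<!-- Fully process the whole tabular database; DatabaseID is hypothetical -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessFull</Type>
  <Object>
    <DatabaseID>MyTabularModel</DatabaseID>
  </Object>
</Process>
```

The job's schedule then gives you any refresh granularity the Agent supports, instead of SharePoint's once-a-day limit.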






And why is this a viable solution? Because a PowerPivot BI Semantic Model Connection (.bism) supports two data source types:
- A tabular model database running on a standalone Analysis Services instance in tabular server mode.
- A PowerPivot workbook saved to SharePoint.

In the end, users were able to analyze their reports with the updated data.

If a daily data refresh is sufficient for business users' reports, then we can use the out-of-the-box feature of SharePoint; however, if more granular time intervals for the PowerPivot model data refresh are required, then we can:
A) design a PowerPivot data model in Excel,
B) import it into an SSAS tabular database,
C) schedule more frequent database processing jobs for the new SSAS tabular database and use it as a source for our Power View reports.



Happy data adventures!

PS. This material was also presented at the Toronto SQL Server User Group on 2014-Oct-30.

Thursday, October 16, 2014

SSRS report rendered in EXCELOPENXML then something went missing

Recently I was asked by a client to look into an issue with one of their SSRS reports. They had a report with a text box that had some padding on the left side. All seemed to be OK: the report showed indented text values in both the RPL and PDF rendering extensions; however, when the report was exported to Excel, those indentations were lost.

I removed the Left Padding setting and substituted it with hard-coded extra spaces in the text box expression. This tweak didn't solve the problem: now my Report Server didn't show those extra spaces (so the Excel export of the report didn't show them either).

Then I removed the extra spaces from the SSRS text box expression and added them to the dataset that sourced that text box. Now the Report Server showed all the indented text boxes; however, the Excel export didn't show those extra spaces again. What?!

Let’s be crazy then: I removed all the extra spaces from the data query and built my text box expression with a conditional check on Globals!RenderFormat.Name="EXCELOPENXML", to add the extra spaces to my text box only when the report gets exported to Excel; and this… solved the problem: the RPL, PDF and EXCELOPENXML versions of the report all showed me the left-padded text box as intended.
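
The conditional check was an expression along these lines (a sketch; the field name and the number of spaces are hypothetical):

```
=IIF(Globals!RenderFormat.Name = "EXCELOPENXML",
     "    " & Fields!MyField.Value,
     Fields!MyField.Value)
```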

Then I noticed another text box attribute (LeftIndent) that I hadn't tried to adjust. After adjusting the LeftIndent value, all my rendering extensions worked perfectly, so I no longer needed the conditional check for the Excel rendering extension.

It’s funny that I’ve dedicated this whole blog post to the EXCELOPENXML rendering extension, which I didn’t need to check at all; but… now I know how to solve this problem in more ways than one :-)


Wednesday, October 15, 2014

SSIS Pivot makes you stronger

At a recent Toronto SQL Server User Group meeting, Michael J. Swart (@MJSwart) gave a good presentation based on 10 of his blog posts. There was one (Forget About PIVOT, You Don’t Need It) that got my attention; first, I remembered that I had read it a while ago, and my second thought was that I could add my two cents to it.

Very early in my career I was fascinated by the MS Access TRANSFORM statement; then, in the 2005 version of SQL Server, the PIVOT/UNPIVOT operators came out, and that made me even more excited that we could now forget about custom T-SQL code and write a simple statement to turn a data set any way we want.

I agree with Michael that all the available reporting tools are much better for preparing a final data set for business users. However, I've used PIVOT many times, and I like using it in a middle data layer, where one data set gets transformed (or, in Excel terms, 'transposed') into another form and then moved further along the way.

The SSIS Pivot task with its UI gives some additional flexibility to transform an input data set; however, I still prefer the T-SQL method, and I can explain why.

Here is my sourcing query that pulls the data I want to transform from the AdventureWorksDW2014 database:


In SSIS I can do the same thing and push the sourcing query through the Pivot task:


However, when I execute the package, it fails due to duplicate values for the Pivot Key of the Pivot task:


So I had to change the sourcing query and add an aggregation function to it:


and then it worked:


So, with a few hiccups, I was able to adopt the SSIS Pivot task; but I still prefer writing the PIVOT T-SQL statement manually: more control, more flexibility :-)

Lessons learned:
1) The sourcing data set for the SSIS Pivot task needs to be pre-aggregated.
2) If you don't like the Pivot output columns (based on the Pivot Key and Pivot Value), you can always rename them in the Advanced Editor.
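
Lesson 1 is visible right in the T-SQL alternative: the PIVOT operator forces you to state the aggregation explicitly, which is exactly the pre-aggregation the SSIS Pivot task expects upstream. An illustrative sketch against the AdventureWorksDW2014 database (my own choice of table and years, not the query from the screenshots above):

```sql
-- The SUM() inside PIVOT collapses duplicate (SalesTerritoryKey, OrderYear)
-- pairs, which is the pre-aggregation the SSIS Pivot task needs done upstream.
SELECT SalesTerritoryKey, [2012], [2013]
FROM (
    SELECT SalesTerritoryKey,
           YEAR(OrderDate) AS OrderYear,
           SalesAmount
    FROM dbo.FactInternetSales
) AS src
PIVOT (
    SUM(SalesAmount) FOR OrderYear IN ([2012], [2013])
) AS p;
```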

Working with SSIS Pivot task didn't destroy my attitude but made me stronger :-)

Happy data adventures!

Thursday, August 28, 2014

SSRS report: Multi-value parameter going after Data-driven subscription

It is always good to enrich SSRS reports with multi-value parameters; users get more flexibility to combine different reporting scenarios with no need to run a report for different parameter values separately. On the other hand, developers can sharpen their skills by embedding multi-value parameters into report datasets; referencing different rows, labels and values from those parameters in custom controls within reports is another competence that can be quickly obtained.

Alright, the report is developed and deployed for users. They have requested report subscriptions and now enjoy the luxury of scheduled results delivery in different formats. Report Manager in Native mode, and SharePoint in SSRS Integrated mode, read report parameter metadata pretty well and allow setting new values for multi-value parameters or keeping the default settings.

However, when you’re dealing with a data-driven subscription, it’s not that easy to pass data query values to multi-value parameters. You have a choice:
1) Be happy with the report's default values and change nothing in the report subscription
2) You can still be happy and create a regular subscription, setting parameter values manually
3) You can programmatically define parameter values using the CreateDataDrivenSubscription method of the Reporting Services web service
4) Also, you can be crazy enough to build a set of XML parameter tags in the Parameters column and update the dbo.Subscriptions table of your Report Server instance.

But still, that doesn't let you link a data element from the data-driven subscription query to your multi-value parameter; how would you pass several values in one column of a single record? And I think you’ve already guessed one of the right answers: a delimited string of values!

Here is how I made it work for a project where my client wanted to trigger report subscriptions from their web UI, and each SSRS report execution could have different multi-value parameter values.

1) I've created two multi-value parameters in my SSRS reports:
        a. ReportParameter; I made it Hidden, Available Values – None, Default Values – Specified values, with just 2 or 3 lines for my reports (the number of them is not really important at this moment)
        b. ReportParameterActual; I kept it Visible, Available Values – Get values from a query, Default Values – Specify values with the following expression:
=Split(@ReportParameter, ",")
2) Once the report was deployed, I created a data-driven subscription where my data query looked similar to this: select ReportParameter = ‘Value1,Value2,Value3’ from MyTable
3) And then I mapped the report's ReportParameter parameter to the ReportParameter column from this query.

This was the whole solution to the problem. ReportParameterActual was the visible parameter, and users could interact with it and set different values themselves beyond the subscription's definition. ReportParameter was the actual interface and gateway for the default values, which with the data-driven subscription became a dynamic control over the set of values supplied to the multi-value ReportParameterActual in different subscription scenarios. And Split(@ReportParameter, ",") was the whole trick :-)
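
Stripped to its essence, the trick is just a round trip through a delimited string. A Python sketch of what happens (the values are hypothetical):

```python
# Subscription side: the data-driven query returns one delimited string per row.
report_parameter = "Value1,Value2,Value3"

# Report side: the default-value expression Split(@ReportParameter, ",")
# turns that single string back into a list of parameter values.
actual_values = report_parameter.split(",")

print(actual_values)
```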

Update (2014-09-03): Richard D'Angelo asked me to share the RDL with the solution. I couldn't upload the actual client report, but I've made a sample one based on the [AdventureWorksDW2014].[dbo].[DimGeography] table. You can get this sample through this link - MultiValueParameterReport.rdl.

Happy data adventures!