Search This Blog

Saturday, March 31, 2018

Geometry Objects in SQL Server using Latitude/Longitude coordinates

(2018-Mat-31) Support for spatial geometry functions was initially introduced in SQL Server 2008 and then it was greatly improved in Denali (SQL Server 2012) version of the product -

Back then, when I was reading about the spatial features in SQL Server, I wondered where I could use this functionality with the creation of geo points, line polygons; however, working with geometry objects a bit more helped me to see some real use cases where this could be applicable.

Let's say you have the London Tower Bridge and you want to know when a taxi cab with your very important guest crosses this bridge. By creating a geo polygon for the bridge you can use spatial functions in SQL Server to check if a taxi cab GPS tracker coordinates intersect with the bridge polygon geometry object - STIntersects (geometry Data Type).

First, I've located my map objects in the QGIS tool using Google Satelite layer:

Then I created a polygon that would resemble the bridge area over the Thames River:

Then by extracting the nodes of my polygon, I can see each individual geo points coordinates:

And then after migrating this dataset into my dbo.LondonBridgeCoordinates SQL Server table, I can use this script to create the very same geometry object in my database:

-- POLYGON definition
DECLARE @coords nvarchar(max);

WITH polygon
  CONCAT(LTRIM(STR(longitude, 22, 6)), ' ', LTRIM(STR(latitude, 22, 6))) AS coords
FROM [dbo].[LondonBridgeCoordinates]),
    ', ' + coords
  FROM polygon p
  FOR xml PATH (''), TYPE)
  .value('(./text())[1]', 'VARCHAR(MAX)'), 1, 2, '') AS polygon_coords,
    CONCAT(LTRIM(STR(longitude, 22, 6)), ' ', LTRIM(STR(latitude, 22, 6))) AS coords
  FROM [dbo].[LondonBridgeCoordinates]
  WHERE id = 1)
  AS first_coordinate)

SELECT  @coords = (polygon_coords) FROM polygon_coordinates
SELECT  geometry ::STPolyFromText('POLYGON((' + @coords + '))', 4326).MakeValid()

I can also save the output of the STPolyFromText function as a database object and use it with all other available spatial functions in SQL Server. 

Please let me know if you can find other uses besides checking geo objects intersection. There are so many possibilities!

Saturday, February 10, 2018

Power BI with different Network Visualizations

(2018-Feb-10) A few days ago Microsoft provided another update for its desktop version of the Power BI data analytical tool - Among all the major updates in this release, there was one new custom network visualization added by ZoomCharts. It wasn't the first network visualization available in the Power BI and I played a little bit with some of them in the past.

Currently, there are 5 network visual controls that you can add to your Power BI reports:
- Network Navigator Chart by Microsoft
- Force-Directed Graph by Microsoft
- Journey Chart by MAQ Software
- Social Network Graph by Arthur Graus
- Network Visualization by ZoomCharts

I used an open dataset from one of my previous reports based on the City of Toronto Lobbyist Registry and for testing various network visualizations I've used a portion of this data that describes all the communications between registered lobbyist and city officials.

Here is a data model of the original Power BI report and for this test case I only used the highlighted la_communication table:

Most of the network visualizations in Power BI share a similar concept, your dataset needs to have a least one descriptive attribute for a Source and another one as a Target for your network relationship. Let's take a look how all those five visual components are similar and different to each other.

Network Navigator Chart

Network Navigator lets you explore node-link data by panning over and zooming into a force-directed node layout (which can be precomputed or animated live).

From an initial overview of all nodes, you can use simple text search to enlarge matching nodes in ways that guide subsequent navigation. Network nodes can also be color-coded based on additional attributes of the dataset and filtered by linked visuals.

You can set source and target nodes using your dataset, it would also help if you add additional attributes to set different colors for your nodes. The difficult part is that with a very large dataset, this visualization becomes very slow to process and animate all the relationships between the nodes.

Force-Directed Graph

This custom visual implements a D3 force layout diagram with curved paths. The thickness of the path represents the weight of the relationship between the nodes.

Since the relationship and interconnection between a large set of entities could be very complex, the visual positions the nodes in such a way that there are few crossings as possible, making the exploration experience easy, fun. Users can also adjust the layout manually by simply dragging the nodes around.

This visualization is my favorite, it has its own animation engine. Relationships between a source and target nodes are marked with arrows which help to identify who initiates a communication and who is a recipient.

Journey Chart

Journey Chart by MAQ Software enables users to display complex, multi-stage lead paths in a clean, uncluttered visual. Nodes represent categories and vertices represent relationships between categories. The bigger the node or vertex, the larger the value. Customizable node colors make distinguishing between categories easy.

This visualization doesn't have a source or target nodes, it rather creates a metric journey between several categories (two minimum); it also has a root node and then it shows the first level of descriptive category and a relationship is then built using next level category. It's fun to play with this chart, you can move nodes and unfold its vertices; I would highly recommend this visual chart for your own data discovery.

Social Network Graph

Super easy network visualization to use; it has standard source and target nodes that could be linked to you dataset attributes. Along with basic node relationships, you can assign graphics node representations (aka photos) using URL based links. However, if you don't have those links then your report's network graph looks like a bunch of empty animated circles because the current version of this visualization doesn't support text label settings, only tooltip messages are available.

Network Visualization

ZoomCharts Network Visual enables tabular data visualization, exploration, and filtering using network layout. Clicking on any node acts as a filter for the rest of the dashboard, making it easy to drill-through data and boosting Power BI productivity.

In one way this network chart resembles the Journey chart by MAQ Software since it allows you to list all of the chart nodes by listing them in the order, like categories. From the other angle, it works like two other Microsoft mainstream network visualizations: you can see how nodes relationships are constructed and metric values define the size of each node. Plus the ZoomCharts network visualization has additional user controls that you can adjust right within your Power BI report, which is super cool! The one thing that I didn't like about this chart was that I couldn't hide or remove legend of my nodes (Lobbyist and POH Office name in my dataset case).

And, if you have reached to this point of reading this blog post, here is the actual Power BI report, fully interactive and accessible. I will let you decide which network Power BI visualizations you would like to use, and I already have my own preferences :-)

(2018-Feb-11) Update: On the set of unfiltered 13177 records of Source to Target pairs, the fastest network visualization in Power BI became the visual control by ZoomCharts and the slowest one was Force-Directed Graph by Microsoft.

Here is a brief summary of testing each of the individual custom visualizations. 

- Network Navigator - 20 seconds
- Force-Directed Graph - more than 300 seconds and I still didn't get the final network finished rendering (it works well with smaller datasets though).
- Journey Chart by MAQ Software - 120 seconds
- Social Network Graph - 40 seconds
- Network Visualization - 15 seconds

Sunday, February 4, 2018

Power BI with Azure Data Lake - where the water doesn't run dry

(2018-Feb-04) I've had a case working with very large data files as a source for my heat map visualizations in Power BI, where the main file was slightly above 600 MB - Not a big deal when you have only a few such data files that you can directly be connected in the Power BI Query Editor. However, it's a whole different story, when you need to maneuver between several hundreds of thousands of such large data files.

A quick side thought though, why don't we try to use the Azure Data Lake for storing such files and then using them for our Power BI data modeling:

Here are some of the highlights of the Azure Data Lake Store:
Built for Hadoop: A Hadoop Distributed File System for the Cloud
Unlimited storage: No fixed limits on file size, account size, or the number of files
Performance Tuned for Big Data: Optimized for massive throughput to query and analyze any amount of data
All Data: Store data in its native format without prior transformation 

So here is my story of putting the Azure Data Lake to work with my original data file.
First, I created my instance of the Azure Data Lake store and placed my 629 MB data file in the root folder of its HDFS repository:

Then I went straight to my Power BI desktop environment and connected to my Azure Data Lake Store that I had just created:

For the first time, it will ask for your credential to authentication yourself with your Azure subscription and then it will pass you to explore your data files repository in Azure using the Power BI Query Editor, which is really groovy by itself!

In this testing exercise, I'm only interested in my sourcing data.txt file, and by clicking the Binary link I immerse myself into the very structure of this file, which is basically a formation of 7 columns and ~15 million rows of data.

Here is a quick look at how the initial table extract looks in M language:

    Source = DataLake.Contents(""),
    #"data txt" = Source{[Name="data.txt"]}[Content],
    #"Imported CSV" = Csv.Document(#"data txt",[Delimiter="#(tab)", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(#"Imported CSV",{{"Column1", Int64.Type}, {"Column2", Int64.Type}, {"Column3", type number}, {"Column4", type number}, {"Column5", type number}, {"Column6", Int64.Type}, {"Column7", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Station ID"}, {"Column2", "Series Number"}, {"Column3", "Date"}, {"Column4", "Temperature"}, {"Column5", "Uncertainty"}, {"Column6", "Observations"}, {"Column7", "Time of Observation"}})
    #"Renamed Columns"

It's worthwhile to mention how easy the M language is in converting HDFS data files into a tabular stucture with this Csv.Document function.

And then to transform my Date column value as a year and decimal fraction of the midpoint of the time period being represented, I just copied my previously used M script for this from my previous Power BI report:

Here is the final script that I used in my Azure-based dataset:

    Source = DataLake.Contents(""),
    #"data txt" = Source{[Name="data.txt"]}[Content],
    #"Imported CSV" = Csv.Document(#"data txt",[Delimiter="#(tab)", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(#"Imported CSV",{{"Column1", Int64.Type}, {"Column2", Int64.Type}, {"Column3", type number}, {"Column4", type number}, {"Column5", type number}, {"Column6", Int64.Type}, {"Column7", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Station ID"}, {"Column2", "Series Number"}, {"Column3", "Date"}, {"Column4", "Temperature"}, {"Column5", "Uncertainty"}, {"Column6", "Observations"}, {"Column7", "Time of Observation"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Time of Observation", "Observations", "Uncertainty", "Series Number"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Custom", each [Date]),
    #"Renamed ColumnsB" = Table.RenameColumns(#"Added Custom",{{"Custom", "Year"}}),
    #"Inserted Round Down" = Table.AddColumn(#"Renamed ColumnsB", "Round Down", each Number.RoundDown([Year]), Int64.Type),
    #"Added Custom1" = Table.AddColumn(#"Inserted Round Down", "Custom", each ( [Date] - [Round Down] ) * 12 + 0.5),
    #"Changed TypeB" = Table.TransformColumnTypes(#"Added Custom1",{{"Custom", Int64.Type}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Changed TypeB",{"Year"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Removed Columns1",{{"Round Down", "Year"}, {"Custom", "Month"}}),
    #"Removed Columns2" = Table.RemoveColumns(#"Renamed Columns1",{"Date"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Columns2",{{"Year", type text}, {"Month", type text}}),
    #"Added Custom2" = Table.AddColumn(#"Changed Type1", "Custom", each [Year]&"-"&[Month]&"-01"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom2",{{"Custom", type datetime}}),
    #"Renamed Columns2" = Table.RenameColumns(#"Changed Type2",{{"Custom", "Date"}}),
    #"Final Change" = Table.TransformColumnTypes(#"Renamed Columns2",{{"Year", Int64.Type}, {"Month", Int64.Type}})
    #"Final Change"

then I was able to get to the very same table that I had previously used in my Power BI heat map visualizations.

Basically, I just replaced my originat data table with the very same structure but being sourced from the Azure Data Lake Store, and Power BI data integration experience was only slightly changed at the very beginning. 

And while you have your data files in this lake then your data waters won't run dry!

Saturday, January 20, 2018

Get Data experience from Excel to Power BI and SQL Server Analysis Services 2017

(2018-Jan-20) It has been a long journey for the Excel data integration tool Power Query from its early version as an external add-in component in 2010 and 2013 Excel to the built-in feature of the Excel 2016. 

Power Query for Excel provides a seamless experience for data discovery, data transformation and enrichment for data analysts, BI professionals and other Excel users. With Power Query, you can combine data from multiple data sources and shape it in order to prepare the data for further analysis in tools like Excel and Power Pivot, or visualization in tools like Power View and Power Map.

It only takes to click New Query button of the Data ribbon to start working with Power Query in Excel, and then the whole experience of selecting different data source begins:

Currently, Excel Power Query allows creating 30 different types of data source connections.

Limiting your data discovery experience with the Excel tool, you tend to ask yourself about the existing worksheets and data models constraints that may prevent you to work with larger file datasets or connecting to a wider range of server databases.

Some of the existing limitations of data modeling in Excel 2016 are:

A) Total number of rows and columns on a worksheet:
  • 1,048,576 rows by 16,384 columns

B) Maximum limits of memory storage and file size for Data Model workbooks:
  • 32-bit environment is subject to 2 gigabytes (GB) of virtual address space, shared by Excel, the workbook, and add-ins that run in the same process. A data model’s share of the address space might run up to 500 – 700 megabytes (MB), but could be less if other data models and add-ins are loaded.
  • 64-bit environment imposes no hard limits on file size. Workbook size is limited only by available memory and system resources
Making a further step into the right direction of the data modeling and data discovery experience with Power BI, you just have to say, where is my data, and in a natural way it gives the option but choosing the Get Data path:

Currently, Power BI allows creating 79 different types of data source connections.

Power BI data modeling limitations:

A) Power BI Desktop

  • there is no restriction on how large a desktop Power BI file can be locally

B) Power BI Service Online (Free and Pro licenses):

  • workspace limit is 10 GB
  • dataset limit is 1 GB

C) Power BI Service Online (Premium license):

  • workspace limit is 100 TB
  • dataset limits are (they may change in future):
    • P1 SKU < 3 GB
    • P2 SKU < 6 GB
    • P3 SKU < 10 GB

The exciting thing is that with SQL Server 2017 Analysis Services for tabular data models of 1400 compatibility level and higher, Microsoft has introduced the very same Get Data experience:

And even Data Query editor with M languages capabilities is now available in Visual Studio, isn't this amazing! 

What's new in SQL Server 2017 Analysis Services

Currently, Visual Studio Data Tools for SSAS Tabular allows creating 35 different types of data source connections. But even if the Power BI experience has a lot more variety of various data types to connect, SSAS Tabular now has more advantages of migrating a self-service Power BI models into an organization analytics platform. 

And from the data size limitations perspective, the sky is the limit, or Azure, to put it correctly :-)

Saturday, January 13, 2018

When it’s cold outside, perhaps Power BI heat maps can warm you up

(2018-Jan-13) You never know how the weather will behave itself, especially when the temperature gets dramatically changed from +10 to -10 within a day. During this Christmass and New Year holidays, it was very cold here in Canada. So I thought to explore some open data set that may contain historical temperature records and if possible geo-coordinates to locate those thermal data points to.

About two years ago I already had created a blog post “Excel Power Map vs. Power BI Globe Map visualization” based on the Berkley Earth datasets - At that time I was probing the 3D Globe Power BI visualization, the one that you can use and rotate it right within your report.

This time I thought to check and compare heat map visualizations in Power BI. Currently, there are two ways to see variously heated or cooled geo areas in Power BI, either using a standard ArcGIS Map visualization or a custom Power BI Heatmap visualization from the Office store -

The Berkley Earth dataset for this data case exercise came in a form of multiple flat files:

I only used the following fields from the data.txt file:
- Station ID, 
- Date, 
- Temperature (C)
And from the site_detail.txt file I extracted the following fields to provide more geo details for each of the weather stations that there was temperature data point available:
- Station ID, 
- Station Name, 
- Latitude, 
- Longitude, 
- Country

And with some minor data transformation, I was able to build my Power BI data model with two tables:

Just a side note on a data extract technique, SQL Server Management Studio (SSMS) version 17.3 or later now has a new feature to Import Flat File along with the common Import and Export Data, you can read more about it here -

So, after populating the data: temperature ~15.5 million records for the date time range from the year 1763 till 2017 and close to 43 thousand of various weather stations around the globe, I was able to place it into my Power BI report and test both types of the Heat Map visualizations. I additionally filtered my dataset then to show the Canadian weather stations landscape only and chose the year 2017 for the most recent data.

I could also change colors for both visualizations, either by manual selection or by specifying a color palette in the edit mode:

ArgGIS Map visualization has a lot of detailed settings that you can choose and apply for your various geo-analytics scenarios (and not only your heat map ones), however, the custom Heatmap visualization brings even more flexibility for adjusting your visualization that ArcGIS Map doesn't have.

So, there are no winners or loosers and I can easily use them both!

Friday, December 29, 2017

Power BI Sankey Chart: a case to review your Budget books

(2017-Dec-29) Sankey diagram (chart), named after Matthew H Riall Sankey, is a good visualization tool for information flow within a system. It can also identify main players (contributors) to different stages of this flow, where those stages can be textually described and flow thickness its distribution can be attributed to a numeric value of your dataset.

A usual use case for Stanley charts could a population migration between countries, as it has been shown in the Microsoft Power BI demo - Or, I've found it's a very fascinated case to show voters flow between first and second rounds in the recent presidential elections in France - French election results: Macron’s victory in charts via @financialtimes

I have been thinking about another case where Sankey diagrams could be used as well, which is a budget analysis, with multiple revenues flow contributing to an overall budget and then spending streams coming out of the same budget.

So I took the recent City of Toronto Budget 2018 open dataset: With naturally good pie charts for separate revenue and spending categories. Here is how my two initial datasets looked like in Power BI:

Then for Revenue, I renamed my customer Category column to Source and added one more additional custom column Destination with a value "Budget". For Spending dataset, I added one custom column Source with a value "Budget" and renamed its custom category to Destination. Thus to show how sourcing revenue streams flow to the budget and then they flow from the sourcing budget to the sending destination. Here is how my final Sankey Budget dataset looked like:

Which could be used for the Sankey Power BI visualization that mainly requires Source, Destination, and Weight of the flow.

And now it looks like a budget book with revenue and spending data streams together.
Here is my actual report in Power BI that show this all information along with the regular pic charts, please let me know what you think.