Simulated Streaming Data -> Power BI Embedded

This post is about something new I tried last week. The goal was to create a simulated streaming data source, feed it into Power BI as a streaming dataset, create a report out of the streaming dataset, and then embed it in a web application. With proper directions provided by my teammates, I finished the implementation from end to end within 1.5 hours. I was so impressed by how awesome it is and how easy it is to implement that I want to share those directions with you.

Main Step 1: Create simulated streaming data and push it to Power BI as a streaming dataset

  1. The basic steps include:
    1. Set up an Azure IoT hub to capture the data
    2. Use the IoT simulator at https://www.azureiotsolutions.com/Accelerators#dashboard to simulate streaming data
    3. Create a Power BI app workspace in PowerBI.com
    4. Set up a Stream Analytics job to connect the data coming from the IoT hub to a Power BI streaming dataset (a minimal query sketch is shown after the tutorial link below)

  2. There is a free tutorial course from edX that covers the above steps in detail. You do not have to finish the whole tutorial; you can just go through the sections with green check marks, as shown below.

https://www.edx.org/course/iot-business-intelligence-for-iot-solutions

[Image: edX IoT tutorial course outline showing the sections to complete]
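For reference, the Stream Analytics job in step 4 is essentially a SQL-like query that reads from the IoT hub input and writes into the Power BI output. Below is a minimal sketch; the input/output alias names, the fields, and the 10-second window are assumptions for illustration rather than part of the tutorial:

SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd
INTO
    [powerbi-streaming-output]
FROM
    [iothub-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    deviceId,
    TumblingWindow(second, 10)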

Main Step 2: Create a Power BI report using the streaming dataset

  1. Open Power BI Desktop
  2. Click on Get Data
  3. Choose ‘Power BI dataset’
  4. Choose the Power BI streaming dataset you just created from Main Step 1

[Image: Power BI datasets list in the Get Data dialog]

  5. Once you are connected to the streaming dataset, you can use it to create your graphs and charts.
  6. Lastly, click ‘Publish’ to save your PBIX file back to PowerBI.com.

Main Step 3: Embed your Power BI report in your application

  1. If you need to embed your report into your application, please follow this tutorial https://docs.microsoft.com/en-us/power-bi/developer/embed-sample-for-customers

This tutorial introduces the following important topics, which we have not covered in the previous steps:

  1. How to register an application in Azure Active Directory (Azure AD), which allows your application to interact with Power BI.
  2. It provides a sample application package from GitHub. You just need to update the web.config file of the sample package with the application ID you just created, the workspace ID where your report is located, the report ID of your report, and your Power BI credentials.

  3. Lastly, to set up automatic refresh of your streaming dataset in your application, please try the following blog:

http://radacad.com/integrate-power-bi-into-your-application-part-4-refresh-data-set

 

Thank you,

Your friend, Annie

 

How to calculate PMT and CUMPRINC in DAX

PMT is an Excel function that calculates the periodic payment for a loan.

In Excel, the syntax is PMT(rate, nper, pv, [fv], [type])

  • Rate    Required. The interest rate for the loan.
  • Nper    Required. The total number of payments for the loan.
  • Pv    Required. The present value, or the total amount that a series of future payments is worth now; also known as the principal.

Yet, in DAX, there is no built-in function for PMT. To calculate it, we need to reapply its mathematical logic.

So, you can use

DAXVersionPMT = ( ( Pv * Rate ) / ( 1 - ( 1 + Rate ) ^ ( -1 * Nper ) ) ) * (-1)
// Reminder: if you are calculating a monthly payment but Rate is an annual rate, convert it to a monthly rate by dividing by 12
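To sanity-check the formula, here is a hypothetical worked example (the loan figures are made up, not from this post): a $200,000 loan at a 6% annual rate, paid monthly over 360 periods.

MonthlyPaymentExample =
VAR MonthlyRate = 0.06 / 12    // annual rate converted to a monthly rate
VAR Nper = 360
VAR Pv = 200000
RETURN ( ( Pv * MonthlyRate ) / ( 1 - ( 1 + MonthlyRate ) ^ ( -1 * Nper ) ) ) * (-1)
// returns roughly -1,199.10, matching Excel's PMT(0.005, 360, 200000)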
CUMPRINC is the accumulated (cumulative) principal amount paid on a loan between two periods.

In Excel, the syntax is CUMPRINC(rate, nper, pv, start_period, end_period, type)

The CUMPRINC function syntax has the following arguments:

  • Rate    Required. The interest rate.
  • Nper    Required. The total number of payment periods.
  • Pv    Required. The present value.
  • Start_period    Required. The first period in the calculation. Payment periods are numbered beginning with 1.
  • End_period    Required. The last period in the calculation.
  • Type    Required. The timing of the payment.

In DAX right now, there is no built-in function for CUMPRINC yet, so you can use the following DAX:

Accumulated Principal =
VAR p = [PV]
VAR i = [Rate]
VAR n = [Nper]
VAR b = [Start_period]
VAR e = [End_period]
VAR m = [DAXVersionPMT] * (-1)
VAR q = 12 // if Rate is an annual rate and the PMT was calculated per monthly period, divide by 12 to get the monthly rate; if Rate is already a monthly rate, use q = 1
RETURN
    ( ( p - m * q / i ) * ( 1 + ( i / q ) ) ^ e + m * q / i )
    - ( ( p - m * q / i ) * ( 1 + ( i / q ) ) ^ ( b - 1 ) + m * q / i )
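As a quick check, using the same hypothetical $200,000 loan at a 6% annual rate over 360 monthly payments, setting [Start_period] = 1 and [End_period] = 12 in the measure above returns roughly -2,456, which matches Excel's CUMPRINC(0.005, 360, 200000, 1, 12, 0): about $2,456 of principal is paid off during the first year.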

Thanks,
Your friend, Annie

How to use T-SQL to validate all SSIS packages at once

Recently, I needed to do a data warehouse migration for a client. Since there might be some differences between the Dev environment source databases and the Prod environment source databases, the migrated SSIS packages that build the data warehouse might fail because of those changes. So the challenge was: how can I validate all my DW packages (100+) at once?

There are a good number of posts out there about validating one package at a time using the Management Studio GUI (just right-click the package under the SSIS Catalog).

[Image: Validate option in the SSIS Catalog right-click menu]

Or by using the T-SQL stored procedure [SSISDB].[catalog].[validate_package]; Microsoft's documentation explains this stored procedure in detail.

To get all the validations at once, we can use a loop to execute [SSISDB].[catalog].[validate_package] for each package, one by one.

Here is the code I created and would love to share.

DECLARE @i INT = 1;
DECLARE @validation_id BIGINT;
DECLARE @packageName NVARCHAR(250);
WHILE @i <= (SELECT MAX(package_id) FROM SSISDB.catalog.packages)
BEGIN
    SET @packageName = (SELECT CONCAT(NAME, '.dtsx')
                        FROM SSISDB.catalog.packages
                        WHERE package_id = @i);
    IF @packageName IS NOT NULL   -- skip any gaps in package_id
        EXEC [SSISDB].[catalog].[validate_package]
            @package_name = @packageName,
            @folder_name = N'(the folder name which the packages are in)',
            @project_name = N'(the project name which the packages are in)',
            @use32bitruntime = 0,
            @environment_scope = N'S',
            @reference_id = 2,   /* the reference id of the environment the packages use; if there is no environment reference, pass NULL */
            @validation_id = @validation_id OUTPUT;
    SET @i = @i + 1;
END;
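If you are not sure which reference_id to use, the environment references registered in the catalog can be listed with a query along these lines (the join to catalog.projects is only there to show which project each reference belongs to):

SELECT er.reference_id,
       p.name AS project_name,
       er.environment_folder_name,
       er.environment_name
FROM SSISDB.catalog.environment_references AS er
JOIN SSISDB.catalog.projects AS p
    ON er.project_id = p.project_id;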
After you run the validation loop, you can then use the following query to check the validation results.
SELECT object_name,
       CASE status
           WHEN 4 THEN 'Fail'
           WHEN 7 THEN 'Success'
       END AS [Status]
FROM SSISDB.catalog.validations;
Thanks for viewing, and I would love to hear any feedback.
Your friend, Annie

PowerBI – two ways to dynamically change measures in the same visual

I love that you can always find multiple ways to solve one problem. Inspired by a YouTube video from Guy in a Cube and blog reading on SQLBI.com, I found that there are two ways to dynamically change measures in the same visual. The first way is geared more toward front-end report developers and uses bookmarks, and the second is geared more toward data modelers and uses a technique called disconnected tables.

Bookmarks method:


The basic steps are:

  1. Enable bookmarks under File -> ‘Options and Settings’ -> Options -> Preview Features -> ‘Bookmarks’.
  2. Create one visual, for example a bar chart with the first measure (for example, Revenue by Sales Territory).
  3. Copy and paste the visual two times, and in each copied visual replace the measure with the one you want to switch to (for example, Cost by Sales Territory).
  4. Overlay the three visuals on top of each other.
  5. Go to ‘View’ on the top tool panel and make sure ‘Bookmarks Pane’ and ‘Selection Pane’ are checked.
  6. Add bookmarks, and for each bookmark select which visuals should be visible when it is applied.

[Image: Bookmark settings]

  7. Add the images/shapes that represent the measures you want to select between, and link each one to the corresponding bookmark.

[Image: Image link settings]

Disconnected Table method:

This method is geared more toward Power BI modelers. Basically, the idea is to have a field in an independent table (one with no relationships to other tables) act as a slicer for your measure choice, and then create a measure that uses the SELECTEDVALUE function to dynamically switch which underlying measure it refers to, based on the choice made on the slicer.

Step 1. Create a new table called ‘MeasureChoice’ like the image below; one way to build it directly in DAX is sketched after the image.

[Image: ‘MeasureChoice’ disconnected table]
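If you would rather not type the table in manually, one way to build it is with a DAX calculated table. This is just a sketch; the column names ("MeasureChoice" and "Index") are chosen to match what the measure in Step 2 expects:

MeasureChoice =
DATATABLE (
    "MeasureChoice", STRING,
    "Index", INTEGER,
    {
        { "Total Revenue", 1 },
        { "Total Cost", 2 },
        { "Total Profit", 3 }
    }
)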

Step 2. Create a measure called ‘Selected Value’

Selected Value = SWITCH(SELECTEDVALUE('MeasureChoice'[Index]), 1, [Total Revenue], 2, [Total Cost], 3, [Total Profit], BLANK())

Step 3. Create a slicer using the ‘MeasureChoice’ column from the new table.

Step 4. Create a visual using ‘Selected Value’ as the value.

[Image: Visual switching based on the slicer selection]

Personally, I like the disconnected table method because I don’t need to spend time creating images and linking bookmarks. However, the bookmarks method can dynamically change the title of the visual and gives you the ability to change the layout of each bookmark view. Now it is up to you to choose whichever one fits your reporting needs.

Thanks for spending time reading this blog.

Database table partitioning – Build filegroup and partitions

Step 1: Create the partition function.

CREATE PARTITION FUNCTION PF_Fact_test (int)
AS RANGE LEFT FOR VALUES (20140101, 20150101, 20160101);
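The partition scheme in Step 2 maps the partitions to the filegroups [iGPT2014] through [iGPT2017], so those filegroups (and a data file for each) need to exist first. Here is a rough sketch of creating one of them; the database name is taken from Step 3, while the file name and path are assumptions to replace with your own:

ALTER DATABASE [iGPT] ADD FILEGROUP [iGPT2014];
ALTER DATABASE [iGPT]
    ADD FILE ( NAME = N'iGPT2014_data', FILENAME = N'D:\SQLData\iGPT2014_data.ndf' )
    TO FILEGROUP [iGPT2014];
-- repeat for [iGPT2015], [iGPT2016], and [iGPT2017]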

Step 2: Create the partition scheme on the partition function from Step 1.

CREATE PARTITION SCHEME [PS_Fact_Test]
AS PARTITION [PF_Fact_test]
TO ([iGPT2014], [iGPT2015], [iGPT2016], [iGPT2017]);

Step 3: Assign the partition scheme and partition function to a table.

USE [iGPT]
GO
BEGIN TRANSACTION
-- Creating a clustered index on the partition scheme moves the data onto the partitions;
-- dropping it afterwards leaves the underlying heap partitioned on the scheme.
CREATE CLUSTERED INDEX [ClusteredIndex_on_PS_Fact_Test_636351103815297616] ON [dbo].[iGPTextraInfo] ([MaxDateAces])
    WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF)
    ON [PS_Fact_Test]([MaxDateAces]);
DROP INDEX [ClusteredIndex_on_PS_Fact_Test_636351103815297616] ON [dbo].[iGPTextraInfo];
COMMIT TRANSACTION

Step 4: Load data into the partitioned table, then verify the partitioning with the queries below.

-- The following query returns one or more rows if the table is partitioned.
SELECT *
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.[object_id] = i.[object_id]
    AND i.[type] IN (0, 1)
JOIN sys.partition_schemes AS ps
    ON i.data_space_id = ps.data_space_id
WHERE t.name = 'GPTClaimPendedFact';

 

 

-- The following query returns the boundary values for each partition of the table.
SELECT t.name AS TableName, i.name AS IndexName, p.partition_number, p.rows AS Rows,
       p.partition_id, i.data_space_id, f.function_id, f.type_desc,
       r.boundary_id, r.value AS BoundaryValue
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.object_id = i.object_id
JOIN sys.partitions AS p
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS s
    ON i.data_space_id = s.data_space_id
JOIN sys.partition_functions AS f
    ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r
    ON f.function_id = r.function_id AND r.boundary_id = p.partition_number
WHERE t.name = 'iGPTInquiryFact_WGS' AND i.type <= 1
ORDER BY p.partition_number;

 

 

-- The following query returns the name of the partitioning column for the table.
SELECT t.[object_id] AS ObjectID,
       t.name AS TableName,
       ic.column_id AS PartitioningColumnID,
       c.name AS PartitioningColumnName
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.[object_id] = i.[object_id] AND i.[type] <= 1 -- clustered index or a heap
JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.index_columns AS ic
    ON ic.[object_id] = i.[object_id] AND ic.index_id = i.index_id AND ic.partition_ordinal >= 1 -- because 0 = non-partitioning column
JOIN sys.columns AS c
    ON t.[object_id] = c.[object_id] AND ic.column_id = c.column_id
WHERE t.name = 'GPTClaimPendedFact';

 

 

 

-- The following queries return row counts and page usage per partition.
SELECT row_count, *
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.GPTClaimPendedFact') -- AND row_count > 0
ORDER BY partition_number;

SELECT * FROM sys.dm_db_partition_stats;

SELECT SUM(used_page_count) AS total_number_of_used_pages,
       SUM(row_count) AS total_number_of_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.GPTClaimPendedFact')
    AND (index_id = 0 OR index_id = 1);

New DAX Functions only in SSAS 2016 and Above

SQL Server 2016 Analysis Services (SSAS), Power Pivot in Excel 2016, and Power BI Desktop include the following new Data Analysis Expressions (DAX) functions (a few quick examples appear after the list):

Date and Time Functions

CALENDAR Function
CALENDARAUTO Function
DATEDIFF Function

Information Functions

ISEMPTY Function
ISONORAFTER Function

Filter Functions

ADDMISSINGITEMS Function
SUBSTITUTEWITHINDEX Function

Math and Trig Functions

ACOS Function
ACOSH Function
ASIN Function
ASINH Function
ATAN Function
ATANH Function
COMBIN Function
COMBINA Function
COS Function
COSH Function
DEGREES Function
EVEN Function
EXP Function
GCD Function
ISO.CEILING Function
LCM Function
MROUND Function
ODD Function
PI Function
PRODUCT Function
PRODUCTX Function
QUOTIENT Function
RADIANS Function
SIN Function
SINH Function
SQRTPI Function
TAN Function
TANH Function

Statistical Functions

BETA.DIST Function
BETA.INV Function
CHISQ.INV Function
CHISQ.INV.RT Function
CONFIDENCE.NORM Function
CONFIDENCE.T Function
EXPON.DIST Function
GEOMEAN Function
GEOMEANX Function
MEDIAN Function
MEDIANX Function
PERCENTILE.EXC Function
PERCENTILE.INC Function
PERCENTILEX.EXC Function
PERCENTILEX.INC Function
SELECTCOLUMNS Function
XIRR Function
XNPV Function

Text Functions

CONCATENATEX Function

Other Functions

GROUPBY Function
INTERSECT Function
NATURALINNERJOIN Function
NATURALLEFTOUTERJOIN Function
SUMMARIZECOLUMNS Function
UNION Function
VAR
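As a quick illustration of a few of the functions above, here are some hedged examples; the 'Sales' and 'Product' tables and their columns are hypothetical and only for demonstration:

// DATEDIFF as a calculated column: days between order and ship dates
Days To Ship = DATEDIFF ( Sales[OrderDate], Sales[ShipDate], DAY )

// MEDIAN as a measure
Median Sales = MEDIAN ( Sales[SalesAmount] )

// CONCATENATEX as a measure: comma-separated product names in the current filter context
Product List = CONCATENATEX ( VALUES ( 'Product'[ProductName] ), 'Product'[ProductName], ", " )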

How to use Dynamic Management Views against your Power BI Desktop reports

Power BI Desktop contains a local instance of an Analysis Services tabular model. By running Dynamic Management View (DMV) queries against Power BI Desktop, we can get metadata information about the Power BI model.

Here are the steps to do so:

Step 1: Open your Power BI report

Step 2: Find the port of your Power BI report's local Analysis Services instance.

There are two ways to do that:

Option 1: Open DAX Studio (a great free tool to help you develop DAX) and connect to the Power BI report you opened.

[Image: DAX Studio connect dialog showing the open Power BI report]

Then, find the local Analysis Services instance address of this Power BI report at the bottom right of the DAX Studio window.

[Image: DAX Studio status bar showing the local Analysis Services instance address]

Option 2: Find the Power BI temp directory:
C:\Users\username\AppData\Local\Microsoft\Power BI Desktop SSRS\AnalysisServicesWorkspaces\…\Data

[Image: Workspace folder revealing the Power BI local instance port]

 

Step 3. Open SQL Server Management Studio and connect to the local Analysis Services instance found in Step 2.

[Image: SSMS Connect to Server dialog for the local Analysis Services instance]

 

Step 4: Create a new query against the only database under the local instance.

[Image: New query against the local model database in SSMS]

Step 5: In the query window, you can then run Dynamic Management View (DMV) queries to inspect your local Power BI instance.

The ones I use often include the following:

  • This DMV provides the DAX query behind the report:
Select * from $System.discover_sessions
  • This DMV provides all the fields in your model:
Select * from $system.MDSchema_hierarchies
  • This DMV provides all the measures in your model:
Select * from $System.MDSCHEMA_MEASURES

Find relationships:

Select [ID], [ModelID], [IsActive], [Type], [CrossfilteringBehavior], [FromTableID], [FromColumnID], [FromCardinality], [ToTableID], [ToColumnID], [ToCardinality], [ModifiedTime]
from $SYSTEM.TMSCHEMA_RELATIONSHIPS

Select [ID], [ModelID], [Name] from $SYSTEM.TMSCHEMA_TABLES

Select [ID], [TableID], [ExplicitName] from $SYSTEM.TMSCHEMA_COLUMNS

For more DMVs, there is a good post: https://datasavvy.me/2016/10/04/documenting-your-tabular-or-power-bi-model/

Bonus

If you do not have SSMS, you can use Power Query in Excel or Power BI to query those DMVs; here is a sample M query.

let
    Source = AnalysisServices.Database(TabularInstanceName, TabularDBName, [Query="Select [ID], [TableID], [Name], [Description], [DataSourceID], [QueryDefinition], [Type], [Mode], ModifiedTime from $SYSTEM.TMSCHEMA_PARTITIONS"]),
    #"Renamed Columns" = Table.RenameColumns(Source, {{"DataSourceID", "Data Source ID"}, {"QueryDefinition", "Query Definition"}, {"ModifiedTime", "Modified Time"}})
in
    #"Renamed Columns"

Thanks

Your friend, Annie

 

 

Compare Formula Engine VS. Storage Engine in SSAS Tabular

To improve SSAS Tabular performance or to tune your DAX queries, it is important to know the difference between the formula engine and the storage engine. Here, I have created a comparison for you to better distinguish the two in Tabular (I need to stress that this is for tabular modeling, not multidimensional, because the back-end technology is quite different). Both engines play vital roles in processing DAX query requests.


  • Query received – Formula engine: interprets the DAX/MDX formula. Storage engine: handles simple logical queries (xmSQL) sent by the formula engine.
  • Target data – Formula engine: iterates over datacaches produced by the storage engine (datacaches are in-memory tables). Storage engine: iterates over compressed data in the VertiPaq column store.
  • Result – Formula engine: produces the result set and sends it back to the requestor. Storage engine: produces datacaches and sends them back to the formula engine.
  • Threading – Formula engine: single-threaded. Storage engine: multi-threaded.
  • Cache utilization – Formula engine: no. Storage engine: yes.
  • Area of focus for performance tuning – Formula engine: check the physical plan for bottlenecks. Storage engine: check the xmSQL queries for bottlenecks.

 

Different roles in a typical BI project

Today, I would like to talk about the different roles in a typical BI project. Like all technology-related initiatives, it is vital to assign clear responsibilities and distinguish between the different roles in order to achieve a successful BI project outcome.

 

  • Project Manager – This role is needed to understand the overall scope of the BI project including the objectives, timeline, budget, and resources. Based on these parameters, they will need to work with other individuals within the project team to come up with timelines and resource assignments. During the development phase, he/she will monitor the project’s progress and make adjustments as needed.
  • BI Solution Architect – This person will be in charge of the overall design of the data warehouse. They will focus on creating the vision of the entire BI process, development and lifecycle. They will work directly with business users and the project manager to design actionable insights while concurrently working with the ETL developer, DBAs, and Data Modelers to break down the vision into detailed actionable tasks.
  • Data Modelers – The Data Modelers will be responsible for the design of the data warehouse schema and the data models. They need to understand the business requirements, the vision produced by the BI solution architect, and the physical capabilities of the environment. In addition, they will be responsible for creating the data warehouse schema, the ETL requirements, the data models, and the model processing schedules.
  • Database Administrators (DBA) – These individuals will be in charge of designing the physical Database Architecture and configuration, security settings, as well as data source identification and data source management.
  • Infrastructure Specialists – These individuals are responsible for ensuring that the servers and networks are set up correctly to meet the current and future requirements. They will also be responsible for the server upgrade and migration.
  • ETL Developers – The ETL Developers are responsible for building the workflows from external source to the Data Warehouse. They will be working closely with the DBAs and Data modelers to understand the source and target form of the data as well as the timing of ETL workflows.
  • Report developers – These individuals, of all the other roles, work closest with the business users. Their role is to work with business users to design the reports and schedule automated report generation while providing the data modelers with future model requirements.
  • Business users – These individuals provide requirements to the BI development team. They provide initial abstract pertaining to what they want at the beginning of the project. As prototype products come out, these abstracts become more detailed and final.
  • Testers – Testers are responsible for Unit Testing, Integration Testing, Functional Testing, Systems Testing, Performance Testing, Usability Testing, and Acceptance Testing.

(https://www.codeproject.com/Tips/351122/What-is-software-testing-What-are-the-different-ty)

 

Often, all the project team members work closely with each other. Usually, for a large project, one or more team members will wear multiple hats spanning the roles above. In smaller BI teams, one person may play several roles. Although this may create some extra burden in terms of workload, it reduces the overhead of establishing communication between different people. I usually see the BI solution architect, data modeler, and ETL developer be one person at the beginning of the development phase; as the project grows, that person will begin handing off those roles to other team members.

 

Thanks,

Your friend – Annie

DAX – Filter Context V.S Row Context

A key to understanding DAX is to distinguish Filter Context and Row Context.

Here are definitions of each context.

Filter Context: the filter context can simply be defined as the set of filters applied to the evaluation of a DAX expression, usually a measure, regardless of how they have been generated. Usually, every cell of a report has a different filter context, which can be defined implicitly by the user interface (such as a pivot table in Excel) or explicitly by a DAX expression using CALCULATE or CALCULATETABLE. A filter context always exists before a DAX expression is evaluated.
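For example, the hypothetical measure below explicitly overrides part of the filter context with CALCULATE (the 'Date' table and the [Sales Amount] measure are assumptions for illustration):

Sales Amount 2017 =
CALCULATE (
    [Sales Amount],
    'Date'[Year] = 2017
)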

 

Row Context: a row context is about the "current row" of a table. In order to evaluate a DAX expression that refers to column values, DAX needs to know which row of the table to use; it checks the current row and returns the value from it. You have a row context whenever you iterate a table, either explicitly (using an iterator function) or implicitly (in a calculated column).
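For example, both of the hypothetical expressions below work in a row context over a Sales table: the calculated column gets one implicitly for each row, while SUMX creates one explicitly as it iterates (table and column names are assumptions):

// calculated column: implicit row context
Line Amount = Sales[Quantity] * Sales[Unit Price]

// measure: explicit row context created by the SUMX iterator
Total Sales Amount = SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )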

 

Use one DAX measure as an example:

SumofProductSalesAmount :=
SUMX (
    VALUES ( [Product].[ProductName] ),
    CALCULATE ( [Sales Amount] )
)

 

Explaining this calculation in detail:

  1. The VALUES function returns a table; here it returns all the distinct ProductName values as a table.
  2. SUMX iterates over each row of the table in its first argument, evaluates its second argument for that row, and then sums the results. In this case, for each row of the table provided by the VALUES function (that is, for each product), it evaluates the expression in the second argument of SUMX, then sums all the values together and returns the total to the caller.
  3. Everything that happens outside the CALCULATE function is treated as filter context for the CALCULATE expression. In this case, there are two places feeding the CALCULATE expression with filter context.
    • The first is the current row defined by SUMX; in our case, the current row of VALUES([Product].[ProductName]), which is a specific product depending on where the iterator is. This means [Sales Amount] is evaluated with a filter that limits it to that product, and that value (the sales amount) is returned.
    • The second is that wherever the measure [SumofProductSalesAmount] is exposed in a cell of a user application (such as Excel, Power BI, or SSRS), it is evaluated in the filter context set on that specific cell. Depending on where the cell is, you have a different filter context; for example, a cell may have a filter limiting the time frame to 2017.

 

Thanks,

Your friend – Annie.
