Power Platform Tidbits 10: How to split address into separate columns in Power Query

Sometimes address values are stored in a single text column and you need to split them into separate columns, for example when migrating data into Power Platform with a dataflow, when building a Power BI dataflow, or simply in Excel. In this post I am going to show you how you can do this in a reliable way. I am going to make some assumptions first, but you can always fine-tune the formula for other address patterns.

Assumption

The addresses I’m going to work with look like the following.

  • François Vervloetstraat 100, 1080 Uccle
  • Koning Albertstraat 14, Menen

The syntax we are looking for is:

first-line, [postcode] city-name

In other words, the first address line is always present. The postcode is optional. The city name is always present. It might also be that there is no address at all.

Solution

  #"Split address" = Table.SplitColumn(#"Previous step", "address", Splitter.SplitTextByDelimiter(","), {"Line 1", "Line 2"}),
  #"Split by character transition" = Table.SplitColumn(#"Split address", "Line 2", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Postcode", "City"}),
  #"Replaced nulls" = Table.ReplaceValue(#"Split by character transition", null, each [Postcode], Replacer.ReplaceValue, {"City"}),
  #"Replaced postcodes" = Table.ReplaceValue(#"Replaced nulls", each [Postcode], each if [Postcode] = [City] then null else [Postcode] , Replacer.ReplaceValue, {"Postcode"}),
  #"Changed types" = Table.TransformColumnTypes(#"Replaced nulls", {{"Line 1", type text}, {"Postcode", type Int16.Type}, {"City", type text}})

Let’s go over the solution, line by line. A complete, self-contained version follows the list.

  1. The first line assumes that there is a Previous step and an address column. It splits the address column into two columns, Line 1 and Line 2, by looking for a comma character.
  2. The Line 2 column is split into two further columns, Postcode and City, based on the character transition from digits ({"0".."9"}) to non-digits. If the postcode were not optional, we could stop here, but because it is, we need two more steps.
  3. This step sets the City to the Postcode whenever City is null. After this step, whenever the postcode is not present in the address, both the Postcode and City columns contain the city name.
  4. This step sets the Postcode to null whenever the Postcode and City columns have the same value as a result of the previous step.
  5. The last step sets the column types, because after applying the Table.SplitColumn transformation, the new columns might not have the proper types.
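
For reference, here is the same solution as a self-contained query, with a small hard-coded sample table (the two addresses from the assumption above) standing in for #"Previous step":

let
    // Sample input standing in for the real #"Previous step"
    #"Previous step" = Table.FromRecords({
        [address = "François Vervloetstraat 100, 1080 Uccle"],
        [address = "Koning Albertstraat 14, Menen"]}),
    #"Split address" = Table.SplitColumn(#"Previous step", "address", Splitter.SplitTextByDelimiter(","), {"Line 1", "Line 2"}),
    #"Split by character transition" = Table.SplitColumn(#"Split address", "Line 2", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Postcode", "City"}),
    #"Replaced nulls" = Table.ReplaceValue(#"Split by character transition", null, each [Postcode], Replacer.ReplaceValue, {"City"}),
    #"Replaced postcodes" = Table.ReplaceValue(#"Replaced nulls", each [Postcode], each if [Postcode] = [City] then null else [Postcode], Replacer.ReplaceValue, {"Postcode"}),
    #"Changed types" = Table.TransformColumnTypes(#"Replaced postcodes", {{"Line 1", type text}, {"Postcode", Int16.Type}, {"City", type text}})
in
    #"Changed types"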

More about splitting by character transition

The most interesting step of the solution is the one that splits a column based on a character transition. Let’s break it down. This step combines two functions: Table.SplitColumn and Splitter.SplitTextByCharacterTransition. The syntax of the Table.SplitColumn function looks like this:

Table.SplitColumn(table as table, sourceColumn as text, splitter as function, optional columnNamesOrNumber as any, optional default as any, optional extraColumns as any) as table

The first parameter is the table, the second one is the name of the column that will be split, and the third one is the function that splits the value. This function is given the value of the column and should, in turn, return a list of values. It could be as simple as (value) => {"value 1", "value 2"}, but we use a native function that can split values for us, and that is Splitter.SplitTextByCharacterTransition. This function has the following syntax:

Splitter.SplitTextByCharacterTransition(before as anynonnull, after as anynonnull) as function

This function splits text into a list of texts according to a transition from one kind of character to another. The before and after parameters can each be either a list of characters or a function that takes a character and returns true/false. For before we are passing {"0".."9"}, but for after we are passing a function that detects any character other than {"0".."9"}, which makes the detection more inclusive. In other words, (c) => not List.Contains({"0".."9"}, c).
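
To see the splitter on its own, here is a minimal sketch you can paste into a blank query. The sample values are the second halves of the example addresses, and the record at the end is just there to show both results side by side:

let
    // Splits at every transition from a digit to a non-digit character
    splitPostcode = Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)),
    withPostcode = splitPostcode(" 1080 Uccle"),   // { " 1080", " Uccle" }
    withoutPostcode = splitPostcode(" Menen")      // { " Menen" } (no transition, so a single item)
in
    [WithPostcode = withPostcode, WithoutPostcode = withoutPostcode]

When Table.SplitColumn receives that single-item list but is asked for two columns, it fills the second one (City) with null, which is exactly the case that steps 3 and 4 of the solution handle.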


Azure Uploader for Power Platform is here

Azure Storage, with its high availability and numerous features, has always been a storage of choice for many requirements. When it comes to Power Platform, we should carefully weigh the options that we have for storing files. Besides native Dataverse storage, which is fully integrated with the platform but comes with the highest cost, there are other alternatives like SharePoint and blob storage in Azure Storage.

Power Platform storage options comparison

Azure Uploader is a PCF control built to reduce the effort of integrating with Azure’s Blob storage. It makes it easy to upload files, with virtually no size limit, directly to blob storage in a secure way.

You might be wondering why upload directly from the browser, and what about security? To store files in a blob container, you have two major options.

  • The first option is to upload the file to a proxy service, and the proxy service would then upload the file to the storage behind the scenes. This gives you the advantage of being able to do some processing (e.g. validate the content) before storing the file, but it comes with the extra effort required to build the service and maintain it over time. In addition, you need to make sure this new service has the availability (or SLA) required for your application. As the old saying goes, you will have one more layer where things can go wrong.
  • The second option is to have a very simple proxy service (that you might already have) generate a short-lived SAS token with limited access and share it with the client, and the client takes care of uploading the file. This way you can rely on the availability and capacity of the Azure Storage service.

You should consider that files coming from the client cannot be trusted, so you might need to put a process in place to validate them before use. In the first approach the proxy service is able to validate files during the transfer, but in the second approach you should do this validation later. For example, a Power Automate flow, a Logic App, or an Azure Function App can be triggered to do it asynchronously.

Azure Uploader relies on the second approach. This means that you just need to generate a SAS token in your app and share it with the uploader. Generating a SAS token is very easy thanks to the out-of-the-box Azure Storage Blob connector.

The following Power FX code generates a new SAS token using the Azure Storage Blob connector (called AzureBlobStorage) and stores it in a variable called SasToken.

Set(
    SasResult,
    AzureBlobStorage.CreateShareLinkByPathV2(
        "mystorageaccount",
        "/mycontainer",
        {
            AccessProtocol: "HttpsOnly",
            ExpiryTime: DateAdd(Now(), Hours, 1),
            Permissions:"Write,Add,Create"
        }
    )
);
Set(
    SasToken,
    Mid(SasResult.WebUrl, Find("?", SasResult.WebUrl) + 1)
);
  • The above code generates a token that is valid for only one hour (DateAdd(Now(), Hours, 1)) and only has Write,Add,Create permissions.
  • The SasResult variable that is set in the first step contains a property WebUrl, which holds a URL with the SAS token in its query string. That is why the line Mid(SasResult.WebUrl, Find("?", SasResult.WebUrl) + 1) is used to extract only the part after “?” and store it in the SasToken variable.
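
To make the extraction concrete, here is a small illustration. The URL is made up; only its shape matters:

// Hypothetical WebUrl value, for illustration only
Set(ExampleUrl, "https://mystorageaccount.blob.core.windows.net/mycontainer?sv=2022-11-02&sp=acw&sig=XYZ");
// Find locates the "?", Mid takes everything after it
Set(ExampleToken, Mid(ExampleUrl, Find("?", ExampleUrl) + 1));
// ExampleToken now holds "sv=2022-11-02&sp=acw&sig=XYZ"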

Introducing PSDataverse, PowerShell’s new super power

PowerShell is the solution for task automation. It has everything you need to iterate fast: a command-line shell with auto-completion, an object-oriented scripting language, and it runs on Windows, Linux, and macOS. PowerShell’s command-line shell is best in class, with tab completion and even command prediction, aliases, in-console help (Unix man pages, anyone?) and my favorite feature, pipelines, which let you chain commands together. You can even debug directly in the shell! If you are new to PowerShell, start here.

Dataverse, on the other hand, is a managed relational database and a lot more. It hosts Dynamics 365’s data, provides rich metadata, and can even host logic and validation, with a range of its own productivity tools. PowerShell can be handy for automating tasks and orchestrating one-off solutions in Power Platform. From another perspective, Dataverse can be useful as a managed database. You don’t need a license if the user is not a human, by the way.

PowerShell + Dataverse = PSDataverse

While working with Power Platform, you often need to quickly try something to see if it works, or to just check some data real quick. Other times you might need to automate some tasks or integrate processes that are not exposed through the UI. Take data migration as an example, or simply loading a couple of huge CSV files with some data transformation. Are you going to build a fully custom .NET solution just for that? Or are you going to use a full-fledged SSIS solution, with its notoriously troublesome Script components and non-Git-friendly projects? Your other option is learning third-party, one-size-fits-all cloud-based services with their edge cases. Well, now you have another option: PSDataverse, a PowerShell module that brings Dataverse to PowerShell.

Show me an example

Let’s check how easy it is to load a CSV file called Account.csv into Microsoft Dynamics 365.

Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;thumbprint=<client-cert-thumbprint>;resource=https://<environment>.crm4.dynamics.com/"
Import-Csv "Account.csv" | ForEach-Object {
    @{ Uri = "account", Method= "PATCH", Value = @{
        name = $_.AccountName,
        overridencreatedon = $_.CreationDate,
        ...}}} | ConvertTo-Json | Send-DataverseOperation
        

You know, that is technically two lines of code in PowerShell. Sure, I’m cheating here: I have piped four commands together, but that makes it even easier to read. Let’s go over what I’m doing here.

  • The first line is obviously using the Connect-Dataverse command to connect to Dataverse using a client certificate.
  • The second line (which I have broken into multiple lines for clarity):
    • Imports a CSV file called Account.csv.
    • Transforms each line into a Dataverse operation using a ForEach-Object.
    • Converts operations to JSON using ConvertTo-Json command.
    • Sends the operations to Dataverse using Send-DataverseOperation.

A Power Platform professional might say sending several operations one-by-one will not be ideal in terms of performance. I do agree with that statement. Send-DataverseOperation has two parameters for just that.

Improving performance

Dataverse provides two features to improve performance over its API when needed. The first one is parallelism: each user can send data in parallel over up to a configurable (by Microsoft) number of lanes. By default the number is 20, but you can ask Microsoft to increase it when needed (e.g. for a data migration). The other one is called batching. To use it, you put several operations into a single HTTP request with a special syntax and structure and send it to the API, where it is taken apart and the operations are executed one by one. All the operations run in a single transaction, however; in other words, they succeed or fail (and are rolled back) together. To use these two features, Send-DataverseOperation provides two parameters.

  • -BatchCapacity: a number between 0 (the default) and 1000 that defines the maximum number of operations that can be put in a single batch.
  • -MaxDop: a number between 1 (the default) and 1000 that indicates how many data lanes will be used to send data in parallel.

As you can see, the default values mean that batching is off and only one lane is used, which makes experimentation easier and safer. For example, running the following command:

@{Uri="WhoAmI"} | Send-DataverseOperation

Will give you the BusinessUnitId, UserId, and OrganizationId of the current user:

@odata.context : https://helloworld.crm4.dynamics.com/api/data/v9.2/$metadata#Microsoft.Dynamics.CRM.WhoAmIResponse
BusinessUnitId : 6f202e6c-e471-ec11-8941-000d3adf0002
UserId         : 88057198-a9b1-ec11-9840-00567ab5c181
OrganizationId : e34c95a5-f34c-430c-a05e-a23437e5b9fa
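
Putting the two parameters together, the earlier Account.csv load could, for example, be sent in batches over several parallel lanes. The numbers below are only an illustration; pick values that match what Microsoft has granted your environment:

# Same pipeline as before, but 100 operations per batch, over 10 parallel lanes
Import-Csv "Account.csv" | ForEach-Object {
    @{ Uri = "account"; Method = "PATCH"; Value = @{ name = $_.AccountName } }
} | ConvertTo-Json | Send-DataverseOperation -BatchCapacity 100 -MaxDop 10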

What’s more

There is a lot more to PSDataverse than what has been covered above. The best way to get to know it is to download PSDataverse from its GitHub repository and try it. There are several features in its core, like automatic transient-fault detection and retry, automatic pace adjustment based on hints provided by Dataverse, and more.

Keep in mind that PSDataverse is a free, open-source PowerShell module and it can benefit from your suggestions, bug reports and contributions. Currently the module is in alpha, but I have already used it in a data migration project for a major bank in Europe, and I try to use it for different tasks as frequently as I can to make it fit more and more use cases. I put developer productivity above all.

Power Platform Tidbits #7 – Optimize for server-side execution (folding) when adding columns

Power Query can take some of the relational transformations and ask the data source to take care of them. This means that the transformations – like grouping, filtering, adding columns and more – can run at the data source, where they are more efficient. This also avoids downloading huge amounts of data and accelerates the execution. In Power BI terms, this is called folding.

When writing a Power Query query, you need to make sure every step that can be passed to the data source is indeed passed to the data source. If you pay attention to the Applied steps pane, when a green vertical line is displayed beside a step, it means that the step is passed to the data source. The following picture shows that the first steps, up to “Kept top rows”, are passed to the data source.

Please note that once you add a step that cannot be passed to the data source, any step that comes after it will not be passed to the data source either. As a general rule of thumb, you should try to apply all foldable steps at the beginning when you can.

You should also pay attention when you are mixing multiple functions in a step. You might notice that when you separate them, a part can fold to the data source. Sometimes it can be tricky to spot them, especially when the UI is mixing them behind the scenes. Let’s take an example to make it clear.

For example, when you add a new column and define its data type through the UI, Power BI will mix two functions, Table.AddColumn and Table.TransformColumnTypes, even though it’s just one column and Table.AddColumn is capable of defining the column type. See for yourself.

Adding a new column through the UI mixes Table.AddColumn with Table.TransformColumnTypes.

Now the issue is that even though Table.AddColumn is a foldable function, Table.TransformColumnTypes isn’t. To fix this, you either need to separate them into different steps or just rely on Table.AddColumn to both add the column and define its type. In the above example we must change:

Table.TransformColumnTypes(Table.AddColumn(#"Selected columns", "Custom", each 1000), {{"Custom", Int64.Type}})

To:

Table.AddColumn(#"Selected columns", "Custom", each 1000, Int64.Type)

And voila:

Adding a column and simultaneously defining its type in Table.AddColumn makes the step foldable.

NovoDocx – An open source service that generates documents using standard Word templates

Lately, as part of a solution I have been building for a client, we had to deal with several document generation flows. I’m talking about really complicated document templates. For example, there is one composed of several tables and sections that compare the different credits / mortgages a client can take, their financial impact, benefits and downsides, and more. This is nothing like the simple mortgage documents that your local bank shares with you. In fact, I am happy for their clients, who will receive such well-thought-out documents, but for the poor developers this was not an easy task.

There are two out-of-the-box options when you need to populate Word documents.

  • The Word templates feature, which has existed for ages in Dynamics CRM and is now inherited by Power Platform. This is normally the easiest option, but it has many limitations and only works in harmony with rows in a table and its directly related tables.
  • The Word Online connector, which is a premium connector and is more capable, with no dependency on any table. The main issue with this one shows up when your solution contains flows that will be deployed to different environments, or any time the location of your document needs to change. In that case, instead of the names of the placeholders you would need to use random numbers, because this connector cannot figure out the schema of the document or lock it after you build one.

For the second option there are some workarounds, which I will be writing about soon, but what if the logic is so complex that using flows or a Power App does not make sense or is not enough?

In search for an open source library

At this point I started looking for an open source library, preferably in .NET, to let us get the job done. It should exist in this day and age, right? The answer is no, it doesn’t. There are only some paid libraries and services that are quite expensive for what they do, and working with them is not as easy as one might think. After digging deeper into GitHub, I was able to find some libraries that either rely on Open Office or a headless browser, or whose brain is in a closed-source black box. Honestly, the fact that there is no open-source library, let alone one in .NET, was annoying enough that I decided to start one. I decided to start simple and make it as easy as possible for developers to use, with no knowledge of the underlying format required (😎 ehem, which I know pretty well, but not to Eric White level well ☺).

Novo Docx

NovoDocx’s first readme file push in Git

Today I share with you the first iteration of Novo Docx, my take on a simple-to-use and simple-to-host library and service that you can use in your own projects. The current features are enough to handle complex documents, and it is quite fast. I have tried to capture many edge cases too. I will be focusing on enriching the functionality and providing more hosting options. The source code is hosted on GitHub. It includes a simple library and an Azure Functions app that you can use however you like.