Azure Uploader for Power Platform is here

Azure Storage, with its high availability and numerous features, has always been a storage of choice for many requirements. When it comes to Power Platform, we should carefully weigh the options we have for storing files. Besides native Dataverse, which is fully integrated with the platform but comes with the highest cost, there are other alternatives like SharePoint and Blob storage in Azure Storage.

Power Platform storage options comparison

Azure Uploader is a PCF control that is built to reduce the effort of integrating with Azure’s Blob storage. It makes it easy to upload files with virtually no limit in size, directly to blob storage in a secure way.

You might be wondering: why directly from the browser, and what about security? To store files in a blob container, you have two major options.

  • The first option is to upload the file to a proxy service, which then uploads the file to the storage behind the scenes. This gives you the advantage of being able to do some processing (e.g. validate the content) before storing the file, but it comes with the extra effort required to build the service and maintain it over time. In addition, you need to make sure this new service has the availability (or SLA) required for your application. As the old saying goes, you will have one more layer where things can go wrong.
  • The second option is to have a very simple proxy service (which you might already have) generate a short-lived SAS token with limited access and share it with the client, which then takes care of uploading the file. This way you can rely on the availability and capacity of the Azure Storage service.

You should keep in mind that files coming from the client cannot be trusted, so you might need to put a process in place to validate them before use. In the first approach the proxy service can validate files during the transfer, but in the second approach you have to do this validation later. For example, a Power Automate flow, a Logic App, or an Azure Function can be triggered to do it asynchronously.

Azure Uploader relies on the second approach. This means that you just need to generate a SAS token in your app and share it with the uploader. Generating a SAS token is very easy thanks to the out-of-the-box Azure Storage Blob connector.

The following Power Fx code generates a new SAS token using the Azure Storage Blob connector (here called AzureBlobStorage) and stores it in a variable called SasToken.

Set(
    SasResult,
    AzureBlobStorage.CreateShareLinkByPathV2(
        "mystorageaccount",
        "/mycontainer",
        {
            AccessProtocol: "HttpsOnly",
            ExpiryTime: DateAdd(Now(), 1, TimeUnit.Hours),
            Permissions: "Write,Add,Create"
        }
    )
);
Set(
    SasToken,
    Mid(SasResult.WebUrl, Find("?", SasResult.WebUrl) + 1)
);
  • The above code generates a token that is valid for only one hour (DateAdd(Now(), 1, TimeUnit.Hours)) and only has the Write, Add, and Create permissions.
  • The SasResult variable that is set in the first step contains a property, WebUrl, which holds a URL with the SAS token in its query string. That is why the line Mid(SasResult.WebUrl, Find("?", SasResult.WebUrl) + 1) is used to extract only the part after “?” and store it in the SasToken variable.

Power Platform Tidbits 9: Truncate a table in Dataverse with one command

The latest version of PSDataverse (v0.0.3) brings several quality and performance improvements. But it also has a new command that lets you delete all the rows of a table in Dataverse as fast as it can be done. Here is the easiest way to use it:

Clear-DataverseTable lead

If you need to truncate multiple tables in one go, you can just pass multiple table names separated by commas.

Clear-DataverseTable lead, opportunity

You can also pipe the list of table names to Clear-DataverseTable. This can be useful when you receive the list from another command.

@("lead", "opportunity") | Clear-DataverseTable

Note that you should already be connected to Dataverse using Connect-Dataverse before running Clear-DataverseTable. This command is part of the PSDataverse module, which you can get from the PowerShell Gallery. To know more, I suggest you read about it here.
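
Putting it together, a minimal end-to-end sketch could look like the following. The tenant id, client id, and environment URL are placeholders, and the connection string uses the same device-code format shown in the other examples on this blog:

# Connect once using the device-code flow, then truncate the tables.
Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;resource=https://<environment>.crm4.dynamics.com/;device=true"
Clear-DataverseTable lead, opportunity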

Good to know

When you pass a list of values separated by commas, PowerShell assumes you are sending an array. The full syntax for literally defining an array is @(item1, item2, …). Knowing that, if you pass an array to Clear-DataverseTable, however you build it, it will work, as shown below. To know more about arrays in PowerShell and all the different ways to define them, check About Arrays in the official documentation.
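
For example, all of the following hand the same two table names to Clear-DataverseTable; the table names are just examples:

# A bare comma-separated list, the explicit array syntax, and a variable all work.
Clear-DataverseTable "lead", "opportunity"
@("lead", "opportunity") | Clear-DataverseTable
$tables = "lead", "opportunity"
$tables | Clear-DataverseTable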

Power Platform Tidbits 8: Download the full metadata of your Dataverse environment with one PowerShell command

You can always get the full metadata of a Power Platform environment (or Dynamics 365) from https://{your-environment}.{region-specific-url}/api/data/v9.2/$metadata. For example, if your environment is hosted in Europe, the URL can be:

https://mytestenvironment.crm4.dynamics.com/api/data/v9.2/$metadata

But sometimes the data can be too much for your browser to digest, especially if you have a plugin that formats or beautifies XML content. In that case you can always use PowerShell, or more precisely PSDataverse, to download and save the metadata as an XML file. It takes just two lines of PowerShell, and the first line is only there to connect.

Connect-Dataverse "authority=https://login.microsoftonline.com/{tenant-id}/oauth2/authorize;clientid=1950a258-227b-4e31-a9cf-717495945fc2;resource=https://mytestenvironment.crm4.dynamics.com/;device=true"
Send-DataverseOperation "`$metadata" | Select-Object -ExpandProperty Content | Set-Content ".\schema.xml"

Screenshot: the two commands above completing successfully in Windows Terminal.
In my case, it took 8 seconds to download and save the file from a trial environment.

If you are not familiar with PSDataverse, just know that it takes only one line to install it. Read more in the official GitHub repository.
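
For reference, that one line is the standard install from the PowerShell Gallery:

Install-Module -Name PSDataverse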

How to import large CSV files in Power Platform (or Dynamics 365)

There are several ways to import data into Power Platform (or Dynamics 365). When you are importing CSV files, you have the following out-of-the-box options, each with their pros and cons.

  • Import feature in Power Apps – Allows you to import Excel (.xlsx), comma-separated values (.csv), and XML Spreadsheet 2003 (.xml) formats. The data can be compressed, but the maximum size is only 32 MB for .zip files.
    • PRO: Easy-to-use UI that guides you through the process.
    • CON: Limited file size.
    • CON: No transformation / advanced mapping logic allowed.
    • CON: Very unforgiving about the format of the data.
    • CON: Not so fast.
    • CON: Cannot be automated.
  • Power Platform dataflows – can be used for both data migration and continuous data integration. Dataflows are a new and exciting way of importing data into Power Platform, based on Power Query. They support tons of different data sources, including external Web APIs, Parquet files, and even another Dataverse!
    • PRO: Based on Power Query – I consider it a pro, because you can learn Power Query in minutes and the UI is unparalleled by any tool I have seen in my career.
    • PRO: Can handle large files (although I haven’t tested at the gigabyte level yet).
    • PRO: Advanced data transformations can be applied easily on the fly.
    • PRO: Supports a wide range of data sources; we are concerned with CSV here, but you never know, maybe that CSV comes from another source that is supported 😉.
    • CON: Absolutely no way to manage failures in the dataflow. You get a report at the end that helps you track down which records failed, but if you have several failures it is not practical to read the report and filter those manually!
    • CON: Performance is usually better than the import feature, but very unreliable. I used it in a project where the scope of tables was limited, and every time I tested I got totally different performance.
    • CON/PRO: Can be automated, but only on a schedule.
  • Custom cloud flow (or Power App) – The sky is the limit when it comes to app development, right? But what about the effort? What if you need to do it in different environments? Are you going to also implement CI/CD? How reusable will your flow / app be?
  • Using Azure Synapse Analytics – is, as the name suggests, an analytics service on Azure. While it is not a free service like the previous options, provisioning it for a full data migration project, when the number of tables is more than a handful, can be extremely beneficial. The fact that you can just focus on the transformation logic while Synapse takes care of managing the data storage will save you a lot of time. It supports many data sources and targets out of the box, and if you already have a background in a database technology (e.g. SQL Server), the learning curve won’t be steep.
  • SQL Server’s SSIS – is a well-known ETL tool that has been used for data integration and data migration for ages.
    • PRO: Familiar tool (if you know your way around the SQL Server family of products) and an easy-to-use UI.
    • PRO: Part of SQL Server.
    • PRO: Several out-of-the-box and 3rd-party components. In the case of Power Platform I recommend KingswaySoft’s components, which I have used several times in many projects with success. They also have a great support team for those edge cases 😉.
    • PRO: Reliable tool with good performance.
    • CON: Implementing complex logic and debugging can be difficult.
    • CON: In many cases you will have to rely on the Script Component to run pieces of C# code that end up hidden in multiple components and are not reusable.
    • CON: Although you are producing code behind the scenes, the storage format makes it impossible to trace or revert code changes.
    • CON: Since the introduction of Synapse, I find it really hard to recommend SSIS for anything but a very narrow range of use cases (e.g. you already have it and use it for another, similar ETL process).

As you see, there are several options, but apart from Azure Synapse Analytics, all of them are limited in size or performance, or are fragile when it comes to handling faults. And Synapse might not be justifiable in terms of cost or learning curve compared to the scope of the project. Today I’m going to show you another, very easy way to load large amounts of data into Power Platform that can be one more tool in your toolbox, one that aims to address the shortcomings of the options above.

Using PSDataverse to load large CSV files

Chances are you have landed here because you have a large CSV file, and you might be asking how large we are talking. The answer is that I have not hit any limit yet. When done right, you should virtually not face any limit; I have personally tested with large CSV files of several gigabytes without any issues.

I have posted about PSDataverse before on this blog, and in fact I previously showed an example where I load a CSV file into the Account table. Let’s start from there. The only prerequisite is to install PSDataverse, a PowerShell module that has no other dependency. You can either install it once from the official PowerShell Gallery using the following command:

Install-Module -Name PSDataverse

Or, you can get it from its official GitHub repository and install it manually.

Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;clientsecret=<client-secret>;resource=https://<environment>.crm4.dynamics.com/"
Import-Csv "Account.csv" | ForEach-Object {
    @{ Uri = "account", Method= "PATCH", Value = @{
        name = $_.AccountName,
        overridencreatedon = $_.CreationDate,
        ...}}} | ConvertTo-Json | Send-DataverseOperation

The first line uses a client-id and a client-secret to authenticate to Power Platform. There are different ways you can connect to Power Platform: using a client-id and a client-secret, using a client-id and a client certificate, or simply using a client-id and an automatically generated device code. In all cases you will need to create an application registration and then create an application user from it. This can be done in literally a minute through the UI. Read more here if you haven’t done it before.

The second line does all the magic! It first imports a CSV file, then for each record in that CSV file applies a simple transformation (a basic mapping), converts it to JSON, and sends it to Power Platform using the Send-DataverseOperation command.
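
To make that concrete, a single operation coming out of ConvertTo-Json would look roughly like the following (the field values are only illustrative):

{
    "Uri": "account",
    "Method": "PATCH",
    "Value": {
        "name": "Contoso",
        "overriddencreatedon": "2021-06-01T00:00:00Z"
    }
}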

Tuning the performance

At this point, if you run the script, it will start pushing data to Power Platform one record at a time. It waits for the response and returns it to you in a simple PowerShell object. The problem with this approach is that while it is reliable and you will get back the response, the performance won’t be great. If you are a veteran Dynamics developer, you might already know two important features: parallelism and batching.

By parallelism, I mean that Dataverse allows you to push data concurrently, using up to 20 streams in parallel, which can increase the speed by almost 20x. This limit can even be raised by Microsoft during your data migration by opening a support request.

Batching allows you to send multiple operations in one batch, which means you save on the amount of data transferred over the wire, and that can contribute to better performance. The hard limit is 1000 operations per batch, but you should try different numbers, because depending on the table, its relations, and the logic around it, the number of operations you put in a batch can also worsen performance. My suggestion is to start with 500 and fine-tune per table to get the best performance.

Let’s rewrite our script to benefit from parallelism and batching.

Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;clientsecret=<client-secret>;resource=https://<environment>.crm4.dynamics.com/"
Import-Csv "Account.csv" | ForEach-Object {
    @{ Uri = "account", Method= "PATCH", Value = @{
        name = $_.AccountName,
        overridencreatedon = $_.CreationDate,
        ...}}} | ConvertTo-Json | Send-DataverseOperation -MaxDop 20 -BatchSize 500

As you see, just by adding two parameters the performance increases by an order of magnitude. I will not give you numbers to get your hopes high or low, because every environment is different, but I can guarantee that you will be impressed.

Advanced scenarios

While the example I gave above is very basic, you can extend it to add complex data transformations, or even pull data from other sources, or from multiple sources. Send-DataverseOperation will try its best to succeed. For example, in case of transient errors (e.g. a brief network outage) it will wait and retry multiple times, waiting longer each time. There is a lot more to it that you can learn from the GitHub repository.

I also suggest that you add proper fault handling to isolate and categorize failed records, so you can analyze how many records failed and why (e.g. a name is one character too long 😄). That way you can mitigate the problems and retry only the records that failed.
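
As a minimal sketch of that idea, the following sends records one at a time and collects the ones that fail, together with the reason, into a CSV for later analysis. It assumes that Send-DataverseOperation reports failures as PowerShell errors (check the module’s documentation for its exact error behavior), and it trades the speed of batching for the ability to tie every failure back to its source row:

# Collect failed records together with the reason they failed.
$failed = [System.Collections.Generic.List[object]]::new()

Import-Csv "Account.csv" | ForEach-Object {
    $record = $_
    try {
        @{ Uri = "account"; Method = "PATCH"; Value = @{ name = $record.AccountName } } |
            ConvertTo-Json | Send-DataverseOperation -ErrorAction Stop | Out-Null
    }
    catch {
        # Keep enough of the source row to retry it later, plus the error message.
        $failed.Add([pscustomobject]@{
            AccountName = $record.AccountName
            Error       = $_.Exception.Message
        })
    }
}

$failed | Export-Csv "failed-accounts.csv" -NoTypeInformation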

For scenarios where you have multiple tables with different transformations, I recommend a staging database. That way you can have a multi-stage migration: first you import data into the staging DB and apply some cleanup, then you transform and load the data into Power Platform. It might sound like a lot of work, but the time you will save is not even comparable with the effort. Perhaps in another blog post I will show how you can do it in a matter of minutes.

Introducing PSDataverse, PowerShell’s new super power

PowerShell is the solution for task automation. It has everything you need to iterate fast: a command-line shell with auto-completion, an object-oriented scripting language, and it runs on Windows, Linux, and macOS. PowerShell’s command-line shell is best in class, with tab completion, command prediction, aliases, in-console help (Unix man pages, anyone?) and my favorite feature, pipelines, which let you chain commands together. You can even debug directly in the shell! If you are new to PowerShell, start here.

Dataverse, on the other hand, is a managed relational database and a lot more. It hosts Dynamics 365’s data, provides rich metadata, and can even host logic and validation, with a range of its own productivity tools. PowerShell can be handy to automate tasks and orchestrate one-off solutions in Power Platform. From another perspective, Dataverse can be useful as a managed database. You don’t need a license if the user is not a human, by the way.

PowerShell + Dataverse = PSDataverse

While working with Power Platform, you often need to quickly try something to see if it works, or to just check some data real quick. Other times you might need to automate some tasks or integrate processes that are not exposed through the UI. Take data migration as an example, or simply loading a couple of huge CSV files with some data transformation. Are you going to build a fully custom .NET solution just for that? Are you going to use a full-fledged SSIS solution, with its notoriously troublesome Script Components and non-Git-friendly projects? Your other option is learning a 3rd-party, one-size-fits-all, cloud-based service with its edge cases. Well, now you have another option: PSDataverse, a PowerShell module that brings Dataverse to PowerShell.

Show me an example

Let’s check how easy it is to load a CSV file called Account.csv into Microsoft Dynamics 365.

Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;thumbprint=<client-cert-thumbprint>;resource=https://<environment>.crm4.dynamics.com/"
Import-Csv "Account.csv" | ForEach-Object {
    @{ Uri = "account", Method= "PATCH", Value = @{
        name = $_.AccountName,
        overridencreatedon = $_.CreationDate,
        ...}}} | ConvertTo-Json | Send-DataverseOperation
        

You know, that is technically two lines of code in PowerShell. Sure, I’m cheating here; I have piped four commands together, but that makes it even easier to read. Let’s go over what I’m doing here.

  • The first line is obviously using Connect-Dataverse command to connect to Dataverse using a client certificate.
  • The second line (which I have broken into multiple lines for clarity):
    • Imports a CSV file called Account.csv.
    • Transforms each line into a Dataverse operation using ForEach-Object.
    • Converts the operations to JSON using the ConvertTo-Json command.
    • Sends the operations to Dataverse using Send-DataverseOperation.

A Power Platform professional might say that sending several operations one by one will not be ideal in terms of performance. I do agree with that statement. Send-DataverseOperation has two parameters for just that.

Improving performance

Dataverse provides two features to improve performance over its API when needed. The first one is parallelism: each user can send data in parallel over a number of lanes that is configurable (by Microsoft). By default the number is 20, but you can ask Microsoft to increase it when needed (e.g. for a data migration). The other one is called batching. To use it, you put several operations into a single HTTP request with a special syntax and structure and send it to the API, where it is taken apart and the operations are executed one by one. All the operations in a batch run in a single transaction, however; in other words, they succeed or fail (and are rolled back) together. To use these two features, Send-DataverseOperation provides two parameters.

  • -BatchCapacity a number between 0 (default) and 1000 that defines the maximum number of operations that can be put in a single batch.
  • -MaxDop a number between 1 (default) and 1000 that indicates how many data lanes will be used to send data in parallel.

As you see, the default values mean that batching is off and only one lane is used, which makes experimentation easier and safer by default. For example, running the following command:

@{Uri="WhoAmI"} | Send-DataverseOperation

Will give you BusinessUnitId, UserId, and OrganizationId of the current user.

@odata.context : https://helloworld.crm4.dynamics.com/api/data/v9.2/$metadata#Microsoft.Dynamics.CRM.WhoAmIResponse
BusinessUnitId : 6f202e6c-e471-ec11-8941-000d3adf0002
UserId         : 88057198-a9b1-ec11-9840-00567ab5c181
OrganizationId : e34c95a5-f34c-430c-a05e-a23437e5b9fa
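
For example, building on the CSV pipeline shown earlier, turning both features on could look like the following sketch. It uses the parameter names described above; the batch capacity of 100 and the 4 parallel lanes are just illustrative values that you would tune for your own environment:

Import-Csv "Account.csv" | ForEach-Object {
    @{ Uri = "account"; Method = "PATCH"; Value = @{ name = $_.AccountName } }
} | ConvertTo-Json | Send-DataverseOperation -BatchCapacity 100 -MaxDop 4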

What’s more

There is a lot more to PSDataverse than what has been covered above. The best way to get to know it is to download PSDataverse from its GitHub repository and try it. There are several features in its core, like automatic transient-fault detection and retry, automatically adapting the pace based on hints provided by Dataverse, and more.

Keep in mind that PSDataverse is a free, open-source PowerShell module, and it can benefit from your suggestions, bug reports, and contributions. Currently the module is in alpha, but I have already used it in a data migration project for a major bank in Europe, and I try to use it for different tasks as frequently as I can, to make it fit more and more use cases, and I put developer productivity above all.