There are several ways you can import data into Power Platform (or Dynamics 365). When you are importing CSV files, you have the following out-of-the-box options, each with their cons and pros.
- Import feature in Power Apps – Allows you to import Excel (.xlsx), comma-separated values (.csv), and XML Spreadsheet 2003 (.xml) formats. The data can be compressed, but the maximum size is only 32 MB for .zip files.
- PRO: Easy-to-use UI that guides you through the process.
- CON: Limited file size.
- CON: No transformation / advanced mapping logic allowed.
- CON: Very unforgiving about the format of the data.
- CON: Not so fast.
- CON: Cannot be automated.
- Power Platform dataflows – can be used for both data migration and continuous data integration. Dataflows are a new and exciting way of importing data into Power Platform, based on Power Query. They support tons of different data formats, including external Web APIs, Parquet files, and even another Dataverse!
- PRO: Based on Power Query – I consider it a pro, because you can learn Power Query in minutes and its UI is unmatched by any tool I have seen in my career.
- PRO: Can handle large files (although I haven’t tested at gigabyte level yet).
- PRO: Advanced data transformations can be applied easily on the fly.
- PRO: Supports a wide range of data sources. We are concerned with CSV here, but you never know, maybe that CSV comes from another source that is supported.
- CON: Absolutely no way to manage failures in the dataflow. You get a report at the end that helps you track down which records failed, but if you have several failures it is not practical to read the report and filter those manually!
- CON: Performance is usually better than the import feature, but very unreliable. I used it in a project where the scope of tables was limited, and every time I tested I got totally different performance.
- CON/PRO: Can be automated, but only based on a schedule.
- Custom cloud flow (or Power App) – The sky is the limit when it comes to app development, right? But what about the effort? What if you need to do it in different environments, are you going to implement CI/CD as well? How reusable will your flow / app be?
- PRO: Possibility to implement logic using the cloud flow designer, which may already be familiar.
- PRO: Several options to automate the process based on different events (e.g. uploading a file to a document library).
- PRO/CON: Flow / app can benefit from CI/CD when you need to support multiple environments, but it can be a burden as well.
- PRO/CON: Flow / app can be implemented in a reusable way, but the effort will be a lot more in this case!
- CON: Limited file size support, especially if you rely on Excel to benefit from Office Scripts.
- Using Azure Synapse Analytics – is, as the name suggests, an analytics service on Azure. It is not a free service like the previous options I listed, but provisioning it for a full data migration project, when the number of tables is more than a handful, can be extremely beneficial. The fact that you can focus on the transformation logic while Synapse takes care of managing the data storage will save you a lot of time. It supports many data sources and targets out of the box, and if you already have a background in a database technology (e.g. SQL Server) the learning curve won't be steep.
- SQL Server's SSIS – is a well-known ETL tool that has been used for data integration and data migration for ages.
- PRO: Familiar tool (if you know your way around SQL Server family of products) and an easy-to-use UI.
- PRO: Part of SQL Server
- PRO: Several out-of-the-box and 3rd-party components. In the case of Power Platform I recommend KingswaySoft's components, which I have used successfully in many projects. They also have a great support team for those edge cases.
- PRO: Reliable tool with good performance
- CON: Implementing complex logic and debugging can be difficult.
- CON: In many cases you will have to rely on the Script Component to run pieces of C# code that end up hidden inside multiple components and are not reusable.
- CON: Although you are producing code behind the scenes, the storage format makes it very hard to trace code changes or revert them.
- CON: Since the introduction of Synapse, I find it really hard to recommend SSIS for any but a very narrow range of use cases (e.g. you already have it and use it for another similar ETL process).
As you can see, there are several options, but apart from Azure Synapse Analytics, all of them are limited in size or performance, or are fragile and limited when it comes to fault handling. And Synapse might not be justifiable in terms of cost or learning curve given the scope of the project. Today I'm going to show you another, very easy way to load large amounts of data into Power Platform that can be another tool in your toolbox. It aims to address the shortcomings of the options above.
Using PSDataverse to load large CSV files
Chances are you have landed here because you have a large CSV file, and you might be asking how large we are talking. The answer is that I have not hit any limit yet. When done right, you should virtually not face any limit; I have personally tested CSV files of several gigabytes without any issues.
I have posted about PSDataverse before on this blog, and in fact I previously showed an example where I load a CSV file into the Account table. Let's start from there. The only prerequisite is to install PSDataverse, a PowerShell module that has no other dependency. You can either install it from the official PowerShell Gallery using the following command:
Install-Module -Name PSDataverse
Or you can get it from its official GitHub repository and install it manually.
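Once installed, you can quickly confirm that the module is available. This is just a sanity check using standard PowerShell cmdlets:
Import-Module PSDataverse       # load the module into the current session
Get-Command -Module PSDataverse # should list Connect-Dataverse and Send-DataverseOperation among others
With the module in place, connecting and loading the file takes only two commands: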
Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;clientsecret=<client-secret>;resource=https://<environment>.crm4.dynamics.com/"
Import-Csv "Account.csv" | ForEach-Object {
@{ Uri = "account", Method= "PATCH", Value = @{
name = $_.AccountName,
overridencreatedon = $_.CreationDate,
...}}} | ConvertTo-Json | Send-DataverseOperation
The first line uses a client-id and a client-secret to authenticate to Power Platform. There are different ways you can connect to Power Platform: using a client-id and a client-secret, using a client-id and a client certificate, or simply using a client-id and an automatically generated device code. In all cases you will need to create an app registration and then create an application user from it. This can be done in literally a minute through the UI. Read more here if you haven't done it before.
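As a side note, for a quick interactive test you can switch to the device code flow. My recollection is that the connection string takes a device switch instead of a secret, but double-check the exact syntax in the module's README:
# Device code flow - syntax from memory, verify against the PSDataverse documentation.
Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;device=true;resource=https://<environment>.crm4.dynamics.com/"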
The second line does all the magic! It first imports the CSV file, then for each record performs a simple transformation (a basic mapping), converts it to JSON, and sends it to Power Platform using the Send-DataverseOperation command.
Tuning the performance
At this point, if you run the script, it will start pushing data to Power Platform one record at a time. It waits for the response and returns it to you in a simple PowerShell object. The problem with this approach is that while it is reliable and you will get back the response, the performance won’t be great. If you are a veteran Dynamics developer, you might already know two important features: parallelism and batching.
By parallelism, I mean pushing data concurrently, using up to 20 streams in parallel, which can increase the speed by almost 20x. This limit can even be raised by Microsoft during your data migration if you open a support request.
Batching allows you to send multiple operations in one batch, which means you save on the amount of data transferred over the wire, and that can contribute to better performance. The hard limit is 1000 operations, but you should try different numbers, because depending on the table, its relationships, and the logic around it, the number of operations you put in a batch can also worsen performance. My suggestion is to start with 500 and fine-tune per table to get the best performance.
Let’s rewrite our script to benefit from parallelism and batching.
Connect-Dataverse "authority=https://login.microsoftonline.com/<tenant-id>/oauth2/authorize;clientid=<client-id>;clientsecret=<client-secret>;resource=https://<environment>.crm4.dynamics.com/"
Import-Csv "Account.csv" | ForEach-Object {
@{ Uri = "account", Method= "PATCH", Value = @{
name = $_.AccountName,
overridencreatedon = $_.CreationDate,
...}}} | ConvertTo-Json | Send-DataverseOperation -MaxDop 20 -BatchSize 500
As you can see, just by adding two parameters the performance increases by an order of magnitude. I will not give you numbers to get your hopes high or low, because every environment is different, but I can guarantee that you will be impressed.
Advanced scenarios
While the example I gave is very basic, you can extend it to add complex data transformations, or even pull data from other sources, or multiple sources. Send-DataverseOperation will try its best to succeed. For example, in the case of transient errors (e.g. a brief network outage) it will wait and retry multiple times, waiting longer each time. There is a lot more to it that you can learn from the GitHub repository.
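To give you an idea, a slightly richer mapping could filter out empty rows, trim values, and translate a free-text column into an option-set value before sending. In the sketch below the Country column, the mapping table, and the new_countrycode field are made up for illustration; adjust them to your own schema:
# A sketch only: Country, the mapping table, and new_countrycode are hypothetical.
# Rows without an account name are skipped, values are trimmed, and the date is
# normalised to ISO 8601 before being sent.
$countryCodes = @{ "Netherlands" = 100000000; "Germany" = 100000001 }
Import-Csv "Account.csv" |
    Where-Object { -not [string]::IsNullOrWhiteSpace($_.AccountName) } |
    ForEach-Object {
        @{ Uri = "account"; Method = "PATCH"; Value = @{
            name                = $_.AccountName.Trim()
            new_countrycode     = $countryCodes[$_.Country]
            overriddencreatedon = ([datetime]$_.CreationDate).ToString("o") } } } |
    ConvertTo-Json | Send-DataverseOperation -MaxDop 20 -BatchSize 500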
I also suggest that you add proper fault handling to isolate and categorize failed records, so you can analyze how many records failed and why (e.g. name is one character too long). That way you can mitigate the problems and retry only the records that failed.
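A minimal sketch of that idea: collect the responses, keep the ones that look like failures, and write them to a file for analysis. I am assuming here that Send-DataverseOperation emits one response object per operation and that failures are recognizable by an error or status property; inspect the actual output to get the exact property names right.
# Collect responses and keep the failures. The Error and StatusCode property
# names below are assumptions - check what Send-DataverseOperation really returns.
$responses = Import-Csv "Account.csv" |
    ForEach-Object { @{ Uri = "account"; Method = "PATCH"; Value = @{ name = $_.AccountName } } } |
    ConvertTo-Json | Send-DataverseOperation -MaxDop 20 -BatchSize 500
$failed = @($responses | Where-Object { $_.Error -or $_.StatusCode -ge 400 })
# Persist the failed operations so you can analyze them and retry only those records.
$failed | Export-Csv "Account-failed.csv" -NoTypeInformation
"{0} of {1} operations failed." -f $failed.Count, @($responses).Count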
For scenarios where you have multiple tables with different transformations, I recommend a staging database. That way you can have a multi-stage migration: first you import data into the staging database and apply some cleanup, then you transform and load the data into Power Platform. It might sound like a lot of work, but the time you will save is not even comparable to the effort. Perhaps in another blog post I will show how you can do it in a matter of minutes.
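To give you a rough idea of the load stage, here is a sketch that reads cleaned rows from a hypothetical dbo.AccountStaging table using Invoke-Sqlcmd (from the SqlServer module); the server, database, and column names are placeholders:
# Read cleaned rows from a staging table and push them to Power Platform.
Invoke-Sqlcmd -ServerInstance "<staging-server>" -Database "Staging" `
    -Query "SELECT AccountName, CreationDate FROM dbo.AccountStaging" |
    ForEach-Object {
        @{ Uri = "account"; Method = "PATCH"; Value = @{
            name                = $_.AccountName
            overriddencreatedon = $_.CreationDate } } } |
    ConvertTo-Json | Send-DataverseOperation -MaxDop 20 -BatchSize 500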