Power Platform Tidbits 10: How to split address into separate columns in Power Query

Sometimes address values are stored in a single text column and you’d need to split it into separate columns. For example when migrating data into Power Platform in a data flow, or when building a Power BI dataflow, or simply in Excel. In this post am going to show you how you can do this in a reliable way. I am going to make some assumptions first, but you can always fine-tune the formula for other address patterns.

Assumption

The addresses I’m going to work with look like the followings.

  • Fran├žois Vervloetstraat 100, 1080 Uccle
  • Koning Albertstraat 14, Menen

The syntax we are looking for is:

first-line, [postcode] city-name

In other words, the first line address is always present. The postcode is optional. the city name is always present. It might be that there is no address at all.

Solution

  #"Split address" = Table.SplitColumn(#"Previous step", "address", Splitter.SplitTextByDelimiter(","), {"Line 1", "Line 2"}),
  #"Split by character transition" = Table.SplitColumn(#"Split address", "Line 2", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Postcode", "City"}),
  #"Replaced nulls" = Table.ReplaceValue(#"Split by character transition", null, each [Postcode], Replacer.ReplaceValue, {"City"}),
  #"Replaced postcodes" = Table.ReplaceValue(#"Replaced nulls", each [Postcode], each if [Postcode] = [City] then null else [Postcode] , Replacer.ReplaceValue, {"Postcode"}),
  #"Changed types" = Table.TransformColumnTypes(#"Replaced nulls", {{"Line 1", type text}, {"Postcode", type Int16.Type}, {"City", type text}})

Let’s go over the solution, line by line.

  1. The first line assumes that there is a Previous step and an address column. It splits the address column into two columns, Line 1 and Line 2 by looking for a comma character.
  2. The Line 2 column is split into two other columns Postcode and City based on character transition from digits ({"0".."9"}) to non-digits. If the postcode was not optional, we could stop here, but we need two more steps to address this issue.
  3. This step sets the Postcode to the City whenever Postcode is null. After this step, whenever the postcode is not present in the address, both Postcode and City column will contain the city name.
  4. This step sets the Postcode to null whenever the Postcode and City columns have the same value as the result of previous step.
  5. The last step is just setting the type of columns, because after applying Table.SplitColumn transition, the new columns might not have proper types.

More about splitting by character transition

The most interesting step of the solution is the one that splits columns based on character transition. Let’s break it down. This step is a mix of two different table functions. If we look at the syntax of Table.SplitColumn function, it looks like this:

Table.SplitColumn(table as table, sourceColumn as text, splitter as function, optional columnNamesOrNumber as any, optional default as any, optional extraColumns as any) as table

The first parameter is the table, the second one is the name of the column that will split, the third one is the function that splits the value. This function will be given the value of the column and should in turn, return a list of values. It could be as simple as (value) => {"value 1", "value"}, but we use a native function that can split values for us, and that is Splitter.SplitTextByCharacterTransition. This function has the following syntax:

Splitter.SplitTextByCharacterTransition(before as anynonnull, after as anynonnull) as function

This function splits text into a list of texts according to a transition from one kind of character to another. The before and after parameters can either be a list of characters, or a function that takes a character and returns true/false. As before we are passing {"0".."9"}, but for after we are passing a function to detect any characters other than {"0".."9"} this makes the detection more inclusive. In other words (c) => not List.Contains({"0".."9"}, c).

Read more