Sometimes address values are stored in a single text column and you’d need to split it into separate columns. For example when migrating data into Power Platform in a data flow, or when building a Power BI dataflow, or simply in Excel. In this post am going to show you how you can do this in a reliable way. I am going to make some assumptions first, but you can always fine-tune the formula for other address patterns.
Assumption
The addresses I’m going to work with look like the followings.
- François Vervloetstraat 100, 1080 Uccle
- Koning Albertstraat 14, Menen
The syntax we are looking for is:
first-line, [postcode] city-name
In other words, the first line address is always present. The postcode is optional. the city name is always present. It might be that there is no address at all.
Solution
#"Split address" = Table.SplitColumn(#"Previous step", "address", Splitter.SplitTextByDelimiter(","), {"Line 1", "Line 2"}),
#"Split by character transition" = Table.SplitColumn(#"Split address", "Line 2", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Postcode", "City"}),
#"Replaced nulls" = Table.ReplaceValue(#"Split by character transition", null, each [Postcode], Replacer.ReplaceValue, {"City"}),
#"Replaced postcodes" = Table.ReplaceValue(#"Replaced nulls", each [Postcode], each if [Postcode] = [City] then null else [Postcode] , Replacer.ReplaceValue, {"Postcode"}),
#"Changed types" = Table.TransformColumnTypes(#"Replaced nulls", {{"Line 1", type text}, {"Postcode", type Int16.Type}, {"City", type text}})
Let’s go over the solution, line by line.
- The first line assumes that there is a
Previous step
and anaddress
column. It splits theaddress
column into two columns,Line 1
andLine 2
by looking for a comma character. - The
Line 2
column is split into two other columnsPostcode
andCity
based on character transition from digits ({"0".."9"}
) to non-digits. If the postcode was not optional, we could stop here, but we need two more steps to address this issue. - This step sets the
Postcode
to theCity
wheneverPostcode
is null. After this step, whenever the postcode is not present in the address, bothPostcode
andCity
column will contain the city name. - This step sets the
Postcode
to null whenever thePostcode
andCity
columns have the same value as the result of previous step. - The last step is just setting the type of columns, because after applying
Table.SplitColumn
transition, the new columns might not have proper types.
More about splitting by character transition
The most interesting step of the solution is the one that splits columns based on character transition. Let’s break it down. This step is a mix of two different table functions. If we look at the syntax of Table.SplitColumn
function, it looks like this:
Table.SplitColumn(table as table, sourceColumn as text, splitter as function, optional columnNamesOrNumber as any, optional default as any, optional extraColumns as any) as table
The first parameter is the table, the second one is the name of the column that will split, the third one is the function that splits the value. This function will be given the value of the column and should in turn, return a list of values. It could be as simple as (value) => {"value 1", "value"}
, but we use a native function that can split values for us, and that is Splitter.SplitTextByCharacterTransition
. This function has the following syntax:
Splitter.SplitTextByCharacterTransition(before as anynonnull, after as anynonnull) as function
This function splits text into a list of texts according to a transition from one kind of character to another. The before
and after
parameters can either be a list of characters, or a function that takes a character and returns true/false. As before
we are passing {"0".."9"}
, but for after
we are passing a function to detect any characters other than {"0".."9"}
this makes the detection more inclusive. In other words (c) => not List.Contains({"0".."9"}, c)
.
Leave a Reply