A very common data cleaning task is to separate numbers from text in 2 different columns. In this post I am going to share with you 3 cases
- Separate numbers from text when the number is at the end of the text
- Separate numbers from text when the number is at the start of the text
- Separate numbers from text when the number is anywhere in the middle of the text (and can also appear multiple times)
And I am going to solve these by using native Excel Formulas and Power Query
Case 1 – Separate numbers from text when the number is at the end of text
Consider this mashup of text and numbers (first column) and from this we need to separate the text and numbers
Formula to extract Text
=LEFT(C5,MIN(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},C5),''))-1)
Formulas to extract Numbers
=SUBSTITUTE(C5,D5,'')
- Under the Text column is the formula to extract text. Note that this is an array formula and should be confirmed using CTRL + SHIFT + ENTER
- Once you get the text it is pretty straight forward to get the number. All I do is use a simple the SUBSTITUTE function
Case 2 – Separate numbers from text when the number is at the start of text
Okay, again a similar text-number mashup and this one is a slightly tricky. Let’s parse the numbers and text
Formula to extract Numbers (Confirm using Ctrl + Shift + Enter)
=LEFT(C13,MATCH('*~',IFERROR(VALUE(MID(C13,ROW(INDIRECT('1:'&LEN(C13))),1)),'*~'),)-1)
Formula to extract Text
=SUBSTITUTE(C13,E13,'')
- Yeah.. you don’t need to say it, it looks pretty daunting but this nasty buoy works! I am sure this can optimized further but I din’t spend time doing it 🙁
- Although to cool you down the formula for parsing the number is the same old SUBSTITUTE function as above
Case 3 – Separate numbers from text when the number is anywhere in the middle of text
Another tricky situation could be where numbers continuously (or intermittently) appear in the middle of the text string
Formula to extract Text
=SUBSTITUTE(TEXTJOIN(,,IF(NOT(IFERROR(VALUE(MID(C21,ROW(INDIRECT('1:'&LEN(C21))),1)),FALSE)),MID(C21,ROW(INDIRECT('1:'&LEN(C21))),1),'')),0,'')
Formula to extract Numbers
=TEXTJOIN(,,IFERROR(IF(VALUE(MID(SUBSTITUTE(C21,0,1),ROW(INDIRECT('1:'&LEN(C21))),1)),MID(C21,ROW(INDIRECT('1:'&LEN(C21))),1),''),''))
- I tackled extracting the text using an array formula again
- But since the numbers have the possibility of appearing intermittently in between so an array formula again for separating the text
Case 4 – Separating Text from Numbers using Power Query
Thankfully power query has a built in option to parse numbers from text or vice versa
Scenario 1 – Numbers in the End of the Text
- Just right click on the column header
- In the split menu
- Choose Non-Digit to Digit (i.e. separate at the first digit found)
- And you are done!
Similarly Scenario 2 – Numbers at the Start of the Text
- Just right click on the column header
- In the split menu
- Choose Digit to Non-Digit (i.e. separate at the first text found)
- And you are done!
Feel the need for a video ?
More Excel / Power Pivot / Power Query puzzles..
- Separate Managers and Employee Names
- Unique Dates Challenge – it’ s one of favourite
- Allot the correct shift problem
- Calculate total hours challenge
- Shit timing problem and solution