The data cleaning problem that I’ll write about today, is pretty common especially when you scrape the data from a website or a pdf.
You select a couple of columns by dragging your mouse across the screen hit Ctrl+C and then Ctrl+V in Excel. The data that gets pasted in Excel is quite unpleasing. All columns are converted into rows stacked one below the other, it’s not quite the look that you expected.
In this post I’ll write about how can you unstack the rows in separate columns, quite easily using Power Query.
Care to go for the Video First?
Let’s start with this Data
- The data is grouped in a stack of 4 rows
- Cecilia is the Name, New York is City, 42 is Age, (257) 563-7401 is her Phone
- After each group there is a blank row in between and then the next group starts
- Note – The blank row may or may not be there, the important thing is that every 4 rows makes one record (rows)
I would like the above data to be in this shape with 4 separate columns for Name, City, Age and Phone
- One way to shape the stacked rows in separate columns is to use the Transpose Function or feature but that’ll be too much manual work.
- The 2nd way could be using an OFFSET formula but it won’t be a dynamic solution if more data get’s added.
- The neatest way to solve this would be using Power Query.
Unstack Rows in Separate Columns
After Loading the Data in Power Query
Step 1 – Apply Remove Empty Filter to remove the null or blank values from Column1
Step 2 – Add an Index Column from the Add Columns Tab >> Index Column Drop Down >> From 0
Step 3 –
- Select the Index Column
- Add Columns >> Standard Drop Down >> Modulo
- Input the value = 4 (since there are 4 rows in one record)
- The result will be a Modulo Column which is the remainder if the Index Column is divided by 4
Step 4 – Add another Column to Subtract the Modulo from Index.
- Select the Index Column First
- Then select the Module Column
- Add Column >> Standard >> Subtract
Step 5 – Remove the Index Column.
Step 6 – Now the trick is to Pivot this Data.
- Select the Subtraction Column
- Transform Tab >> Pivot Column
- In Values Column >> Select Column1
- Advanced Options >> Don’t Aggregate
Step 7 – The data looks as we expected. Finally, I can remove the Subtraction Column and of course rename the Headers
Another part of the same problem could be if the number of columns for each record are dynamic. For eg.. Cecilia data needs to separated in 7 columns but Iris could just have 4 columns. Solving this is slightly tricky, needless to say, I’ll add that solution too in a day/two in this very post.
Some More Power Query Good(ly)ness
- Change Dates from MM-DD to DD-MM format or vice versa
- Text to Columns by Variable Columns in Power Query
- Replace error values in Multiple Columns
- Dynamic Column Selection – Really Awesome
- Dynamic Column Renaming
- Running Total in Power Query