A while ago I met the Head of Human Resources for a Financial Services Company. And you can very well guess that the nature of the meeting sure had to do with solving a peculiar spreadsheet problem.
Here is (a part of the problem) what her team was grappling with
Assume this tiny Training Data
- The first 2 columns are employee details
- The rest of the columns are the names of the training programs conducted
- The numbers represent the number of times the person has undergone that training
- If the cell is blank that means that he has not undergone that training
This sounds simple right ?
But this is a terrible way of storing data. If you are unsure WHY?? Let me throw one questions at you
Question – How many unique training programs did each employee take ?
I think I got you now, but in case you aren’t convinced please note that the real time data will not be as tiny as the snapshot. You’ll have hundreds of employees and dozens of training.
That makes it hard to do it manually
Here is the Output expected..
- Every instance (marked as 1 or 2) should be transformed in a single row
- If the employee has attended two training his record will appear twice
- If the employee has attended no training, there will not be any record
I found a way to solve this using PowerQuery
Quick facts
- This solution is dynamic i.e. it’ll work fine if you add more employees (rows) or more training programs (columns)
- In case you want to read more about Table.Repeat Function
How would you have solved this problem ?
I have found many companies capturing similar data incorrect way. Please post your solution (powerquery or otherwise) in the comments. I would love to see multiple ways of solving this