Intro by Chandeep
Today we have a guest in the house. Please welcome, Puneet Gogia from ExcelChamps and he is going to teach us, 5 methods to remove duplicates in Excel.
Enters Puneet –
While working with large data sets, duplicate values are always a problem
No matter what, you will always get some duplicate values in your data. But the good news is, in Excel, you can find them, highlight them, and delete them in no time. Today, I will show you 5 effortless methods to figure out duplicates value in Excel.
So let’s get started.
Table of Contents
- Remove Duplicate Option
- Using COUNTIF Function
- Highlight With Conditional Formatting
- With a Pivot Table
- Using Power Query
Here I have a billing data with some duplicate invoices in it. You can download it from here to follow along.
1. Remove Duplicates Option
It’s a default option in excel to remove duplicate values. With this, you can remove duplicate values in a flash. Here are the steps.
- Select the column from where you want to remove duplicate values (Here I’m selecting invoice column).
- After that, go to Data tab -> Data tools -> Remove Duplicates.
- Once you click on “Remove Duplicate”, you will get a pop-up option.
- From this option, you can select:
- Expand the Selection:
- Continue with the current selection:
- Select “Expand this Selection”, click “Remove Duplicates”.
- Congratulations, now there are no duplicate values in your data.
Different Between “Expand the Selection” & “Continue with Current Selection”
- Expand the Selection: This will consider entire row while checking for duplicate values. If there is a duplicate value in a cell but rest of the cells have unique values then that will not consider as a duplicate entry.
- Continue with Current Selection: It will just remove duplicates from selected column without considering values from other columns. But in real, this is not a good idea to use this option.
Pros:
- This method is super quick. You just need a few clicks.
- It will automatically delete duplicate values.
Cons:
- You can’t review duplicate values from your data
- Data should be sort before you remove duplicates
2. Using COUNTIF Function
An advance method to remove duplicate values in excel. And, my favorite one. Just follow these simple steps.
- Enter the above formula in your data table.
- Now, drag this formula up to the last row of the data.
- Filter the column and select values which are more than 1.
- All the values which you have selected are in duplicate.
- You can review them and then delete them.
Pros:
- It’s a one time set up, you don’t have to apply it again and again.
- It’s a dynamic method, it will get update with data.
Cons:
- Applying formulas may be confusing sometimes.
- You can check duplicate values from a single column at a time. You have to extend the formula to check other columns.
3. Highlight With Conditional Formatting
You can highlight duplicate values with a color. Yes, conditional formatting can help you to highlight duplicate values and after that, you can review them, and delete them.
Steps are below.
- Select the entire your data from which you want to highlight duplicate values.
- Go to Home Tab -> Styles -> Conditional Formatting -> Highlight Cells Rules -> Duplicate Values.
- Here, you’ll get a pop-up option to select a color to highlight values.
- Select the color, you want to use. Click OK.
This will highlight all the duplicate values from the data you have selected.
Now, you can review them and delete them. You can also highlight unique values instead of duplicate values.
Pros:
- It’s quick and easy.
- You can review duplicate values before deleting them.
Cons:
- This is not dynamic, you have to update conditional formatting when you update your data.
- You have to delete duplicate values manually.
4. Using a Pivot Table
Maybe it looks silly to you but you can also use a pivot table to figure out duplicate values from your data. Let’s me tell you how you can do this. Here I am using the same invoice data which I have used in above methods.
Follow these steps.
- First of all, you have to insert a pivot table.
- And after that, using pivot table fields dialog box, add invoices in “Rows” field.
- Again, select invoices and add it into “Values” field.
- Now, change value setting to “count”.
- In the end, you have a pivot table like below.
In this pivot, all the invoices where the count is more than 1 are in duplicate in your data.
Now, you can review duplicate values from your raw data and delete them accordingly.
Pros:
- One time set up.
- A pivot table is dynamic if you are using auto update.
Cons:
- You can’t directly delete duplicate values.
- You have to delete values manually.
5. Using Power Query
Power Query is a revolution for data cleaning. You can easily use it to remove duplicate values from your data. Make sure to apply table to your raw data. And, follow these steps to remove duplicates with power query.
- Select any of the cells from your data.
- Go to -> Data Tab -> Get & Transform -> From table.
- Once you click on the button, it will load your data into power query editor.
- Now, right-click on the column header from which you want to check duplicate values.
- Select “Remove Duplicate” (It will instantly remove all the duplicate values from your data.)
- And after that, click on “Close & Load” (It will paste the entire data into a new worksheet without duplicate values).
Pros:
- It’s fully dynamic. Whenever you update your data, you just have to refresh your query and it will automatically remove all the duplicates.
- It will not make any changes in your main data, in fact, you’ll get new transformed data which will be connected with your original data.
Cons:
- It’s a long process.
- Not possible if you don’t have power query in your excel version.
Conclusion
Well, as long as you work with data, you can’t ignore duplicate values. It will be there for always. But the important thing is you can highlight them, review and delete them by using a right method.
And, I hope all the methods which I have mentioned above will help you in your work.
Now, you have to help me, go to the comment section and tell me if you have any other methods to handle duplicate values in excel.
|
Closing : By Chandeep
Thanks Puneet for taking out time for sharing these cool tricks with us. Sidenote: I learnt how to remove duplicates using Power Query.
Do let Puneet know if you have any questions in the comments section, he’ll be around to reply to your comments. Also download his productivity guide. It is kick-ass!