Clean data. When you’ve got it, you’ve got it. When you don’t, you and everyone else that views your spreadsheet will know. Misspelled words, improper cases and non-printing characters are just a few of your concerns. These and other errors making a guest appearance in your spreadsheet can take your otherwise pristine work and instill uneasy feelings in viewers. Fear not, ExcelTraining is here to go over a few of the common ways to clean your data.
You may not always be able to control where your data comes from or the format in which it is presented. Before jumping into an analysis it is important to clean your data and put it in the proper format that best serves your needs. Sometimes this may be as easy as using the Spell Checker or the Remove Duplicates feature. The following are the basic steps you should take when you first receive new data for an analysis.
1) Import the data and keep a back up of the original in a separate workbook. If you need to go back to square one you will be thankful you had a backup of the original file.
2) Check that the data is in a tabular format with rows and columns. All columns and rows should be visible (learn how to hide and unhide rows), with no blank rows in the range, and similar data in each column.
3) Run the Spell Checker and Find & Replace first before manipulating any columns.
4) Now is the time to manipulate the columns. First, insert a new column to the right of the column you wish to manipulate. In the new column, add a formula, such as TRIM, that will transform the data per your needs. Fill the formula down the entire new column. Copy and paste the new column as values and remove the original column.
To expedite the process you can record a macro or utilize a third-party add-in designed specifically to meet your needs. If you find that you are performing the same cleansing steps over and over, this may be an option to consider.
The Spell Checker may be one of my favorite features of Microsoft Office products. In the blink of an eye the checker will identify misspelled words and suggest a replacement. You can also add words to a custom dictionary. For example, if your company is called “I Luv Excel” the Spell Checker will want to convert “Luv” to “Love”. Adding “Luv” as a custom word will prevent it from being identified as incorrect in the future.
Keyboard shortcut (Windows): ALT + R + S
The Spell Checker can also be accessed by going to Review in the navigation bar.
A good practice when you receive new data is to screen for unique values and then remove duplicates.
To filter for unique values, select the data you wish to review, navigate to Data > Sort & Filter > Advanced and check the Unique records only box.
To remove duplicate values, select the data you wish to analyze, navigate to Data > Data Tools > Remove Duplicates.
If you receive a file that contains common leading strings, such as “www”, you can remove or replace these using Find & Replace.
To accomplish this goal you have a few options:
1) Access the Find & Replace dialog box by pressing Ctrl + F on your Windows keyboard. Enter the text you want to find and what it should be replaced with. If you simply want to remove the text, leave the Replace with field blank.
Text can come in all shapes and sizes. No one wants to read text LiKE tHiS. Thankfully, a few quick functions are all that is needed to fix these text case issues.
LOWER: Converts all uppercase letters to lowercase (String becomes string)
UPPER: Converts all lowercase letters to uppercase (Computer becomes COMPUTER)
PROPER: Converts the first letter in a string to uppercase and other letters to lower case (MOUse becomes Mouse)
Text values may contain leading or trailing spaces or non-printing characters. These extra spaces and characters can cause trouble when filtering, searching, and manipulating your data. The bad news is, it can be tough to see these issues. The good news is, Excel has a few functions that are quick and easy to use to remedy the problem.
CLEAN: Removes non-printing characters. For the advanced Excel geek, these are the first 32 non-printing characters in the 7-bit ASCII code.
TRIM: Removes the space character, both leading and trailing, from text. This is value 32 in the 7-bit ASCII code if you are curious.
When working with numbers you may encounter two common issues: the number was imported as text and the negative sign doesn’t meet your organization’s standards.
Use the VALUE function to convert numbers provided as text to a number. Simply enter the function in the column next to your text numbers and you are good to go.
To manage the signs, use one of the various number formatting commands. DOLLAR, CURRENCY, TEXT, and FIXED are just a few of the options you can consider.
Excel offers numerous ways to present dates and times. Short date, long date, MM/DD/YY, YYYY/MM/DD and so on. A few handy functions to help you manipulate date and time values are as follows:
DATE: Returns the serial number representing a particular date.
TIME: Returns the decimal number representing a particular time.
DATEVALUE: Converts a text date to a serial number.
TIMEVALUE: Returns the decimal number of a given time that is represented by a text string.
Just because your data was imported a certain way doesn’t mean it is the format you want. If you find that your rows should be columns, or vice versa, then you will be happy to know the TRANSPOSE function is available.
TRANSPOSE is extremely useful in that it will convert your rows to columns or your columns to rows. To do this quickly using your keyboard, copy the data you want to transpose, go to the first cell where you want the data to appear and enter CTRL + V + CTRL + T. You can also open the Paste Special dialog box by entering ALT + E + S.
A final method of cleaning your data that we will discuss today is merging and splitting columns. Suppose you receive a file filled with names and the entire name is contained in cell A1. You may want to separate the name into first and last. Manually doing this would not be the best use of your time.
To split data from one cell into multiple columns you have a few options:
1) Use the Convert Text to Columns Wizard. For more details, click the link to read our tutorial on the Wizard.
2) Use the LEFT, RIGHT, MID, LEN, and SEARCH functions to split data from a single column into two or more columns.
When it comes to merging data Excel has you covered as well. Taking the opposite view from the above example, you may have names in two columns and want them in one single column.
1) Use the CONCATENATE function to join two or more text strings.
2) If you need to merge cells, then the Merge Cells, Merge Across, and Merge and Center commands will be useful. Select your data and find these commands in the Alignment section of the Home toolbar.
If you made it this far you are now well-versed in the most common ways to clean your data. However, it cannot be stressed enough that you should always consider GIGO (garbage in, garbage out). You can clean your data all day long. Bad data is still bad data and will produce bad and misleading results. Consider the source of your data and how reliable it is or is not.