Advertisement
  1. Business
  2. Communication
Business

How To Clean Up Data in a CSV File

by
Length:LongLanguages:
This post is part of a series called Working With CSV Data Files.
How To Clean Up Keyword Research Data in a CSV File

Regardless of the size of your business, there are growing expectations that you will be data proficient. While big data gets hyped at an enterprise level, the reality for freelancers and micro businesses owners is that you will need to get your hands dirty playing with data at some stage of your client work.

Whether it be from transferring keywords into a spreadsheet, scheduling tweets, syncing data between business apps, managing contact lists, or importing data into an infographic, you will need to know how to clean data from a spreadsheet and organize it into a format that any online app or software can read.

The good news is that being able to manage data and move it between apps is a marketable skill in itself and with a few tricks up your sleeve, you can also save a lot of time: time that you could be spending on more profitable work.

Tutorial Assets

To complete this tutorial, you will need:

  • an internet connection
  • an email account
  • a Google account (free)
  • Microsoft Excel (optional)
  • data that you want to clean up (see below for more examples)

What is a CSV File?

A CSV file is a “Comma Separated Value” file. This sort of file takes all your data and puts the headers of each column in the first row, then lists all your data in subsequent rows, with all non-numerical data enclosed in inverted commas (“like this”), followed by a comma.

A CSV file can be read in many types of apps and software programs. At its most basic it is a text file. It takes all your data and parses it (i.e. breaks it down into its units of data), so that you can then feed the data back into another app.

For example, here’s the data from Google’s Keyword tool for the keyword “CSV file”:

And here’s a sample of how it looks in a CSV format:

Why Clean Up Your Data?

Increasingly, clients are expecting you to be able to manage data when you work with them. For example, a freelancer working on writing SEO content for a client may be expected to show a list of which long-tail keywords are being used in each blog article scheduled for production, and to have these keywords added to a spreadsheet or content calendar. As a freelance content producer, you will be expected to produce compelling visuals to go with your articles. You can add simple infographics to your content by cleaning up a CSV file of your research statistics and then import this into a diagram or infographic tool.

As big data and opendata continue to become more commonplace, you can quickly increase the value of your business service offerings and reduce your workload by knowing how to move data around.

If you run a micro business that provides social media services, you may be expected to show your client a week’s worth of tweets in advance, and then be able to schedule these to be tweeted at regular intervals throughout the week. Or you may be able to develop a business service where you help a client keep their contact data in sync between their email, marketing campaigns and customer relationship management (CRM) software.

Maybe your business has grown big enough that you want to start using an invoicing and accounting app rather than a spreadsheet. You can clean up a CSV file of all your old financial data so it is imported easily into your invoicing app without having a heap of data re-entry to do.

As big data and opendata continue to become more commonplace, you can quickly increase the value of your business service offerings and reduce your workload by knowing how to move data around. Being data proficient is a skill in great demand and can help you carve out a unique value proposition. Cleaning up CSV files is central to making the most of all of these opportunities.

1. Creating a CSV File From a Spreadsheet

Step 1: Open Your Spreadsheet File

A CSV file is simply a spreadsheet file saved in a text format so it can be moved to another software program or cloud-based app or tool. So first, you will need to open it in your spreadsheet software.

If you are using Excel, open the file of data you want to move around (File > Open…). If you do not have Excel, you can signup to Google Drive with your Google account details. Google Drive offers an online cloud-based toolset which includes a spreadsheet tool. After you have signed up, open your spreadsheet in Google Drive.

The same spreadsheet in Excel (left) and Google Drive (right).

Step 2: Check Your File Basics

In Excel, make sure you only have one worksheet in your spreadsheet. Worksheets are shown at the bottom of Excel. They are normally named Sheet1, Sheet2, etc. Make sure there is only one sheet. If there is more than one sheet in the Workbook, click on the tab for each additional sheet and make sure there is no data on these pages. Then right-click on the tabs for Sheet 2 and so on and delete each additional sheet.

Step 3: Save as a CSV File

In Excel, now select File > Save As…. Choose a name for your file and select Comma Separated Values under the Format drop-down menu.

In Google Drive, select File > Download as…. Select Comma Separated Values from the second menu box and save to your hard drive.

2. Creating a CSV File From Data in an Online App or Webtool

When data is stored in an online app (such as FreshBooks, the invoicing and book-keeping app for small businesses) or website tool (such as the Google AdWords Keyword tool), it is necessary to use the app or tool’s import functions to save the data in a CSV file format.

Step 1: In Your Online App, Look for a Data Export Option

If you are wanting to create a CSV file using data in an online app, look for a menu option that lets you export the data from the app.

In FreshBooks, for example, this is located under the My Account menu. After you click on My Account a second level menu appears, where you can select Import & Export.

Now select Comma Separated File (CSV) next to Export Clients, Invoices, Staff and Timesheets.

Other cloud-based apps may arrange their options differently. If there is not an option under “Account”, look for a “Settings” menu, or a “Data” menu option. You should be able to locate a similar menu option that allows you to save the data stored online into a CSV file.

Step 2: In a Web-Based Tool, Look For a “Save as CSV” Menu Option

For a webtool, look for a “save” menu option. For example, in the Google Adwords Keyword tool, there is a Save All button above the list of keyword ideas. Click on Save All to save a copy of all keywords from your search.

Now click the Download button. Select All search results. A dialog box will pop up, offering you the choice to save in CSV format. Click Download to confirm. Depending on the size of the file, it may take a little time for this to be prepared.

Finally, you will be offered a dialog box to indicate where you would like to save the file.

Other online tools work in a similar fashion.


3. Cleaning a CSV File

Now that you have a CSV file, you will need to clean it up and make sure it is readable to wherever you want to send it. In some cases, you may just be wanting to use it in a spreadsheet.

In other cases, you may need to clean up the CSV file so that you can then import it into another cloud-based app or online tool.

Step 1. Excel: Import Your CSV file into a Spreadsheet

In Excel, create a new workbook. Then select the Import option from the File menu.

A dialog box will pop up. Make sure CSV file is selected and click the Import button.

You can now select your CSV file from where you have saved it.

Step 2. Excel: Confirm Your Data is ‘Delimited’

In Excel, you will now be asked to confirm that your data is delimited. That is, your CSV file is filled with data separated only by a comma. Excel should be able to automatically tell that this is correct. Now click the Next button.

Step 3. Excel: Confirm Your Data Columns

In Excel, you will now be shown a preview of how your data will be sorted into columns, based on the fact that your data is separated by commas, and text items may be enclosed in “inverted commas”. Check the data preview, and select or deselect delimiter options and text qualifier options until the preview reflects how you expect the data to look. In most of these cases, this will be chosen for you automatically. For example, Excel will know that all text data has a text qualifier of inverted commas, so it will let you know it plans to remove the inverted commas when it imports the data. When you are happy with how the data will be imported, click the Next button.

Step 4. Excel: Confirm Data Types

In Excel, you will now be asked to select what type of data is in each row. Select General for all columns now (we will customize each column of data after it is imported). Double-check the advanced setting to ensure your data reflects standard conventions for numbers in their thousands and for decimal places.

For example, if you are using a US data source, a comma is generally used to separate thousands,(e.g. “1,000”) and a full stop represents decimal points. It’s the other way around in Europe! So depending on your original data source, you may need to change these settings.

Now click the Finish button to import the CSV data into your Excel spreadsheet.

Step 5. Excel: Choose Where to Import Your Data

Finally in Excel, you will be asked where you want to import the data. If you opened a new workbook at Step 1 above, select the Existing sheet already chosen for you, then click OK.

Step 6. Google Drive: Import Your CSV File into a New Spreadsheet

In Google Drive, you can do the above steps 1 – 5 in the one dialog box. First, create a new spreadsheet from the File menu. Now choose Import from the File menu.

A dialog box will appear. Select your CSV file from your hard drive.

Now select Replace spreadsheet, as we have already opened a new spreadsheet.

Check that the preview shows how you want your data distributed in columns. In most cases, this will automatically be set up correctly. If not, change the “separator character” to “tab” or “comma” to see if the data is now shown correctly in columns.

Click the Import button to proceed.

Step 7. Tidy Up Columns

As we will be cleaning your data spreadsheets, it will be easier to do so if you can see all your data in neat columns. Adjust the width of columns to better show the data. Move your cursor between the column letters at the top of either Excel or Google Drive until the cursor turns into a width movement symbol and stretch the column width so that your data is easier to read.

Step 8. Check Your Header Row

Now check that your first row has the correct column headings. When you import some data from some sources, you may have the dates down the left column (the y-axis) and the values along the rest of the row, starting from the second column (the x-axis). So the very first column on the x-axis will be left blank.

For example:

If that’s the case, enter a description (for now) for the first column, for example “Date” or “Month”.

If you are using your CSV data in a spreadsheet, you can also set the first row as your column headings. This is useful if you are working with large sets of data where you may scroll down and not be able to see the headings row.

In Excel, click on Page Layout under the File menu. Choose Sheet from the menu in the popup dialog box and click on the row in your spreadsheet for the column headings. It should now be indicated in the box rows to repeat at top usually as $1:$1.

In Google Drive, select Freeze row from the View menu and select Freeze 1 row to indicate the header row.

Step 9. Confirm What Data You Need

If you are cleaning up this data because you will be moving it into another online tool or app, check the requirements for importing data into your destination app (see the Use Case examples under Section 3. Moving Data, below).

If you are cleaning your CSV file to share it in a spreadsheet, confirm with your client what data they want you to be able to show.

Step 10. Delete Unnecessary Data

Note: You may want to save a copy of your spreadsheet before you start deleting any data, just in case you change your mind or need it later on.

Now remove any columns you will not need. For example, if you want to import your keyword research into a content calendar, you may only be interested in the keyword and local monthly search volume columns.

You can delete the columns you don’t need by clicking on the column header letter until the whole column is shaded. Now choose Delete from the Edit menu (in Excel) or Delete column from the Edit menu (in Google Drive).

Step 11. Make Sure Date Columns are in the Right Date Format

Again, check where you will be importing the data for any requirements (see section 3 user cases below).

If your CSV file data includes dates, select the column with the dates.

In Excel, select Cells… from the Format menu. Now select Number from the top row menu, and Date from the lefthand side menu. Choose the date menu that corresponds with the import requirements, or select the most readable version for you and your client, if you are just importing the data into your own spreadsheets.

In Google Drive, select the column and the click on the button that reads 123 in the toolbar. Choose a date format from the list, or select more formats to see an additional menu of date format options.

If your CSV file requires times to be included with the dates, make sure they are in the right format (hh:mm or hh:mm:ss).

If your CSV file requires dates and times to be in separate columns, copy the whole column, then format the first column to be in date only format, and the second column to be in time only format. The rest of the data will be automatically removed.

Step 12. Make Sure All Numbers are Uniform

Any columns with numerical data are best formatted by removing the thousand separators so numbers appear as 23000 for example and not 23,000.

In Excel, select the column, and again choose Cells… from the Format menu. Choose Number from the horizontal menu, and Number from the lefthand menu. Make sure Use 1000 separator is unchecked.

Google Drive insists on using thousand separators. This will not affect your CSV imports, but may be confusing if your clients are European. Click on the column to select it, select the Format menu option and click on Number, then select Normal.

Step 13. Add Columns You Do Need

Check the list of data you need that you created at Step 9. Add columns to your spreadsheet for each of these. Some online tools also require that data be ordered in a particular way (for example, scheduling tweets may require a specific order for your CSV file data). If that’s the case, move the columns now to match the preferred import order where you will be sending the data.

Now make sure the column headings are an exact match for what your importing destination is expecting. Do not use inverted commas at this stage for headings or any text data in your columns.

Step 14. Incorporate Cleaned Data into Your Working Spreadsheets

In some cases, such as cleaning your keyword data, you may not be looking to add it to any destination source, other than a content calendar. Save a copy of your spreadsheet as a backup file, and then select the columns you want to move into your content calendar. Copy these columns now and paste them into their destination spreadsheet.

Step 15. Save Cleaned Data as a CSV File Ready to Import into an Online Tool

If the final destination for your cleaned data is to add it to another program, you are now ready to save this spreadsheet as a CSV file so you can import it into an online app or webtool.

Repeat Section 1 Creating a CSV file from a spreadsheet again.


4. Moving Data from a CSV File into an Online App or Web Tool

In many cases, you will need to move your data into another online app or webtool to complete your work.

To make sure your data will be able to be read by the app you are adding it to, check any requirements for what columns and data need to be included.

Look for a menu option like “Account”, “Settings”, or “Data” and review the import data information. Alternatively, check the help section of the app/web tool. Look for what column headings you need to use and the required format for each column of data. For example, if you are adding data to a Google Calendar, you would need columns for Start Date, Start Time, End Date, End Time, Subject and (as an option) Venue.

On the other hand, if you were importing data on your project work into a time tracking app like Toggl, they want a column for Start Time, but instead of End Time, you need to have a column for Task Duration, in which you indicate how much time was spent on the task rather than it’s end time.

Each online app and webtool may be slightly different, so being able to clean data in CSV files means getting acquainted with user documentation and online guides to importing data.

What are Your Data Needs?

What are your current data needs? Have you used CSV formatted files to move important data for your business from one application to another? Let us know about your data projects in the comments below.

Advertisement
Advertisement
Looking for something to help kick start your next project?
Envato Market has a range of items for sale to help get you started.