How To Clean Up Messy Sales Data

January 19, 2017 Alex Yaseen

This post was written by Alex Yaseen, Co-Founder and CEO of Parabola.

By nature of being a fast-moving process, B2B sales generates a lot of messy data. Factor in the large amount of fantastic sales enablement tools that are available, and it can start to feel like you’ll never fully get your data under control. As highlighted in a webinar last month, sales ops frequently has trouble with:

  • Integrating and normalizing multiple data sources
  • Handling manual data entered by reps
  • Converting data into the right structure for new data vendors

Because Parabola is a tool to solve these problems, we’ve been privileged to watch our customers come up with best practices for how to handle their sales data and how to ultimately increase sales efficiency.

Note: Throughout this blog post there are brief suggestions for how to solve various data problems. You can find a more detailed tutorial for how to solve each problem in Parabola here.

Working With Multiple Data Sources

Almost every sales ops team must work with multiple data sources. CRM, analytics, and email/marketing automation tools all store and generate their own data. Data enrichment services and other vendors help enhance that data. Your team likely also has internal spreadsheets and other proprietary information that can help your business succeed.

As soon as you start working with more than one data source, however, things can get messy. Here are some of the most common problems we see, and some suggestions for how you can solve them.

Problem 1:

Pieces of data may exist in multiple places, leading to duplicates when you bring everything together. This gets particularly messy when the entries are slightly different but still refer to the same lead, contact, etc.

Solutions:

  • Excel: Use =IF(ISNA(MATCH([lookup_value], [lookup_range], 0)), “match”, “no match”) to find identical matches between two sheets. Manually delete matches from one of the sheets each time you want to update the data.
  • Parabola: Use the “List Contains” object with fuzzy matching turned on.
  • Python: Write a custom “for” loop with a comparison check and a fuzzy matching package.

Problem 2:

Different data sources can use different date/timestamp formatting. This can cause missing, inaccurate, or duplicate entries when attempting to further analyze your data.

Solutions:

  • Excel: Manually use the format menu to select a date format you like for each date column.
  • Parabola: Use the “Date Conversion” object.
  • Python: Write a custom “for” loop using the strftime() function on the datetime object.

Problem 3:

IDs or other fields shared by multiple data sets can be in different formats, making it impossible to combine the data sets into one.

Solutions:

  • Excel: Write a VBA macro using Microsoft VBScript Regular Expressions.
  • Parabola: Use the “Find/Replace” object with optional regular expression syntax.
  • Python: Write a custom “for” loop using regular expression syntax and the sub() function of the regular expression object.

Handling Manually Entered Data

When sales reps or prospects fill out text fields, data can get messy. Freeform notes fields in your CRM, content from the bodies of emails, web forms, and internal spreadsheets are just a few of the places this can occur.

If used properly, however, this messy data can be highly valuable. Here, again, are some common problems as well as potential solutions to help unlock that value.

Problem 1:

Often, email addresses are either incomplete or have obvious typos. Some common examples are “user@gmail”, “user@gmail.con”, and “user@gmial.com”. You might be tempted to throw these entries out as junk, but that would mean getting rid of potentially valuable data. What if one of those leads could have been a $100k deal?

Solutions:

  • Excel: Manually use the find and replace tool for each common typo every time you want to clean the data.
  • Parabola: Use a “Find/Replace” object with the applicable rules filled out.
  • Python: Write a custom “for” loop that uses the replace() method of the string object.

Problem 2:

Notes fields may seem difficult to parse in an automated way, but you may not have time to go through each one manually.

Solutions:

One approach is to pick a few keywords to search for, and assign a score to each. This enables you to rank your data based on the occurrence of these keywords in the notes field.

  • Excel: Create a new column and use =IF(ISNUMBER(SEARCH([keyword], [cell_to_search])), [score], 0) for each keyword. You can then use =SUM([first_column]:[last_column]) to add them all together.
  • Parabola: Use the “Scoring” object.
  • Python: Write a custom “for” loop that uses the “in” operator to check each rule and add to the score if there’s a match.

Problem 3:

Many forms ask for a contact’s full name (rather than first and last). This can help streamline data entry and improve form completion rates. However, it can cause problems when you want to use merge tags to send custom emails. “Hi John Smith” doesn’t feel as personal as “Hi John”.

Solutions:

The best approach is to figure out how to split names into first and last, but don’t forget to account for middle names, prefixes, suffixes, etc.

  • Excel: Use the text-to-columns tool to split after every space and then manually clean up incorrect splits due to middle names, initials, etc.
  • Parabola: Use the “Name Parser” object.
  • Python: Write a custom “for” loop with a name parsing package.

Preparing To Use A New Data Source

Sometimes, despite your best efforts to clean up your data, a new data source may require differently structured data than you currently have.

Problem 1:

If you want leverage the fantastic technographic data and prospecting capabilities that Datanyze can provide, you may want a “domain” field.

Solutions:

Assuming you have (or can get) email addresses for each lead, all you have to do is split the email addresses into username and domain.

  • Excel: Manually use the text-to-columns tool to split based on the @ symbol.
  • Parabola: Use the “Column Split” object to split based on the @ symbol.
  • Python: Write a custom “for” loop and use the split() function on the string object passing it the @ symbol as the string to split by.

About the Author

Alex Yaseen

Alex is the Co-Founder and CEO of Parabola.io.

Follow on Twitter More Content by Alex Yaseen
Previous Article
How Data Can Improve ABM Account Selection
How Data Can Improve ABM Account Selection

Learn how Engagio uses data to select and tier accounts and set up their reps for Account Based Everything ...

Next Article
How to Divide Sales Territories Using Technographics
How to Divide Sales Territories Using Technographics

Learn how to create and manage modern SaaS sales territories using technographics, which allow you to segme...

×

Get new posts sent to your inbox!

Great success!
Error - something went wrong!