2 Steps to Clean Data and Import it into CRM ⭐

Imports are tough when your data is messy and full of inconsistencies. Clean your data first and import it into the CRM on the first attempt.

Cleaning Customer Data in 2 Steps Before Importing It into CRM

Just as you need to organize your desk before you start working, you need to clean up your data before importing them into the HubSpot CRM system. You know the old saying: “garbage in, garbage out.” The better the data, the better the system.

What are the most common problems? You might recognize some of these situations:

  • A third of the names aren’t properly capitalized.
  • Some fields are missing entirely from some records.
  • Addresses are in different formats, and it’s difficult to parse them together.
  • The whole dataset is rather messy and inconsistent.

Inconsistencies in data are a reality of data collection, mainly when you're relying on customers to enter the data by hand on their own.

Why should you care about data quality?

According to Insycle research, almost 70% of respondents spend at least one hour cleaning their data sets before each import. A third of them admit it takes more than 4 hours. Most of their effort goes into updating the data, merging fields, unifying formats, and weeding out duplicates. Most of these tasks are routine and can be easily prevented. But by the time you receive the file to import, it’s too late: you have to roll up your sleeves and start digging.

First step: Clean up your data before import

Let’s get straight to cleaning up your data. The following steps will ensure that your data is clean, meaning it’s free of inconsistencies and, most importantly, ready for import.

1. Fix formatting

Keep it simple, as the masters say. Your CRM is most useful when all values in a given field share a single, uniform format across all your records. Only then can you effortlessly filter and search your database.

💭 Does your database have thousands of rows and many columns? Then you certainly don’t want to fix the data manually. The most straightforward trick is to use CTRL+H in Google Sheets or MS Excel to replace characters, for example double spaces with a single space. For more complex edits, you’ll need regular expressions and text functions that can, e.g., put names into proper case. And if that isn’t sufficient, there are always advanced data tools. Ask your developer for advice before rolling up your own sleeves.

Here are some common areas that will probably need a little touch:

  • Names – proper case for first and last names

Improper capitalization is probably the most common error in all datasets. A standard way to organize names is to have separate columns for the first and last names. The case is essential, as no one wants to see their name written wrong.

You want to be addressed as “Bob” and not “bob” in the follow-up email you’ll receive, right? Such an error could hurt both conversion rates and company reputation because you definitely don’t want your brand communication to look like a reckless robot wrote it.
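If a developer is helping you, the capitalization fix can be scripted. The following is a minimal Python sketch, not a complete solution: the function name `proper_case` is our own invention, and it assumes simple names. Names such as “McDonald” or “van der Berg” need their own rules and would come out as “Mcdonald” and “Van Der Berg” here.

```python
import re

def proper_case(name: str) -> str:
    """Capitalize every part of a name, including parts after hyphens and apostrophes."""
    name = " ".join(name.split())  # trim the ends and collapse repeated spaces
    # Each run of letters gets an upper-case first letter and lower-case rest.
    return re.sub(r"[^\W\d_]+", lambda m: m.group(0).capitalize(), name)

print(proper_case("bob"))                   # Bob
print(proper_case("  mary-ANN   o'neil "))  # Mary-Ann O'Neil
```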

  • Phone numbers

There are many ways to format a phone number. It can have 9 or 12 digits and start with a +, a double zero, or no country prefix at all. It can be written with spaces, with dashes, or with no separators.

  • +1 234 567 890
  • 001234567890
  • 123-456-7890
  • 5555555555

The proper formatting makes a huge difference – it will ensure that the phone numbers are compatible with any system that uses them, and it will make things easier for your teams that routinely contact customers through the CRM.
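One way to unify the variants listed above is to reduce every number to a plus sign followed by digits. The sketch below is only an illustration under stated assumptions: `normalize_phone` is a hypothetical helper, and the default country code of "1" is our guess for numbers that come without a prefix. It does not check whether the result is a valid number.

```python
import re

def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Return the number as '+' followed by digits only."""
    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # drop spaces, dashes, brackets, etc.
    if text.startswith("+"):
        return "+" + digits
    if digits.startswith("00"):       # international prefix written as 00
        return "+" + digits[2:]
    # No prefix at all: assume the default country code (an assumption!)
    return "+" + default_country_code + digits

for sample in ["+1 234 567 890", "001234567890", "123-456-7890", "5555555555"]:
    print(sample, "->", normalize_phone(sample))
```

Pick the default country code that matches where most of your customers are, and review numbers without a prefix by hand if your database is international.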

  • Mailing addresses

Keeping your mailing address format correct is challenging yet critical. Remember that mailing customers or employees without crystal clear data can waste not only your time but also your budget. Always try to obtain structured data about the street, house number, city, zip code, and country, as it is much easier to merge two values than to separate them.

  • Email addresses

Check that the email addresses have the proper “name@domain.xyz” format. Formatting and case errors are easy to spot and easy to deal with. Even so, you should validate the values entered in this field before they are saved to your CRM system, as it may be impossible to reconstruct them afterwards.

💡 The e-mail address is the master key to deduplication. HubSpot and many other platforms use e-mail for deduplication. Before the import, make sure each Contact has a unique e-mail address and each Company has a unique domain.
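A basic format check like the one described above can be sketched in a few lines of Python. This is a deliberately loose check, not a full validator: the pattern and the helper name `clean_email` are our own assumptions, and the code only confirms the “name@domain.xyz” shape, not that the mailbox exists. It also lower-cases the whole address, which is a common convention for matching duplicates.

```python
import re
from typing import Optional

# Loose shape check: something@something.something, with no spaces or extra @
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s.]+")

def clean_email(raw: str) -> Optional[str]:
    """Return a trimmed, lower-cased address, or None if the shape is wrong."""
    email = raw.strip().lower()
    return email if EMAIL_RE.fullmatch(email) else None

print(clean_email(" Bob@Example.COM "))  # bob@example.com
print(clean_email("bob@example"))        # None
```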


💭 Seeking help with data imports? Autoarti has got your back. Contact us and let us help you fix your data and import it into any CRM system.

2. Remove whitespace and unwanted characters

Removing these two villains from your datasets is essential for searching and filtering the data effectively. Both are common problems, and both can seriously undermine the correct use of your data.

  • Whitespace

Extra spaces in a data field are very common. Users accidentally hit the spacebar after entering their data without noticing, or hit it twice between words when they meant to hit it once. Whitespace can cause usability issues in some situations but can be hard to spot without the help of a replacement tool or macro.
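In a script, stray spaces are easy to remove in bulk. Here is a minimal Python sketch; the helper names `squeeze_whitespace` and `clean_row` and the sample column names are made up for illustration.

```python
def squeeze_whitespace(value: str) -> str:
    """Trim both ends and collapse any run of whitespace into one space."""
    return " ".join(value.split())

def clean_row(row: dict) -> dict:
    """Apply the whitespace fix to every text value in a record."""
    return {key: squeeze_whitespace(val) if isinstance(val, str) else val
            for key, val in row.items()}

print(clean_row({"first": " Bob ", "company": "Acme  Inc."}))
# {'first': 'Bob', 'company': 'Acme Inc.'}
```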

  • Unwanted Characters

You’ve seen it before – those unwanted characters that no one invited, yet they’re here. They might look something like this: Ã, ¢, â, ê. In most cases these characters aren't entered manually but are caused by encoding issues that arise when you save, import, or export data from websites.

Sometimes, you’ll have to not only remove the characters but also restore the correct values in the field, which may require reconstructing them from previous versions of the dataset. If you’re removing these characters by hand, they should be relatively easy to spot.
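When the stray characters come from one specific encoding mix-up, the original text can sometimes be recovered instead of just deleted. The Python sketch below assumes the most typical case, UTF-8 text that was mistakenly read as Windows-1252; `repair_mojibake` is a hypothetical helper, and other mix-ups need a different pair of encodings or a dedicated tool.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252.

    If the text does not fit that pattern, it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

# Reproduce the problem on purpose, then repair it.
broken = "café".encode("utf-8").decode("cp1252")
print(broken)                   # the é shows up as two stray characters
print(repair_mojibake(broken))  # café
print(repair_mojibake("Bob"))   # Bob (clean text is left alone)
```

Always run such a repair on a copy of the data and compare a sample by eye before overwriting anything.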

3. Consolidate and standardize to improve filtering

Data standardization is critical for making the most of any dataset. It’s not uncommon to see multiple fields attempt to describe the same thing in different terms. 

Two prospects listed within your dataset as “Chief Executive Officer” and “CEO” hold the same position but wouldn’t be featured in the same list if you filtered your data by job title – which is a problem. 

Consolidating and standardizing similar fields makes your data easier to search and more useful to your teams.

  • Job titles

This area is one of the most challenging when it comes to data standardization. There are a lot of acronyms for job titles, as shown in the CEO/Chief Executive Officer example above, and the same applies to most other titles.

And then there are various titles for the same position. What's the difference between a Content Marketing Manager and a Content Manager? Or a Chief Marketing Officer and Marketing Director?

Consolidating and standardizing similar job titles will make things easier for your marketing, sales, and customer service teams.
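A simple lookup table is often enough to start standardizing titles. The Python sketch below is an illustration only: the `TITLE_ALIASES` table and the `standardize_title` helper are assumptions of ours, and which titles you treat as equivalent is a business decision, not something code can decide for you.

```python
# Hypothetical alias table: keys are lower-case variants, values are the
# one spelling you want to keep in the CRM.
TITLE_ALIASES = {
    "ceo": "Chief Executive Officer",
    "chief executive officer": "Chief Executive Officer",
    "cmo": "Chief Marketing Officer",
    "chief marketing officer": "Chief Marketing Officer",
}

def standardize_title(raw: str) -> str:
    """Map known variants to one canonical title; leave unknown titles as they are."""
    tidy = " ".join(raw.split())
    key = tidy.lower().replace(".", "")  # so that "C.E.O." matches "ceo"
    return TITLE_ALIASES.get(key, tidy)

print(standardize_title("CEO"))               # Chief Executive Officer
print(standardize_title("Content  Manager"))  # Content Manager
```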

  • Industry
Companies might mark themselves as part of the “tech,” “software,” or “SaaS” industry. The question is how you want to categorize industries and what your users will appreciate the most. Clean data will help your support, sales, and marketing teams deliver a better customer experience.

  • Company associations

This is very common in all CRM systems. After some time, you notice that many of your contacts and companies are disconnected. It affects you in two ways: you can’t find any contact person at a company, or you see a bunch of contacts without knowing where they work.

Failing to keep these connections correct makes it difficult to find the right person within a company and hampers your personalization efforts.
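One possible way to reconnect contacts and companies before the import is to match the domain of the contact's e-mail address against the company's domain. The Python sketch below assumes that rule, plus hypothetical field names (`email`, `domain`, `name`); your CRM may associate records differently, so treat it as a starting point.

```python
def associate_contacts(contacts, companies):
    """Link contacts to companies by e-mail domain; return (linked, orphans)."""
    by_domain = {c["domain"].lower(): c["name"] for c in companies}
    linked, orphans = [], []
    for contact in contacts:
        domain = contact["email"].rsplit("@", 1)[-1].lower()
        company = by_domain.get(domain)
        if company:
            linked.append({**contact, "company": company})
        else:
            orphans.append(contact)  # needs a manual look
    return linked, orphans

linked, orphans = associate_contacts(
    [{"email": "ann@acme.com"}, {"email": "bob@unknown.org"}],
    [{"name": "Acme", "domain": "acme.com"}],
)
print(linked)
print(orphans)
```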

4. Remove extraneous contacts

In this step of the cleansing process, we’ll move away from the errors and issues within your dataset and focus on some strategies that can help you clean your data to lower costs and improve effectiveness.

Companies that store a lot of data and collect it over long periods sooner or later notice that some percentage of their data will age out or become less valuable. Plus, the reality is that prospect and customer databases double in size every 12–18 months on average.

There's no reason to keep useless data sitting in your datasets, as it will only drag you down. Keeping your database clean and effective will help you lower the costs of data storage and marketing campaigns and save your employees a great deal of time in the long run.

When looking for outdated or unhelpful entries to remove, there are a few areas you should start with:

  • Remove contacts that bounce

You don’t want to continually try to reach out to someone who is never going to reply. Worse, a high bounce rate hurts your standing with email providers and reduces the deliverability of your email marketing materials.

The same can be said for disconnected or incorrect phone numbers — reaching out and never receiving a reply is a drain on your resources and budget.

  • Remove contacts from free email domains (Gmail/Hotmail)

You have to be careful with this one, as some of your subscribers will use free domain addresses instead of business domain email addresses when engaging with your company. You don’t want to remove perfectly good leads from a list, but a significant percentage of contacts with free email domain addresses might be useless.

Many people use “throwaway” emails to sign up for marketing purposes to avoid cluttering their necessary business email addresses. These people are unlikely to actively interact with your content, and it may be wise to take them off your list. And some of them might be fakes.
Use your analytics to determine which of the contacts with addresses at free email providers are legit. They'll be the ones that open your emails and interact with your content regularly.
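The rule above, keeping free-domain contacts only if they actually engage, can be sketched in Python. Everything specific here is an assumption: the `FREE_DOMAINS` set is just a short sample, the `opens` field stands in for whatever engagement metric your analytics provide, and `split_free_domain_contacts` is a made-up helper. It sorts contacts into two piles rather than deleting anything.

```python
FREE_DOMAINS = {"gmail.com", "hotmail.com"}  # extend with other free providers

def split_free_domain_contacts(contacts, min_opens=1):
    """Return (keep, review): free-domain contacts with too few opens go to review."""
    keep, review = [], []
    for contact in contacts:
        domain = contact["email"].rsplit("@", 1)[-1].lower()
        if domain in FREE_DOMAINS and contact.get("opens", 0) < min_opens:
            review.append(contact)
        else:
            keep.append(contact)
    return keep, review

keep, review = split_free_domain_contacts([
    {"email": "a@gmail.com", "opens": 0},
    {"email": "b@gmail.com", "opens": 5},
    {"email": "c@acme.com", "opens": 0},
])
print(len(keep), len(review))  # 2 1
```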

  • Remove contacts that have unsubscribed

Under the CAN-SPAM Act of 2003, it's illegal for companies to continue sending marketing materials to prospects who have unsubscribed from their mailing list. Problems may arise when companies have their data separated into different platforms.

A quick import of outdated data that includes unsubscribed individuals can quickly result in some pretty severe violations of the CAN-SPAM Act. You should always make sure that the dataset you're using to send out your marketing communications uses your most up-to-date subscriber information.
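A safe habit is to filter every import file against your current unsubscribe list before it goes anywhere near the CRM. A minimal Python sketch, assuming you can export that list as plain e-mail addresses (`drop_unsubscribed` and the `email` field name are hypothetical):

```python
def drop_unsubscribed(rows, unsubscribed_emails):
    """Remove every row whose e-mail is on the unsubscribe list."""
    blocked = {email.strip().lower() for email in unsubscribed_emails}
    return [row for row in rows if row["email"].strip().lower() not in blocked]

rows = [{"email": "ann@example.com"}, {"email": "Bob@Example.com "}]
print(drop_unsubscribed(rows, ["bob@example.com"]))
# [{'email': 'ann@example.com'}]
```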

  • Remove contacts that haven’t engaged recently

If you are sending them messages and they don't engage with those materials, there is no point in letting these records take up space.

You don’t want to continue sending emails to someone that subscribed to your mailing list seven years ago and hasn't engaged with a new email in the last four years. Let’s face it; your emails are likely hitting their “spam” folder anyway.

To prevent your database from bloating, be sure to have a strategy to remove disengaged contacts periodically and automatically.
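The periodic clean-up could look roughly like the Python sketch below. The one-year threshold, the `last_engaged` field, and the helper name `split_by_engagement` are all assumptions for illustration; choose a cutoff that fits your sales cycle.

```python
from datetime import date, timedelta

def split_by_engagement(contacts, today, max_idle_days=365):
    """Return (active, stale) based on the date of the last engagement."""
    cutoff = today - timedelta(days=max_idle_days)
    active, stale = [], []
    for contact in contacts:
        last = contact.get("last_engaged")  # a date, or None if never engaged
        if last is not None and last >= cutoff:
            active.append(contact)
        else:
            stale.append(contact)
    return active, stale

active, stale = split_by_engagement(
    [{"email": "ann@example.com", "last_engaged": date(2023, 6, 1)},
     {"email": "bob@example.com", "last_engaged": date(2020, 1, 1)}],
    today=date(2024, 1, 1),
)
print(len(active), len(stale))  # 1 1
```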

  • Remove duplicates

Whenever you import data into a CRM or transfer data between platforms, there's a high probability that you duplicate some records. When your teams update two different entries for the same record, you quickly lose sight of which one is the most up-to-date.

💡 Systems like HubSpot CRM have automatic deduplication built in. HubSpot deduplicates records by email address, company domain name, and more. Even so, you should import datasets that are as clean as possible.
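If you want to deduplicate the file yourself before importing, one simple rule is to keep the most recently updated row for each e-mail address. The Python sketch below assumes exactly that rule and an `updated` column holding ISO dates (YYYY-MM-DD), which sort correctly as plain text; `dedupe_by_email` is a hypothetical helper, not a CRM function.

```python
def dedupe_by_email(rows):
    """Keep only the most recently updated row for each e-mail address."""
    best = {}
    for row in rows:
        key = row["email"].strip().lower()
        if key not in best or row["updated"] > best[key]["updated"]:
            best[key] = row
    return list(best.values())

rows = [
    {"email": "Bob@Example.com", "updated": "2023-01-01", "phone": "111"},
    {"email": "bob@example.com", "updated": "2024-01-01", "phone": "222"},
    {"email": "ann@example.com", "updated": "2022-05-05", "phone": "333"},
]
for row in dedupe_by_email(rows):
    print(row["email"], row["phone"])
```

Note that this throws away the older row entirely. If the older row holds values the newer one lacks, merge them instead of dropping them.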

  • Remove redundant legacy fields

Multiple fields may contain similar information. For example, someone created a form and added a new field to capture “Company team size,” but such a field already exists in the system as “Company size.” Now you have values in multiple fields. Wouldn’t it be nice to consolidate them all into the right field and eliminate the redundancies?
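Using the two field names from the example above, the consolidation could be sketched in Python like this. The helper `merge_legacy_field` is hypothetical, and the rule it applies (an existing value in the target field wins) is our assumption; you may prefer the opposite.

```python
def merge_legacy_field(row, legacy="Company team size", target="Company size"):
    """Move the legacy value into the target field unless the target is already filled."""
    merged = dict(row)                  # leave the original row untouched
    old_value = merged.pop(legacy, "")  # remove the redundant column
    if not merged.get(target):
        merged[target] = old_value
    return merged

print(merge_legacy_field({"Name": "Acme", "Company team size": "50", "Company size": ""}))
# {'Name': 'Acme', 'Company size': '50'}
```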

💡 Check out Insycle, a data-cleaning tool that integrates with HubSpot. Errors and unnecessary work mean a lot of time spent identifying issues and correcting them by hand. Insycle is a painkiller for these headaches. It helps HubSpot users clean data quickly and effectively with a range of powerful tools.


When you are done fixing the data, go through all the problems you encountered and try to find the real origin of these issues. Run a deep analysis if needed, and fix as many as you can of the sources that deliver faulty formats, wrong values, or inconsistent labels.

This is a continuous process that should save you from fixing the same mistakes repeatedly. Adopting the same rules for text fields and format checks increases data quality and saves you time.

Instead of manually correcting every line in the database or CSV file, spend a few minutes improving your templates and forms. Add required fields if needed, and add format checks and other smart checks to ensure the data is correct and trustworthy. Also, keep free-form text fields to an absolute minimum.

Second step: Format your data for import 

To import contacts, companies, deals, tickets, or products into your CRM, put the data into a single file that is formatted correctly for your system. The ideal choice is a CSV file in UTF-8 encoding, which supports virtually all non-English characters and, being a Unicode encoding, even emojis.
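If your cleaned data lives in a script rather than a spreadsheet, Python's standard `csv` module can write such a file. The sketch below is a minimal example; the helper name `write_import_file`, the file name, and the column names are placeholders, so use the headers your own CRM expects.

```python
import csv
import tempfile
from pathlib import Path

def write_import_file(path, rows, fieldnames):
    """Write rows to a UTF-8 encoded CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

rows = [{"First Name": "Zoë", "Last Name": "Novák", "Email": "zoe@example.com"}]

# A temporary folder keeps this demo from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "contacts_import.csv"
    write_import_file(target, rows, ["First Name", "Last Name", "Email"])
    print(target.read_text(encoding="utf-8"))
```

Passing `encoding="utf-8"` explicitly matters because the default encoding differs between operating systems.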

💡 Export before import. Another easy way is to export a few contacts to get the correct structure and format, replace the old records with your new ones, and import the file back.

Check that your import file:

  • Includes a header row
  • Contains only records with all required fields and as many recommended fields as possible.
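Both checks can be automated before you upload anything. The Python sketch below is one possible approach: `check_import_rows` is a hypothetical helper, the required column names are examples, and the reported line numbers assume that no field contains a line break.

```python
import csv
import io

def check_import_rows(csv_text, required):
    """Return a list of problems: missing columns and empty required values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for name in required:
            if name in header and not (row.get(name) or "").strip():
                problems.append(f"line {line_no}: empty {name}")
    return problems

sample = "Email,First Name\nbob@example.com,Bob\n,Ann\n"
print(check_import_rows(sample, ["Email", "First Name"]))
# ['line 3: empty Email']
```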

💡 If the column header does not correspond to any property in your CRM, you might be prompted to create a custom property for it.

💡If the uploaded file contains new values for existing records, the existing information will be overwritten by freshly imported values. Keep the cell blank if you don’t want to overwrite your existing data. It also depends on the capabilities of your CRM system, but the best practice is to use unique record IDs for deduplication.

Final thoughts

Most companies identify inconsistent data as the main obstacle on their way towards increasing ROI. Faulty data creates disruptions throughout your company and holds back any efforts for growth. Not all issues can be prevented, so you should emphasize to users of the CRM and other systems how important data quality is. Make sure that everyone double-checks their entries and cares about unification and standardization.


In a nutshell

Clean data is the first step towards a happy life. Start by fixing the raw data in a spreadsheet or database: remove unwanted characters, standardize terminology, and categorize records properly. The most common errors are wrongly formatted email addresses and phone numbers and missing capitalization in contacts’ names.

When you are done with this tedious work, it’s showtime! Importing data to your CRM system often requires practice but shouldn’t take you too long to master. Prepare the import file by exporting a few records out of the CRM first, as its structure will help you stay on the right track.

To keep your data clean next time, focus on how you gather it in the first place. Any manual entry can cause trouble, especially if there is no validation checking the formats.


And if you still feel lost in the data jungle, Autoarti is here to help you get on the right track.


Schedule a meeting with us



Sources: Zapier, Trujay, HubSpot
