Skip to main content

Deduplicating your data from imports

Learn to create, match, and update records when importing your data to avoid duplicates in your database.

Updated over a week ago

You only want one copy of each record in Beacon - one record for each person, one for each payment, one for each organisation, and so on. Deduplication helps to make sure that happens by avoiding duplicates when you are importing data.

Sometimes, you want to create new records, sometimes update existing ones in your database, and sometimes both! Choosing your deduplication settings ensures a smooth flow of data into Beacon.


Choosing your deduplication field

To avoid creating a new record from an import when one might exist already, you'll need to tell Beacon which field to use to check for a match.

The field types you can choose from will be one of the following:

  • Record ID

  • Email address

  • Phone number

  • Short text

  • Number

  • URL

You will only be able to choose an option from the columns that you've mapped to fields in Beacon (the blue ones columns!). If you haven't mapped it, it won't show up as an option for deduplication.

The option that you choose should have unique values; no two records should share the same value in that column.

When choosing the deduplication field for your import template, it's good to ask a simple question:

"How would Beacon know if this record already exists in my database?"

Your answer might be something like...

"Because the person's email would match"

or

"The event's name would be the same"

This is the field you should use as your deduplication setting for that record!

Some good examples of fields to use are:

  • A person's email address

  • A person's phone number

  • An organisation's name

  • A payment's payment ID

  • A fund's code

  • A campaign's name

You should choose a field to deduplicate for every record type that will be created from your import. For example, if you're importing Payments made by People, you would need two - a field on each - to deduplicate them by (as one person might have multiple payments).

A word about names

A name can be a useful way to deduplicate and match with records, but it can't always be used.

We don't allow the name field on People to be used for deduplication, as a person's name isn't unique; there are plenty of people who share the same name (think of all those John Smiths!), and we wouldn't want to roll them all into one.

Conversely, we do allow an Organisation's name to be used, as we wouldn't expect two organisations to share the same name.

This is the same for Events, but it's good practice to make sure that if you have any recurring events, then you make your names unique within Beacon by using months or years:

  • Winter Run 2023

  • Christmas Appeal 2017


What happens when an existing record is found?

By default, we will update the existing record in Beacon with the data you're importing instead of creating a new entry with those details.

You can amend these settings for each record type to one of three different behaviours:

  • Create or update (default)
    Check for an existing record: If one exists, update it. If not, create a new one.

  • Only create
    Check for an existing record: If one exists, do nothing. If not, create a new one.

  • Only update
    Check for an existing record: If one exists, update it. If not, do nothing.


Frequently Asked Questions

When I try to import, I get a warning about empty deduplication values found; What does this mean?

At least one row in your spreadsheet is blank in the column that you want to use for deduplication. Those rows won't be checked for duplicates, so they will always create new records.

What you can do:

  • Choose a new column to deduplicate by; one which you're sure is unique and has a value in every row.

  • Create a pseudo ID in your spreadsheet for those rows. More here

  • If the number of rows is low and you're ok with those records being duplicated then you can continue the import!

Tip: Make sure to review your spreadsheet to understand why those rows are missing values. If the number of rows with empty deduplication values is higher than 1% of the total rows, then you may want to adjust your settings.

When I try to import, I get a warning about duplicate values found.

At least two rows in your spreadsheet have the same value in the column you've set as your deduplication field (for example, you're deduplicating people by their email address, and more than one person in your sheet has the same email). This will cause earlier rows to be overwritten by any later rows that have the same value during import.

What you can do to ensure your import runs correctly:

  • Make sure that every value in your chosen deduplication column is unique.

  • You can continue if you don't mind the earlier rows being overwritten.

What if I don't have something to deduplicate by?

You should always have deduplication values set. If you don't have something unique in your data to deduplicate by, here are some options:

  1. Add a column to your spreadsheet for 'Record ID', look up the record ID in Beacon for each record you're importing, and add these IDs to your new column in your spreadsheet. You can then use that for deduplication. This is only good for small datasets, as otherwise, it'd take a long time!

  2. Create an ID to use as deduplication. If you don't have a unique ID in your data, you can usually create something close to unique by combining data manually or using an Excel formula like CONCATENATE or TEXTJOIN. You could do this in your spreadsheet, but it's recommended that you use a smart column! Here are some examples:

    1. For People, "John Smith" might not be unique, but "John [email protected] 3886 0205-1 New Street" is quite likely to be unique. Try combining as many data bits of data as possible.

    2. For Payments, try combining the payer's name, payment date, and amount to make a pseudo-ID in a new column and mapping to 'External ID' (e.g. JohnSmith-10/03/2020-55)

    3. For Event attendees, try combining the attendee's name, the event name, and the date of the event. Again, the more values you combine, the more likely it is to be unique; After all, two John Smiths could come to the same event.

Warning: We don’t recommend importing without deduplication settings, and we can’t guarantee accurate data if you choose to do so. You should take the time to set deduplication settings, using the guidance above if needed, and make sure you receive no warnings before importing.

Did this answer your question?