Managing Duplicate Customer Records

Customer Key Management and Duplicate Customer Records

In Connect, customer records may be identified by email, SMS number, or alternate customer key. Since each Connect deployment has different instance requirements, previous versions of Connect did not enforce a unique constraint on customer records. By not enforcing a uniqueness constraint, Connect allows any customer key management scheme to be used.

Brick Street has recently become aware of a scenario where incoming messages to the Event Loader can result in the creation of duplicate customer records. This flaw has been present since the earliest versions of Connect. To deal with this issue, Connect 10r5 contains tools for detecting and merging duplicate customer records, and support for adding database constraints to prevent new duplicate records from occurring.

Duplicate customer records are usually not customer-visible, since customers are not aware of their customer primary key in the Connect database. However, duplicate customer records can potentially skew cross-campaign results, and can complicate opt-out processing.

Identifying Duplicate Customer Records

The best way to detect duplicate customer records depends on which customer fields are used as unique keys. The table below describes how to detect duplicates in each case. The second column gives a query to detect customer keys with duplicate records. The third column gives a query to determine the duplicate records based on the detection query in column two.

Customer Key Field(s) Query to Detect Duplicates Query to Identify Duplicates
ALT_CUSTOMER_ID
SELECT COMPANY_ID, ALT_CUSTOMER_ID, count(*)
FROM CUSTOMER_MASTER
GROUP BY COMPANY_ID, ALT_CUSTOMER_ID
HAVING count(*) > 1
      
SELECT * 
FROM CUSTOMER_MASTER C
JOIN (
    SELECT COMPANY_ID, ALT_CUSTOMER_ID, count(*)
    FROM CUSTOMER_MASTER
    GROUP BY COMPANY_ID, ALT_CUSTOMER_ID
    HAVING count(*) > 1
) DUPS      
ON C.COMPANY_ID = DUPS.COMPANY_ID
AND C.ALT_CUSTOMER_ID = DUPS.ALT_CUSTOMER_ID
      
ACTIVE_EMAIL_ADDRESS
SELECT COMPANY_ID, EMAIL_LOWER, count(*)
FROM CUSTOMER_MASTER
GROUP BY COMPANY_ID, EMAIL_LOWER
HAVING count(*) > 1
      
SELECT * 
FROM CUSTOMER_MASTER C
JOIN (
    SELECT COMPANY_ID, EMAIL_LOWER, count(*)
    FROM CUSTOMER_MASTER
    GROUP BY COMPANY_ID, EMAIL_LOWER
    HAVING count(*) > 1
) DUPS      
ON C.COMPANY_ID = DUPS.COMPANY_ID
AND C.EMAIL_LOWER = DUPS.EMAIL_LOWER
      
ALT_CUSTOMER_ID, ACTIVE_EMAIL_ADDRESS
SELECT COMPANY_ID, ALT_CUSTOMER_ID, EMAIL_LOWER, count(*)
FROM CUSTOMER_MASTER
GROUP BY COMPANY_ID, ALT_CUSTOMER_ID, EMAIL_LOWER
HAVING count(*) > 1
      
SELECT * 
FROM CUSTOMER_MASTER C
JOIN (
    SELECT COMPANY_ID, ALT_CUSTOMER_ID, EMAIL_LOWER, count(*)
    FROM CUSTOMER_MASTER
    GROUP BY COMPANY_ID, ALT_CUSTOMER_ID, EMAIL_LOWER
    HAVING count(*) > 1
) DUPS      
ON C.COMPANY_ID = DUPS.COMPANY_ID
AND C.ALT_CUSTOMER_ID = DUPS.ALT_CUSTOMER_ID
AND C.EMAIL_LOWER = DUPS.EMAIL_LOWER
      
Note: EMAIL_LOWER contains the lowercase value of ACTIVE_EMAIL_ADDRESS.

Merging Duplicate Customer Records

If duplicate customer records are identified, they can be merged into a single record. Connect 10r5 provides the p_merge_customer stored procedure to merge customer records. The stored procedure takes two arguments: p_cust_id, the customer key of the record to keep, and p_dup_cust_id, the customer key of the duplicate record. The store procedure merges the data from the duplicate customer record into the customer record that is being retained. In addition, the stored procedure records the merge operation in the CUSTOMER_EQUIVALENCE table.

A "chained merge" is when customer A is merged into customer B, and then customer B is merged into customer C. Chained merges will break the code that uses the CUSTOMER_EQUIVALENCE table, so the stored procedure will throw an error if a chained merge is attempted.

Database Customer Merge Procedure
Oracle
p_cust_mgr.p_merge_customer(p_cust_id IN NUMBER, p_dup_cust_id IN NUMBER)             
             
Note: Defined in p_cust_mgr package.
SQL Server
p_cust_mgr$p_merge_customer( @p_cust_id DECIMAL(18,0), @p_dup_cust_id DECIMAL(18,0) )
             
MySQL
p_cust_mgr$p_merge_customer( p_cust_id DECIMAL(18,0), p_dup_cust_id DECIMAL(18,0) ) 
            

Defining a Unique Index on Customer Records

Once you have verified that there are no duplicate records, you can create a unique index on the CUSTOMER_MASTER table. The file manual_U_CUSTOMER_UNIQUE.sql contains the definition of the U_CUSTOMER_UNIQUE index.

The default implementation uses a unique index on ALT_CUSTOMER_ID and COMPANY_ID. The file also drops an existing non-unique index on ALT_CUSTOMER_ID. We expect that this implementation will work for most customers. Please contact Brick Street Support if you need to discuss this issue.

If the index creation fails due to duplicate records, use the procedures detailed above to merge duplicate records.

Handling Merged Customer Records

When a duplicate customer record is merged, the merged customer key is recorded in the CUSTOMER_EQUIVALENCE table, and it no longer appears in the CUSTOMER_MASTER table.

Any messages that have been previously sent to the duplicate customer key could be the source of future inbound activity, such as opens, clicks, bounces, or replies. Connect handles this by looking in the CUSTOMER_EQUIVALENCE table anytime it find a non-existent customer key. Thus, inbound activity for merged customer records will be automatically associated with the surviving customer key.

Appendix: How Duplicate Customer Records Can Occur

Duplicate customer records can occur multiple KcXML messages are received for a non-existent customer at the same time. A timeline detailing how these records occur is shown below.

Time Main Thread Service Thread 1 (T1) Service Thread 2 (T2)
T1 Messages M1 and M2 received for non-existent customer C1 and dispatched to threads T1 and T2.    
T2   Queries CUSTOMER_MASTER and determines that C1 does not exist. Queries CUSTOMER_MASTER and determines that C1 does not exist.
T3   Creates new CUSTOMER_MASTER record for C1 and processes the event in M1. Creates new CUSTOMER_MASTER record for C1 and processes the event in M2.

Without the U_CUSTOMER_UNIQUE index, both threads will be allowed to create customer records at time T3. When the unique index is present, one thread will succeed and the other will fail with a constraint violation. In previous versions of Connect, the constraint violation will force a restart of the Event Loader process. The process restarts, the second message will be retried and the record for C1 will be updated. This version of Connect has been modified to respond to constraint violation by retrying the message without requiring a process restart.