Duplicate Detection

Overview

Enrollment Manager addresses duplicates in two ways. First, a well-defined matching process runs whenever records are imported into the database. Second, users with appropriate security roles can create Duplicate Detection Rules and use them to run Duplicate Detection Jobs. These jobs can be applied to a defined set of records to identify particular types of duplicates that may exist.

Duplicate Detection within the Import Process

The Enrollment Manager import process includes a well-defined, multi-step duplicate detection process in which an appropriate set of fields is used to determine whether each inbound Person record matches a record already in the system. Details on this matching process can be found in the Standard Import documentation.
 
Duplicates discovered during imports

Each time you upload data to Enrollment Manager, the import process uses search/match criteria and may find that duplicates already exist in the system. These records are identified and sent to your administrator in a Daily Import Summary email. If you would like to receive this email, contact RuffaloCODY Support.

Duplicate Detection within Enrollment Manager

Users with appropriate security roles have access to a robust duplicate detection process within Enrollment Manager. This process allows authorized users to create Duplicate Detection Rules and run Duplicate Detection Jobs to find and resolve duplicates within Enrollment Manager. The functions related to duplicate detection within Enrollment Manager are:
  • Advanced Find - Used to specify the group of records that the system will check for duplicates
  • Duplicate Detection Rules - The criteria the system uses to determine if two records are duplicates
  • Duplicate Detection Jobs - Run published Duplicate Detection Rules against the records returned by a specific Advanced Find. Users can create jobs and then view and resolve their results.

Duplicate Detection Jobs are governed by Duplicate Detection Rules, which specify how the system should determine potential duplicates. These Rules are created and published within Enrollment Manager using base Microsoft Dynamics CRM functionality. When a Duplicate Detection Rule is published, a matchcode is created for each existing record. After that, matchcodes for newly added or updated records are created every five minutes.
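To make the matchcode idea more concrete, the sketch below builds a simple code from the fields a rule compares and groups records that share the same code. It is only an illustration of the concept: the real matchcode format is internal to Microsoft Dynamics CRM, and the function names, field names, and normalization used here (trimming and lowercasing) are assumptions made for the example.

```python
from collections import defaultdict


def make_matchcode(record, fields):
    """Build an illustrative matchcode from the fields a rule compares.

    Assumption: values are trimmed and lowercased before comparison.
    The actual Dynamics CRM matchcode format is not shown here.
    """
    parts = [str(record.get(field) or "").strip().lower() for field in fields]
    return "|".join(parts)


def group_by_matchcode(records, fields):
    """Return {matchcode: [record ids]} for codes shared by 2+ records."""
    groups = defaultdict(list)
    for record in records:
        groups[make_matchcode(record, fields)].append(record["id"])
    return {code: ids for code, ids in groups.items() if len(ids) > 1}


# Hypothetical Person records used only for this example.
persons = [
    {"id": 1, "last_name": "Smith", "email": "J@example.edu"},
    {"id": 2, "last_name": " smith", "email": "j@example.edu"},
    {"id": 3, "last_name": "Jones", "email": "k@example.edu"},
]

potential_duplicates = group_by_matchcode(persons, ["last_name", "email"])
print(potential_duplicates)
```

In this sketch, records 1 and 2 produce the same code and are reported together, while record 3 stands alone and is not reported.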

Users can create and run Duplicate Detection Jobs as frequently as desired. Each job runs as a background process. Jobs can also be run on a schedule. For example, a user can schedule a job to run at midnight every day.

When a Duplicate Detection Job completes, users can receive an email notification, evaluate the results, and merge duplicate records.
 
Duplicate Person records can be merged by selecting the "Merge Duplicates" option. Opportunity records can be merged using the Opportunity Merge function.
 

Duplicate Detection and Overall Performance

Duplicate Detection Rules, by their nature, consume significant system resources, not only while they are running but also in the storage and manipulation of their results. Unprocessed duplicate detection results remain in an Enrollment Manager table that is not visible on the user side. This "results" table can become very large and, when it does, can degrade performance.

Performance can be improved by removing the records in this results table once they are no longer needed. RuffaloCODY staff can perform this table clean-up using tools that do not delete the Duplicate Detection Rule that generated the duplicate records. To request this service, send a support request to emsupport@ruffalocody.com and ask us to examine the results table and reduce its size if required.
 
Clients can also clear these results themselves by deleting the Duplicate Detection Rule associated with the records. However, neither unpublishing the rule nor deleting the Duplicate Detection Job clears the table; the rule itself must be deleted. If you have a complex rule, consider taking a screenshot before deleting it to make it easier to re-create.

The best way to prevent the growth of the results table is to be very careful about which records are submitted to a Duplicate Detection Job. Consider a Duplicate Detection Rule that matches on email address. If the selection includes Persons with blank email addresses, each of those records will match every other record with a blank email address in the system, so the number of reported matches grows very quickly with the number of blank records. This will increase the size of the results table dramatically and could seriously impact performance.
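The arithmetic behind this growth can be sketched in a few lines. If n records share the same value in the matched field, they form n × (n − 1) / 2 distinct pairs. The example below is a simplified model, not the product's actual matching logic: it assumes an exact-match rule on a single email field, treats missing and empty values as the same blank value, and uses hypothetical sample data.

```python
from collections import Counter


def count_candidate_pairs(emails):
    """Count record pairs that share the same email value.

    Assumption for this sketch: None and "" both count as a blank
    value, and blank values match one another.
    """
    groups = Counter(email or "" for email in emails)
    # n records with the same value form n * (n - 1) / 2 distinct pairs.
    return sum(n * (n - 1) // 2 for n in groups.values())


# Hypothetical selection: one genuine duplicate plus 1,000 blank emails.
emails_with_blanks = ["a@example.edu", "a@example.edu", "b@example.edu"] + [""] * 1000
emails_without_blanks = [email for email in emails_with_blanks if email]

pairs_with_blanks = count_candidate_pairs(emails_with_blanks)
pairs_without_blanks = count_candidate_pairs(emails_without_blanks)

print(pairs_with_blanks)     # 499501
print(pairs_without_blanks)  # 1
```

In this model, 1,000 blank records contribute 499,500 pairs on their own, while the same selection without blanks yields only the one genuine duplicate pair.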

This exclusion cannot be made in the Duplicate Detection Rule itself; it can only be made in the job that is set up or in the Advanced Find it uses. Two good rules to follow are: 1) always exclude records that have nulls in the detection criteria, such as email address, and 2) from time to time, have the results table examined and reduced if necessary.
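In Enrollment Manager this exclusion is expressed as criteria in the Advanced Find (for example, a condition requiring that the email field contains data). The sketch below only illustrates the filtering logic in Python; the function names, record layout, and field names are hypothetical and are not part of the product.

```python
def has_detection_values(record, fields):
    """Return True when every detection field holds a non-blank value."""
    return all(str(record.get(field) or "").strip() for field in fields)


def select_for_job(records, fields):
    """Keep only records that are safe to submit to a detection job."""
    return [record for record in records if has_detection_values(record, fields)]


# Hypothetical Person records, including several kinds of "blank" email.
persons = [
    {"id": 1, "email": "a@example.edu"},
    {"id": 2, "email": ""},
    {"id": 3, "email": None},
    {"id": 4, "email": "   "},
    {"id": 5},
    {"id": 6, "email": "b@example.edu"},
]

selected = select_for_job(persons, ["email"])
selected_ids = [record["id"] for record in selected]
print(selected_ids)  # [1, 6]
```

Only records 1 and 6 remain, so empty, missing, and whitespace-only email values never reach the job.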

We recommend allowing RuffaloCODY to perform the data cleansing function so that your staff's time can be put to other uses at your institution. It is also beneficial to have RuffaloCODY staff design and run the Duplicate Detection process. We can apply best practices not only to address this issue, but also to make Duplicate Detection as efficient and effective as possible.