Outdated: Stage II jobs

Overnight Execution of Extract Process and other features 

Last Updated 6/9/2010

The extract process for copying data from the transactional system into the warehouse runs on a once-daily schedule, and three related features (the Standard Export, the Advanced Export, and the Descriptive Analytics) run with it on the same schedule.

This version-specific document describes the extract from the CRM 4.0 transactional system into the Stage II warehouse and associated features.

Versioning note

This page documents the jobs which run against the Stage II warehouse. The current production version is Stage IIa, but except as noted on the Stage IIa page the jobs are as described on this page.

  1. Overnight Execution of Extract Process and other features
  2. Versioning note
  3. Related Documents
  4. Architecture
  5. Transactional system
  6. Warehouse
    6.1. Warehouse tables
    6.2. Other Extract Process objects
  7. Jobs
    7.1. Job "Extract 1 – Base Records – Org"
      7.1.1. 1. Extract - Get entry term data
      7.1.2. 2. Extract - Get opportunity and person data
      7.1.3. 3. Extract - Get fin aid data
      7.1.4. 4. Fail - Send email
    7.2. Job "Extract 2 – Subrecords and Processing – Org"
      7.2.1. 1. Verify - Check that Extract 1 has succeeded
      7.2.2. 2. Extract - Get subrecord data as of midnight
      7.2.3. 3. Transform - Flatten export scores from analytics scores
      7.2.4. 4. Transform - Parse subrecord data for exports
      7.2.5. 5. Transform - Move slow-changing data from export tables to analytic tables
      7.2.6. 6. Transform – Move data into local sync tables
      7.2.7. 7. Cleanup – Remove opportunities with inactive entry terms from export table
      7.2.8. 8. Cleanup - Truncate export sorting tables
      7.2.9. 9. Export - Create CSV file for export
      7.2.10. 10. Export - Create Zip file for export
      7.2.11. 11. Export - Delete csv
      7.2.12. 12. Analyze - Run SPSS task
      7.2.13. 14. Succeed - Send email
      7.2.14. 15. Fail - Send email
      7.2.15. 16. Cancel - Send email
    7.3. Job "Extract 3 – Analytics – Org"

Related Documents

The extract process design is defined in "Extract to the Warehouse".

The structures of the Advanced Export, of the Standard Export, and of the Descriptive Analytics are described in "Warehouse column dictionary".

Architecture

The transactional system is hosted in one SQL Server database per client; the structure of this database is defined by the CRM 4.0 software and by the customizations made to this software by Admissions Lab, which should be identical for all clients. Multiple clients and their databases are hosted together on a single SQL Server instance. The naming convention for each client’s transactional system database is [OrgName]_MSCRM.

The warehouse is hosted in one SQL Server database per client; the structure of this database is defined entirely by Admissions Lab and has an identical base for all clients with optional client-specific additional objects. The warehouse database is hosted on the same SQL Server instance as the client’s transactional system database, and therefore multiple client warehouses are hosted together on a single SQL Server instance. The naming convention for each client’s warehouse database is [OrgName]_Warehouse.

The extract process runs on a schedule by use of SQL Server Agent jobs hosted on the same SQL Server instance as the client’s transactional system database. These jobs run in security contexts which allow them to log directly into the transactional system database and the warehouse database and manipulate data.

Transactional system 

In order to retrieve data from the transactional system database, the extraction steps must run in the context of a login which is mapped to a user defined within the transactional system. A database login which is not mapped to a user in the transactional system will not have sufficient permissions to retrieve data.

  • The login used for this is 422X\C422System. This login is set explicitly in each step which extracts data from the transactional system.
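
The mechanism by which the job steps assume this login is not shown in this document; a minimal sketch of one way a T-SQL step can adopt a specific login (assuming EXECUTE AS is the mechanism used, which this document does not confirm) is:

    -- Hypothetical sketch: adopt the mapped login for the extraction
    -- queries, then return to the job's original security context.
    EXECUTE AS LOGIN = '422X\C422System';

    -- ... extraction queries against the [OrgName]_MSCRM database ...

    REVERT;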

A data object is a conceptual unit in the transactional system which may be a record, a linked set of records, or a partial record in the transactional system’s database; some data objects are updated in place in the transactional system while others have multiple instances created to reflect multiple updates. The following data objects are selected out of these entities in the transactional system:

    Data Object                        Transactional system Entity
    Activity                           ActivityPointer
    Activity                           Appointment
    Activity                           CampaignActivity
    Activity                           Email
    Activity                           Fax
    Activity                           Letter
    Activity                           PhoneCall
    Activity                           ServiceAppointment
    Activity                           Task
    Award                              A422_Award
    CampaignResponse                   ActivityParty
    CampaignResponse                   CampaignResponse
    CampaignResponse                   Contact
    CampaignResponse                   Et_ExactTargetJob
    CampaignResponse                   StringMap
    FinAidProfile                      Opportunity
    Id                                 A422_ids
    Interest                           A422_Interest
    OpportunityPerson                  Contact
    OpportunityPerson                  Opportunity
    OpportunityPerson:Address          Contact
    OpportunityPerson:Address          CustomerAddress
    OpportunityPerson:FirstActivity    ActivityPointer
    OpportunityPerson:FirstActivity    Opportunity
    OpportunityPerson:Race             A422_ethnicity
    PersonOrg                          A422_PersonOrg
    PersonOrg                          Account
    Relation                           Contact
    Relation                           CustomerRelationship
    Score                              A422_dom_scoresource
    Score                              A422_PersonScore
    StaffAssignment                    Opportunity
    StatusChange                       A422_dom_oppstatuscategory
    StatusChange                       A422_OpportunityStatusChange
    (all data objects)                 FilteredStatusMap


Additionally, metadata for defining the active entry term is selected out of the following entities in the transactional system:

  • A422_Settings_Custom: the client must have the a422_analyticsactivetermid field populated
  • A422_dom_entryterm: the client must have the a422_year and a422_termordinal fields populated for every valid entry term

Warehouse

Warehouse tables

The warehouse database has warehouse tables which correspond to the types of data object from the transactional system. These tables hold multiple data objects of each type and multiple instances of each data object, which track the historical state of that data object. These instances are either effective-date-stamped (created by the extract process and date-stamped with the date of the extract) or creation-date-stamped (created by the transactional system and copied intact by the extract process), so that the warehouse table can be queried by date to retrieve a historical slice of data.

Although the warehouse tables do not contain formal foreign key references to each other, they all contain an effective reference directly to the warehouse table which corresponds to the OpportunityPerson data object from the transactional system. (This is the AnalyticsMaster table.) Each record in a warehouse table carries an effective or creation date-stamp which is independent of any reference to an OpportunityPerson object, so any date-slicing of any type of data in the warehouse must be done individually on each table.
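
For example, a historical slice of one warehouse table can be retrieved with a query of the following shape. This is a minimal sketch: the AnalyticsMaster table is named in this document, but the EffectiveDate and OpportunityId column names are assumptions.

    -- Most recent AnalyticsMaster instance of each opportunity as of a date.
    DECLARE @AsOf datetime;
    SET @AsOf = '20100601';

    SELECT am.*
    FROM   dbo.AnalyticsMaster AS am
    WHERE  am.EffectiveDate =
           (SELECT MAX(am2.EffectiveDate)
            FROM   dbo.AnalyticsMaster AS am2
            WHERE  am2.OpportunityId = am.OpportunityId
              AND  am2.EffectiveDate <= @AsOf);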

Other Extract Process objects

The warehouse database has extraction views which also correspond to the types of data object from the transactional system. These views are pre-compiled pieces of the extract process: because the views contain the references to the transactional system database (which are specific to each client), the scheduled extract process job only needs to reference each view by name (which is identical for all clients) and provide a security context (which is identical for all clients). The extraction views therefore allow the scheduled extract process jobs to be generic across clients.
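
A minimal sketch of the shape of such a view (the view name, columns, and entity are illustrative; only the cross-database reference pattern is the point):

    -- Hypothetical extraction view compiled per client: the view body
    -- names the client's transactional database, so the scheduled job
    -- only ever has to SELECT from a view with a fixed name.
    CREATE VIEW dbo.vExtract_Interest
    AS
    SELECT i.A422_InterestId, i.ModifiedOn, i.StateCode
    FROM   [OrgName_MSCRM].dbo.A422_Interest AS i;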

For convenience, the extract process uses a staging table for OpportunityPerson objects which is also the final base table for the Standard Export feature. Additionally, there are several temporary export tables in the warehouse in order to process the Standard Export data which is not placed into warehouse tables.

The extract process has a staging table in the warehouse for the entry term metadata.

Other objects in the warehouse are not used by the extract process.

Jobs

A scheduled process currently performs several functions in one sequence:

  1. It performs the extract process to copy data from the transactional system into the warehouse.
  2. It moves deleted data into audit tables from warehouse tables.
  3. It moves current data into sync tables from warehouse tables for use by Advanced Export.
  4. It places data in the export table for use by Standard Export.
  5. It produces an export file from the export table for the Standard Export.
  6. It launches the analytics application on the report server for Descriptive Analytics.

This is done using three SQL Server Agent jobs per client. If successful, this sequence of jobs will send two "success" emails, one for the Standard Export and one for the Descriptive Analytics. Failure at any point will cause a "failure" email to be sent.

Job "Extract 1 – Base Records – Org" 

This scheduled job is hosted on the SQL Server which hosts the transactional system. It is scheduled to run as close to local midnight as possible in order to accurately provide effective-date-stamping.

This job extracts entry term metadata and uses this to extract staging opportunity and person data.

It extracts updateable data objects from the transactional system, currently opportunity, person, and financial aid data, so that a date-stamp of midnight can be placed on any new records created. It also extracts additional Standard Export data for opportunities, persons, and addresses at this time.

This job is expected to take between 2 and 6 minutes per client.

1. Extract - Get entry term data

The retrieval from the transactional system database is an extraction step and runs in the explicit context of login 422X\C422System.

The staging entry term table is truncated when this step begins; until then, it still holds the previous run's data. The list of "active" entry term names and guids is placed in the staging entry term table. An "active" entry term is defined relative to the "analytics active entry term" custom setting: it is one whose year and term are equal to or later than the year and term of the entry term named in that setting.
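
A minimal sketch of this selection (the staging table name and several column names are assumptions; the A422_Settings_Custom and A422_dom_entryterm entities and their a422_analyticsactivetermid, a422_year, and a422_termordinal fields are named earlier in this document):

    -- Hypothetical sketch: repopulate the staging entry term table with
    -- all entry terms at or after the "analytics active" entry term.
    TRUNCATE TABLE dbo.Staging_EntryTerm;

    INSERT INTO dbo.Staging_EntryTerm (EntryTermId, EntryTermName)
    SELECT et.A422_dom_entrytermId, et.A422_Name
    FROM   [OrgName_MSCRM].dbo.A422_dom_entryterm AS et
    CROSS JOIN [OrgName_MSCRM].dbo.A422_Settings_Custom AS s  -- single settings row assumed
    JOIN   [OrgName_MSCRM].dbo.A422_dom_entryterm AS act
           ON act.A422_dom_entrytermId = s.a422_analyticsactivetermid
    WHERE  et.a422_year > act.a422_year
       OR (et.a422_year = act.a422_year
           AND et.a422_termordinal >= act.a422_termordinal);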

2. Extract - Get opportunity and person data 

The retrieval from the transactional system database is an extraction step and runs in the explicit context of login 422X\C422System.

The export table is truncated when this step begins; until then, it still holds the previous run's data.

All of the following data is placed directly in the export table. Some of this data will also be stored in the AnalyticsMaster warehouse table, but it is placed first into the export table so that the export table can later serve as a staging table for determining the current and deleted opportunity indicators.

  1. The opportunity data is drawn from the live opportunity view.
    • The opportunities selected are limited to those with active entry terms as selected in the previous job step.
    • Additionally, opportunities which exist in the warehouse with an active entry term but which have been updated in the CRM database to an inactive entry term are placed in the export table.
      • These records will be deleted from the export table before the export file is created.
      • These records are placed in the export table in order to allow a final extract of all associated records. In this case, the export table serves only as a staging table for the warehouse.
  2. The person and address data is added, drawn from the live contact and customer address views.
    • Some transformation of the *Allow fields is performed to show 1/0 as True/False.
    • The BirthDay field is cast as a string.
    • The Address_02* fields are drawn from the most recent customeraddress records (using modifiedon/customeraddressid) with the following hardcoded limitations (see the sketch after this list):
      • “addressnumber NOT IN (1,2)” because addresses with addressnumber 1 are already available from the contact view and addresses with addressnumber 2 are used for a different purpose.
      • “addresstypecode = ‘200000’” because this is an internal CRM software value indicating a mailing address.
  3. The first activity date is added, drawn from the opportunity and activitypointer views (joined with the first activity GUID stored as a string in the opportunity table). The date is transformed to a string.
  4. The hardcoded organization name and department are added to the export table now.
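
A minimal sketch of the Address_02* selection described above (the output columns are illustrative; the CustomerAddress column names are standard CRM 4.0 names, but the exact query is an assumption):

    -- Pick the single most recent qualifying customeraddress record per
    -- contact, using modifiedon and customeraddressid as tiebreakers.
    SELECT c.ContactId, ca.Line1 AS Address_02_Line1
    FROM   [OrgName_MSCRM].dbo.Contact AS c
    JOIN  (SELECT a.ParentId, a.Line1,
                  ROW_NUMBER() OVER (PARTITION BY a.ParentId
                                     ORDER BY a.ModifiedOn DESC,
                                              a.CustomerAddressId DESC) AS rn
           FROM   [OrgName_MSCRM].dbo.CustomerAddress AS a
           WHERE  a.AddressNumber NOT IN (1, 2)
             AND  a.AddressTypeCode = 200000) AS ca
          ON ca.ParentId = c.ContactId AND ca.rn = 1;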

3. Extract - Get fin aid data 

The retrieval from the transactional system database is an extraction step and runs in the explicit context of login 422X\C422System.

This step defines the date "Yesterday" as a date with no time to be used as the effective date of new records. It treats the previous local day as yesterday only if the local time is prior to 10AM, after which the day shifts forward by one; this is to allow extractions run during the working day or in the evening to register the day’s data.

If any data is already in the warehouse table with an effective date of "Yesterday", that data is deleted so that this run's data can replace it. (Presumably, if an extract must be run twice in one day, the first run was in error.)

The financial aid data is drawn from the opportunity view and placed directly into the corresponding warehouse table.

  • The opportunities selected are limited (by a join to the base export table, which has already been limited) to those with active entry terms.
  • Only data which is non-null in at least one data field is used.
  • This query uses a checksum backed up by a field comparison to determine if data is new or has been updated.
  • The financial aid data maintains an “is current” flag to indicate which record is the current record when multiple historical records exist for an opportunity.
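
A minimal sketch of the checksum-backed-by-field-comparison test described above (the extraction view and the financial aid column names are assumptions; only the pattern is the point):

    -- Rows are new or changed if no current warehouse row exists, or if
    -- the fast checksum differs, or if any individual field differs
    -- (guarding against the rare checksum collision on changed data).
    SELECT src.OpportunityId
    FROM   dbo.vExtract_FinAid AS src
    LEFT JOIN dbo.FinAidProfile AS cur
           ON cur.OpportunityId = src.OpportunityId
          AND cur.IsCurrent = 1
    WHERE  cur.OpportunityId IS NULL
       OR  cur.RowChecksum <> CHECKSUM(src.EfcAmount, src.AidStatus)
       OR  cur.EfcAmount   <> src.EfcAmount
       OR  cur.AidStatus   <> src.AidStatus;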

4. Fail - Send email 

This job sends email if it fails, but does not report a successful run.

Job "Extract 2 – Subrecords and Processing – Org" 

This scheduled job is hosted on the SQL Server which hosts the transactional system. It is scheduled to run between the expected ending of the Extract 1 job and the beginning of the school’s workday, staggered through the night to reduce the server load.

This job extracts all remaining data not already extracted by the previous job that is date-stamped no later than local midnight (that is, the non-updateable data objects, which are already date-stamped in the transactional system), updates the warehouse and export tables, and then performs the non-extract processing also needed overnight in the warehouse. When it is done, it calls a third job to run the descriptive analytics task.

This job is expected to take between 5 and 30 minutes per client.

1. Verify - Check that Extract 1 has succeeded 

This step checks that the Extract 1 job for this client has completed successfully; if it cannot find a successful run of that job in the last 24 hours, it waits an hour and retries to allow a slow Extract 1 to finish running. If at that time a successful Extract 1 cannot be found, the job cancels all further action and sends a cancellation email.
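
A minimal sketch of the verification query against the SQL Server Agent history tables (the 24-hour window is approximated here by run date; the real step's exact query is not shown in this document):

    -- Look for a successful outcome row (step_id = 0, run_status = 1)
    -- for this client's Extract 1 job since yesterday.
    DECLARE @Cutoff int;
    SET @Cutoff = CONVERT(int,
                  CONVERT(varchar(8), DATEADD(day, -1, GETDATE()), 112));

    SELECT TOP 1 h.run_date, h.run_time
    FROM   msdb.dbo.sysjobs AS j
    JOIN   msdb.dbo.sysjobhistory AS h ON h.job_id = j.job_id
    WHERE  j.name = N'Extract 1 – Base Records – Org'
      AND  h.step_id = 0
      AND  h.run_status = 1                 -- 1 = succeeded
      AND  h.run_date >= @Cutoff            -- run_date is an int yyyymmdd
    ORDER BY h.run_date DESC, h.run_time DESC;

    -- If no row is found, the step can pause before its single retry:
    -- WAITFOR DELAY '01:00:00';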

2. Extract - Get subrecord data as of midnight 

Next the job retrieves all subrecord information. This is an extraction step and runs in the explicit context of login 422X\C422System.

The pattern for this step is that each subrecord type is retrieved either into a temporary export table (if not warehoused, with the exception of the race data, which is warehoused with opportunities later) or into a warehouse table. Records which will be flattened for a "top N records" export have a sort order created during the retrieval, but the number of subrecords retrieved is not limited to N. All subrecords draw from unfiltered views for improved performance. For certain record types, associated data or domain tables are also included in the join in order to retrieve fields not included in the standard record view. Only new or updated records are retrieved; only records which correspond to opportunities with current entry terms (whether the correspondence is direct or via a person record corresponding to an opportunity) are affected; and only records created before the "Today" cutoff point are affected. Affected records deleted from the CRM database are deleted from the warehouse database. (A generic sketch of the insert/update/delete pattern used for the warehoused types appears after the numbered list below.)

This step defines the date "Today" as a datetime to be used as the cutoff date of selected records. It treats the current local day as today only if the local time is prior to 10AM, after which the day shifts forward by one; this is to allow extractions run during the working day or in the evening to select the day’s data. The time of "Today" is local midnight expressed as UTC time so that it can be compared to UTC times in the CRM database records.

  1. Customer Relationship: First it retrieves customer relationship data along with data on the related persons from the contact table and places it in a temporary export table, ordered by modification date for each person, where the customer relationship record is associated with a current opportunity and was created prior to the cutoff.
    1. Customer relationships are limited to records with a PartnerRoleIdName of Parent, Parents, Father, Mother, or Guardian.
    2. The customer relationship data is joined to the base export table to limit it to current records.
    3. The data is grouped to prevent duplicates when multiple opportunities for the linked person are present.
  2. Race: Next it retrieves race data and places it in a temporary export table, ordered by race name (alphabetically descending) for each person, where the race record is associated with a current opportunity and was created prior to the cutoff.
    1. Duplicate race category + race name combinations are eliminated.
    2. The race data is joined to the base export table to limit it to current records.
    3. The data is grouped to prevent duplicates when multiple opportunities for the linked person are present.
  3. Id: Next it retrieves id data along with data on the id name from the id name domain table and places it in a temporary export table, ordered by id name’s display order ascending, modification date descending, and id guid for each person, where the id record is associated with a current opportunity and was created prior to the cutoff.
    1. Id name domain information retrieved is the display order.
    2. The data is grouped to prevent duplicates when multiple opportunities for the linked person are present.
  4. Activity: Next it retrieves activity data along with activity contact and places it in the activity warehouse table, ordered by modification date descending and activity guid (for records with actual end dates and subjects which link directly to an opportunity) or unordered (for the other records), where the activity record is associated with a current opportunity and was created prior to the cutoff.
    1. Each type of activity is retrieved by attachment to opportunity and then by attachment to person.
      1. All activities retrieved have person join data, but only some also have opportunity join data; the opportunity join data always takes priority.
    2. All of the data retrieved is data common to all activities. The types of activities aren’t retrieved individually, but they are defined in the CRM database as:
      1. Appointment
      2. ServiceAppointment
      3. Email
      4. Letter
      5. Fax
      6. PhoneCall
      7. Task
    3. Each activity table is joined to the FilteredStatusMap table to retrieve data only available from filtered views.
    4. The activity data is joined to the base export table to limit it to current records.
    5. First data is inserted for records which exist in CRM but not in the warehouse.
    6. Next data is updated for records which exist in both places with different modification dates or different associated opportunities.
    7. Next data is deleted for records which exist in the warehouse but not in CRM.
    8. Finally the warehouse ordering is updated to reflect the changes.
  5. Award: Next it retrieves award data which is changed from the existing warehoused data and places it in the award warehouse table, ordered by modification date descending and award guid for each opportunity, where the award record is associated with a current opportunity and was created prior to the cutoff.
    1. The award table is joined to the FilteredStatusMap table to retrieve data only available from filtered views.
    2. The award data is joined to the base export table to limit it to current records.
    3. First data is inserted for records which exist in CRM but not in the warehouse.
    4. Next data is updated for records which exist in both places with different modification dates or different associated opportunities.
    5. Next data is deleted for records which exist in the warehouse but not in CRM.
    6. Finally the warehouse ordering is updated to reflect the changes.
  6. Interest: Next it retrieves interest data which is changed from the existing warehoused data, and places it in the interests warehouse table, ordered by modification date and interest guid for each opportunity, where the interest record is associated with a current opportunity and was created prior to the cutoff.
    1. Interests are limited to records with a state code of 1 (active).
    2. The interest table is joined to the FilteredStatusMap table to retrieve data only available from filtered views.
    3. The interest data is joined to the base export table to limit it to current records.
    4. First data is inserted for records which exist in CRM but not in the warehouse.
    5. Next data is updated for records which exist in both places with different modification dates or different associated opportunities.
    6. Next data is deleted for records which exist in the warehouse but not in CRM.
    7. Finally the warehouse ordering is updated to reflect the changes.
  7. Person-Organization/High School: Next it retrieves person-organization data along with data on the organizations from the account table which is changed from the existing warehoused data, and places it in the person-organization warehouse table unordered, where the person-organization record is associated with a current opportunity and was created prior to the cutoff.
    1. The person-organization data is retrieved with a DISTINCT limitation to prevent duplicates.
    2. The person-organization table is joined to the base export table.
      1. This limits the person-organization data to persons who have opportunities; related persons' data is not warehoused unless those relations have their own opportunities.
      2. This excludes person-organization data for deleted opportunities which have not been marked for person-organization deletion.
      3. This limits it to current records.
    3. First data is inserted for records which exist in CRM but not in the warehouse.
    4. Next data is updated for records which exist in both places with different modification dates or different associated opportunities.
    5. Next data is deleted for records which exist in the warehouse but not in CRM.
    6. This data is not ordered because it is not needed for "top N records" export.
  8. Score: Next it clears the score warehouse table, retrieves (unfiltered) score data along with data on the status category from the (unfiltered) score source domain table, and places it in the score warehouse table unordered, where the score record is associated with a current opportunity and was created prior to the cutoff.
    1. Score source domain information retrieved is the official source flag.
    2. The score table is joined to the FilteredStatusMap table to retrieve data only available from filtered views.
    3. The score data is joined to the base export table to limit it to current records.
    4. First data is inserted for records which exist in CRM but not in the warehouse.
    5. Next data is updated for records which exist in both places with different modification dates or different associated opportunities.
    6. Next data is deleted for records which exist in the warehouse but not in CRM.
    7. This data is not ordered because it is not used directly for "top N records" export; the score export transforms this data in a temporary export table before ordering it.
  9. Status Change: Next it retrieves status change data along with data on the status category from the status category domain table, and places it in the status change warehouse table unordered, where the status change record is associated with a current opportunity and was created prior to the cutoff.
    1. Status category domain information retrieved is the display order.
    2. The status change table is joined to the FilteredStatusMap table to retrieve data only available from filtered views.
    3. The status change data is joined to the base export table to limit it to current records.
    4. First data is inserted for records which exist in CRM but not in the warehouse.
    5. Next data is updated for records which exist in both places with different modification dates or different associated opportunities.
    6. Next data is deleted for records which exist in the warehouse but not in CRM.
    7. This data is not ordered because it is not needed for "top N records" export.
  10. Campaign Response: Next it retrieves campaign response data along with data on the campaign jobs from the Exact Target job table, on the email status from StringMap (functionally a domain table), and flattened duplicate email data from the Export table, and places it in the campaign response warehouse table unordered, where the campaign response record is associated with a current opportunity and was created prior to the cutoff.
    1. Campaign information retrieved is the campaign job name.
    2. Email status information retrieved is the text of the email status.
    3. Email data retrieved is all three person email addresses.
    4. The campaign response table is joined to the FilteredStatusMap table to retrieve data only available from filtered views.
    5. The campaign response data is joined to the base export table to limit it to current records.
    6. First data is inserted for records which exist in CRM but not in the warehouse.
    7. Next data is updated for records which exist in both places with different modification dates or different associated opportunities.
    8. Next data is deleted for records which exist in the warehouse but not in CRM.
    9. This data is not ordered because it is not needed for "top N records" export.
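
A generic sketch of the insert/update/delete pattern used by the warehoused subrecord types above, shown here for interests (the extraction view and warehouse column names are assumptions):

    -- 1) Insert records present in CRM but not yet in the warehouse.
    INSERT INTO dbo.Interest (InterestId, OpportunityId, ModifiedOn)
    SELECT src.InterestId, src.OpportunityId, src.ModifiedOn
    FROM   dbo.vExtract_Interest AS src
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.Interest AS w
                       WHERE w.InterestId = src.InterestId);

    -- 2) Update records present in both with a different modification
    --    date or a different associated opportunity.
    UPDATE w
    SET    w.OpportunityId = src.OpportunityId,
           w.ModifiedOn    = src.ModifiedOn
    FROM   dbo.Interest AS w
    JOIN   dbo.vExtract_Interest AS src ON src.InterestId = w.InterestId
    WHERE  w.ModifiedOn <> src.ModifiedOn
       OR  w.OpportunityId <> src.OpportunityId;

    -- 3) Delete records present in the warehouse but no longer in CRM.
    DELETE w
    FROM   dbo.Interest AS w
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.vExtract_Interest AS src
                       WHERE src.InterestId = w.InterestId);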

3. Transform - Flatten export scores from analytics scores

The data in the warehouse scores table is used to populate the temporary export scores table, which is the only temporary export table not directly filled by extraction from the CRM database.

First the main scores – the ones that aren’t subrecords of another score – are copied into the temporary export scores table.

  • Because the same score results can be reported in the CRM database multiple times, the scores are grouped by person, test name, and test date, and only the most recently modified from an official source is used.
  • Within these results, the scores are sorted (with the "ord" field) in order of their test dates, falling back on modification date for tests on the same date (which, because of the previous grouping, will always be different tests).

Next, up to five subscores are added to each temporary export score record.

  • These are pivoted to become columns in the temporary export scores table instead of separate records as they’re stored in the CRM database and in the warehouse scores table.
  • The subscores are selected alphabetically ascending, and are not duplicated.
  • The subscore values are transformed to integers, or to NULL if they are zero.
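
A minimal sketch of the pivot (the ParentScoreId linkage and the temporary export table columns are assumptions; only the first two subscore columns are shown):

    -- Rank each main score's subscores alphabetically, transform the
    -- values (integer, with zero becoming NULL), and pivot into columns.
    WITH ranked AS
    (
        SELECT s.ParentScoreId,
               NULLIF(CAST(s.ScoreValue AS int), 0) AS SubscoreValue,
               ROW_NUMBER() OVER (PARTITION BY s.ParentScoreId
                                  ORDER BY s.ScoreName ASC) AS rn
        FROM   dbo.Score AS s
        WHERE  s.ParentScoreId IS NOT NULL     -- subscores only
    )
    UPDATE x
    SET    x.Subscore1 = r1.SubscoreValue,
           x.Subscore2 = r2.SubscoreValue
    FROM   dbo.Export_Scores AS x
    LEFT JOIN ranked AS r1 ON r1.ParentScoreId = x.ScoreId AND r1.rn = 1
    LEFT JOIN ranked AS r2 ON r2.ParentScoreId = x.ScoreId AND r2.rn = 2;
    -- ...and likewise for Subscore3 through Subscore5.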

The flattened scores are not moved into the export table in this step; the scores fields in the export table are updated in the next step with the rest of the export subrecords.

4. Transform - Parse subrecord data for exports

Next the job copies the data from the warehouse or temporary export tables into the export table using the “ordinal” field on the warehouse tables or the "ord" field on the temporary export tables to select the "top N records".

  • The single active person-org record is used. Person-Org joins on the person id.
    • Only one person-org record is active.
    • Person-Org modification dates (with times) are transformed into strings.
    • The transcript received and last attended dates (without times) are transformed into strings.
  • The top 2 relationship records are used. Relationship joins on the person id.
    • From step 2, relationships were ordered by modification date.
  • The top 8 race records are used. Race joins on the person id.
    • From step 2, races were ordered by race name (alphabetically descending).
    • Only the race value is used for the standard export.
  • The top 5 id records are used. Id joins on the person id.
    • From step 2, ids were ordered by id name’s display order ascending, modification date descending, and id guid.
  • The top 8 award records are used. Award joins on the opportunity id.
    • From step 2, awards were ordered by modification date descending and award guid.
    • Award modification dates (with times) are transformed into strings.
    • The first (most recent) award modification date is also used as the financial aid modification date in the export table and is transformed (with time) into a string.
  • The top 8 activity records are used. Activity joins on the opportunity id.
    • From step 2, activities were ordered by actual end date descending and activity guid.
    • From step 2, activities without actual end dates or without subjects are not used for exports.
    • From step 2, only activities tied to opportunities (as opposed to persons) are used for exports.
    • Activity values are truncated at nvarchar(100).
    • Activity dates (without times) are transformed into strings.
  • The top 8 interests are used. Interest joins on the opportunity id.
    • From step 2, interests were ordered by modification date then interest guid.
  • The top 4 scores are used. Score joins on the person id.
    • From step 3, scores were ordered by test date then modification date.
    • Score test dates (without times) are transformed into strings.
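
A minimal sketch of one "top N records" selection using the stored ordering field, shown here for the top 2 relationship records (the export and temporary table columns are assumptions):

    -- The "ord" field was assigned during retrieval, so the top N
    -- records are simply the rows with ord = 1..N for each person.
    UPDATE e
    SET    e.Relation1_Name = r1.RelationName,
           e.Relation2_Name = r2.RelationName
    FROM   dbo.Export AS e
    LEFT JOIN dbo.Export_Relation AS r1
           ON r1.PersonId = e.PersonId AND r1.ord = 1
    LEFT JOIN dbo.Export_Relation AS r2
           ON r2.PersonId = e.PersonId AND r2.ord = 2;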

5. Transform - Move slow-changing data from export tables to analytic tables

This is the final extraction step; once this step completes, all of the data has been extracted to the warehouse and placed in the proper warehouse tables. The steps after this in the job are related to other features.

Next the job uses the export table as a staging table from which to update the AnalyticsMaster table.

As in the Extract 1 job's financial aid step, this step defines the date "Yesterday" as a date with no time to be used as the effective date of new records. It treats the previous local day as yesterday only if the local time is prior to 10AM, after which the day shifts forward by one; this is to allow extractions run during the working day or in the evening to register the day’s data.

If any data is already in the AnalyticsMaster or StaffAssignment warehouse tables with an effective date of "Yesterday", that data is deleted so that this run's data can replace it. (Presumably, if an extract must be run twice in one day, the first run was in error.) The IsCurrent flag on the AnalyticsMaster and StaffAssignment tables is reset if data is deleted, since it would be possible for changes to be reverted such that new data would not be added to those tables to replace the deleted data as the most current.

The AnalyticsMaster warehouse table is updated from the export table with new and changed "slowly-changing" data. This data is not expected to be changed often, which keeps this table from growing too rapidly.

  • Records in the AnalyticsMaster table are never updated to reflect changes in the CRM database; instead, new AnalyticsMaster records are created so that old and new states can be tracked using the effective date.
  • The effective date of the new record is set to "Yesterday".
  • New records are created with the current flag set on.
  • Data is considered new or changed based on a comparison to the existing data in the AnalyticsMaster table:
    • The join is on the opportunity id, which never changes for the life of the opportunity.
    • Only current warehouse records are considered for comparison. This is done to be sure that records which are changed and then changed back to their previous state have both changes recorded.
    • High Score is compared separately, because it is not stored as a data type which can be included in a calculated checksum column.
    • The calculated checksum column is a fast method of checking the other columns.
    • Because a checksum can give false equality on changed records in a small number of cases, the other columns are also checked individually.
  • The AnalyticsMaster table is initially updated with all slowly-changing data which was stored in the export table; the race data is also considered slow-changing, but it is added to the AnalyticsMaster table from a temporary export table later.
  • Because the data is drawn from the export table, it is limited to opportunities with active entry terms and associated person records, or to opportunities which had active entry terms and have been updated to have inactive entry terms.
    • Unless the definition of an active entry term changes, this will be the final update to this opportunity record.

The current flag on AnalyticsMaster records is then updated for records which are no longer current because changed records have been added.
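
A minimal sketch of this insert-then-retire pattern (the non-key columns are illustrative; the "Yesterday" value is the one computed earlier in this step):

    DECLARE @Yesterday datetime;
    SET @Yesterday = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);
    -- simplified; see the "Yesterday" rule described above

    -- Insert a new current version for each new or changed opportunity.
    INSERT INTO dbo.AnalyticsMaster
               (OpportunityId, EffectiveDate, IsCurrent, StatusName, HighScore)
    SELECT e.OpportunityId, @Yesterday, 1, e.StatusName, e.HighScore
    FROM   dbo.Export AS e
    LEFT JOIN dbo.AnalyticsMaster AS am
           ON am.OpportunityId = e.OpportunityId AND am.IsCurrent = 1
    WHERE  am.OpportunityId IS NULL
       OR  am.RowChecksum <> CHECKSUM(e.StatusName)   -- fast check
       OR  am.StatusName  <> e.StatusName             -- individual check
       OR  am.HighScore   <> e.HighScore;             -- compared separately

    -- Retire any version that is no longer the most recent.
    UPDATE am
    SET    am.IsCurrent = 0
    FROM   dbo.AnalyticsMaster AS am
    WHERE  am.IsCurrent = 1
      AND  EXISTS (SELECT 1 FROM dbo.AnalyticsMaster AS newer
                   WHERE newer.OpportunityId = am.OpportunityId
                     AND newer.EffectiveDate > am.EffectiveDate);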

Opportunities with active entry terms which have been deleted from the CRM database are copied into the OpportunityPerson_Deleted table with a deletion date of yesterday; this provides an audit trail of opportunity deletion. Then these records are deleted from the AnalyticsMaster table.

  • Opportunities with active entry terms are defined by using a join to the export table.
  • At this point, records are deleted from the financial aid warehouse table if they correspond to deleted opportunity records.

Staff assignment is not a separate subrecord in the CRM database, but it is recorded in its own warehouse table in the warehouse database. (Previously, this data was recorded in the AnalyticsMaster table, but it is changed often enough that there are notable storage size benefits for most clients in recording this data separately.) The staff assignment table joins on the opportunity id and the AnalyticsMaster effective date.

  • Records in the StaffAssignment table are never updated to reflect changes in the CRM database; instead, new StaffAssignment records are created so that old and new states can be compared using the effective date.
  • The effective date of the new record is set to “Yesterday”.
  • New records are created with the current flag set on.
  • Data is considered new or changed based on a comparison to the existing data in the StaffAssignment table:
    • The join is on the opportunity id.
    • Only current records are considered for comparison. This is done to be sure that records which are changed and then changed back to their previous state have both changes recorded.
    • The only field compared is the StaffAssigned field.
  • The staff assignment table maintains an “is current” flag to distinguish current data from historical data for the same opportunity.
  • Records are deleted from the staff assignment table if they correspond to deleted opportunity records.

Race is a separate subrecord in the CRM database, but it is considered slow-changing data and stored in the AnalyticsMaster table in the warehouse database.

The same race temporary export table is used for the AnalyticsMaster update as was used in step 4 to update the export table. However, the AnalyticsMaster table additionally holds the race category which the export table does not hold.

  • The top 8 race records are used. Race joins on the person id.
    • From step 2, races were ordered by race name (alphabetically descending).
  • The race temporary export table joins on the person id and the AnalyticsMaster effective date.

6. Transform – Move data into local sync tables 

This step is executed for the Advanced Export feature.

Next the export job incrementally updates the sync tables with any changes. The sync tables hold only the current records from their associated tables, so that this selection of data can be synchronized for Advanced Export.

  1. The OpportunityPerson_Local sync table is synchronized with the AnalyticsMaster warehouse table.
    1. Only records where IsCurrent = 1 should exist in the sync table.
    2. This definition of “current” is not limited by entry term; all records are used.
    3. Any new opportunities which have been added to the warehouse table are inserted in the sync table.
    4. Any existing opportunities with new effective dates are updated in the sync table.
    5. Any deleted opportunities are removed from the sync table.
  2. The StaffAssignment_Local sync table is synchronized with the StaffAssignment warehouse table.
    1. Only records where IsCurrent = 1 should exist in the sync table.
    2. Only records associated with opportunities in the OpportunityPerson_Local sync table should exist in the StaffAssignment_Local sync table.
    3. If an opportunity in the OpportunityPerson_Local sync table does not have a staff assignment, the staff assignment is inserted in the sync table.
    4. If an opportunity in the OpportunityPerson_Local sync table has a staff assignment with a different effective date, the staff assignment is updated in the sync table.
    5. If a staff assignment in the sync table is associated with an opportunity which is no longer in the OpportunityPerson_Local sync table, the staff assignment is deleted from the sync table.
  3. The FinAidProfile_Local sync table is synchronized with the FinAidProfile warehouse table.
    1. Only records where IsCurrent = 1 should exist in the sync table.
    2. Only records associated with opportunities in the OpportunityPerson_Local sync table should exist in the FinAidProfile_Local sync table.
    3. If an opportunity in the OpportunityPerson_Local sync table does not have a fin aid profile, the fin aid profile is inserted in the sync table.
    4. If an opportunity in the OpportunityPerson_Local sync table has a fin aid profile with a different effective date, the fin aid profile is updated in the sync table.
    5. If a fin aid profile in the sync table is associated with an opportunity which is no longer in the OpportunityPerson_Local sync table, the fin aid profile is deleted from the sync table.
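
A minimal sketch of one sync-table update, expressed as a single MERGE (available from SQL Server 2008; the actual step may instead use separate INSERT/UPDATE/DELETE statements, and the non-key columns are illustrative):

    MERGE dbo.OpportunityPerson_Local AS tgt
    USING (SELECT am.OpportunityId, am.EffectiveDate, am.StatusName
           FROM   dbo.AnalyticsMaster AS am
           WHERE  am.IsCurrent = 1) AS src
       ON tgt.OpportunityId = src.OpportunityId
    WHEN NOT MATCHED BY TARGET THEN          -- new opportunities
         INSERT (OpportunityId, EffectiveDate, StatusName)
         VALUES (src.OpportunityId, src.EffectiveDate, src.StatusName)
    WHEN MATCHED AND tgt.EffectiveDate <> src.EffectiveDate THEN
         UPDATE SET tgt.EffectiveDate = src.EffectiveDate,
                    tgt.StatusName    = src.StatusName
    WHEN NOT MATCHED BY SOURCE THEN          -- deleted opportunities
         DELETE;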

7. Cleanup – Remove opportunities with inactive entry terms from export table 

This step is executed for the Standard Export feature.

Next the records in the export table which have non-current entry terms are deleted. These records were added in step 2 of the Extract 1 job in order to update the warehouse tables, but they should not be exported to clients.

The opportunity data is compared to the entry term staging table. The opportunities deleted are limited to those with entry terms not listed in the staging table.
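
A minimal sketch of this deletion (the export and staging table column names are assumptions):

    DELETE e
    FROM   dbo.Export AS e
    WHERE  e.EntryTermId NOT IN (SELECT st.EntryTermId
                                 FROM   dbo.Staging_EntryTerm AS st);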

8. Cleanup - Truncate export sorting tables 

This step is executed for the Standard Export feature.

At this point the temporary export tables are no longer needed and are truncated.

9. Export - Create CSV file for export 

This step is executed for the Standard Export feature.

The job uses the "[Org]Xport.dtsx" package to create a CSV export file in the export folder. (See export documentation for more detail.)

The package is executed by command line. This produces a CSV file on the ftp server.

  • The package location for all clients is \\422rpt001\SSISPackages\
  • The ftp location for each client is \\422biz001\ftp\[organization name]\

Note: Some clients may have custom packages or nonstandard ftp locations.
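
A minimal sketch of the package invocation (shown via xp_cmdshell purely for illustration; the actual job step may use the CmdExec subsystem directly):

    EXEC master.dbo.xp_cmdshell
        'dtexec /F "\\422rpt001\SSISPackages\[Org]Xport.dtsx"';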

10. Export - Create Zip file for export 

This step is executed for the Standard Export feature.

The job uses WinZip to zip the CSV export file. (See export documentation for more detail.)

The WinZip application is called by command line. The CSV file is zipped into the same export folder.
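
A minimal sketch of the zip step, assuming the WinZip command-line add-on (wzzip) and illustrative file names in the client's ftp folder:

    EXEC master.dbo.xp_cmdshell
        'wzzip "\\422biz001\ftp\[organization name]\export.zip" "\\422biz001\ftp\[organization name]\export.csv"';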

11. Export - Delete csv 

This step is executed for the Standard Export feature.

Next the CSV file is deleted from the export folder. (See export documentation for more detail.)

12. Analyze - Run SPSS task 

This step is executed for the Descriptive Analytics feature.

Next the job triggers the Extract 3 job (see the next section). The Extract 3 job is triggered asynchronously and this step does not wait to see if the Extract 3 job succeeds before proceeding. (See analytics documentation for more detail.)
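
This asynchronous behavior matches msdb.dbo.sp_start_job, which returns as soon as the job is requested to start (the job name below follows the per-client naming pattern used in this document):

    EXEC msdb.dbo.sp_start_job @job_name = N'Extract 3 – Analytics – Org';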

14. Succeed - Send email 

15. Fail - Send email 

16. Cancel - Send email 

If there have been no errors, the job sends a success email for the export. This email is sent to the client and bcced to the general support email.

The job sends a cancellation email only if the verification in the first step fails; failure in any other step causes a failure email. This distinguishes, in the email, between potentially unreported errors in the Extract 1 job (which would cause the verification to fail) and errors which occur in the Extract 2 job itself. These emails are sent to the DBA.

Job "Extract 3 – Analytics – Org" 

This job exists on the report server and is not scheduled to run; it runs only when called by the Extract 2 job. It calls the analytics application with client-specific command-line parameters.

This job is expected to take between 12 and 25 minutes per client.

The first step of this job, which calls the analytics application, will be retried 4 times at 20-minute intervals if it does not succeed. The analytics application has been known to time out and return an error when called during a heavy-load period on the reporting server. In the old process, 4 retries at 20-minute intervals have been enough to run the application successfully after a timeout.
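
A minimal sketch of how this retry behavior can be configured on the step (sp_update_jobstep and its @retry_attempts/@retry_interval parameters are standard SQL Server Agent procedures; the job and step identifiers below are illustrative):

    EXEC msdb.dbo.sp_update_jobstep
         @job_name       = N'Extract 3 – Analytics – Org',
         @step_id        = 1,
         @retry_attempts = 4,     -- retry up to 4 times
         @retry_interval = 20;    -- minutes between retries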

If the analytics application returns no errors on its first attempt or on any subsequent retry, the job sends a success email for the analytics to the general support email.

If it fails on all retries, this job sends a failure email to the DBA and the administrator of the SPSS application.

See the analytics documentation for more information on this process.
