Scheduled Workflow Troubleshooting

Note: This document references functionality that may not be available for all users. Content is managed in Confluence and copied here for the Support Team. 

  • A rule didn't run when I expected it to.
     
    • Check the Scheduled Workflow FAQ below for a detailed explanation of common symptoms and resolutions regarding the timing of scheduled workflows.

  • No rules are running for an organization.
     
    • First, check that the workflows have their “Scheduled” flag set to Yes (this is the first column in the Scheduled Workflows menu). After changing this flag, wait at least 25 minutes to see whether the Scheduled Workflow runs.

    • Check the “Errors” section of the scheduled workflow for the specific error message.  Also note both the “Include These Records” and “Exclude These Records” views assigned to the workflow and try to run each view within Advanced Find.  If you receive an error running a view, the Scheduled Workflow will most likely fail as well.  Within each view, check the column sort order (click “Edit Columns”, then “Configure Sorting”).  Many views sort by Due Date in ascending order; changing the sort order to descending improves the performance of the view and can therefore improve the performance of the Scheduled Workflow.  Make this change on both the “include” and “exclude” views for the Scheduled Workflow.

  • A rule is putting duplicate records on an outreach list.
     
    • This should be a rare occurrence, but it can happen if the same rule is started multiple times for the same student and both workflow records are executed at the same time.  The system normally prevents a duplicate record from being added to the outreach list, but if the workflows run at the same time, neither process may be able to see the record that the other is adding.

    • There is also a SQL job in place to remove any duplicate records on an outreach list; if a lot of duplicates exist on an outreach list then it’s possible that the job isn’t running or is failing for some reason.

  • What reports are available to assist with SWF?   
     
    • There are a couple of reports in RNLD that show cumulative information regarding scheduled workflows.

    • Workflow Overview Report
       
      • The “Workflows by Organization” table displays all organizations that have run 1,000 or more workflows in the past day; the data is aggregated by the status code (Succeeded, Canceled, Failed, etc.) below the organization name. 
      • The “Failed Workflows” table displays failed workflows for all organizations over the past day.
      • The “Waiting Workflows” table displays organizations with 5 or more workflows with a status of “Waiting” (again over the past day).
      • The “Scheduled Workflow Check” table displays all scheduled workflows that are set to run every day and are overdue (they should have executed within the past day and haven’t yet).
      • The “Unprocessed Email” table displays all triggered emails for all organizations that haven’t been sent yet and are overdue by 36 hours or more. 

    • Workflow Overview Report 002
       
      • The “Workflows by Organization” table displays all organizations that have run 1,000 or more workflows in the past 1, 2, or 3 day(s) based on the “Previous Days” report parameter.  The data is aggregated by the status code (Succeeded, Canceled, Failed, etc.) below the organization name.
      • The “Failed Workflows” table displays failed workflows for all organizations over the past 1, 2, or 3 day(s) based on the “Previous Days” report parameter.
      • The “Waiting Workflows” table displays organizations with 5 or more workflows with a status of “Waiting” over the past 1, 2, or 3 day(s) based on the “Previous Days” report parameter.
      • The “Scheduled Workflow Check” table displays all scheduled workflows that are overdue along with a possible reason as to why the workflow hasn’t completed (SQL Timeout, Workflow Not Active, etc.).
      • The “Unprocessed Email” table displays all triggered emails for all organizations that haven’t been sent yet and are overdue by 36 hours or more.
      • The “Outreach List Posting Failures” table shows all outreach lists for all organizations where the posting of list members hasn’t completed.


    Scheduled Workflow FAQ

  • Can you clarify how the Run Order field is used? I see that it is set to 100 in some databases but I’m not sure why.
    • Run order is used if SWF rules need to be run in a particular order, for instance adding opportunities to an outreach list in one rule and sending email to that list in another rule. All rules without run orders run before any rule with a run order. If a school has all/most of its rules set to a run order of 100, that most likely means that at some point it had a rule that needed to be run before any other rules. This has been used to "force out" a high priority rule to ensure that it runs first when the next processing starts.
    • Multiple rules can have the same run order. All rules with the same run order will be run in the same order each time the rules run, but this order cannot be predicted in advance. If the order matters, set the run orders to different values.
    • The run order does not have to be sequential; missing values are allowed. If a long series of rules need to run in a particular order, setting their values at 10, 20, 30, etc. will allow an additional rule to be inserted anywhere into the sequence without having to reset the values for all the rules in the sequence.
    • Short, frequent rules should have their Run Order set to low values and long-running rules to higher values, so that the short rules complete before they can be blocked by a lengthier rule.
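The ordering described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not the SWF service's actual code: the `Rule` type, the rule names, and the `processing_order` function are hypothetical, and it assumes that a blank Run Order is represented as `None` and that rules with a run order are processed from lowest to highest value.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    run_order: Optional[int]  # None models a blank Run Order field (assumption)


def processing_order(rules):
    """Sort rules the way the FAQ describes: blank run orders first,
    then ascending run order."""
    # Python's sort is stable, so ties keep their input order in this sketch.
    # The FAQ only says that ties run in a consistent but unpredictable order.
    return sorted(rules, key=lambda r: (r.run_order is not None, r.run_order or 0))


rules = [
    Rule("Send email to list", 20),
    Rule("Add opportunities to list", 10),
    Rule("High priority rule", None),
]
ordered = [r.name for r in processing_order(rules)]
print(ordered)
# ['High priority rule', 'Add opportunities to list', 'Send email to list']
```

Leaving gaps between values (10, 20, 30) means a new rule can be given, say, 15 without touching the others.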
  • Clarification on the Frequency field: although we can set the frequency to more often than every hour, I think the process only cycles through every hour. Is that right?
    • The process is currently set to cycle through every 25 minutes.  This is a global setting for all clients on a given “stack”.  All scheduled rules set to run will be run, and then the service will “sleep” for 25 minutes.
    • No rule can run more than once per processing cycle.
    • The frequency field is only the maximum frequency with which the rule will run. A single long-running rule can delay processing of other rules until it completes, so a rule scheduled to run every hour may not run for several hours while blocked this way.
  • What happens if a rule takes a long time to run?
    • A long-running rule will block any other rule for that client from running until the first rule finishes.
    • Rules for other clients will not be blocked.
    • Rules that were scheduled to run before the current processing cycle started will be run as soon as the long-running rule completes.
    • Rules that were scheduled to run after the current cycle started, but before the long-running rule finishes, will not run until the next processing cycle. 
      • Example: If “Rule A” is scheduled to run at 8AM, it will not be picked up by a cycle that starts at 7:30AM.  If that cycle takes three hours to complete, it will end at 10:30AM without processing “Rule A”, sleep for 25 minutes, and then start processing again at 10:55AM, this time processing “Rule A”.
    • If the rule takes longer than its frequency, the next run date will be set to the next occurrence of the frequency and Start Date/Time. 
      • Example: If an hourly rule with a Start Date of ‘January 1, 2014 9:35AM’ starts at 5:35PM and doesn’t complete until 7:42PM, the Next Run Date will have the time set to 8:35PM, not 6:35PM.  Next Run Date will never be set to the past by the SWF service, though it may be set to the past manually.
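The Next Run Date arithmetic in the example above can be reproduced with a small Python sketch. It is a model of the described behavior only, not the service's implementation: the function name `next_run_date` is hypothetical, and it assumes the next run is the first multiple of the frequency, counted from the Start Date/Time, that falls strictly after the moment the rule finishes.

```python
from datetime import datetime, timedelta


def next_run_date(start, frequency, finished_at):
    """Return the first start + n * frequency that is later than finished_at."""
    if finished_at < start:
        return start
    # Whole frequency periods that have already elapsed since the start.
    elapsed_periods = (finished_at - start) // frequency
    return start + (elapsed_periods + 1) * frequency


# Hourly rule with a Start Date of January 1, 2014 9:35AM that finishes
# at 7:42PM (assumed here to be on the same day).
example = next_run_date(
    start=datetime(2014, 1, 1, 9, 35),
    frequency=timedelta(hours=1),
    finished_at=datetime(2014, 1, 1, 19, 42),
)
print(example)  # 2014-01-01 20:35:00
```

Because the result is always later than the finish time, this model never produces a Next Run Date in the past, which matches the behavior described for the service.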
  • What do I clear to ensure the SWF runs on the next pass?
    • If you clear the "Next Run Date" or update it to a value in the past, the rule will run the next time processing starts. Note that the current processing cycle has to complete and the next one begin before this change takes effect.
  • What field(s) do I set if I want this SWF to run at 8am every day?
    • Set the time of the Start Date to 8AM.
    • Set the Frequency to "Run Every 24 hours"
    • This will result in the rule being processed in the first service cycle to occur after 8AM.  This may be delayed by up to 25 minutes by the normal “sleep” part of the cycle, or longer if blocked by a long-running rule.
    • If a rule is delayed for whatever reason, the Next Run Date will be set to 8AM of the next day, regardless of when the rule last ran.
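To see why an 8AM rule may start somewhat later than 8AM, the cycle timing can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the service's logic: `pickup_time` is a hypothetical helper, every cycle is assumed to take the same amount of time, and a rule is assumed to be picked up by the first cycle that starts at or after its scheduled time.

```python
from datetime import datetime, timedelta

SLEEP = timedelta(minutes=25)  # sleep between processing cycles, per the FAQ


def pickup_time(scheduled_for, cycle_start, cycle_duration):
    """Return the start of the first cycle beginning at or after scheduled_for."""
    while cycle_start < scheduled_for:
        # The next cycle starts once this one finishes and the service has slept.
        cycle_start = cycle_start + cycle_duration + SLEEP
    return cycle_start


# A cycle starts at 7:50AM and takes 5 minutes, so the 8AM rule misses it
# and waits for the following cycle.
picked_up = pickup_time(
    scheduled_for=datetime(2014, 1, 1, 8, 0),
    cycle_start=datetime(2014, 1, 1, 7, 50),
    cycle_duration=timedelta(minutes=5),
)
print(picked_up)  # 2014-01-01 08:20:00
```

With a longer `cycle_duration` (for example, a long-running rule earlier in the cycle), the same function shows the delay growing well beyond the 25-minute sleep.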
  • How can the Next Run Date be in the past? We sometimes see this.
    • In ascending order of concern:
    • The rule is set to Scheduled = No or the End Date has passed.
    • The Next Run Date was set to a past value to force it to run in the next processing cycle and the next processing cycle has not started.
    • The rule is currently running.
    • Another rule is currently running, preventing subsequent rules from being started.
    • The rule's frequency has not been set correctly. (Check that both frequency fields have values.)
    • The rule failed the last time it attempted to run. (Check the Errors tab for information.)
    • The SWF service is using a version prior to 1.5.2014.0907. (This was the case for Stack 2 until 12/3/2014.)
    • The SWF service is off/not functioning.