Dynamics CRM is a great product for enterprises that want to efficiently track and optimise all interactions with their customers. Often there is a need to perform a particular operation on a batch of records. For example, in a customer-loyalty scenario, one might want to automatically re-evaluate a customer’s gold-star status on a nightly interval. In another scenario, one might want to recalculated an account balance based on the transactions of the previous day. In this post I will outline a robust design that lets a workflow be executed for an unlimited amount of records.
When working in CRM Online, we as CRM developers, vendors and partners have no control over the maximum allowed time that server side custom code components are allowed to execute within CRM. One of these limitations is that any custom workflow activity can run no longer than 2 minutes.
Next to the time limitation, there is also a concept of a depth limitation to prevent the system from containing malfunctioning code and ending up in an infinite loop. This limitation means that custom code calling itself or other custom code components get to do so at most 7 times every hour (for a maximum depth of 8).
These limitation makes the processing of large volumes of records from within custom code in CRM quite challenging.
The Microsoft Dynamics CRM 2015 Asynchronous Batch Process Solution is a good attempt at a framework to perform scheduled operations on a collection of records defined by a Fetch XML query. However, due to the aforementioned limitations of CRM, it is not a robust framework. We had implemented this solution at one of my client’s projects in CRM Online. It worked fine for low volumes of records, but we were only able to process 3000 records before CRM timed out and the batch job failed.
To overcome this problem, the client is usually presented with an external solution that processes these high volumes of records outside of CRM. In this article, I will present a design that will allow CRM to process a virtually unbounded amount of records based on a Fetch XML query.
Note, however, that CRM is not meant to act as batch processing system and thus it is still the responsibility of the system’s architect to determine if CRM is to be made responsible for the operation, or that an external system would be the best solution. In this article, we will assume that no more than 50.000 records will be scheduled in any given batch. This conveniently coincides with CRM’s maximum for the count aggregation. At the end of the article I will outline how to increase this number.
In this article, I’m extending the concepts used in the aforementioned Microsoft Dynamics CRM 2015 Asynchronous Batch Process Solution. This solution generally lets users schedule Workflow Processes in the following manner:
The user configures a batch of processes to be executed (i.e., configures a record of entity Batch). To achieve this the following fields are configured (some details are not relevant and are omitted):
- A target workflow. This is the workflow that will be executed for every record that is scheduled in the batch.
- An periodic interval at which the target workflow is to be executed on all records in the batch.
- A valid Fetch XML query. This query is executed at the periodic interval and the result set defines all the target records for which the target workflow is executed.
- Other details, like whether the batch is currently suspended or active, previous result sets, etc.
Upon scheduling a batch, a background process is automatically started. This process in turn starts the batch at the user configured time and interval. This triggers a custom workflow activity. (Further details of the implementation of this process are not relevant.)
The custom workflow activity reads the batch record configuration and uses the Fetch XML query to retrieve a list of target records for which to start. It then continues to fire an ExecuteWorkflowRequest message for the target workflow on all the target records.
In the environment that we used to test this solution, we had managed to start between 2000 and 3000 workflows before the workflow activity was timed out by the sandbox. Depending on your environment, this could potentially be a larger volume (or smaller!).
Regardless, this is not a trustworthy scheduler unless you are absolutely sure that the size of your scheduled batch never exceeds a low volume of target records.
Counting the scheduled records in the batch
To prevent the sandbox from timing out our scheduler, we will have to build a pipeline of sub-batch records that will perform the batch activation in an isolated manner.
Based on the resources available in your CRM organization you will have to chose an artificial limit of number of process activations that can safely be started from a custom workflow activity before hitting the 2 minute time-out limit. We had found that in our organization a limit of 250 process activations was a very conservative sub-batch size and we decided to go with that.
At the time of the scheduled start, rather than immediately starting the activation of target workflow processes, the scheduling framework first transforms the user inputted Fetch XML query to an aggregate count query. This can easily be achieved by adding the correct attributes to the Fetch XML document in c# code in combination with entity metadata. (I will discuss CRM’s 50k aggregate count limitation in a later section.)
After the amount of target records has been retrieved, the amount of sub-batches that we will create to split up the work can now be calculated as
[Total record count] / [sub-batch limit], rounded up to the nearest integer.
Say that we have a scenario where there are 13.433 records to be scheduled in an overnight batch. This would yield
Ceil (13433 / 250) = 54 sub-batches.
The initial workflow activity, rather than activating workflows for all of these 13422 records, will create 54 sequentially numbered sub-batches that are linked to each other in a pipeline manner.
The sub-batch entity
The sub-batches will contain the following details:
- Lookup to the parent batch
- Lookup to next sub-batch in the pipeline, or null when current record is the tail
- Page number (‘order’), starting at number 1 for records 1-250, number 2 for records 251-500, etc.
- Paging cookie (text field). Must be null upon creation of the sub-batch record.
- A status reason with set of statuses, including ‘Waiting’, ‘Processing’, ‘Completed’ and ‘No More Results’
- Lookup to a batch status record. This record can contain details about the progress of the entire batch, successes/failures etc. Details omitted for this article.
The worker process
A new worker workflow process is introduced that performs the workflow activations for the sub-batch. This worker process calls a custom workflow activity. We will discuss trigger conditions for this workflow in the next section.
When the current sub-batch has page number 1, the fetch query will be retrieved from the related batch configuration. Using c# code, inject the correct paging attributes to this Fetch XML definition.
For example, for batch number 1 and sub-batch size 250, this will be: count=250 page=1.
After the workflow activity has retrieved the entire page and subsequently activated the target process for all target records, the workflow activity will output 2 important arguments:
- the returned paging cookie resulting from the execution of the Fetch XML
- the value indicating if there are more records to be retrieved
The worker process checks the value of the outputted arguments. If there are no more records, and if the value of next sub-batch is not null, then the next sub-batch will be updated with a status of ‘No More Results’ (and this will in turn propagate throughout the pipeline).
If there are more records to be returned, and the next sub-batch does not equal null, then the worker process copies the outputted paging cookie to the designated field of the sub-batch that is next in the pipeline.
Overcoming the depth level restriction – the trigger for the worker workflow process
It is of utter importance that the trigger for the worker workflow process is the creation of the sub-batch record. This will make sure that the workflow processes are not killed by the platform due to iterative call depth.
The consequence of this trigger is that the worker process will need a wait condition, since none of the sub-batches have the correct paging cookie yet.
The waiting condition for the worker process is (OR):
- Page Number == 1, OR
- Paging Cookie is not null, OR
- Status is updated with ‘No More Results’
Since we are working with asynchronous processes and we need to guarantee the next sub-batch record is created before the worker process finishes its batch (to propagate the paging cookie to the next sub-batch and start its worker process), it is required that we create the batches in reverse order.
Sub-batch 54 is created, then sub-batch 53, 52, …, 2, 1.
All sub-batch worker processes that are now started due to the creation of these sub-batches go into a waiting state, except the worker process for the first sub-batch. This is due to the value of Page Number being 1.
This first sub-batch will run the fetch query for the first page without a paging cookie. Its paging cookie is propagated to the second sub-batch, triggering its worker process to start. Then sub-batch 3, 4, etc.
After the last sub-batch has finished execution, the entire batch can be updated as completed.
Extending the maximum size of scheduled records in a batch beyond 50k
It is easy to extend the maximum amount of records included within any scheduled batch to more than CRM’s 50k aggregate count limit. Simply execute the fetch aggregate, and when the result returns the dreaded 50.000 maximum count, this will need to be overridden in code to the new limit, say 100.000.
Rather than creating a pipeline of sub-batches based on the actual aggregate record count result, the amount sub-batches that will be created is based on the configured maximum of 100.000. In a scenario where the sub-batch size is equal to 250, a pipeline with a total of 400 sub-batch records will be created when the aggregate count result equals 50.000.
To minimise wasted resources, a new status reason of No More Results will need to be added on the sub-batch entity. When the paged fetch query is returned with a MoreRecords status of false, the next sub-batch in the pipeline will need to be updated with this status. The initial waiting condition of the sub-batch workflow can then be triggered to continue operation when a sub-batch is updated with a status reason of No More Results. If the workflow has bee restarted due to this new status, it will propagate this status to the next batch in the row and terminate immediately afterwards.
If the result set had somehow been modified during the execution of the batch, for example, when page 53 of the Fetch XML query is retrieved with More Record = False, then the next sub-batch will also be updated with a status of No More Results.
Similar to propagating a No More Record status throughout the entire pipeline of waiting sub-batches, it is recommended to propagate any error states with an error status. This will require the custom workflow activity to output any unrecoverable error states to the sub-batch workflow.
Propagating this error state to all subsequent sub-batches in the pipeline will prevent the buildup over time of a large amount of workflows sitting idle in a waiting condition that will never be fulfilled.
With a proper design adhering to the fundamentals described in this article, it is very well possible to overcome the limitations imposed on custom workflows and code in Dynamics CRM Online and still manage to execute workflow processes on a large volume of records. However, every Dynamics CRM solution is unique and careful thoughts have to be given what system or solution best suits the task at hand.
This article is written by Tomas Schulkes. The design concepts can be freely used in any and all commercial and non-commercial software. The entire text and individual parts of the text is copyright by Tomas Schulkes and cannot be reproduced without my consent.
567 total views, 3 views today