In an earlier post, I summarized the business benefit and overall performance improvement we achieved in a data conversion program that loaded data into a new Microsoft CRM system. I have received several questions asking for more detail on the problem and the techniques used. This post provides some additional detail.
Physical Architecture and the search for bottlenecks
There were several different servers relevant to the data conversion process as shown below.
The Conversion Server is the one that ran the multi-threaded conversion process.
Our initial effort to improve performance involved looking
at all of these servers and their connections, looking for obvious bottlenecks.
During initial operation (without multi-threading), none of these servers
appeared to be limited by CPU, memory, disk speed, or network latency. We found
this puzzling. With such slow performance, we expected to see an obvious
bottleneck in one of those primary resources.
What was clear was that the latency from the time we issued
a web services request to the CRM server until the time it completed was high. But,
we did not know why. We imagined that
either the web services call was chatty with the database or chatty with
itself. We never did the additional research
to pinpoint the cause, because we tried an experiment in parallelism that
worked. And in business, after you reach a solution that is good enough, you move on.
Opportunities for Parallelism
Once we decided to explore parallelism, we identified the
following essential sequences in the main program.
These were opportunities to do things in parallel. Any sequential behavior shown above was unavoidable given the module design, which had already been implemented by that time. As an example of something that forced sequential operation: we had to create new customer records in one module before we could create the relationships among those customers in the next module.
We used “task parallelism” as described in http://msdn.microsoft.com/en-us/library/dd537609.aspx to achieve parallel processing as shown above.
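The pattern can be sketched as follows. This is a minimal illustration, not our actual code; the module names are hypothetical stand-ins, and the real program had more modules and dependencies. The key idea is that independent modules run as concurrent tasks, while a module that depends on another (relationships depend on customers) is chained to run only after its prerequisite completes.

```csharp
using System;
using System.Threading.Tasks;

class ConversionProgram
{
    // Hypothetical module methods standing in for the real conversion modules.
    static void CreateCustomers()     { Console.WriteLine("customers created"); }
    static void LoadProducts()        { Console.WriteLine("products loaded"); }
    static void LoadReferenceData()   { Console.WriteLine("reference data loaded"); }
    static void CreateRelationships() { Console.WriteLine("relationships created"); }

    static void Main()
    {
        // Independent modules can run as concurrent tasks...
        Task customers = Task.Run(() => CreateCustomers());
        Task products  = Task.Run(() => LoadProducts());
        Task reference = Task.Run(() => LoadReferenceData());

        // ...but creating relationships requires the customer records to
        // exist, so that module is chained to run after the customer task.
        Task relationships = customers.ContinueWith(_ => CreateRelationships());

        Task.WaitAll(products, reference, relationships);
        Console.WriteLine("done");
    }
}
```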
Additional opportunities for data parallelism
In addition to the coarse-grained parallelism that was used for the main program, there were additional opportunities for parallelism within each module.
Each module typically processed hundreds of thousands of records, applying the same algorithm to each record. This was a wonderful opportunity to use data parallelism as described in http://msdn.microsoft.com/en-us/library/dd537608.aspx. We used Parallel.ForEach quite effectively.
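In miniature, the per-record loop looks like this. The record source and the Transform method are placeholders for the real per-record conversion work; the point is simply that Parallel.ForEach partitions the records across worker threads and applies the same operation to each one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class DataParallelSketch
{
    // Hypothetical per-record algorithm; the real one called CRM web services.
    static int Transform(int record) => record * 2;

    static void Main()
    {
        // Stand-in for the hundreds of thousands of input records.
        var records = Enumerable.Range(1, 1000).ToArray();

        // A thread-safe collection, since many threads add results at once.
        var results = new ConcurrentBag<int>();

        // The same algorithm runs on many records simultaneously.
        Parallel.ForEach(records, record =>
        {
            results.Add(Transform(record));
        });

        Console.WriteLine(results.Count);  // prints 1000
    }
}
```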
Overlapping HTTP requests
The two previous illustrations demonstrate the use of the task parallelism and data parallelism patterns to perform multiple operations in parallel. Since our data conversion program ultimately writes requests to the CRM web services, parallel
processing produces overlapping HTTP requests. This is exactly what we were hoping for. Because the target servers were not constrained in their primary resources (CPU, memory, disk speed, network latency), we expected them to handle simultaneous requests without much trouble. In fact, that is exactly what happened. The servers were perfectly happy processing several simultaneous requests, up to a certain point. We tuned our process to limit the parallelism to the level that produced maximum overall throughput. The Microsoft Task Parallel Library provides settings that let the programmer easily configure and constrain the degree of parallelism attempted.
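The relevant setting is ParallelOptions.MaxDegreeOfParallelism. The sketch below uses an illustrative cap of 8 (our real limit came from throughput tuning, and this value is not from the original program) and verifies that the observed concurrency never exceeds it.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottleSketch
{
    static int _concurrent;  // current number of in-flight "requests"
    static int _peak;        // highest concurrency observed

    static void Main()
    {
        // Cap simultaneous work items; 8 is an illustrative value only.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

        Parallel.ForEach(Enumerable.Range(0, 200), options, _ =>
        {
            int now = Interlocked.Increment(ref _concurrent);
            InterlockedMax(ref _peak, now);
            Thread.Sleep(5);  // simulate an HTTP round trip
            Interlocked.Decrement(ref _concurrent);
        });

        Console.WriteLine(_peak <= 8);  // prints True
    }

    // Lock-free "store the maximum" helper using compare-and-swap.
    static void InterlockedMax(ref int target, int value)
    {
        int current;
        while (value > (current = Volatile.Read(ref target)) &&
               Interlocked.CompareExchange(ref target, value, current) != current)
        {
        }
    }
}
```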
The diagram below illustrates the simultaneous HTTP requests that resulted from the use of parallel threading in the data conversion process.
When we first used parallelism to issue overlapping HTTP requests to the CRM web services, we experienced socket exhaustion on the server. I wrote an earlier post on how we resolved that. Our failure to resolve that sooner delayed our adoption of parallel techniques.
Another obstacle was the set of issues that surround multi-threaded programming. Despite the incredibly clean and simple design of the Task Parallel Library, you still have to understand the core principles of parallel processing and the dangers involved in manipulating shared state. Occasionally during our adoption of parallelism, we discovered obstacles related to shared state. One example: LINQ to SQL's DataContext and Entity Framework's DbContext are not thread safe. This typically arose when we were reading input records. We easily overcame this problem in our circumstance by simply reading all the records into a thread-safe collection in memory. This worked well in our case because we never had more than a million records in our input tables, and modern memory capacities handle that with ease.
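The workaround amounts to this: do all database reads up front on a single thread, then fan the parallel loop out over the in-memory snapshot, which is safe to read concurrently as long as no thread mutates it. The record shape and count below are illustrative, not our actual schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class SharedStateSketch
{
    // Hypothetical input shape; the real records came from staging tables.
    record InputRecord(int Id, string Name);

    static void Main()
    {
        // A DataContext/DbContext is not thread safe, so materialize the
        // whole input set on one thread first (simulated here with a list;
        // the real code would call context.Records.ToList() or similar).
        List<InputRecord> records = Enumerable.Range(1, 100)
            .Select(i => new InputRecord(i, $"record-{i}"))
            .ToList();

        // Fan out over the in-memory snapshot. Concurrent reads of the
        // list are safe because no thread modifies it during the loop.
        Parallel.ForEach(records, r => Process(r));

        Console.WriteLine("processed " + records.Count + " records");
    }

    static void Process(InputRecord r)
    {
        // per-record conversion work would go here
    }
}
```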
Another obstacle we encountered was related to direct SQL operations. Some of the update operations we performed in parallel overwhelmed the SQL Server and resulted in deadlocks. We never solved these problems. We identified the related code as expendable and we simply removed it. It would have been intellectually satisfying to have solved this particular problem. But the adventure was not justified by the business priorities.
These are the techniques that we used to achieve significant performance improvements in our data conversion process. These achievements are noteworthy because they represent real improvements in a complicated real-world scenario. It is easy to create demonstration programs that show off the purported advantages of parallel processing. It is a bit more challenging to take advantage of those techniques in the wild.