We have a web server running IIS and hosting the web services for Microsoft CRM 4.0, and a console application that runs a conversion program, which tries to update thousands of records per second through the CRM web service.
We run the conversion application at its fastest possible speed.
After only 26 seconds, we see failures in the web services calls of type System.Net.Sockets.SocketException.
If we put a deliberate delay of 100ms in the loop, we are able to process without errors, but at a reduced speed.
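As a rough illustration of the throttled loop described above, here is a minimal sketch in Python (not the original .NET conversion program). The names `run_throttled`, `update_record` and `max_records_per_minute` are hypothetical, and `update_record` stands in for whatever call sends one record to the CRM web service.

```python
import time


def run_throttled(records, update_record, delay_s=0.1, sleep=time.sleep):
    """Call update_record for each record, pausing delay_s seconds after each call."""
    processed = 0
    for record in records:
        update_record(record)  # hypothetical stand-in for the web service call
        sleep(delay_s)         # deliberate pause between calls
        processed += 1
    return processed


def max_records_per_minute(delay_s):
    """Upper bound on sequential throughput imposed by the pause alone."""
    return 60.0 / delay_s


if __name__ == "__main__":
    sent = []
    count = run_throttled(["a", "b", "c"], sent.append, delay_s=0.1)
    print(count, "records sent; ceiling per minute:", max_records_per_minute(0.1))
```

The pause by itself sets a ceiling: with 100 ms between calls, a sequential loop cannot exceed 600 calls per minute, even before counting the time each call takes.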
Cause: exhaustion of available ports (socket addresses).
Solution: disable the HTTP Keep Alive setting on the web server that serves the web services.
for IIS 6
for IIS 7
With HTTP Keep Alive disabled on the web services server, we have been able to process 1,000 – 5,000 records / minute against the Microsoft CRM 4.0 web services. Before this change, we could only achieve rates of 200 – 500 records / minute. We had to use parallel programming to achieve the higher rates, but in the context of this post, the point is that we could operate at higher transaction rates without socket errors. Before we disabled HTTP Keep Alive, we did not even consider parallelism, as we were overloading the server and exhausting sockets with simple sequential processing.
Certain web applications require HTTP Keep Alives to operate, and in general web applications work most efficiently with this setting enabled. Microsoft CRM 4.0 requires HTTP Keep Alives to be enabled to function. So our solution required us to set up a separate Microsoft CRM 4.0 platform server, where we could disable HTTP Keep Alive. We targeted this separate server for our conversion processes. This left the primary CRM server unaltered, so it could continue to serve web requests from users using their web browsers.
Variations tried and not tried:
I tried altering the authentication mechanism while HTTP Keep Alive was enabled. No combination of authentication prevented socket errors.
Two articles from the references below suggested I leave the server HTTP Keep Alive enabled but disable it for my client calls by altering the web request.
I did not test this because we had already established a second server, dedicated to handling the data conversion web requests, and disabled HTTP Keep Alive on that server. This second server provided additional opportunities for boosting performance, because we used its processing power exclusively to handle the data conversion calls, and left the main server to handle workflow processing. So we stuck with that model.
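For readers curious about the untested client-side variation, the general idea is that the client sends a `Connection: close` header with each request, so the connection is not kept alive even though the server allows it. The sketch below shows this in Python with the standard `http.client` module. It is only an illustration of the idea, not the .NET client used here and not something verified against CRM; the function name, content type and parameters are assumptions. In .NET, the corresponding switch is the `KeepAlive` property on `HttpWebRequest`.

```python
import http.client


def post_without_keep_alive(host, port, path, body, timeout=30):
    """POST body (bytes) and ask the server to close the connection afterwards."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(
            "POST",
            path,
            body=body,
            headers={
                "Content-Type": "text/xml; charset=utf-8",  # assumed SOAP-style payload
                "Connection": "close",  # opt out of HTTP keep-alive for this request
            },
        )
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()  # release the client socket whether or not the call succeeded
```

Whether this avoids the socket errors in this scenario is an open question, since it was not tried.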
I found plenty of articles on the socket error. But finding one that led me to a solution was difficult. That is why I’m posting this. Here are some references.
Trying to understand all this information is another challenge. I don’t pretend to understand the underlying mechanisms at work here. If someone understands this better, feel free to comment. I just know what works and what does not.
Tools to diagnose:
On the web server, you can run
netstat -an >connections.txt
and examine the number of ports waiting. In this scenario, the number of ports in use is enormous.
In a sample I ran, there were over 3,900 ports in TIME_WAIT state. Here is a tiny sample of what is in the connections.txt file, the output of netstat -an. But honestly, I don’t know what this is telling me. I mean, when HTTP Keep Alives are enabled, and we are getting socket errors, the list actually appears to be smaller than when it is running at high speed. So I don’t know what to suggest here.
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:445            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:3389           0.0.0.0:0              LISTENING
  TCP    0.0.0.0:47001          0.0.0.0:0              LISTENING
  TCP    0.0.0.0:49152          0.0.0.0:0              LISTENING
  TCP    0.0.0.0:49153          0.0.0.0:0              LISTENING
  TCP    0.0.0.0:49154          0.0.0.0:0              LISTENING
  TCP    0.0.0.0:49157          0.0.0.0:0              LISTENING
  TCP    0.0.0.0:55902          0.0.0.0:0              LISTENING
  TCP    192.168.95.99:80       192.168.95.136:1026    TIME_WAIT
  TCP    192.168.95.99:80       192.168.95.136:1027    TIME_WAIT
  TCP    192.168.95.99:80       192.168.95.136:1028    TIME_WAIT
  TCP    192.168.95.99:80       192.168.95.136:1029    TIME_WAIT
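Reading thousands of netstat rows by eye is tedious, so one option is to tally the TCP rows by state. The sketch below is a minimal Python helper for that, assuming the usual `netstat -an` column order (protocol, local address, foreign address, state). The names `tally_states` and `SAMPLE` are made up for this illustration, and the sample rows are taken from the listing above.

```python
import re
from collections import Counter

# One TCP row of `netstat -an`: protocol, local address, foreign address, state.
# UDP rows carry no state column and are skipped.
ROW = re.compile(r"\bTCP\s+(\S+)\s+(\S+)\s+([A-Z][A-Z_0-9]*)")

SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    192.168.95.99:80       192.168.95.136:1026    TIME_WAIT
  TCP    192.168.95.99:80       192.168.95.136:1027    TIME_WAIT
  TCP    192.168.95.99:80       192.168.95.136:1028    TIME_WAIT
"""


def tally_states(netstat_text):
    """Return (connections per state, TIME_WAIT connections per local port)."""
    states = Counter()
    time_wait_by_port = Counter()
    for local, _foreign, state in ROW.findall(netstat_text):
        states[state] += 1
        if state == "TIME_WAIT":
            # The local port is whatever follows the last colon (works for IPv6 too).
            time_wait_by_port[local.rsplit(":", 1)[-1]] += 1
    return states, time_wait_by_port


if __name__ == "__main__":
    states, by_port = tally_states(SAMPLE)
    for state, count in states.most_common():
        print(state, count)
    print("TIME_WAIT by local port:", dict(by_port))
```

To run it against a real capture, pass the file contents instead of the sample, for example `tally_states(open("connections.txt").read())`. Comparing the tallies from a failing run and a healthy run may be easier than comparing the raw listings.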
Sample socket error:
Event Type: Error
Event Source: <whatever>
Event Category: None
Event ID: 0
Date: 5/30/2011
Time: 9:23:12 AM
User: N/A
Computer: <web server computer>
Description:
System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted 192.168.95.99:80
   at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
   at System.Net.HttpWebRequest.GetRequestStream()
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object parameters)
   at Microsoft.Crm.SdkTypeProxy.CrmService.Execute(Request Request)
   at <our custom code called the stack above>