I was bitten very hard by something that is quite hidden, and it took a while to figure out.
We moved an application from Windows 2002 to Windows 2003, and in the process, services that were on the same machine moved to different machines, which in turn means that they use the LAN to communicate with each other instead of the loopback adapter.
Well, just imagine an application running on top-of-the-line servers (8 CPUs, 10GB RAM) running slower than on a six-year-old server…
The main clue that something was blocking the performance was that there was no CPU load. So the application was not processing anything, just waiting…
Our friendly tool http://www.wireshark.org/ started to shed some light on the issue. It seems that most of the time the gap between packets, namely between the PSH/ACK and the answer, was around 200ms. Processing the application log files likewise showed a lot of processing times under 10ms and a “bump” around the 200ms mark.
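To make that kind of “bump” visible, the durations pulled from a log can be bucketed into a rough histogram. This is only a minimal sketch: the function name `bucket_durations`, the 50ms bucket width, and the sample values are all made up for illustration and are not taken from the real log files.

```python
from collections import Counter


def bucket_durations(durations_ms, bucket_ms=50):
    """Count durations per bucket; each key is the bucket's lower bound in ms."""
    counts = Counter()
    for duration in durations_ms:
        lower_bound = int(duration // bucket_ms) * bucket_ms
        counts[lower_bound] += 1
    return dict(sorted(counts.items()))


# Invented sample data: mostly fast requests plus a cluster near 200 ms.
sample = [3, 7, 9, 4, 203, 198, 205, 6, 201, 8]
hist = bucket_durations(sample)

# Print one bar per bucket so the cluster around 200 ms stands out.
for lower, count in hist.items():
    print(f"{lower:>4} ms and up: {'#' * count}")
```

A cluster sitting far away from the bulk of the fast requests, with next to nothing in between, is the pattern that points at a fixed timer rather than at real processing work.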
To make a long story short… Despite most Windows server deployments being made on high-speed enterprise LAN segments, Windows 2003 has by default TCP/IP algorithms active (Nagle on the sending side, delayed ACK on the receiving side) so it can save bandwidth on slow WAN links!
What does it mean? It means that Windows will reply right away with an ACK packet only if it has data to transmit back, so it piggybacks the ACK on the data, saving packets. If it has no data, it times out after 200ms and then sends the lone ACK packet… Meanwhile 1/5s has passed by. Small number theory says that a lot of small numbers added up give one big number, hence seconds of pure slowness.
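The “small numbers add up” part is easy to put into figures. The sketch below is only back-of-the-envelope arithmetic: the 200ms value comes from the capture described above, while the function name `total_stall_seconds` and the count of 50 exchanges are hypothetical.

```python
DELAYED_ACK_MS = 200  # delay observed between the PSH/ACK and the answer


def total_stall_seconds(stalled_exchanges, delay_ms=DELAYED_ACK_MS):
    """Time spent purely waiting if every exchange hits the ACK timeout."""
    return stalled_exchanges * delay_ms / 1000


# Hypothetical operation that needs 50 small request/response exchanges.
print(total_stall_seconds(50), "seconds of waiting")
```

So a single user action that needs a few dozen small exchanges between services can spend whole seconds idle while the CPUs do nothing.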
So that all of your LAN throughput can be used by your applications, you need to switch off the 200ms delay:
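The exact change is not reproduced in this post. One commonly documented way to do it on Windows Server 2003 is the per-interface `TcpAckFrequency` registry value, and the sketch below assumes that is the setting meant here. The helper `build_ack_frequency_reg` is hypothetical, the all-zero GUID is a placeholder for the real network interface GUID, and the accepted value range is an assumption, so check the Microsoft documentation for your version before importing anything.

```python
REG_HEADER = "Windows Registry Editor Version 5.00"
INTERFACES_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
    r"\Tcpip\Parameters\Interfaces"
)


def build_ack_frequency_reg(interface_guid, ack_frequency=1):
    """Return the text of a .reg file that sets TcpAckFrequency for one interface.

    A value of 1 asks the stack to acknowledge every segment immediately
    instead of waiting for the delayed-ACK timer (assumed setting).
    """
    if not 1 <= ack_frequency <= 255:
        raise ValueError("ack_frequency is assumed to be in the range 1..255")
    guid = interface_guid.strip("{}")
    lines = [
        REG_HEADER,
        "",
        "[%s\\{%s}]" % (INTERFACES_KEY, guid),
        '"TcpAckFrequency"=dword:%08x' % ack_frequency,
        "",
    ]
    return "\r\n".join(lines)


# Placeholder GUID; the real one is listed under the Interfaces key.
reg_text = build_ack_frequency_reg("{00000000-0000-0000-0000-000000000000}")
print(reg_text)
```

The generated text can be saved as a .reg file and reviewed before it is merged. A restart is normally needed before the TCP/IP stack picks up the new value.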
After the above documented change, well the application just caught fire and ran like a rocket.
Update: Also check: http://support.microsoft.com/kb/898468
Update2: Also check: http://support.microsoft.com/kb/948496 -> This will disable some “features”…