Mismatched MTUs on iSCSI uplinks cause VMware ESXi server to hang

Over the last few days I’ve been having an issue with two clustered ESXi 5.0 servers. Everything appeared to be working normally, then one day vMotion stopped working, with progress hanging at 9%. Furthermore, ESXi itself would then stop responding to vCenter requests.

Eventually the vMotion would time out and the host would come back online (alternatively, you can kick it by connecting via SSH and restarting the hostd service). Looking at the events, I could see that the host had lost contact with the iSCSI storage during the hang.

The problem was intermittent: occasionally it would work, then break again, which made it harder to diagnose.

In the end, the problem turned out to be caused by mismatched MTU sizes on the iSCSI uplinks.

I had created a vSwitch for iSCSI, added two port groups, and assigned one uplink to each port group. Unfortunately, one of the port groups was set to MTU 1500 and the other to MTU 9000. Both ESXi servers had this same configuration.
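A mismatch like this is easy to miss when clicking through each port group by hand. As a minimal sketch, assuming the per-port-group MTUs have already been collected into a simple Python structure (the `IscsiPortGroup` class, the host names, port group names and uplink names below are all hypothetical, not taken from a real vSphere API), a check like this flags any host whose iSCSI port groups disagree:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IscsiPortGroup:
    # Hypothetical model of one iSCSI port group: its name, the single
    # uplink (physical NIC) assigned to it, and its configured MTU.
    name: str
    uplink: str
    mtu: int


def find_mtu_mismatches(hosts: dict[str, list[IscsiPortGroup]]) -> dict[str, set[int]]:
    """Return {host: distinct MTUs} for each host whose iSCSI port groups disagree."""
    mismatched: dict[str, set[int]] = {}
    for host, port_groups in hosts.items():
        mtus = {pg.mtu for pg in port_groups}
        if len(mtus) > 1:  # more than one MTU value on the same host
            mismatched[host] = mtus
    return mismatched


# The configuration described above, using hypothetical names:
# one port group at 1500 and the other at 9000, on both hosts.
cluster = {
    "esxi-01": [IscsiPortGroup("iSCSI-A", "vmnic2", 1500),
                IscsiPortGroup("iSCSI-B", "vmnic3", 9000)],
    "esxi-02": [IscsiPortGroup("iSCSI-A", "vmnic2", 1500),
                IscsiPortGroup("iSCSI-B", "vmnic3", 9000)],
}

print(find_mtu_mismatches(cluster))
```

Both hosts are reported with the MTUs {1500, 9000}; once every port group is set to the same value, the function returns an empty dict.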

Once I set all the MTUs to 9000, the problem went away.
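After changing the MTU, a common way to confirm that jumbo frames really pass end to end is to ping the iSCSI target from the host with the don't-fragment bit set, for example `vmkping -d -s 8972 <target IP>` on ESXi. The payload size is the MTU minus the IPv4 and ICMP headers. The small Python sketch below just does that arithmetic, assuming a 20-byte IPv4 header with no options and an 8-byte ICMP echo header; the function name `max_ping_payload` is illustrative only:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header, assuming no options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single IPv4 packet of the
    given MTU without fragmentation."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo packet")
    return payload


for mtu in (1500, 9000):
    print(f"MTU {mtu}: vmkping -d -s {max_ping_payload(mtu)} <target IP>")
```

For a 9000-byte MTU this gives 8972 and for 1500 it gives 1472. If any hop on a path is still at 1500, the 8972-byte ping with fragmentation disabled should fail on that path while the 1472-byte one still succeeds.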
