Update: see this post on a solution. Thanks Adam for sharing!
One of my customers asked me to have a look at their Microsoft Hyper-V Server 2012 infrastructure. They had a strange problem with their HP BL490c Gen 9 blades. The blades are equipped with a 536FLB FlexFabric 10Gb network interface card based on a QLogic 57840S chipset.
The Hyper-V hosts are deployed using the bare-metal deployment in System Center Virtual Machine Manager 2012 R2.
The customer has 6 blades in a C7000 enclosure. Blade 1 could not ping Blade 2, yet Blade 1 could ping Blades 3, 4, 5, and 6 without any problem. All blades are in the same IPv4 subnet. Another issue, observed on a few other blades, was intermittent timeouts: ping worked, then failed for a couple of requests, and then the network was okay again.
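This symptom pattern (one pair of hosts that cannot reach each other at all, plus a few hosts with intermittent timeouts) is easy to miss when testing by hand. As a rough illustration only, a small helper like the hypothetical one below could classify the results of a ping sweep; the function, names, and data are my own sketch, not something the customer used:

```python
# Hypothetical helper to classify the results of a ping sweep between hosts.
# Each entry maps a (source, target) pair to a list of booleans, one per
# echo request (True = reply received, False = timeout).

def classify(results):
    """Label each host pair as 'ok', 'down', or 'intermittent'."""
    labels = {}
    for pair, replies in results.items():
        if all(replies):
            labels[pair] = "ok"
        elif not any(replies):
            labels[pair] = "down"
        else:
            labels[pair] = "intermittent"  # some replies lost, some received
    return labels

# Example mirroring the symptoms described above:
sweep = {
    ("blade1", "blade2"): [False, False, False, False],  # never answers
    ("blade1", "blade3"): [True, True, True, True],      # always fine
    ("blade4", "blade5"): [True, False, False, True],    # drops a couple, recovers
}
print(classify(sweep))
```

Running a sweep like this from every blade to every other blade makes the "down" and "intermittent" pairs stand out immediately.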
The customer has tried many things to solve this issue:
- placed the blades in a different HP C7000 enclosure
- placed the blades in different slots of the enclosure
- swapped the HP switches
- installed the full Windows Server 2012 R2 GUI
- installed Hyper-V Server 2012 instead of 2012 R2
- disabled checksum offloading, LSO, RSS, RSC, and VMQ
- called HP, who were unable to solve the issue
None of this solved the issue. The customer was already using the latest drivers and firmware for the adapter:
- HP Virtual Connect 4.31, dated 2014-10-24
- HP FlexFabric 10Gb 2-port 536FLB adapter drivers 7.10.39
- HP software package 4.01.12
I decided to start troubleshooting from scratch by installing Hyper-V Server 2012 R2 manually. The blade which initially could not ping Blade 2 was now able to ping it. Hmm, I wondered why.
Next I installed Hyper-V Server 2012 R2 manually again (without any network configuration) and then managed it using SCVMM, letting SCVMM configure the networking of the host. The host was still able to ping all other hosts in the same enclosure.
So for some reason, when the customer uses the bare-metal deployment from SCVMM, something goes wrong with the networking. We are not sure what exactly. It could be related to BIOS changes: it seems that as soon as the BIOS of a blade is changed, the next bare-metal deployment results in network issues.
I will update this post when we know more about the cause of this issue.