Hyper-V 2012 R2 virtual machines randomly lose network connections. Be careful with Emulex NICs! New driver expected in July

<update June 24>
Hyper-V Program Manager Ben Armstrong wrote a blog post about this issue, titled Hyper-V Network Connectivity Issues with Emulex Adapters.

He is blogging about this for two reasons:

  1. I have been contacted by a number of customers who have hit this, and want people to know about what to do.

  2. It is really good to see a hardware vendor documenting status and workarounds for known issues, and I am glad to see this post going up.

<update June 20> 
Hans Vredevoort of Hyper-V.nu wrote a blog about the conference call he had with Emulex. Emulex was unaware of this issue for a long time; it seems HP did not inform Emulex about it. When Emulex became aware, they could not reproduce the issue at first, and then found other issues as well.
His blog here: Additional Background on the VMQ Issue with Emulex and HP

<update June 19>

Emulex posted a blog on the Emulex website explaining the issue described in my post. The workaround is to disable VMQ; we have known that one for a couple of months.

The good news is that a new driver and firmware are expected to be released in mid-July.

See the blog here. 

————————

<update June 17>

I have been in touch with the CEO of Emulex about this issue. He stated: “My team is very aware of this and while you may not have been provided the update you deserve, the issue has not been ignored. I know the team has been very engaged with HP and MSFT on this.”

Let's hope there is some progress on resolving the issue.

————————

Virtual machines running on Windows Server 2012 R2 Hyper-V could randomly lose their network connection. The only workaround to restore network connectivity is to perform a Live Migration of the affected VM to another host or to reboot the Hyper-V host.

To work around this, some IT pros wrote scripts that ping all VMs and perform a Live Migration of any VM that does not respond.
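A minimal sketch of what such a watchdog script could look like, assuming the VM IP addresses reported by the integration services are reachable from the host and that "HV02" is a placeholder name for an available Live Migration target:

# Ping every running VM on this host; Live Migrate the ones that stop responding.
# "HV02" is a placeholder destination host - replace it with a real cluster node.
$destinationHost = "HV02"

foreach ($vm in Get-VM | Where-Object State -eq 'Running') {
    # Take the first IPv4 address the integration services report for this VM
    $ip = (Get-VMNetworkAdapter -VM $vm).IPAddresses |
          Where-Object { $_ -match '^\d{1,3}(\.\d{1,3}){3}$' } |
          Select-Object -First 1

    if ($ip -and -not (Test-Connection -ComputerName $ip -Count 2 -Quiet)) {
        Write-Warning "$($vm.Name) ($ip) is not responding - starting Live Migration"
        Move-VM -Name $vm.Name -DestinationHost $destinationHost
    }
}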

In many cases the issue is seen on Emulex NICs in HP Gen8 blades on which Windows Server 2012 R2 with Hyper-V is installed.

The problem seems to be related to the number of VMQs available on the network interface. If the number of network adapters/virtual NICs in the VMs exceeds the number of available VMQs, some virtual machines will lose network connectivity.

I found an explanation of how VMQ works here. For much more in-depth details about VMQ, see this blog post and the articles linked earlier in it.

 

Since the Emulex network adapters have 16 VMQ slots in total, the first 4 slots are taken up by the host OS. The first of the 4 is supposed to be “special” (I'll get back to that in a bit). The other 3 are regular adapters. The next 12 are regular VM adapters. Each guest VM is assigned one VMQ slot out of the 16.

4 + 12 = 16; all VMQ slots are assigned.

When the 13th VM tries to get a VMQ slot, it fails to receive one.

What is supposed to happen is that the hypervisor simply starts sharing its “first” slot (the special one) with any additional VMs that cannot get a VMQ slot (or any that have VMQ disabled).

What actually happens on the Emulex or Broadcom adapters is that the guest OS simply fails to allocate a VMQ slot and gets no network connectivity at all. It cannot even talk to the host OS (even if it is on the same VLAN and not communicating through the physical ports).

Basically, the Emulex and Broadcom adapters give you exactly the VMQ slots available, and the fallback of dropping the remaining VMs back to vRSS-like queues simply fails to work: any VM that was not issued a direct VMQ fails to communicate.

The Intel drivers correctly share the first VMQ slot with any additional VMs. This results in higher-than-normal CPU usage on the first core, but that is no different from how Windows Server 2008 R2 (or 2012 R2 with vRSS networking) works anyway.

I understand the Emulex adapters currently support up to 30 VMQs on Windows Server 2012.
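To see how many VMQs a NIC actually exposes and which queues have been handed out, the in-box NetAdapter cmdlets can be queried. A quick check along these lines (the exact columns vary per driver) shows when the queues are exhausted:

# Number of VMQ receive queues each physical adapter exposes
Get-NetAdapterVmq | Format-Table Name, Enabled, NumberOfReceiveQueues -AutoSize

# Which queues are currently allocated and to which MAC address/VLAN they belong
Get-NetAdapterVmqQueue | Format-Table Name, QueueID, MacAddress, VlanID -AutoSize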

The workaround that works for many people is to disable VMQ on the NICs by using this command:

Get-NetAdapter | Disable-NetAdapterVmq

This blog post by Ben Gelens describes the same issue. Ben solved it by disabling the Virtual Machine Queue (VMQ) on just the management NIC.
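A hedged sketch of that narrower workaround, assuming a converged setup where the management vNIC in the parent partition is named "Management" (adjust the name to your own environment; this is not necessarily the exact method Ben used):

# Disable VMQ only for the host (management OS) virtual adapter named "Management";
# a VmqWeight of 0 turns VMQ off for that adapter and leaves the guest vNICs alone.
Set-VMNetworkAdapter -ManagementOS -Name "Management" -VmqWeight 0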

The issue is described at Hyper-V.nu and at aidanfinn.com. It is also reported on the Microsoft TechNet forum.

It seems to occur mostly when Emulex network interface cards are used, for example in HP servers (the HP 554M, HP 554FLB and 554FLC adapters use the Emulex chipset) and IBM servers; Broadcom NetXtreme 57xx NICs are also mentioned. Especially the 10 GbE cards are suspect for this issue.

Emulex driver versions 10.0.430.570, 10.0.430.1003 and 10.0.430.1047 all seem to suffer from this issue. Here is some information on Emulex adapters in a Hyper-V environment using RSS and VMQ.

NICs from Broadcom and Intel are also reported to have this issue, but likely less frequently.

Virtual machines that handle a lot of network traffic seem to be affected by this issue more than virtual machines that do not.

The problem is experienced by many people.

At the moment there is no solution other than waiting for Emulex to release a new driver.

Below is some other advice I found on various sites, which might or might not help (a combined snippet follows the list). There are other network issues reported on various blogs as well. Some servers get a BSOD, but this could possibly be resolved with a Microsoft hotfix.

  1. Disable encapsulated packet task offload with the Disable-NetAdapterEncapsulatedPacketTaskOffload cmdlet
  2. Disable Large Send Offload v2
  3. Disable task offloading globally: Set-NetOffloadGlobalSetting -TaskOffload Disabled
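Combined into one hedged snippet (applying all three offload changes to every adapter is a blunt instrument, so test them one at a time first):

# Offload-related workarounds reported on various blogs; apply selectively and test.
Get-NetAdapter | Disable-NetAdapterEncapsulatedPacketTaskOffload   # 1. encapsulated packet task offload
Get-NetAdapter | Disable-NetAdapterLso                             # 2. Large Send Offload v2
Set-NetOffloadGlobalSetting -TaskOffload Disabled                  # 3. global task offload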

 

StarWind seminar on Virtual SAN for Hyper-V

 

LIVE WEBINAR on Thursday, May 15, 10:00 AM EDT
FEATURED SPEAKER: Max Craft, Solution Engineer, StarWind Software, Inc.

Uptime is crucial for businesses of all shapes and sizes. Traditionally, it takes large investments to ensure uninterrupted workplace productivity.
Join StarWind’s FREE webinar to find out how our Virtual SAN for Hyper-V solves the uptime issue without breaking the bank:

  • Minimalistic Hardware Footprint. Only two hypervisor nodes are needed: no SAN, no NAS, no SAS JBOD, no voting third node, and no Ethernet switches. As there’s less hardware to buy and maintain, both CapEx and OpEx are reduced, resulting in improved ROI.
  • Simplicity. Neither advanced SAN management skills nor special training is required. StarWind’s Virtual SAN is a native Windows application that’s easy to set up and manage for any Hyper-V administrator.
  • Uncompromised Performance. A less expensive solution that brings even better performance.

 

Click here to register for this webinar.

disclaimer:

This is a sponsored blog post.

Windows Server 2012 R2 Update KB2919355 causing failed backups of Hyper-V VMs

After installation of Windows Server 2012 R2 Update KB2919355, backups of Hyper-V VMs fail. The error is:

 

Error: Client error: The system cannot find the file specified

This error is caused by neither the backup software nor the storage used; it is a bug in the Microsoft update. Veeam is currently working on a workaround, which is expected to be released this Tuesday.

So be careful installing this Microsoft update.
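To check whether the update is already present on a host before backups start failing, a quick query along these lines should do (the KB number is taken from the update above):

# Returns the hotfix entry if KB2919355 is installed on this host, otherwise nothing
Get-HotFix -Id KB2919355 -ErrorAction SilentlyContinue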

More info on the Veeam forum.

StarWind V8 Release Candidate Is Available Now!

StarWind Software released the Release Candidate of StarWind V8. The functionality and architecture of this software are similar to VMware VSAN. It can be installed on both Hyper-V and ESXi and converts local storage into distributed shared storage.

General availability is expected in Q2 2014


Download the RC here.

It offers three products that share a common engine:

  • StarWind SAN – a unique system that allows you to create a high-performance and fault-tolerant shared storage on commodity hardware, saving $$$.
  • StarWind VSAN – a virtual SAN sharing a server with Hyper-V on parent partition. Convenient and as easy to manage as any Windows software.
  • StarWind VSA – a virtual storage appliance that merges with VMware and runs in a VM, making your system more efficient and convenient to use.

It is scalable to an unlimited number of nodes, optimized for flash as the primary storage, with its own LSFS and upgraded asynchronous replication.

StarWind SAN, the flagship product, runs on commodity server hardware and converts it into a high-performance, fault-tolerant SAN at a fraction of the cost and complexity associated with traditional SAN-based storage infrastructures. With its integrated VSA builder and scale-out architecture, StarWind SAN became the first solution running on Windows that creates clusters of multiple hypervisor hosts without separate physical shared storage (unlike SAS (JBOD), FC or iSCSI). With this release, StarWind supports the latest versions of both VMware and Microsoft platforms.

StarWind virtual SAN (VSAN) is a leading software-defined storage solution for Microsoft Windows Server 2012 R2 and Microsoft Hyper-V 3.0. The functionality and architecture of this software are similar to VMware VSAN. It runs inside the parent partitions of hypervisor hosts and turns their directly attached storage (DAS) into a fault-tolerant SAN. StarWind VSA also follows the VMware VSA concept and operates inside a VM on the ESXi host, creating fault-tolerant shared storage out of the hypervisor's resources.

StarWind V8 RC presents high-performance caching that allows affordable MLC SSDs to be used as level-2 flash cache and fast RAM to be used as level-1 cache. StarWind utilizes solid-state disks / flash cards as the primary storage with a dedicated flash-aware Log-Structured File System (LSFS).

Another major advantage over multiple virtual storage solutions is built-in asynchronous WAN-replication, which ensures effective disaster recovery. StarWind’s support of WAN-acceleration technologies also enables data replication over slow and cheap connections.

StarWind V8 RC includes a set of powerful experimental features including: LSFS, which eliminates the I/O blender effect; synchronous replication for clustered LSFS devices; and in-memory operation, which creates a HA cluster with one node running on RAM-disk.

FreeBSD 10.0 release has built-in Hyper-V support

FreeBSD 10.0 is the first release that has native paravirtualized drivers and Hyper-V Integration Services built into the amd64 kernel. In previous releases, additional software had to be installed to run FreeBSD on Hyper-V.

Note that these are part of the amd64 GENERIC kernel. For i386, some manual steps need to be taken.

For instructions, see the FreeBSD Wiki page. The info below was copied from that Wiki page.

As of FreeBSD 10.0, Hyper-V integration services provide the following functionality:

  1. Support for integrated shutdown from Hyper-V console.
  2. Support for keeping time synchronized between FreeBSD guest and Hyper-V host.
  3. Support for Hyper-V specific IDE and SCSI storage devices.
  4. Support for Hyper-V specific network adapter.
  5. Live migration with and without static IP migration. Note that to enable static IP migration, administrators will need to include the KVP driver and daemon available in FreeBSD 10.0 ports for Hyper-V.
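From the Hyper-V host side you can quickly verify which of these integration services a guest offers and whether they respond. A hedged example, assuming the FreeBSD guest is a VM named FreeBSD10:

# Lists the integration services (Shutdown, Time Synchronization, ...) and their status for the VM
Get-VMIntegrationService -VMName "FreeBSD10"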