Microsoft announced Windows Server 2012 R2 at the Microsoft TechEd event on June 3. Windows Server 2012 R2 is expected to be released at the end of 2013 and will offer many new features. In a future post I will highlight those features.
One of the interesting new features is Storage Quality of Service. This important feature enables administrators to set a minimum and maximum number of IOPS per virtual hard disk of a virtual machine. The settings can be applied while the VM is running and take effect immediately. I understand there will be no support for Storage QoS in SCVMM 2012 R2.
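To make the idea of a min/max policy concrete, here is a minimal sketch in Python of what such a per-virtual-hard-disk setting expresses. All names are hypothetical; this is not the actual Hyper-V API, and the exact behaviour when a disk falls below its minimum is my assumption.

```python
# Hypothetical sketch of a per-VHD min/max IOPS policy; names are
# illustrative, not the real Hyper-V interface.

def check_policy(measured_iops, min_iops, max_iops):
    """What a min/max QoS policy implies for a measured I/O rate:
    throttle above the maximum, flag when the disk does not get
    its reserved minimum (assumption: the minimum acts as an alert
    threshold rather than a hard guarantee)."""
    if measured_iops > max_iops:
        return "throttle"   # cap the VHD at max_iops
    if measured_iops < min_iops:
        return "alert"      # disk is below its reserved minimum
    return "ok"

print(check_policy(1200, min_iops=100, max_iops=1000))  # throttle
print(check_policy(50,   min_iops=100, max_iops=1000))  # alert
print(check_policy(500,  min_iops=100, max_iops=1000))  # ok
```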
Why is this an important feature? Because by far the most performance-related issues in server virtualization are caused by storage.
This post compares the current VMware vSphere features for storage throttling with the to-be-released Microsoft feature.
Storage Quality of Service is an important feature, especially in multi-tenant infrastructures. It can restrict disk throughput for less important virtual machines that consume a lot of disk I/O. When such machines consume lots of disk I/O, other, more important virtual machines might not get the IOPS they need to perform well. Quality of Service makes it possible to manage those noisy-neighbor VMs and guarantee Service Level Agreements.
I have seen an example of a noisy neighbor at one of my customers' sites. They managed a multi-tenant infrastructure running on Windows Server 2008 Hyper-V. One of their customers used an application that processed a large batch of invoices once per month. That particular application consumed so many IOPS that it caused serious performance issues for applications in use by other tenants. As Hyper-V currently does not have any way to throttle IOPS, there was no other solution than to upgrade the SAN with faster and more disks. Quite an expensive way to solve the issue.
Until the release of Windows Server 2012 R2 there is another solution to throttle disk throughput: Melio FS. I wrote about it in this post.
So how does Microsoft Storage QoS compare to the VMware vSphere storage control features?
vSphere has two mechanisms to control IOPS consumption by virtual machines.
1. A hard limit on IOPS, set per VM.
2. A dynamic limit with prioritization in case of congestion. This feature is called Storage IO Control.
The IOPS consumed per virtual machine/virtual disk can be capped using the Limit IOPS setting. The setting does not behave as intuitively as you might expect; see the VMware KB for an explanation.
Storage IO Control
VMware has been offering storage Quality of Service since the release of vSphere 4.1 at the end of 2010. VMware has spent many hours on the development of Storage IO Control, which is quite complex to do right.
Storage IO Control (SIOC) is only available in the most expensive Enterprise Plus edition of vSphere. SIOC dynamically throttles disk I/O per virtual machine/virtual disk. So it is not simply a matter of disk I/O being restricted or not, nor of restricting access to a fixed number of IOPS.
SIOC constantly monitors disk latency on SIOC-enabled datastores. As soon as latency rises above the threshold (30 ms by default), SIOC kicks in. This mechanism can best be compared to highway congestion control.
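The latency-driven trigger described above can be sketched in a few lines of Python. This is a rough illustration of the principle, not VMware's implementation; the function name and the averaging of samples are my assumptions.

```python
# Sketch of a SIOC-style congestion check; hypothetical names, not
# VMware's actual code. Throttling only engages above the threshold.

THRESHOLD_MS = 30  # default SIOC congestion threshold

def congestion_detected(latency_samples_ms, threshold_ms=THRESHOLD_MS):
    """Compare average observed datastore latency against the
    threshold; below it, all capacity stays available to everyone."""
    avg = sum(latency_samples_ms) / len(latency_samples_ms)
    return avg > threshold_ms

print(congestion_detected([12, 18, 15]))  # False: below 30 ms, no throttling
print(congestion_detected([42, 55, 38]))  # True: congestion, SIOC kicks in
```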
Outside rush hour it does not make sense to restrict speed or reserve a separate lane for carpooling: all capacity is available for all types of traffic. At rush hour you want congestion control to get the most out of the highway's capacity. So the carpool lane is opened, and perhaps a speed restriction is enabled, or flow onto the on-ramps is controlled using traffic lights.
Each virtual disk of a virtual machine can be assigned shares by IT management. When SIOC kicks in, virtual disks with a higher number of shares get more IOPS than virtual disks with a lower number of shares.
Throttling is not done per ESX host but per datastore. The total IOPS capacity of a datastore is divided over the virtual disks active on that datastore, based on the disk shares set.
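The proportional division of a datastore's capacity can be illustrated with a minimal sketch. The numbers and names are made up for illustration; this only shows the share arithmetic, not VMware's scheduler.

```python
# Minimal sketch of share-based division of a datastore's IOPS
# capacity during contention; illustrative names and numbers only.

def divide_capacity(datastore_iops, shares_per_disk):
    """Split the datastore's total IOPS over its active virtual
    disks in proportion to their share values."""
    total_shares = sum(shares_per_disk.values())
    return {disk: datastore_iops * s // total_shares
            for disk, s in shares_per_disk.items()}

# Three disks on one datastore with 2000/1000/1000 shares:
alloc = divide_capacity(8000, {"vm001": 2000, "vm002": 1000, "vm003": 1000})
print(alloc)  # {'vm001': 4000, 'vm002': 2000, 'vm003': 2000}
```

Note how the disk with twice the shares gets twice the IOPS; outside contention, no such division is applied at all.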
In the next diagram, Storage I/O Control has been enabled. In this case, during times of contention, all virtual machines accessing a specific datastore have their I/O access limited by their respective share values, rather than per host. As you can see, VM003 is now limited by its 1000 share value.
Datastores differ because the disks backing them differ: one datastore may consist of SSDs only, another of SATA disks. So the latency threshold that indicates congestion depends on the disks used.
SIOC in vSphere 5.1 can automatically determine the latency threshold per datastore. This is a kind of profiling done by a process under the hood, so administrators no longer need to experiment to find the correct latency threshold.
SIOC supports both block-level and file-level (NFS) storage.
SIOC also provides input for Storage DRS. Storage DRS is a load-balancing mechanism for virtual disk files: if the capacity or performance of a datastore reaches a threshold, virtual hard disk files of virtual machines are live-migrated to another datastore.
Details on Windows Server 2012 R2 Storage Quality of Service are scarce at the moment. Maybe in session MDC-B345 – Windows Server 2012 Hyper-V Storage Performance, to be presented on June 5, we will learn more about Microsoft's offering. I am very curious whether Storage QoS is a static setting or whether QoS is done dynamically. I do know VMware SIOC is a very mature and advanced mechanism for controlling disk consumption.
So far, the new Hyper-V feature looks like the static Limit IOPS feature of vSphere.
The screenshot below shows where Quality of Service for virtual disks is configured in Windows Server 2012 R2.
The image below shows the configuration screen for VMware vSphere virtual disk shares.