Microsoft Azure virtual machines will in future remain active when the host reboots for planned maintenance

Currently, virtual machines running on Microsoft Azure require a reboot when the Azure host hypervisor is rebooted after software patches and upgrades are applied.

For a single-instance application this means downtime.

Microsoft is working on technology for a “rebootless” hypervisor, so that virtual machines keep running on the host during patching. No live migration to another host is involved.

By offering this new feature, Microsoft Azure is extending from a PaaS platform meant for cloud-aware apps to a much broader platform that supports traditional, legacy enterprise applications which rely on the availability features of the infrastructure.

While Microsoft has not announced a release date, this new feature will likely become available in 2015.



Microsoft Azure is a best-effort cloud, or hyperscale cloud. This means the infrastructure (hosts up to the hypervisor) has a limited number of availability features. Hosts do not have redundant components and the hypervisor does not offer live migration. This is by design, to keep costs down and keep things simple. With less technical complexity, less can go wrong and issues can be resolved more quickly. This is the Keep It Simple principle of design.

This architecture requires applications to have built-in redundancy. There should be no single point of failure in an application running on Azure.

While greenfield applications offer this redundancy by using multiple nodes for the same role (multiple web servers, application servers, database servers), traditional, legacy enterprise applications are not designed for a platform like Azure.

These applications have single points of failure. Think about that critical file server or CRM server running on-premises.

No SLA for single instance

Currently Microsoft does not offer an SLA for single-instance virtual machines. Customers are required to have at least two virtual machines in the same availability set to ‘get’ an SLA from Microsoft.

The reason for this is planned maintenance and unplanned downtime.

Microsoft uses Hyper-V as the hypervisor for Azure. Just like Windows Server on-premises, Azure hosts require frequent software updates. Azure hosts receive software updates at least once a month, and likely more often. After the updates are applied, the Azure host reboots.

Until now, virtual machines running on that host reboot as well. If such a virtual machine does not have at least one other node running on another host, the application is unavailable while the host and its virtual machines reboot.
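To make the single-instance problem concrete, here is a minimal sketch of the reasoning above. It is not Azure's scheduling logic; the function name `outage_hosts`, the host names and the assumption that hosts are patched one at a time are all made up for illustration.

```python
def outage_hosts(vm_to_host, patch_order):
    """Return the hosts whose reboot takes every node of the application offline.

    vm_to_host  -- hypothetical mapping of VM name to the host it runs on
    patch_order -- hypothetical sequence of hosts, rebooted one at a time
    """
    outages = []
    for host in patch_order:
        # VMs on other hosts keep running while this host reboots.
        still_running = [vm for vm, h in vm_to_host.items() if h != host]
        if not still_running:
            outages.append(host)
    return outages


patch_order = ["host-a", "host-b"]

# A single-instance application goes down when its host is patched.
print(outage_hosts({"crm": "host-a"}, patch_order))

# Two nodes on different hosts: one node is always up.
print(outage_hosts({"web1": "host-a", "web2": "host-b"}, patch_order))
```

The second case is what an availability set is meant to guarantee: the nodes of one application never sit on hosts that are rebooted at the same moment.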

Microsoft is working on a solution to prevent downtime due to planned maintenance. This solution will not be Live Migration. Live Migration would consume too much precious East-West network bandwidth, capacity that is needed for regular server-to-server communication. Think about this: there are over 1,000,000 Azure hosts in a single datacenter. These need regular patching, so there would be constant Live Migration network traffic. The soon-to-be-available G-series virtual machines will offer up to 448 GB of internal memory. That is a lot of bytes to transfer over a wire!
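A rough back-of-the-envelope calculation shows the scale of the problem. The link speeds below are assumptions chosen for illustration, not figures published by Microsoft, and the sketch ignores protocol overhead and the re-sending of memory pages that change during a migration, so real transfers would take longer.

```python
def transfer_seconds(memory_gb, link_gbps):
    """Ideal time to copy a VM's memory over a dedicated link.

    memory_gb -- memory size in gigabytes (decimal, 10**9 bytes)
    link_gbps -- assumed link speed in gigabits per second
    """
    memory_gigabits = memory_gb * 8  # 8 bits per byte
    return memory_gigabits / link_gbps


# 448 GB is the memory size mentioned above; the link speeds are hypothetical.
for link_gbps in (10, 40):
    seconds = transfer_seconds(448, link_gbps)
    print(f"{link_gbps} Gbit/s: {seconds:.1f} s ({seconds / 60:.1f} min)")
```

Even in this ideal case a single large virtual machine would saturate an assumed 10 Gbit/s link for about six minutes, and that is for one virtual machine on one host.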

Instead, Microsoft is working on a technology which will ensure that virtual machines remain operational on hosts that are being patched. No details are known at the moment, but Microsoft is well advanced in developing this cool new technology. It could be similar to the live update technologies available for Linux. KernelCare is an example of a no-reboot solution.

It is likely that Microsoft will then also offer an SLA for single-instance virtual machines.

More details will be provided once they become available.


