Desktop virtualization (VDI) gets a lot of attention. Its advantages over physical desktops lure many organizations into a VDI implementation. Reasons to start using VDI are:
- easier management
- lower cost of desktop upgrades
- improved security.
However, many VDI projects fail for the same reasons: bad performance and costs that rise unexpectedly during the project. While the proof of concept delivers fine performance, performance degrades to unacceptable levels soon after more users are added. The only solution is then to ask for more budget to solve the issues.
This VMware video titled Storage Considerations for Desktop Virtualization has a lot of tips for storage design in VDI deployments. This VMware post titled Storage Optimization with VMware Horizon View has information on new VMware View 5.2 enhancements for storage.
So storage cost and performance have been a key obstacle to wide-scale VDI deployments. In fact, virtual desktops are still a very small percentage of overall enterprise desktops today, somewhere below 3 percent.
This post will show solutions which will bring costs down and deliver more than enough performance.
Some of the vendors mentioned in this post, in random order:
GreenBytes, Virsto, Nimble Storage, Tintri, WHIPTAIL, Nexenta, Nutanix, Atlantis
The challenge or problem
VDI has many advantages over traditional PCs and laptops. However, there are challenges as well:
Costs for the VDI project should not be underestimated. Around 40 to 50% of the costs for VDI will be on storage. Purchase costs for storage alone range from $100 to $300 per VM.
Different licensing models are used: charge per device, charge per capacity, charge per VDI seat. An OPEX/pay-per-use model makes adoption of VDI much easier for organizations because of the low purchase costs.
This post will focus on the performance challenge and how vendors are solving this.
Many do not realize that desktop virtualization places different and higher demands on the storage layer than a regular virtual server. Small, random write IO requests push traditional hard disk storage to the limit. On average VDI has 20-40% reads and 60-80% writes during production. At boot time of the virtual desktop there will be mostly reads.
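To get a feel for why a write-heavy mix hurts disk-based arrays, here is a rough sketch of the usual back-of-the-envelope sizing formula. The per-desktop IOPS figure and the RAID write penalties (2 for RAID 10, 4 for RAID 5) are common rules of thumb and are assumptions of this example, not numbers from the vendors discussed in this post.

```python
def backend_iops(frontend_iops, write_pct, write_penalty):
    """Estimate the disk IOPS an array must deliver for a desktop load.

    write_pct is the share of writes as a whole percentage (0-100).
    write_penalty is the number of disk IOs per logical write; this is a
    rule-of-thumb figure and an assumption, not a number from this post.
    """
    reads = frontend_iops * (100 - write_pct) / 100
    writes = frontend_iops * write_pct / 100
    return reads + writes * write_penalty


# Hypothetical example: 1000 desktops at an assumed 10 IOPS each.
desktops = 1000
iops_per_desktop = 10  # assumption, measure your own workload
frontend = desktops * iops_per_desktop

for write_pct in (60, 80):  # the write range quoted in the text
    for name, penalty in (("RAID 10", 2), ("RAID 5", 4)):
        print(write_pct, name, backend_iops(frontend, write_pct, penalty))
```

The more the mix shifts towards writes, the more the write penalty dominates, which is why a design that looks fine for virtual servers can fall over under VDI.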
Think about what happens in the morning when staff come in for work and log in to their virtual desktops. This sudden demand for IOPS is called a boot storm. Another IO consumer is anti-virus scanning, especially when a traditional anti-virus agent runs in each virtual desktop VM. A more innovative approach is using VMware vShield Endpoint, which is part of all vSphere 5.1 editions. Read the whitepaper Project VRC made on the impact of anti-virus on VDI.
The mix of different IO patterns that virtual desktops send to the storage layer at the same time is often called the IO Blender effect.
The number of cores and the amount of internal memory, disk capacity and IOPS you need when using VMware View can be calculated using this calculator.
Stress tests can be performed using the Login VSI tool. A free license for vExpert, MVP and CTP can be obtained here.
For VMware Partners only, the View Planner software is available. This is a load generator for VMware View. More information on the View Planner here.
Organizations starting to use virtual desktops are most likely already using a legacy, enterprise type of storage array from vendors like HP, Dell, IBM, HDS or EMC for their virtualized servers.
Legacy enterprise storage arrays are likely not able to deliver the requested number of IOs, particularly write IOs, when VDI is used at a large scale.
A lot of innovation is going on in the storage industry to solve this. Storage vendors are all focused on accelerating performance and reducing costs. One of the hurdles to clear before VDI becomes mainstream is the costs involved, not only for storage but also for licenses.
This post gives a high-level overview of solutions which claim to solve performance and capacity issues for VDI deployments. These solutions can also speed up Citrix XenApp or Microsoft RDS deployments where traditional enterprise storage cannot cope.
I did my best to mention the well known and innovative solutions on the market. Maybe I missed a few smaller players in this crowded market.
Techniques to solve the problem
The solutions address the performance problem using one or a mix of the technologies listed below:
- Using SSD or Flash memory in the storage array or on a PCIe card local to the server
- Using hypervisor host memory
- Optimizing IO traffic by serializing random writes
- Optimizing the filesystem
- Bringing storage close to the CPU and memory: converged storage
The solutions reduce the high costs of SSD and/or Flash by:
- using de-duplication and compression
- extending the life expectancy of Flash memory (STEC CellCare)
- using a multi-tier approach (Flash and hard disk in a cache or tier configuration)
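De-duplication pays off in VDI because many desktops are clones of the same image. The sketch below is a toy, in-memory illustration of block-level inline de-duplication plus compression. The class name, the 4 KB block size and the choice of SHA-256 and zlib are assumptions made for the example; it is not how any vendor named in this post implements it.

```python
import hashlib
import zlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}        # digest -> compressed unique block
        self.logical_bytes = 0  # bytes the desktops think they wrote

    def write(self, data):
        """Store data and return the list of block digests referencing it."""
        refs = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:       # only new content costs space
                self.blocks[digest] = zlib.compress(block)
            self.logical_bytes += len(block)
            refs.append(digest)
        return refs

    def read(self, refs):
        return b"".join(zlib.decompress(self.blocks[d]) for d in refs)

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())


# Ten cloned desktops writing the same four-block "golden image".
image = b"".join(bytes([i]) * 4096 for i in range(4))
store = DedupStore()
clones = [store.write(image) for _ in range(10)]
print(store.logical_bytes, len(store.blocks), store.physical_bytes())
```

Ten clones consume the physical space of one image, and compression shrinks that further. Real products do this at line rate and persistently, which is where the engineering effort goes.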
General overview of solutions
The market for VDI storage acceleration and reduction of cost per GB is very crowded. There are lots of solutions, each with its own approach and price tag. Each and every vendor wants to profit from the increased interest in VDI. I guess Gartner will soon publish some sort of magic quadrant for VDI-scenario storage.
VDI storage accelerator solutions are available as software or as hardware.
First the hardware appliances. Those are purpose built. Hardware-based solutions carry a premium price tag and result in vendor lock-in. Mostly flash storage is used, either in a storage device or on a PCIe card inside the server. In all cases the traditional storage array can still be used for storage of lazy data: files, databases, user profiles etc. VDI virtual disk files are stored on the purpose-built storage.
There are also software-based solutions. These are appliances running as a virtual machine on a hypervisor host.
Hypervisor vendors are also adding features to the core of the hypervisor to cache blocks of data. This only benefits reads, while random writes are the bottleneck in VDI.
Most solutions are strongly integrated with VMware. They use smart software which integrates with VMware View, vCenter Server and the storage subsystem.
Some of the solutions (mostly the ones using storage hardware) can be used with Hyper-V and Citrix XenServer or other hypervisors.
Solutions to accelerate performance for VDI specifically can be categorized as follows:
- Legacy storage based
- SSD/Flash based hardware appliances
- Hybrid hardware appliances
- Converged storage and compute
- Server side flash cards
- Storage hypervisor /software appliances
- Hypervisor based solutions
Legacy enterprise storage arrays
Vendors of legacy enterprise storage like EMC, HP etc. are developing new ways to cope with the demands of new workloads like virtual desktops. They offer flash and SSD in their arrays. Data which needs high performance is automatically transferred between slow and fast tiers of storage. Non-volatile RAM is used as a read and write cache. In December 2012 HP introduced the all-SSD array 3PAR StoreServ 7000.
Purpose built SSD/Flash based storage (all flash arrays or AFA)
These solutions are based on the dedicated use of memory for storage. The difference between SSD and Flash is that SSD is memory in a case with an interface, while Flash is just memory. To keep the cost per GB low, most appliances use intelligent software which does de-duplication and compression.
The type of memory used in these devices differs. Some use DRAM to deliver ultra performance.
Examples of all-flash array vendors are Pure Storage, GreenBytes, WHIPTAIL, Nimbus Data Systems, Kaminario, Skyera and XtremIO (EMC). These solutions offer guaranteed performance as all data is stored on flash. There is no spinning disk to be found in these devices. A use case would be real-time trading or many virtual desktops. However, the purchasing costs are much higher than those of hybrid storage appliances. All-flash storage is about 15 times more expensive than disk-based storage. Power consumption is much lower than when hard disks are used.
Some vendors wrote a blog comparing features. Pure Storage initially started a blog in which solutions of Pure Storage, EMC, Violin Memory and IBM are compared. Some All Flash Array vendors like Nimbus, HDS and HP were left out. So Calvin Zito of HP created a blog in which the HP 3PAR 7450 is listed as well.
A pretty complete overview of Flash arrays has been made by Chris Evans.
Pure Storage is a new vendor started in August 2011. They position their solution as general-purpose storage that happens to excel and differentiate itself when applied to VDI environments. Pure Storage’s solution for space efficiency includes global, always-on, in-line deduplication. It achieved a View Planner score of 0.52 for both linked clone and persistent desktops.
The Pure Storage VDI Reference Architecture is a detailed technical document which describes how to configure a VDI deployment for 1,000+ virtual desktops using VMware View 5. Available here.
WHIPTAIL uses less expensive multi-level cell (MLC) flash instead of single-level cell (SLC) flash. MLC has lower endurance and reliability than SLC, however. WHIPTAIL uses NVRAM as a write cache. A certain number of random IOs are cached and then written sequentially to SSD. This enhances performance and also prolongs the life of the SSDs. More info here.
NetApp is rumoured to be releasing an all Flash array named NetApp Flash Array.
A purpose-built array for VDI acceleration is the GreenBytes IO Offload Engine. GreenBytes initially sold a hybrid appliance named Solidarity. However, in a crowded market for hybrid storage solutions the company decided to discontinue Solidarity and focus on VDI acceleration and capacity efficiency. The result is the IO Offload Engine. This is a purpose-built appliance for just one use case: accelerating VDI and delivering capacity efficiency. The appliance can be used together with the customer’s existing legacy storage. Storage currently in use can hold user data and application data (“lazy data”). The GreenBytes appliance delivers ‘the rocketfuel’ needed to power VDI (golden images, clones etc).
The device is fitted with just flash memory and targeted at enterprises. Because of strong deduplication and compression the costs per virtual desktop are relatively low. Thanks to the patented techniques used, the inline deduplication does not have a negative impact on performance.
Starting with a single IO Offload Engine capable of 1000+ full clone virtual desktops, the appliance can be upgraded to enable up to 4500 users.
Costs for an IOOE device sized for 1500 full clones are around Euro 80,000.
Texas Memory Systems RamSan and Violin Memory are ultra-performance and $$$$ solutions which are not likely to be cost-effective for VDI deployments.
Because of the strong de-duplication which is done inline (live) the costs of storage are reduced considerably. A Fusion-io 400GB PCIe flash card will deliver 7TB of usable storage. The image below shows the management console of the vIOOE and the storage savings.
DCIG has made an objective, freely downloadable overview titled ‘DCIG 2013 Flash Memory Storage Array Buyer’s Guide’ listing 32 all-flash arrays. It is a very comprehensive report and gives the reader great insight into features and costs.
More information on the Buyer’s Guide here.
Hybrid storage appliances
These appliances are fitted with both flash and spinning disk capacity. Two approaches to accelerate IO are being used: use flash for caching (data is in cache for a short time) or use tiering (the most used data remains in flash for days).
Flash is used as both a write buffer and a read cache. Hard disk is used for storage that needs capacity rather than performance.
Examples are Tintri, GreenBytes, NexGen Storage, WHIPTAIL, Nimble Storage, Tegile Systems and X-IO Storage.
Tintri uses solid-state flash drives and SATA drives to provide a combination of very high IO and efficient capacity. 99% of all data is serviced from flash; blocks are then de-duplicated and compressed before being written to SATA disks. It is closely integrated with VMware, and storage management is done at the VM level instead of the LUN or volume level. The Tintri VMstore file system is designed from the ground up for VMs. Quality of service can be set per VM.
A Tintri 540 device has eight 3TB disk drives and eight 300GB multi-level cell SSDs, providing 26.4TB of raw storage and 13.5TB of usable capacity. Virtually all VM I/Os are serviced from flash. The cost is around $75,000 to $80,000. The more units purchased, the more discount is given. Tintri claims one customer is running 800 VMs from a single Tintri 540 device.
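As a quick sanity check, the figures above can be turned into a cost per VM. This is only arithmetic on the quoted numbers: it assumes the list price before any volume discount and takes the vendor-claimed 800 VMs at face value.

```python
# Figures quoted above for a single Tintri 540 device.
hdd_count, hdd_gb = 8, 3000
ssd_count, ssd_gb = 8, 300
price_low, price_high = 75_000, 80_000  # USD, before volume discount
vms = 800                               # vendor-claimed customer figure

raw_gb = hdd_count * hdd_gb + ssd_count * ssd_gb
cost_per_vm_low = price_low / vms
cost_per_vm_high = price_high / vms

print(f"raw capacity: {raw_gb / 1000} TB")
print(f"storage cost per VM: ${cost_per_vm_low:.2f} to ${cost_per_vm_high:.2f}")
```

That works out to roughly $94 to $100 per VM, at the low end of the $100 to $300 per VM range mentioned earlier in this post. With fewer VMs per device the figure rises proportionally.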
It scored a 0.79 in the VMware View Planner benchmark.
Nimble Storage has a hybrid approach which solves both read and write IO latency. Its Cache Accelerated Sequential Layout architecture lays out data by taking random writes coming into the system and writing them sequentially to a new location on disk, making more efficient use of low-cost disk. Meanwhile, data that’s accessed often is put on flash. This is done dynamically and very fast.
Taking random IO writes and writing them sequentially to the storage device is pretty much the same approach Virsto uses. The difference is that Nimble does it in a hardware appliance and Virsto in a software appliance.
NexGen Storage uses an automated tiering approach to moving data. They use solid-state PCIe cards for performance and low-cost, high-capacity disk drives (no SSD) for capacity. This makes it a solution with a relatively low cost per GB. It is possible to set a limit on the number of IOPS per volume.
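A per-volume IOPS limit is commonly built on something like a token bucket. The sketch below shows the general idea only; the class and its parameters are hypothetical and say nothing about how NexGen actually implements its limit. The clock is passed in explicitly to keep the example deterministic.

```python
class IopsLimiter:
    """Toy token bucket that caps a volume at iops_limit IOs per second."""

    def __init__(self, iops_limit):
        self.iops_limit = iops_limit
        self.tokens = float(iops_limit)  # start with a full one-second budget
        self.last = 0.0                  # time of the previous request

    def allow(self, now):
        """Return True if an IO arriving at time `now` (seconds) may proceed."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the limit.
        self.tokens = min(self.iops_limit,
                          self.tokens + elapsed * self.iops_limit)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


volume = IopsLimiter(iops_limit=100)
burst = sum(volume.allow(0.0) for _ in range(150))  # 150 IOs at t=0
later = sum(volume.allow(0.5) for _ in range(60))   # 60 more at t=0.5s
print(burst, later)
```

A noisy desktop pool on one volume then cannot starve the other volumes: IOs beyond the budget are rejected here, where a real array would queue or delay them.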
Converged storage and compute
These are servers which have CPU, memory and optimized storage in the same box. Also known as Datacenter in a Box, converged infrastructure, hyperconverged and assimilated platform. By placing the storage close to the compute without a storage fabric, high-performance storage can be achieved. Servers are easily scalable: add a server and you are ready to go. The filesystem of Nutanix, for example, is distributed just like the one Google is using.
These devices are not only used in storage-intensive VDI deployments but also in scenarios like data protection, digital surveillance and rich media.
Pivot3 delivers a View in a box appliance. A pre-configured box with all the necessary software, deploys up to 100 virtual desktops in under an hour. Each Pivot3 2U vStac VDI appliance supports up to 100 desktops and includes two 10 GbE ports, 150 GB SSD and 3 TB disk storage and 96 GB of RAM. Pricing starts at $38,500 per appliance.
Examples are Nutanix, SimpliVity and Scale Computing.
Scott Lowe has written a comprehensive blogpost on VSAN, Nutanix, SimpliVity and Scale Computing here.
Server side flash
Server side flash is based on bringing fast storage closer to the processor. This is implemented by a PCIe card with SSD storage installed in the host running the hypervisor. Examples are EMC with VFCache, Marvell PCIe SSD cards, STEC PCIe Accelerator and Fusion-io. Here a premium has to be paid as well. A Fusion-io ioDrive 320 GB costs 8000 euro. Nice overview of Fusion-io products here. For HP blade servers HP has the IO Accelerator. The 365 GB version costs $12,000.
VMware might be adding software to vSphere which makes use of local SSD drives and IO accelerator cards in hosts. The technology preview is named vFlash. Read about this in this VMware blog.
Storage hypervisor & software appliances using local Flash or SSD
Finally there are the storage hypervisor and software appliances. An example is Virsto Software which was acquired by VMware in February 2013. It offers high performance and thin provisioned disks.
Virsto takes a different approach. Customers can use their existing storage for VDI. Each VMware host runs a virtual appliance which acts as a kind of proxy between virtual desktops and storage. The disks of the traditional SAN storage device are mapped as Raw Device Mappings (RDMs). The Virsto appliance then creates a storage pool out of these RDMs. This pool holds the VM virtual disk files (the vSpace) and the log (vLog). It is presented to the ESXi hosts as an NFS datastore.
Writes are written to a logfile where they’re immediately acknowledged back to the guest VMs. Those writes are then asynchronously de-staged to a shared storage pool. This sequentialization of the writes also reduces write latencies while at the same time improving the throughput of the storage system, as perceived by the guest VMs, by as much as 10x.
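The write path described above can be sketched in a few lines. This is a simplified in-memory model of a log-then-destage design, written for illustration only; the class and method names are made up and it is not Virsto's actual implementation.

```python
class WriteLog:
    """Toy model: append random writes to a log, destage them in order later."""

    def __init__(self):
        self.log = []    # append-only list of (block address, data)
        self.pool = {}   # block address -> data, stands in for the shared pool

    def write(self, lba, data):
        # Appending is a sequential operation, so the write can be
        # acknowledged to the guest immediately.
        self.log.append((lba, data))
        return True

    def read(self, lba):
        # The newest copy may still live in the log, so check it first.
        for logged_lba, data in reversed(self.log):
            if logged_lba == lba:
                return data
        return self.pool.get(lba)

    def destage(self):
        """Flush the log to the pool in ascending address order."""
        latest = {}
        for lba, data in self.log:   # later entries overwrite earlier ones
            latest[lba] = data
        order = sorted(latest)
        for lba in order:
            self.pool[lba] = latest[lba]
        self.log.clear()
        return order


wl = WriteLog()
for lba, data in [(90, b"a"), (5, b"b"), (42, b"c"), (5, b"d")]:
    wl.write(lba, data)
print(wl.read(5))     # served from the log before destaging
print(wl.destage())   # addresses written to the pool, now in order
```

The guest sees only the cheap append, while the backing storage receives fewer, ordered writes, since overwrites of the same block collapse into one.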
Pricing is based on the amount of storage capacity checked in to a vSpace.
Virsto claims that for a 3000 seat VDI deployment you can get by with just 3 TB of SSD (instead of 97TB). Pricing appears to start at $2500 per TB. Mind these are just the software costs, not the costs for storage.
Other examples are Atlantis ILIO (InLine IO Optimization), IBM SAN Volume Controller, DataCore, FalconStor, NetApp.
GreenBytes is introducing a software appliance as an alternative to the all-flash array. An all-flash storage device is expensive to buy and not targeted at the SME market. GreenBytes will therefore release the vIO, a virtual storage appliance version of its desktop virtualization solution, the IO Offload Engine. The release date is March 1, 2013. The software will use Fusion-io cards to store VDI data like the OS partition and swap.
The vIO enables IT administrators to increase the performance of a new or existing virtual desktop infrastructure by integrating GreenBytes’ patented zero latency inline deduplication software and a PCIe-based flash card, local SSD or flash-based storage controller with a VMware ESXi host. A single vIO is designed to provide IO Offload for 100 or more persistent or non-persistent virtual desktops.
Pricing will be per VDI user, starting with a 100 user license. Packs of 300 and 500 user licenses are available as well. Costs are expected to be around Euro 60 per user.
Atlantis ILIO is using a large amount of RAM in the hypervisor host to serve disk requests, enabling a VMware server to handle most disk requests from memory rather than from disk, thus creating a 10-fold improvement in performance.
Each host gets a virtual appliance which sits in between the VMs and the storage array. Atlantis also changes the IO writes so these are written sequentially to the storage device. It does intelligent NTFS IO traffic processing. Unlike traditional storage arrays, ILIO understands what it is processing and storing, so it can adapt and optimize IO. Either a SAN or local storage can be used to store virtual desktop files.
The Atlantis ILIO system costs $120 to $150 per user for the software, plus the cost of the additional RAM (the minimum standard configuration calls for about 16GB of RAM, with the number increasing with the number of users to about 32GB for 100 users with multiple OSes).
Atlantis Computing released ILIO Persistent VDI 4.0 on February 26, 2012. This product enables the use of persistent virtual desktops. Data is stored in RAM, making it very fast.
NexentaVSA for View is another software-based solution implemented as a virtual storage appliance. Released in summer 2012, this is a storage vendor agnostic solution developed closely with VMware. VMware used NexentaVSA in its hands-on labs during VMworld 2011. NexentaVSA is installed as a virtual machine on each ESXi host. It then discovers any raw disks local to the ESXi host. It aggregates the disks and presents them as an NFS datastore to the VMs.
Hypervisor based solutions
Hypervisor vendors try to solve the VDI IO problem by reserving a part of the physical memory of the host as a read cache. The hypervisor pre-caches frequently requested blocks of data into physical memory. This enhances performance of read IO. Mind this might solve boot storms but does not address writes.
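Why a host-side read cache helps with boot storms but not with steady-state writes can be shown with a small model. This is a generic LRU read cache with write-through, written as an illustration; the names are hypothetical and it does not represent the internals of any hypervisor feature.

```python
from collections import OrderedDict


class ReadCache:
    """Toy LRU read cache in host memory in front of a backend store."""

    def __init__(self, backend, capacity):
        self.backend = backend       # dict standing in for the storage array
        self.capacity = capacity     # number of blocks kept in memory
        self.cache = OrderedDict()
        self.backend_reads = 0
        self.backend_writes = 0

    def read(self, lba):
        if lba in self.cache:
            self.cache.move_to_end(lba)     # mark as recently used
            return self.cache[lba]
        self.backend_reads += 1             # miss: go to the array
        data = self.backend[lba]
        self.cache[lba] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data

    def write(self, lba, data):
        # Writes always go to the array; the cache is only invalidated.
        self.backend[lba] = data
        self.backend_writes += 1
        self.cache.pop(lba, None)


array = {lba: b"os" for lba in range(50)}  # 50 shared boot blocks
host = ReadCache(array, capacity=64)

for _desktop in range(100):                # boot storm: 100 desktops
    for lba in range(50):
        host.read(lba)
for desktop in range(100):                 # each desktop writes its own block
    host.write(1000 + desktop, b"user")

print(host.backend_reads, host.backend_writes)
```

In this model 5000 boot reads cost the array only 50 IOs, while every one of the 100 writes still reaches it.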
Examples are VMware vSphere View Storage Accelerator, Citrix XenServer IntelliCache and Microsoft CSV Cache. Read more about these techniques here.
Two types of SSD drives are available: consumer grade and enterprise grade. The first is mainly designed for consumer usage. It is a cheaper disk but does have some disadvantages. Read Frank Denneman’s post here for some benchmarking and remarks on the usage of consumer grade SSDs.