VMware vVolumes: the game-changing future for storage is demoed
At VMworld 2011, VMware Virtual Volumes (vVolumes, or vVols for short) were previewed, which I reviewed in my post at the time. You can now watch the original presentation:
VMworld 2011: VSP3205 – VMware vStorage APIs for VM and Application Granular Data Management
Well, another year and another VMworld has passed, and it seems the storage game changer is getting closer to seeing the light of day, if the recent spate of tech preview demos is anything to go by.
First of all, a recap of what vVols are.
vVols are a completely new storage architecture designed to be the next generation of storage with a few key features:
- Allow management of VM storage to be at the VM level, no longer at the LUN / volume level
- All the storage heavy lifting in terms of moving data, snapshots, replication, deduplication etc. is done at the storage array
- No disruption to existing VM creation workflows
- Massive scale of number of VMs per storage system
There are three main components to the vVol system:
A storage container is the storage that is physically available on your storage array. Now I say physically, but this could also be virtually presented storage, even on an ESXi host; somewhere, somehow, it will be back-ended on some physical medium, be it HDD, SSD or hey, maybe even a super fast memory disk. Basically it's a chunk of physical storage somewhere. Capacity Pools are logical partitions carved out of these to provide a chunk of storage to your VM admins. Capacity Pools may also span multiple storage arrays, even across sites. You could have a single Capacity Pool within your storage container, or multiple, depending on your requirements, say if separate tenants or separate VM admins need their own chunk of storage, but simply think of a Capacity Pool as a chunk of storage presented to your VM admins.
A protocol endpoint is the way your ESXi hosts connect to your storage container(s); it is also called an IO Demultiplexer. Currently you need to decide whether you are going to use block-based SAN protocols (FC/iSCSI) or the file-based NAS protocol NFS to connect to your storage, with separate datastores for each, and this has been one of the longest-standing arguments in VM storage. The problem has always been that the layout of your storage infrastructure depends on your IO protocol, which is crazy. NFS has always allowed great simplicity and flexibility, allowing larger datastore sizes, but block-based protocols allow better IO handling with multipathing. Well, you'll be happy to know that all this goes away with vVols.
Protocol Endpoints can be either block or NAS, but there are no more LUNs, and a protocol endpoint isn't a datastore; it's just the IO transport mechanism to get to the storage container, so you can use your existing protocol of choice. In fact, you could probably use different IO protocols from different hosts to connect to the same storage and have it look exactly the same. I'm not sure how multipathing will work if you use NFS; maybe NFS 4.1 is on the roadmap, possibly even pNFS.
The storage provider is how the VM infrastructure and the storage array communicate, via an out-of-band channel. VASA has been leading up to this. VASA currently allows the storage array to give information to the VM infrastructure, such as what types of disks are used or whether there is thin provisioning or deduplication. These are really just labels that a storage vendor provides to explain its functionality, but SDRS can already use the information to move VMs around. Currently the communication is a one-way street, but with vVols the VM infrastructure will be able to talk back to the storage, tell it what capabilities a VM requires, and the storage will place the VM somewhere that satisfies the capabilities requested.
Once the building blocks are in place we can start to think about the possibilities that vVols can provide.
First of all there is no longer the concept of a datastore which binds VMs together based on storage capabilities and IO connection. The focus shifts to the individual VMs themselves, and the VM becomes the new unit of management and policy.
A VM is made up of a number of different files (config, VMDKs, snapshots etc.) and each of these elements is stored as a separate vVol, ultimately stored in a Capacity Pool somewhere. So although the focus is on an individual VM, a vVol doesn't contain an entire VM; rather, a VM is made up of multiple vVols.
Hopefully just knowing this, your mind is already racing ahead with what this means and the possibilities available.
Thinking ahead I’m taking some educated guesses at what the possibilities are. My speculation may be way off as I’m not a partner so don’t have vVol code to try things out myself but once all this clever storage capability is delivered via software as a policy, the possibilities are very interesting.
If your Storage Container is able to provide disaster recovery in some manner, and your Storage Provider is able to tell your VM infrastructure that this is available, then think about how the communication flow would work. You create a new VM and specify that the VM needs to be made available in a disaster. While the data flows through the Protocol Endpoint, the VM infrastructure communicates back through the Storage Provider and tells the storage array that you are creating a new VM that needs disaster recovery. The storage array then creates a set of vVols to hold the VM and makes sure they are replicated. You won't have to replicate a whole datastore, just a single VM. How powerful and simple is that!
As you don’t have LUNs you won’t have to rescan your storage and add the replicated VMs to the inventory at your recovery site. I’m hoping the VM at the replicated site will be understood to be a replica of the primary one so something like SRM will properly understand where the VM is active and you won’t have 2 VMs in your inventory which are essentially the same VM, just in different sites. All the VM config and metadata are stored in separate vVols so this should be possible.
At the moment you group VMs together in datastores, and you may have some control of the entire datastore's IO, ensuring all the VMs in that datastore get the IO they need as a group. This may mean they need to be on separate physical disks or on a different tier of disks, but again at the level of a whole, pre-created datastore.

With vVols you would be able to specify at VM creation time what IO performance a single VM requires. This would be communicated to the storage array, which would lay out the VM in such a way as to guarantee performance. That may mean the vVols are spread across a number of physical disk spindles to handle the IO required, or some of the vVols are hosted on a faster tier of storage. This is not the only option. You may have many vVols sharing the same set of disk spindles in a Capacity Pool, but you may specify that certain VMs are more important than others, and the storage array would then prioritise IO for these particular VMs over others backed by the same disks. This wouldn't require a separate datastore or separate disks, just a policy placed on the VM at creation time; the VM infrastructure and storage array would communicate and guarantee the IO requests are satisfied.
Also think of the possibilities for something like VDI or vCD where linked clones are used. As the VM is made up of multiple vVols, the primary disk vVol can be stored in a cached memory/SSD tier for super fast reads and the snapshot files which make up the individual linked clones can be stored on something else.
I've already mentioned how disaster recovery could work, but what about high availability from the storage layer? Say you specify that a particular VM must have always-on storage; think FT for storage. The VM's vVols would be created based on the policy and automatically mirrored across physical arrays, hey, maybe even storage arrays from different vendors. If a whole storage array were to fail, your VM would continue running and re-create another shadow copy on another array. Could this even work across sites?
What would be great is if the VM snapshot system and the storage array snapshot system could be integrated, and maybe this will be possible with vVols. Having your storage array create separate snapshots which are not available to the VM layer without presenting a different LUN is a hassle and confusing. The VM layer should just ask the storage layer to create a snapshot for a particular VM, again based on a policy saying how often snapshots should be created and how long they should be kept, not for a whole datastore but for a particular VM. If you need to restore a VM or make a cloned copy for testing, you just browse the snapshots in the VM inventory (as you can currently do when browsing NFS datastores) and tell the storage array to do whatever it needs to either roll back the VM or make a clone. If you need to delete snapshots, the storage array would do the necessary work and generate no load on the VM infrastructure.
As the vVols are separate entities you can have deduplication set on particular vVols and maybe not on others (perhaps some arrays have performance penalties for dedupe). What about the swap file? Again as this would be a separate vVol for VMs that are replicated for disaster recovery you won’t have to replicate the swap file which takes up space and bandwidth. You will be able to manage your space utilisation in the same way you currently can with NFS and everything would by default be thin-provisioned. No more overprovisioning LUN space to lie to your VM infrastructure about what space is actually available.
The VMware and Partner Demos
What has been interesting lately is many of the storage vendor partners have started showing demos of how some aspects of vVols will work with their hardware.
I’m assuming there is a tremendous amount of work being done by the storage vendors to get ready for vVols. This isn’t just software innovation from VMware but ground breaking changes in the storage array management software. I’m not sure how this will actually be delivered as there seem to be many possibilities with vVols and perhaps the functionality will be rolled out over time.
I find this storage innovation extremely exciting. This is a serious push to bring the often clunky and seemingly legacy provisioning of storage into the software defined data centre vision.
Virtual Volumes (VVOLs) Tech Preview
VMworld 2012 – Psst… Want to see the future of storage with VMware and EMC? Great Post!
"VM Granular Storage" (aka vVols) – VMworld 2011 Demonstration
The future of VMware storage – vVol demo
VMware integration: Get Ready for VM Volumes
NetApp and VMware: vVols Tech Preview
NetApp & VMware: vVols UPDATE!
I would be very interested to hear your thoughts and suggestions in the comments on what vVols will be able to do, and hopefully you won't tame too much of my speculation!