Today I went to a NetApp workshop organized by AVnet in the Netherlands. I wanted to learn more about the MetroCluster solution.
This post describes some of the best practices I learned for designing and building a MetroCluster and gives a high-level overview of a NetApp MetroCluster.
If you are interested in this workshop: Altimate will organize the same workshop on May 8 in the Netherlands. For more information about this free event, see NetApp MetroCluster Best Practices & Installation Workshop.
Slides from a similar workshop can be downloaded here. The slides cover:
-Causes of Downtime
-MetroCluster Deployment Scenarios
-MetroCluster Failure Scenarios
Keep in mind that a NetApp MetroCluster configuration is not supported by VMware, because it is not listed in the HCL. So far the only metrocluster storage supported for vSphere 5 is EMC VPLEX.
What is a MetroCluster?
MetroCluster allows for synchronous mirroring of volumes between two storage controllers, providing storage high availability and disaster recovery. A MetroCluster configuration consists of two NetApp FAS controllers clustered together, residing either in the same datacenter or in two different physical locations. It provides recovery from any single storage component failure or multiple-point failure, and single-command recovery in case of a complete site disaster.
Designing and building a MetroCluster needs to be done with a lot of care. The components used must be certified by NetApp or Brocade. MetroCluster might be marketed as easy and cost-effective, but I do not agree, at least not for the setup and design part. Things like a witness site to prevent a split brain were not mentioned in the workshop.
These are the best practices and guidelines I learned:
-on April 20 NetApp released ONTAP 8.1. Once 8.1 is installed, the array may complain about incorrect cabling. NetApp requires you to follow the guidelines in the Universal SAS and ACP Cabling Guide. The only way to get rid of the error/warning is to re-cable.
-on the day you start assembling a MetroCluster, check and double-check the document titled ‘MetroCluster Compatibility Matrix’. Make sure only components listed in this document are used.
-make sure the components used are listed in the Brocade Compatibility Matrix. If a component is not listed and you use it anyway, there is a good chance you are on your own when technical issues occur.
-round-trip time (latency) between the sites of a MetroCluster should be at most around 2 to 3 ms.
-SATA and SAS drives are supported. SSD drives are not supported.
-currently only Brocade fiber switches are supported in a MetroCluster. In the future, when ONTAP 8.1.1 is released, NetApp might support Cisco switches as well.
-a maximum number of spindles is supported: 840 spindles when using ONTAP 7.3.5 and 8.0.1 or later.
-for distances over 10 km you need an extended fabric license for the Brocade fiber switches. I believe this enables the use of more buffer credits. Costs are around Euro 9000,- . Keep in mind that Brocade/NetApp applies a 1.5 multiplier to the actual distance. If the fiber connection between the two sites is longer than about 6.5 km, you need to purchase an extended fabric license (6.5 x 1.5 = 9.75, so roughly 10 km).
-if DWDM (transporting multiple colours/wavelengths of light over the same fiber connection) is used, the DWDM must be 100% transparent. No dynamic adjustment of buffers.
-if DWDM or CWDM is used, make sure the equipment is supported by Brocade.
-support for the Brocade switches is no longer provided by NetApp, only by Brocade itself.
-the full bandwidth between the two sites needs to be made available for storage replication traffic. Traffic shaping is not supported.
-only use OM3+ or better optical cables. OM1 and OM2 are not supported!
-make sure optical cables are cleaned before being installed. Even new cables. Special cleaning tools are available.
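Some of the guidelines above are simple enough to put into a quick pre-flight check. What follows is a minimal sketch in Python, not an official NetApp or Brocade tool. The function and constant names are my own invention, and the thresholds (the 1.5 multiplier, the 10 km license limit, the 3 ms upper bound on round-trip time, OM1/OM2 being unsupported) are copied from my workshop notes, so verify them against the current compatibility documents before relying on them.

```python
from typing import List

# Assumptions taken from the workshop notes, not official vendor limits.
DISTANCE_MULTIPLIER = 1.5          # factor applied to the real fiber distance
LICENSE_THRESHOLD_KM = 10.0        # above this, an extended fabric license is needed
MAX_ROUND_TRIP_MS = 3.0            # upper end of the "2 to 3 ms" guideline
UNSUPPORTED_CABLES = {"OM1", "OM2"}


def effective_distance_km(fiber_km: float) -> float:
    """Return the distance after applying the 1.5 multiplier."""
    return fiber_km * DISTANCE_MULTIPLIER


def needs_extended_fabric_license(fiber_km: float) -> bool:
    """True when the multiplied distance exceeds the 10 km limit."""
    return effective_distance_km(fiber_km) > LICENSE_THRESHOLD_KM


def check_site_link(fiber_km: float, round_trip_ms: float, cable_grade: str) -> List[str]:
    """Return a list of findings for an inter-site link; empty means no findings."""
    findings = []
    if needs_extended_fabric_license(fiber_km):
        findings.append(
            f"{fiber_km} km counts as {effective_distance_km(fiber_km)} km: "
            "extended fabric license required"
        )
    if round_trip_ms > MAX_ROUND_TRIP_MS:
        findings.append(
            f"round-trip time {round_trip_ms} ms exceeds {MAX_ROUND_TRIP_MS} ms"
        )
    if cable_grade.strip().upper() in UNSUPPORTED_CABLES:
        findings.append(f"cable grade {cable_grade} is not supported")
    return findings


if __name__ == "__main__":
    # Hypothetical link: 7 km of fiber, 1.2 ms round trip, OM3 cabling.
    for finding in check_site_link(7.0, 1.2, "OM3"):
        print(finding)
```

Note that with a 10 km limit and a 1.5 multiplier the exact cut-off is 10 / 1.5, or about 6.67 km of real fiber. The 6.5 km figure from the workshop is a rounded, slightly conservative version of that number.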
NetApp has a whitepaper available titled Best Practices for MetroCluster Design and Implementation (TR-3548). It can only be downloaded with an account, which can be obtained for free by registering.
VMware has a nice KB article titled VMware support with NetApp MetroCluster.