Having covered an introduction to Zerto and installation and configuration best practices in my two most recent blogs, I’m now going to tackle the very important topic of Virtual Protection Groups (VPGs). A Zerto VPG lets you group VMs to be protected together, giving you many different options for replication and recovery configuration. With your ZVM and VRAs installed, you are now ready to create VPGs. In this blog, I’ll delve into the key things you need to consider when creating a VPG as well as best practices to ensure your Zerto cloud DR environment is optimized.
VPG Grouping Considerations
When creating a VPG, the main thing to consider is how you wish to group your VMs for replication and recovery. You may have a large variety of servers that manage different services or applications as well as servers on multiple networks. Ultimately, the decision on how to create VPGs will be up to you, but reviewing the considerations below during the VPG creation process will help greatly.
First, you may want to consider creating a VPG that is reserved for your Domain Controllers or Active Directory servers. In the event of a failover, you want to make sure this is the first VPG to fail over, so that your domain is established on the recovery side before the other servers fail over. You may run into issues if a SQL or web server powers on before the domain servers do, as it will not be able to reach or communicate with the domain.
When it comes to creating VPGs for the rest of your servers, there are several ways to group VMs. A good idea might be to group servers that depend on each other to run an application. If you have a web server that works with a backend database server, you might want to group these servers together. This way, in a failover scenario, you know these two servers will fail over together. You can also set a boot order within the VPG, so in our case, we may want to make sure that during a failover, our database server powers on first, then the web server. Depending on your setup, you may have multiple servers of each kind, such as Exchange servers and application servers. You could consider grouping servers by their services. For instance, you could place the AD/DC and other critical domain servers in 1 or 2 VPGs. Next, you might have the Exchange and database servers in their own groups. Lastly, application servers would be grouped together in one or more VPGs. So, during a failover, you would first kick off the critical domain groups, then the Exchange and database VPGs and lastly your application servers. The goal is to group your VMs so that failover follows an easy and understandable flow.
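The failover flow described above can be sketched as a simple ordering exercise. This is only a planning aid, not anything Zerto runs: the tier names, the `Vpg` class, the `failover_waves` function and the VPG names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical tiers: a lower number fails over earlier.
# These are illustrative labels, not Zerto settings.
TIER_ORDER = {"domain": 0, "exchange": 1, "database": 1, "application": 2}


@dataclass
class Vpg:
    name: str
    tier: str


def failover_waves(vpgs):
    """Group VPG names into waves, ordered by tier rank."""
    waves = {}
    for vpg in vpgs:
        waves.setdefault(TIER_ORDER[vpg.tier], []).append(vpg.name)
    return [sorted(waves[rank]) for rank in sorted(waves)]


# Hypothetical VPG names, listed in no particular order.
plan = failover_waves([
    Vpg("App-Servers", "application"),
    Vpg("SQL", "database"),
    Vpg("AD-DC", "domain"),
    Vpg("Exchange", "exchange"),
])

for number, wave in enumerate(plan, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Writing the waves down like this before building the VPGs makes it easier to spot a server that ended up in a group that fails over too late for the services that depend on it.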
Optimizing RPOs with VPGs
Another thing to consider is that servers replicate and fail over as a group. This means that the data changes from all servers in the VPG are replicated together. So, if you have 1 SQL server and 3 web servers in 1 VPG, that VPG must replicate the changes made by all 4 servers. This can cause issues in VPGs where one server creates a significant amount of data change compared to the other servers in the VPG. For instance, in our case the SQL server may be causing a large amount of change, while the web servers are mostly static. The Recovery Point Objective (RPO) of this VPG as a whole may suffer, as Zerto may hit points where it is struggling to keep up with the changes caused by our SQL server. In this case, it may be best to separate the SQL server into its own VPG. This way, our 3 web servers will stay in sync, as their VPG is only concerned with their changes. The SQL server will also see a boost in performance, as the bandwidth for its VPG is reserved only for SQL changes. The SQL VPG can also be given a High priority, which tells Zerto to allocate more bandwidth to this VPG. Grouping also means that during a failover, all VMs in the group are failed over. So with our VPG containing the 3 web servers, all 3 will be failed over together; there is no way to fail over just one of the servers in the VPG.
Finally, keep in mind the number of VMs you put in one VPG. If you are only protecting 10 VMs, you may create just 1 VPG so you can easily fail over the servers and control their boot order in the VPG. However, you lose some granularity with this. If only one of the servers is down or needs to fail over, you have to fail over the whole environment. Depending on your bandwidth, you may also start running into issues like those described above for the 3 web servers and 1 SQL server.
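To make the effect of a shared VPG concrete, here is a rough back-of-the-envelope sketch. It is not how Zerto measures or allocates anything: the change rates and link speed are made-up numbers, `backlog_growth_mbps` is a hypothetical helper, and the split case simply assumes the web VPG gets the bandwidth it needs while the SQL VPG takes the rest.

```python
def backlog_growth_mbps(change_rates_mbps, bandwidth_mbps):
    """Rate at which unreplicated data piles up (0.0 if replication keeps up)."""
    return max(0.0, sum(change_rates_mbps) - bandwidth_mbps)


# Hypothetical figures, all in megabits per second.
SQL_RATE = 28.0              # busy SQL server
WEB_RATES = [2.0, 2.0, 2.0]  # three mostly static web servers
LINK = 30.0                  # bandwidth available for replication

# One VPG holding all 4 servers: every VM shares the same growing backlog.
combined_backlog = backlog_growth_mbps([SQL_RATE] + WEB_RATES, LINK)

# Two VPGs (assumption: the web VPG is served in full, SQL gets the remainder).
web_backlog = backlog_growth_mbps(WEB_RATES, LINK)
sql_backlog = backlog_growth_mbps([SQL_RATE], LINK - sum(WEB_RATES))

print(f"Combined VPG backlog growth: {combined_backlog} Mbps")
print(f"Web VPG backlog growth:      {web_backlog} Mbps")
print(f"SQL VPG backlog growth:      {sql_backlog} Mbps")
```

In this simplified model, splitting the VPG does not create bandwidth. It confines the lag to the SQL VPG, so the web servers keep a low RPO. The model leaves out VPG priority, which is the setting you would then raise on the SQL VPG.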
Once you have an idea of how to group your servers in VPGs, the next step is to create them. In your Zerto console, you can click on the VPGs tab which will show a page similar to the one below. By clicking the New VPG button, you can start creating your first VPG.
The first step is to give the VPG a name and set its priority. Priority can be set to High, Medium or Low, and this determines how Zerto will prioritize bandwidth allocation. For instance, a VPG with High priority will be given more bandwidth than VPGs with Medium or Low priority. In the case where you have 3 VPGs with High priority, more bandwidth is allocated to the VPG that has the largest amount of data change. This can be helpful during the initial replication period, at times when the production environment is experiencing high load, or after network disconnects. If you have servers that are more critical, or that have a higher change rate than the others, you may want to give those VPGs a High priority. This will ensure that your critical servers consume the bulk of your bandwidth and stay in sync.
Next, you will pick the VMs to add to your VPG. In the left-hand column, you will see all your servers that are on hosts with a VRA. To add a server to the VPG, check the box next to one or more of the servers and click the Right Arrow button.
Once you have added the servers to the VPG, you can alter the boot order by clicking the Define Boot Order button above the Selected VMs. The boot order is determined by grouping your VMs into different folders. In my screenshot, I have the iland-test server booting first in the Default Folder. I have moved the iland-test-1 server to NEW GROUP 2. To create a group, click the ADD GROUP button in the Boot Order window. To create a boot order, set the Boot Delay to the number of seconds you want to wait before the NEXT GROUP is powered on. The screenshot is set to have iland-test boot first, wait 60 seconds, and then boot iland-test-1. When the boot order is set, click OK and then Next.
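The way Boot Delay chains groups together can be sketched in a few lines. This is a minimal model of the behavior described above, not Zerto code: the `BootGroup` class and `power_on_schedule` function are hypothetical, and the group and VM names are the ones from my screenshot.

```python
from dataclasses import dataclass


@dataclass
class BootGroup:
    name: str
    vms: list
    boot_delay_s: int  # seconds to wait before the NEXT group is powered on


def power_on_schedule(groups):
    """Return (offset_seconds, vm_name) pairs in power-on order."""
    schedule = []
    offset = 0
    for group in groups:
        for vm in group.vms:
            schedule.append((offset, vm))
        # The delay of this group pushes back every group after it.
        offset += group.boot_delay_s
    return schedule


schedule = power_on_schedule([
    BootGroup("Default", ["iland-test"], boot_delay_s=60),
    BootGroup("NEW GROUP 2", ["iland-test-1"], boot_delay_s=0),
])

for offset, vm in schedule:
    print(f"t+{offset}s: power on {vm}")
```

Note that the delay belongs to the group that boots first. Setting 60 seconds on the Default group is what holds back NEW GROUP 2.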
VPG Rules and Alerting
Under the Replication Tab, we can change many of the rules and alerting for the VPG:
- Recovery Site: This is the failover environment the servers are replicating to and will fail over to. Typically, you will just have the iland datacenter as an option here. If you are replicating to multiple sites, just make sure that any server to be protected at iland has the iland datacenter selected.
- VC/VCD: You may only see this option if you are using a vCloud Director environment on the source site. The iland ECS environment is based off vCloud Director, so you will choose VCD if you have this option.
- ZORG: This is your customer dedicated organization in the ECS environment. You will most likely just have 1 option for this setting, which is your iland provided Org name.
- Recovery Org vDC: This is the Virtual Datacenter created for your ECS organization. Again, you will typically just have one option here. This is where the resources for your environment are allocated.
- Service Profile: By default, a system Service Profile is used, which is just a template for the remaining settings. To make a change to the VPG’s Journal History, RPO Alert or Test Reminder, select Custom in this dropdown.
- Journal History: By default, this is 4 hours. The Journal History determines how far back in time you can recover. With a 4 hour journal history, you will be able to recover to any point up to 4 hours prior. This can be set to a higher amount, but keep in mind that a larger journal history consumes more storage as it saves more restore points.
- Target RPO Alert: This is the alert threshold set when monitoring the RPO of a VPG and is set to 5 minutes by default. In most cases, your RPO should be around 15 or 20 seconds. However, if the bandwidth is saturated by high change rate or network issues, the RPO may begin to grow. In this case, if the RPO reaches 5 minutes, meaning Zerto has not been able to create a checkpoint in 5 minutes, Zerto will begin alerting you of the RPO breach.
- Test Reminder: Set to none by default, this sets a reminder for Zerto to alert you when you have gone a certain period of time without conducting a failover test.
- There are advanced settings for the journaling that can be changed, but it is best to leave these at their defaults unless instructed otherwise by iland or Zerto support.
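Two of the settings above lend themselves to quick arithmetic. The sketch below is a rough planning aid only and is not Zerto's sizing formula: it assumes journal storage grows roughly in line with change rate multiplied by journal history (ignoring compression and overhead), the 5 GB per hour change rate is a made-up figure, and both function names are hypothetical. The 4 hour and 5 minute defaults come from the settings described above.

```python
def journal_estimate_gb(change_rate_gb_per_hour, journal_history_hours=4):
    """Rough journal footprint: data changed over the journal window."""
    return change_rate_gb_per_hour * journal_history_hours


def rpo_breached(current_rpo_s, target_rpo_alert_s=300):
    """True once the RPO reaches the alert threshold (default 5 minutes)."""
    return current_rpo_s >= target_rpo_alert_s


# Hypothetical VPG changing about 5 GB per hour.
default_journal_gb = journal_estimate_gb(5)       # 4 hour default history
one_day_journal_gb = journal_estimate_gb(5, 24)   # 24 hour history

print(f"4 hour journal:  ~{default_journal_gb} GB")
print(f"24 hour journal: ~{one_day_journal_gb} GB")
print(f"RPO of 20 s breaches the alert:  {rpo_breached(20)}")
print(f"RPO of 300 s breaches the alert: {rpo_breached(300)}")
```

Even as an approximation, this shows why a longer journal history should be weighed against the storage you have available on the recovery side.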
Click Next to continue the wizard. Under the Storage tab, you will see the disks attached to all servers in the VPG. You may notice that the Thin check box is checked for some of the disks. This means that the source server disk is using Thin Provisioning. On the iland side, the disks are deployed using Thick Provisioning. The source side may have a 100GB thin provisioned disk with only 50GB used. This means the amount of space used by this server at the source is just 50GB. To prevent over provisioning on iland’s side, we will convert this to Thick provisioning. So, the disk set to 100GB will actually consume 100GB of storage on the recovery side. You will see more accurate storage usage numbers in the Zerto console if you uncheck all of the checkboxes under the Thin column.
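The thin versus thick difference is easy to total up ahead of time. The sketch below only illustrates the arithmetic from the 100GB example above. The `Disk` class, the two helper functions and the second disk in the list are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    provisioned_gb: int
    used_gb: int
    thin: bool


def source_usage_gb(disk):
    """A thin disk consumes only what is written; a thick disk its full size."""
    return disk.used_gb if disk.thin else disk.provisioned_gb


def recovery_usage_gb(disk):
    """On the recovery side every disk is thick, so it consumes its full size."""
    return disk.provisioned_gb


disks = [
    Disk(provisioned_gb=100, used_gb=50, thin=True),   # the example from the text
    Disk(provisioned_gb=40, used_gb=40, thin=False),   # hypothetical thick disk
]

source_total = sum(source_usage_gb(d) for d in disks)
recovery_total = sum(recovery_usage_gb(d) for d in disks)

print(f"Source usage:   {source_total} GB")
print(f"Recovery usage: {recovery_total} GB")
```

Summing the provisioned sizes, rather than the used space, gives the figure to plan for on the recovery side.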
The Swap column allows you to set a disk to be ignored by Zerto. Keep in mind that during the initial replication, this disk will be replicated. However, after the initial sync, the disk is no longer replicated. This can be helpful if you have swap files on servers or dedicated disks for log files or backups that are not necessary during a failover scenario. Click Next to continue.
Under the Recovery tab, you can set the networks to be used by the VMs in the VPG on failover. Usually, iland will create a network that mimics your production network(s). If you group VMs together that use different networks, we will be able to change the networks for individual NICs on the next page. This setting helps you to set the bulk of the VMs to use one network. You can also choose different networks for a Live Failover and Test Failover. For instance, if you want the failover environment to use the same subnet as production, but a different subnet during a test, that can be configured here. Keep in mind that changing the network requires the IPs to change on the failed over servers, which might break certain ties between servers or applications when communicating.
There is also a vCD Guest Customization checkbox on this page. vCD Guest Customization is a VMware vCloud Director feature that allows you to change guest operating system settings for a server from outside the VM. This can be helpful for templates, but with servers that are already configured it can be tricky. The server’s hostname, IP address and DNS servers could be changed, and domain controllers or services may be severely affected. Therefore, this feature is not typically recommended for use. If you are changing IPs, iland engineers will work with you on this feature to make sure we prevent any unwanted changes. When you have the networks selected, click Continue.
The NICs tab will show the NICs for each VM, the network they are attached to, whether they will be connected on failover, and the IP/MAC address used. Here you can change the network for individual NICs and their IP assignments. For instance, if a VM has 2 NICs that connect to 2 different networks, you can match that configuration here. The same applies if you have all servers on a Production network with one of the servers on a DMZ network. You can also choose to have a NIC disconnected, in a case where it might be attached to a network not needed in a recovery situation. The IP should be pulled from the VMware Tools in your environment. If the IP is not automatically added, you may need to input the IP manually, or add a new IP to be used if you are switching subnets. The MAC address should match the source server’s MAC address and stay the same on failover. Keep in mind that there is a Live Failover/Move tab and a Test tab, so you can change the network settings for a test but leave everything the same during a Live Failover.
It may be easier to adjust network settings for a NIC by checking the box on the left-hand side and clicking the EDIT SELECTED button. This brings up the Edit vNIC window where you can change and copy over settings for Live and Test failover scenarios. Once you have your networks set as desired, click OK and Next to continue.
The next page will be for Zerto Backups. Right now, that is not being utilized by iland and can be skipped. If you need more information about the Zerto backups, please contact your account manager. Continue on to the Summary tab.
The final tab, Summary, will show your VPG settings. Verify these settings are correct and click Done to create the VPG. You should now see your VPG listed under the VPGs tab, and once created it will begin its initial replication. You can continue to create more VPGs from here, but keep in mind that the more VPGs are initializing, the more bandwidth Zerto will consume. You may find that it is best to allow each VPG to finish its initial sync before creating another one.
I hope you found this overview of Zerto VPG considerations and best practices useful. It is certainly worth taking the time to carefully allocate your VMs to VPGs before going live with your Zerto Cloud replication. Doing this could save you a lot of heartache during a failover situation! In my next blog, I’ll cover live and test failovers.