How Desktop Virtualization Works II

End-User Computing – Simple and Secure

VDI Access

Users access VDI with different types of devices:

  • Thin or zero clients
  • Mobile devices (smartphones and tablets)
  • Standard PC platforms (Windows, macOS, Linux)

If clients connect from outside the corporate network, over the WAN, secure access is provided by an additional component – the Unified Access Gateway (UAG).

User authentication is done through Active Directory integration, with additional security features such as single sign-on (SSO) and two-factor authentication (2FA).

 

Figure 1. LAN access

 

Figure 2. WAN access

 

Figure 3. Various client devices

 

Thin/Zero clients

 

Thin and zero clients are purpose-built for VDI: reliable, straightforward, and low in power consumption. They also have a small footprint, which reduces space requirements. These clients are cheaper than standard desktops or laptops and require minimal maintenance.

  • Zero Clients – contain no operating system, local disk, CPU, or memory resources. With only a PCoIP chip installed, they are extremely energy efficient and easy to administer. No data is ever stored on the device, which makes them suitable for high-security environments. However, some of them are configured for specific protocols only, which can be a problem, especially in large environments. In addition, the configuration and use of USB devices can be complicated in some cases.
  • Thin Clients – contain an operating system, disk, CPU, and memory resources. This brings more capabilities but also more challenges in both hardware and software maintenance. These clients support VPN connections and a variety of USB devices.

The optimal device choice depends on many parameters, including the type of work, the budget, and the overall VDI environment. Some of the crucial factors are listed below (a simple scoring sketch follows the list):

  • protocol (PCoIP, Blast, etc.)
  • Wi-Fi connectivity
  • VPN support
  • VoIP support
  • maximum resolution and number of monitors
  • graphical processing capabilities
  • security features
  • number and type of ports
  • centralized management capabilities
  • ease of configuration
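
To turn this checklist into a concrete comparison, the factors can be weighted and each candidate device scored against them. The short sketch below is only an illustration – the factor weights and the per-device scores are assumptions, not vendor data, and would come from your own requirements and datasheets.

```python
# Toy weighted-scoring comparison of candidate VDI client devices.
# All weights and per-device scores are illustrative assumptions.

FACTORS = {            # factor -> weight (higher = more important)
    "protocol_support": 5,
    "vpn": 3,
    "multi_monitor": 4,
    "graphics": 4,
    "security": 5,
    "central_mgmt": 4,
}

devices = {            # scores on a 0-5 scale (hypothetical)
    "zero_client": {"protocol_support": 3, "vpn": 0, "multi_monitor": 4,
                    "graphics": 3, "security": 5, "central_mgmt": 5},
    "thin_client": {"protocol_support": 5, "vpn": 5, "multi_monitor": 4,
                    "graphics": 3, "security": 4, "central_mgmt": 4},
    "standard_pc": {"protocol_support": 5, "vpn": 5, "multi_monitor": 5,
                    "graphics": 5, "security": 3, "central_mgmt": 2},
}

def score(device: dict) -> int:
    """Weighted sum of a device's factor scores."""
    return sum(FACTORS[f] * device.get(f, 0) for f in FACTORS)

for name, factors in sorted(devices.items(), key=lambda kv: -score(kv[1])):
    print(f"{name:12s} -> {score(factors)}")
```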

 

Mobile devices and standard PC platforms

 

Users access VDI using the Horizon Client software, or through a browser (VMware Horizon HTML Access) when client installation is not possible.

Standard PC platforms provide outstanding performance, but that comes with higher costs and more complicated maintenance. One way to lower costs is to repurpose older devices at the end of their lifecycle. Both standard platforms and mobile devices are an excellent choice for remote users accessing corporate VDI.

 

User profile management

 

All user environments, especially large ones, fully benefit from a VDI implementation when the whole process is automated as much as possible. This means that resources are dynamically assigned as needed, at the right point in time, with minimal static, pre-allocated workload capacity. The user logs in and gets the first available virtual machine, which can be different each time. This raises the question of how user-specific data and application settings are managed.

There are several ways to manage user profiles, depending on specific VDI implementation, Horizon 7 edition, and licensing model:

  • VMware Dynamic Environment Manager (DEM)
  • VMware Persona Management
  • VMware App Volumes Writable Volumes
  • Microsoft FSLogix

Profile management is done through Active Directory integration, using group policies and dedicated administrative templates for Horizon 7. A newer version of DEM can work without AD.

 

VMware Dynamic Environment Manager (DEM)

 

Specific settings are kept at the application level rather than as a complete profile, which provides more granular control. Configurations are kept in separate .zip files for each application (Figure 4). This way, they can be applied across various operating systems, unlike most standard solutions, which are tied to a specific OS. The Horizon 7 Enterprise edition is required.

 

 

Figure 4. Configuration files (DEM)

 

VMware Persona Management

 

This solution keeps the entire user profile, similar to standard Microsoft roaming profile solutions. It is available in all Horizon 7 editions, but it supports neither RDSH agents nor newer versions of Windows 10.

 

VMware App Volumes – Writable Volumes

 

Profiles are kept on separate virtual disks and attached to different virtual machines as needed. The Horizon 7 Enterprise edition is required, as well as a separate infrastructure for App Volumes (servers, agents, etc.). The virtual disks use the standard .vmdk format, which eases their administration and data backup/recovery. App Volumes can be combined with DEM to get a wide range of profile management options.

 

Microsoft FSLogix

 

This solution is handy for users without the Horizon 7 Enterprise edition who can’t use the advanced VMware profile management features. Profiles are kept on a network share in VHD(X) format and attached to VMs as virtual disks. This way, profile content is not copied at logon, which often caused significant start-up delays. In addition, there are several more optimization features (a minimal configuration sketch follows the list):

  • A Filter Driver is used for redirection, so applications see the profile as if it were on the local disk; this is important because many applications don’t work well with profiles located on network drives
  • Cloud Cache technology allows part of the user data to be stored on a local disk and multiple network paths to be defined for profiles; this increases redundancy and availability in case of an outage
  • Application Masking can efficiently control access to resources based on a number of parameters (e.g., username, address range)
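
As a rough illustration of how little configuration the basic Profile Container needs, the sketch below sets the two core registry values (Enabled and VHDLocations) with Python’s winreg module. In production this is normally pushed through Group Policy using the FSLogix ADMX templates; the share path is a placeholder, and the snippet must run elevated on a Windows machine with the FSLogix agent installed.

```python
# Minimal sketch: enable the FSLogix Profile Container via the registry.
# Normally configured through Group Policy (FSLogix ADMX templates); the
# UNC share path below is a placeholder for your own profile share.
# Requires Windows, administrative rights, and the FSLogix agent installed.
import winreg

PROFILES_KEY = r"SOFTWARE\FSLogix\Profiles"
VHD_SHARE = r"\\fileserver\fslogix-profiles"   # hypothetical path

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, PROFILES_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
    # Turn the Profile Container feature on.
    winreg.SetValueEx(key, "Enabled", 0, winreg.REG_DWORD, 1)
    # Location(s) of the per-user VHD(X) containers; more than one path
    # can be listed for redundancy.
    winreg.SetValueEx(key, "VHDLocations", 0, winreg.REG_MULTI_SZ,
                      [VHD_SHARE])

print("FSLogix Profile Container pointed at", VHD_SHARE)
```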

Both 32-bit and 64-bit architectures are supported, on all operating systems starting from Windows 7 and Windows Server 2008 R2. It is available to all users with any of the following licenses:

  • Microsoft 365 E3/E5
  • Microsoft 365 A3/A5/Student Use Benefits
  • Microsoft 365 F1
  • Microsoft 365 Business
  • Windows 10 Enterprise E3/E5
  • Windows 10 Education A3/A5
  • Windows 10 VDA per user
  • Remote Desktop Services (RDS) Client Access License (CAL)
  • Remote Desktop Services (RDS) Subscriber Access License (SAL)

 

Advanced VDI solutions – Teradici PCoIP Remote Workstation

 

Global data growth requires ever more resources for fast and reliable data processing. Some business areas also require very intensive calculations and simulations, as well as complex graphical processing. Standard VDI solutions can’t cope with these demands, so that kind of processing is usually not moved outside the data center. On the other hand, many companies need their employees to access corporate resources from any place, at any time.

This can be handled by keeping all processing inside the data center and transferring only display information (in the form of pixels) to remote clients, using the Teradici PCoIP Remote Workstation solution (Figure 5). It is composed of three main components:

  • remote workstation host
  • remote workstation client
  • LAN/WAN

 

 

Figure 5. Teradici PCoIP Remote Workstation solution

 

The host can be any standard Windows or Linux platform that performs the data processing. The host’s display information is then processed at the pixel level by specific PCoIP techniques, encrypted, and sent over the network to the client. The host must have the following components installed:

  • Graphics card (GPU)
  • PCoIP Remote Workstation Card – receives data from the GPU and performs pixel-level processing, compression, and encoding. This component comes in three main types, depending on the specific requirements and host configuration (Figure 6).

 

 

Figure 6. PCoIP Remote Workstation Card

 

Because display information comes in various types (text, images, video, etc.), special algorithms are used to recognize each type and apply the appropriate compression method. Moreover, the compression ratio can be adjusted to network fluctuations.

The image from the host is decompressed and displayed on the client side. Clients can be standard PC platforms (desktop/laptop) or dedicated devices (thin/zero clients), supporting up to four displays, depending on the resolution.

Regardless of the client type, security is at a very high level because data never leaves the data center – only encrypted pixels are transmitted. The use of dedicated devices, such as zero clients, further decreases the risk of potential attacks and data loss.

 

Implementation

 

As mentioned, every infrastructure is unique, and each implementation depends on many factors. However, some typical scenarios can be used for approximate resource planning and calculation.

 

Scenario 1. Small and medium environments

 

The basic option assumes an infrastructure for 50 users, scalable up to 200 virtual machines by adding hardware resources and the appropriate licenses.

The licensing model is based on the Horizon 7 Advanced Add-on (Named/CCU), with separate licensing for vSAN, vSphere, and vCenter.

Virtual desktops are created as linked clones, which significantly reduces disk space consumption and eases administration. User data are kept on a network share, with a 100 GB allocation per user.

Compute resources consist of four hosts in a vSAN cluster with a RAID-5 configuration. The ESXi operating system is installed on separate M.2 disks with RAID-1 protection. Table 1 shows approximate calculation details for the vSAN cluster, and Table 2 shows the host specifications. Licenses are defined in Table 3.
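
The figures in Table 1 come from a straightforward capacity and compute estimate. The sketch below shows that kind of back-of-the-envelope calculation; the per-desktop values, the RAID-5 overhead factor, the slack-space percentage, and the consolidation ratio are illustrative assumptions, not the actual numbers from the tables.

```python
# Back-of-the-envelope sizing sketch for Scenario 1 (50 linked-clone desktops).
# All per-desktop figures and overhead factors are illustrative assumptions.

users          = 50
vcpu_per_vm    = 2
ram_gb_per_vm  = 4
disk_gb_per_vm = 40      # linked-clone delta + swap/overhead estimate
raid5_factor   = 4 / 3   # RAID-5 (3+1) stores ~1.33x the usable data
slack          = 0.30    # keep ~30% of vSAN capacity free
vcpu_to_pcpu   = 4       # consolidation ratio assumption
hosts          = 4

usable_gb = users * disk_gb_per_vm
raw_gb    = usable_gb * raid5_factor / (1 - slack)

total_vcpu = users * vcpu_per_vm
pcores     = total_vcpu / vcpu_to_pcpu
ram_gb     = users * ram_gb_per_vm

print(f"Raw vSAN capacity : {raw_gb:,.0f} GB  (~{raw_gb / hosts:,.0f} GB per host)")
print(f"Physical cores    : {pcores:.0f}      (~{pcores / hosts:.0f} per host)")
print(f"RAM               : {ram_gb} GB   (~{ram_gb / hosts:.0f} GB per host, before overhead)")
```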

 

 

Table 1. vSAN cluster calculation (50 VMs)

 

 

Table 2. Host specifications (50 VMs)

 

 

Table 3. Licenses (50 VMs)

 

Scenario 2. Large environments

 

Besides additional hardware resources, large infrastructures usually need extra features for management, control, and integration. In addition, a certain level of automation is desirable.

This scenario is based on the following presumptions:

  • The number of users is 200, with a possible scale-up to 500
  • Up to 100 GB of data per user
  • Ability to use RDS Published applications
  • Ability to virtualize applications with App Volumes
  • Ability to manage user profiles

The features mentioned above require the Horizon 7 Enterprise edition, which includes the vSAN, vSphere, and vCenter licenses. It also enables instant clones for VM deployment, which significantly increases system agility and VM creation speed compared to linked clones. The licensing model can be either Named or CCU.

User profile management can be done using Writable Volumes – virtual disks assigned to each user, containing all installed applications, data, and specific settings. These disks are attached to the VM during logon, so the user profile is always available, regardless of which VM is assigned. Combined with VMware Dynamic Environment Manager, this offers a high level of granularity in data and profile management.

The servers used are the same as for Scenario 1, with additional hardware resources installed. All details are listed in Tables 4, 5, and 6.

 

 

Table 4. vSAN cluster calculation (200 VMs)

 

 

Table 5. Host specifications (200 VMs)

 

 

Table 6. Licenses (200 VMs)

 

 


Software-Defined Disaster Avoidance – The Proper Way

VMware vSAN metro cluster implementation

This first blog post of mine, Software-Defined Disaster Avoidance – The Proper Way, tells a story that we at Braineering have successfully turned into reality twice so far. The two stories have different participants (clients), but both face the same fundamental challenges. They take place in two distinct periods, the first in 2019 and the second in 2020.

 

Introduction

 

Both clients are from Novi Sad and belong to the public sector. Both provide IT products to many public services, administrations, and bodies without which life in Novi Sad would not run smoothly. More than 3,000 users use the IT products and services hosted in their Datacenters daily. Business applications such as Microsoft Exchange, SharePoint, Lync, MS SQL, and Oracle run on just some of the 400+ virtual servers that their IT staff takes care of, maintains, or develops daily.

 

Key Challenges

 

At the time, both clients’ IT infrastructure was more or less the standard setup we see at most clients: a primary Datacenter and a Disaster Recovery Datacenter located at another, physically remote location.

Both the primary and the DR site are characterized by a traditional 3-tier architecture (compute, storage, and network), as shown in Figure 1.

The hardware at the DR site usually runs on more modest resources and older-generation equipment than the primary site, and only a smaller number of the most critical virtual servers are replicated to it. Both clients had storage-based replication between the Datacenters, and VMware SRM was used for automatic recovery.

 

Figure 1.

 

Even though the clients are different, they faced the same key challenges:

  • Legacy hardware 
    • different server generations: G7, G8, G9
    • storage systems at end of service life

 

  • Inability to keep up with the latest versions because of legacy hardware.
    • vSphere
    • VMware SRM
    • Storage OS or microcode

 

  • Weak performance, modest by today’s standards
    • 8 Gb SAN, 1 Gb LAN
    • Slow storage system disks (SATA and SAS)
    • Storage system fragmentation
    • vCPU:pCPU ratio

 

  • Expensive maintenance – again due to legacy hardware
    • Refurbished disks
    • EOSL (End of service life), EOGS (End of general support)

 

  • Limited scalability when expanding CPU, memory, or storage resources

 

When new projects and the dynamic daily user requests to upgrade existing applications are also taken into account, both clients were aware that something urgent needed to be done about this.

 

Requirements for the Future Solution

 

The future solution was required to be performant, with low latency, easy scaling, and high availability. The goal was to reduce any unavailability to a minimum, with low RTO and RPO. The future solution also had to be simple to maintain, and the migration to it had to be as painless as possible. Where possible, it should remove or reduce overprovisioning, as well as the long-term planning and doubts that arise whenever resources need to be expanded.

And, of course, the future solution must support all those business and in-house applications hosted on the previous IT infrastructure.

 

The Chosen Solution

 

After considering different options and solutions, both users eventually opted for VMware vSAN. Since the users chose vSAN as their future solution in both cases, we at Braineering IT Solutions suggested vSAN in a Stretched Cluster configuration to maximize all the potential and benefits that such a configuration brings. To our delight, both users accepted our proposal.

 

Figure 2.

 

Stretched Cluster

 

What is a vSAN Stretched Cluster? It is an HCI cluster that stretches between two distant locations (Figure 2).

The chosen solution fully meets all the requirements mentioned above: it supports all of the clients’ business and in-house applications. In an All-Flash configuration, vSAN can deliver a vast number of low-latency IOPS. The Scale-Up and Scale-Out architecture allows resources to be expanded quickly, either by adding resources to existing nodes (Scale-Up) or by adding new nodes to the cluster (Scale-Out).

It is easy to manage; everything is operated from a single vCenter. Existing backup and DR software solutions are supported and work seamlessly. And finally, the most significant benefit of vSAN in the Stretched Cluster configuration is disaster avoidance and planned maintenance.

 

The benefits of the vSAN Stretched Cluster configuration are:

  • Site-level high availability to maintain business continuity
  • Disaster avoidance and planned maintenance
  • Virtual server mobility and load balancing between sites
  • Active-Active Datacenter
  • Easy to manage – a single vSphere vCenter
  • Automatic recovery in case one of the sites becomes unavailable
  • Simpler and faster implementation compared to a stretched cluster built on traditional storage systems

 

The Advantages of the Implemented Solution

 

The most important advantages:

New servers: the ability to keep up with new versions of VMware platform solutions, a better degree of consolidation, and faster execution of virtual machines.

Network 10 Gbps: the 10 Gbps datacenter network infrastructure raises network communications to a new level of speed.

HCI: a scale-out platform where the infrastructure grows by adding nodes. Compute, network, and storage resources become building blocks, and the existing storage systems are replaced with the vSAN platform in an All-Flash configuration.

SDDC: a platform that opens the door to new solutions such as network virtualization, automation systems, and day-two operations.

DR site: a new DR site relocated to a third remote location, retaining the existing VMware SRM and vSphere Replication technology.

Savings: consolidation of all VMware licenses and of hardware maintenance for the new equipment, along with savings on the maintenance of the old hardware systems.

Stretched cluster: a disaster-avoidance system that protects services and data and recovers them with automated procedures, even in a complete site-failure scenario.

 

The End Solution

 

Today’s IT infrastructure for both clients is shown in Figure 3.

 

Figure 3.

 

The Preferred site and the Secondary site, as an Active-Active cluster, use one common stretched vSAN datastore, and all I/O operations on this stretched datastore are synchronized. vSphere Replication asynchronously replicates the 25 most critical virtual servers to the DR site. For automated and orchestrated recovery at the DR site in the event of a disaster on the stretched cluster, both users retained the solution they had previously implemented – VMware SRM.

 

 

 

 

 

 

 


We can recover from a disaster, but can we avoid it?

Disaster Avoidance – VMware vSphere

Intro

 

If a disaster occurs, every business needs a set of recovery strategies and solutions, prepared in advance, to protect and restore business-critical applications. RPO, RTO, and MTD are defined based on a Business Impact Analysis (BIA).

The RPO value expresses how much data the business can afford to lose, measured in time. It is defined based on the amount of data that can be lost within a period of time before significant harm to the business occurs, and it is used to determine the frequency of backups.

The RTO value represents how long restoration may take before the maximum tolerable downtime (MTD) is reached.

The MTD value is the total time allowed to restore from a disaster to a fully operational state. It is defined based on how long applications and business processes can be down without causing damage to the business. MTD is often overlooked from an IT perspective because the Work Recovery Time (WRT) procedure also takes time: all systems must be checked for synchronization, and data must be verified and tested to make sure everything is in the proper sequence.
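
A small numeric example makes these relationships concrete. The hour values below are made up; the only rules encoded are that recovery plus verification must fit inside the MTD (MTD ≥ RTO + WRT) and that the backup interval must not exceed the RPO.

```python
# Toy BIA sanity check. The hour values are illustrative assumptions; the
# encoded rules are MTD >= RTO + WRT and backup interval <= RPO.

mtd_hours = 24   # maximum tolerable downtime agreed with the business
rto_hours = 8    # time to bring systems back up
wrt_hours = 4    # work recovery time: sync checks, data verification, testing
rpo_hours = 4    # acceptable data loss, expressed in time

backup_interval_hours = 4   # how often backups (or replication cycles) run

if rto_hours + wrt_hours <= mtd_hours:
    print("Recovery plan fits inside the MTD window.")
else:
    print("Recovery plan exceeds the MTD - revisit RTO/WRT or the BIA.")

if backup_interval_hours <= rpo_hours:
    print("Backup frequency satisfies the RPO.")
else:
    print("Backups run too rarely for the stated RPO.")
```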

Disaster Recovery

 

The concept of disaster recovery covers the strategies and solutions that have traditionally been the way to respond to all sorts of outages (natural disasters, hardware and software failures, and human-made mistakes). It is a set of procedures for returning the IT infrastructure’s access and functionality to a fully operational state after a catastrophic interruption. In essence, disaster recovery is a manual task of recovering workloads at a recovery site from replicated data. Tools like VMware Site Recovery Manager (SRM) can be used to automate the recovery.

 

Disaster Avoidance

 

The question is, can a disaster be avoided? Is there a way to be proactive and keep data safe even if a disaster happens? The answer is yes! Instead of recovering data, disaster avoidance forecasts and prepares for a disaster before it happens. Disaster avoidance enables the highest level of resiliency for business-critical applications and the virtual machines hosting them, ensuring application availability in case of disaster.

Over the years, we have had solutions with synchronous replication functionality, but they were complex to implement and very expensive. Their deployment usually required a Professional Services engagement, and maintenance was spread across multiple vendors.

This blog post presents three solutions based on the vSphere Metro Storage Cluster (vMSC) concept, which are simple to deploy and maintain and come at a price worth considering. These solutions provide better IT infrastructure resilience than traditional disaster recovery solutions. However, to achieve multi-level protection, you should have a third site that acts as a traditional DR site. Also, in the case of VM guest OS failures or ransomware attacks, a backup solution is needed to restore VMs and guest OS files from multiple restore points. For backup, we recommend a solution that leverages vSphere APIs for I/O Filtering (VAIO), such as Cohesity, which delivers near-zero RPOs and rapid RTOs.

 

vMSC – vSphere Metro Storage Cluster

 

vMSC is a storage configuration that combines replication with array-based clustering. In this design, the datastore spans both sites and must be accessible from both of them. These configurations are usually implemented in environments where disaster and downtime avoidance is a crucial requirement. Every disk write is committed synchronously at both sites, which ensures data consistency regardless of the location. This architecture therefore requires significant bandwidth between the two sites and very low latency (up to 10 ms RTT).
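
Because every write is acknowledged only after it lands at both sites, the inter-site RTT budget is worth measuring before committing to a vMSC design. The sketch below is a rough probe that times TCP connections to a host at the other site; the hostname and port are placeholders, and the storage vendor’s own latency assessment tooling would normally be used for the real qualification.

```python
# Rough inter-site RTT probe using TCP connect time.
# "remote-site-host.example.com" and port 443 are placeholders; use the
# storage vendor's own assessment tools for a proper qualification.
import socket
import statistics
import time

REMOTE_HOST = "remote-site-host.example.com"   # hypothetical peer at the other site
PORT = 443
SAMPLES = 10
RTT_BUDGET_MS = 10.0                           # vMSC guidance: up to ~10 ms RTT

def tcp_rtt_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Approximate RTT as the time to complete a TCP handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

rtts = [tcp_rtt_ms(REMOTE_HOST, PORT) for _ in range(SAMPLES)]
print(f"avg {statistics.mean(rtts):.2f} ms, max {max(rtts):.2f} ms over {SAMPLES} samples")
print("within vMSC budget" if max(rtts) <= RTT_BUDGET_MS else "exceeds vMSC budget")
```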

With traditional synchronous replication, there is a primary-secondary relationship between the active (primary) LUN and the mirror (secondary) LUN. The replication needs to be stopped to access the secondary LUN, and the secondary LUN is presented to hosts with a different LUN ID.

With vMSC, storage subsystems must be able to read and write to both locations, and disk writes are committed synchronously at both locations to ensure that data is always consistent.

Based on how hosts access storage, we have two types of vMSC configurations:

  • Uniform host access configuration: hosts from both sites are connected to all storage nodes on both sites.

With uniform host access configuration, in the event of storage outage on site A, hosts from site A will access identical LUN through Storage B.

 

  • Non-uniform host access configuration: hosts from each site are connected only to the storage nodes within the same site.

With non-uniform host access configuration, in the event of storage outage on site A, VMs from site A will be restarted on site B by vSphere HA.

As for licensing, from the VMware side there is no minimum license requirement; you can create a stretched cluster with any edition. If automated workload balancing is required, from either a CPU or a storage perspective, a vSphere Enterprise Plus license is needed.

 

Pure Storage Active Cluster

 

Pure Storage® Purity ActiveCluster is a fully symmetric active/active bidirectional replication solution that provides synchronous replication for zero RPO and automatic transparent failover for zero RTO. The ActiveCluster feature offers active/active storage clustering within and across multiple physical locations. These physical locations can be different racks in a single data center or entirely different data centers with up to 11 ms of round-trip network latency.

No additional hardware or licenses are required. Synchronous replication means that writes are synchronized between the arrays and protected in NVRAM on both of them before being acknowledged to the host. Transparent failover ensures non-disruptive failover between the synchronously replicating arrays, with automatic resynchronization and recovery.

Purity ActiveCluster comprises three core components: The Pure1 Mediator, active/active clustered array pairs (Purity version 5.0.0 or higher), and stretched storage containers.

 

 

  • Pure1 Mediator – a required component used to determine which array will continue to serve data if an outage occurs in the environment. The Mediator must be located at a third site, in a separate failure domain from either site where the arrays are located. Each array must have independent network connectivity to the Mediator, so that a single network outage does not prevent both arrays from reaching it. If a failover is required, the connection to the Mediator is made from the controller management ports. The preferred option is to use the cloud Mediator provided by Pure, but if the arrays do not have internet access, an on-prem Mediator (OVA image) is available for deployment.
  • Active/active clustered array pairs – ActiveCluster storage (volumes) can be accessed by hosts using either a uniform or non-uniform SAN topology. In ActiveCluster, volumes in stretched pods are read/write on both arrays, and the optimized path is defined on a per host-to-volume connection basis using the preferred-array option.
  • Pods – a pod is a stretched storage container that defines a set of objects that are synchronously replicated together. An array can support multiple pods, and a pod can exist on one array or be stretched across two arrays with synchronous replication.

 

The replication network supports connecting arrays with up to 11ms of round-trip time (RTT) latency between the arrays. Two ethernet ports per controller, connected via a switched infrastructure, are required for replication connectivity. For redundant configurations using dual switches, each controller must connect to each local switch, and switching infrastructure must allow all replication ports to connect to each other.

 

ActiveCluster is designed to be genuinely active/active: either array can maintain I/O services to the synchronously replicated volumes. A uniform storage access configuration allows failover-less maintenance. In the event of an array failure, or a replication link failure causing one array to stop I/O services, the hosts experience only the loss of some storage paths and continue to use the other paths to the available array. In a non-uniform storage access configuration, VMs running on hosts that have lost access to the array will be restarted on the hosts connected to the other storage array.

ActiveCluster includes an automatic way for applications to transparently failover without user intervention, using Pure1 Cloud Mediator to provide a quorum mechanism. Transparent failover between arrays in ActiveCluster is automatic.

In the event of a replication network failure – a split-brain scenario – both arrays will pause I/O within the standard host I/O timeout and reach out to the Mediator to determine which array can continue to serve I/O for each replicated pod. When the mediator race begins, the result can be unpredictable. In a non-uniform host configuration, this lack of predictability can lead to a disruptive restart of the applications running on the stretched pod volumes. ActiveCluster therefore provides a failover preference feature that lets the storage administrator influence the outcome of the race: the preference gives the preferred array, for each pod, an additional 6 seconds in its race to the Mediator.

With non-uniform host connectivity, setting a failover preference is the recommended best practice. Disruptive restarts will then occur only when one FlashArray is offline or an entire site is lost.
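
The effect of the failover preference is easier to see with a toy model. The sketch below is purely conceptual and is not Pure’s implementation: it only illustrates that giving the preferred array a 6-second head start makes the outcome of the race predictable; the latency values are random, made-up numbers.

```python
# Conceptual toy model of the ActiveCluster mediator race - not Pure's
# implementation. It only shows how a 6-second head start for the preferred
# array makes the race outcome predictable. Latencies are made-up values.
import random

PREFERENCE_BONUS_S = 6.0   # head start granted to the preferred array

def race(preferred: str, latency_a: float, latency_b: float) -> str:
    """Return which array reaches the Mediator first after a split."""
    time_a = latency_a + (0.0 if preferred == "A" else PREFERENCE_BONUS_S)
    time_b = latency_b + (0.0 if preferred == "B" else PREFERENCE_BONUS_S)
    return "A" if time_a <= time_b else "B"

random.seed(0)
wins = {"A": 0, "B": 0}
for _ in range(1000):
    # Without a preference, both arrays would reach the Mediator within a
    # few seconds of each other and either could win.
    winner = race(preferred="A",
                  latency_a=random.uniform(0.5, 3.0),
                  latency_b=random.uniform(0.5, 3.0))
    wins[winner] += 1
print(wins)   # with the preference set on A, array A wins every race
```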

Purity 5.3 introduced built-in Mediator polling for ActiveCluster. This feature allows both arrays to agree on a mediator race winner for each stretched pod if neither array can reach the Mediator. The pod failover preference (if set) is used to determine the winner; if no failover preference is set, the winner is selected automatically. The following table shows the availability of stretched pod volumes for different combinations of component failures.

* Pre-election completes before the second component failure.

** Simultaneous failures of components.

*** Assumes the “Other Array” was not pre-elected. If the pre-elected array fails, stretched pod volumes are unavailable.

 

Resynchronization and recovery are automatic; storage administrator intervention is no longer needed to recover and resynchronize ActiveCluster replication.

 

NetApp SnapMirror business continuity (SM-BC)

 

ONTAP 9.8 introduces SnapMirror Business Continuity (SM-BC), enabling workloads to be served simultaneously from both clusters. SM-BC is a continuously available storage solution for NetApp ONTAP® running on NetApp AFF or NetApp All SAN Array (ASA) storage systems. SM-BC supports only two-node HA clusters (either AFF or ASA), and no additional hardware is required.

Compared to SnapMirror Synchronous (SM-S), which requires a manual failover or a DR management solution to fail over, SM-BC enables automated failover without any manual intervention. SM-BC maintains the LUN identity between the two copies, so applications see them as a shared LUN. Application granularity is achieved using a consistency group, with automatic transparent failover to the secondary copy and no data loss. Besides business continuity with granular application management, SM-BC enables additional use cases, such as leveraging the second copy for test and development. An ONTAP Mediator is required at a third site to monitor the two ONTAP clusters and orchestrate automated failover if the primary storage system goes offline. SM-BC does not require extra licensing as long as the cluster has the Data Protection or Premium Bundle.

SM-BC provides the following benefits:

  • Application granularity for business continuity
  • Automated failover with the ability to test failover for each application.
  • LUN identity remains the same, so the application sees them as a shared virtual device.
  • Ability to reuse secondary with the flexibility to create instantaneous clones for application usage for dev-test, UAT, or reporting purposes, without impacting application performance or availability.
  • Simplified application management using consistency groups to maintain dependent write-order consistency.

 

The SM-BC architecture provides active workloads on both clusters, where primary workloads can be served simultaneously from both clusters. The data protection relationship is created between the source and destination storage systems by adding the application-specific LUNs from different volumes within a storage virtual machine (SVM) to a consistency group. The purpose of a CG is to take simultaneous snapshot images of multiple volumes, ensuring crash-consistent copies of a collection of volumes at a point in time (PiT). Under normal operation, the enterprise application writes to the primary consistency group, which synchronously replicates this I/O to the mirror consistency group. Even though two separate copies exist in the data protection relationship, because SM-BC maintains the same LUN identity, the application host sees this as a shared virtual device with multiple paths, while only one LUN copy is written to at a time. When a failure occurs and the primary storage system goes offline, the ONTAP Mediator detects the failure and enables seamless application failover to the mirror consistency group. This process fails over only the specific application, without the manual intervention or scripting previously required for failover.
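
The consistency-group idea itself – one point-in-time snapshot spanning several volumes so the copies are crash-consistent as a set – can be sketched in a few lines. This is only a conceptual illustration, not the ONTAP API.

```python
# Conceptual sketch of a consistency group: snapshot all member volumes at
# the same point in time so the set is crash-consistent. This illustrates
# the idea only; it is not NetApp's ONTAP API.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Volume:
    name: str
    snapshots: list = field(default_factory=list)

@dataclass
class ConsistencyGroup:
    name: str
    volumes: list

    def snapshot(self, label: str) -> None:
        point_in_time = datetime.now(timezone.utc)   # one timestamp for all members
        for vol in self.volumes:
            vol.snapshots.append((label, point_in_time))
        print(f"{self.name}: snapped {len(self.volumes)} volumes at {point_in_time:%H:%M:%S} UTC")

cg = ConsistencyGroup("app1-cg", [Volume("app1_data"), Volume("app1_log")])
cg.snapshot("hourly-0001")
```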

In the case of a replication link failure, the NetApp® ONTAP® Mediator detects the link failure. The primary LUN continues to serve I/O to the hosts, and all paths from the secondary cluster report “illegal request/LU not found”.

 

If a disaster occurs at Site A, the Mediator detects it and informs the secondary site, and the secondary LUN continues to serve I/O to the hosts. When Site A comes back online, the Mediator establishes the relationship in the reverse direction and assigns the secondary role to the Site A volumes. After the relationship reaches the in-sync state, a planned failover can be performed to restore normal operations.

In the case of a disaster at Site B, the primary LUN continues to serve I/O to the hosts.

 

In the case of a NetApp® ONTAP® Mediator failure (the Mediator is a virtual machine), the primary LUN continues to serve I/O to the hosts, and the relationship remains in sync. However, because the ONTAP Mediator is unavailable, neither automatic unplanned failover (AUFO) nor planned failover (PFO) is possible.

vSAN Stretched Cluster

 

Compared with the previous solutions, which are based on physical storage arrays, the vSAN Stretched Cluster is based on the VMware vSAN software-defined storage architecture. vSAN is a storage solution that runs on standard x86 hardware. It is integrated into the vSphere kernel and fully integrated with other vSphere functionalities such as HA, DRS, and vMotion. The vSAN datastore consists of all the local disks aggregated into a single datastore shared by all hosts in the cluster.

 

 

Initial setup and maintenance are much more manageable than with the previous solutions, as the configuration is carried out from the vSphere Client. Because of the way vSAN works, there is no need to configure storage replication. The deployment of a vSAN Stretched Cluster is done entirely from the vSphere wizard. The minimum deployment is 2 hosts + 1 witness, and the maximum is 40 ESXi hosts + 1 witness (vSAN 7 U2). The witness (physical or virtual) is deployed at the third site.

 

 

 

Benefits of the vSAN Stretched Cluster configuration are:

  • Disaster avoidance and planned failover (maintenance)
  • Active-Active Datacenter
  • Easy to manage with a single vSphere vCenter
  • Site-level high availability to maintain business continuity
  • Automatic recovery in case one of the sites is unavailable
  • Simple and faster implementation, compared to the Stretched cluster using traditional storage systems

A vSAN stretched cluster is an HCI solution that extends across three distant locations, or fault domains (FD): preferred, secondary, and witness. During the initial configuration, it is necessary to decide which site will be preferred; this is important in a split-brain (ISL failure) scenario. In that scenario, even if the Secondary site is healthy, vSphere HA will restart the VMs from the Secondary site on the Preferred site.

In vSAN, we use storage policies to define virtual machine storage requirements for performance and availability. Besides the default storage policy that mirrors objects between the active sites (RAID-1), vSAN 6.6 added options for local protection and site affinity. Local protection, or FTT within a site, refers to the number of failures tolerated (0 to 3) and can use RAID-1 or RAID-5/6. With the site affinity policy, we can define the objects for which protection across sites is not desired.
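
The policy choice translates directly into capacity overhead. The arithmetic below shows the raw space consumed by a 100 GB VM under a few common stretched-cluster policy combinations; the VM size is an arbitrary example, and the multipliers simply follow from the policy definitions (cross-site mirror = 2x, local RAID-1 = 2x, local RAID-5 = 1.33x).

```python
# Capacity-overhead arithmetic for vSAN stretched-cluster storage policies.
# The 100 GB VM is an arbitrary example; multipliers follow from the policy
# definitions (cross-site mirror = 2x, local RAID-1 = 2x, local RAID-5 = 1.33x).

vm_gb = 100

policies = {
    # policy name: (copies across sites, local protection factor per site)
    "Site mirroring only (RAID-1 across sites)":       (2, 1.0),
    "Site mirroring + local FTT=1 RAID-1":             (2, 2.0),
    "Site mirroring + local FTT=1 RAID-5":             (2, 4 / 3),
    "Site affinity (one site) + local FTT=1 RAID-5":   (1, 4 / 3),
}

for name, (site_copies, local_factor) in policies.items():
    raw = vm_gb * site_copies * local_factor
    print(f"{name:47s} -> {raw:6.0f} GB raw for {vm_gb} GB of VM data")
```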

As in the previous solutions, different scenarios apply when some of the essential components fail:

 

 

If the cluster loses communication between the sites (ISL down), a quorum is established between the Preferred site and the Witness. vSphere HA will then restart the VMs from the Secondary site on the Preferred site. That is why it is essential to determine, during the initial deployment, which of the two sites will be preferred.

 

 

 

If the Witness site is down (inaccessible or network-isolated), all VMs continue to run at their respective sites.

 

 

 

If one of the sites goes down or becomes network-isolated, a quorum is established between the surviving site and the Witness site. HA on the surviving site will then restart all the VMs from the lost or isolated site.

 

 

 

If the cluster loses one of the hosts, HA will restart the affected VMs on another host. If the host does not recover within 60 minutes, all components on that host will be automatically recreated on the remaining hosts.

 

Conclusion

 

With a vMSC implementation, the same benefits that a high-availability cluster provides within a local site become available across two geographically dispersed data centers. The cluster is spread over the two locations and managed by a single vCenter. VMs in a vMSC can be migrated between sites with vSphere vMotion and vSphere Storage vMotion. The distance between the data centers is limited, often to within a metropolitan area, because of the RTT requirement.

Disaster avoidance significantly reduces the probability that a disaster will lead to an outage and provides better resilience than traditional disaster recovery. However, to achieve multi-level protection, a third site is needed to act as a traditional DR site.
