Designing high availability (HA) and/or disaster recovery (DR) can be a daunting task. There are many options and many components to consider. Here is just a sampling of the questions you will have to answer:
How will the failover function?
What is the process for failover?
Why would you want a manual or automatic process?
What systems do you want to protect?
How will users gain access to their data?
This article does not try to cover all of the complexities of a DR design, but it does address the recommended guidelines for using XenApp with NetScaler to provide a highly available design.
Server Capacity Sizing
When designing a XenApp environment, many customers say they want it designed for X number of users. What both the customer and the consultant need to keep in mind is that the environment also needs extra room to handle a server outage. For instance, if one of the XenApp servers were to fail, its users would need to be distributed among the rest of the XenApp servers. The design should be an N + 1 design, as recommended by Citrix: N servers carry the full user load, plus 1 spare. This design allows for the failure of 1 server while the remaining servers still handle the user load without any performance degradation.
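The N + 1 arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the figure of 60 users per server are assumptions for the example, not Citrix sizing guidance, and your own per-server capacity should come from load testing.

```python
import math


def servers_needed(total_users: int, users_per_server: int, spares: int = 1) -> int:
    """Return the server count for an N + spares design.

    N is the number of servers needed to carry the full user load;
    `spares` extra servers are added so a failure does not overload the rest.
    """
    if total_users < 0 or users_per_server <= 0 or spares < 0:
        raise ValueError("user counts must be non-negative and capacity positive")
    base = math.ceil(total_users / users_per_server)  # this is N
    return base + spares


def survives_failures(total_users: int, users_per_server: int,
                      server_count: int, failures: int = 1) -> bool:
    """Check whether the remaining servers can still carry every user."""
    remaining = server_count - failures
    return remaining > 0 and remaining * users_per_server >= total_users


# Hypothetical example: 500 users at an assumed 60 users per server.
count = servers_needed(500, 60)             # N = 9, plus 1 spare
ok = survives_failures(500, 60, count)      # 9 servers left can hold 540 users
```

With these assumed numbers, 9 servers alone would carry the load on a normal day, but a single failure would leave room for only 480 of the 500 users. The tenth server is what makes the outage a non-event.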
In most larger companies, SQL servers are protected through clustering, mirroring, or replication. A lot of smaller companies forgo this protection because it seems too complex or unnecessary. No matter what the company size is, protecting your databases is important, and it is not as complex as some may think. There are even third-party software vendors, like Double-Take, that try to make this as easy as possible. Also, starting with SQL Server 2005, these methods are built into the product. In other words, there is no excuse: protect your SQL server.
I'm not going to go into all the benefits of server virtualization, but with regard to XenApp, there are a few benefits I want to point out. These benefits have a direct bearing on your HA / DR design.
First, HA. With HA, you can configure your VMs to restart on another host if the host they were running on fails suddenly. This keeps your XenApp farm at your desired number of servers. Yes, users will lose their connection if they were on a VM that lived on the host that failed. But with your N + 1 design, they can log in again right away and get directed to a different live server. This gives the failed VM enough time to restart on a good host and be ready to handle user traffic again without your intervention. VMware and XenServer do this very well.
Second, [Xen/V]motion. If a hypervisor host is failing, meaning it hasn't failed yet but its demise is imminent, you can move your live running servers to another hypervisor host with no disruption to your users (yes, they may experience a brief "pause", but their sessions remain active and no data is lost). Most hypervisors have this functionality; VMware calls it vMotion and XenServer calls it XenMotion. Whatever you call it, it works, and it works well.
Third, DR. Most major hypervisor players offer some sort of DR feature in their product as well. This allows the virtual machines (VMs) to be brought up at a separate physical location in the event of a catastrophic failure. This would be considered a standby method, meaning that the DR site is not active and needs to be brought online before any resources can be utilized. I wouldn't say this is good for all servers, as some would benefit more from a service-level DR method, like SQL mirroring. I like using this on servers whose data doesn't change much, so the likelihood of data loss is low. A hybrid approach works best when considering this feature. XenApp servers are a good fit for it, as there is no data on them.
Access Gateway (AG) allows for secure access to published resources and internal systems through either an ICA proxy or SSL VPN. If you only had 1 AG and it went down, all external users would be unable to access the internal systems. Therefore, it is best to deploy AGs in a pair and load balance across them with NetScaler. Even easier, you could run AG as a feature inside your NetScaler.
Web Interface / Cloud Gateway
No, I'm not discussing the difference between Web Interface (EOL 2015) and Cloud Gateway (Web Interface's successor). But you need one or the other to allow users to gain access to their published resources. You need to think about how many servers are needed to handle the user load, especially during peak times in the morning when users are getting in and starting their sessions. No matter what, always have at least 2 (if you are having bad performance because you failed to properly size the number of front-end servers, don't blame me). Great, you have 2, but how do you balance them? Microsoft NLB? No! You use NetScaler to load balance the servers. NetScaler is smart enough (as long as you configure it) to determine whether a server is down and its traffic needs to be redirected to another server, or whether traffic just needs to be redistributed.
What does the broker service do? The broker service is what your Web Interface (WI) / Cloud Gateway (CG) uses to authenticate users when they log in, enumerate published resources for a user, and direct users to an available XenApp server. So, if the server hosting it were to go down, users wouldn't be able to do anything, not even log in to the WI / CG. Again, you want to properly size your environment for your user load, and again, you don't want to have just 1 broker service. Just like with WI / CG, you use NetScaler to load balance your brokers for performance and to make sure they are online before requests are sent to them.
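To make the idea of health-aware load balancing concrete, here is a minimal Python sketch of round-robin selection that skips servers marked as down. It is a toy model of the concept only, not how NetScaler is implemented or configured; the class name, its methods, and the server names "wi-01" and "wi-02" are all made up for illustration.

```python
class HealthAwareBalancer:
    """Round-robin balancer that only hands out servers marked healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = {name: True for name in self.servers}
        self._next = 0  # index where the next search starts

    def mark(self, server, is_up):
        """Record the result of a health check for one server."""
        if server not in self.healthy:
            raise KeyError(f"unknown server: {server}")
        self.healthy[server] = is_up

    def pick(self):
        """Return the next healthy server, skipping any that are down."""
        total = len(self.servers)
        for offset in range(total):
            index = (self._next + offset) % total
            candidate = self.servers[index]
            if self.healthy[candidate]:
                self._next = (index + 1) % total
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical pair of front-end servers.
lb = HealthAwareBalancer(["wi-01", "wi-02"])
first, second = lb.pick(), lb.pick()   # traffic alternates while both are up
lb.mark("wi-01", False)                # a health check reports wi-01 as down
after_failure = [lb.pick(), lb.pick()] # every request now goes to wi-02
```

The same pattern applies whether the pool holds WI / CG servers or brokers: the balancer needs a health check that reflects whether the service can actually answer, and at least 2 members so there is somewhere to send traffic when one fails.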
If you have a multi-site implementation of XenApp, then you have to contend with user data between the sites. First, which I am hoping you have done already, implement some form of profile offloading from the XenApp servers. This can be done through Citrix Profile Management or through a third-party vendor like AppSense or RES. Then you have to make sure the data that the users are accessing is close to the XenApp farm. For multi-site XenApp implementations, you will need to do a couple of things:
Replicate user data between sites. You can accomplish this through SAN replication, Microsoft DFSR, or some other third-party solution.
Make sure you set the preferred XenApp site for your users so that they are closest to their data.
Global Server Load Balancing
Global Server Load Balancing (GSLB) is not part of XenApp, but it is part of NetScaler and it pairs very well with this discussion. GSLB allows a single namespace to be load balanced or failed over between physical sites. So, if you had to fail over to your DR site, you can do this at the URL level with GSLB and then continue down the line, bringing your services online.
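The failover half of that idea can be sketched as a lookup that answers for one name with the first healthy site in priority order. This is a simplified Python illustration, not NetScaler's GSLB logic: the function name, the site labels, and the addresses (taken from the reserved documentation ranges) are assumptions for the example.

```python
def resolve_site(sites, health):
    """Return the address of the first healthy site, or None if all are down.

    `sites` is an ordered list of (site_name, address) pairs, primary first.
    `health` maps site_name to True (up) or False (down); a site missing
    from `health` is treated as down.
    """
    for name, address in sites:
        if health.get(name, False):
            return address
    return None


# Hypothetical two-site layout behind a single namespace.
sites = [
    ("primary", "192.0.2.10"),
    ("dr", "198.51.100.10"),
]

normal = resolve_site(sites, {"primary": True, "dr": True})
failed_over = resolve_site(sites, {"primary": False, "dr": True})
```

Users keep using the same URL in both cases. Only the answer behind the name changes, which is why the failover can happen without handing out a new address to anyone.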
The combination of NetScaler, XenServer, and XenApp provides a solid basis for a XenApp HA / DR solution. This article went over, at a high level, the components to think about when designing one. There are many other products that complement this solution, but that is for another time.