Hi,
We have a client currently hosted on a single server in a data centre in the UK. They are asking about building a level of physical redundancy into the hosting infrastructure. If a plane lands on the data centre, they want to be able to continue operating their website. They are not overly concerned about the service dropping out for a minute or two, but they want to avoid the hours spent offline whilst we restore from backups to another data centre and redirect the DNS.
I've done some investigation and found two relatively cheap solutions:
1. Get an additional failover server hosted in another data centre. Run rsync and MySQL replication to keep the failover server in sync, then use a service like http://dynect.com/ to do a rudimentary form of failover via DNS. This will not work for 100% of visitors (mainly those behind badly configured DNS servers that cache records for a long time). The rsync/MySQL replication is also open to potential problems. This seems like one of the cheapest solutions.
2. Use a provider with a SAN-based virtualisation setup with a backup data centre. I know Hostway offer this, but it is not cheap! In the event of a failover we just fire up our VMware instance in the failover data centre and pick up on the shared SAN.
Neither of these is ideal. Option 1 is a bit too flaky for my liking, and option 2 is a bit too expensive for my client. Is there a neater solution that I am missing?
The application is Java/Struts2/Spring/Hibernate running on Tomcat/Apache/MySQL on Linux.
Thanks!
1. The most cost-effective option in terms of immediate capital outlay is active-passive, which is pretty much option 1 as you described it. This is generally also less effort, and is generally easy to test.
2. The most cost-effective long-term strategy is active-active, as you get ongoing use of both sites. This works particularly well if the client is happy with reduced performance in a DR scenario. Even if that isn't the case, at least you get a performance boost from the DR site during normal operation.
This is usually more effort, as you need things like replication to work and perform well, and you also need to test a whole new range of failure scenarios.
This approach does have some other advantages though. For example, you can do rolling deploys very easily (upgrade DR first, take Prod out of service while DR carries the traffic, then upgrade Prod).
3. A common hybrid is to run both sites active at the Tomcat/Apache end and active-passive for MySQL. Depending on the DB load, this can be a best-of-both-worlds scenario.
4. Some other solutions use a coherent distributed cache, e.g. Tangosol Coherence, which works with Hibernate/MySQL. As long as the latency between the two sites is low, this should work. Tangosol is a non-free product, now owned by Oracle afaik, but I don't think it's prohibitively expensive. I've seen this used a lot, and it's probably a very simple and elegant solution, but I personally don't like adding another moving part. A lot of people I've met swear by it though, particularly Hibernate users.
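
To make the active-passive option a bit more concrete, here is a rough sketch of the decision logic a DNS failover monitor might apply. It is only an illustration, not how any particular DNS provider works: the class name, the function name, the "three consecutive failures" threshold and the example IP addresses are all my own assumptions. It also shows a back-of-the-envelope estimate of the outage window (detection time plus DNS TTL), which will not hold for visitors behind resolvers that cache for longer than the TTL.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Hypothetical helper: decides which site DNS should point at."""

    primary: str
    standby: str
    failures_to_trip: int = 3  # assumed threshold, tune to taste
    consecutive_failures: int = field(default=0, init=False)
    failed_over: bool = field(default=False, init=False)

    def record(self, primary_healthy: bool) -> str:
        """Feed in one health-check result; return the address to serve."""
        if self.failed_over:
            # Stay on the standby until someone fails back by hand, since
            # the standby database may have accepted writes in the meantime.
            return self.standby
        if primary_healthy:
            # A single good check resets the counter, so brief blips
            # do not trigger a failover.
            self.consecutive_failures = 0
            return self.primary
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_to_trip:
            self.failed_over = True
            return self.standby
        return self.primary


def worst_case_outage_seconds(check_interval: int,
                              failures_to_trip: int,
                              dns_ttl: int) -> int:
    """Rough estimate: time to detect the outage plus time for cached
    DNS records to expire (assumes resolvers honour the TTL)."""
    return check_interval * failures_to_trip + dns_ttl


if __name__ == "__main__":
    # Example addresses are from the documentation ranges, not real hosts.
    monitor = FailoverMonitor(primary="192.0.2.10", standby="198.51.100.10")
    for healthy in [True, False, False, True, False, False, False, True]:
        print(healthy, "->", monitor.record(healthy))
    print("estimated outage (s):", worst_case_outage_seconds(30, 3, 60))
```

The monitor deliberately never switches back on its own. With MySQL in active-passive, failing back is usually something you want to do manually once you have re-synced the old primary.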