What Is a Cluster?
A group of machines all serving an identical purpose is called a cluster. Similarly, an application or a service is clustered if any component of the application or service is served by more than one server. Figure 15.1 does not meet this definition of a clustered service. Even though there are multiple machines, each machine has a unique role that is not filled by any of the other machines.
Figure 15.1. An application that does not meet the cluster definition.
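The definition of a clustered service can be expressed as a small check. This is a minimal sketch, assuming an application is modeled as a hypothetical mapping from each component (role) to the list of servers that fill it. The role and host names are invented for illustration.

```python
from typing import Dict, List


def is_clustered(roles: Dict[str, List[str]]) -> bool:
    """Return True if any component is served by more than one server."""
    return any(len(servers) > 1 for servers in roles.values())


# Hypothetical layout: several machines, but each fills a unique role,
# so no component is served by more than one server.
unique_roles = {"web": ["www1"], "database": ["db1"], "mail": ["mail1"]}

# Hypothetical layout: two front-end machines share the same role.
shared_role = {"web": ["www1", "www2"]}

print(is_clustered(unique_roles))  # False
print(is_clustered(shared_role))   # True
```

Counting machines alone is not enough. What matters is whether some role is filled by more than one of them.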
Figure 15.2 shows a simple clustered service. This example has two front-end machines that are load-balanced via round-robin DNS. Both Web servers actively serve identical content.
Figure 15.2. A simple clustered service.
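To make the round-robin mechanism more concrete, here is a toy model of a resolver for a single hostname. It is a sketch under simplifying assumptions. The addresses are hypothetical and come from the 192.0.2.0/24 documentation range. The resolver rotates its A records by one position per query, and each client is assumed to connect to the first address in the answer. Real deployments are less even than this because resolvers and clients cache answers.

```python
from collections import deque
from typing import List


class RoundRobinResolver:
    """Toy model of round-robin DNS for one hostname.

    Each query returns all A records, rotated by one position relative to
    the previous answer. Assumption: clients use the first address listed.
    """

    def __init__(self, addresses: List[str]) -> None:
        self._records = deque(addresses)

    def query(self) -> List[str]:
        answer = list(self._records)
        self._records.rotate(-1)  # move the first record to the end
        return answer


# Hypothetical addresses for the two front-end Web servers.
resolver = RoundRobinResolver(["192.0.2.10", "192.0.2.11"])
first_choices = [resolver.query()[0] for _ in range(4)]
print(first_choices)
# ['192.0.2.10', '192.0.2.11', '192.0.2.10', '192.0.2.11']
```

Successive clients alternate between the two front ends. Because both serve identical content, it does not matter which one a given client reaches.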
There are two major reasons to move a site past a single Web server:
While both splitting a collection of services into multiple small clusters and creating large clusters that can serve multiple roles have merits, the first is the more prone to abuse. I've seen numerous clients crippled by "highly scalable" architectures (see Figure 15.3).
Figure 15.3. An overly complex application architecture.
The many benefits of this type of setup include the following:
The drawbacks are considerations of scale. Many projects are overdivided into clusters, and the reasoning usually goes like this: You have 10 logically separate services? Then you should have 10 clusters. Every service is business critical, so each should have at least two machines representing it (for redundancy). Very quickly, we have committed ourselves to 20 servers.
In the bad cases, developers take advantage of the knowledge that the clusters are actually separate servers and write services that use mutually exclusive facilities, which would conflict if the services ever shared a machine. Sloppy reliance on the separation of the services can include things as simple as using the same-named directory for storing data. Design mistakes like these can be hard or impossible to fix and can force you to keep all the servers physically separate.
Having 10 separate clusters handling different services is not necessarily a bad thing. If you are serving several million pages per day, you might be able to spread your traffic efficiently across such clusters. The problem occurs when you have a system design that requires a huge amount of physical resources but serves only 100,000 or 1,000,000 pages per day. Then you are stuck maintaining a large infrastructure that is highly underutilized.
Dot-com lore is full of grossly "mis-specified" and underutilized architectures. Not only are they wasteful of hardware resources, they are expensive to build and maintain. Although it is easy to blame company failures on mismanagement and bad ideas, one should never forget that the $5 million data center setup does not help the bottom line. As a systems architect for dot-com companies, I've always felt my job was not only to design infrastructures that can scale easily but to build them to maximize the return on investment.
Now that the cautionary tale of over-clustering is out of the way, how do we break services into clusters that work?
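The arithmetic behind the over-clustering example is easy to check. The sketch below counts machines for one two-machine cluster per service. It then estimates the average load on each machine at the two traffic levels mentioned in the text.

Several assumptions favor the over-clustered design. Every page is assumed to touch every service, and each cluster is assumed to split its share evenly across its machines. The per-machine capacity of 100 pages/sec is a made-up figure for illustration. Finally, daily averages understate peak-hour load.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def servers_needed(services: int, machines_per_cluster: int = 2) -> int:
    """One cluster per service, each with at least two machines for redundancy."""
    return services * machines_per_cluster


def load_per_machine(pages_per_day: int, machines_per_cluster: int = 2) -> float:
    """Average pages/sec one machine handles, assuming (generously) that every
    page touches every service and each cluster splits its load evenly."""
    return pages_per_day / SECONDS_PER_DAY / machines_per_cluster


# Hypothetical capacity of one machine in pages/sec; not from the text.
ASSUMED_CAPACITY = 100.0

print(servers_needed(10))  # 20
for pages in (100_000, 1_000_000):
    load = load_per_machine(pages)
    print(f"{pages:>9,} pages/day -> {load:.2f} pages/sec per machine "
          f"({load / ASSUMED_CAPACITY:.1%} of assumed capacity)")
#   100,000 pages/day -> 0.58 pages/sec per machine (0.6% of assumed capacity)
# 1,000,000 pages/day -> 5.79 pages/sec per machine (5.8% of assumed capacity)
```

Even with these generous assumptions and this assumed capacity, each of the 20 machines sits mostly idle at these traffic levels. This is the underutilization the text warns about.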