MSDN Magazine - December 2007 - (Page 19)

right? With only two computers, you can’t know which one is correct in the event of a single unit failure. So NASA added a third machine: if two computers agree but the third returns a different result, the third is considered broken. This level of redundancy is nice, unless one of your three boxes fails. Then you’re back to only two working machines, and you can’t tell which is correct in the event of a second failure. So NASA added a fourth, then talked itself into adding a fifth. Clearly, resolving non-crashing failures in a pair of mirrored processes is not simple.

Another problem with this approach arises when you run multiple processes on the same machine. If one of them exhausts a system-level resource, there’s a reasonable chance that the other process will need the same resource at the same time. That may cause the reserve process to fail in the same manner as the first. In fact, the two processes may even be competing with each other for that resource.

Recycling a Portion of the Process

For a high level of resiliency, tearing down a process and restarting it, or failing over to another process, just won’t do. What you really need to do is find the part of the application that failed and recycle only that part. This requires isolating various parts of your application’s process into recyclable chunks. Operations must either be stateless or use a transactional system to ensure that either no writes occur or any writes are backed out. Additionally, all resources in use must be freed when you recycle a part of the process.

When thinking about long-lived servers, think about state corruption and lack of consistency. Consistency can apply at several different layers.
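The requirement that operations either be stateless or run under a transactional system can be sketched in a few lines. This is a minimal illustration in Python standing in for managed code, assuming a simple in-memory key/value store; `TransactionalStore` and `transfer` are hypothetical names. Each transaction edits a private copy of the state, and that copy replaces the committed state only if the whole block completes. An exception partway through therefore leaves no partial writes behind.

```python
import copy
from contextlib import contextmanager


class TransactionalStore:
    """Hypothetical in-memory store with all-or-nothing updates."""

    def __init__(self, initial=None):
        self._committed = dict(initial or {})

    def snapshot(self):
        # Hand out a copy so callers cannot mutate committed state directly.
        return dict(self._committed)

    @contextmanager
    def transaction(self):
        # Every transaction edits a private deep copy of the committed state.
        working = copy.deepcopy(self._committed)
        yield working
        # Reached only if the with-block finished without raising; swapping the
        # reference publishes all of the transaction's writes at once.
        self._committed = working


def transfer(store, src, dst, amount, fail_midway=False):
    """Hypothetical operation that moves `amount` from `src` to `dst`."""
    with store.transaction() as data:
        data[src] -= amount
        if fail_midway:
            # Simulate a failure (such as resource exhaustion) after the first write.
            raise MemoryError("simulated failure")
        data[dst] += amount
```

Copying the entire store on every transaction keeps the sketch short and makes the all-or-nothing property easy to see. A production system would more likely rely on an undo log or a real transaction manager to provide the same guarantee.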
While a simple linked list might be internally consistent, a complex data structure will require additional invariants. Suppose a consumer has an invariant that every element in a linked list must also be stored as a value in a hash table. In that case, consistency in the linked list alone doesn’t mean the application is consistent. For this reason, you must treat even the slightest possibility of corruption that breaks invariants as a problem. If an asynchronous exception occurs, how much of the state may be corrupted, and how can a server be resilient to this corruption?

To carve an application into recyclable chunks, you must isolate operations from one another. When an asynchronous exception occurs, it may have caused state corruption. To avoid recycling the entire process, you must already have encapsulated a smaller portion of your process, including the set of failed operations and all relevant state information.

What Is an AppDomain?

An application domain, or AppDomain for short, is a sub-process unit of isolation for managed code. Most assemblies can be loaded into an AppDomain. When the AppDomain is unloaded, the assemblies can often be unloaded as well.
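The idea of isolating operations and their state into recyclable chunks can be sketched without the CLR. The Python below is a language-neutral illustration, not the AppDomain API, and every name in it is hypothetical. `IsolatedUnit` owns its state, using a list and a dict to stand in for the linked list and hash table from the invariant example. It also tracks the resources it opened. `Host` recycles only the unit whose operation raised an exception or left that invariant broken, and leaves the other units alone.

```python
class FakeHandle:
    """Hypothetical stand-in for a file, socket, or native handle."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class IsolatedUnit:
    """Hypothetical recyclable chunk that owns all of its state and resources."""

    def __init__(self, name):
        self.name = name
        self.order = []       # plays the role of the linked list
        self.index = {}       # plays the role of the hash table
        self.resources = []   # handles this unit must release when recycled

    def invariant_holds(self):
        # Cross-structure invariant: every list element is also a hash-table value.
        values = set(self.index.values())
        return all(item in values for item in self.order)

    def release(self):
        # Free everything the unit acquired, so recycling leaks nothing.
        for handle in self.resources:
            handle.close()
        self.resources.clear()


class Host:
    """Runs operations against named units and recycles only the unit that failed."""

    def __init__(self, names):
        self.units = {name: IsolatedUnit(name) for name in names}
        self.recycled = []  # names of units that have been torn down

    def run(self, name, operation):
        unit = self.units[name]
        try:
            operation(unit)
            if not unit.invariant_holds():
                raise RuntimeError(f"invariant broken in unit {name!r}")
        except Exception:
            # Any escaping exception counts as possible corruption: release this
            # unit's resources, drop its state, and start a fresh instance.
            unit.release()
            self.units[name] = IsolatedUnit(name)
            self.recycled.append(name)
            return False
        return True
```

The host recycles a unit whenever any exception escapes, not only when `invariant_holds` fails. This follows the advice to treat even the possibility of corruption as a problem. The invariant check is an extra guard for corruption that doesn’t raise. In the .NET Framework, the analogous teardown step is unloading the AppDomain, for example with `AppDomain.Unload`.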