Better Software - June 2008 - (Page 39)

Figure 2: RCA cause-effect diagram for corrupted data primary effect

structure but not too much. Fishbone analysis is best left to aggregations of RCAs or process problems. Apollo and 5 Why use similar approaches, but the Apollo method has more structure and guidance. For those wishing to dig deeper into this method, I suggest reading Apollo Root Cause Analysis by Dean L. Gano. We’ll cover the basic process here.

The Apollo method uses a cause-and-effect chart. The two-level picture shown in figure 1 will be used to explain the principles. The “primary effect” is the problem or failure we want to prevent from recurring. We need to identify the what, when, and where as well as the significance of the failure. To ensure everyone is on the same page, these must be clearly stated and understood by all members of the RCA team before the analysis starts. Understanding the significance allows the team to make appropriate cost-benefit recommendations for those failures where there is a relationship between cause-and-effect cost. Statements must be specific; “System down” has far less meaning than “One hour of production lost at a cost of $1 million.”

Once the primary effect is identified, understood, and agreed upon, the next step is to diagram the cause-and-effect chain. In the Apollo method, the term “caused by” is used, and I suggest using this exact phrase when building the chart. Saying “caused by” focuses participants on understanding the link between the two causes or actions. The causes can be divided into a “condition cause” (the necessary set of conditions that exist over time) and an “action cause” (the event that triggered the failure). A software example of a condition cause might be the failure to check for buffer overflow, and the action cause a hacker overflowing the buffer in an attempt to breach the system, leading to a primary effect of a security failure.
Another condition cause might be programmers failing to monitor device status for data errors, and the action cause a reported parity error, leading to the primary effect of corrupted data accepted by the system. The pairing of the condition cause and action cause is an “and” condition, in which both must exist to cause the primary effect. If either alone could cause the primary effect, then it is likely there are two separate paths, and the causes should be further divided into condition and action causes. Understanding there is a difference may allow the analysis to proceed to a better understanding of the failure cause.

The Apollo method strongly suggests that evidence for the causes be noted. Supposition and guesswork should not be used at this point. If you do not know, close the path with a “?” and go on to examine other paths. Additional investigation should lead to stopping the path or further diagramming when more information is available. The analysis continues down the path until the point of “collective ignorance” is reached. In other words, no one can identify additional caused-by relationships.

Let’s analyze a corrupted-data primary effect. In this example, the corrupted data was caused by the action of hardware and software timing (action cause) and software ignoring the parity fault (condition cause). Following the action cause, the timing was caused by a long delay in sending a command (action cause), itself caused by enabling the software trace to log events, and a hardware design error (condition cause). Following the first condition cause, ignoring the parity fault was caused by programmers unaware of the significance of the fault (“We don’t know what to do with parity faults, so we ignore them”). Their ignorance could be traced to lack of knowledge in handling faults and an assumption by the hardware designer that the programmers would “obviously” know a parity error meant data was invalid.
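To make the “and” pairing concrete, here is a minimal Python sketch of the first level of the corrupted-data chart. The `Cause` class, its field names, and the `occurs` method are hypothetical names chosen for illustration; they are not part of the Apollo method itself. The sketch assumes that an effect occurs only when every one of its causes is present, so preventing either cause breaks the chain.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cause:
    """One box in a cause-and-effect chart (hypothetical model)."""
    description: str
    kind: str  # "primary effect", "action", or "condition"
    caused_by: List["Cause"] = field(default_factory=list)

    def occurs(self, prevented: Set[str]) -> bool:
        """Return True if this effect still occurs.

        An effect occurs only if it has not been prevented and ALL of its
        causes occur (the "and" condition). A box with no recorded causes
        is assumed to occur unless it is prevented.
        """
        if self.description in prevented:
            return False
        return all(cause.occurs(prevented) for cause in self.caused_by)


# First level of the corrupted-data example from the text.
timing = Cause("hardware and software timing", "action")
parity_ignored = Cause("parity fault ignored", "condition")
corrupted = Cause("corrupted data", "primary effect", [timing, parity_ignored])

# With nothing prevented, both causes exist and the failure occurs.
print(corrupted.occurs(set()))
# Preventing either cause alone is enough to stop the primary effect.
print(corrupted.occurs({"parity fault ignored"}))
print(corrupted.occurs({"hardware and software timing"}))
```

If either cause alone were enough to produce the failure, this single “and” node would be the wrong model, and the chart would need two separate paths instead.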
Figure 2 shows one way of drawing these relationships. Each branch should be read as “caused by,” with the action and condition causes as stated above. I have left the evidence off because in this analysis each of the causes was agreed to by the team and the level of complexity did not require this step. Note that in this example the first branch is an “and” condition because if either path is prevented, the failure will not repeat. If we wished, the “hardware design error” path could be further expanded.

As we develop this chart, it is useful to follow the path from left to right, stating “Corrupted data is caused by a late command and parity fault ignored; late command is caused by hardware design error and software trace enabled; and parity fault ignored is caused by programmer ignorance and hardware designer assumption.” This verifies that the “caused by” chain is valid and that we have not skipped any intermediate causes. We should also walk the chart from right to left, stating: “Programmer ignorance and the hardware designer
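The left-to-right reading of the chart can be sketched in a few lines of Python. The `CHART` dictionary and the `read_left_to_right` function are hypothetical names used only for illustration; the cause labels are taken from the statement quoted above. The sketch assumes the chart is stored as a mapping from each effect to the causes drawn to its right.

```python
from typing import Dict, List

# Each key is an effect; its list holds the causes joined by "and".
CHART: Dict[str, List[str]] = {
    "corrupted data": ["late command", "parity fault ignored"],
    "late command": ["hardware design error", "software trace enabled"],
    "parity fault ignored": ["programmer ignorance",
                             "hardware designer assumption"],
}


def read_left_to_right(chart: Dict[str, List[str]],
                       primary_effect: str) -> List[str]:
    """Produce one 'caused by' statement per expanded box, level by level."""
    statements = []
    queue = [primary_effect]
    while queue:
        effect = queue.pop(0)
        causes = chart.get(effect, [])
        if causes:  # boxes with no recorded causes end their path
            statements.append(f"{effect} is caused by {' and '.join(causes)}")
            queue.extend(causes)
    return statements


for statement in read_left_to_right(CHART, "corrupted data"):
    print(statement)
```

Reading the generated statements aloud is one way to check that no intermediate cause has been skipped; a box that appears as a cause but never as a key is where the path currently stops.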