The International Electrotechnical Commission (IEC) defines functional safety in the IEC 61508-0: 3.1 standard as “part of the overall safety that depends on a system or equipment operating correctly in response to its inputs.” In the article “IEC 61508 Explained,” the IEC further states, “Functional safety is the detection of a potentially dangerous condition resulting in the activation of a protective or corrective device or mechanism to prevent hazardous events arising or providing mitigation to reduce the consequence of the hazardous event.”
A defined life cycle addresses the analysis, design, installation, operation, and maintenance of equipment. The level of safety is achieved by avoiding or controlling faults. Exida, a certifying agency for functional safety, states, “The goal of functional safety is to design an automatic safety function that will perform the intended function correctly or the system will fail in a predictable (safe) manner.” Performing the intended function is based on the reliability of the system, and failing in a predictable manner is based on the safety design of the system.
Every company should feel obligated to provide equipment and processes that are safe for users, the community, and the environment. The failure to ensure that safety measures are in place can lead to personal injury or death to one or many, damage to the environment, and severe damage or destruction to capital equipment and facilities. The financial impact due to liability claims, equipment loss, business interruption, and company image can severely affect businesses of all sizes. Many governments are now requiring machines imported or built for use in their countries to meet safety requirements. Europe has adopted the Machinery Directive (2006/42/EC) to ensure a common safety level for machinery. The use of functional safety devices can help reduce the risks for hazardous events and help meet governmental agency requirements.
Fortunately, international standards have been published to apply consistent and proven methods to systems requiring functional safety. The base generic specification, IEC 61508, is intended for applications in a variety of industries. Examples of how industry groups have applied the concepts of IEC 61508 and included specific additions to make them more relevant include the following:
IEC 61508 covers the complete safety life cycle of electrical/electronic/programmable electronic (E/E/PE) safety related systems. The standard seeks to reduce risk by addressing the likelihood of a hazardous event occurring and the severity of the consequences if it does. Since zero risk can never be achieved, safety must be considered at the very start of the design so that risks can be properly addressed and reduced.
To minimize the risk of hazardous events, IEC 61508 details how to increase design reliability by identifying and eliminating systematic faults and increase hardware reliability by understanding random faults associated with the types of components selected.
Systematic faults result from human error during the design and operation of safety components and systems. For components to be certified to IEC 61508, documented engineering procedures are evaluated to identify and reduce the chance of oversight due to human error. Reviewing possible failures in all the life-cycle phases, from design to decommissioning, is critical to identify and remove these systematic faults.
Random failures occur when hardware components fail or degrade randomly because of physical stresses such as temperature, corrosion, and fatigue. Safety system designs account for random failures using statistical information produced from test and historical data. Companies can calculate the probability of failure for a component and use it to determine the amount of risk associated with the component and system. IEC 61508 also allows components to be “proven in use,” which accounts for the operational history of the component. For a component to be proven in use, it must have sufficient supporting information such as operational hours, revision history, fault reporting systems, and field failure data. Various methods discussed later in this document can be used to minimize the effect of random failures.
The safety life cycle is provided by the various specifications to give designers a framework for creating safe and cost-effective systems. IEC 61508 divides the life cycle into three main parts: analysis, realization, and operation.
Figure 1. Safety Life Cycle Defined by IEC 61508
The safety needs are identified and investigated in the analysis phase. A hazard and risk analysis is completed to understand what hazardous events could occur, how likely they are, and what their consequences would be. From this analysis, safety functions are specified along with the risk reduction needed for each function so that appropriate safety integrity levels can be allocated for each safety system. This phase ends with a Safety Requirements Specification document, which details the analysis phase findings and provides a guideline for the designer to use during the realization phase.
In the realization phase, the designer begins to select the technology and architecture to meet the safety requirements identified in the analysis phase. The components selected undergo reliability and safety calculations to make sure they meet appropriate safety integrity levels. Once validated, the detailed design is documented with wiring diagrams, installation instructions, and operating instructions. At this point, the system can be installed and commissioned so that a factory acceptance test can be completed.
During operation, the final phase, the systems are maintained and repaired as specified in the requirements document. This includes items such as proof tests, operator training, and system modifications to continue to provide a safe system. The decommissioning or disposal of a system can also occur during this phase.
A trained and experienced professional is essential to make sure the safety life cycle is properly followed, validated, and documented. Various certifying and training organizations such as exida train personnel to be certified functional safety experts.
IEC 61511 Part 1: 3.2.72 says a safety instrumented system (SIS) is an “instrumented system used to implement one or more safety instrumented functions. A SIS is composed of any combination of sensor(s), logic solver(s), and final element(s).” A SIS is used to prevent or minimize the risk associated with possible hazardous conditions in processes and equipment.
A safety instrumented function (SIF) is the part of the machine or process that performs a safety-critical action. A SIF is intended to keep the operation safe or place the machine into a safe state to prevent a hazardous event. It consists of three components: sensor, logic solver, and final element.
Figure 2. Components of a Safety Instrumented Function
The sensor measures the conditions of the equipment and detects when hazardous conditions are present. Examples of sensors are emergency stop buttons, light curtains, safety mats, pressure transducers, and temperature transducers. The logic solver reviews all the sensor inputs and performs a safety action when hazardous conditions occur based on the program the user created during the realization phase. It then sends an output signal to a final element to place the equipment into a nonhazardous/safe state. The logic solver keeps the equipment in the safe state until corrective actions are taken and/or the sensors detect safe operation conditions. Examples of final elements are relays and valves.
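The sensor, logic solver, and final element chain described above can be sketched in a few lines. This is only an illustration and not a certified implementation: the `LogicSolver` class, the pressure threshold, and the explicit reset input are hypothetical, and the sketch assumes the safe state is left only when the readings are safe and an operator has requested a reset.

```python
from dataclasses import dataclass


@dataclass
class LogicSolver:
    """Hypothetical latching trip logic for a single safety function."""

    high_pressure_limit: float  # assumed trip threshold, illustrative units
    tripped: bool = False       # True while the equipment is held in the safe state

    def scan(self, pressure: float, estop_pressed: bool,
             reset_requested: bool = False) -> str:
        """Evaluate the sensor inputs once and return the command for the final element."""
        hazardous = estop_pressed or pressure > self.high_pressure_limit
        if hazardous:
            # Any hazardous input trips the function immediately.
            self.tripped = True
        elif self.tripped and reset_requested:
            # Leave the safe state only when inputs are safe AND a reset is requested.
            self.tripped = False
        return "SAFE_STATE" if self.tripped else "RUN"


solver = LogicSolver(high_pressure_limit=10.0)
print(solver.scan(pressure=5.0, estop_pressed=False))    # RUN
print(solver.scan(pressure=12.0, estop_pressed=False))   # SAFE_STATE
print(solver.scan(pressure=5.0, estop_pressed=False))    # SAFE_STATE (latched)
print(solver.scan(pressure=5.0, estop_pressed=False, reset_requested=True))  # RUN
```

Latching the trip keeps the final element in the safe state even after the sensor reading returns to normal, which matches the behavior described above of holding the safe state until corrective action is taken.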
The safety integrity level (SIL) is a measure of the safety performance of a safety function. It can also be considered the level of risk reduction the function provides, and many use the term to specify a target level of risk reduction. IEC 61508 defines four SILs and details the requirements for meeting each one. SIL 4 provides the highest level of safety performance, and SIL 1 provides the lowest. All functions and components of a safety function and system must meet the appropriate levels for the system to meet the necessary safety level. If, after analysis, all the system components are SIL 3 rated except for one SIL 2 rated component, then the full system can receive no higher than a SIL 2 rating.
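The weakest-component rule in the last sentence can be expressed as a simple minimum. The function name `system_sil_ceiling` is hypothetical, and the sketch is a first-pass check only: it assumes each component carries a single SIL rating and ignores redundancy and the probability of failure calculations that a full assessment requires.

```python
def system_sil_ceiling(component_sils):
    """Return the highest SIL the system can claim: the lowest component rating."""
    sils = list(component_sils)
    if not sils:
        raise ValueError("at least one component rating is required")
    for sil in sils:
        if sil not in (1, 2, 3, 4):
            raise ValueError(f"invalid SIL rating: {sil!r}")
    return min(sils)


# Three SIL 3 components and one SIL 2 component, as in the example above.
print(system_sil_ceiling([3, 3, 2, 3]))  # 2
```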
SILs depend on many different factors such as systematic capability level for the design and the component suppliers, architectural constraints, hardware fault tolerance and safe failure fraction, and the probability of failure.
As described previously, systematic faults result from human error during the design and operation of safety components and systems. The development process and quality system are evaluated during certification to determine the systematic capability level. IEC 61508 sets forth the requirements for reviewing designs to determine the systematic capability level. Factors such as failure detection accuracy, code protection ability, and diversity of hardware are considered. The certificates of components certified by a third party to a SIL level per IEC 61508 list their systematic capability levels.
Random hardware faults affect the hardware safety integrity of the system. Architectural constraints based on how the components are connected and used in the safety function affect the SIL level. The probability of failure to operate or act on a hazardous event also affects the SIL level.
To help understand the risks and likelihood of failures caused by random hardware faults, techniques such as failure modes, effects, and diagnostic analysis (FMEDA) are used. FMEDA is a detailed analysis of failure modes and diagnostic capabilities for components. This is a proven method for determining failure modes and rates that can be used to calculate safe failure fractions and probabilities of failure. Certifying bodies such as exida and TÜV conduct FMEDAs on components and provide designers with the data to use in designing and determining the SIL levels of their safety systems.
The safe failure fraction (SFF) is the fraction of the component’s overall failure rate that results in either a safe fault or a detected unsafe fault. The four types of random hardware failure that make up the overall failure rate are:
λsu: safe undetected
λsd: safe detected
λdu: dangerous undetected
λdd: dangerous detected
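With these four rates, the SFF definition above reduces to one division: every failure except the dangerous undetected ones counts toward the numerator. The sketch below assumes all four rates are given in the same unit; the function name and the example values (in FIT, failures per 10⁹ hours) are hypothetical and not taken from any real component.

```python
def safe_failure_fraction(lambda_su, lambda_sd, lambda_du, lambda_dd):
    """SFF = (safe undetected + safe detected + dangerous detected) / total rate."""
    total = lambda_su + lambda_sd + lambda_du + lambda_dd
    if total <= 0:
        raise ValueError("the overall failure rate must be positive")
    return (lambda_su + lambda_sd + lambda_dd) / total


# Hypothetical FMEDA results in FIT; only the 50 FIT of dangerous
# undetected failures is excluded from the numerator.
sff = safe_failure_fraction(lambda_su=200, lambda_sd=300, lambda_du=50, lambda_dd=450)
print(f"SFF = {sff:.1%}")  # SFF = 95.0%
```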
A hardware fault tolerance (HFT) of N (either 0, 1, or 2) means that N+1 is the minimum number of faults that can lead to the loss of the safety function. If the hardware’s HFT = 1, the system maintains the safety function if one fault occurs. If two faults occur, the safety function can be lost.
Voting of components is used to provide higher values of HFT. In an M out of N (MooN) voting architecture, N is the total number of channels present, and M is the minimum number of channels that must be available and functioning properly. A 1oo1 architecture is a simple configuration in which only one component is present, so it has an HFT = 0. A 1oo2 architecture has a total of two components, but only one of them has to function at a given time, so it has an HFT = 1.
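Both examples above are consistent with the relationship HFT = N − M, because N − M channels can fail while M working channels remain. The helper below is a sketch under that assumption; the function name is hypothetical, and it handles only plain "MooN" strings, not diagnostic variants such as 1oo2D.

```python
import re


def hardware_fault_tolerance(architecture: str) -> int:
    """Return N - M for a voting architecture written as 'MooN', e.g. '1oo2'."""
    match = re.fullmatch(r"(\d+)oo(\d+)", architecture)
    if match is None:
        raise ValueError(f"not a MooN architecture: {architecture!r}")
    m, n = int(match.group(1)), int(match.group(2))
    if not 1 <= m <= n:
        raise ValueError("M must be between 1 and N")
    return n - m


for arch in ("1oo1", "1oo2", "2oo3"):
    print(arch, "-> HFT =", hardware_fault_tolerance(arch))
# 1oo1 -> HFT = 0
# 1oo2 -> HFT = 1
# 2oo3 -> HFT = 1
```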
The SFF and the HFT level are used when determining the SIL level for the system. IEC 61508 specifies two types of subsystems (components), Type A and Type B, and requires certain SFF and HFT conditions that depend on these subsystems.
Safe Failure Fraction (SFF) | HFT = 0 | HFT = 1 | HFT = 2 |
<60% | SIL 1 | SIL 2 | SIL 3 |
60% to <90% | SIL 2 | SIL 3 | SIL 4 |
90% to <99% | SIL 3 | SIL 4 | SIL 4 |
≥99% | SIL 3 | SIL 4 | SIL 4 |
Table 1. Safety Integrity Level for a Type A Subsystem (simple, well understood, and proven in the field/IEC 61508-2)
Safe Failure Fraction (SFF) | HFT = 0 | HFT = 1 | HFT = 2 |
<60% | Not Allowed | SIL 1 | SIL 2 |
60% to <90% | SIL 1 | SIL 2 | SIL 3 |
90% to <99% | SIL 2 | SIL 3 | SIL 4 |
≥99% | SIL 3 | SIL 4 | SIL 4 |
Table 2. Safety Integrity Level for a Type B Subsystem (complex systems that are not fully understood or proven in the field/IEC 61508-2)
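Tables 1 and 2 can be encoded as a small lookup. The sketch below assumes the lower bound of each SFF band is inclusive (so exactly 60% falls in the second row and exactly 99% in the last); the function and variable names are hypothetical, and `None` stands for "Not Allowed".

```python
import bisect

# Lower bounds of the second, third, and fourth SFF bands.
_SFF_BREAKS = [0.60, 0.90, 0.99]

# Rows follow the SFF bands; columns are HFT = 0, 1, 2.
_MAX_SIL = {
    "A": [[1, 2, 3], [2, 3, 4], [3, 4, 4], [3, 4, 4]],
    "B": [[None, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 4]],
}


def max_sil(subsystem_type: str, sff: float, hft: int):
    """Return the table entry for a Type A or B subsystem, or None if not allowed."""
    if subsystem_type not in _MAX_SIL:
        raise ValueError("subsystem_type must be 'A' or 'B'")
    if not 0.0 <= sff <= 1.0:
        raise ValueError("sff must be a fraction between 0 and 1")
    if hft not in (0, 1, 2):
        raise ValueError("hft must be 0, 1, or 2")
    row = bisect.bisect_right(_SFF_BREAKS, sff)  # index of the SFF band
    return _MAX_SIL[subsystem_type][row][hft]


print(max_sil("A", sff=0.95, hft=0))  # 3
print(max_sil("B", sff=0.95, hft=0))  # 2
print(max_sil("B", sff=0.50, hft=0))  # None
```

The same SFF and HFT give a Type B subsystem one SIL less than a Type A subsystem in most cells, which reflects the lower confidence in complex components.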
The likelihood of a malfunction or failure of a system due to hardware faults, known as the probability of failure, depends on the mode of operation. IEC 61508 defines two modes of operation for a safety function: low demand mode and high demand or continuous mode.
When a system runs in high demand mode, safety demands on the system occur more than once per year. Running in continuous mode is equivalent to running in very high demand mode. An example of this type of system is a light curtain protecting the user from a hazard on a piece of manufacturing equipment such as a sheet metal punch press. The probability of dangerous failure per hour (PFH) is used for systems in high demand or continuous mode. In the simplest form, the PFH is equal to λdu (the dangerous undetected failure rate) when the components are used without hardware fault tolerance (HFT = 0). Refer to IEC 61508 for other hardware configurations. Table 3 shows the required PFH values for high demand or continuous mode systems to meet the various SIL levels.
SIL | Probability of Dangerous Failure per Hour (PFH) |
4 | ≥10⁻⁹ to <10⁻⁸ |
3 | ≥10⁻⁸ to <10⁻⁷ |
2 | ≥10⁻⁷ to <10⁻⁶ |
1 | ≥10⁻⁶ to <10⁻⁵ |
Table 3. Safety Integrity Levels for Safety Functions Operating in High Demand or Continuous Mode (IEC 61508-1)
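The bands in Table 3 translate directly into a range check. The sketch below uses the simplest case from the text, where PFH equals λdu for a component with HFT = 0; the function name and the 50 FIT example rate are hypothetical.

```python
# (lower bound inclusive, upper bound exclusive, SIL), per hour, from Table 3.
_PFH_BANDS = [
    (1e-9, 1e-8, 4),
    (1e-8, 1e-7, 3),
    (1e-7, 1e-6, 2),
    (1e-6, 1e-5, 1),
]


def sil_from_pfh(pfh_per_hour: float):
    """Return the SIL band containing the PFH, or None if it is outside Table 3."""
    for low, high, sil in _PFH_BANDS:
        if low <= pfh_per_hour < high:
            return sil
    return None  # table lists no band for this value


# Hypothetical single-channel device (HFT = 0): PFH = lambda_du.
lambda_du = 50e-9  # 50 FIT expressed as failures per hour
print(sil_from_pfh(lambda_du))  # 3
```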
When running in low demand mode, the frequency of safety demands on the system is no greater than once per year. An example of a low demand system is a high integrity pressure protection system (HIPPS) in a processing plant. The average probability of dangerous failure on demand (PFDavg) is used for systems in low demand mode. Many factors are considered when calculating PFDavg, such as proof test interval, repair time, and the architecture of the components (for example, the 1oo2 voting system). Proof testing evaluates the safety system components to find failures that the diagnostics built into the system may not detect. Any failures found in proof tests are repaired so the system is in a like-new state. By increasing the frequency of proof tests, designers can reach higher SIL levels, but they must consider the cost and complexity of the test. Repair time, also called mean time to repair (MTTR), is the time required to completely repair a failure on a safety system. This includes the time to detect the failure, get a technician to start the repair, and finish the repair. Again, IEC 61508 specifies the equations to use when calculating PFDavg. Table 4 shows the required PFDavg values for low demand systems to meet the various SIL levels:
SIL | Probability of Dangerous Failure on Demand (PFDavg) |
4 | ≥10⁻⁵ to <10⁻⁴ |
3 | ≥10⁻⁴ to <10⁻³ |
2 | ≥10⁻³ to <10⁻² |
1 | ≥10⁻² to <10⁻¹ |
Table 4. Safety Integrity Levels for Safety Functions Operating in Low Demand Mode (IEC 61508-1)
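To show how the proof test interval influences the result, the sketch below uses a widely quoted simplification for a single channel, PFDavg ≈ λdu × TI / 2, where TI is the proof test interval. This is an assumption for illustration and not the full set of IEC 61508 equations: it ignores repair time, diagnostic coverage, and common cause failures. The function names and the failure rate are hypothetical.

```python
HOURS_PER_YEAR = 8760

# (lower bound inclusive, upper bound exclusive, SIL), from Table 4.
_PFD_BANDS = [
    (1e-5, 1e-4, 4),
    (1e-4, 1e-3, 3),
    (1e-3, 1e-2, 2),
    (1e-2, 1e-1, 1),
]


def pfd_avg_1oo1(lambda_du_per_hour: float, proof_test_interval_hours: float) -> float:
    """Simplified single-channel approximation: lambda_du * TI / 2."""
    return lambda_du_per_hour * proof_test_interval_hours / 2


def sil_from_pfd_avg(pfd_avg: float):
    """Return the SIL band containing PFDavg, or None if it is outside Table 4."""
    for low, high, sil in _PFD_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


lambda_du = 5e-7  # hypothetical dangerous undetected failure rate per hour
for label, interval in (("yearly", HOURS_PER_YEAR), ("quarterly", HOURS_PER_YEAR / 4)):
    pfd = pfd_avg_1oo1(lambda_du, interval)
    print(f"{label}: PFDavg = {pfd:.2e}, SIL {sil_from_pfd_avg(pfd)}")
# yearly: PFDavg = 2.19e-03, SIL 2
# quarterly: PFDavg = 5.48e-04, SIL 3
```

Under these assumptions, testing four times as often moves the same hardware from the SIL 2 band into the SIL 3 band, which is the trade-off against test cost and complexity described above.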
The probability of failure values for the individual components of a SIF are calculated and then added together to get the overall probability of failure for the SIF. The SILs given for the probability of failure values in the previous tables refer to the overall SIF. Figure 3 shows the recommended guidelines for the typical percentages of each of the components.
Figure 3. Recommended Allocations for Probability of Failure per Component in a Safety Instrumented Function
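The summation described above, together with each component's share of the total, can be sketched as follows. The function name and the component values are hypothetical; in particular, the numbers are not the allocations from Figure 3.

```python
def sif_probability_of_failure(component_pfds):
    """Sum the component values and report each component's share of the total."""
    total = sum(component_pfds.values())
    if total <= 0:
        raise ValueError("the total probability of failure must be positive")
    shares = {name: pfd / total for name, pfd in component_pfds.items()}
    return total, shares


# Hypothetical PFDavg values for the three parts of one SIF.
total, shares = sif_probability_of_failure(
    {"sensor": 3e-4, "logic solver": 1e-4, "final element": 4e-4}
)
print(f"SIF PFDavg = {total:.1e}")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the total")
```

Comparing the shares against a target allocation shows which component dominates the overall result and where a design change would have the most effect.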
Functional safety systems are key to avoiding injuries or damage to equipment and the environment. Taking this precaution can minimize possible financial burdens on equipment providers by making the equipment safer. Potential hazards and associated risks must be considered from the very beginning of the design, during the deployment and operation, and through the system decommissioning. The success of any safety system depends on properly trained and certified designers with the thorough knowledge to implement the appropriate safety standards.