Safety Critical & High Availability Systems Masterclass
Duration: 3 days
Number of participants: recommended optimum 15, maximum 25
The primary goal of this Masterclass is to give the participant the skills necessary to design systems and software for real-time and embedded computers in which faults and failures could pose a danger to human life. As part of this, participants gain skills in designing systems for high availability. This is very practical, results-oriented training that provides knowledge and skills that can be applied immediately.
This Masterclass examines the design of embedded systems and software that are to provide services in applications that could, when they fail, threaten the well-being or safety of people. Many, though not all, of these systems must not be stopped under any circumstances, and thus must be designed for high availability. Practical guidance is offered on how to address these concerns when designing systems in fields such as medical, automotive, avionics, nuclear and chemical process control.
The Masterclass surveys concepts and alternatives for system and software architectures appropriate for safety-critical and high availability systems. Following an examination of hazard and risk analysis techniques, the seminar goes on to list a number of approaches to software safety that span fault avoidance, fault detection, and fault containment tactics including redundancy, recovery, masking and barriers. A variety of candidate architectural design patterns are examined, including dual/triple modular redundancy, shutdown monitors, dissimilar independent designs, backup parallel patterns and active/monitor parallel patterns. Many real-world examples are presented.
Systems which are required to provide high availability must be designed to tolerate faults. Their design is usually based on off-the-shelf hardware and software combined in ways that will achieve “five-nines” (99.999%) or greater availability. Basic hardware N-plexing and voting issues are discussed, followed by an in-depth study of a number of backward error recovery fault tolerance techniques including Checkpoint-Rollback, Process Pairs, and Recovery Blocks. The class continues with several forward error recovery techniques. Software design approaches are discussed for run-time Built-In Self Test (BIST) of processor and peripheral hardware. Technical issues such as failover management, data replication, and software design defects, are addressed in depth.
This Masterclass is far from a general course about system or software design theory, but rather it is tightly focused on the design of embedded systems and software that are required to provide their intended functions without endangering the safety or life of users or their environment, while at the same time maintaining high availability if required.
This Masterclass is intended for practicing real-time and embedded systems engineers, software system architects, project managers and technical consultants who have responsibility for designing, structuring and implementing the hardware and software for real-time and embedded computer systems in applications that could, when they fail, threaten the well-being or life of people. Many of these systems have high availability as an additional design requirement.
Course participants are expected to be familiar with general embedded and real-time software design. This knowledge can be gained by attending a prerequisite embedded software design course such as “Architectural Design of Real-Time Software”.
The course is based on lectures, discussions, design examples, exercises.