The acronym “RASM” encompasses four separate but related characteristics of a data processing, mechanical, or other physical system: reliability, availability, serviceability, and manageability. IBM is commonly noted [1] as one of the first users of the acronym “RAS” (reliability, availability, and serviceability) in the early data processing machinery industry to describe the robustness of its products. The “M” was recently added to RASM to highlight the key role “manageability” plays in supporting system robustness by facilitating many dimensions of reliability, availability, and serviceability. RASM features can contribute significantly to the mission of systems for test, measurement, control, and experimentation as well as their associated business goals.
In general, RASM expresses the robustness of a system in terms of how well it performs its intended function. The RASM characteristics of a system are therefore crucial to the quality of the mission for which the system is deployed, and they strongly affect both technical and business outcomes. For example, RASM functions can help establish when preventive maintenance or replacement should take place. This, in turn, can convert an unplanned outage into a manageable, planned one, which keeps service delivery smoother and improves business continuity.
As the number of systems deployed for a given mission grows, simply knowing what assets exist, where they are, and what condition they are in directly affects the efficiency of a company or an organization. In addition, with many systems, it becomes increasingly difficult to perform updates and maintenance in an orderly and error-free manner. If systems are in remote locations, such as in a tunnel or high up on a structure, the effort and cost to access them can negatively impact business operations. Strong RASM characteristics improve efficiency in these scenarios and hence lower the cost of owning and operating a system.
As illustrated in Figure 1, the four components of RASM are interrelated and often mutually supportive and overlapping.
In the context of instrumentation and computing systems, reliability is the probability that a system will function as expected without failure for a given duration of time in a specified environment. That is, reliability is a function of time, and it expresses the probability at a time in the future (t+1) that a system is still working, given that it was working at an earlier time (t). Less formally, “Reliability is when stuff doesn’t break.”
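The conditional statement above can be written compactly. The notation here is an assumption for illustration and does not appear in the original text: T is the (random) time at which the system fails, R(t) is the reliability function, and Δ is the look-ahead interval (Δ = 1 time unit corresponds to the "t+1" case).

```latex
% Reliability: probability that the system has not failed by time t
R(t) = P(T > t)

% Probability that the system is still working at t + \Delta,
% given that it was working at time t
P(T > t + \Delta \mid T > t) = \frac{R(t + \Delta)}{R(t)}
```

Because R(t) never increases with t, this ratio is at most 1, which matches the intuition that a longer look-ahead can only lower the chance of continued failure-free operation.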
Availability is the probability that a system is able to perform its intended function when called upon, even in the midst of some failures. It can also mean the extent to which the system is simply operating even though some of its functions may not be. Thus a system can be available but not necessarily reliable. If a specific function of a system is failing but is corrected during operation to regain the desired function, then the system is said to be fully available though not fully reliable.
Mean time between failures (MTBF) is a common measurement parameter for managing risk, predicting reliability, predicting availability, and planning for a system’s spare parts.
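As a rough sketch of how MTBF feeds these predictions, the Python below estimates MTBF from field data, predicts reliability over a mission time, and estimates how many failures (and therefore spares) to plan for. It assumes a constant failure rate (the exponential model), which the text does not specify; the function names and the fleet figures are hypothetical.

```python
import math


def estimate_mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Point estimate of MTBF: accumulated operating time per observed failure."""
    if failure_count <= 0:
        raise ValueError("at least one observed failure is required")
    return total_operating_hours / failure_count


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure over mission_hours.

    Assumes a constant failure rate, so R(t) = exp(-t / MTBF).
    """
    if mission_hours < 0 or mtbf_hours <= 0:
        raise ValueError("mission_hours must be >= 0 and mtbf_hours > 0")
    return math.exp(-mission_hours / mtbf_hours)


def expected_failures(unit_count: int, hours_per_unit: float, mtbf_hours: float) -> float:
    """Expected number of failures across a fleet, a starting point for spares planning."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be > 0")
    return unit_count * hours_per_unit / mtbf_hours


# Hypothetical fleet: 10 units running all year (8760 h each), 4 failures observed.
mtbf = estimate_mtbf(total_operating_hours=10 * 8760, failure_count=4)
print(f"Estimated MTBF: {mtbf:.0f} h")
print(f"Reliability over 1000 h: {reliability(1000, mtbf):.3f}")
print(f"Expected failures next year: {expected_failures(10, 8760, mtbf):.1f}")
```

One consequence of the exponential assumption is worth noting: a system operated for a period equal to its MTBF has only about a 37 percent chance of running that long without failure, so MTBF should not be read as a guaranteed lifetime.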
Figure 1. The Interrelated Components of RASM
Serviceability is the measure of, and the set of features that support, the ease and speed with which a failed system can be diagnosed and repaired. A key parameter associated with measuring serviceability is the mean time to repair (MTTR). MTTR also directly affects availability: a quick repair (low MTTR) returns the system to operation sooner, so the system is more available.
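The link between MTTR and availability can be illustrated with the widely used steady-state formula, availability = MTBF / (MTBF + MTTR). This formula is not stated in the text and is offered here as a common approximation; the function names and the numbers are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(availability: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year implied by a given availability."""
    return (1.0 - availability) * hours_per_year


# Same MTBF, two different repair times (illustrative values only).
for mttr in (24.0, 2.0):
    a = steady_state_availability(mtbf_hours=5000.0, mttr_hours=mttr)
    print(f"MTTR {mttr:>4.0f} h -> availability {a:.5f}, "
          f"downtime {annual_downtime_hours(a):.1f} h/year")
```

With MTBF held fixed, lowering MTTR is the only lever in this formula, which is why serviceability features such as fast diagnosis translate directly into higher availability.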
Manageability is the measure of, and the set of features that support, the ease and competence with which a system can be configured, controlled, and supervised. “Systems management,” as it is commonly called in the information technology (IT) industry, encompasses several tasks as depicted in Figure 2. Systems management has two fundamentally different modes: in-band and out-of-band.
In-band management occurs in the system’s main OS. It uses the system’s main production processor to implement management tasks as well as its intended application. It typically exposes a rich set of management capabilities while operating exclusively in the “fully on” system state.
Out-of-band management occurs in a separate, dedicated “management processor” that is independent of the system’s main processor and OS. It typically exposes a subset of in-band management capabilities but frees the main processor to focus fully on its application. Out-of-band management can operate in a variety of system states, including low-power and even failure states.
Figure 2. Tasks Associated With Manageability
To illustrate RASM in the context of both IT and telecommunications systems as well as test, measurement, and control systems, consider the following scenarios summarized in Table 1.
Table 1. Example Scenarios for RASM
[1] Daniel P. Siewiorek and Robert S. Swarz, Reliable Computer Systems: Design and Evaluation, 3rd ed. (A K Peters/CRC Press, 1998), 508.