Reliability, Availability, Serviceability (RAS) is a hardware engineering term coined by IBM to promote the extreme robustness of their mainframe computers. A computer built with high levels of RAS is more fault-tolerant, self-correcting when it discovers corrupted data, and quick and easy to repair without disrupting operations. The concept of RAS is no longer confined to computer hardware, but now applies to many kinds of systems, networks, and software.
Reliability indicates the probability that the software or system will produce accurate/correct results consistently, according to its specifications. Vendors can enhance reliability by adding features that help detect and repair problems in their products and adding failover mechanisms. Availability indicates the probability that a system or software will be operational at any given time. Features that allow a system to function even when problems occur, instead of crashing, will enhance the product’s availability. Availability is usually expressed as a percentage of the time you can expect the system to be operational, such as 99.999% (commonly called five nines). Serviceability refers to the ease and speed with which a system can be fixed or maintained without disrupting operations.
IBM Z Systems, POWER systems, Intel Xeon Processors and Oracle SPARC servers are examples of current hardware designed to maximize RAS. Software such as SUSE Linux Enterprise High Availability Extension is designed to deliver high levels of RAS by ensuring service availability through clustering and replication, reducing recovery time, and maintaining data integrity.