Fault tolerance – Knowledge and References

Fog Computing: Present and Future

Published in Ravi Tomar, Avita Katal, Susheela Dahiya, Niharika Singh, Tanupriya Choudhury, Fog Computing, 2023

The service requirements for fog computing call for users to be kept continuously updated on the results of their service requests. To keep service requests running seamlessly, we must consider fault tolerance and quality of service (QoS). Fault tolerance is an essential part of any real-time application, as there is always a chance that some part of the system will stop functioning and bring down the system as a whole. Fault tolerance mitigates this risk and provides solutions for cases where resources become scarce but the service must still be maintained. Techniques such as checkpoint-based restart, job migration, and task rescheduling are used in cloud architecture to bridge the gap caused by faults in the system.
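As a minimal sketch of restarting from a saved point rather than from scratch, the Python example below sums a list of numbers and records its progress in a JSON file after each item. Everything in it is an assumption made for illustration and not taken from the chapter: the function names, the `fail_at` parameter that simulates a fault, and the file-based checkpoint format.

```python
import json
import os
import tempfile


def run_with_checkpoints(items, checkpoint_path, fail_at=None):
    """Sum `items`, saving progress after each one so a restart can resume.

    `fail_at` is a hypothetical knob that simulates a fault at a given index.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last saved point

    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated fault at item {i}")
        state["total"] += items[i]
        state["next_index"] = i + 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)  # persist progress before moving on
    return state["total"]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    data = [1, 2, 3, 4, 5]
    try:
        run_with_checkpoints(data, path, fail_at=3)
    except RuntimeError:
        pass  # first attempt stops after items 0..2 have been processed
    return run_with_checkpoints(data, path)  # second attempt resumes at item 3
```

In `demo()`, the second call does not redo the first three items; it picks up from the stored index. A production system would also need to write the checkpoint atomically, which this sketch does not attempt.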

VLSI Architectures for Supercomputing

Published in Hojjat Adeli, Supercomputing in Engineering Analysis, 2020

Tse-yun Feng, Chuan-lin Wu

It is apparent that a chip of such complexity will suffer from manufacturing defects by the end of production and from reliability problems at run time. To circumvent these yield and reliability problems, fault tolerance is needed. The extreme regularity of the architecture allows fault tolerance to be achieved through reconfiguration in a particularly cost-effective way. A limited number of spare cells and reconfiguration interconnection networks are employed to achieve fault tolerance.
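The idea of reconfiguring around faulty cells with a few spares can be illustrated with a simple sketch. It assumes a one-dimensional array of identical cells in which each logical cell is mapped to the next fault-free physical cell. This shifting scheme and the function name `reconfigure` are illustrative assumptions; the chapter does not specify the reconfiguration network at this level of detail.

```python
def reconfigure(num_logical, num_physical, faulty):
    """Map each logical cell to a fault-free physical cell, skipping faulty ones.

    Returns a list where entry i is the physical index serving logical cell i,
    or None if there are not enough healthy cells (spares exhausted).
    """
    healthy = [p for p in range(num_physical) if p not in faulty]
    if len(healthy) < num_logical:
        return None
    return healthy[:num_logical]
```

For example, with four logical cells, six physical cells (two spares), and physical cells 1 and 4 faulty, the mapping skips the two bad cells and uses both spares. A third fault would leave too few healthy cells, and the function would return `None`.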

Dependability

Published in Vivek Kale, Digital Transformation of Enterprise Architecture, 2019

Vivek Kale

Fault tolerance is the ability of a system to continue performing its intended functions in the presence of faults. In a broad sense, fault tolerance is associated with reliability, with successful operation, and with the absence of breakdowns. A fault-tolerant system should be able to handle faults in individual hardware or software components, power failures, or other kinds of unexpected problems and still meet its specification.

Providing a new approach to increase fault tolerance in cloud computing using fuzzy logic

Published in International Journal of Computers and Applications, 2022

Amin Rezaeipanah, Musa Mojarad, Ahad Fakhari

One of the important issues in cloud computing is increasing fault tolerance, given the variety of resources involved, and different techniques have been used to address it [6]. Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components. The basic concepts of fault-tolerant computing are reviewed, focusing on hardware. Failures, faults, and errors in digital systems are examined and measured. The use of computational nodes in cloud computing increases the probability of faults and, since real-time systems are safety-critical, their reliability must be increased [7]. In particular, a fault-tolerant system tolerates a fault when one occurs and continues its operation. It should be noted that the terms fault, error, and failure differ in meaning. A failure is the inability of a system or component to perform its required function according to its specification. A fault is a cause that allows a failure to arise. The underlying cause of the failure is an error in the system [8]. Our goal in this research is to find a strategy for balancing load between VMs in the event of an error.
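To make the load-balancing goal concrete, the sketch below moves the tasks of a failed VM onto the surviving VMs, always choosing the currently least-loaded one. This is a plain greedy heuristic used for illustration only, not the fuzzy-logic approach the article proposes. The function name, the VM names, and the numeric task costs are hypothetical.

```python
import heapq


def rebalance_after_failure(assignments, failed_vm):
    """Move tasks off `failed_vm` onto the surviving VMs, least-loaded first.

    `assignments` maps VM name -> list of task costs (hypothetical units).
    Returns a new mapping; the input is not modified.
    """
    survivors = {vm: list(tasks) for vm, tasks in assignments.items() if vm != failed_vm}
    if not survivors:
        raise ValueError("no surviving VM to take over the tasks")

    # Min-heap of (current load, VM name) so the lightest VM is always on top.
    heap = [(sum(tasks), vm) for vm, tasks in survivors.items()]
    heapq.heapify(heap)

    # Place the largest orphaned tasks first (a common greedy heuristic).
    for cost in sorted(assignments.get(failed_vm, []), reverse=True):
        load, vm = heapq.heappop(heap)
        survivors[vm].append(cost)
        heapq.heappush(heap, (load + cost, vm))
    return survivors
```

Placing the largest tasks first tends to keep the final loads closer together than placing them in arrival order, though it does not guarantee an optimal balance.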

Applying big data and stream processing to the real estate domain

Published in Behaviour & Information Technology, 2019

Herminio García-González, Daniel Fernández-Álvarez, José Emilio Labra-Gayo, Patricia Ordóñez de Pablos

In this section, we describe the quality model of the three architectures under study by comparing their quality attributes. For simplicity, the serving layer has not been taken into account, as the three architectures have this layer in common. The selected quality attributes are:

Recoverability: Ability of a system to recover to a stable point once a failure has occurred.

Fault tolerance: Property by which a system can continue working when a component has failed.

New data gap: Measurement of the waiting time from when new data is received until it is completely available.

Hardware consumption: Amount of hardware resources an architecture needs to operate correctly.

Modifiability: Property of an architecture that allows a component to be modified in the minimum time possible.

Determining the reliability importance of switching elements in the shuffle-exchange networks

Published in International Journal of Parallel, Emergent and Distributed Systems, 2019

Fathollah Bistouni, Mohsen Jahanshahi

In general, there are two main approaches to improving the reliability of a system: fault tolerance and fault avoidance. Fault tolerance can be achieved by creating redundancy in system components so that if a component fails, a spare component can take over. In other words, this approach does not take any action until a fault occurs, and then takes the necessary steps to deal with the failure. Although this approach may prevent the failure of the whole system, it can lead to reduced system performance (e.g. an increase in response time) compared to normal system operation. In addition, the redundancy can lead to an increase in design complexity, cost, weight, space requirements, and so on. In contrast, fault avoidance follows a prevention strategy that relies on high-quality, high-reliability components, and it is usually more affordable than the fault tolerance approach.
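The effect of redundancy on reliability can be quantified with a standard textbook formula, sketched below in Python. It rests on assumptions that the excerpt does not state: every component works with the same probability `r`, components fail independently, and the system works as long as at least one redundant component works. The function name is illustrative.

```python
def parallel_reliability(r, n):
    """Reliability of n redundant components when any one working is enough.

    Assumes each component works with probability r, independently.
    """
    if not 0.0 <= r <= 1.0 or n < 1:
        raise ValueError("need 0 <= r <= 1 and n >= 1")
    # The system fails only if all n components fail: (1 - r) ** n.
    return 1.0 - (1.0 - r) ** n
```

Under these assumptions, two components with `r = 0.9` give a system reliability of about 0.99, and three give about 0.999. Fault avoidance would instead try to raise `r` itself for a single component.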