ISSN: 2634-8853 | Open Access

Journal of Engineering and Applied Sciences Technology

Fault-Tolerant Event-Driven Systems- Techniques and Best Practices

Author(s): Ashwin Chavan

Reliability is a core requirement of modern distributed, event-driven systems: they should stay simple to manage and tolerate faults so that local failures do not escalate into system-wide outages. Event-driven architecture is increasingly used to build microservices, serverless applications, and other cloud-native systems. Because asynchronous architectures are highly interconnected and geographically dispersed, they are susceptible to failures. This paper analyzes essential fault-tolerance approaches and technologies, using real-world examples from Netflix, Uber, and GitHub to highlight the problems and solutions in this area. Techniques such as redundancy and replication, idempotency, circuit breakers, retry mechanisms, and event sourcing are examined for their contribution to system reliability. More sophisticated techniques, such as graceful degradation, self-healing systems, and sharding, are discussed for their ability to improve availability in partial-failure scenarios. Supporting strategies such as monitoring, observability, and fallbacks are also covered. Emerging directions, such as applying artificial intelligence to fault detection and using blockchain, may further improve system reliability. The case studies show how Netflix maintains uninterrupted streaming through regional redundancy and how Uber maintains transactional consistency through event sourcing and sagas. GitHub uses graceful degradation and circuit breakers to keep basic functionality available during outages. These examples illustrate that fault tolerance is essential for competitive advantage, customer trust, and business continuity.
This paper presents detailed information and practical solutions that architects, developers, and business executives need to build sustainable, high-availability, fault-tolerant, event-driven systems for complex computing environments.
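
To make three of the techniques named in the abstract concrete (idempotency, retries, and circuit breakers), the following is a minimal Python sketch. It is not taken from the paper: the class and function names, the `event_id` and `amount` fields, and the thresholds are all hypothetical, and a production system would typically rely on a tested library and a durable store rather than in-memory state.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds before a trial call
        self.clock = clock              # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure will re-open the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func up to `attempts` times with exponential backoff between tries."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # do not hammer a dependency the breaker has isolated
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


class IdempotentHandler:
    """Applies each event at most once, keyed by a hypothetical `event_id`."""

    def __init__(self):
        self.seen = set()   # assumption: in-memory; real systems need durable storage
        self.balance = 0

    def handle(self, event):
        if event["event_id"] in self.seen:
            return False  # duplicate delivery: ignore it
        self.balance += event["amount"]
        self.seen.add(event["event_id"])
        return True
```

The three pieces are meant to compose: retries cause duplicate deliveries, so the handler must be idempotent, and the breaker stops retries from overloading a dependency that is already failing.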