Monday 18 May 2020

What is Chaos Monkey? Chaos engineering explained

Pioneered out of the halls of Netflix during its shift from distributing DVDs to building distributed cloud systems for streaming video, Chaos Monkey introduced an engineering principle that has been embraced by software development organisations of all shapes and sizes: namely, that by intentionally breaking systems you can learn to make them more resilient.

According to the original Netflix blog post on the topic, published in July 2011 by Yury Izrailevsky, then director of cloud and systems infrastructure, and Ariel Tseitlin, director of cloud solutions at the streaming company, Chaos Monkey was designed to randomly disable production instances on Netflix's Amazon Web Services (AWS) infrastructure. Doing so exposed weaknesses that Netflix engineers could then eliminate by building better automatic recovery mechanisms.

The catchy name came from "the idea of unleashing a wild monkey with a weapon in your data center to randomly shoot down instances and chew through cables," the blog post states, "all the while we continue serving our customers without interruption."
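The mechanism the blog post describes (randomly disabling production instances and relying on automatic recovery to absorb the loss) can be illustrated with a toy simulation. This is a minimal sketch, not Netflix's implementation: the `Fleet` class and the `chaos_monkey` and `run_experiment` functions are hypothetical, and an in-memory set of instance IDs stands in for real AWS instances, so no cloud API is called.

```python
import random
from typing import List, Set


class Fleet:
    """Hypothetical in-memory stand-in for a group of production instances."""

    def __init__(self, desired: int) -> None:
        self.desired = desired          # capacity the service needs
        self.healthy: Set[str] = set()  # IDs of currently running instances
        self._next_id = 0

    def launch(self) -> str:
        """Start a new (simulated) instance and return its ID."""
        instance_id = f"i-{self._next_id:04d}"
        self._next_id += 1
        self.healthy.add(instance_id)
        return instance_id

    def reconcile(self) -> None:
        """Automatic recovery: launch replacements until the desired count is met."""
        while len(self.healthy) < self.desired:
            self.launch()


def chaos_monkey(fleet: Fleet, rng: random.Random) -> str:
    """Terminate one randomly chosen healthy instance and return its ID."""
    victim = rng.choice(sorted(fleet.healthy))  # sorted() keeps seeded runs reproducible
    fleet.healthy.discard(victim)
    return victim


def run_experiment(rounds: int, desired: int = 5, auto_recover: bool = True,
                   seed: int = 0) -> List[int]:
    """Return the healthy-instance count observed after each chaos round."""
    rng = random.Random(seed)
    fleet = Fleet(desired)
    fleet.reconcile()  # bring the fleet up to its desired size first
    history = []
    for _ in range(rounds):
        if fleet.healthy:
            chaos_monkey(fleet, rng)
        if auto_recover:
            fleet.reconcile()
        history.append(len(fleet.healthy))
    return history


if __name__ == "__main__":
    print(run_experiment(3))                      # [5, 5, 5]
    print(run_experiment(3, auto_recover=False))  # [4, 3, 2]
```

Comparing the two runs shows the point of the exercise: with the hypothetical `reconcile` recovery step, capacity returns to the desired level after every random termination; without it, each termination permanently removes capacity, which is the kind of weakness Chaos Monkey is meant to surface before a real outage does.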
