DR. JEFF DANIELS
  • Home
  • About
  • Publications and Speaking
  • Contact
Digital Transformation | Leader | Professor

Netflix Approach to the Cloud: Simian Army

10/20/2011

Ariel Tseitlin and Yury Izrailevsky from Netflix share their approach to cloud adoption using the "Simian Army" suite of tools.
Below are the definitions of the tools Netflix engineers created:

Chaos Monkey is a tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact.
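To make the idea concrete, here is a minimal sketch of random instance disabling against an in-memory fleet. It is not Netflix's implementation: the `Instance` class, the `unleash` function, and the instance ids are all hypothetical, and flipping a flag stands in for a real cloud terminate call.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical stand-in for a production instance.
    instance_id: str
    group: str
    running: bool = True


def pick_victim(instances, rng):
    """Return one running instance chosen uniformly at random, or None."""
    candidates = [i for i in instances if i.running]
    return rng.choice(candidates) if candidates else None


def unleash(instances, rng):
    """Disable one random running instance and return it."""
    victim = pick_victim(instances, rng)
    if victim is not None:
        victim.running = False  # assumption: stands in for a terminate call
    return victim


fleet = [Instance(f"i-{n:04d}", "api") for n in range(5)]
victim = unleash(fleet, random.Random(42))
survivors = [i for i in fleet if i.running]
print(f"disabled {victim.instance_id}; {len(survivors)} instances still running")
```

The interesting part is not the random choice but what happens next: the service should keep serving traffic from the survivors.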

Latency Monkey induces artificial delays in our RESTful client-server communication layer to simulate service degradation and measures if upstream services respond appropriately. In addition, by making very large delays, we can simulate a node or even an entire service downtime (and test our ability to survive it) without physically bringing these instances down. This can be particularly useful when testing the fault-tolerance of a new service by simulating the failure of its dependencies, without making these dependencies unavailable to the rest of the system.
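A rough sketch of the same idea in miniature: wrap a service call so it sleeps before answering, then check that the caller falls back instead of hanging. The function names, the 0.5 second delay, and the 0.2 second timeout are assumptions for illustration, not anything taken from the Netflix post.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def with_injected_delay(call, delay_seconds):
    """Wrap a callable so every invocation is delayed artificially."""
    def delayed(*args, **kwargs):
        time.sleep(delay_seconds)
        return call(*args, **kwargs)
    return delayed


def fetch_recommendations(user_id):
    # Hypothetical dependency that normally answers immediately.
    return ["title-a", "title-b"]


def call_with_fallback(call, timeout_seconds, fallback, *args):
    """Run the call in a worker thread; return fallback if it is too slow."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return fallback
    finally:
        pool.shutdown(wait=False)  # do not block on the slow worker


slow = with_injected_delay(fetch_recommendations, delay_seconds=0.5)
fast_result = call_with_fallback(fetch_recommendations, 0.2, [], "user-1")
degraded_result = call_with_fallback(slow, 0.2, [], "user-1")
print(fast_result, degraded_result)
```

Making the delay much larger than the timeout simulates the dependency being down entirely, without actually taking it away from other callers.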

Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down. For example, we know that if we find instances that don’t belong to an auto-scaling group, that’s trouble waiting to happen. We shut them down to give the service owner the opportunity to re-launch them properly.
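The auto-scaling rule mentioned above could be expressed as a simple filter. This is only a sketch over in-memory records; the `Instance` fields and the group name are hypothetical, and a real check would read this data from the cloud provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    # Hypothetical record; auto_scaling_group is None for stray instances.
    instance_id: str
    auto_scaling_group: Optional[str]


def find_nonconforming(instances):
    """Return instances that do not belong to any auto-scaling group."""
    return [i for i in instances if not i.auto_scaling_group]


fleet = [
    Instance("i-0001", "api-asg"),
    Instance("i-0002", None),
    Instance("i-0003", "api-asg"),
]
offenders = find_nonconforming(fleet)
print([i.instance_id for i in offenders])
```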

Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health (e.g. CPU load) to detect unhealthy instances. Once unhealthy instances are detected, they are removed from service and after giving the service owners time to root-cause the problem, are eventually terminated.
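One way to picture the two-stage behavior (remove from service first, terminate later) is as a small state machine. The sketch below is an assumption-laden illustration: the 90% CPU limit, the one-hour grace period, and all names are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

CPU_LIMIT = 0.90      # assumed threshold for "unhealthy" CPU load
GRACE_SECONDS = 3600  # assumed time owners get to root-cause the problem


@dataclass
class Instance:
    instance_id: str
    health_check_ok: bool
    cpu_load: float
    state: str = "in_service"
    quarantined_at: Optional[float] = None


def is_unhealthy(inst):
    """Combine the instance health check with an external signal."""
    return (not inst.health_check_ok) or inst.cpu_load > CPU_LIMIT


def examine(inst, now):
    """Advance the instance through in_service, quarantined, terminated."""
    if inst.state == "in_service" and is_unhealthy(inst):
        inst.state = "quarantined"
        inst.quarantined_at = now
    elif inst.state == "quarantined" and now - inst.quarantined_at >= GRACE_SECONDS:
        inst.state = "terminated"
    return inst.state


sick = Instance("i-0001", health_check_ok=False, cpu_load=0.20)
healthy = Instance("i-0002", health_check_ok=True, cpu_load=0.35)
states = [examine(sick, 0), examine(sick, 1800), examine(sick, 3600)]
print(states, examine(healthy, 0))
```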

Janitor Monkey ensures that our cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.
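The post does not say how "unused" is decided, so the following is purely a guess at one possible rule: a storage volume that is attached to nothing and has been idle for more than 30 days. The `Volume` class, the 30-day cutoff, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Volume:
    # Hypothetical resource record.
    volume_id: str
    attached_to: Optional[str]
    last_used: datetime


def find_unused(volumes, now, max_idle=timedelta(days=30)):
    """Return detached volumes that have been idle longer than max_idle."""
    return [
        v for v in volumes
        if v.attached_to is None and now - v.last_used > max_idle
    ]


now = datetime(2011, 10, 20)
volumes = [
    Volume("vol-1", "i-0001", datetime(2011, 1, 1)),  # old but attached
    Volume("vol-2", None, datetime(2011, 10, 10)),    # detached, recent
    Volume("vol-3", None, datetime(2011, 6, 1)),      # detached and stale
]
stale = find_unused(volumes, now)
print([v.volume_id for v in stale])
```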

Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal.
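The certificate check in particular lends itself to a short sketch. Assuming each certificate is known by its expiry date, it can be sorted into expired, due for renewal, or fine. The 30-day renewal window and the certificate names are assumptions, not details from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Certificate:
    # Hypothetical record holding only what the check needs.
    name: str
    not_after: datetime


def classify(cert, now, renewal_window=timedelta(days=30)):
    """Return 'expired', 'renew_soon', or 'ok' for a certificate."""
    if cert.not_after <= now:
        return "expired"
    if cert.not_after - now <= renewal_window:
        return "renew_soon"
    return "ok"


now = datetime(2011, 10, 20)
certs = [
    Certificate("ssl-www", datetime(2012, 6, 1)),
    Certificate("ssl-api", datetime(2011, 11, 1)),
    Certificate("drm-player", datetime(2011, 10, 1)),
]
report = {c.name: classify(c, now) for c in certs}
print(report)
```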

10-18 Monkey (short for Localization-Internationalization, or l10n-i18n) detects configuration and run time problems in instances serving customers in multiple geographic regions, using different languages and character sets.

Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
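As a toy model of the re-balancing being verified here, the sketch below spreads instances round-robin across zones, removes one zone, and spreads the same number of instances across what is left. The zone names and the round-robin placement policy are assumptions for illustration only.

```python
from collections import Counter


def place(count, zones):
    """Spread count instances across the given zones round-robin."""
    return [zones[n % len(zones)] for n in range(count)]


def fail_zone_and_rebalance(placement, failed_zone, zones):
    """Simulate losing a zone and re-place all capacity on healthy zones."""
    healthy = [z for z in zones if z != failed_zone]
    if not healthy:
        raise RuntimeError("no healthy zones left")
    return place(len(placement), healthy)


zones = ["zone-a", "zone-b", "zone-c"]
before = place(6, zones)
after = fail_zone_and_rebalance(before, "zone-b", zones)
print(Counter(before))
print(Counter(after))
```

Total capacity stays the same; the test that matters in production is whether this happens automatically and without users noticing.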

I like the Simian Army's approach of simulating failures to keep systems healthy, responsive, and available. Two follow-on thoughts:
  1. Is the Simian Army a suite of COTS tools, homegrown scripts, or customized COTS tools?
  2. What are the results of testing and simulation using these tools?
It would be great to see this in a case study or a detailed journal paper.
Entire post (Netflix) - http://techblog.netflix.com/2011/07/netflix-simian-army.html?m=1
    Author

    Director
    @lockheedmartin
    | Professor
    @UMDGlobalCampus
    | 1st Cloud Dissertation | Top 5 #Thinkers360 #blockchain #cloud #iot #AI #AIEthics #digital #cyber #5g



