Food Fight

The Podcast where DevOps chefs do battle

Netflix OSS

Show Date: Thursday, May 9, 2013

Join us as we discuss the Netflix OSS tools.


In the News


  • Introductions
  • Chef News
  • Business Week article
  • Why is Netflix doing OSS
  • Cloud-native
  • No traditional HA tools from Linux. Why not?
  • S3 is shared filesystem for everything
  • Have abstracted one layer above instances
  • Oracle to SimpleDB transition
  • Switching between NoSQL systems
  • Configuration Management pushed up into the application itself
    • Archaius - Archaius includes a set of configuration management APIs used by Netflix.
    • Eureka - Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers.
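The service-location idea behind Eureka can be sketched in a few lines. This is a toy in-memory registry, not the Eureka API: the class name `ServiceRegistry`, its methods, and the 30-second lease are all assumptions made for illustration. Instances renew a lease with heartbeats, and callers round-robin over whichever instances are still live, which gives basic load balancing and failover.

```python
import itertools
import time


class ServiceRegistry:
    """Toy in-memory registry; NOT the Eureka API, only the idea behind it."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds  # assumed lease length
        self.clock = clock                  # injectable so tests can fake time
        self._leases = {}    # (service, address) -> time of last heartbeat
        self._counters = {}  # service -> round-robin counter

    def heartbeat(self, service, address):
        """Register an instance, or renew its lease if already registered."""
        self._leases[(service, address)] = self.clock()

    def instances(self, service):
        """Return addresses whose lease has not expired, in sorted order."""
        now = self.clock()
        return sorted(
            addr
            for (name, addr), seen in self._leases.items()
            if name == service and now - seen <= self.lease_seconds
        )

    def pick(self, service):
        """Round-robin over live instances; raise LookupError if none."""
        live = self.instances(service)
        if not live:
            raise LookupError(f"no live instances of {service!r}")
        counter = self._counters.setdefault(service, itertools.count())
        return live[next(counter) % len(live)]
```

An instance that stops sending heartbeats simply ages out of `instances()`, so callers fail over to the remaining ones without any explicit deregistration step.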
  • Java as the language of choice
    • There is a Python interface for some of the tools
    • Clojure, Groovy, etc. are also being explored
  • AMI generation
    • Aminator - Easily turn an app into an AMI
      • Take base image, add some packages, run some chef recipes
      • Looking at including chef in the base image for use during build time
    • Code changes are always deployed as new AMIs
    • knife ec2 server create --bake -
    • Average lifetime of an instance is ~35 hours
  • Monitoring
    • AppDynamics - Out-of-band monitoring
    • Atlas
    • Double exponential smoothing
    • FFT - Fast Fourier transform - Look at traffic to be sure it’s going in the expected direction.
    • Real-time FFT written in R is used for alerting. Other availability is determined after-the-fact.
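For reference, double exponential smoothing in its textbook form (Holt's linear method) tracks both a level and a trend, so it can forecast a metric that is steadily rising or falling. The notes do not say how Netflix parameterizes or applies it; the function name and the `alpha` and `beta` arguments below are assumptions for this sketch.

```python
def double_exponential_smoothing(series, alpha, beta):
    """Holt's linear method: one-step-ahead forecasts for a series.

    forecasts[i] is the prediction for series[i] made before seeing it
    (forecasts[0] is just series[0]). The final element predicts the next,
    unseen value, so the result has len(series) + 1 entries.
    """
    if len(series) < 2:
        raise ValueError("need at least two points to estimate a trend")
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    forecasts = [series[0]]
    for value in series[1:]:
        forecasts.append(level + trend)  # forecast before observing value
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    forecasts.append(level + trend)  # forecast for the next unseen point
    return forecasts
```

Comparing each observed value with its forecast gives a residual; an alerting rule could flag points whose residual exceeds some tolerance, though the choice of tolerance is outside what was discussed.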
  • Circuit breakers
    • Hystrix - Latency and Fault Tolerance for Distributed Systems - stops calling a backend that seems to be down or slow, then tests it to see when calls should be re-enabled.
    • Turbine - Dashboard that shows the status of the circuit breakers
    • These help with graceful degradation of features on Netflix
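The circuit-breaker pattern described above can be sketched as follows. This is not Hystrix's API (Hystrix is a Java library); the class `CircuitBreaker`, its thresholds, and the `fallback` argument are hypothetical names chosen for illustration. After enough consecutive failures the breaker stops calling the backend and serves a fallback, then lets a single trial call through once a reset window has passed.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker sketch; thresholds are illustrative only."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the backend entirely
            # Reset window elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Returning a fallback instead of raising is what allows a feature to degrade gracefully: the caller still gets a usable, if less rich, response while the backend recovers.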
  • Application Stack:
    • Tomcat
    • Cassandra
  • Simian Army
    • Chaos Monkey
    • Chaos Gorilla - Will destroy an entire zone
    • Latency Monkey - Reaches into Karyon and injects latency
      • Is much better at finding issues / bugs than Chaos Monkey is
      • Latency Monkey introduces latency, Hystrix should trip circuits
    • Howler Monkey - Looks for overused resources and other auditing
    • Security Monkey - Ensures certs are not expiring soon, etc.
    • Janitor Monkey - Cleans up unused resources
    • Conformity Monkey
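The latency-injection idea can be illustrated with a small simulation. Nothing here reflects how Latency Monkey or Karyon are implemented; `inject_latency`, `count_timeouts`, and the millisecond figures are hypothetical. Latency is modeled as a number rather than a real sleep, so the sketch runs instantly.

```python
import random


def inject_latency(handler, extra_ms, probability, rng=random):
    """Wrap handler so a fraction of calls report extra simulated latency.

    handler() must return (result, latency_ms). No real sleeping happens;
    latency is only a number, which keeps the sketch fast.
    """
    def wrapped():
        result, latency_ms = handler()
        if rng.random() < probability:
            latency_ms += extra_ms
        return result, latency_ms
    return wrapped


def count_timeouts(handler, calls, timeout_ms):
    """Call handler `calls` times and count responses slower than the budget."""
    return sum(1 for _ in range(calls) if handler()[1] > timeout_ms)
```

In a real system the callers would enforce the timeout themselves, and a run of timeouts like the ones counted here is the kind of signal that should trip a circuit breaker.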
  • How Trotter is using the Netflix stack
    • Archaius is better than plain old properties files
    • Eureka, Karyon, Asgard
  • Asgard - AWS console “on crack”. Built on Groovy.
    • Necessary when you start deploying auto-scaling groups instead of auto-scaling images
    • When would you not use auto-scaling groups?
      • “Fork lift” operations - moving one app “to the cloud”
      • Trotter recommends auto-scaling group even if the group size is one
  • Time from deploy-ami to instance - about 3 minutes to start a fairly large instance (start 500 in about 8 minutes)
  • How do I get started with the Netflix platform?
    • Flux Capacitor - Flux Capacitor is a Java-based distributed application demonstrating a number of Netflix Open Source components.
    • Netflix Recipes RSS - RSS is a Netflix Recipes application demonstrating how a number of Netflix Open Source components can be tied together.
  • Netflix OSS Prize - A contest for Software Developers
  • Visiting Netflix offices