As we research and dig deeper into scaling, we keep running into Netflix. They are very public with their stories. This post is a roundup that we put together with Bryan’s help, collecting info from all over the internet. If you’d like to reach out with more info, we’ll add it to this post. Otherwise, please enjoy!
–Chris / ScaleScale / MaxCDN
|Application & Data|
|Database||MySQL, Cassandra, Oracle|
|Frameworks||Node.js|
|Cloud Hosting||Amazon EC2|
|SQL Database-as-a-Service||Amazon RDS|
|NoSQL Database-as-a-Service||Amazon DynamoDB|
|Database Cluster Management||Dynomite|
|Productivity Suite||Google Apps|
|Transactional Email||Amazon SES|
|Mobile Push Messaging||Urban Airship|
|Code Collaboration & Version Control||GitHub|
|Server Management||Apache Mesos|
|Log Management||Sumo Logic|
|Mobile Error Monitoring||Crittercism|
|Performance Monitoring||Boundary, LogicMonitor|
|Open Connect CDN|
A look at what we think is interesting about how Netflix scales
Netflix was founded in 1997 by Marc Randolph and Reed Hastings in Scotts Valley, California, and launched with 30 employees and 925 titles available on a pay-per-rent basis. Netflix, now the world’s leading Internet television network, has more than 69 million subscribers in 50 countries enjoying more than ten billion hours of TV shows and movies per month. They are very transparent and publish a lot of information online. We’ve collected it and are sharing the things we think are most interesting:
- Supporting Many Titles with Amazon
- Supporting Many Devices
- Netflix Open Connect CDN
- Scaling Open Source Projects
Netflix has a famous presentation about its culture. Its concepts are about rethinking HR, and much of how Netflix scales its people rests on the principles from this presentation. Here are some sample slides and the presentation. They give important context for understanding how Netflix scales its software stack and why it works.
The full presentation is here
Supporting Many Titles with Amazon
Netflix’s infrastructure runs on Amazon EC2, with master copies of digital films from movie studios stored on Amazon S3. Each film is encoded into over 50 different versions, based on video resolution and audio quality, using machines in the cloud. Over 1 petabyte of data is stored on Amazon. This data is sent to content delivery networks, which feed the content to local ISPs.
Netflix uses a number of open-source technologies on the back end, including Java, MySQL, Gluster, Apache Tomcat, Hive, Chukwa, Cassandra and Hadoop.
Diagram showing Netflix viewable content, tech stack, distribution method and playback devices
Supporting Many Devices
The huge number of codec and bitrate combinations on Netflix means “having to encode the same title 120 different times before it can be delivered to all streaming platforms”.
Although Netflix uses adaptive bitrate streaming to adjust the video and audio quality to match the customer’s download speed, it also lets users choose the video quality themselves on its website.
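To make the adaptive bitrate idea concrete, here is a minimal sketch of how a player might pick a stream. This is not Netflix’s actual client logic: the function name `pick_rendition`, the bitrate ladder and the 0.8 safety factor are all our own assumptions for illustration.

```python
from typing import Sequence

# Hypothetical bitrate ladder in kbps; the values are illustrative only.
LADDER_KBPS = [235, 560, 1050, 1750, 3000, 5800]


def pick_rendition(ladder_kbps: Sequence[int],
                   throughput_kbps: float,
                   safety: float = 0.8) -> int:
    """Return the highest bitrate that fits within a fraction of the
    measured throughput, falling back to the lowest rung if none fits."""
    budget = throughput_kbps * safety  # leave headroom for throughput dips
    fitting = [b for b in sorted(ladder_kbps) if b <= budget]
    return fitting[-1] if fitting else min(ladder_kbps)


if __name__ == "__main__":
    for measured in (100, 4000, 10000):
        print(measured, "kbps measured ->",
              pick_rendition(LADDER_KBPS, measured), "kbps stream")
```

A real player would also look at buffer level and smooth its throughput estimate over time, but the core decision is the same: re-run the selection as conditions change and switch streams between segments.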
You can watch instantly from any Internet-connected device that offers a Netflix app, such as a computer, gaming console, DVD or Blu-ray player, HDTV, set-top box, home theater system, phone or tablet.
They support every title in the following codecs, at different bitrates, so that it works on each device and connection.
- Video – VC-1, H.264 (AVC), H.263, H.265 (HEVC)
- Audio – WMA, Dolby Digital, Dolby Digital Plus, AAC and Ogg Vorbis
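The reason the encode count grows so quickly is that every codec has to be crossed with every bitrate. The sketch below shows that multiplication using the video codecs listed above. The bitrate values and the `encode_jobs` helper are hypothetical, and we are not trying to reproduce Netflix’s real encoding matrix or the figure of 120.

```python
from itertools import product

# Video codecs taken from the list above.
VIDEO_CODECS = ["VC-1", "H.263", "H.264 (AVC)", "H.265 (HEVC)"]

# Hypothetical target bitrates in kbps, for illustration only.
BITRATES_KBPS = [235, 560, 1050, 1750, 3000]


def encode_jobs(video_codecs, bitrates_kbps):
    """Build one encode job per (codec, bitrate) combination."""
    return [
        {"codec": codec, "bitrate_kbps": bitrate}
        for codec, bitrate in product(video_codecs, bitrates_kbps)
    ]


if __name__ == "__main__":
    jobs = encode_jobs(VIDEO_CODECS, BITRATES_KBPS)
    # 4 codecs x 5 bitrates = 20 encodes of the same title
    print(len(jobs), "encode jobs for one title")
```

Add audio codecs, resolutions or device profiles as further dimensions and the total multiplies again, which is how a single title ends up needing so many encodes.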
Netflix Open Connect CDN
The Netflix Open Connect CDN is provided for larger ISPs that have over 100,000 subscribers. A purpose-built, low-power, high-storage-density appliance caches Netflix content within the ISPs’ data centers to reduce internet transit costs. This appliance runs the FreeBSD operating system, nginx and the BIRD Internet routing daemon.
Netflix Paris Open Connect – Photo Credit: @dtemkin twitter
Watch the Open Connect video here
Netflix ran a contest called the Netflix Prize, which launched in 2006 and concluded in 2009. They released a large set of anonymized data and invited teams to derive better algorithms. The winning team improved on Netflix’s existing algorithm by 10.06%. Netflix was going to run another Netflix Prize but ultimately didn’t because of privacy concerns raised by the FTC.
The Netflix recommendation system consists of many algorithms. The two core algorithms used in their production system are Restricted Boltzmann Machines (RBM) and a form of Matrix Factorization called SVD++. These two algorithms are combined using a linear blend to produce a single higher accuracy estimate.
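As a rough illustration of what a linear blend does, here is a small sketch that combines two models’ rating predictions with a single weight. The ratings and predictions are made-up toy numbers, the function names are ours, and we assume the two weights sum to 1. Netflix’s production blend may be set up differently.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists."""
    return sqrt(sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(actual))


def blend(pred_a, pred_b, weight_a):
    """Linear blend: weight_a * A + (1 - weight_a) * B, element by element."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


def best_weight(pred_a, pred_b, actual):
    """Least-squares weight for model A when the two weights sum to 1."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5


# Toy data (hypothetical): true ratings and two models' predictions.
ACTUAL = [4.0, 3.0, 5.0, 2.0]
RBM_PRED = [4.4, 3.4, 4.6, 2.4]
SVDPP_PRED = [3.6, 2.8, 5.2, 1.8]

if __name__ == "__main__":
    w = best_weight(RBM_PRED, SVDPP_PRED, ACTUAL)
    blended = blend(RBM_PRED, SVDPP_PRED, w)
    print("RMSE RBM:    ", round(rmse(RBM_PRED, ACTUAL), 4))
    print("RMSE SVD++:  ", round(rmse(SVDPP_PRED, ACTUAL), 4))
    print("RMSE blended:", round(rmse(blended, ACTUAL), 4))
```

The blend helps here because the two models’ errors partly point in opposite directions and cancel out. In practice the weight would be fit on held-out ratings rather than the same data it is scored on.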
Restricted Boltzmann Machines are neural networks that have been adapted for collaborative filtering. Each user has one RBM, with an input node for each movie the user has rated.
SVD++ is an asymmetric form of SVD (Singular Value Decomposition) that makes use of implicit information like RBMs. It was developed by the winning team in the Netflix Prize contest.
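For the curious, the SVD++ prediction rule as published in the research literature is: global mean, plus user and item biases, plus the item factors dotted with the user factors after the user factors are adjusted by a normalized sum of implicit-feedback factors. The sketch below implements only that prediction step with toy numbers. All parameter values are hypothetical, training is omitted, and the source does not say that Netflix’s production code takes exactly this form.

```python
from math import sqrt


def svdpp_predict(mu, b_u, b_i, q_i, p_u, y_implicit):
    """Predict a rating with the SVD++ rule:

        r_hat = mu + b_u + b_i + q_i . (p_u + |N(u)|^(-1/2) * sum_j y_j)

    mu          global mean rating
    b_u, b_i    user and item biases
    q_i, p_u    item and user latent factor vectors (same length)
    y_implicit  one factor vector per item the user interacted with, N(u)
    """
    k = len(q_i)
    n = len(y_implicit)
    norm = 1.0 / sqrt(n) if n else 0.0
    # Implicit-feedback adjustment to the user's factor vector.
    implicit = [norm * sum(y[f] for y in y_implicit) for f in range(k)]
    return mu + b_u + b_i + sum(q_i[f] * (p_u[f] + implicit[f]) for f in range(k))


if __name__ == "__main__":
    # Toy parameters, purely illustrative.
    rating = svdpp_predict(
        mu=3.6, b_u=0.2, b_i=-0.1,
        q_i=[1.0, 0.5], p_u=[0.2, -0.4],
        y_implicit=[[0.1, 0.2], [0.3, 0.0], [0.2, 0.4], [0.2, 0.2]],
    )
    print(round(rating, 2))
```

With an empty `y_implicit` list the function reduces to a plain biased matrix factorization prediction, which shows what the implicit term adds on top.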
On their Engineering blog, the Netflix team covers Learning a Personalized Homepage
Open Source Projects