ScaleScale

Great Architectures, Stacks & DevOps at Webscale

By Chris Ueland


Scaling CloudFlare’s Massive WAF

Application
HTTP Server: nginx
App Server: OpenResty
JIT Compiler: LuaJIT
Algorithms
String Matching: Aho-Corasick
Rules
Open Rules: OWASP
System Profiling
FlameGraph: SystemTap generated
Real Time Analyzing: Nginx SystemTap Toolkit

I first heard John speak at the Nginx.Conf conference in San Francisco. He did an amazing job explaining the large-scale, high-volume WAF (Web Application Firewall) platform that he and his colleagues have built. In this interview he explains design goals, benchmarking, testing and the rollout of new WAF rules. The story here is really about performance and scale: squeezing out every last drop with Nginx and Lua. Enjoy.

–Chris / ScaleScale / MaxCDN

John Graham-Cumming of CloudFlare

John is an engineer at CloudFlare who designed their WAF.

What is the vision behind the WAF?

CloudFlare wants to provide a WAF to a very large number of customers. Doing so meant being compatible with the existing mod_security WAF, for two reasons: so that we could leverage existing rulesets, and so that people familiar with mod_security (both CloudFlare people and customers) could write new rules.

How CloudFlare WAF Works

CloudFlare’s WAF stops attacks at the network edge, protecting your website from common web threats and specialized attacks before they reach your servers. It covers both desktop and mobile websites as well as applications.

The Web Application Firewall (WAF) works by examining HTTP requests to your website. It looks at both GET and POST requests and applies rules to help filter out illegitimate traffic from legitimate website visitors. You can decide whether to block, challenge or simulate an attack. With blocking and challenging, CloudFlare’s WAF will block any traffic identified as illegitimate before it reaches your origin web server.

How CloudFlare Works
CloudFlare’s Web Application Firewall (WAF) automatically protects your website from these types of attacks:

• SQL injection, comment spam
• Cross-site scripting (XSS)
• Distributed denial of service (DDoS) attacks
• Application-specific attacks (WordPress, CoreCommerce)

Testing CloudFlare’s XSS Protection

Using www.jgc.org, it’s very easy to see the CloudFlare WAF in action. Using a simple GET operation with a dummy variable that contains a basic XSS script will trigger the security feature and show a page saying that you have been blocked.

Request Headers

GET /?user=<script>alert("test")</script> HTTP/1.1
Host: jgc.org
Connection: keep-alive
...

Response Headers

HTTP/1.1 403 Forbidden
Date: Wed, 10 Dec 2014 06:56:35 GMT
Content-Type: text/html; charset=UTF-8
...

Where did the initial and new rules come from?

We use the open source OWASP ruleset plus our own internal rules, which we developed based on attack traffic against CloudFlare customers. Today the majority of blocked requests are stopped by our custom rules.

We develop rules internally based on attacks or vulnerabilities and then build a test suite (positive and negative tests to ensure that the rules are blocking only what we want). We have a large automatic test suite for the WAF which gets run across the entire rule set to ensure that it’s working correctly.
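
The idea of positive and negative tests can be sketched in a few lines of Lua. This is only an illustration under stated assumptions: the rule `xss_rule`, the test cases and the `run_suite` helper are all hypothetical and are not taken from CloudFlare's test suite.

```lua
-- Hypothetical rule: flags any input containing a <script tag (case-insensitive).
-- This is an illustration only, not one of CloudFlare's rules.
local function xss_rule(input)
  return input:lower():find("<script", 1, true) ~= nil
end

-- Each case records the input and whether the rule is expected to block it.
local cases = {
  { input = '<script>alert("test")</script>', block = true  },  -- positive test
  { input = '<SCRIPT SRC=x></SCRIPT>',        block = true  },  -- positive test
  { input = 'user=alice',                     block = false },  -- negative test
  { input = 'a note about scripts',           block = false },  -- negative test
}

-- Run every case and collect the indexes whose outcome differs from expectation.
local function run_suite(rule, suite)
  local failures = {}
  for i, case in ipairs(suite) do
    if rule(case.input) ~= case.block then
      failures[#failures + 1] = i
    end
  end
  return failures
end

local failures = run_suite(xss_rule, cases)
print(#failures == 0 and "all tests passed" or (#failures .. " test(s) failed"))
```

The negative cases matter as much as the positive ones: a rule that blocks everything would pass every positive test and fail only the negative ones.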

Recently added WAF Rules

Description | Exploit | Blog Post
Drupal 7 SQL injection | SA-CORE-2014-005 | Drupal 7 SA-CORE-2014-005 SQL Injection Protection
Shellshock | Shellshock (software bug) | Inside Shellshock: How hackers are using it to exploit systems; Shellshock protection enabled for all customers
WHMCS Zero Day Vulnerability | WHMCS Security Advisory for 5.x | Patching a WHMCS zero day on day zero; Protect Your Sites With Rapidly Deployed WAF Rules

We process all requests (GETs, POSTs, etc.) and the bodies that go with them. We have a custom routine inside the WAF that looks at POST data (for example) and identifies it both by the MIME type and by sniffing the actual bytes to see what the data is.
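
As a rough sketch of what combining a declared MIME type with byte sniffing could look like, the hypothetical function below returns both views of a body so that a caller can compare them. The function name `sniff_body`, the category names and the heuristics are assumptions made for illustration; the actual routine inside the WAF is not shown in this interview.

```lua
-- Hypothetical classifier: combine the declared MIME type with a look at
-- the first bytes of the body. Type names and heuristics are assumptions.
local function sniff_body(content_type, body)
  -- Declared type: lower-cased, with parameters such as "; charset=..." removed.
  local declared = (content_type or ""):lower():match("^%s*([^;%s]+)") or "unknown"
  -- First non-whitespace byte of the body (empty string for an empty body).
  local first = body:match("^%s*(.)") or ""

  local sniffed
  if first == "{" or first == "[" then
    sniffed = "json"
  elseif first == "<" then
    sniffed = "xml"
  elseif body:find("^[^=&%s]+=[^&]*") then
    sniffed = "form"          -- looks like key=value pairs
  else
    sniffed = "unknown"
  end
  return declared, sniffed
end

-- A body declared as plain text that actually looks like JSON.
local declared, sniffed = sniff_body("text/plain; charset=UTF-8", '{"user":"x"}')
print(declared, sniffed)
```

Returning both values instead of a single verdict leaves the decision about a mismatch to the caller.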

The WAF is not enabled for all customers. Only paying customers receive the WAF.

We work with our customers to define site specific rules for them and regularly put in place WAF rules to block site specific attacks. In future, we plan to roll out a user interface where customers can write and upload their rules for their sites.

Is speed important to you? What is your philosophy?

Yes, speed matters enormously because of the scale of CloudFlare and because part of our service is performance. We have a variety of benchmarking tools but perhaps more important is our metrics system that allows us to examine real-time and historical performance information (including WAF performance).

Our goal is to run in under 1ms on average for each request processed by the WAF. Currently we are in the hundreds of µs (tenths of a millisecond) per request. As an example, in the last 24 hours we have blocked 1.2 billion HTTP requests (that’s about 14,000 per second).
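
The per-second figure follows directly from the daily total. A short check of the arithmetic, using only the numbers quoted above (the variable names are illustrative):

```lua
-- Convert the daily total quoted above into a per-second rate.
local blocked_per_day = 1.2e9
local seconds_per_day = 24 * 60 * 60            -- 86,400 seconds
local blocked_per_sec = blocked_per_day / seconds_per_day
print(string.format("%.0f blocked requests per second", blocked_per_sec))
```

The result is just under 14,000, which matches the rounded figure in the text.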

#statporn
• 14,000 blocked reqs/sec
• 1.2 billion blocked reqs/day
• Goal: exec all rules <= 1ms
• Actual execution ~400µs
• 1,937 string matches
• 5,682 general rules
• 102 CloudFlare rules

When you first launched, what kind of latency did you see?

When the code was first written and tested we were seeing about 10ms latency on a laptop machine. That was optimized using techniques like function memoization and then some architectural changes (mostly the elimination of the use of closures) and the latency was close to 1ms. After that the WAF was put into production and work was done using systemtap and internal tools to analyze LuaJIT and PCRE performance. We worked closely with Mike Pall (the LuaJIT maintainer) to ensure that WAF-specific functions we need are JITed.
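
Function memoization, mentioned above, caches the result of a function per input so that repeated calls with the same argument skip the work. A minimal sketch, assuming a single-argument function; `memoize`, `normalize` and the sample input are hypothetical names for illustration and do not come from the WAF code:

```lua
-- Generic single-argument memoizer: results are cached in a table keyed by
-- the argument, so the wrapped function runs once per distinct input.
-- (A nil result is not cached and would be recomputed on every call.)
local function memoize(fn)
  local cache = {}
  return function(arg)
    local hit = cache[arg]
    if hit == nil then
      hit = fn(arg)
      cache[arg] = hit
    end
    return hit
  end
end

-- Hypothetical expensive transformation; the counter shows the cache working.
local calls = 0
local function normalize(s)
  calls = calls + 1
  return (s:lower():gsub("%s+", " "))   -- parentheses keep only the string
end

local fast_normalize = memoize(normalize)
fast_normalize("SELECT   *  FROM users")
fast_normalize("SELECT   *  FROM users")
print(calls)  -- 1: the second call was served from the cache
```

The cache in this sketch grows without bound, so a long-running process would need to limit or reset it.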

The difference with LuaJIT is night and day. We would never use plain Lua in production. LuaJIT is way more performant than Lua on x64 hardware (see http://luajit.org/performance_x86.html).

How do you speed things up and look for slow execution?

For the initial tuning of the WAF code we used Lua-based profiling tools (and wrote one ourselves) to look at performance of the Lua code that implements the WAF. Once in production we used systemtap and flamegraphs to identify hotspots and optimize them. When launching into production, we did not need to change anything in our physical infrastructure. We did not purchase or use any new hardware. The WAF is mostly CPU intensive.

< 1ms Latency

Before we implemented the new WAF, CloudFlare had been running Apache alongside nginx just to be able to use mod_security. This combination was very slow and cumbersome. Ultimately it didn’t scale with CloudFlare’s growing business, so we started working on a new WAF using nginx + LuaJIT.

CloudFlare operates one of the world’s largest deployments of nginx + LuaJIT. Every fraction of a microsecond that can be shaved off the processing of a request has a significant impact, so we decided to sponsor some changes to the LuaJIT open source project.

The overall goal of the project was to get the median WAF block/allow decision made under 1ms in real world scenarios. Optimizations were made by examining the WAF’s performance under a test harness with line-level timing information. We ran the WAF in CloudFlare’s network with very detailed systemtap-based instrumentation.
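
One way to get line-level timing in a test harness is Lua's standard debug hook. The sketch below is an assumption about how such a harness could be built, not a description of CloudFlare's tooling; `profile_lines` and `work` are hypothetical names. Debug hooks add significant overhead, so this is suited to a harness rather than production traffic.

```lua
-- Attribute elapsed CPU time to the source line that was executing,
-- using the standard debug.sethook line event.
local function profile_lines(fn, ...)
  local stats = {}                 -- "source:line" -> { hits, seconds }
  local last_key, last_time = nil, os.clock()

  local function on_line(_, line)
    local now = os.clock()
    if last_key then
      -- Charge the time since the previous line event to that previous line.
      stats[last_key].seconds = stats[last_key].seconds + (now - last_time)
    end
    local info = debug.getinfo(2, "S")   -- level 2: the function being traced
    last_key = info.short_src .. ":" .. line
    local entry = stats[last_key]
    if not entry then
      entry = { hits = 0, seconds = 0 }
      stats[last_key] = entry
    end
    entry.hits = entry.hits + 1
    last_time = os.clock()
  end

  debug.sethook(on_line, "l")
  local ok, err = pcall(fn, ...)
  debug.sethook()                  -- always remove the hook again
  if not ok then error(err, 0) end
  return stats
end

-- Hypothetical workload to profile.
local function work()
  local total = 0
  for i = 1, 50 do
    total = total + i * i
  end
  return total
end

for key, entry in pairs(profile_lines(work)) do
  print(string.format("%-24s hits=%-6d seconds=%.6f", key, entry.hits, entry.seconds))
end
```

Lines with many hits and a large share of the time are the candidates for optimization.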

Information from systemtap is fed into a pastebin, which parses it and produces a flame graph showing where the code spends its time.

FlameGraph

Early flamegraphs showed extensive use of closures, which was causing slowness in LuaJIT. Some parts of the compiler were rewritten to remove them and make the code run faster.
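
To illustrate the kind of change involved, the sketch below contrasts a closure-based matcher with a closure-free one that takes all of its state as arguments. Both functions are hypothetical and are not taken from the WAF. The only claim made here is the general one that each call to `make_matcher` creates a new function object, while `contains` does not.

```lua
-- Closure-heavy style: every call to make_matcher creates a new function
-- object that captures `needle` as an upvalue.
local function make_matcher(needle)
  return function(haystack)
    return haystack:find(needle, 1, true) ~= nil
  end
end

-- Closure-free style: one top-level function with all state passed in as
-- arguments, so no new function object is created per call.
local function contains(haystack, needle)
  return haystack:find(needle, 1, true) ~= nil
end

local input = "GET /?user=<script>"
print(make_matcher("<script")(input), contains(input, "<script"))
```

Both calls give the same answer; the difference is in how much work the runtime does to get there.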

Here’s another view generated from the same information which identified hot functions. Here it shows that string matching and regular expressions are the most expensive operations.

FlameGraph

To make these matching functions run faster, we implemented our own version of the Aho-Corasick algorithm. Aho-Corasick is a fast string matching algorithm that can match a large set of keywords simultaneously against incoming text. Its advantage is that it matches multiple strings in a single pass over a large body of text, whereas searching for each string individually, for example with Boyer-Moore, requires one pass over the text per string. In this article, the author shows how Aho-Corasick is implemented in Haskell. CloudFlare has also open-sourced a custom Aho-Corasick implementation in Go and in C++ with Lua.
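
For readers unfamiliar with the algorithm, here is a compact textbook-style sketch in plain Lua. It is not CloudFlare's implementation; the names `ac_build` and `ac_search` are made up for this example. It works byte by byte and favors clarity over speed.

```lua
-- Build the automaton: a trie of the keywords plus failure links.
local function ac_build(keywords)
  local root = { next = {}, out = {} }
  -- 1. Insert every keyword into the trie.
  for _, word in ipairs(keywords) do
    local node = root
    for i = 1, #word do
      local ch = word:sub(i, i)
      local child = node.next[ch]
      if not child then
        child = { next = {}, out = {} }
        node.next[ch] = child
      end
      node = child
    end
    node.out[#node.out + 1] = word
  end
  -- 2. Breadth-first pass to compute failure links.
  local queue, head = {}, 1
  for _, child in pairs(root.next) do
    child.fail = root
    queue[#queue + 1] = child
  end
  while head <= #queue do
    local node = queue[head]
    head = head + 1
    for ch, child in pairs(node.next) do
      local f = node.fail
      while f ~= root and not f.next[ch] do
        f = f.fail
      end
      child.fail = f.next[ch] or root
      -- Inherit keywords that end at the failure target.
      for _, word in ipairs(child.fail.out) do
        child.out[#child.out + 1] = word
      end
      queue[#queue + 1] = child
    end
  end
  return root
end

-- Scan the text once, reporting every keyword occurrence and its start position.
local function ac_search(root, text)
  local matches, node = {}, root
  for i = 1, #text do
    local ch = text:sub(i, i)
    while node ~= root and not node.next[ch] do
      node = node.fail
    end
    node = node.next[ch] or root
    for _, word in ipairs(node.out) do
      matches[#matches + 1] = { word = word, pos = i - #word + 1 }
    end
  end
  return matches
end

local automaton = ac_build({ "he", "she", "his", "hers" })
for _, m in ipairs(ac_search(automaton, "ushers")) do
  print(m.pos, m.word)
end
```

The text is read exactly once regardless of how many keywords are loaded, which is the property the paragraph above describes.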

Optimizations in the Lua language, the LuaJIT compiler and the WAF core resulted in a very fast and flexible all-Lua WAF which runs within nginx’s core.

An example in Lua:

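-- Excerpt: waf, v2_5 and rulefile are not defined in this excerpt.
-- WAF functions are cached in locals before the rules run.
local waf_vars = waf.vars
local waf_streq = waf.streq
local waf_setvar = waf.setvar
local waf_msg = waf.msg
local waf_drop = waf.drop
local waf_disabled_ids = waf.disabled_ids
local waf_deny = waf.deny
local waf_activate = waf.activate
local t1_1 = {}
-- Each rule is skipped if its ID is disabled; on a match it records its ID,
-- logs a message, raises the anomaly score and drops or denies the request.
if not waf_disabled_ids['00001'] and waf_streq(waf, v2_5, '2_5', t1_1, '1_1', 'b783efc191a7c066c1d87068f63a84a39f9830bb', false) then
    waf_vars['RULE']['ID'] = '00001'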
    waf_activate(waf, rulefile)
    waf_msg(waf, 'CloudFlare Test Rule (drop) activated')
    waf_setvar(waf, {{'TX:ANOMALY_SCORE', '+100'},{'TX:%{RULE:ID}', 'CloudFlare unique hash test rule (drop)'}})
    waf_drop(waf, rulefile)
end
if not waf_disabled_ids['00002'] and waf_streq(waf, v2_5, '2_5', t1_1, '1_1', '4709edce126971876b47523778fa7b942ec14b5', false) then
    waf_vars['RULE']['ID'] = '00002'
    waf_activate(waf, rulefile)
    waf_msg(waf, 'CloudFlare Test Rule (deny) activated')
    waf_setvar(waf, {{'TX:ANOMALY_SCORE', '+100'},{'TX:%{RULE:ID}', 'CloudFlare unique hash test rule (deny)'}})
    waf_deny(waf, rulefile)
end

Read & watch more about building a low-latency WAF inside NGINX using Lua

Watch John’s presentation on “Building a low-latency WAF inside NGINX using Lua” on YouTube. You can also download the presentation used in this video here.

Chris Ueland

http://www.ueland.com

Wanting to call out all the good stuff when it comes to scaling, Chris Ueland created this blog, ScaleScale.