The following text describes how we evaluated possible solutions for protecting our infrastructure from DDoS attacks. The values and impressions collected here make no claim to be correct or complete. This article only reflects our own experiences and data and should therefore be used as input for your own decisions.
The Electronic Sports League (ESL) is Europe’s largest online computer gaming league. Over 2.6 million registered members generate more than 100 million page impressions per month. Handling this volume requires an extremely stable IT infrastructure. The ESL was increasingly targeted by DDoS attacks, in which a distributed network of computers hammers our servers with thousands of requests (see Wikipedia: “Denial of service attack”). The goal of a DDoS attack is to make the target unavailable to its intended users, thereby causing economic loss.
In search of a solution to these DDoS attacks, we tried a number of different approaches.
The attacks we had to deal with were mostly simple SYN floods at rates between 80k and 500k packets/s. Our primary goal was to be resistant to these SYN attacks. The device should mitigate them as quickly as possible, and the learning curve, and thus the amount of configuration, should be manageable. The more attack types it detects, the better. It should also not be attackable itself, so it should be able to operate in transparent (bridging) mode. An alternative is one of the various DDoS proxy services: the IP to be protected is pointed at the provider’s proxy, and the provider takes care of fending off the attacks. However, given the number of service IPs we run and the amount of traffic we generate, this option was not feasible in our case.
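A back-of-the-envelope calculation shows why the packet rate, rather than the bandwidth, is what hurts with such floods. The sketch assumes that every SYN travels in a minimum-size Ethernet frame (64 bytes including FCS) plus 8 bytes of preamble and a 12-byte inter-frame gap, i.e. 84 bytes on the wire; SYNs carrying TCP options can be slightly larger. The function names are our own.

```python
# Back-of-the-envelope: how much line rate does a SYN flood use at a given packet rate?
# Assumption: each SYN is padded to a minimum Ethernet frame (64 bytes incl. FCS)
# and also occupies 8 bytes of preamble/SFD plus a 12-byte inter-frame gap.
WIRE_BYTES_PER_SYN = 64 + 8 + 12   # = 84 bytes per packet on the wire


def flood_mbit_per_s(packets_per_second: int) -> float:
    """Approximate line rate in Mbit/s consumed by minimum-size packets."""
    return packets_per_second * WIRE_BYTES_PER_SYN * 8 / 1_000_000


def gige_min_frame_pps() -> float:
    """Packets per second that saturate 1 Gbit/s Ethernet with minimum-size frames."""
    return 1_000_000_000 / (WIRE_BYTES_PER_SYN * 8)


if __name__ == "__main__":
    for pps in (80_000, 500_000):   # the range of attack rates we saw
        print(f"{pps:>7,} pkts/s ~ {flood_mbit_per_s(pps):6.1f} Mbit/s")
    print(f"1 GbE is saturated at ~{gige_min_frame_pps():,.0f} pkts/s")
```

Under these assumptions, even a 500k pkts/s flood occupies only about a third of a gigabit link, while every single packet still has to be handled individually by whatever sits in the path.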
The firewall we had used so far was an HP DL380 with an additional Intel network card, running Debian. This hardware had massive problems handling the packet rate. Between 20k and 25k system interrupts per second drove the si (software interrupt) value in “top” to between 90% and 100%, and ksoftirqd was the top CPU consumer. The consequences were dropped packets and a slow, unresponsive website. A quick search on Google looked promising for finding a solution to this problem.
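The si value in top is the share of CPU time spent servicing software interrupts (softirqs) between two refreshes. A minimal sketch of essentially the same computation, based on a cpu line of Linux’s /proc/stat (fields: user, nice, system, idle, iowait, irq, softirq, steal, optionally guest and guest_nice); the function names are hypothetical, and the parser also accepts per-CPU lines such as cpu0.

```python
import os
import time
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse a 'cpu' (or 'cpuN') line of /proc/stat into jiffy counters."""
    fields = line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError("expected a cpu line of /proc/stat")
    # Order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest/guest_nice are already included in user/nice, so keep only the first 8.
    return [int(v) for v in fields[1:9]]


def softirq_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time spent in softirq between two samples (like top's 'si')."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[6] / total if total > 0 else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()   # the first line holds the totals over all CPUs


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1.0)
    second = parse_cpu_line(read_cpu_line())
    print(f"si: {softirq_percent(first, second):.1f}%")
```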
Below I won’t go into detail but rather give a brief overview of the measures we took. Note that all of them were taken to the best of our knowledge at the time; we cannot rule out that a configuration mistake led to wrong results.
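The first hit on Google was NAPI, the Linux network driver interface designed to reduce the CPU load caused by high interrupt rates: under heavy load, the driver stops taking one interrupt per packet and polls the card instead. We tried it, but it had no effect on our problem.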
Next we tried tuning the Intel NIC driver. The InterruptThrottleRate parameter was especially interesting to us. Like NAPI, it aims to reduce interrupt load: the card delays its interrupts so that a single interrupt covers several packets, which lowers the CPU load. To our disappointment, this had no effect in our tests either.
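The trade-off behind interrupt throttling can be put into numbers. If the card raises at most N interrupts per second, each interrupt has to cover on average pps / N packets, and a packet may wait up to roughly 1/N seconds for the next interrupt. A minimal sketch; the throttle values in the example are arbitrary and not the settings we tested.

```python
# Interrupt throttling arithmetic: at most N interrupts/s means each interrupt
# must cover pps / N packets on average, and a packet may wait up to ~1/N s.


def packets_per_interrupt(packets_per_second: float, interrupts_per_second: float) -> float:
    """Average number of packets handled per interrupt under a throttle of N/s."""
    if interrupts_per_second <= 0:
        raise ValueError("interrupt rate must be positive")
    return packets_per_second / interrupts_per_second


def max_wait_us(interrupts_per_second: float) -> float:
    """Rough upper bound (microseconds) a packet waits for the next interrupt."""
    if interrupts_per_second <= 0:
        raise ValueError("interrupt rate must be positive")
    return 1_000_000 / interrupts_per_second


if __name__ == "__main__":
    for itr in (8_000, 20_000):     # arbitrary example throttle values
        print(f"ITR {itr:>6}: {packets_per_interrupt(500_000, itr):5.1f} pkts/interrupt, "
              f"up to ~{max_wait_us(itr):.0f} us extra latency")
```

Note that throttling reduces the number of interrupts, not the number of packets the kernel has to process.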
Another approach was using SYN cookies to fend off at least the SYN-based attacks. With SYN cookies, the usual table of half-open TCP connections is no longer needed, so it cannot overflow. Instead, the server encodes the necessary connection information in the initial sequence number of its SYN-ACK and only verifies it when the client’s final ACK of the handshake arrives. This is a good option for the servers terminating the attacked IP, but it has no influence on the firewall that has to route those packets. For the firewall, it would still be a case of just too many packets.
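To illustrate how SYN cookies remove the need for a half-open connection table, here is a simplified sketch in the spirit of the classic scheme: the server folds a coarse time counter, an MSS index and a keyed hash of the connection 4-tuple into the 32-bit initial sequence number of its SYN-ACK, and checks the returning ACK against it. The bit layout, the HMAC-SHA256 choice, the MSS table and the single random key are simplifying assumptions for illustration, not the Linux kernel’s actual implementation.

```python
import hashlib
import hmac
import os
import time
from typing import Optional, Tuple

# Simplified SYN cookie sketch; layout and key handling are illustrative assumptions.
SECRET = os.urandom(16)                  # assumption: one random key per boot
MSS_TABLE = (536, 1300, 1440, 1460)      # illustrative MSS values, index stored in 3 bits
Conn = Tuple[str, int, str, int]         # (src_ip, src_port, dst_ip, dst_port)


def _slot(now: float) -> int:
    """Coarse time slot: the counter advances every 64 seconds."""
    return int(now) // 64


def _hash24(conn: Conn, slot: int, mss_idx: int) -> int:
    """Keyed 24-bit hash of the connection, time slot and MSS index."""
    msg = f"{conn}|{slot}|{mss_idx}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(conn: Conn, client_mss: int, now: Optional[float] = None) -> int:
    """Build the 32-bit ISN for the SYN-ACK; nothing is stored per connection."""
    slot = _slot(time.time() if now is None else now)
    # Largest table entry not exceeding the client's MSS (smallest entry as fallback).
    mss_idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    # Bits 31..27: slot mod 32 | bits 26..24: MSS index | bits 23..0: keyed hash.
    return ((slot % 32) << 27) | (mss_idx << 24) | _hash24(conn, slot, mss_idx)


def check_cookie(conn: Conn, ack_number: int, now: Optional[float] = None,
                 max_age_slots: int = 2) -> Optional[int]:
    """Validate the handshake's final ACK; return the encoded MSS, or None if invalid."""
    cookie = (ack_number - 1) & 0xFFFFFFFF   # the client acknowledges our ISN + 1
    slot_bits, mss_idx, digest = cookie >> 27, (cookie >> 24) & 0x7, cookie & 0xFFFFFF
    if mss_idx >= len(MSS_TABLE):
        return None
    current = _slot(time.time() if now is None else now)
    for age in range(max_age_slots):          # accept the current and previous slot
        slot = current - age
        if slot % 32 == slot_bits and hmac.compare_digest(
                _hash24(conn, slot, mss_idx).to_bytes(3, "big"), digest.to_bytes(3, "big")):
            return MSS_TABLE[mss_idx]
    return None


if __name__ == "__main__":
    conn = ("203.0.113.7", 51234, "192.168.0.11", 80)
    isn = make_cookie(conn, client_mss=1460, now=1000.0)
    print(check_cookie(conn, isn + 1, now=1010.0))             # 1460
    print(check_cookie(conn, isn + 1, now=1000.0 + 64 * 5))    # None: cookie too old
```

The server keeps no state between SYN and ACK; the price is that only a few MSS values can be encoded. As the paragraph above notes, this protects the terminating server, not a firewall that still has to forward every packet.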
Further approaches such as a SYN proxy (not available under Linux at the time) and iptables tuning did not lead to success either, so we were forced to look for a hardware solution. So what exactly were we looking for?
Looking at the market, nearly every vendor promises that its device is suitable for a situation like ours. To form our own opinion independently of what the marketing would have us believe, we reproduced the attacks in our test environment.
The test scenario
We set up four servers to reproduce the production scenario: two acting as attackers, one as web server and one as client.
| | Web server | Attacker | Client |
|---|---|---|---|
| CPU | 2x AMD Opteron @ 2.2 GHz | 2x Intel Xeon @ 3.20 GHz | Intel Pentium 4 @ 3 GHz |
| NIC | BCM5704 Gigabit | 82540EM Gigabit | 82573L Gigabit |
| OS | Ubuntu lucid (10.04) | Ubuntu lucid (10.04) | Ubuntu lucid (10.04) |
- Web server: lighttpd serving a simple HTML page containing a few pictures
- Attacker: sudo hping3 192.168.0.11 --interface eth0 --flood --destport 80 --syn --rand-source --verbose
- Client: curl-loader continuously loading the static HTML page and four small pictures
Let’s take a look at our nominees ;) We evaluated the following devices, in chronological order: Fortigate 310-B, Juniper SRX650 (routing mode), Palo Alto PA-2050, TopLayer IPS-1000E and RioRey RX2310U. Except for the Juniper, all devices are able to operate in transparent mode.
The Fortigate 310-B was recommended and made available for testing by a local computer retailer, who also helped us configure it so that misconfiguration would be minimized. The device offers a great many configuration options and would be categorized as an all-rounder. We especially liked the virtual firewall feature: you can set up completely independent configurations for different scenarios and simply enable or disable them. For our main problem, the DDoS attacks, the Fortigate offers a set of dedicated anti-DDoS policies that can be applied to each of the virtual firewalls. These policies in turn have thousands of options you can adjust to your needs. The idea behind them is to gain control over DDoS attacks by limiting packet rates. Sadly, it quickly became apparent in our test scenario that the device ran into the same problems as our Linux firewall: CPU load rose to 100% and all further packets were dropped. Even with all packet inspection rules disabled, it could not handle the packet volume correctly, so we refrained from enabling any further IPS functions.
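The Fortigate’s internals are not described here, so as a generic illustration only: packet-rate limits of this kind are commonly built on a token bucket. Each bucket refills at `rate` tokens per second up to a capacity of `burst`; every packet spends one token and is dropped when the bucket is empty. The class names and the one-bucket-per-source policy below are hypothetical.

```python
from typing import Dict, Optional


class TokenBucket:
    """Token bucket: on average `rate` packets/s, bursts of up to `burst` packets."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst              # start with a full bucket
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Refill for the time elapsed since the last packet, then spend one token."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True                  # forward the packet
        return False                     # over the limit: drop it


class PerSourceLimiter:
    """Hypothetical policy: one bucket per source IP, same limits for every source."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, src_ip: str, now: float) -> bool:
        bucket = self.buckets.get(src_ip)
        if bucket is None:
            bucket = self.buckets[src_ip] = TokenBucket(self.rate, self.burst)
        return bucket.allow(now)


if __name__ == "__main__":
    syn_limit = TokenBucket(rate=10, burst=5)
    print([syn_limit.allow(0.0) for _ in range(6)])   # [True, True, True, True, True, False]
```

Two caveats apply to any such limiter. With spoofed random sources, as generated by hping3’s --rand-source in our test, per-source buckets barely trigger because each source sends very few packets, while an aggregate bucket also drops legitimate SYNs. And the limiter still has to look at every packet, so it cannot help if the device runs out of CPU before it gets to apply the limit.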
The Juniper SRX650 is a classic Layer 3 firewall. It does not support transparent mode, which forced us to test it in routing mode. Apart from rate limiting, there are no dedicated anti-DDoS policies to configure. Our tests quickly confirmed our presumption that this simply is not the device we were looking for. Just a few seconds after the attack began, the SRX650 buckled under the packet volume. The interface became completely unresponsive and needed about 5 minutes to return to normal after we stopped the attack. The next model up does support Layer 2 mode but exceeded our price range.
Palo Alto PA-2050
The PA-2050 from Palo Alto Networks also promised to solve our problem. We had direct support from the vendor, who was familiar with our test setup and should have led us quickly to the optimal configuration. We were surprised, however, to see the Palo Alto behave the same way the Juniper did: after a few seconds of attack traffic, no further traffic was handled and the client tried to access the test page in vain. The Palo Alto team did its best, but under our circumstances we could not find a solution within our time frame. We still think, however, that this device is just as good as the Juniper for other fields of application.
|our test report in detail||traffic overview||packet overview|
The TopLayer IPS-1000E belongs to the class of devices specialized in intrusion prevention (Intrusion Prevention Systems, IPS). As we were under attack at the time of the evaluation, we decided to test it in the real production environment.
|incoming packets increase and traffic is cut in half||at the same time, the TopLayer drops packets||it appears to be overloaded|
|detailed report||after the attack||traffic gap during the attack|
The attack lasted 10 minutes. In the first minutes we were barely reachable and incoming traffic was cut in half. After 2 to 5 minutes the situation stabilized and everything subjectively became faster. After 10 minutes our monitoring system switched back to normal. The TopLayer solution is clearly not capable of fully protecting us from a DDoS attack: the device reported being overloaded at only 48k packets/s, whereas we had already seen attacks of around half a million packets per second, roughly ten times as much. We believe further investigation and tuning could result in more effective protection, but since the TopLayer was far too expensive for us, we did not pursue this approach any further.
The RioRey device specializes in mitigating DDoS attacks, and only DDoS attacks. If you are looking for a firewall that also does routing and the like, this is not the right device for you.
It was the last in our test series and turned out to be the best.
At first, disappointed by the other tests, we did not expect much. A mitigation time of 90 seconds and the need for a Windows client to administer the device were not good signs.
After being stuck in German customs for a while, the device finally made its way to our office and we began installing it. Installing it in the production environment was easy and carried no risk of downtime, because its default configuration is monitor mode: all attacks are recognized and reported just as in filter mode, but no packets are dropped and traffic is simply passed through. The device has a WAN, a LAN and a MGMT interface. Once connected to the MGMT interface, you do the basic setup by browsing to the preconfigured IP over HTTPS.
|basic setup page|
Here you configure basics such as the IP address, syslog server, SNMP and passwords. For insight into monitor and filter mode, you need a Windows client with RioRey’s software “rVIEW” installed. You connect it to the configured IP and get much more information and many more configuration options. So let’s start the test:
|our test report in detail||RioRey in action|
As you can see, it takes 90 seconds to analyze the traffic. After that, about 90% of legitimate traffic is passed through and all illegitimate traffic is blocked. And this happened with zero configuration (apart from IP address and password): you just put the RioRey in place, switch to filter mode and that’s it. Besides the really simple installation, the most important point is that it was the only device that actually lived up to its promise. RioRey rightly call themselves “The DDoS specialist”.
After these tests we installed the device in our production environment, in direct contact with the RioRey tech team, who analyzed our traffic and suggested the optimal settings for our environment. What really impressed our team was the detailed analysis they provided of an attack we dealt with while the RioRey was active. It turned out that not only the device itself is of high quality but, even more importantly, so is the staff behind it.
After some weeks in operation, we found a few things that are not perfect yet. Many alerts are reported for events that seem to make no difference whether they are filtered or not, and at the moment we cannot really say which of these attacks are a real threat to our website, so some more tweaking is needed. Another point is the weak status log: it does not show a history of recent hardware events such as link up/down changes or power failures; you only see the last event that changed the state.
On the flip side, we also experienced many positives. You can reboot and update the device without downtime; traffic simply passes through unfiltered in the meantime. The reports do a good job, letting us identify the duration of an attack and the attackers’ IP(s). All things considered, we are fully satisfied with this device: it does exactly what we expect of it. Here is another attack fended off by the RioRey in our production environment:
|our backbone reports the attack||RioRey report||thanks to RioRey, the attack has no influence on our render time|
Author: Thomas Poehler