Speeding up SSL – All You Need to Know About HAProxy
The following article has been contributed by Marcus “Darix” Rückert, Senior Software Engineer in the Operations & Services Team at SUSE. It first appeared on his personal homepage.
For quite a few years now I have been an HAProxy user, even running snapshots in production for a very long time, especially after support was added to terminate SSL connections directly in HAProxy. Getting rid of stunnel was so nice…
For a very long time this setup served me really well. But over time more and more services were put behind HAProxy, and the total number of connections went up. We started to see performance issues, which at first sounds weird: if you look at the benchmarks on the HAProxy website, it can do thousands if not hundreds of thousands of connections per second.
So what happened?
HAProxy is, at its core, a single-process, event-driven program. This means that while we handle one connection and do the computation for its request, no other connection is handled. Every bit of code therefore needs to be as fast as possible so we can quickly switch to the next connection. In general this model works really well and is widely used.
When SSL comes into the picture, things get tricky. SSL handshakes are computationally expensive, so in the end your request rate is limited by the number of handshakes you can do per second.
Need for speed
Now you have two options to get more speed: faster computation (meaning faster CPUs or crypto accelerators) or spreading the work over more CPUs. The first is not always an option: per-core performance is still growing, but not by much. Which leaves us with option 2.
HAProxy has the nbproc directive, but the documentation discourages its use. So step 1 was to ask upstream about it. The answer, in short: our problem is the only real use case for it. I tried to argue that the warning could perhaps be relaxed, but without the warning people would probably use the directive even in situations where SSL was not the problem; I learned that lesson during my time with lighttpd. At least I now had some steps to proceed with.
Our base configuration looks as follows:
global
    log /dev/log daemon
    maxconn 32768
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    stats socket /var/lib/haproxy/stats user haproxy group haproxy mode 0640 level operator
    tune.bufsize 32768
    tune.ssl.default-dh-param 2048
    ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
    ssl-default-server-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11

defaults
    log global
    mode http
    option log-health-checks
    option log-separate-errors
    option dontlognull
    option httplog
    option splice-auto
    option socket-stats
    retries 3
    option redispatch
    maxconn 10000
    timeout connect 5s
    timeout client 60s
    timeout server 450s

frontend http
    bind 0.0.0.0:80 tfo
    bind :::80 v6only tfo
    bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/
    bind :::443 v6only tfo ssl crt /etc/ssl/services/
    acl is_ssl ssl_fc
    default_backend nginx

backend nginx
    option forwardfor
    server nginx 127.0.0.1:81 check inter 2s
A few comments about the config:
- We define the ciphers globally, so we don’t have to specify them on each bind statement later.
- IPv6 sockets are bound with v6only so we don’t handle IPv4 connections on the IPv6 sockets.
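To see what the global defaults save us, here is a sketch of the repetition we would otherwise need on every SSL bind line (cipher list abbreviated here for readability):

```
# Without ssl-default-bind-ciphers / ssl-default-bind-options,
# every bind line carries its own ciphers and options:
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384 no-sslv3 no-tlsv10 no-tlsv11
bind :::443 v6only tfo ssl crt /etc/ssl/services/ ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384 no-sslv3 no-tlsv10 no-tlsv11
```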
So let’s find out what our baseline speed is. Both benchmarks ran in parallel.
$ ab2 -c 20 -n 10000 https://myhost.mydomain.tld/
[snip]
Requests per second: 247.93 [#/sec] (mean)
Time per request: 80.668 [ms] (mean)
Time per request: 4.033 [ms] (mean, across all concurrent requests)
Transfer rate: 55.45 [Kbytes/sec] received
[snip]
$ ab2 -c 20 -n 10000 http://myhost.mydomain.tld/
[snip]
Requests per second: 630.70 [#/sec] (mean)
Time per request: 31.711 [ms] (mean)
Time per request: 1.586 [ms] (mean, across all concurrent requests)
Transfer rate: 141.04 [Kbytes/sec] received
[snip]
The properties of the SSL connection are:
TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,4096,256
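These values come from the benchmark client; if you want the same information in HAProxy’s own logs, the custom log format can record them per request. A minimal sketch, assuming the long-standing %sslv (TLS version) and %sslc (cipher) log-format variables:

```
frontend http
    # log client address:port, accept date, frontend, backend/server,
    # status, bytes, then the negotiated TLS version and cipher
    log-format "%ci:%cp [%t] %ft %b/%s %ST %B %sslv %sslc"
```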
The Solution
The changes to the global options and defaults are minor. First we configure seven processes in total: one for our request routing and plain HTTP access, and six for handling SSL. With the setting in the defaults block we ensure that any block without a bind-process statement is owned by the first process.
global
[snip]
nbproc 7
defaults
[snip]
bind-process 1
Next is our new SSL-only frontend. All it does is terminate SSL and forward the traffic back to our routing backend. The SSL listen block binds processes 2-7.
listen ssl
bind-process 2-7
The intuitive idea for the bind statements would be:
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ process 2-7
bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 2-7
Each bind statement gets assigned six processes. You could leave out the process statement at the end of the bind lines, in which case all processes assigned to this block would be used.
This creates two sockets shared among the six processes. When the kernel delivers a connection on those sockets, each process tries to grab it, but in the end only one process gets it. This can lead to unbalanced work.
Another option, shown below, gives us multiple sockets via SO_REUSEPORT. In this case the kernel distributes the load more fairly over all the processes involved. Our configuration block then looks like this:
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ process 2
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ process 3
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ process 4
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ process 5
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ process 6
bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/ process 7
bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 2
bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 3
bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 4
bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 5
bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 6
bind :::443 v6only tfo ssl crt /etc/ssl/services/ process 7
The SSL session cache is shared between all processes, and the TLS tickets are encrypted using a private key that is generated before all the child processes are forked.
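If the cache defaults ever become a limit, they are tunable in the global section. A hedged sketch; the values below are arbitrary examples, not recommendations:

```
global
    # number of entries in the shared SSL session cache (default 20000)
    tune.ssl.cachesize 100000
    # lifetime of a cached session, in seconds (default 300)
    tune.ssl.lifetime 600
```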
A small optimization for the internal traffic is the tcp-smart-connect option. With it, HAProxy sends the data (i.e. the proxy protocol header and request data) directly in the first packet. This ensures that the HTTP back-end has the request available immediately and saves it from having to poll for the data.
option tcp-smart-connect
As the last step in the listen block, we configure forwarding to the next back-end. We use send-proxy-v2 here so the HTTP back-end knows the real remote IP.
In theory we could use Unix domain sockets here, but there is no splice support for Unix domain sockets yet ☹️.
server http 127.0.0.1:84 send-proxy-v2
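For completeness, the Unix-domain-socket variant would look roughly like the sketch below (the socket path is made up; and as noted, without splice support it brings no benefit here):

```
# in the ssl listen block, instead of the TCP server line:
server http unix@/var/lib/haproxy/routing.sock send-proxy-v2

# in the http frontend, instead of the 127.0.0.1:84 bind:
bind unix@/var/lib/haproxy/routing.sock mode 600 accept-proxy
```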
Last but not least, we define the additional listening socket for our internal routing. We cannot use the normal port 80 sockets, as we want to allow the proxy protocol on this socket:
frontend http
[snip]
#
# the socket for routing the requests
#
bind 127.0.0.1:84 tfo accept-proxy
acl is_ssl fc_rcvd_proxy
[snip]
One important last point is the is_ssl ACL. Normally you would use ssl_fc (SSL front-end connection) to check whether a connection was received via an SSL socket. As we moved SSL out of this scope, we cannot use it anymore. But the connections from the SSL front-end also forward the connection data via the proxy protocol. As we only have one socket using this feature, we can safely assume that those connections came from our SSL front-end and thus are secure.
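As an illustration of what the ACL enables (not part of the original setup), one could redirect plain-HTTP clients to HTTPS and tell the back-end which scheme was used:

```
frontend http
    acl is_ssl fc_rcvd_proxy
    # plain-HTTP connections never arrive via the proxy protocol socket,
    # so send them over to HTTPS
    redirect scheme https code 301 if !is_ssl
    # let the application behind nginx know the original scheme
    http-request set-header X-Forwarded-Proto https if is_ssl
```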
The complete configuration with inline comments for all the changes looks as follows:
global
    log /dev/log daemon
    maxconn 32768
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    stats socket /var/lib/haproxy/stats user haproxy group haproxy mode 0640 level operator
    tune.bufsize 32768
    tune.ssl.default-dh-param 2048
    ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
    ssl-default-server-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11
    # launch 7 processes
    # process 1 for plain routing
    # processes 2-7 for SSL
    nbproc 7

defaults
    log global
    mode http
    option log-health-checks
    option log-separate-errors
    option dontlognull
    option httplog
    option splice-auto
    option socket-stats
    retries 3
    option redispatch
    maxconn 10000
    timeout connect 5s
    timeout client 60s
    timeout server 450s
    # by default use process 1
    bind-process 1

# our new SSL-only frontend. All we do here is the SSL termination;
# we then forward the traffic back to our routing backend
listen ssl
    # bind processes 2-7
    bind-process 2-7
    #
    # Intuitive idea:
    # bind 0.0.0.0:443 tfo ssl no-sslv3 crt /etc/ssl/services/ process 2-7
    # bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 2-7
    #
    # Each bind statement gets assigned 6 processes. You could leave out the
    # process statement at the end of the bind lines and then all processes
    # assigned to this block will be used.
    #
    # This creates 2 sockets shared among the 6 processes. When the kernel sends
    # a connection to those sockets each process tries to grab the connection
    # but in the end only 1 process gets it. This might lead to unbalanced work.
    #
    # Another option is shown below where we get multiple sockets with
    # SO_REUSEPORT. In this case the kernel will distribute the load more fairly
    # over all the processes involved.
    #
    # The SSL session cache is shared between all processes and the TLS tickets
    # are encrypted using a private key that is generated before all the
    # child processes are forked.
    #
    bind 0.0.0.0:443 tfo ssl no-sslv3 crt /etc/ssl/services/ process 2
    bind 0.0.0.0:443 tfo ssl no-sslv3 crt /etc/ssl/services/ process 3
    bind 0.0.0.0:443 tfo ssl no-sslv3 crt /etc/ssl/services/ process 4
    bind 0.0.0.0:443 tfo ssl no-sslv3 crt /etc/ssl/services/ process 5
    bind 0.0.0.0:443 tfo ssl no-sslv3 crt /etc/ssl/services/ process 6
    bind 0.0.0.0:443 tfo ssl no-sslv3 crt /etc/ssl/services/ process 7
    bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 2
    bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 3
    bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 4
    bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 5
    bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 6
    bind :::443 v6only tfo ssl no-sslv3 crt /etc/ssl/services/ process 7
    #
    # Optimization:
    #
    # Directly send the data (ie: the proxy protocol header and request data) in
    # the first packet. This ensures that the http backend has the request
    # available immediately and saves it from having to poll for the data.
    #
    option tcp-smart-connect
    # forward to next backend.
    # we use send-proxy-v2 here so the http backend knows the real remote IP
    #
    # In theory we could use unix domain sockets here. But there is no splice
    # support for unix domain sockets yet *sad face*
    server http 127.0.0.1:84 send-proxy-v2

frontend http
    bind 0.0.0.0:80 tfo
    bind :::80 v6only tfo
    # bind 0.0.0.0:443 tfo ssl crt /etc/ssl/services/
    # bind :::443 v6only tfo ssl crt /etc/ssl/services/
    #
    # the socket for routing the requests
    #
    bind 127.0.0.1:84 tfo accept-proxy
    acl is_ssl fc_rcvd_proxy
    default_backend nginx

backend nginx
    option forwardfor
    server nginx 127.0.0.1:81 check inter 2s
And we can validate the different listening sockets:
$ ss -tplen | grep haproxy
LISTEN 0 128 127.0.0.1:84 *:* users:(("haproxy",pid=8194,fd=21))
ino:281097 sk:f <->
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8200,fd=11))
ino:281087 sk:10 <->
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8199,fd=10))
ino:281086 sk:11 <->
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8198,fd=9))
ino:281085 sk:12 <->
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8197,fd=8))
ino:281084 sk:13 <->
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8196,fd=7))
ino:281083 sk:14 <->
LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8195,fd=6))
ino:281082 sk:15 <->
LISTEN 0 128 *:80 *:* users:(("haproxy",pid=8194,fd=19))
ino:281095 sk:16 <->
LISTEN 0 128 :::443 :::* users:(("haproxy",pid=8200,fd=17))
ino:281093 sk:17 v6only:1 <->
LISTEN 0 128 :::443 :::* users:(("haproxy",pid=8199,fd=16))
ino:281092 sk:18 v6only:1 <->
LISTEN 0 128 :::443 :::* users:(("haproxy",pid=8198,fd=15))
ino:281091 sk:19 v6only:1 <->
LISTEN 0 128 :::443 :::* users:(("haproxy",pid=8197,fd=14))
ino:281090 sk:1a v6only:1 <->
LISTEN 0 128 :::443 :::* users:(("haproxy",pid=8196,fd=13))
ino:281089 sk:1b v6only:1 <->
LISTEN 0 128 :::443 :::* users:(("haproxy",pid=8195,fd=12))
ino:281088 sk:1c v6only:1 <->
LISTEN 0 128 :::80 :::* users:(("haproxy",pid=8194,fd=20))
ino:281096 sk:1d v6only:1 <->
In the example above we see:
- pid 8194 handles *:80, [::]:80 and 127.0.0.1:84.
- pids 8195-8200 handle *:443 and [::]:443.
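If the raw ss output gets unwieldy, a tiny helper can boil it down to the address-to-PID mapping. A sketch, fed here with two sample lines from the listing above; on a live host you would pipe in `ss -tplen | grep haproxy` instead:

```shell
# Summarize which haproxy PID owns each listening socket.
# Reads `ss -tplen | grep haproxy` style lines on stdin.
summarize() {
    awk '{
        # extract the pid=NNNN token from the users:(...) column
        match($0, /pid=[0-9]+/)
        pid = substr($0, RSTART + 4, RLENGTH - 4)
        # field 4 is the local address:port
        print $4, "-> pid", pid
    }'
}

printf '%s\n' \
    'LISTEN 0 128 127.0.0.1:84 *:* users:(("haproxy",pid=8194,fd=21))' \
    'LISTEN 0 128 *:443 *:* users:(("haproxy",pid=8200,fd=11))' \
    | summarize
```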
And did this really help?
$ ab2 -c 20 -n 10000 https://myhost.mydomain.tld/
[snip]
Requests per second: 678.84 [#/sec] (mean)
Time per request: 29.462 [ms] (mean)
Time per request: 1.473 [ms] (mean, across all concurrent requests)
Transfer rate: 151.81 [Kbytes/sec] received
[snip]
One could say … YES 👍. With the burden of SSL removed from the main process, we get a sharp increase in performance here as well:
$ ab2 -c 20 -n 10000 http://myhost.mydomain.tld/
[snip]
Requests per second: 19338.32 [#/sec] (mean)
Time per request: 1.034 [ms] (mean)
Time per request: 0.052 [ms] (mean, across all concurrent requests)
Transfer rate: 4324.68 [Kbytes/sec] received
[snip]
Why don’t you test it yourself? 😁
Comments
This is a great post and thanks for this.
I was having an issue when trying to force https via redirect (was getting a cyclic loop).
So I separated the bind:80 and bind:84 logic into two frontends:
frontend http
bind 0.0.0.0:80 tfo
bind :::80 v6only tfo
redirect scheme https code 301
frontend https
bind 127.0.0.1:84 tfo accept-proxy
acl is_ssl fc_rcvd_proxy
default_backend nginx
This fixed the issue for me in case anyone should encounter the same.
Is there a better way to do this?
Thanks.