Reusing TCP connections for HTTP requests in Go

May 2018 · 8 minute read

In the OSI model, HTTP is a layer 7 (application) protocol which works on top of TCP, a layer 4 (transport) protocol.

Whenever an HTTP client sends a request to a server, it first establishes (or, ideally, reuses) a TCP connection to that server and then sends the request data over it. This is best explained with a cable car analogy.

[Image: cable car]

A cable car uses the cable as a medium to move from point A to point B. Here the cable is the TCP connection, while the car is the HTTP request.

Continuing with the analogy, if a new cable has to be set up between the two points every time a car needs to move, the whole process becomes slow. So, rather than setting up the cable every time, we keep it intact and reuse it for the car to go from point A to point B (and vice versa).

This is exactly what HTTP clients do to send multiple sequential HTTP requests over the same TCP connection. If a client does not do this, it has to pay the extra cost of setting up a TCP connection, roughly 1.5 RTT for the TCP 3-way handshake, every time an HTTP request needs to be sent, which increases request latency.

We can verify these claims by running curl under strace. strace will tell us whenever curl makes a connect system call, which initiates a connection on a socket.

[email protected]:~$ url="http://freshworks.com"
[email protected]:~$ strace -e trace=connect \
> curl -s $url $url > /dev/null
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("13.33.170.94")}, 16) = -1 EINPROGRESS (Operation now in progress)
+++ exited with 0 +++

Here curl makes 2 HTTP requests to freshworks.com but only 1 connect syscall to establish the TCP connection. That is, 2 requests over 1 TCP connection.
This is one side of the coin; let's look at the other side to understand it better.

[email protected]:~$ url="http://freshworks.com"
[email protected]:~$ strace -e trace=connect \
> curl -s -H 'connection: close'  $url $url > /dev/null
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("13.33.170.38")}, 16) = -1 EINPROGRESS (Operation now in progress)
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("13.33.170.38")}, 16) = -1 EINPROGRESS (Operation now in progress)
+++ exited with 0 +++

Here we ask the server to close the connection after the response has been written by setting the Connection header to close. And, as expected, we see 2 connect calls being made to establish 2 separate TCP connections to the server (the file descriptor is reused, which is why both calls show the same number).

A bit about connection pooling

How does connection pooling work?

To send data over the wire, we create a TCP connection to the server, send the data, and then close the connection. For the next request we do this again, and so on and so forth. To pool connections, we skip the last step: rather than closing the connection, we put it in a pool, and for the next request, rather than creating a new connection, we first try to get an idle connection from the pool. If one is available, we use it and put it back afterwards.
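A minimal sketch of that idea in Go. This is a hand-rolled pool for illustration only, not the one inside net/http; the Pool type and its methods are made up:

package pool

import "net"

// Pool keeps idle connections to a single address so they can be reused
// instead of dialing a new TCP connection for every request.
type Pool struct {
	addr string
	idle chan net.Conn // buffered channel acting as the idle-connection pool
}

func New(addr string, size int) *Pool {
	return &Pool{addr: addr, idle: make(chan net.Conn, size)}
}

// Get hands out an idle connection if one is available, otherwise dials a new one.
func (p *Pool) Get() (net.Conn, error) {
	select {
	case c := <-p.idle:
		return c, nil
	default:
		return net.Dial("tcp", p.addr)
	}
}

// Put returns a connection to the pool instead of closing it; if the pool
// is already full, the connection is closed.
func (p *Pool) Put(c net.Conn) error {
	select {
	case p.idle <- c:
		return nil
	default:
		return c.Close()
	}
}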

But we have a problem here. At any given time, the connections in the pool should be valid. By valid I mean there should always be an existing peer socket on the other side. If there is no peer, it makes no sense to keep the connection in the pool, as writing to it won't really transmit anything to anyone because the receiver no longer exists. How can a connection in the pool become invalid? 1.) the peer goes down, 2.) the server is restarted, 3.) a network partition. Elaborating on the second case: the client has some connections to the server in its pool and the server restarts. Nothing happens to the connections in the pool; they are just lying in memory, not doing anything, even though the server side has already closed each of them. (Consider the client and server to be on the same machine.) The client will only learn about the closed state of a connection when it tries to use it, i.e. on the next request, which will obviously fail because the socket is closed. This is the behaviour one would observe when using Go's database/sql, go-redis or Python's MySQLdb, and it may go on as long as the strategy is to reuse connections and there are still stale connections in the pool.

We need a way to detect such invalid connections and remove them from the pool. One way would be to detect, efficiently, whether the connection is closed as soon as we get it from the pool. But it seems no one does that; libraries just fail and probably rely on upper layers to retry.

This is a difficult problem to solve efficiently.

There are 2 ways to solve this:

Firstly, enable keepalive on the TCP connections. Doing so makes the kernel send probes to the other end of the connection and wait for a reply. If the reply comes within a set period, we know the other side is alive; otherwise the connection will be closed and removed from the pool. Here is how it looks in Wireshark:

[Image: TCP keepalive in Wireshark]

The client (port 13718) sends a keepalive probe to the server running on port 8000, and the server replies with an acknowledgement, asserting its existence. Sounds good. But this is an approximate solution because of the way keepalive works. Let's say we enable keepalive on a TCP connection and set the interval to 30s, which means a probe is sent every 30 seconds. Now reset the clock to 0 and assume the server went down (unceremoniously) at the 5th second. The next probe will only go out after another 25 seconds, which means trying to use the connection between the 5th and the 30th second would result in a failed request. We can reduce the scope of the problem by increasing the frequency of probes, but that obviously comes at the cost of network/CPU overhead. (One open question here: how do we detect that the TCP connection has been closed? Is it trivial?)
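In Go, keepalive can be turned on per connection. A minimal sketch, with the 30s period mirroring the example above (the address is a placeholder):

package main

import (
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:80")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	tcpConn := conn.(*net.TCPConn)
	// Ask the kernel to send keepalive probes on an otherwise idle connection.
	if err := tcpConn.SetKeepAlive(true); err != nil {
		log.Fatal(err)
	}
	// Send a probe after every 30 seconds of idleness.
	if err := tcpConn.SetKeepAlivePeriod(30 * time.Second); err != nil {
		log.Fatal(err)
	}
	// Setting net.Dialer{KeepAlive: 30 * time.Second} achieves the same thing at dial time.
}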

The second solution (which I have not verified in practice) is pretty trivial. Whenever we get a connection from the pool, we check whether the server is reachable by sending some sort of ping over it and waiting for a response. If it works, we're good. This comes at the cost of 1 extra request per request made to the server, but the solution is complete: the client never has to go through a failed request because of a bad underlying connection, and if the ping/pong fails, the client can always dial a new connection.
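Continuing the Pool sketch from earlier (same package), this is roughly what a ping-before-use wrapper could look like. The ping argument is a placeholder for whatever cheap request/response round trip the protocol offers (e.g. Redis PING):

// getHealthy returns a pooled connection only after a successful
// application-level ping; otherwise it discards the stale connection
// and dials a fresh one.
func getHealthy(p *Pool, ping func(net.Conn) error) (net.Conn, error) {
	c, err := p.Get()
	if err != nil {
		return nil, err
	}
	if err := ping(c); err != nil {
		c.Close() // stale or broken connection, drop it
		return net.Dial("tcp", p.addr)
	}
	return c, nil
}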

How does Go do it?

In Go we talk to an HTTP server using http.Client, which contains a transport of type http.RoundTripper responsible for making the request and getting back the response. This transport is what actually pools connections for reuse.

Transport exposes 4 options to control the use of pooled connections (a small example follows the list):
1. MaxIdleConns - the total number of idle connections a single transport can hold; the default 0 means no limit.
2. MaxIdleConnsPerHost - as the name suggests, the limit per host; defaults to 2.
3. IdleConnTimeout - the maximum time a connection can stay in the idle pool; the default 0 means no timeout.
4. DisableKeepAlives - turns connection pooling off or on; defaults to false.
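For example, a client with all four knobs set explicitly (the values and the URL are arbitrary):

package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

func main() {
	tr := &http.Transport{
		MaxIdleConns:        100,              // idle connections kept across all hosts
		MaxIdleConnsPerHost: 10,               // idle connections kept per host
		IdleConnTimeout:     90 * time.Second, // how long an idle connection may sit in the pool
		DisableKeepAlives:   false,            // keep pooling enabled
	}
	client := &http.Client{Transport: tr, Timeout: 10 * time.Second}

	resp, err := client.Get("http://freshworks.com")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// Drain the body so the connection can go back into the idle pool.
	io.Copy(ioutil.Discard, resp.Body)
}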

As discussed above, if we create a client whose transport's DisableKeepAlives is set to true, every request sent to the server will carry a Connection header set to close. And as soon as the client has read the whole response body and closed it, the connection will be terminated by the server.

It is worth noting that even if the client has enabled keep-alive, it is ultimately the server's keep-alive configuration that determines whether connections can be reused. A Go HTTP server's keep-alive behaviour can be configured using (*http.Server).SetKeepAlivesEnabled(bool). By default it is true, but if it is set to false it does not matter that http.Transport.DisableKeepAlives on the client is false: no keep-alive will happen between the server and the client. Similarly, if the client is talking to the server via a reverse proxy, keep-alive needs to be enabled on the proxy as well for connections to be reused.
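On the server side it looks roughly like this (a minimal sketch; the handler and port are arbitrary):

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})

	srv := &http.Server{Addr: ":8000", Handler: mux}
	// With keep-alives disabled, every response carries "Connection: close",
	// no matter what the client's Transport asks for.
	srv.SetKeepAlivesEnabled(false)
	log.Fatal(srv.ListenAndServe())
}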

That is about the configuration part.

Let's talk about a potential issue that can arise due to a programming error. Once we've got the response back from the server, Go asks us to read the body fully and then close it (note that resp.Body is of type io.ReadCloser) to facilitate reuse of keep-alive connections.

Quoting from the doc:

type Response struct {
    // ...

    // It is the caller's responsibility to
    // close Body. The default HTTP client's Transport may not
    // reuse HTTP/1.x "keep-alive" TCP connections if the Body is
    // not read to completion and closed.
    Body io.ReadCloser

    // ...
}

And that makes sense: if we have not emptied the connection before putting it back into the pool, the next request that picks it up could end up reading the previous response's leftover data, which is still sitting there waiting to be read!
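A hypothetical helper that does both steps (drainAndClose is my name, not a standard-library function):

// drainAndClose reads the response body to EOF and closes it, which is
// what lets the Transport put the underlying connection back into its
// idle pool for reuse.
func drainAndClose(resp *http.Response) error {
	defer resp.Body.Close()
	_, err := io.Copy(ioutil.Discard, resp.Body)
	return err
}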

But sometimes connections will still end up in the pool even if they are not explicitly drained and closed. The case I'm talking about here is when the server responds with no data, i.e. when the Content-Length header's value is 0.

What happens when the returned response is not assigned to anything, i.e. it is assigned to the blank identifier, as in _, err = c.Do(req)? Well, this is a bad idea because it makes it impossible to do what we're supposed to do when we are done with the response: read it all and close it. So connections won't be reused.
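For illustration, the two patterns side by side (drainAndClose is the hypothetical helper from above, and the function names are made up):

// Bad: the response, and therefore its Body, is thrown away without
// being read or closed, so the connection will not be reused.
func fireAndForget(c *http.Client, req *http.Request) error {
	_, err := c.Do(req)
	return err
}

// Better: keep the response, then drain and close its body so the
// connection can go back into the pool.
func fireAndReuse(c *http.Client, req *http.Request) error {
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	return drainAndClose(resp)
}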

If you liked this post, follow me on twitter.