source: code/trunk/cmd/soju/main.go@ 483

Last change on this file since 483 was 477, checked in by delthas, 4 years ago

Increase downstream TCP keepalive interval to 1 hour

The rationale for increasing the TCP keepalive interval from 15 seconds
(default) to 1 hour follows.

  • Why increasing TCP keepalives for downstream connections is not an issue with regard to detecting connection interruptions

The use case of TCP keepalives is detecting that a TCP connection was
forcefully shut down without any TCP FIN or RST segment being received,
when no data is sent from that endpoint to the other peer.

If any data is sent from the peer and is not ACKed because the
connection was interrupted, the socket will be closed after the TCP RTO
(usually a few seconds) anyway, without the need for TCP keepalives.

Therefore the only use of TCP keepalives is making sure that a peer
that is not writing anything to the socket, but is actively reading and
waiting for new stream data, can detect that the connection was
interrupted (instead of waiting forever for packets that will never
arrive), close the connection locally, then try to reconnect to its
peer.

This only makes sense from a client point of view. When an IRC client
is not write(2)ing anything to the socket but is simply waiting for new
messages to arrive, i.e. read(2)ing on the socket, it must ensure that
the connection is still alive so that any new messages will indeed be
delivered to it. So that IRC client should probably enable TCP
keepalives.

However, when an IRC server is not writing anything to its downstream
socket, it doesn't care if it misses any messages from its downstream
client: in any case, the downstream client will quickly detect that its
messages are not reaching the server, because of the TCP RTO
(keepalives are not even needed on the client in that specific case),
and will try to reconnect to the server.

Thus TCP keepalives should be enabled for upstream connections, in
order to make sure that soju does not miss any messages coming from
upstream servers, but TCP keepalives are not needed for downstream
connections.

  • Why increasing TCP keepalives for downstream connections is not an issue with regard to security, performance, and server socket resource exhaustion

TCP keepalives are orthogonal to security. Malicious clients can open
thousands of TCP connections and keep them open with minimal
bookkeeping, and TCP keepalives will not prevent attacks that aim to
exhaust all sockets available to soju.

It is also unlikely that soju will keep many dead connections open. Even
if thousands of dead, disconnected connections lingered in soju, any
upstream message that needs to be sent to downstreams would flush them
out: the writes would fail after the TCP RTO (a few seconds), and the
dead downstreams would be disconnected. Performance could only be
slightly affected during the few seconds before a TCP RTO, and only if
many messages were sent to a very large number of disconnected
connections, which is extremely unlikely and would have little impact
on performance anyway.

  • Why increasing TCP keepalives could be helpful to some clients running on mobile devices

In the current state of IRC, most clients running on mobile devices
(mostly Android and iOS) will probably need to stay connected at all
times, even when the application is in the background, in order to
receive private messages and highlight notifications and to keep a
complete chat history (and possibly to reduce connection traffic by
avoiding the initial message burst, including the NAMES and WHO
replies, which are quite large).

This means most IRC clients on mobile devices will keep a socket open
at all times, in the background. When a mobile device runs on a
cellular data connection, it uses the phone's wireless radio to
transmit all TCP packets, including TCP packets without user data, for
example TCP keepalives.

On a typical mobile device, the wireless radio consumes significant
power when fully active, so it switches between several energy states
in order to conserve power when not in use. It typically has 3 energy
states, from Standby (when no messages are sent), through Low Power, to
Full Power, and switches modes on an average time scale of 15s. This
means that any time a TCP packet is sent from any socket on the device,
the radio switches to a high-power energy state, sends the packet,
stays in that state for around 15s, then goes back to Standby. This
does include TCP keepalives.

If a TCP keepalive of 15s were used, the IRC server would force all
clients running on mobile devices to send a TCP keepalive packet at
least once every 15s, which means the radio would stay in its
high-power energy state at all times. This would consume a very
significant amount of power and drain the battery much faster.

Even though it might seem at first that a mobile device has many
different sockets open at any time, in practice a typical Android
device has only one background socket open, to Firebase Cloud
Messaging, for receiving instant push notifications (for example, the
equivalent of IRC highlights on other messaging platforms), and perhaps
a socket open for the current foreground app. When the current
foreground app does not use the network, or when no app is currently
used and the phone is in sleep mode, and no notifications are sent, the
device can effectively have no wireless radio usage at all. This makes
removing TCP keepalives extremely significant with regard to mobile
device battery usage.

Increasing the TCP keepalive from soju lets downstream clients choose
their own keepalive interval and therefore possibly save battery for
mobile devices. Most modern mobile devices have complex heuristics for
when to sleep the CPU and wireless radio, and have specific rules for
TCP keepalives depending on the current internet connection, sleep
state, etc.

By increasing the downstream TCP keepalive to such a high period, soju
lets clients choose their optimal TCP keepalive period, which means
that in turn clients can let their mobile device platform choose that
keepalive for them, thus saving battery in those cases.

package main

import (
	"context"
	"crypto/tls"
	"flag"
	"log"
	"net"
	"net/http"
	"net/url"
	"os"
	"os/signal"
	"strings"
	"sync/atomic"
	"syscall"
	"time"

	"github.com/pires/go-proxyproto"

	"git.sr.ht/~emersion/soju"
	"git.sr.ht/~emersion/soju/config"
)

// TCP keep-alive interval for downstream TCP connections
const downstreamKeepAlive = 1 * time.Hour

func main() {
	var listen, configPath string
	var debug bool
	flag.StringVar(&listen, "listen", "", "listening address")
	flag.StringVar(&configPath, "config", "", "path to configuration file")
	flag.BoolVar(&debug, "debug", false, "enable debug logging")
	flag.Parse()

	var cfg *config.Server
	if configPath != "" {
		var err error
		cfg, err = config.Load(configPath)
		if err != nil {
			log.Fatalf("failed to load config file: %v", err)
		}
	} else {
		cfg = config.Defaults()
	}

	if listen != "" {
		cfg.Listen = append(cfg.Listen, listen)
	}
	if len(cfg.Listen) == 0 {
		cfg.Listen = []string{":6697"}
	}

	db, err := soju.OpenSQLDB(cfg.SQLDriver, cfg.SQLSource)
	if err != nil {
		log.Fatalf("failed to open database: %v", err)
	}

	var tlsCfg *tls.Config
	var tlsCert atomic.Value
	if cfg.TLS != nil {
		cert, err := tls.LoadX509KeyPair(cfg.TLS.CertPath, cfg.TLS.KeyPath)
		if err != nil {
			log.Fatalf("failed to load TLS certificate and key: %v", err)
		}
		tlsCert.Store(&cert)

		tlsCfg = &tls.Config{
			GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
				return tlsCert.Load().(*tls.Certificate), nil
			},
		}
	}

	srv := soju.NewServer(db)
	// TODO: load from config/DB
	srv.Hostname = cfg.Hostname
	srv.LogPath = cfg.LogPath
	srv.HTTPOrigins = cfg.HTTPOrigins
	srv.AcceptProxyIPs = cfg.AcceptProxyIPs
	srv.Debug = debug

	for _, listen := range cfg.Listen {
		listenURI := listen
		if !strings.Contains(listenURI, ":/") {
			// This is a raw domain name, make it an URL with an empty scheme
			listenURI = "//" + listenURI
		}
		u, err := url.Parse(listenURI)
		if err != nil {
			log.Fatalf("failed to parse listen URI %q: %v", listen, err)
		}

		switch u.Scheme {
		case "ircs", "":
			if tlsCfg == nil {
				log.Fatalf("failed to listen on %q: missing TLS configuration", listen)
			}
			host := u.Host
			if _, _, err := net.SplitHostPort(host); err != nil {
				host = host + ":6697"
			}
			ircsTLSCfg := tlsCfg.Clone()
			ircsTLSCfg.NextProtos = []string{"irc"}
			lc := net.ListenConfig{
				KeepAlive: downstreamKeepAlive,
			}
			l, err := lc.Listen(context.Background(), "tcp", host)
			if err != nil {
				log.Fatalf("failed to start TLS listener on %q: %v", listen, err)
			}
			ln := tls.NewListener(l, ircsTLSCfg)
			ln = proxyProtoListener(ln, srv)
			go func() {
				if err := srv.Serve(ln); err != nil {
					log.Printf("serving %q: %v", listen, err)
				}
			}()
		case "irc+insecure":
			host := u.Host
			if _, _, err := net.SplitHostPort(host); err != nil {
				host = host + ":6667"
			}
			lc := net.ListenConfig{
				KeepAlive: downstreamKeepAlive,
			}
			ln, err := lc.Listen(context.Background(), "tcp", host)
			if err != nil {
				log.Fatalf("failed to start listener on %q: %v", listen, err)
			}
			ln = proxyProtoListener(ln, srv)
			go func() {
				if err := srv.Serve(ln); err != nil {
					log.Printf("serving %q: %v", listen, err)
				}
			}()
		case "unix":
			ln, err := net.Listen("unix", u.Path)
			if err != nil {
				log.Fatalf("failed to start listener on %q: %v", listen, err)
			}
			ln = proxyProtoListener(ln, srv)
			go func() {
				if err := srv.Serve(ln); err != nil {
					log.Printf("serving %q: %v", listen, err)
				}
			}()
		case "wss":
			addr := u.Host
			if _, _, err := net.SplitHostPort(addr); err != nil {
				addr = addr + ":https"
			}
			httpSrv := http.Server{
				Addr:      addr,
				TLSConfig: tlsCfg,
				Handler:   srv,
			}
			go func() {
				if err := httpSrv.ListenAndServeTLS("", ""); err != nil {
					log.Fatalf("serving %q: %v", listen, err)
				}
			}()
		case "ws+insecure":
			addr := u.Host
			if _, _, err := net.SplitHostPort(addr); err != nil {
				addr = addr + ":http"
			}
			httpSrv := http.Server{
				Addr:    addr,
				Handler: srv,
			}
			go func() {
				if err := httpSrv.ListenAndServe(); err != nil {
					log.Fatalf("serving %q: %v", listen, err)
				}
			}()
		case "ident":
			if srv.Identd == nil {
				srv.Identd = soju.NewIdentd()
			}

			host := u.Host
			if _, _, err := net.SplitHostPort(host); err != nil {
				host = host + ":113"
			}
			ln, err := net.Listen("tcp", host)
			if err != nil {
				log.Fatalf("failed to start listener on %q: %v", listen, err)
			}
			ln = proxyProtoListener(ln, srv)
			go func() {
				if err := srv.Identd.Serve(ln); err != nil {
					log.Printf("serving %q: %v", listen, err)
				}
			}()
		default:
			log.Fatalf("failed to listen on %q: unsupported scheme", listen)
		}

		log.Printf("server listening on %q", listen)
	}

	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)

	if err := srv.Start(); err != nil {
		log.Fatal(err)
	}

	for sig := range sigCh {
		switch sig {
		case syscall.SIGHUP:
			if cfg.TLS != nil {
				log.Print("reloading TLS certificate")
				cert, err := tls.LoadX509KeyPair(cfg.TLS.CertPath, cfg.TLS.KeyPath)
				if err != nil {
					log.Printf("failed to reload TLS certificate and key: %v", err)
					break
				}
				tlsCert.Store(&cert)
			}
		case syscall.SIGINT, syscall.SIGTERM:
			log.Print("shutting down server")
			srv.Shutdown()
			return
		}
	}
}

func proxyProtoListener(ln net.Listener, srv *soju.Server) net.Listener {
	return &proxyproto.Listener{
		Listener: ln,
		Policy: func(upstream net.Addr) (proxyproto.Policy, error) {
			tcpAddr, ok := upstream.(*net.TCPAddr)
			if !ok {
				return proxyproto.IGNORE, nil
			}
			if srv.AcceptProxyIPs.Contains(tcpAddr.IP) {
				return proxyproto.USE, nil
			}
			return proxyproto.IGNORE, nil
		},
	}
}