/*
 * FD polling functions for Speculative I/O combined with Linux epoll()
 *
 * Copyright 2000-2008 Willy Tarreau <w@1wt.eu>
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 *
 * This code implements "speculative I/O" under Linux. The principle is to
 * try to perform expected I/O before registering the events in the poller.
 * Each time this succeeds, it saves an expensive epoll_ctl(). It generally
 * succeeds for all reads after an accept(), and for writes after a connect().
 * It also improves performance for streaming connections because even if only
 * one side is polled, the other one may react accordingly depending on the
 * level of the buffer.
 *
 * It presents some drawbacks though. If too many events are set for spec I/O,
 * those ones can starve the polled events. Experiments show that when polled
 * events starve, they quickly turn into spec I/O, making the situation even
 * worse. While we can reduce the number of polled events processed at once,
 * we cannot do this on speculative events because most of them are new ones
 * (avg 2/3 new - 1/3 old from experiments).
 *
 * The solution against this problem relies on those two factors :
 *   1) one FD registered as a spec event cannot be polled at the same time
 *   2) even during very high loads, we will almost never be interested in
 *      simultaneous read and write streaming on the same FD.
 *
 * The first point implies that during starvation, we will not have more than
 * half of our FDs in the poll list, otherwise it means there is less than
 * that in the spec list, implying there is no starvation.
 *
 * The second point implies that we're statistically only interested in half
 * of the maximum number of file descriptors at once, because we are unlikely
 * to have simultaneous reads and writes on the same buffer during long
 * periods.
 *
 * So, if we make it possible to drain maxsock/2/2 events during peak loads,
 * then we can ensure that there will be no starvation effect. This means
 * that we must always allocate maxsock/4 events for the poller.
 */
#include <unistd.h>
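The header comment above describes trying the I/O before registering the event with the poller. A minimal standalone sketch of that pattern, assuming a hypothetical helper `speculative_read()` (not HAProxy's actual API): attempt the `recv()` immediately, and only fall back to `epoll_ctl()` when the socket would block.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Try the I/O first; only register with epoll when it would block.
 * Every successful speculative read saves one epoll_ctl() call and
 * one trip through epoll_wait() for that fd.
 * Returns bytes read (or 0 on EOF), or -1 if the fd was handed to
 * the poller instead.
 */
static ssize_t speculative_read(int epfd, int fd, void *buf, size_t len)
{
	ssize_t ret = recv(fd, buf, len, MSG_DONTWAIT);

	if (ret >= 0)
		return ret;	/* speculation succeeded: no epoll_ctl() needed */

	if (errno == EAGAIN || errno == EWOULDBLOCK) {
		/* nothing ready yet: fall back to the poller */
		struct epoll_event ev;
		ev.events = EPOLLIN;
		ev.data.fd = fd;
		epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
	}
	return -1;
}
```

This is why reads after an accept() are such a good fit: the client usually sends its request right away, so the first speculative read tends to succeed.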
	 * succeeded. This reduces the number of unsuccessful calls to
	 * epoll_wait() by a factor of about 3, and the total number of calls
	 * However, when we do that after having processed too many events,
	 * events waiting in epoll() starve for too long a time and tend to
	 * become themselves eligible for speculative polling. So we try to
	 * limit this practice to reasonable situations.
	 */
	spec_processed += status;
	if (status >= MIN_RETURN_EVENTS && spec_processed < absmaxevents) {
		/* We have processed at least MIN_RETURN_EVENTS, it's worth
		 * returning now without checking epoll_wait().
		 */
	wait_time = __tv_ms_elapsed(&now, exp) + 1;

	/* now let's wait for real events. We normally use maxpollevents as a
	 * high limit, unless <nbspec> is already big, in which case we need
	 * to compensate for the high number of events processed there.
	 */
	fd = MIN(absmaxevents, spec_processed);
	fd = MAX(global.tune.maxpollevents, fd);
	fd = MIN(maxfd, fd);

	status = epoll_wait(epoll_fd, epoll_events, fd, wait_time);
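The clamping above can be checked in isolation. A small sketch with hypothetical sample values (maxpollevents = 200, absmaxevents = 1000), using MIN/MAX macros equivalent to the ones the poller relies on:

```c
/* Isolated version of the epoll_wait() limit computation: start from the
 * number of speculative events just processed, capped at absmaxevents,
 * but never go below the configured maxpollevents.
 */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int wait_limit(int absmaxevents, int spec_processed, int maxpollevents)
{
	int fd = MIN(absmaxevents, spec_processed);
	fd = MAX(maxpollevents, fd);
	return fd;
}
```

So with few speculative events processed the limit stays at maxpollevents (200); once 600 have been processed it rises to 600 to compensate, and it can never exceed absmaxevents (1000), the size of the event array.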
	if (epoll_fd < 0)

	/* See comments at the top of the file about this formula. */
	absmaxevents = MAX(global.tune.maxpollevents, global.maxsock/4);

	epoll_events = (struct epoll_event*)
		calloc(1, sizeof(struct epoll_event) * absmaxevents);

	if (epoll_events == NULL)