![NCHAN](https://raw.githubusercontent.com/slact/nchan/master/nchan_logo.png)

Nchan is a scalable, flexible pub/sub server for the modern web, built as a module for the Nginx web server. It can be configured as a standalone server, or as a shim between your application and tens, thousands, or millions of live subscribers. It can buffer messages in memory, on-disk, or via Redis. All connections are handled asynchronously and distributed among any number of worker processes. It can also scale to many nginx server instances with Redis.

Messages are published to channels with HTTP POST requests or websockets, and subscribers receive them through websockets, long-polling, EventSource (SSE), or old-fashioned interval polling. Any location can be a subscriber endpoint for up to 4 channels. Each subscriber can optionally be authenticated via a custom application URL, and an events meta channel is available for debugging.

## Status and History

The first iteration of Nchan was written in 2009-2010 as the Nginx HTTP Push Module. It was vastly refactored in 2014-2015, and here we are today. The present release is in the **testing** phase. The core features and old functionality are thoroughly tested and stable. Some of the new functionality, specifically *redis storage and channel events*, is *still experimental* and may be a bit buggy; the rest is somewhere in between.

Nchan is already very fast (parsing regular expressions within nginx uses more CPU cycles than all of the nchan code), but there is also a lot of room left for improvement. This release focuses on *correctness* and *stability*, with further optimizations (like zero-copy message publishing) planned for later.

Please help make the entire codebase ready for production use! Report any quirks, bugs, leaks, crashes, or larvae you find.

## Getting Started

### Download
##### Pre-built packages
A source package is available for [Arch Linux](https://archlinux.org): [nginx-nchan-git](https://aur.archlinux.org/packages/nginx-nchan-git/). A Debian package is coming soon.


### Build and Install
For now, build a recent Nginx version from source with the Nchan module added:
```
./configure --add-module=path/to/nchan ...
make
make install
```

## Usage

Nchan can be configured as a shim between your application and subscribers, a standalone pub/sub server for web clients, or as a websocket proxy with long-polling fallback for your application. There are many other use cases, but for now I will focus on the above.

### The Basics 

The basic unit of most pub/sub solutions is the messaging *channel*. Nchan is no different. Publishers send messages to channels with a certain *channel id*, and subscribers subscribed to those channels receive them. Pretty simple, right? 

Well... the trouble is that nginx configuration does not deal with channels, publishers, and subscribers. Rather, it matches incoming requests against *server* and *location* sections. *Nchan configuration directives map servers and locations onto channel publishing and subscribing endpoints*:

```nginx
# very basic nginx config
worker_processes 5;

events {}

http {
  server {
    listen       80;
    
    location = /sub {
      nchan_subscriber;
      nchan_channel_id foobar;
    }
    
    location = /pub {
      nchan_publisher;
      nchan_channel_id foobar;
    }
  }
}
```

The above maps requests to the URI `/sub` onto the channel `foobar`'s *subscriber endpoint*, and similarly `/pub` onto channel `foobar`'s *publisher endpoint*.


#### Publisher Endpoints

Publisher endpoints are Nginx config *locations* with the *`nchan_publisher`* directive.

Messages can be published to a channel by sending HTTP **POST** requests with the message contents to the *publisher endpoint* locations. You can also publish messages through a **Websocket** connection to the same location.

##### Publishing Messages

Nchan responds to publishing requests and websocket messages with information about the channel at the time of publication. Here's an example from publishing with `curl`:

```
> curl --request POST --data "test message" http://127.0.0.1:80/pub

queued messages: 5
last requested: 4 sec. ago (-1=never)
active subscribers: 0
```

The response can be in plaintext (as above), JSON, or XML, based on the request's *`Accept`* header:

```
> curl --request POST --data "test message" -H "Accept: text/json" http://127.0.0.1:80/pub

{"messages": 2, "requested": 4, "subscribers": 0 }
```

Websocket publishers also receive the same responses when publishing, with the encoding determined by the *`Accept`* header present during the handshake.

The response code for an HTTP request is *`202` Accepted* if no subscribers are present at time of publication, or *`201` Created* if at least 1 subscriber was present.
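
A minimal Python sketch of a publisher client, built only on the standard library, may make the request/response cycle concrete. It assumes a publisher location such as `/pub` from the basic config above; the function names (`build_publish_request`, `subscribers_were_present`, `publish`) are hypothetical helpers, not part of Nchan. It asks for a JSON reply with `Accept: text/json` and interprets the `201`/`202` status codes described above.

```python
import json
import urllib.request
from typing import Dict, Tuple


def build_publish_request(url: str, message: str) -> urllib.request.Request:
    """Build a POST that publishes `message` to a publisher endpoint (hypothetical helper).

    The Accept header asks for a JSON channel-info reply instead of plaintext.
    """
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Accept": "text/json"},
        method="POST",
    )


def subscribers_were_present(status: int) -> bool:
    """Interpret the publish status: 201 Created = at least one subscriber, 202 Accepted = none."""
    if status == 201:
        return True
    if status == 202:
        return False
    raise ValueError(f"unexpected publisher response status: {status}")


def publish(url: str, message: str, timeout: float = 5.0) -> Tuple[bool, Dict]:
    """Publish one message and return (subscribers_present, channel_info)."""
    request = build_publish_request(url, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        info = json.loads(response.read().decode("utf-8"))
        return subscribers_were_present(response.status), info
```

Against the basic config, `publish("http://127.0.0.1:80/pub", "test message")` should return `False` as its first element while nobody is subscribed, since Nchan answers `202` in that case.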

##### Other Publisher Endpoint Actions

- HTTP `GET` requests return channel information without publishing a message. The response code is `200` if the channel exists, and `404` otherwise:

  ```
  > curl --request POST --data "test message" http://127.0.0.1:80/pub
  ...
  
  > curl -v --request GET -H "Accept: text/json" http://127.0.0.1:80/pub
  
  {"messages": 1, "requested": 7, "subscribers": 0 }
  ```

- HTTP `DELETE` requests delete a channel. Like the `GET` requests, this returns a `200` status response with channel info if the channel existed, and a `404` otherwise.
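
The same JSON-over-HTTP approach covers the `GET` and `DELETE` actions. This hedged Python sketch assumes the endpoint honors `Accept: text/json` for `DELETE` as it does for `GET`; `channel_info` and `delete_channel` are hypothetical helper names. A `404` is mapped to `None`, and any other error status is re-raised.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def _channel_request(url: str, method: str) -> Optional[dict]:
    """Send GET or DELETE to a publisher endpoint and return the JSON channel info.

    Returns None when the server answers 404 (the channel does not exist).
    """
    request = urllib.request.Request(url, headers={"Accept": "text/json"}, method=method)
    try:
        with urllib.request.urlopen(request, timeout=5.0) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # any other error status is unexpected here


def channel_info(url: str) -> Optional[dict]:
    """Look up channel info without publishing (HTTP GET)."""
    return _channel_request(url, "GET")


def delete_channel(url: str) -> Optional[dict]:
    """Delete the channel (HTTP DELETE); return its info, or None if it did not exist."""
    return _channel_request(url, "DELETE")
```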

#### Subscriber Endpoints

Subscriber endpoints are Nginx config *locations* with the *`nchan_subscriber`* directive.

Nchan supports several different kinds of subscribers for receiving messages: *Websocket*, *EventSource* (Server-Sent Events), *Long-Poll*, and *Interval-Poll*.

- *Long-Polling*  
  Initiated by sending an HTTP `GET` request to a channel subscriber endpoint.  
  The long-polling subscriber walks through a channel's message queue via the built-in cache mechanism of HTTP clients, namely the "`Last-Modified`" and "`Etag`" headers. Explicitly, to receive the message that follows a given long-poll response, send a request with the "`If-Modified-Since`" header set to that response's "`Last-Modified`" header, and "`If-None-Match`" likewise set to its "`Etag`" header.  
  Sending a request without the "`If-Modified-Since`" and "`If-None-Match`" headers returns the first message in a channel's message queue.
  
- *Interval-Polling*  
  Works just like long-polling, except that if the requested message is not yet available, the server immediately responds with `304 Not Modified`.

- *Websocket*  
  Nchan supports the latest protocol version 13 (RFC 6455). To use a websocket subscriber, initiate a connection to the desired subscriber endpoint location.  
  If the websocket connection is closed by the server, the `close` frame will contain the HTTP response code and status line describing the reason for closing the connection.  
  Websocket extensions and subprotocols are not yet supported.
  
- *EventSource* (Server-Sent Events)  
  Initiated by sending an HTTP `GET` request to a channel subscriber endpoint with the "`Accept: text/event-stream`" header. Each message `data: ` segment will be prefaced by the message `id: `.  
  To resume a closed EventSource connection from the last-received message, initiate the connection with the "`Last-Event-ID`" header set to the last message's `id`.
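
The long-polling handshake described above boils down to echoing two response headers back on the next request. Below is a minimal Python sketch of such a subscriber loop, using only the standard library; `next_poll_headers` and `long_poll` are hypothetical names. It assumes that a `304 Not Modified` simply means "nothing new yet", as with interval-polling, and retries after a short, arbitrary pause. The default `timeout=None` waits indefinitely, which suits long-polling when `nchan_subscriber_timeout` is 0.

```python
import time
import urllib.error
import urllib.request
from typing import Iterator, Mapping, Optional


def next_poll_headers(previous: Optional[Mapping[str, str]]) -> dict:
    """Caching headers for the next long-poll request (hypothetical helper).

    `previous` holds the last response's headers (None before the first request).
    Echoing Last-Modified/Etag back as If-Modified-Since/If-None-Match asks for
    the message that follows the one already received.
    """
    headers = {}
    if previous is not None:
        last_modified = previous.get("Last-Modified")
        etag = previous.get("Etag")
        if last_modified is not None:
            headers["If-Modified-Since"] = last_modified
        if etag is not None:
            headers["If-None-Match"] = etag
    return headers


def long_poll(url: str, timeout: Optional[float] = None,
              retry_delay: float = 1.0) -> Iterator[bytes]:
    """Yield message bodies from a subscriber endpoint, one request per message."""
    previous = None
    while True:
        request = urllib.request.Request(url, headers=next_poll_headers(previous))
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                previous = response.headers  # case-insensitive header lookup
                body = response.read()
        except urllib.error.HTTPError as err:
            if err.code != 304:
                raise
            # 304: nothing new yet; keep the same caching headers and ask again later.
            time.sleep(retry_delay)
            continue
        yield body
```

For example, `for body in long_poll("http://127.0.0.1:80/sub"): print(body)` prints each message published to `foobar` under the basic config as it arrives.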


#### PubSub Endpoint  

PubSub endpoints are Nginx config *locations* with the *`nchan_pubsub`* directive.

A combination of *publisher* and *subscriber* endpoints, this location treats all HTTP `GET` requests as subscribers, and all HTTP `POST` requests as publishers. One simple use case is an echo server:

```nginx
  location = /pubsub {
    nchan_pubsub;
    nchan_channel_id foobar;
  }
```

A more applicable setup may set different publisher and subscriber channel ids:

```nginx
  location = /pubsub {
    nchan_pubsub;
    nchan_publisher_channel_id foo;
    nchan_subscriber_channel_id bar;
  }
```

Here, subscribers will listen for messages on channel `bar`, and publishers will publish messages to channel `foo`. This can be useful when setting up websocket proxying between web clients and your application.

### The Channel ID

So far the examples have used static channel ids, which are not very useful in practice. A channel id can be set to any nginx *variable*, such as a query string argument, a header value, or a part of the request URL:

```nginx
  location = /sub_by_ip {
    #channel id is the subscriber's IP address
    nchan_subscriber;
    nchan_channel_id $remote_addr;
  }
  
  location /sub_by_querystring {
    #channel id is the query string parameter chanid
    # GET /sub_by_querystring?foo=bar&chanid=baz will have the channel id set to 'baz'
    nchan_subscriber;
    nchan_channel_id $arg_chanid;
  }

  location ~ /sub/(\w+)$ {
    #channel id is the word after /sub/
    # GET /sub/foobar_baz will have the channel id set to 'foobar_baz'
    # I hope you know your regular expressions...
    nchan_subscriber;
    nchan_channel_id $1; #first capture of the location match
  }
```

#### Channel Multiplexing

Any subscriber location can be an endpoint for up to 4 channels. Messages published to any of the specified channels will be delivered in order to the subscriber. This is configured by specifying multiple channel ids for the `nchan_channel_id` or `nchan_subscriber_channel_id` config directive:

```nginx
  location ~ /multisub/(\w+)/(\w+)$ {
    nchan_subscriber;
    nchan_channel_id "$1" "$2" "common_channel";
    #GET /multisub/foo/bar will be subscribed to:
    # channels 'foo', 'bar', and 'common_channel',
    #and will receive messages from all of the above.
  }
```

Publishing to multiple channels from one location is not supported.

## Configuration Directives

- **nchan_channel_id**  
  default: `(none)`  
  context: server, location, if  
  > Channel id for a publisher or subscriber location. Can have up to 4 values to subscribe to up to 4 channels.    

- **nchan_publisher**  
  context: server, location, if  
  legacy name: push_publisher  
  > Defines a server or location as a message publisher. Requests to a publisher location are treated as messages to be sent to subscribers. See the protocol documentation for a detailed description.    

- **nchan_publisher_channel_id**  
  default: `(none)`  
  context: server, location, if  
  > Channel id for publisher location.    

- **nchan_pubsub**  
  default: `(none)`  
  context: server, location, if  
  > Defines a server or location as a publisher and subscriber endpoint. For long-polling, GETs subscribe and POSTs publish. For Websockets, publishing data on a connection does not yield a channel metadata response. Without additional configuration, this turns a location into an echo server.    

- **nchan_subscriber** `[ any | websocket | eventsource | longpoll | intervalpoll ]`  
  default: `any (websocket|eventsource|longpoll)`  
  context: server, location, if  
  legacy name: push_subscriber  
  > Defines a server or location as a subscriber. This location represents a subscriber's interface to a channel's message queue. The queue is traversed automatically via caching information request headers (If-Modified-Since and If-None-Match), beginning with the oldest available message. Requests for upcoming messages are handled in accordance with the setting provided. See the protocol documentation for a detailed description.    

- **nchan_subscriber_channel_id**  
  default: `(none)`  
  context: server, location, if  
  > Channel id for subscriber location. Can have up to 4 values to subscribe to up to 4 channels.    

- **nchan_subscriber_concurrency** `[ last | first | broadcast ]`  
  context: http, server, location, if  
  legacy name: push_subscriber_concurrency  
  > Controls how multiple subscriber requests to a channel (identified by some common ID) are handled. The values work as follows:
  > - broadcast: any number of concurrent subscriber requests may be held.
  > - last: only the most recent subscriber request is kept; all others get a 409 Conflict response.
  > - first: only the oldest subscriber request is kept; all others get a 409 Conflict response.

- **nchan_subscriber_first_message** `[ oldest | newest ]`  
  default: `newest`  
  context: server, location, if  
  > Controls the first message received by a new subscriber. 'oldest' returns the oldest available message in a channel's message queue; 'newest' waits until a message arrives.    

- **nchan_subscriber_timeout** `[ <number> ]`  
  default: `0 (none)`  
  context: http, server, location, if  
  legacy name: push_subscriber_timeout  
  > The length of time a subscriber's long-polling connection can last before it's timed out. If you don't want subscribers' connections to time out, set this to 0. Applicable only if an nchan_subscriber is present in this or a child context.    

- **nchan_authorize_request** `[ <url> ]`  
  context: server, location, if  
  > Sends a GET request to an internal location (which may proxy to an upstream server) to authorize a publisher or subscriber request. A 200 response authorizes the request; a 403 response forbids it.    

- **nchan_max_message_buffer_length** `[ <number> ]`  
  default: `10`  
  context: http, server, location  
  legacy name: push_max_message_buffer_length  
  > The maximum number of messages to store per channel. A channel's message buffer will retain at most this many most recent messages.    

- **nchan_max_reserved_memory** `[ <size> ]`  
  default: `32M`  
  context: http  
  legacy name: push_max_reserved_memory  
  > The size of the shared memory chunk this module will use for message queuing and buffering.    

- **nchan_message_buffer_length** `[ <number> ]`  
  default: `(none)`  
  context: http, server, location  
  legacy name: push_message_buffer_length  
  > The exact number of messages to store per channel. Sets both nchan_max_message_buffer_length and nchan_min_message_buffer_length to this value.    

- **nchan_message_timeout** `[ <time> ]`  
  default: `1h`  
  context: http, server, location  
  legacy name: push_message_timeout  
  > The length of time a message may be queued before it is considered expired. If you do not want messages to expire, set this to 0. Applicable only if an nchan_publisher is present in this or a child context.    

- **nchan_min_message_buffer_length** `[ <number> ]`  
  default: `1`  
  context: http, server, location  
  legacy name: push_min_message_buffer_length  
  > The minimum number of messages to store per channel. A channel's message buffer will retain at least this many most recent messages.    

- **nchan_redis_url**  
  default: `127.0.0.1:6379`  
  context: http  
  > The URL of a redis server, of the form 'redis://:password@hostname:6379/0'. Shorthand of the form 'host:port' or just 'host' is also accepted.    

- **nchan_store_messages** `[ on | off ]`  
  default: `on`  
  context: http, server, location, if  
  legacy name: push_store_messages  
  > Whether or not message queuing is enabled. "Off" is equivalent to the setting nchan_message_buffer_length 0.    

- **nchan_use_redis** `[ on | off ]`  
  default: `off`  
  context: http, server, location  
  > Use redis for message storage at this location.    

- **nchan_authorized_channels_only** `[ on | off ]`  
  default: `off`  
  context: http, server, location  
  legacy name: push_authorized_channels_only  
  > Whether or not a subscriber may create a channel by making a request to an nchan_subscriber location. If set to on, a publisher must send a POST or PUT request before a subscriber can request messages on the channel. Otherwise, all subscriber requests to nonexistent channels will get a 403 Forbidden response.    

- **nchan_channel_group** `[ <string> ]`  
  default: `(none)`  
  context: server, location, if  
  legacy name: push_channel_group  
  > Because settings are bound to locations and not individual channels, it is useful to be able to have channels that can be reached only from some locations and never others. That's where this setting comes in. Think of it as a prefix string for the channel id.    

- **nchan_channel_event_string** `[ <string> ]`  
  default: `$nchan_channel_event $nchan_channel_id`  
  context: server, location, if  
  > Contents of the channel event message.    

- **nchan_channel_events_channel_id**  
  context: server, location, if  
  > Channel id where `nchan_channel_id`'s events should be sent. Things like subscriber enqueue/dequeue, publishing messages, etc. Useful for application debugging. The channel event message is configurable via nchan_channel_event_string. The channel group for events is hardcoded to 'meta'.    

- **nchan_max_channel_id_length** `[ <number> ]`  
  default: `512`  
  context: http, server, location  
  legacy name: push_max_channel_id_length  
  > Maximum permissible channel id length (number of characters). Longer ids will be truncated.    

- **nchan_max_channel_subscribers** `[ <number> ]`  
  default: `0 (unlimited)`  
  context: http, server, location  
  legacy name: push_max_channel_subscribers  
  > Maximum number of concurrent subscribers per channel.    

- **nchan_channel_timeout**  
  context: http, server, location  
  legacy name: push_channel_timeout  

- **nchan_storage_engine**  
  context: http, server, location  
  legacy name: push_storage_engine  
  > Development directive to completely replace the default storage engine. Don't use it unless you know what you're doing.    
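
To make the three `nchan_subscriber_concurrency` policies concrete, here is a toy Python model of a single channel. It only illustrates the behavior described for that directive and is not how Nchan implements it; the `ToyChannel` class and its methods are hypothetical, and it assumes long-poll-style subscribers that are released once a message is delivered to them.

```python
from typing import List, Tuple


class ToyChannel:
    """Toy in-memory model of nchan_subscriber_concurrency (hypothetical, not Nchan code).

    Subscribers are plain string ids; `rejected` records the ones that would have
    received a 409 Conflict response.
    """

    def __init__(self, policy: str) -> None:
        if policy not in ("broadcast", "last", "first"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.waiting: List[str] = []   # subscriber requests currently held
        self.rejected: List[str] = []  # subscribers answered with 409 Conflict

    def subscribe(self, subscriber: str) -> None:
        if self.policy == "broadcast" or not self.waiting:
            self.waiting.append(subscriber)
        elif self.policy == "last":
            # The newcomer wins; every request already held is kicked out.
            self.rejected.extend(self.waiting)
            self.waiting = [subscriber]
        else:  # "first"
            # The oldest held request wins; the newcomer is refused.
            self.rejected.append(subscriber)

    def publish(self, message: str) -> List[Tuple[str, str]]:
        """Deliver `message` to every held subscriber, releasing them (long-poll style)."""
        delivered = [(subscriber, message) for subscriber in self.waiting]
        self.waiting = []
        return delivered
```

Subscribing `"a"` then `"b"` leaves only `"b"` waiting under `last`, only `"a"` under `first`, and both under `broadcast`.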

## Contribute
Please support this project with a donation to keep me warm through the winter. I accept bitcoin at 1NHPMyqSanG2BC21Twqi8Pf1pXXgbPuLdJ . Other donation methods can be found at https://nchan.slact.net