~ubuntu-branches/ubuntu/lucid/rsyslog/lucid


Viewing changes to doc/syslog-protocol.html

Committer: Bazaar Package Importer
Author(s): Michael Biebl
Date: 2007-10-19 17:21:49 UTC
Revision ID: james.westby@ubuntu.com-20071019172149-ie6ej2xve33mxiu7

Tags: upstream-1.19.10

Import upstream version 1.19.10

files added:

AUTHORS

COPYING

ChangeLog

INSTALL

Makefile.am

Makefile.in

NEWS

README

aclocal.m4

action.c

action.h

cfsysline.c

cfsysline.h

compile

config.guess

config.h.in

config.sub

configure

configure.ac

contrib

contrib/README

depcomp

doc/Makefile.am

doc/Makefile.in

doc/bugs.html

doc/contributors.html

doc/features.html

doc/generic_design.html

doc/history.html

doc/how2help.html

doc/install.html

doc/ipv6.html

doc/man_rsyslogd.html

doc/manual.html

doc/modules.html

doc/property_replacer.html

doc/rsconf1_actionexeconlyifpreviousissuspended.html

doc/rsconf1_actionresumeinterval.html

doc/rsconf1_allowedsender.html

doc/rsconf1_controlcharacterescapeprefix.html

doc/rsconf1_debugprintcfsyslinehandlerlist.html

doc/rsconf1_debugprintmodulelist.html

doc/rsconf1_debugprinttemplatelist.html

doc/rsconf1_dircreatemode.html

doc/rsconf1_dirgroup.html

doc/rsconf1_dirowner.html

doc/rsconf1_dropmsgswithmaliciousdnsptrrecords.html

doc/rsconf1_droptrailinglfonreception.html

doc/rsconf1_dynafilecachesize.html

doc/rsconf1_escapecontrolcharactersonreceive.html

doc/rsconf1_failonchownfailure.html

doc/rsconf1_filecreatemode.html

doc/rsconf1_filegroup.html

doc/rsconf1_fileowner.html

doc/rsconf1_includeconfig.html

doc/rsconf1_mainmsgqueuesize.html

doc/rsconf1_moddir.html

doc/rsconf1_modload.html

doc/rsconf1_repeatedmsgreduction.html

doc/rsconf1_resetconfigvariables.html

doc/rsconf1_umask.html

doc/rsyslog_conf.html

doc/rsyslog_mysql.html

doc/rsyslog_packages.html

doc/rsyslog_php_syslog_ng.html

doc/rsyslog_recording_pri.html

doc/rsyslog_stunnel.html

doc/status.html

doc/syslog-protocol.html

doc/version_naming.html

freebsd

freebsd/rsyslogd

iminternal.c

iminternal.h

install-sh

klogd.c

klogd.h

ksym.c

ksym_mod.c

ksyms.h

liblogging-stub.h

linkedlist.c

linkedlist.h

ltmain.sh

missing

module-template.h

module.h

modules.c

modules.h

msg.c

msg.h

net.c

net.h

objomsr.c

objomsr.h

omdiscard.c

omdiscard.h

omfile.c

omfile.h

omfwd.c

omfwd.h

omshell.c

omshell.h

omusrmsg.c

omusrmsg.h

outchannel.c

outchannel.h

parse.c

parse.h

pidfile.c

pidfile.h

plugins

plugins/ommysql

plugins/ommysql/Makefile.am

plugins/ommysql/Makefile.in

plugins/ommysql/contrib

plugins/ommysql/contrib/delete_mysql

plugins/ommysql/createDB.sql

plugins/ommysql/ommysql.c

plugins/ommysql/ommysql.h

redhat

redhat/rsyslog.conf

redhat/rsyslog.init

redhat/rsyslog.log

redhat/rsyslog.sysconfig

rfc3195d.8

rfc3195d.c

rklogd.8

rsyslog.conf.5

rsyslog.h

rsyslogd.8

slackware

slackware/rc.rsyslogd

srUtils.c

srUtils.h

stringbuf.c

stringbuf.h

syslog.c

syslogd-types.h

syslogd.c

syslogd.h

tcpsyslog.c

tcpsyslog.h

template.c

template.h


doc/syslog-protocol.html

<html>

<head>

<title>syslog-protocol support in rsyslog</title>

</head>

<body>

<h1>syslog-protocol support in rsyslog</h1>

<p><b><a href="http://www.rsyslog.com/">Rsyslog</a> provides a trial
implementation of the proposed
syslog-protocol standard.</b> The intention of this implementation is to

find out what inside syslog-protocol is causing problems during implementation.

As syslog-protocol is a standard under development, its support in rsyslog is

highly volatile. It may change from release to release. So while it provides

some advantages in the real world, users are cautioned against using it right

now. If you do, be prepared that you will probably need to update all of your

rsyslogds with each new release. If you try it anyhow, please provide feedback

as that would be most beneficial for us.</p>

<h2>Currently supported message format</h2>

<p>Due to recent discussion on syslog-protocol, we do not follow any specific

revision of the draft but rather the candidate ideas. The format supported

currently is:</p>

<p><b><code>&lt;PRI&gt;VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID SP [SD-ID]s
SP MSG</code></b></p>

<p>Field syntax and semantics are as defined in IETF I-D syslog-protocol-15.</p>
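<p>The format above can be illustrated with a minimal parsing sketch. This is
not rsyslogd's code: the names <code>ProtocolMessage</code>,
<code>parse_protocol_message</code> and <code>_split_structured_data</code>
are hypothetical, and the sketch assumes single SP separators and that a
backslash escapes the next character inside a structured data element. As
described in the findings, the fields are extracted but not validated.</p>

```python
import re
from typing import NamedTuple, Tuple


class ProtocolMessage(NamedTuple):
    pri: int
    version: int
    timestamp: str
    hostname: str
    app_name: str
    procid: str
    msgid: str
    structured_data: str
    msg: str


# "<PRI>VERSION" followed by five SP-terminated header fields.
_HEADER = re.compile(r"<(\d{1,3})>(\d{1,2}) (\S+) (\S+) (\S+) (\S+) (\S+) ")


def _split_structured_data(rest: str) -> Tuple[str, str]:
    """Split 'STRUCTURED-DATA SP MSG' without validating the structured data."""
    if not rest.startswith("["):
        # e.g. "-" when no structured data is present
        sd, _, msg = rest.partition(" ")
        return sd, msg
    i, inside = 0, False
    while i < len(rest):
        ch = rest[i]
        if inside:
            if ch == "\\":
                i += 1          # skip the escaped character (assumption)
            elif ch == "]":
                inside = False  # end of one [..] element
        elif ch == "[":
            inside = True       # start of the next element
        else:
            break               # first character after the last element
        i += 1
    sd = rest[:i]
    msg = rest[i + 1:] if rest[i:i + 1] == " " else rest[i:]
    return sd, msg


def parse_protocol_message(line: str) -> ProtocolMessage:
    """Extract the fields of one message; no field is validated."""
    m = _HEADER.match(line)
    if m is None:
        raise ValueError("header does not match the supported format")
    pri, version, timestamp, hostname, app_name, procid, msgid = m.groups()
    sd, msg = _split_structured_data(line[m.end():])
    return ProtocolMessage(int(pri), int(version), timestamp, hostname,
                           app_name, procid, msgid, sd, msg)
```

<p>A caller would pass one received message as a string and read the fields
by name, for example <code>parse_protocol_message(line).app_name</code>.</p>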

<h2>Capabilities Implemented</h2>

<ul>

<li>receiving messages in the supported format (see above)</li>

<li>sending messages in the supported format</li>

<li>relaying messages</li>

<li>receiving messages in either legacy or -protocol format and transforming

them into the other one</li>

<li>virtual availability of TAG, PROCID, APP-NAME, MSGID, SD-ID no matter if

the message was received via legacy format, API or syslog-protocol format (non-present

fields are being emulated with great success)</li>

<li>maximum message size is set via preprocessor #define</li>

<li>syslog-protocol messages can be transmitted both over UDP and plain TCP

with some restrictions on compliance in the case of TCP</li>

</ul>
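<p>Accepting both formats on the same receiver requires telling them apart.
The sketch below assumes the detection rule mentioned in the findings, namely
that a syslog-protocol message carries the version cookie "1 " directly after
the PRI. The function name <code>detect_format</code> is hypothetical, and the
check shares the weakness noted in the findings: a legacy message whose text
happens to start with "1 " would be misdetected.</p>

```python
import re

# The PRI part, e.g. b"<165>", at the very start of the datagram.
_PRI = re.compile(rb"<\d{1,3}>")


def detect_format(raw: bytes) -> str:
    """Return "syslog-protocol" if the version cookie follows the PRI, else "legacy"."""
    m = _PRI.match(raw)
    if m is not None and raw.startswith(b"1 ", m.end()):
        return "syslog-protocol"
    # Everything else, including data without a PRI, is treated as legacy.
    return "legacy"
```

<p>Note that the trailing chunk of a message split by the UDP stack has no PRI
and is therefore classified as legacy by this check, which matches the
behaviour described in the findings.</p>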

<h2>Findings</h2>

<p>This lists what has been found during implementation:</p>

<ul>

<li>The same receiver must be able to support both legacy and

syslog-protocol syslog messages. Anything else would be a big inconvenience

to users and would make deployment much harder. The detection must be done

automatically (see below on how easy that is).</li>

<li><b>NUL characters inside MSG</b> cause the message to be truncated at

that point. This is probably a major point for many C-based implementations.

No measures have yet been taken against this. Modifying the code to "cleanly"

support NUL characters is non-trivial, even though rsyslogd already has some

byte-counted string library (but this is new and not yet available

everywhere).</li>

<li><b>character encoding in MSG</b>: it is problematic to do the right

UTF-8 encoding. The reason is that we pick up the MSG from the local domain

socket (which got it from the syslog(3) API). The text obtained does not

include any encoding information, but it does include non US-ASCII

characters. It may also include any other encoding. Other than by guessing

based on the provided text, I have no way to find out what it is. In order

to make the syslogd do anything useful, I have now simply taken the message

as is and stuffed it into the MSG part. Please note that I think this will

be a route that other implementors would take, too.</li>

<li>A minimal parser is easy to implement. It took me roughly 2 hours to add

it to rsyslogd. This includes the time for restructuring the code to be able

to parse both legacy syslog as well as syslog-protocol. The parser has some

restrictions, though<ul>

<li>STRUCTURED-DATA field is extracted, but not validated. Structured data

"[test ]]" is not caught as an error. Nor are any other errors caught. For

my needs with this syslogd, that level of structured data processing is

probably sufficient. I do not want to parse/validate it in all cases. This

is also a performance issue. I think other implementors could have the same

view. As such, we should not make validation a requirement.</li>

<li>MSG is not further processed (e.g. Unicode not being validated)</li>

<li>the other header fields are also extracted, but no validation is
performed right now. At least some validation should be easy to add (I have
not done this because it is a proof-of-concept and scheduled to change).</li>

</ul>

</li>

<li>Universal access to all syslog fields (missing ones being emulated) was

also quite easy. It took me around another 2 hours to integrate emulation of

non-present fields into the code base.</li>

<li>The version at the start of the message makes it easy to detect if we

have legacy syslog or syslog-protocol. Do NOT move it to somewhere inside
the middle of the message, as that would complicate things. It might not be
totally fail-safe to just rely on "1 " as the "cookie" for a syslog-protocol.
Possibly, it would be good to add some more uniqueness, e.g. "@#1 ".</li>

<li>I have no (easy) way to detect truncation if that happens on the UDP

stack. All I see is that I receive e.g. a 4K message. If the message was e.g.

6K, I received two chunks. The first chunk (4K) is correctly detected as a

syslog-protocol message, the second (2K) as legacy syslog. I do not see what

we could do against this. This questions the usefulness of the TRUNCATE bit.

Eventually, I could look at the UDP headers and see that it is a fragment. I

have looked at a network sniffer log of the conversation. This looks like

two totally-independent messages were sent by the sender stack.</li>

<li>The maximum message size is currently being configured via a

preprocessor #define. It can easily be set to 2K or 4K, but more than 4K is

not possible because of UDP stack limitations. Eventually, this can be

worked around, but I have not done this yet.</li>

<li>rsyslogd can accept syslog-protocol formatted messages but is able to
relay them in legacy format. I find this a must in real-life deployments.
For this, I needed to do some field mapping so that APP-NAME/PROCID are
mapped into a TAG.</li>

<li>rsyslogd can also accept legacy syslog messages and relay them in
syslog-protocol format. For this, I needed to apply some sub-parsing of the
TAG, which on most occasions provides correct results. There might be some
misinterpretations but I consider these to be mostly non-intrusive.</li>

<li>Messages received from the syslog API (the normal case under *nix) also
do not have APP-NAME and PROCID and I must parse them out of TAG as
described directly above. As such, this algorithm is absolutely vital to
make things work on *nix.</li>

<li>I have an issue with messages received via the syslog(3) API (or, to be
more precise, via the local domain socket this API writes to): These
messages contain a timestamp, but that timestamp has neither the year
nor the high-resolution time. The year is no real issue, I just take the
year of the reception of that message. There is a very small window of
exposure for messages read from the log immediately after midnight Jan 1st.
The message in the domain socket might have been written immediately before
midnight in the old year. I think this is acceptable. However, I cannot
assign a high-precision timestamp, at least it is somewhat off if I take the
timestamp from message reception on the local socket. An alternative might
be to ignore the timestamp present and instead use the time at which the
message is pulled from the local socket (I am talking about IPC, not the
network - just a reminder...). This is doable, but possibly not advisable. It
looks like this needs to be resolved via a configuration option.</li>

<li>rsyslogd already advertises its origin information on application
startup (in a syslog-protocol-14 compatible format). It is fairly easy to
include that with any message if desired (not currently done).</li>

<li>A big problem I noticed is malformed messages. In -syslog-protocol, we
recommend/require to discard malformed messages. However, in practice users
would like to see everything that the syslogd receives, even if it is in
error. For the first version, I have not included any error handling at all.
However, I think I would deliberately ignore any "discard" requirement. My
current point of view is that in my code I would eventually flag a message
as being invalid and allow the user to filter on this invalidity. So these
invalid messages could be redirected into special bins.</li>

<li>The error logging recommendations (those I insisted on;)) are not really
practical. My application has its own error logging philosophy and I will
not change this to follow a draft.</li>

<li>Relevance of support for leap seconds and senders without knowledge of
time is questionable. I have not made any specific provisions in the code
nor would I know how to handle that differently. I could, however, pull the
local reception timestamp in this case, so it might be useful to have this
feature. I will not think any more about this for the initial
proof-of-concept, but note it as a potential problem area, especially when
logging to databases.</li>

<li>The HOSTNAME field for internally generated messages currently contains
the hostname part only, not the FQDN. This can be changed inside the code
base, but it requires some thinking so that things are kept compatible with
legacy syslog. I have not done this for the proof-of-concept, but I think it
is not really bad. Maybe an hour or half a day of thinking.</li>

<li>It is possible that I did not receive a TAG with legacy syslog or via
the syslog API. In this case, I cannot generate the APP-NAME. For
consistency, I have used "-" in such cases (just like in PROCID, MSGID and
STRUCTURED-DATA).</li>

<li>As an architectural side-effect, syslog-protocol formatted messages can
also be transmitted over non-standard syslog/raw TCP. This implementation
uses the industry-standard LF termination of TCP syslog records. As such,
syslog-protocol messages containing a LF will be split incorrectly. There is
nothing that can be done against this without specifying a TCP transport.
This issue might be more important than one thinks at first. The
reason is the wide deployment of syslog/tcp via industry standard.</li>


</ul>
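<p>The field mapping between TAG and APP-NAME/PROCID described in the findings
can be sketched as follows. This is an illustration only, not the algorithm
used inside rsyslogd: the function names and the assumed TAG shape
<code>name[pid]:</code> are assumptions. Unknown values are rendered as "-",
as described above.</p>

```python
import re

# Assumed legacy TAG shape: "name", "name:", "name[pid]" or "name[pid]:".
_TAG = re.compile(r"^([^\[\]:\s]+)(?:\[([^\]]*)\])?:?$")


def tag_to_fields(tag):
    """Sub-parse a legacy TAG into (APP-NAME, PROCID); "-" marks unknown values."""
    m = _TAG.match(tag or "")
    if m is None:
        # No TAG received, or a TAG this sketch cannot interpret.
        return ("-", "-")
    return (m.group(1), m.group(2) or "-")


def fields_to_tag(app_name, procid):
    """Map APP-NAME/PROCID back into a legacy TAG (APP-NAME is used as-is)."""
    if procid == "-":
        return app_name + ":"
    return "%s[%s]:" % (app_name, procid)
```

<p>For a typical TAG such as <code>sshd[1234]:</code> the two functions are
inverse to each other. TAGs of a different shape are where the
misinterpretations mentioned above would show up.</p>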


<p><b>Some notes on syslog-transport-udp-06</b></p>


<ul>

<li>I did not make any low-level modifications to the UDP code and think I
am still basically covered by this I-D.</li>

<li>I deliberately violate section 3.3 insofar as I do not necessarily
accept messages destined to port 514. This feature is user-required and a
must. The same applies to the destination port. I am not sure whether the
"MUST" in section 3.3 means that this MUST be available as an option, but
need not necessarily be active. The wording should be clarified.</li>

<li>section 3.6: I do not check checksums. See the issue with discarding
messages above. The same solution will probably be applied in my code.</li>


</ul>

<h2>Conclusions/Suggestions</h2>

<p>These are my personal conclusions and suggestions. Obviously, they must be
discussed ;)</p>


<ul>

<li>NUL should be disallowed in MSG</li>

<li>As it is not possible to definitely know the character encoding of the
application-provided message, MSG should <b>not</b> be specified to use UTF-8
exclusively. Instead, it is suggested that any encoding may be used but
UTF-8 is preferred. To detect UTF-8, the MSG should start with the UTF-8
byte order mark "EF BB BF" if it is UTF-8 encoded (see section 15.9 of
<a href="http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf">
http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf</a>)</li>

<li>Requirements to drop messages should be reconsidered. I guess I would
not be the only implementor ignoring them.</li>

<li>Logging requirements should be reconsidered and probably be removed.</li>

<li>It would be advisable to specify "-" for APP-NAME if the name is not
known to the sender.</li>

<li>The implications of the current syslog/tcp industry standard on
syslog-protocol should be further evaluated and be fully understood.</li>


</ul>
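<p>The suggested UTF-8 detection could look like the following sketch. It
assumes the receiver only checks for the leading bytes "EF BB BF" and
otherwise keeps the MSG bytes as they are. The function name
<code>split_bom</code> is hypothetical.</p>

```python
import codecs
from typing import Tuple


def split_bom(raw: bytes) -> Tuple[bool, bytes]:
    """Return (is_utf8, payload); payload has the byte order mark removed."""
    if raw.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return True, raw[len(codecs.BOM_UTF8):]
    # No mark: the encoding is unknown, so the bytes are passed on untouched.
    return False, raw
```

<p>A receiver would decode the payload as UTF-8 only when the first element is
true and treat everything else as text of unknown encoding.</p>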


</body>


</html>
