
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 3. Writing a DLF Schema</title><meta name="generator" content="DocBook XSL Stylesheets V1.68.1"><link rel="start" href="index.html" title="Lire Developer's Manual"><link rel="up" href="pt02.html" title="Part II. Using the Lire Framework"><link rel="prev" href="ch02s07.html" title="DLF Converter API"><link rel="next" href="ch04.html" title="Chapter 4. Writing a New DLF Analyser"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 3. Writing a DLF Schema</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch02s07.html">Prev</a> </td><th width="60%" align="center">Part II. Using the <span class="application">Lire</span> Framework</th><td width="20%" align="right"> <a accesskey="n" href="ch04.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="chap:writing-dlf-schema"></a>Chapter 3. Writing a DLF Schema</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="ch03.html#sect:ftpproto-schema">Designing the <span class="type">ftpproto</span> schema</a></span></dt><dd><dl><dt><span class="section"><a href="ch03.html#id2518877">Creating The Schema File</a></span></dt><dt><span class="section"><a href="ch03.html#id2518985">Adding the Schema's Description</a></span></dt><dt><span class="section"><a href="ch03.html#id2519033">Defining the Schema's Fields</a></span></dt><dt><span class="section"><a href="ch03.html#schema-installation">Installing The Schema</a></span></dt></dl></dd></dl></div><p>If you want to develop a DLF converter for an application
        whose logging data model isn't adequately represented by 
        one of the existing DLF schemas, you'll need to develop a new
        one.
      </p><p>If you are familiar with SQL, a DLF schema is similar to
        a table schema description. A DLF file can be seen as a
        table, where each log record is represented by a table row.
        Each log record in the same DLF schema shares the same
        fields.
      </p><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sect:ftpproto-schema"></a>Designing the <span class="type">ftpproto</span> schema</h2></div></div></div><p>In this chapter, we will create a new schema for logging
          FTP sessions. That DLF schema could serve as the basis for an improved
          DLF converter for log files generated by <span class="productname">Microsoft Internet Information Server</span>&#8482;. <span class="application">Lire</span>
          currently has a DLF converter for these log files, but the
          current <span class="type">ftp</span> DLF schema is modelled after the
          <span class="type">xferlog</span> log file, which only represents file
          transfers, whereas the log generated by <span class="productname">Microsoft Internet Information Server</span>&#8482; contains more
          detailed information on the FTP session.
        </p><p>Here is an example of such a log file: 

          </p><pre class="programlisting">
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2001-11-29 00:01:32
#Fields: time c-ip cs-method cs-uri-stem sc-status
00:01:32 10.0.0.1 [56]created spacedat/091001092951LGW_Data.zip 226
00:01:32 10.0.0.1 [56]created spacedat/html/bx01g01.gif 226
00:01:32 10.0.0.1 [56]created spacedat/html/catlogo.gif 226
00:01:32 10.0.0.1 [56]QUIT - 226
00:03:32 10.0.0.1 [58]USER badm 331
00:03:32 10.0.0.1 [58]PASS - 230
          </pre><p>

          As you can see, this log file contains information
          beyond the simple upload/download represented in the
          standard FTP schema. It includes a session identifier, the command
          executed, as well as the result code of the action. Our new
          schema should be able to represent these things.
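        </p><p>To make the mapping from log line to schema concrete, here is a
          minimal Python sketch that splits one data line of the sample log
          above into the pieces our new schema will need. The
          <code class="function">parse_line</code> helper and the dictionary keys are
          hypothetical names chosen for illustration; they are not part of
          <span class="application">Lire</span>. The sketch assumes that the five
          columns are separated by whitespace and never contain spaces themselves.
        </p><pre class="programlisting">
import re

# Hypothetical helper, not part of Lire. The cs-method column of the
# sample log looks like "[56]created": a session number in brackets
# followed by the command.
METHOD_RE = re.compile(r"\[(\d+)\](.+)")

def parse_line(line):
    # Assumption: exactly five whitespace-separated columns, in the
    # order given by the #Fields header of the sample log.
    time, client_ip, method, uri_stem, status = line.split()
    match = METHOD_RE.fullmatch(method)
    if match is None:
        raise ValueError("unexpected cs-method value: " + method)
    return {
        "time": time,
        "client_ip": client_ip,
        "session": match.group(1),
        "command": match.group(2),
        "argument": uri_stem,
        "status": status,
    }

record = parse_line("00:03:32 10.0.0.1 [58]USER badm 331")
print(record["session"], record["command"], record["status"])  # prints: 58 USER 331
        </pre><p>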
        </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2518877"></a>Creating The Schema File</h3></div></div></div><p>To create a DLF schema, you have to create an XML file
            named after your schema identifier:
            <code class="filename">ftpproto.xml</code>. The schema name should be
            made of alphanumeric characters. This schema identifier is
            case sensitive. Your schema identifier shouldn't contain
            hyphens (<code class="literal">-</code>) or underscore characters
            (<code class="literal">_</code>). (The hyphen is used for a special
            purpose.)
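          </p><p>These naming rules are easy to check mechanically. The following
            Python sketch is only an illustration: the function names are
            hypothetical and not part of <span class="application">Lire</span>,
            and it assumes that &#8220;alphanumeric&#8221; means ASCII letters
            and digits.
          </p><pre class="programlisting">
def is_valid_schema_id(name):
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    # Hyphens, underscores and the empty string are all rejected.
    return name.isascii() and name.isalnum()

def schema_filename(name):
    # The schema file is named after the schema identifier.
    if not is_valid_schema_id(name):
        raise ValueError("invalid schema identifier: " + name)
    return name + ".xml"

print(schema_filename("ftpproto"))  # prints: ftpproto.xml
          </pre><p>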
          </p><p>All DLF schemas start and end the same way:
            </p><pre class="programlisting">

&lt;?xml version="1.0" encoding="ascii"?&gt;
&lt;!DOCTYPE lire:dlf-schema PUBLIC
  "-//LogReport.ORG//DTD Lire DLF Schema Markup Language V1.1//EN"
  "http://www.logreport.org/LDSML/1.1/ldsml.dtd"&gt;
&lt;lire:dlf-schema xmlns:lire="http://www.logreport.org/LDSML/"

              superservice="<em class="replaceable"><code>ftpproto</code></em>"
              timestamp="<em class="replaceable"><code>time</code></em>"

              &gt;
&lt;!-- Other elements will go here --&gt;
&lt;/lire:dlf-schema&gt;

            </pre><p>

            The first lines contain the usual XML declaration and
            DOCTYPE declaration that you'll find in many XML documents.
            The real content starts at the
            <code class="sgmltag-element">lire:dlf-schema</code> element. What is important for
            your schema are the values of the <code class="sgmltag-attribute">superservice</code> and <code class="sgmltag-attribute">timestamp</code> attributes. The
            first one contains your schema identifier. It is called
            &#8220;<span class="quote">superservice</span>&#8221; for historical reasons. The
            other one should contain the name of the field which
            orders the records by their event time. (See <a href="ch03.html#sect:schema-field-type" title="The Field Types">the section called &#8220;The Field Types&#8221;</a> for more information.)
          </p><p>The last line in the above excerpt would be the last
            thing in the file and closes the
            <code class="sgmltag-element">lire:dlf-schema</code> element.
          </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2518985"></a>Adding the Schema's Description</h3></div></div></div><p>The next things that go into the schema file are the
            schema's title and description. Both are intended for
            developers to read and should be informative of the scope
            of the schema:

            </p><pre class="programlisting">

 &lt;!-- Starting lire:dlf-schema element was omitted --&gt;

  &lt;lire:title&gt;DLF Schema for FTP Protocol&lt;/lire:title&gt;

  &lt;lire:description&gt;
    &lt;para&gt;This DLF schema should be used for FTP servers that have
          detailed information on the FTP connection in their log
          files.
    &lt;/para&gt;
    &lt;para&gt;Each record represents a command done by the client during
     the FTP session.
    &lt;/para&gt;
  &lt;/lire:description&gt;


            </pre><p>
          </p><p>The content of the <code class="sgmltag-element">lire:description</code>
            element consists of DocBook elements. If you don't know DocBook,
            you just need to know that paragraphs are delimited using the
            <code class="sgmltag-element">para</code> elements.
          </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2519033"></a>Defining the Schema's Fields</h3></div></div></div><p>The only remaining things in the schema definitions
            are the field specifications. Here is the definition of
            the first one:
            </p><pre class="programlisting">

  &lt;lire:field name="time" type="timestamp" label="Timestamp"&gt;
    &lt;lire:description&gt;
      &lt;para&gt;This field contains the timestamp at which the command was
              issued.
      &lt;/para&gt;
    &lt;/lire:description&gt;
  &lt;/lire:field&gt;

            </pre><p>
          </p><p>As you can see, the fields are defined using the
            <code class="sgmltag-element">lire:field</code> element which has three
            attributes:
            </p><div class="variablelist"><dl><dt><span class="term">name</span></dt><dd><p>This attribute contains the name of the field.
                  This name should contain only alphanumeric
                  characters. It can also make use of the underscore character.
                  </p></dd><dt><span class="term">type</span></dt><dd><p>This attribute contains the type of the field.
                    The available types will be described shortly.
                  </p></dd><dt><span class="term">label</span></dt><dd><p>This should contain the column label that
                    should be used by default in your report for data
                    coming from this field. This label should be short
                    but descriptive.
                  </p></dd></dl></div><p>
          </p><p>The field's description is held in the
            <code class="sgmltag-element">lire:description</code> element which contains
            DocBook markup. The field's description should be
            descriptive enough so that someone implementing a DLF
            converter for this schema knows what goes where.
          </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sect:schema-field-type"></a>The Field Types</h4></div></div></div><p>The main types available for fields are:

              </p><div class="variablelist"><dl><dt><span class="term">timestamp</span></dt><dd><p>This should be used for fields which contain
                      a value indicating a particular point in time.
                      All timestamp values are represented in the
                      usual UNIX convention: number of seconds since
                      January 1st 1970.
                    </p><p>Each DLF schema must contain at least one
                      field of this kind and its name should be in the
                      <code class="sgmltag-element">lire:dlf-schema</code>'s <code class="sgmltag-attribute">timestamp</code>
                      attribute.
                    </p></dd><dt><span class="term">hostname</span></dt><dd><p>This type should be used for fields which
                      contain a hostname <span class="emphasis"><em>or</em></span> IP
                      address.
                    </p><p>It is important to mark such fields, because
                      it will eventually be possible to automatically
                      resolve IP addresses to hostnames.
                    </p></dd><dt><span class="term">bool</span></dt><dd><p>Type for boolean values.</p></dd><dt><span class="term">number</span></dt><dd><p>Type for numeric values.</p><div class="important" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Important</h3><p>You shouldn't use this type when the
                        values are limited in number and are
                        semantically an enumeration, such as a
                        result code. You should use the
                        <span class="type">string</span> type for this.
                      </p><p>You should only use the <span class="type">number</span>
                        type for values which you'll want to report in
                        classes instead of as individual values.
                      </p></div></dd><dt><span class="term">bytes</span></dt><dd><p>This type should be used for numeric values
                      which are quantities in bytes. The more specific
                      typing is useful for display purposes.
                    </p></dd><dt><span class="term">duration</span></dt><dd><p>This type should be used for numeric values
                      which are quantities of time. The more specific
                      typing is useful for display purposes.
                    </p></dd><dt><span class="term">string</span></dt><dd><p>This is the type which can be used for all
                      other purposes.
                    </p></dd></dl></div><p>
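              Because <span class="type">timestamp</span> values are stored as
              seconds since January 1st 1970, a converter for the sample log
              shown earlier would have to combine the date from the
              <code class="literal">#Date</code> header with each record's time
              column. Here is a minimal Python sketch of that step. The
              <code class="function">to_epoch</code> name is hypothetical, and the
              sketch assumes that the times in the log are expressed in UTC,
              which you should verify for your own log files.
            </p><pre class="programlisting">
from datetime import datetime, timezone

def to_epoch(date_str, time_str):
    # Assumption: the log's date and time are expressed in UTC.
    moment = datetime.strptime(date_str + " " + time_str,
                               "%Y-%m-%d %H:%M:%S")
    # Attach the UTC zone so the result does not depend on the
    # local time zone of the machine running the converter.
    return int(moment.replace(tzinfo=timezone.utc).timestamp())

# Date from the "#Date:" header, time from the first record.
print(to_epoch("2001-11-29", "00:01:32"))  # prints: 1006992092
            </pre><p>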
            </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>If you read the specifications, you'll find other
                types which are used. These additional types don't
                bring anything over the basic ones defined above and
                you shouldn't use them.
              </p></div><p>In addition to the <span class="type">time</span> field defined
              above, here are the remaining field definitions which
              make up our complete <span class="type">ftpproto</span> schema:
              </p><pre class="programlisting">

  &lt;lire:field name="sessid" type="string" label="Session"&gt;
    &lt;lire:description&gt;
     &lt;para&gt;This field should contain an identifier that can be used
     to relate the commands done in the same FTP session. This
     identifier can be reused, but not while the FTP session
     is still open.
     &lt;/para&gt;
    &lt;/lire:description&gt;
  &lt;/lire:field&gt;

  &lt;lire:field name="command" type="string" label="Command"&gt;
    &lt;lire:description&gt;
     &lt;para&gt;This field contains the FTP command executed. The FTP
      protocol command names (STOR, RETR, APPE, USER, etc.) should be used.
     &lt;/para&gt;
    &lt;/lire:description&gt;
  &lt;/lire:field&gt;

  &lt;lire:field name="result" type="string" label="Result"&gt;
    &lt;lire:description&gt;
     &lt;para&gt;This field should contain the FTP result code after executing
     the command.
     &lt;/para&gt;
    &lt;/lire:description&gt;
  &lt;/lire:field&gt;

  &lt;lire:field name="cmd_args" type="string" label="Argument"&gt;
    &lt;lire:description&gt;
     &lt;para&gt;This field should contain the parameters to the FTP command.
     &lt;/para&gt;
    &lt;/lire:description&gt;
  &lt;/lire:field&gt;

  &lt;lire:field name="size" type="bytes" label="Bytes Transferred"&gt;
    &lt;lire:description&gt;
     &lt;para&gt;When the command involves a transfer, as with the RETR or STOR
      commands, this field should contain the number of bytes transferred.
     &lt;/para&gt;
    &lt;/lire:description&gt;
  &lt;/lire:field&gt;

  &lt;lire:field name="elapsed" type="duration" label="Elapsed"&gt;
    &lt;lire:description&gt;
     &lt;para&gt;This field contains the number of seconds executing the
           command took. 
     &lt;/para&gt;
    &lt;/lire:description&gt;
  &lt;/lire:field&gt;

              </pre><p>
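              Once the schema file is assembled, a quick sanity check can
              catch a mistyped field name before you install it. The
              following Python sketch is not a <span class="application">Lire</span>
              tool: <code class="function">list_fields</code> is a hypothetical
              helper that uses only the Python standard library. It does not
              validate the file against the DTD. It only lists the declared
              fields and confirms that the field named in the
              <code class="sgmltag-attribute">timestamp</code> attribute exists and has
              the <span class="type">timestamp</span> type.
            </p><pre class="programlisting">
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:lire attribute of the schema.
LIRE_NS = "http://www.logreport.org/LDSML/"

def list_fields(path):
    """Return (name, type) pairs for every lire:field in a schema file."""
    root = ET.parse(path).getroot()
    fields = [
        (field.get("name"), field.get("type"))
        for field in root.iter("{" + LIRE_NS + "}field")
    ]
    # The schema's timestamp attribute must name a timestamp field.
    wanted = (root.get("timestamp"), "timestamp")
    if wanted not in fields:
        raise ValueError("no timestamp field named " + str(wanted[0]))
    return fields
            </pre><p>Calling <code class="literal">list_fields("ftpproto.xml")</code>
              on the finished file should return one pair per field, starting with
              the <span class="type">time</span> field if it is declared first.
            </p><p>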
            </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="schema-installation"></a>Installing The Schema</h3></div></div></div><p>Making the new schema available to the <span class="application">Lire</span>
            framework is pretty easy: just copy the file to one of the
            directories set in the <code class="varname">lr_schemas_path</code>
            configuration variable. By default, this variable contains
            the directories
            <code class="filename"><em class="replaceable"><code>datadir</code></em>/lire/schemas</code> 
            and
            <code class="filename"><em class="replaceable"><code>HOME</code></em>/.lire/schemas</code>. 
            Like all other configuration variables, its value can be
            changed using the <span><strong class="command">lire</strong></span> tool.
          </p><p>Since we want our schema to be available for other
            users as well, we will install it in the system directory:
            </p><pre class="screen">

# install -m 644 ftpproto.xml /usr/local/share/lire/schemas

            </pre><p>

            (In this case, <span class="application">Lire</span> was installed under <code class="filename">/usr/local</code>.)
          </p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch02s07.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="pt02.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ch04.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">DLF Converter API </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 4. Writing a New DLF Analyser</td></tr></table></div></body></html>