<!-- $Id: EXTENDING.html 198 2002-09-04 01:17:32Z darren $ -->
<html>
<head>
<title>Exuberant Ctags: Adding support for a new language</title>
</head>
<body>

<h1>How to Add Support for a New Language to Exuberant Ctags</h1>

<p>
<b>Exuberant Ctags</b> has been designed to make it very easy to add your own
custom language parser. As an exercise, let us assume that I want to add
support for my new language, <em>Swine</em>, the successor to Perl (i.e. Perl
before Swine &lt;wince&gt;). This language consists of simple definitions of
labels in the form "<code>def my_label</code>". Let us now examine the various
ways to do this.
</p>

<h2>Operational background</h2>

<p>
As ctags considers each file name, it tries to determine the language of the
file by applying the following three tests in order: whether the file
extension has been mapped to a language, whether the file name matches a shell
pattern mapped to a language, and finally whether the file is executable and
its first line specifies an interpreter using the Unix-style "#!"
specification (if supported on the platform). If a language is identified, the
file is opened and the appropriate language parser is called to operate on the
currently open file. The parser reads through the file and, whenever it finds
an interesting token, calls a function to define a tag entry.
</p>
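
<p>
The order of these three tests can be illustrated with a small standalone
sketch. It is not the code ctags itself uses: the function
<code>guess_language()</code> and the three lookup tables are hypothetical
names invented for this illustration, and the shell-pattern test assumes a
platform that provides POSIX <code>fnmatch()</code>.
</p>

```c
#include <assert.h>
#include <fnmatch.h>   /* POSIX shell-pattern matching */
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup tables; ctags maintains its own per-language maps. */
static const struct { const char *ext, *lang; } ext_map [] = {
    { "c", "C" }, { "swn", "Swine" }
};
static const struct { const char *pattern, *lang; } pattern_map [] = {
    { "[Mm]akefile", "Make" }
};
static const struct { const char *interp, *lang; } interp_map [] = {
    { "perl", "Perl" }, { "sh", "Sh" }
};

#define COUNT(a) (sizeof (a) / sizeof ((a) [0]))

/* Return the language name for a file, or NULL if all three tests fail.
 * first_line may be NULL when the file content is not available. */
static const char *guess_language (const char *name, int executable,
                                   const char *first_line)
{
    size_t i;
    const char *base, *dot;

    assert (name != NULL);
    base = strrchr (name, '/');
    base = (base != NULL) ? base + 1 : name;

    /* Test 1: the file extension is mapped to a language */
    dot = strrchr (base, '.');
    if (dot != NULL)
        for (i = 0  ;  i < COUNT (ext_map)  ;  ++i)
            if (strcmp (dot + 1, ext_map [i].ext) == 0)
                return ext_map [i].lang;

    /* Test 2: the file name matches a shell pattern */
    for (i = 0  ;  i < COUNT (pattern_map)  ;  ++i)
        if (fnmatch (pattern_map [i].pattern, base, 0) == 0)
            return pattern_map [i].lang;

    /* Test 3: an executable file names its interpreter after "#!" */
    if (executable  &&  first_line != NULL  &&
        strncmp (first_line, "#!", (size_t) 2) == 0)
    {
        const char *path = first_line + 2;
        const char *end, *prog;
        while (*path == ' ')
            ++path;
        end  = path + strcspn (path, " \t\n");   /* end of interpreter path */
        prog = end;
        while (prog > path  &&  prog [-1] != '/')  /* back up to base name */
            --prog;
        for (i = 0  ;  i < COUNT (interp_map)  ;  ++i)
            if (strlen (interp_map [i].interp) == (size_t) (end - prog)  &&
                strncmp (prog, interp_map [i].interp, (size_t) (end - prog)) == 0)
                return interp_map [i].lang;
    }
    return NULL;
}
```

<p>
With these tables, <code>guess_language ("pig.swn", 0, NULL)</code> is decided
by the first test, a file named <code>Makefile</code> by the second, and an
executable script whose first line is <code>#!/usr/bin/perl -w</code> by the
third.
</p>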

<h2>Creating a user-defined language</h2>

<p>
The quickest and easiest way to do this is by defining a new language using
the program options. In order to have Swine support available every time I
start ctags, I will place the following lines into the file
<code>$HOME/.ctags</code>, which is read in every time ctags starts:

<code>
<pre>
  --langdef=swine
  --langmap=swine:.swn
  --regex-swine=/^def[ \t]+([a-zA-Z0-9_]+)/\1/d,definition/
</pre>
</code>
The first line defines the new language, the second maps a file extension to
it, and the third defines a regular expression to identify a language
definition and generate a tag file entry for it.
</p>
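
<p>
To see what such an expression captures, the following standalone sketch
applies a pattern of this shape to a single line using the POSIX
<code>regcomp()</code> and <code>regexec()</code> functions. The helper name
<code>swine_tag_name()</code> is hypothetical, and the sketch assumes extended
regular expression syntax. The pattern here requires at least one blank after
<code>def</code>, which keeps a line such as <code>default x</code> from
producing a tag.
</p>

```c
#include <assert.h>
#include <regex.h>     /* POSIX regcomp(), regexec(), regfree() */
#include <stddef.h>
#include <string.h>

/* Copy the name captured by subexpression 1 into out (truncating if needed).
 * Returns 1 if the line matched, otherwise 0. */
static int swine_tag_name (const char *line, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m [2];   /* m [0] is the whole match, m [1] is the name */
    int found = 0;

    assert (line != NULL  &&  out != NULL  &&  outsize > 0);
    if (regcomp (&re, "^def[ \t]+([a-zA-Z0-9_]+)", REG_EXTENDED) != 0)
        return 0;
    if (regexec (&re, line, (size_t) 2, m, 0) == 0)
    {
        size_t len = (size_t) (m [1].rm_eo - m [1].rm_so);
        if (len >= outsize)
            len = outsize - 1;
        memcpy (out, line + m [1].rm_so, len);
        out [len] = '\0';
        found = 1;
    }
    regfree (&re);
    return found;
}
```

<p>
Called with the line <code>def my_label</code>, the helper stores
<code>my_label</code> in the output buffer, which corresponds to the
<code>\1</code> used as the tag name in the option above.
</p>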

<h2>Integrating a new language parser</h2>

<p>
Now suppose that I want to truly integrate compiled-in support for Swine into
ctags. First, I create a new module, <code>swine.c</code>, and add one
externally visible function to it, <code>extern parserDefinition
*SwineParser(void)</code>, and add its name to the table in
<code>parsers.h</code>. The job of this parser definition function is to
create an instance of the <code>parserDefinition</code> structure (using
<code>parserNew()</code>) and populate it with information defining how files
of this language are recognized, what kinds of tags it can locate, and the
function used to invoke the parser on the currently open file.
</p>

<p>
The structure <code>parserDefinition</code> allows assignment of the following
fields:

<code>
<pre>
  const char *name;               /* name of language */
  kindOption *kinds;              /* tag kinds handled by parser */
  unsigned int kindCount;         /* size of `kinds' list */
  const char *const *extensions;  /* list of default extensions */
  const char *const *patterns;    /* list of default file name patterns */
  parserInitialize initialize;    /* initialization routine, if needed */
  simpleParser parser;            /* simple parser (common case) */
  rescanParser parser2;           /* rescanning parser (unusual case) */
  boolean regex;                  /* is this a regex parser? */
</pre>
</code>
</p>

<p>
The <code>name</code> field must be set to a non-empty string. Also, unless
<code>regex</code> is set true (see below), either <code>parser</code> or
<code>parser2</code> must be set to point to a parsing routine which will
generate the tag entries. All other fields are optional.
</p>

<p>
Now all that is left is to implement the parser. In order to do its job, the
parser should read the file stream using one of the two I/O interfaces:
either the character-oriented <code>fileGetc()</code>, or the line-oriented
<code>fileReadLine()</code>. When using <code>fileGetc()</code>, the parser
can put back a character using <code>fileUngetc()</code>. How our Swine parser
actually parses the contents of the file is entirely up to the writer of the
parser; it can be as crude or elegant as desired. You will note a variety of
examples from the most complex (c.c) to the simplest (make.c).
</p>
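
<p>
The read-then-put-back idiom of a character-oriented parser can be sketched
outside of ctags with the standard C functions <code>getc()</code> and
<code>ungetc()</code>, which are used here only as stand-ins for
<code>fileGetc()</code> and <code>fileUngetc()</code>. The helper
<code>read_identifier()</code> is a hypothetical name for this illustration.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Read an identifier (letters, digits, underscores) from fp into out.
 * The first character that does not belong to the identifier is put back,
 * so the caller sees it again on its next read. Characters that do not fit
 * into out are consumed but not stored. Returns the stored length. */
static size_t read_identifier (FILE *fp, char *out, size_t outsize)
{
    size_t len = 0;
    int c;

    assert (fp != NULL  &&  out != NULL  &&  outsize > 0);
    while ((c = getc (fp)) != EOF  &&  (isalnum (c)  ||  c == '_'))
    {
        if (len + 1 < outsize)
            out [len++] = (char) c;
    }
    if (c != EOF)
        ungetc (c, fp);     /* one character too far: put it back */
    out [len] = '\0';
    return len;
}
```

<p>
After reading <code>my_label</code> from the input <code>my_label+1</code>,
the next character returned by <code>getc()</code> is the <code>+</code> that
was put back.
</p>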

<p>
When the Swine parser identifies an interesting token for which it wants to
add a tag to the tag file, it should create a <code>tagEntryInfo</code>
structure and initialize it by calling <code>initTagEntry()</code>, which
initializes defaults and fills in information about the current line number
and the file position of the beginning of the line. After filling in
information defining the current entry (and possibly overriding the file
position or other defaults), the parser passes this structure to
<code>makeTagEntry()</code>.
</p>

<p>
Instead of writing a character-oriented parser, it may be possible to specify
regular expressions which define the tags. In this case, instead of defining a
parsing function, <code>SwineParser()</code> sets <code>regex</code> to true
and points <code>initialize</code> to a function which calls
<code>addTagRegex()</code> to install the regular expressions which define its
tags. The regular expressions thus installed are compared against each line
of the input file and generate a specified tag when matched. It is usually
much easier to write a regex-based parser, although it can be slower (one
parser example was 4 times slower). Whether the speed difference matters to
you depends upon how much code you have to parse. It is probably a good
strategy to implement a regex-based parser first, and if it is too slow for
you, then invest the time and effort to write a character-based parser.
</p>

<p>
A regex-based parser is inherently line-oriented (i.e. the entire tag must be
recognizable from looking at a single line) and context-insensitive (i.e. the
generation of the tag is entirely based upon when the regular expression
matches a single line). However, a regex-based callback mechanism is also
available, installed via the function <code>addCallbackRegex()</code>. This
allows a specified function to be invoked whenever a specific regular
expression is matched. This allows a character-oriented parser to operate
based upon context of what happened on a previous line (e.g. the start or end
of a multi-line comment). Note that regex callbacks are called just before the
first character of that line is read via either <code>fileGetc()</code> or
<code>fileReadLine()</code>. The effect of this is that before either of
these routines returns, a callback routine may be invoked because the line
matched a regex callback. A callback function to be installed is defined by
these types:

<code>
<pre>
  typedef struct {
      size_t start;   /* character index in line where match starts */
      size_t length;  /* length of match */
  } regexMatch;

  typedef void (*regexCallback) (const char *line, const regexMatch *matches, unsigned int count);
</pre>
</code>
</p>

<p>
The callback function is passed the line matching the regular expression and
an array of <code>count</code> structures defining the subexpression matches
of the regular expression, starting from \0 (the text matched by the entire
expression).
</p>
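
<p>
How a callback uses the <code>start</code> and <code>length</code> members can
be tried outside of ctags with the sketch below. The two types are repeated
locally so that it compiles on its own. The driver <code>match_line()</code>,
the callback <code>on_definition()</code>, and the buffer
<code>last_name</code> are hypothetical names; the driver merely imitates the
calling convention with POSIX <code>regexec()</code> and is not the ctags
implementation.
</p>

```c
#include <assert.h>
#include <regex.h>     /* POSIX regcomp(), regexec(), regfree() */
#include <stddef.h>
#include <string.h>

/* Local copies of the callback types, so the sketch compiles standalone. */
typedef struct {
    size_t start;   /* character index in line where match starts */
    size_t length;  /* length of match */
} regexMatch;

typedef void (*regexCallback) (const char *line, const regexMatch *matches,
                               unsigned int count);

static char last_name [64];   /* hypothetical: result of the last callback */

/* Callback: copy subexpression 1 (the label name) out of the line. */
static void on_definition (const char *line, const regexMatch *matches,
                           unsigned int count)
{
    if (count > 1)
    {
        size_t len = matches [1].length;
        if (len >= sizeof last_name)
            len = sizeof last_name - 1;
        memcpy (last_name, line + matches [1].start, len);
        last_name [len] = '\0';
    }
}

/* Hypothetical driver: match one line against pattern and, on success,
 * invoke the callback. Returns 1 if the callback was invoked, else 0. */
static int match_line (const char *pattern, const char *line,
                       regexCallback callback)
{
    enum { MAX_GROUPS = 10 };
    regex_t re;
    regmatch_t pm [MAX_GROUPS];
    regexMatch matches [MAX_GROUPS];
    unsigned int i, count = 0;
    int invoked = 0;

    assert (pattern != NULL  &&  line != NULL  &&  callback != NULL);
    if (regcomp (&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec (&re, line, (size_t) MAX_GROUPS, pm, 0) == 0)
    {
        /* Convert offsets to start/length pairs; stop at first unused slot */
        for (i = 0  ;  i < MAX_GROUPS  &&  pm [i].rm_so != -1  ;  ++i)
        {
            matches [i].start  = (size_t) pm [i].rm_so;
            matches [i].length = (size_t) (pm [i].rm_eo - pm [i].rm_so);
            count = i + 1;
        }
        callback (line, matches, count);
        invoked = 1;
    }
    regfree (&re);
    return invoked;
}
```

<p>
For the line <code>def my_label</code> and a pattern with one parenthesized
subexpression, the callback receives a <code>count</code> of 2: entry 0
describes the whole match and entry 1 the label name.
</p>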

<p>
Lastly, be sure to add the name of the file containing your parser (e.g.
swine.c) to the macro <code>SOURCES</code> in the file <code>source.mak</code>
and an entry for the object file to the macro <code>OBJECTS</code> in the same
file, so that your new module will be compiled into the program.
</p>

<p>
This is all there is to it. All other details are specific to the parser and
how it wants to do its job. There are some support functions which can take
care of some commonly needed parsing tasks, such as keyword table lookups (see
keyword.c), which you can make use of if desired (examples of its use can be
found in c.c, eiffel.c, and fortran.c). Almost everything is already taken care
of automatically for you by the infrastructure. Writing the actual parsing
algorithm is the hardest part, but is not constrained by any need to conform
to anything in ctags other than that mentioned above.
</p>

<p>
There are several different approaches used in the parsers inside <b>Exuberant
Ctags</b> and you can browse through these as examples of how to go about
creating your own.
</p>

<h2>Examples</h2>

<p>
Below you will find several example parsers demonstrating most of the
facilities available. These include three alternative implementations
of a Swine parser, which generate tags for lines beginning with
"<code>def</code>" followed by some name.
</p>

<code>
<pre>
/***************************************************************************
 * swine.c
 * Character-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include &lt;string.h&gt;     /* to declare strxxx() functions */
#include &lt;ctype.h&gt;      /* to define isxxx() macros */

#include "parse.h"      /* always include */
#include "read.h"       /* to define fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void findSwineTags (void)
{
    vString *name = vStringNew ();
    const unsigned char *line;

    while ((line = fileReadLine ()) != NULL)
    {
        /* Look for a line beginning with "def" followed by name */
        if (strncmp ((const char*) line, "def", (size_t) 3) == 0  &amp;&amp;
            isspace ((int) line [3]))
        {
            const unsigned char *cp = line + 4;
            while (isspace ((int) *cp))
                ++cp;
            while (isalnum ((int) *cp)  ||  *cp == '_')
            {
                vStringPut (name, (int) *cp);
                ++cp;
            }
            vStringTerminate (name);
            makeSimpleTag (name, SwineKinds, K_DEFINE);
            vStringClear (name);
        }
    }
    vStringDelete (name);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;kinds      = SwineKinds;
    def-&gt;kindCount  = KIND_COUNT (SwineKinds);
    def-&gt;extensions = extensions;
    def-&gt;parser     = findSwineTags;
    return def;
}
</pre>
</code>

<p>
<pre>
<code>
/***************************************************************************
 * swine.c
 * Regex-based parser for Swine
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installSwineRegex (const langType language)
{
    addTagRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", "\\1", "d,definition", NULL);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;extensions = extensions;
    def-&gt;initialize = installSwineRegex;
    def-&gt;regex      = TRUE;
    return def;
}
</code>
</pre>

<p>
<pre>
/***************************************************************************
 * swine.c
 * Regex callback-based parser for Swine definitions
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */

#include "parse.h"      /* always include */
#include "read.h"       /* to define fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void definition (const char *const line, const regexMatch *const matches,
                       const unsigned int count)
{
    if (count &gt; 1)    /* should always be true per regex */
    {
        vString *const name = vStringNew ();
        vStringNCopyS (name, line + matches [1].start, matches [1].length);
        makeSimpleTag (name, SwineKinds, K_DEFINE);
        vStringDelete (name);    /* release the temporary name */
    }
}

static void findSwineTags (void)
{
    while (fileReadLine () != NULL)
        ;  /* don't need to do anything here since callback is sufficient */
}

static void installSwine (const langType language)
{
    addCallbackRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", NULL, definition);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def-&gt;kinds      = SwineKinds;
    def-&gt;kindCount  = KIND_COUNT (SwineKinds);
    def-&gt;extensions = extensions;
    def-&gt;parser     = findSwineTags;
    def-&gt;initialize = installSwine;
    return def;
}
</pre>

<p>
<pre>
/***************************************************************************
 * make.c
 * Regex-based parser for makefile macros
 **************************************************************************/
/* INCLUDE FILES */
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installMakefileRegex (const langType language)
{
    addTagRegex (language, "(^|[ \t])([A-Z0-9_]+)[ \t]*:?=", "\\2", "m,macro", "i");
}

/* Create parser definition structure */
extern parserDefinition* MakefileParser (void)
{
    static const char *const patterns [] = { "[Mm]akefile", NULL };
    static const char *const extensions [] = { "mak", NULL };
    parserDefinition* const def = parserNew ("Makefile");
    def-&gt;patterns   = patterns;
    def-&gt;extensions = extensions;
    def-&gt;initialize = installMakefileRegex;
    def-&gt;regex      = TRUE;
    return def;
}
</pre>
</body>
</html>