<!-- $Id: EXTENDING.html,v 1.9 2002/09/04 01:17:32 darren Exp $ -->
<title>Exuberant Ctags: Adding support for a new language</title>
<h1>How to Add Support for a New Language to Exuberant Ctags</h1>
<b>Exuberant Ctags</b> has been designed to make it very easy to add your own
custom language parser. As an exercise, let us assume that I want to add
support for my new language, <em>Swine</em>, the successor to Perl (i.e. Perl
before Swine <wince>). This language consists of simple definitions of
labels in the form "<code>def my_label</code>". Let us now examine the various
ways to do this.
<h2>Operational background</h2>
As ctags considers each file name, it tries to determine the language of the
file by applying the following three tests in order: whether the file
extension has been mapped to a language, whether the file name matches a shell
pattern mapped to a language, and finally whether the file is executable and
its first line specifies an interpreter using the Unix-style "#!"
specification (if supported on the platform). If a language is identified,
the file is opened and the appropriate language parser is called to operate
on the currently open file. The parser works through the file and, whenever
it finds an interesting token, calls a function to define a tag entry.
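
The three tests described above can be sketched in a few lines of C. This is
only an illustration of the order of the tests, not the code ctags itself
uses: the tables, the <code>lang_map</code> type, and the function
<code>guess_language()</code> are hypothetical names, the "swine" interpreter
entry is invented for the example, and a <code>NULL</code> first line stands
in for "the file is not executable".

```c
#include <assert.h>
#include <fnmatch.h>   /* POSIX shell-pattern matching */
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping tables; ctags keeps its own per-language lists. */
struct lang_map { const char *key; const char *language; };

static const struct lang_map extension_map[] = {
    { "swn", "Swine" }, { "mak", "Makefile" }
};
static const struct lang_map pattern_map[] = {
    { "[Mm]akefile", "Makefile" }
};
static const struct lang_map interpreter_map[] = {
    { "swine", "Swine" }
};

#define COUNT(a) (sizeof (a) / sizeof ((a)[0]))

/* Return the language for a file, or NULL if none of the three tests hits.
 * first_line is NULL when the file is not executable (an assumption of
 * this sketch). */
static const char *guess_language(const char *file_name, const char *first_line)
{
    const char *base, *dot;
    size_t i;

    assert(file_name != NULL);
    base = strrchr(file_name, '/');
    base = (base != NULL) ? base + 1 : file_name;

    /* Test 1: file extension mapped to a language. */
    dot = strrchr(base, '.');
    if (dot != NULL)
        for (i = 0; i < COUNT(extension_map); ++i)
            if (strcmp(dot + 1, extension_map[i].key) == 0)
                return extension_map[i].language;

    /* Test 2: file name matches a shell pattern mapped to a language. */
    for (i = 0; i < COUNT(pattern_map); ++i)
        if (fnmatch(pattern_map[i].key, base, 0) == 0)
            return pattern_map[i].language;

    /* Test 3: first line names an interpreter with "#!". */
    if (first_line != NULL && strncmp(first_line, "#!", 2) == 0)
    {
        const char *cmd = first_line + 2 + strspn(first_line + 2, " \t");
        size_t len = strcspn(cmd, " \t\r\n");
        size_t k = len;

        while (k > 0 && cmd[k - 1] != '/')
            --k;                   /* k = index just past the last '/' */
        for (i = 0; i < COUNT(interpreter_map); ++i)
            if (strlen(interpreter_map[i].key) == len - k &&
                strncmp(cmd + k, interpreter_map[i].key, len - k) == 0)
                return interpreter_map[i].language;
    }
    return NULL;
}
```

The earlier tests win: a file called <code>build.mak</code> is classified by
its extension before any pattern or interpreter line is consulted.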
<h2>Creating a user-defined language</h2>
The quickest and easiest way to do this is to define a new language using
the program options. In order to have Swine support available every time I
start ctags, I will place the following lines into the file
<code>$HOME/.ctags</code>, which is read every time ctags starts:

  --langdef=swine
  --langmap=swine:.swn
  --regex-swine=/^def[ \t]*([a-zA-Z0-9_]+)/\1/d,definition/

The first line defines the new language, the second maps a file extension to
it, and the third defines a regular expression to identify a language
definition and generate a tag file entry for it.
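
To see what the regular expression in the last option line captures, the
following sketch applies the same pattern with the POSIX
<code>regcomp()</code> and <code>regexec()</code> functions. It assumes the
pattern is treated as a POSIX extended regular expression; the helper name
<code>swine_tag_name()</code> is hypothetical and is not part of ctags.

```c
#include <assert.h>
#include <regex.h>     /* POSIX regcomp(), regexec(), regfree() */
#include <stddef.h>
#include <string.h>

/* Copy the name captured by the Swine pattern into buf.
 * Returns 1 if the line matches, 0 otherwise. */
static int swine_tag_name(const char *line, char *buf, size_t cap)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */
    int found = 0;

    assert(line != NULL && buf != NULL && cap > 0);

    /* "\t" is a real tab character here, inside the bracket expression. */
    if (regcomp(&re, "^def[ \t]*([a-zA-Z0-9_]+)", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so != -1)
    {
        size_t len = (size_t) (m[1].rm_eo - m[1].rm_so);
        if (len >= cap)
            len = cap - 1;         /* truncate to fit the buffer */
        memcpy(buf, line + m[1].rm_so, len);
        buf[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Under these assumptions, "<code>def my_label</code>" yields
<code>my_label</code>. Because the pattern uses <code>[ \t]*</code>, which
also accepts no white space at all, a line such as "<code>define</code>"
matches too and yields <code>ine</code>; writing <code>[ \t]+</code> instead
requires at least one blank after <code>def</code>.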
<h2>Integrating a new language parser</h2>
Now suppose that I want to truly integrate compiled-in support for Swine into
ctags. First, I create a new module, <code>swine.c</code>, and add one
externally visible function to it, <code>extern parserDefinition
*SwineParser(void)</code>, and add its name to the table in
<code>parsers.h</code>. The job of this parser definition function is to
create an instance of the <code>parserDefinition</code> structure (using
<code>parserNew()</code>) and populate it with information defining how files
of this language are recognized, what kinds of tags it can locate, and the
function used to invoke the parser on the currently open file.
The structure <code>parserDefinition</code> allows assignment of the following
fields:

  const char *name;               /* name of language */
  kindOption *kinds;              /* tag kinds handled by parser */
  unsigned int kindCount;         /* size of `kinds' list */
  const char *const *extensions;  /* list of default extensions */
  const char *const *patterns;    /* list of default file name patterns */
  parserInitialize initialize;    /* initialization routine, if needed */
  simpleParser parser;            /* simple parser (common case) */
  rescanParser parser2;           /* rescanning parser (unusual case) */
  boolean regex;                  /* is this a regex parser? */
The <code>name</code> field must be set to a non-empty string. Also, unless
<code>regex</code> is set true (see below), either <code>parser</code> or
<code>parser2</code> must be set to point to a parsing routine which will
generate the tag entries. All other fields are optional.
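
The rule about required fields can be written down as a small check. The
sketch below uses a reduced stand-in structure, not the real
<code>parserDefinition</code>: the type <code>parser_def_sketch</code>, the
function-pointer signatures, and the helper names are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed function-pointer shapes; the real typedefs live in parse.h. */
typedef void (*simpleParserFn)(void);
typedef int (*rescanParserFn)(unsigned int pass_count);

/* Reduced stand-in holding only the fields the rule mentions. */
struct parser_def_sketch {
    const char *name;        /* must be a non-empty string */
    simpleParserFn parser;   /* simple parser (common case) */
    rescanParserFn parser2;  /* rescanning parser (unusual case) */
    int regex;               /* non-zero for a regex parser */
};

/* Return 1 if the definition satisfies the required-field rule. */
static int parser_def_is_usable(const struct parser_def_sketch *def)
{
    assert(def != NULL);
    if (def->name == NULL || def->name[0] == '\0')
        return 0;            /* name is always required */
    if (def->regex)
        return 1;            /* regex parsers need no parsing routine */
    return def->parser != NULL || def->parser2 != NULL;
}

/* Placeholder parsing routine used only to fill in the pointer. */
static void noop_parser(void)
{
}
```

A definition with only a name is rejected by this check, while the same
definition with either <code>regex</code> set or a parsing routine assigned
is accepted.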
Now all that is left is to implement the parser. In order to do its job, the
parser should read the file stream using one of the two I/O interfaces:
either the character-oriented <code>fileGetc()</code>, or the line-oriented
<code>fileReadLine()</code>. When using <code>fileGetc()</code>, the parser
can put back a character using <code>fileUngetc()</code>. How our Swine parser
actually parses the contents of the file is entirely up to the writer of the
parser; it can be as crude or elegant as desired. You will note a variety of
examples from the most complex (c.c) to the simplest (make.c).
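
The read-then-put-back idiom of the character-oriented interface can be shown
with the standard C library, where <code>getc()</code> and
<code>ungetc()</code> play the roles of <code>fileGetc()</code> and
<code>fileUngetc()</code>. This is only an analogy under that assumption; the
function <code>read_identifier()</code> is a hypothetical helper, not a ctags
routine.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Read one identifier ([A-Za-z0-9_]+) from fp, skipping leading white space.
 * Returns the number of characters stored in buf (0 if none was found). */
static size_t read_identifier(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && cap > 0);

    do
        c = getc(fp);
    while (c != EOF && isspace(c));

    while (c != EOF && (isalnum(c) || c == '_'))
    {
        if (n + 1 < cap)
            buf[n++] = (char) c;   /* keep room for the terminator */
        c = getc(fp);
    }
    if (c != EOF)
        ungetc(c, fp);             /* put back the character that ended the token */
    buf[n] = '\0';
    return n;
}
```

Putting the terminating character back means the next caller sees it again,
so a parser can hand the stream from one scanning routine to another without
losing input.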
When the Swine parser identifies an interesting token for which it wants to
add a tag to the tag file, it should create a <code>tagEntryInfo</code>
structure and initialize it by calling <code>initTagEntry()</code>, which
initializes defaults and fills in information about the current line number
and the file position of the beginning of the line. After filling in
information defining the current entry (and possibly overriding the file
position or other defaults), the parser passes this structure to
<code>makeTagEntry()</code>.
Instead of writing a character-oriented parser, it may be possible to specify
regular expressions which define the tags. In this case, instead of defining a
parsing function, <code>SwineParser()</code> sets <code>regex</code> to true
and points <code>initialize</code> to a function which calls
<code>addTagRegex()</code> to install the regular expressions which define its
tags. The regular expressions thus installed are compared against each line
of the input file and generate a specified tag when matched. It is usually
much easier to write a regex-based parser, although such parsers can be slower
(one parser example was 4 times slower). Whether the speed difference matters
to you depends upon how much code you have to parse. It is probably a good
strategy to implement a regex-based parser first, and if it is too slow for
you, then invest the time and effort to write a character-based parser.
A regex-based parser is inherently line-oriented (i.e. the entire tag must be
recognizable from looking at a single line) and context-insensitive (i.e. the
generation of the tag is entirely based upon when the regular expression
matches a single line). However, a regex-based callback mechanism is also
available, installed via the function <code>addCallbackRegex()</code>. This
allows a specified function to be invoked whenever a specific regular
expression is matched, which in turn allows a character-oriented parser to
operate based upon the context of what happened on a previous line (e.g. the
start or end of a multi-line comment). Note that regex callbacks are called
just before the first character of that line can be read via either
<code>fileGetc()</code> or <code>fileReadLine()</code>. The effect of this is
that before either of these routines returns, a callback routine may be
invoked because the line matched a regex callback. A callback function to be
installed is defined by the following typedefs:

  typedef void (*regexCallback) (const char *line, const regexMatch *matches, unsigned int count);

  typedef struct {
      size_t start;   /* character index in line where match starts */
      size_t length;  /* length of match */
  } regexMatch;
The callback function is passed the line matching the regular expression and
an array of <code>count</code> structures defining the subexpression matches
of the regular expression, starting from \0 (the entire match).
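
How a callback turns a match entry into text can be illustrated without any
ctags headers. In the sketch below, <code>matchSpan</code> is a local
stand-in that mirrors the <code>start</code> and <code>length</code> fields
described above, and <code>on_definition()</code> and <code>last_name</code>
are hypothetical names; a real callback would use <code>regexMatch</code>
from the ctags sources and would typically create a tag rather than fill a
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in mirroring the two fields of the match structure. */
typedef struct {
    size_t start;    /* character index in line where match starts */
    size_t length;   /* length of match */
} matchSpan;

/* Receives the most recently captured name (illustration only). */
static char last_name[64];

/* Same parameter layout as the callback: line, match array, entry count. */
static void on_definition(const char *line, const matchSpan *matches,
                          unsigned int count)
{
    assert(line != NULL && matches != NULL);

    /* matches[0] is \0; matches[1] is the first parenthesized group. */
    if (count > 1)
    {
        size_t len = matches[1].length;
        if (len >= sizeof last_name)
            len = sizeof last_name - 1;
        memcpy(last_name, line + matches[1].start, len);
        last_name[len] = '\0';
    }
}
```

For the line "<code>def my_label</code>", entry 1 would have
<code>start</code> 4 and <code>length</code> 8, which selects
<code>my_label</code>.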
Lastly, be sure to add the name of the file containing your parser (e.g.
swine.c) to the macro <code>SOURCES</code> in the file <code>source.mak</code>
and an entry for the object file to the macro <code>OBJECTS</code> in the same
file, so that your new module will be compiled into the program.
This is all there is to it. All other details are specific to the parser and
how it wants to do its job. There are some support functions which can take
care of some commonly needed parsing tasks, such as keyword table lookups (see
keyword.c), which you can make use of if desired (examples of its use can be
found in c.c, eiffel.c, and fortran.c). Almost everything is already taken care
of automatically for you by the infrastructure. Writing the actual parsing
algorithm is the hardest part, but is not constrained by any need to conform
to anything in ctags other than that mentioned above.
There are several different approaches used in the parsers inside <b>Exuberant
Ctags</b> and you can browse through these as examples of how to go about
creating your own.
Below you will find several example parsers demonstrating most of the
facilities available. These include three alternative implementations
of a Swine parser, which generate tags for lines beginning with
"<code>def</code>" followed by some name.
/***************************************************************************
 * Character-based parser for Swine definitions
 **************************************************************************/
#include "general.h"    /* always include first */

#include <string.h>     /* to declare strxxx() functions */
#include <ctype.h>      /* to define isxxx() macros */

#include "parse.h"      /* always include */
#include "read.h"       /* to declare fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void findSwineTags (void)
{
    vString *name = vStringNew ();
    const unsigned char *line;

    while ((line = fileReadLine ()) != NULL)
    {
        /* Look for a line beginning with "def" followed by name */
        if (strncmp ((const char*) line, "def", (size_t) 3) == 0  &&
            isspace ((int) line [3]))
        {
            const unsigned char *cp = line + 4;
            while (isspace ((int) *cp))
                ++cp;
            while (isalnum ((int) *cp)  ||  *cp == '_')
            {
                vStringPut (name, (int) *cp);
                ++cp;
            }
            vStringTerminate (name);
            makeSimpleTag (name, SwineKinds, K_DEFINE);
            vStringClear (name);    /* reuse the buffer for the next tag */
        }
    }
    vStringDelete (name);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def->kinds      = SwineKinds;
    def->kindCount  = KIND_COUNT (SwineKinds);
    def->extensions = extensions;
    def->parser     = findSwineTags;
    return def;
}
/***************************************************************************
 * Regex-based parser for Swine
 **************************************************************************/
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installSwineRegex (const langType language)
{
    addTagRegex (language, "^def[ \t]*([a-zA-Z0-9_]+)", "\\1", "d,definition", NULL);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def->extensions = extensions;
    def->initialize = installSwineRegex;
    def->regex      = TRUE;
    return def;
}
/***************************************************************************
 * Regex callback-based parser for Swine definitions
 **************************************************************************/
#include "general.h"    /* always include first */

#include "parse.h"      /* always include */
#include "read.h"       /* to declare fileReadLine() */

/* DATA DEFINITIONS */
typedef enum eSwineKinds {
    K_DEFINE
} swineKind;

static kindOption SwineKinds [] = {
    { TRUE, 'd', "definition", "pig definition" }
};

/* FUNCTION DEFINITIONS */

static void definition (const char *const line, const regexMatch *const matches,
                        const unsigned int count)
{
    if (count > 1)    /* should always be true per regex */
    {
        vString *const name = vStringNew ();
        vStringNCopyS (name, line + matches [1].start, matches [1].length);
        makeSimpleTag (name, SwineKinds, K_DEFINE);
        vStringDelete (name);    /* release the temporary string */
    }
}

static void findSwineTags (void)
{
    while (fileReadLine () != NULL)
        ;  /* don't need to do anything here since callback is sufficient */
}

static void installSwine (const langType language)
{
    addCallbackRegex (language, "^def[ \t]+([a-zA-Z0-9_]+)", NULL, definition);
}

/* Create parser definition structure */
extern parserDefinition* SwineParser (void)
{
    static const char *const extensions [] = { "swn", NULL };
    parserDefinition* def = parserNew ("Swine");
    def->kinds      = SwineKinds;
    def->kindCount  = KIND_COUNT (SwineKinds);
    def->extensions = extensions;
    def->parser     = findSwineTags;
    def->initialize = installSwine;
    return def;
}
/***************************************************************************
 * Regex-based parser for makefile macros
 **************************************************************************/
#include "general.h"    /* always include first */
#include "parse.h"      /* always include */

/* FUNCTION DEFINITIONS */

static void installMakefileRegex (const langType language)
{
    addTagRegex (language, "(^|[ \t])([A-Z0-9_]+)[ \t]*:?=", "\\2", "m,macro", "i");
}

/* Create parser definition structure */
extern parserDefinition* MakefileParser (void)
{
    static const char *const patterns [] = { "[Mm]akefile", NULL };
    static const char *const extensions [] = { "mak", NULL };
    parserDefinition* const def = parserNew ("Makefile");
    def->patterns   = patterns;
    def->extensions = extensions;
    def->initialize = installMakefileRegex;
    def->regex      = TRUE;
    return def;
}