Snowball: Quick introduction




 



You can use this site at a number of levels:

- You can look at the stemming algorithm definitions themselves, and use them as templates for coding your own versions of stemmers in the computer language of your choice.

- You can use the various ANSI C stemmers, without bothering yourself with the Snowball system that generated them. To do that, download the Snowball system, and extract the following  *.h  and  *.c  files from directory  q/:
    header.h    utilities.c
    api.h       api.c
They provide the library and API connections to the stemmers. For each language stemmer you want to use, download the corresponding  .c  and  .h  files. Then follow the instructions for use.

For an even faster route, see the Quick start for Snowball, below.

- You can get involved in Snowball itself. This is particularly worthwhile if you want to adjust the stemmers or develop new stemmers. A typical reason for adjusting the stemmers is that you are working with a different encoding of accented letters from the ISO Latin 1 encoding assumed in most of the scripts here. In that case you need to make your own version of the Snowball compiler and work with the Snowball scripts.

Snowball is a language in which stemming algorithms can be easily represented. The Snowball compiler translates a Snowball script (a  .sbl  file) into a thread-safe ANSI C module and its corresponding header file (a  .c  and  .h  file). The language has a full manual, and the various stemming scripts act as example programs.

- You can get deeply interested in stemming. If you do, read the introductory paper about Snowball. It is a bit heavyweight, but provides essential background. And look at the notes on how you can help.
 

Quick Start

To get something working quickly, use  gcc  with the following modules from the downloaded directory  q/:
    gcc -o STEMMER q/api.c q/utilities.c q/driver-porter.c q/stem.c
STEMMER  runs the Porter stemming algorithm. See the start of  q/driver-porter.c  for a brief description of how to run the command.

For the French stemmer, replace  q/stem.c  and  q/stem.h  with french/stem.c  and  french/stem.h, change  porter  to  french throughout  q/driver-porter.c  and recompile. Similarly for other languages. q/driver.template  provides a template driver that you can adjust language by language.
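
The language switch described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: switch_language is a hypothetical name, not part of Snowball; the script is run from the directory that contains  q/  and the language directories; and, unlike the in-place edit described above, it writes the adjusted driver to a new file  q/driver-LANG.c  so that the Porter driver is left untouched.

```shell
#!/bin/sh
# Sketch only: switch_language is a hypothetical helper, not part of Snowball.
# Assumed layout: q/ holds api.c, utilities.c and driver-porter.c, and
# LANG/ (for example french/) holds that language's stem.c and stem.h.
switch_language() {
    lang="$1"

    # Replace q/stem.c and q/stem.h with the chosen language's versions.
    cp "$lang/stem.c" "$lang/stem.h" q/ || return 1

    # Change "porter" to the language name throughout the driver.
    # Assumption: the result goes to a new file rather than overwriting
    # q/driver-porter.c.
    sed "s/porter/$lang/g" q/driver-porter.c > "q/driver-$lang.c" || return 1
}
```

After calling  switch_language french, the compile step is the same as before, with the new driver file in place of the Porter one:
    gcc -o STEMMER q/api.c q/utilities.c q/driver-french.c q/stem.c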