Snowball: Quick introduction
You can use this site at a number of levels:
- You can look at the stemming algorithm definitions themselves and use them as templates for coding your own versions of the stemmers in the computer language of your choice.
- You can use the various ANSI C stemmers without concerning yourself with the Snowball system that generated them. To do that, download the Snowball system and extract the following *.h and *.c files from directory q/: header.h, utilities.c, api.h, api.c. They provide the library and API connections to the stemmers. For each language stemmer you want to use, download the corresponding .c and .h file, then follow the instructions for use. For an even faster route, see the Quick Start below.
- You can get involved in Snowball itself. This is particularly worthwhile if you want to adjust the stemmers or develop new ones. A typical reason for adjusting the stemmers is that you are working with an encoding of accented letters that differs from the ISO Latin 1 encoding assumed in most of the scripts here. In that case you need to build your own version of the Snowball compiler and work with the Snowball scripts.
Quick Start
To get something working quickly, use gcc with the following modules from the downloaded directory q/:
    gcc -o STEMMER q/api.c q/utilities.c q/driver-porter.c q/stem.c
STEMMER runs the Porter stemming algorithm. See the start of q/driver-porter.c for a simple spec of the command. For the French stemmer, replace q/stem.c and q/stem.h with french/stem.c and french/stem.h, change porter to french throughout q/driver-porter.c, and recompile. Similarly for other languages. q/driver.template provides a template driver that you can adjust language by language.
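Changing porter to french throughout the driver is a plain textual substitution, which an editor's search-and-replace handles directly. If you would rather script it, the following is a minimal sketch in C. The helper name replace_all is hypothetical and not part of the Snowball distribution, and the sketch works only on an in-memory string: reading q/driver-porter.c (or q/driver.template) and writing the adjusted driver back out is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Snowball.
 * Returns a newly allocated copy of `text` in which every occurrence of
 * `from` is replaced by `to`, or NULL if allocation fails.
 * `from` must be non-empty. The caller frees the result. */
char *replace_all(const char *text, const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t count = 0;
    const char *p;

    assert(from_len > 0);

    /* First pass: count non-overlapping matches to size the output. */
    for (p = text; (p = strstr(p, from)) != NULL; p += from_len)
        count++;

    size_t out_len = strlen(text) - count * from_len + count * to_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text between matches, then the replacement. */
    char *dst = out;
    const char *src = text;
    while ((p = strstr(src, from)) != NULL) {
        size_t gap = (size_t)(p - src);
        memcpy(dst, src, gap);
        dst += gap;
        memcpy(dst, to, to_len);
        dst += to_len;
        src = p + from_len;
    }
    strcpy(dst, src); /* copy the tail, including the terminating NUL */
    return out;
}
```

For example, replace_all("porter stemmer, porter driver", "porter", "french") returns the newly allocated string "french stemmer, french driver". The substitution is purely textual, so it also rewrites porter inside comments and longer identifiers, which is what "throughout" asks for here.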