Quickstart
==========

This is a very brief introduction to the use of PyStemmer.

First, import the library:

>>> import Stemmer

Just for show, we'll display a list of the available stemming algorithms:

>>> print(Stemmer.algorithms())
['danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian', 'italian', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'spanish', 'swedish', 'turkish']
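
The list can also be checked before a stemmer is created.  The following is a
minimal sketch, not part of PyStemmer: pick_algorithm and its fallback
parameter are hypothetical names, and the function assumes only that it is
given a list of names such as the one returned by Stemmer.algorithms().

```python
def pick_algorithm(requested, available, fallback='english'):
    """Return `requested` if it is an available algorithm, else `fallback`.

    `available` is a list of algorithm names, for example the result of
    Stemmer.algorithms().  This helper is illustrative only.
    """
    name = requested.lower()
    if name in available:
        return name
    if fallback in available:
        return fallback
    # Neither the requested name nor the fallback can be used.
    raise ValueError('no usable stemming algorithm: %r' % requested)
```

With the library imported as above,
Stemmer.Stemmer(pick_algorithm('German', Stemmer.algorithms())) would create a
German stemmer whenever that algorithm is in the list.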

Now, we'll get an instance of the English stemming algorithm:

>>> stemmer = Stemmer.Stemmer('english')

Stem a single word:

>>> print(stemmer.stemWord('cycling'))
cycl

Stem a list of words:

>>> print(stemmer.stemWords(['cycling', 'cyclist']))
['cycl', 'cyclist']
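
In practice the word list usually comes from running text.  A minimal sketch of
that step follows.  stem_text and its regular expression are assumptions made
for illustration (the pattern matches unaccented ASCII letters only, which is
adequate for simple English text), and stem_words can be any callable that
maps a list of words to a list of stems, such as stemmer.stemWords.

```python
import re

# Hypothetical tokenizer pattern: runs of lower-case ASCII letters only.
WORD_RE = re.compile(r'[a-z]+')


def stem_text(text, stem_words):
    """Lower-case `text`, split it into words and stem them in one call."""
    words = WORD_RE.findall(text.lower())
    return stem_words(words)
```

With the stemmer created above,
stem_text('Cycling with a cyclist', stemmer.stemWords) passes
['cycling', 'with', 'a', 'cyclist'] to stemWords in a single call.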

Strings passed to the stemmer are assumed to be Unicode.  UTF-8 encoded byte
strings are accepted too, and their stems are returned as bytes:

>>> print(stemmer.stemWords(['cycling', b'cyclist']))
['cycl', b'cyclist']
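
As the output above shows, the type of each result follows the type of the
corresponding input, so mixed input gives mixed output.  If a uniform list of
text strings is preferred, byte strings can be decoded first.  This is a
minimal sketch that assumes the bytes really are UTF-8; to_text is a
hypothetical helper, not part of PyStemmer.

```python
def to_text(word, encoding='utf-8'):
    """Return `word` as str, decoding it first if it is a bytes object."""
    if isinstance(word, bytes):
        return word.decode(encoding)
    return word


words = ['cycling', b'cyclist']
normalized = [to_text(w) for w in words]
print(normalized)  # ['cycling', 'cyclist']
```

Passing normalized to stemmer.stemWords should then give back text strings
only.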

Each instance of the stemming algorithms uses a cache to speed up processing of
common words.  By default, the cache holds 10000 words, but this can be changed
through the maxCacheSize attribute.  Setting the cache size to 0 disables the
cache entirely:

>>> print(stemmer.maxCacheSize)
10000

>>> stemmer.maxCacheSize = 1000

>>> print(stemmer.maxCacheSize)
1000
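
When the cache size only needs to change for part of a program, for example to
disable caching while processing mostly unique words, the previous value can be
restored afterwards.  The sketch below is an illustration rather than part of
PyStemmer: cache_size is a hypothetical helper, and it assumes only that the
object it is given has a readable and writable maxCacheSize attribute, as
shown above.

```python
from contextlib import contextmanager


@contextmanager
def cache_size(stemmer, size):
    """Temporarily set stemmer.maxCacheSize, restoring the old value on exit."""
    previous = stemmer.maxCacheSize
    stemmer.maxCacheSize = size
    try:
        yield stemmer
    finally:
        # Runs even if the body of the with block raises.
        stemmer.maxCacheSize = previous
```

Used as "with cache_size(stemmer, 0):", the body of the with block runs with
the cache disabled, and maxCacheSize is set back to its earlier value when the
block ends.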