<HTML>
<HEAD>
<TITLE>Defining R1 and R2</TITLE></HEAD>
<BODY BGCOLOR=WHITE>
<TABLE WIDTH=75% ALIGN=CENTER COLS=1>

<TR><TD>

<BR><BR>


<BR>&nbsp;<H2>Defining <I>R</I>1 and <I>R</I>2</H2>

All the stemmers make use of at least one of the region definitions <I>R</I>1 and
<I>R</I>2. They are defined as follows:

<BR><BR>
<I>R</I>1 is the region after the first non-vowel following a vowel, or is the null
region at the end of the word if there is no such non-vowel.
<BR><BR>
<I>R</I>2 is the region after the first non-vowel following a vowel in <I>R</I>1, or is
the null region at the end of the word if there is no such non-vowel.
<BR><BR>

The definition of <I>vowel</I> varies from language to language. In French, for
example, <B><I>&eacute;</I></B> is a vowel, and in Italian <B><I>i</I></B> between two other vowels is not a
vowel. The class of letters that constitute vowels is made clear in each stemmer.
<BR><BR>
Below, <I>R</I>1 and <I>R</I>2 are shown for a number of English words:
<BR><PRE>
    b   e   a   u   t   i   f   u   l
                      |<------------->|    R1
                              |<----->|    R2
</PRE>
Letter <B><I>t</I></B> is the first non-vowel following a vowel in <I>beautiful</I>, so <I>R</I>1
is <B><I>iful</I></B>. In <B><I>iful</I></B>, the letter <B><I>f</I></B> is the first non-vowel following a
vowel, so <I>R</I>2 is <B><I>ul</I></B>.

<BR><PRE>
    b   e   a   u   t   y
                      |<->|    R1
                        ->|<-  R2
</PRE>
In <I>beauty</I>, the last letter <B><I>y</I></B> is classed as a vowel. Again, letter <B><I>t</I></B> is
the first non-vowel following a vowel, so <I>R</I>1 is just the last letter, <B><I>y</I></B>.
<I>R</I>1 contains no non-vowel, so <I>R</I>2 is the null region at the end of the word.

<BR><PRE>
    b   e   a   u
                ->|<-  R1
                ->|<-  R2
</PRE>
In <I>beau</I>, <I>R</I>1 and <I>R</I>2 are both null.
<BR><BR>
Other examples:
<BR><PRE>
    a   n   i   m   a   d   v   e   r   s   i   o   n
          |<----------------------------------------->|    R1
                  |<--------------------------------->|    R2

    s   p   r   i   n   k   l   e   d
                      |<------------->|    R1
                                    ->|<-  R2

    e   u   c   h   a   r   i   s   t
              |<--------------------->|    R1
                          |<--------->|    R2
</PRE>
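<BR>
The two definitions can be sketched in Python as follows. This is only an illustration, not code taken from any of the stemmers: the names <I>ENGLISH_VOWELS</I>, <I>region_after</I> and <I>r1_r2</I> are hypothetical, and the vowel set is an assumption that matches the English examples on this page, where <B><I>y</I></B> is classed as a vowel. Each stemmer supplies its own class of vowels.
<PRE>
```python
# Sketch of the R1/R2 definitions. The vowel set is an assumption that fits
# the English examples on this page (y counted as a vowel); a real stemmer
# defines its own vowel class.
ENGLISH_VOWELS = frozenset("aeiouy")


def region_after(word, start=0, vowels=ENGLISH_VOWELS):
    """Return the index just past the first non-vowel that follows a vowel,
    scanning from `start`. Return len(word), the null region at the end of
    the word, if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return i + 1
    return len(word)


def r1_r2(word, vowels=ENGLISH_VOWELS):
    """Return the (R1, R2) regions of `word` as substrings."""
    r1_start = region_after(word, 0, vowels)
    # R2 applies the same rule again, but only inside R1.
    r2_start = region_after(word, r1_start, vowels)
    return word[r1_start:], word[r2_start:]


if __name__ == "__main__":
    for w in ("beautiful", "beauty", "beau",
              "animadversion", "sprinkled", "eucharist"):
        print(w, r1_r2(w))
```
</PRE>
Under these assumptions, <I>r1_r2("beautiful")</I> gives <B><I>iful</I></B> and <B><I>ul</I></B>, and <I>r1_r2("beau")</I> gives two empty strings, in agreement with the diagrams above.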

</TD></TR>

</TABLE>
</BODY>
</HTML>