~ubuntu-branches/ubuntu/wily/apertium-oc-es/wily-proposed

Viewing changes to README

Committer: Package Import Robot
Author(s): Kartik Mistry, Tino Didriksen
Date: 2015-07-31 20:04:44 UTC
mfrom: (1.1.1)
Revision ID: package-import@ubuntu.com-20150731200444-yqem3blxvv2fkkpk

Tags: 1.0.6~r57551-1

[ Tino Didriksen ]
* New upstream release.
* No significant changes in svn since tarball, so taking directly from svn.
* Re-done packaging to take advantage of debhelper 9.

[ Kartik Mistry ]
* Bumped debian/compat to 9. Updated debhelper dependency.
* Fixed debian/copyright.

files added:
.pc

.pc/.quilt_patches

.pc/.quilt_series

.pc/.version

.pc/applied-patches

autogen.sh

debian/source

debian/source/format

debian/watch

files removed:
INSTALL

Makefile.in

aclocal.m4

config.in

configure

es-oc.mode

es-oc_aran.mode

install-sh

missing

oc-es.mode

oc_aran-es.mode

files modified:
Makefile.am

README

apertium-oc-es.es-oc.t1x

apertium-oc-es.es-oc.t2x

apertium-oc-es.es-oc.t3x

apertium-oc-es.oc-es.dix

apertium-oc-es.oc-es.t1x

apertium-oc-es.oc-es.t2x

apertium-oc-es.oc-es.t3x

apertium-oc-es.post-es.dix

configure.ac

debian/changelog

debian/compat

debian/control

debian/copyright

debian/docs

debian/rules

modes.xml

README

Occitan--Spanish translator

===================================================================

TRANSLATOR

You need apertium and lttoolbox, either version 1.0 or 2.0, to use

inside of this directory.

TAGGER

To use this language-pair package with Apertium YOU DO NOT NEED TO

RETRAIN THE TAGGER. Probabilities and auxiliary data are provided for

both the oc-ca and the ca-oc translation directions which should be

acceptable for most applications, and should work even if you change

the dictionaries in a reasonable way.

If for some reason you need to retrain the tagger (for example, you

have made really extensive changes to the dictionaries such as

creating new lexical categories), you have three alternatives:

* To perform a supervised training:

To this end you need the files specified in the README file inside

oc-tagger-data and ca-tagger-data which are not provided. When performing

a supervised training, the tagged corpora (oc-tagger-data/oc.tagged and

ca-tagger-data/ca.tagged) could be obsolete for some words. If this is the

case, the tagger training program will show you where the problems are and

you will need to solve them by hand. Be sure to solve the problems by

modifying ONLY the .tagged file, NEVER the .untagged file that is

automatically generated.

The supervised training is done by typing:

make -f oc-ca-supervised.make (for the Occitan part-of-speech tagger)

make -f ca-oc-supervised.make (for the Catalan part-of-speech tagger)

This is the training method followed to train the Catalan

part-of-speech tagger.

* To perform a classical (expectation-maximization) unsupervised training:

For this purpose you will need to assemble a large (hundreds of

thousands of words) plain-text corpus for each language (for example,

using a robot to harvest text from online newspapers) and put them in

the proper place, for instance oc-tagger-data/oc.crp.txt and

ca-tagger-data/ca.crp.txt. This type of training does not need human

intervention but, as expected, results will be less adequate than

those obtained with the supervised training.
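
Before starting a long training run it can be worth checking that each
corpus is really in the "hundreds of thousands of words" range. The
following is a minimal sketch in Python, not part of this package; the
function name count_words and the whitespace-based definition of a word
are assumptions made for illustration only.

```python
def count_words(path, encoding="utf-8"):
    """Count whitespace-separated tokens in a plain-text corpus file.

    Hypothetical helper: "word" here simply means a run of
    non-whitespace characters, which is only a rough estimate.
    """
    total = 0
    with open(path, encoding=encoding) as corpus:
        # Read line by line so that large corpora are not loaded at once.
        for line in corpus:
            total += len(line.split())
    return total
```

For example, count_words("oc-tagger-data/oc.crp.txt") would give the
approximate size of the Occitan corpus, assuming the file has been put in
the place suggested above.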

The unsupervised training is done through the iterative Baum-Welch

algorithm. By default the number of iterations is set to 8, but you

can change this value by editing the Makefile and changing the

value of TAGGER_UNSUPERVISED_ITERATIONS.
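
To give an idea of what one Baum-Welch iteration does, here is a toy
sketch in Python. It is NOT the Apertium tagger implementation: the
function names, the integer-coded observations and the tiny model are all
assumptions chosen for illustration. Each iteration re-estimates the
model from expected counts, and the log-likelihood of the corpus never
decreases from one iteration to the next, which is why a small fixed
number of iterations is a common choice.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass over one observation sequence."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(n):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prior * B[j][obs[t]]
        # Normalise each time step to avoid numerical underflow.
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            ) / scale[t + 1]
    return alpha, beta, scale


def baum_welch(obs, pi, A, B, iterations=8):
    """Run a fixed number of EM iterations on a discrete HMM.

    obs: list of symbol indices (needs at least two observations);
    pi: initial state probabilities; A: transition matrix;
    B: emission matrix (states x symbols).
    Returns (pi, A, B, history), where history holds the log-likelihood
    measured at the start of each iteration.
    """
    n, m, T = len(pi), len(B[0]), len(obs)
    history = []
    for _ in range(iterations):
        alpha, beta, scale = forward_backward(obs, pi, A, B)
        history.append(sum(math.log(c) for c in scale))
        # E-step: state posteriors and expected transition counts.
        gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
        xi_sum = [[0.0] * n for _ in range(n)]
        for t in range(T - 1):
            for i in range(n):
                for j in range(n):
                    xi_sum[i][j] += (
                        alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                        * beta[t + 1][j] / scale[t + 1]
                    )
        # M-step: re-estimate the parameters from the expected counts.
        pi = gamma[0][:]
        new_A, new_B = [], []
        for i in range(n):
            leaving = sum(gamma[t][i] for t in range(T - 1))
            new_A.append([xi_sum[i][j] / leaving for j in range(n)])
            visits = sum(gamma[t][i] for t in range(T))
            new_B.append([
                sum(gamma[t][i] for t in range(T) if obs[t] == k) / visits
                for k in range(m)
            ])
        A, B = new_A, new_B
    return pi, A, B, history
```

The default of 8 iterations in this sketch simply mirrors the default
mentioned above; the initial probabilities must all be non-zero for the
toy code to work.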

The unsupervised training is done by typing:

make -f oc-ca-unsupervised.make (for the Occitan part-of-speech tagger)

make -f ca-oc-unsupervised.make (for the Catalan part-of-speech tagger)

* To perform an unsupervised training by using target-language

information and the rest of the modules of the Apertium MT engine:

To do so you need large plain-text corpora in both languages. Please

download the apertium-tagger-training-tools package and follow the

instructions provided there. This is the training method followed to

train the Occitan part-of-speech tagger.

===================================================================

More information about this module and others can be found on

the Apertium Wiki, http://wiki.apertium.org
