The MOR program provides a method for automatic tagging of corpora in the CHAT format. To make this work,
it is necessary to construct a separate MOR grammar for each language. After analysis with MOR,
users can then use the POST program to disambiguate the %mor line. We provide a POST disambiguation
database for English; for other languages, users will need to train a POST database
themselves. This whole system is described in a recent article on
morphosyntactic analysis in CLAN.
We have working MOR grammars for these languages:
- Cantonese (yue): This grammar was built by Brian MacWhinney, Sam Po Law, and Anthony Kong with additional help from a
Cantonese-English lexicon provided by K. K. Luke.
- Chinese (zho): This grammar was built by Brian MacWhinney and Twila Tardif.
Thanks to K. J. Chen and the CKIP Group of the Academia Sinica for providing
an Excel listing of the 20,000 highest frequency forms of Putonghua
along with their English translations and romanizations.
- Danish (dan): This grammar is in preparation.
- Dutch (nld): This grammar was contributed by Steven Gillis.
- English (eng): This grammar was built initially by Brian MacWhinney and Mitzi Morris. It covers all the forms in the CHILDES English database.
- French (fra): This grammar was contributed by Christophe Parisse.
- French-new (fra): A newer version of the French grammar.
- German (deu): This grammar was contributed by Heike Behrens.
- Hebrew (heb): This grammar was developed by Aviad Albert, Bracha Nir, Shuly Wintner, Brian MacWhinney, and Ruth Berman.
- Japanese (jpn): This grammar was constructed by Norio Naka and Susanne Miyata.
The distribution includes the Wakachi system from Susanne Miyata for grammatical reference.
- Italian (ita): This grammar was built by Livia Tonelli and Brian MacWhinney.
- Spanish (spa): This grammar was built by Brian MacWhinney.
Five of these grammars (English, Chinese, Cantonese, Japanese, and Spanish) also include POST databases created with Christophe Parisse's POSTTRAIN program. After running MOR, run POST to disambiguate the MOR output automatically. The Chinese version is functional but needs more training and clearer part-of-speech categories to improve accuracy.
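To check how much ambiguity remains on a %mor line before or after running POST, a small script can help. The sketch below is not part of CLAN. It assumes that alternative readings of a word on the %mor line are joined with a caret (^), and the function name `ambiguous_tokens` and the sample lines are made up for illustration.

```python
def ambiguous_tokens(mor_line: str) -> list[str]:
    """Return the tokens on a %mor line that still carry alternative readings.

    Assumption: alternative analyses of one word are joined with '^'.
    """
    # Drop the tier label ("%mor:") if present; otherwise use the whole line.
    _, _, body = mor_line.partition("%mor:")
    if not body:
        body = mor_line
    # Tokens are whitespace-separated; an ambiguous token contains '^'.
    return [tok for tok in body.split() if "^" in tok]


# Made-up sample lines, for illustration only.
before_post = "%mor:\tdet|the n|back^adv|back^v|back ."
after_post = "%mor:\tdet|the n|back ."

print(ambiguous_tokens(before_post))
print(ambiguous_tokens(after_post))
```

Running this over a whole transcript, line by line, gives a rough count of the words that POST still needs to resolve.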
To help those interested in building their own MOR grammars, we provide these two examples
of minMOR grammars. One is the basic example, and the other shows how
to build a grammar that targets only a few word forms, such as the German article.