Coling 2000 Tutorial Announcement

Rémi Zajac: Practical development of large computational lexicons

This course provides an in-depth, practical introduction to the development of large computational dictionaries, with examples drawn from bilingual dictionaries for machine translation. It covers issues ranging from the
design of the dictionary schema and the structure of lexical entries to acquisition strategies,
the use and development of a lexical toolset, acquisition team management, and testing.

The course will cover the following topics:
    • Overview; Encoding and formats: SGML/XML (esp. TEI), Unicode, flat formats (e.g. for relational databases), hierarchical formats (e.g. feature structures).

    • Content and levels of linguistic knowledge: morphology, syntax, word senses, semantics, translations.

    • Strategies for lexical acquisition: applications' requirements, depth/breadth issues, planning for scalability, using resources (machine-readable dictionaries (MRDs), corpora and associated tools), training issues.

    • Structure of a lexical database: structure of a lexical entry, structure of a dictionary, defining the lexical database schema, defaults and coherence checks.

    • Resources for lexical acquisition. Corpora: processing raw corpora (e.g. HTML corpora), building a stemmer for stem/POS extraction, building acquisition files. MRDs: processing MRDs, building acquisition files from MRDs. On-line resources: WordNet and others, thesauri and ontologies, on-line corpora. Paper dictionaries: using them as a reference for checking the dictionary, and whether or not to OCR them.

    • Lexical acquisition: primary acquisition tools vs. revision tools; Team management issues.

    • Generating application dictionaries: Generic lexical databases vs. application dictionaries; Compilation of indexes; Compilation of entries: extracting application information.

    • Checking and testing: Sampling method; Testing using a tagged corpus; Testing coverage.


The course is intended for researchers and practitioners in Language Engineering (LE): developers of LE systems and, in particular, linguists and lexicographers. The target level is graduate level in linguistics/lexicography or computational linguistics.

