Semantic Annotation and Intelligent Content

a workshop supported by the ACL Special Interest Group
on the Lexicon (SIGLEX), to be held at COLING 2000,
the 18th International Conference on Computational Linguistics
Luxembourg, 5/6 August 2000

Topic and Motivation

SEMANTIC ANNOTATION is the augmentation of data to facilitate automatic recognition of its underlying semantic structure. A common practice in this respect is the labeling of documents with thesaurus classes for document classification and management. In the medical domain, for instance, there is a long-standing tradition of terminology maintenance and of annotating and classifying documents with standard coding systems such as ICD, MeSH, and the UMLS Metathesaurus. In a broader sense, semantic annotation also addresses document structure (title, section, paragraph, etc.) and linguistic structure (dependency, coordination, thematic role, coreference, etc.). In NLP, semantic annotation has been used together with machine-learning software trainable on annotated corpora for parsing, word-sense disambiguation, coreference resolution, summarization, information extraction, and other tasks. An important but still unexplored potential of semantic annotation is that it can provide a common I/O format through which to integrate various component technologies in NLP and AI, such as speech recognition, parsing, generation, and inference.

INTELLIGENT CONTENT is semantically structured data used for a wide range of content-oriented applications such as classification, retrieval, extraction, translation, presentation, and question answering, since the organization of such data provides machines with accurate semantic input to those applications. Semantically annotated resources as described above are typical examples of intelligent content; another major class includes electronic dictionaries and interlingual or knowledge-representation data. Ongoing projects along these lines include GDA (Global Document Annotation), UNL (Universal Networking Language), and SHOE (Simple HTML Ontology Extension), all of which aim to encourage people to organize electronic documents semantically in machine-understandable formats, and to develop and spread content-oriented application technologies that are aware of such formats. Along similar lines, MPEG-7 is a framework for semantically annotating audiovisual data for content-based retrieval and browsing, among other purposes. Incorporating linguistic annotation into MPEG-7 is on the agenda, because linguistic descriptions already constitute a major part of existing metadata.

In short, semantic annotation is a central, basic technology for intelligent content, which in turn is a key notion for systematically coordinating the various applications of semantic annotation. In the hope of fueling some of the developments mentioned above, and thus strengthening the link between basic research and practical applications, the workshop invites researchers and practitioners from fields such as computational linguistics, document processing, terminology, information science, and multimedia content to discuss various aspects of semantic annotation and intelligent content in an interdisciplinary way. Potential topics include, but are not limited to:

  • authoring/annotation tools
  • integrated software architecture based on semantic annotation
  • language-based multimedia annotation
  • standardization and interoperability

  • semantic annotation, intelligent content and:

    • document classification
    • information extraction
    • information retrieval (interactive, pinpoint, content-based, etc.)
    • intelligent/interactive manuals
    • knowledge circulation and management
    • knowledge mining
    • machine translation
    • presentation (interactive, multimodal/multimedia, etc.)
    • question answering
    • summarization (multimedia, multidocument, itemized, graphical, etc.)


Please note: Papers on syntactic annotation (tools, methods, standards, etc.) should not be submitted to this workshop, but rather to the COLING Workshop on Linguistically Interpreted Corpora.

Programme Committee
Amit Bagga GE Corporate R&D, USA
Paul Buitelaar DFKI-LT, Germany (Co-Chair)
Gregor Erbach FTW, Austria
Christiane Fellbaum Princeton University, USA
Wolfgang Giere ZINFO, University of Frankfurt, Germany
Nicola Guarino Ladseb-CNR Padova, Italy
Kôiti Hasida ETL, Japan (Co-Chair)
Boris Katz AI Laboratory, MIT, USA
Adam Kilgarriff University of Brighton, UK
Elizabeth Liddy Syracuse University, USA
Katashi Nagao IBM TRL, Japan
Hiroshi Nakagawa University of Tokyo, Japan
Hwee Tou Ng DSO, Singapore
Martha Palmer University of Pennsylvania, USA
Virach Sornlertlamvanich NECTEC, Thailand
Steffen Staab University of Karlsruhe, Germany
Henry Thompson Edinburgh University, UK
Hiroshi Uchida United Nations University, Japan
Remi Zajac CRL, New Mexico State University, USA

Two-day workshop: day one consists of an equal number of invited and refereed presentations; day two consists of a number of smaller working groups, followed by group presentations.

Paper submission deadline: April 30
Notification of acceptance/rejection: May 30
Publication of workshop programme: June 15
Workshop: August 5/6

Submissions in English of at most 5,000 words, in PS or PDF format, should be sent (preferably by email) to both organisers:
Paul Buitelaar
Language Technology
Stuhlsatzenhausweg 3
D-66123 Saarbruecken

Kôiti Hasida
Information Science Division
Electrotechnical Laboratory
1-1-4, Umezono, Tukuba,
Ibaraki 305-8568
