Jacques Vergne, Emmanuel Giguet: Trends in robust parsing

The aim of this tutorial is to outline the fundamental trends in the evolution of robust parsing across the variety of concepts, techniques, and parsing processes, and to give a synthetic view of the topic.

Today, robust parsing is moving rapidly from tagging to chunking and clause bracketing. Partial parsing is becoming less and less partial, while keeping computational properties that allow good integration into industrial contexts, where linear complexity is a prerequisite: robust parsers can process raw linguistic material at a constant and predictable rate, with predictable results.

This tutorial is designed for PhD students and researchers in NLP. The expected prerequisite is basic knowledge of parsing and tagging.

Course outline:

Session 1: Course (3h)

  • presentation and introduction
  • part-of-speech tagging
  • chunking
  • chunking after tagging, or tagging and chunking together?
  • clause bracketing and computing chunk main functions inside a clause
  • linking chunks
  • non-recursive representations of constituent structures
  • architectures and implementations
  • conclusion of the course and introduction to the practical

Session 2: Practical (3h)

  The aim of the practical is to illustrate the course and to give participants the opportunity to practice on the "GREYC parser", a general platform for designing and building parsers.
  It is described (in French) at: http://www.info.unicaen.fr/~jvergne/analyseur_GREYC/analyseur_du_GREYC.html

  The practical will consist of:
  • testing ready-written rules on English or French corpora brought by participants,
  • writing more rules for tagging, chunking, clause bracketing, and chunk linking in English or French, and testing these rules on corpora,
  • and, for volunteers, writing rules for tagging and chunking in another language (this will require a little preparation).
  For more details (in English), see: http://www.info.unicaen.fr/~jvergne/tutorialColing2000.html

Tutorial speakers:

  Jacques Vergne (Jacques.Vergne@info.unicaen.fr) is a lecturer and researcher in computer science and NLP at GREYC, the computer science laboratory of the University of Caen (France). His research domain is robust and accurate parsing. He built the 1998 parser, which obtained the best results in the GRACE contest (http://limsi.fr/TLP/grace/), an international evaluation that aimed to compare taggers for French under a single protocol.
  Emmanuel Giguet was project manager of the team that built the "GREYC parser". His PhD thesis gave the 1998 parser a more general design, which is now implemented in the "GREYC parser".

Some references for a preliminary insight into the topic:

