Non-compositional lexical semantics: How can idioms be represented in a lexical resource?
Idioms constitute a subclass of multi-word units that exhibit strong
collocational preferences and whose meanings are at least partially
non-compositional. The classic view of idioms as "long words" admits
of little or no variation of a canonical form. Fixedness is thought to
reflect semantic non-compositionality: the unavailability of a
semantic interpretation for some or all idiom constituents, and the
impossibility of parsing syntactically ill-formed idioms, block regular
grammatical operations. We argue that corpus data showing a wide range
of discourse-sensitive morphosyntactic flexibility and lexical
variation, even in cases where the constituents cannot be semantically
interpreted, refute this simplistic view of idioms. Such data weaken
the categorical distinction between idioms and freely composed phrases
and pose a challenge to the representation of idioms and their
constituents in lexical resources designed for Natural
Language Processing. We discuss one possible solution, illustrated by
the treatment of idioms in the large lexical database WordNet.
Biography of the speaker:
Christiane Fellbaum is a Senior Research Scholar in the Computer
Science Department. Her Ph.D. is in Linguistics and her research
focuses on computational and corpus linguistics and lexical semantics.
She teaches a course on Bilingualism and enjoys exploring new
languages and faraway places.
She is Co-Founder and Co-President of the Global WordNet Association.
She was awarded the Wolfgang Paul Prize and the Antonio Zampolli Prize.
She is a partner in the European projects KYOTO and SIERA, a Permanent
Fellow and Member of the Center for Language at the Berlin-Brandenburg
Academy of Sciences, and a member of the Board of Directors of the American
Friends of the Humboldt Foundation. Her current work is supported by the U.S.
National Science Foundation, the European Union (Seventh Framework Programme),
the Frank Moss Foundation and the Tim Gill Foundation.
Robust Parsing: Bridging the Coverage Chasm
Grammar implementations which are guided by linguistic theory will normally
lack coverage of even some well-formed utterances, since no current theory
exhaustively characterizes all of the phenomena in any language. For many
uses of a grammar, approximate or robust analyses of the out-of-grammar
utterances would be better than nothing, and a variety of approaches have
been developed for such robust parsing. In this paper I present an
implemented method which adds two simple "bridging" rules to an existing
broad-coverage grammar, the English Resource Grammar, allowing any two
constituents to combine. This method relies on a parser which can
efficiently pack the full parse forest for an utterance, and then
selectively unpack the most likely N analyses guided by a statistical model
trained on a manually constructed treebank....
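The bridging idea in the abstract above can be illustrated with a toy sketch. It is not the ERG method: it assumes a tiny hypothetical context-free grammar (RULES, LEXICON) instead of HPSG, a single generic bridging rule instead of the ERG's two, a fixed hypothetical BRIDGE_PENALTY and hand-set scores standing in for a treebank-trained statistical model, and a per-cell beam of scored items as a crude stand-in for packing the full forest and selectively unpacking the N best analyses. The point it shows is that a bridging rule guarantees some analysis spanning every input, while its penalty keeps bridged analyses ranked below fully grammatical ones.

```python
import heapq

# Hypothetical toy grammar: (left label, right label) -> (parent label, log-score).
RULES = {
    ("Det", "N"): ("NP", -0.1),
    ("V", "NP"): ("VP", -0.2),
    ("NP", "VP"): ("S", -0.1),
}
# Hypothetical toy lexicon: word -> (preterminal label, log-score).
LEXICON = {
    "the": ("Det", 0.0),
    "dog": ("N", -0.5),
    "cat": ("N", -0.5),
    "saw": ("V", -0.7),
}
BRIDGE_LABEL = "BRIDGE"
BRIDGE_PENALTY = -5.0  # assumed cost so bridged analyses rank below grammatical ones


def parse(words, n_best=3, beam=10):
    """Return up to n_best (score, tree) analyses spanning all of `words`."""
    n = len(words)
    # chart[i][j] holds (score, tree) items covering words[i:j].
    chart = [[[] for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        label, score = LEXICON[w]  # unknown words raise KeyError in this sketch
        chart[i][i + 1].append((score, (label, w)))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            items = []
            for k in range(i + 1, j):
                for ls, lt in chart[i][k]:
                    for rs, rt in chart[k][j]:
                        rule = RULES.get((lt[0], rt[0]))
                        if rule is not None:
                            parent, rule_score = rule
                            items.append((ls + rs + rule_score, (parent, lt, rt)))
                        # Bridging rule: any two adjacent constituents may combine.
                        items.append((ls + rs + BRIDGE_PENALTY, (BRIDGE_LABEL, lt, rt)))
            # Keep only the `beam` best items per cell (not true selective unpacking).
            chart[i][j] = heapq.nlargest(beam, items, key=lambda item: item[0])
    return heapq.nlargest(n_best, chart[0][n], key=lambda item: item[0])
```

For "the dog saw the cat" the top analysis is an ordinary S tree; for the out-of-grammar "the dog the cat" (no rule combines two NPs) the top analysis is a BRIDGE node joining the two NPs, which is the kind of approximate analysis the abstract argues is better than nothing.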
Biography of the speaker:
Dan Flickinger is a Senior Research Associate at the Center for the Study of
Language and Information (CSLI), Stanford University, and Project Manager of
the LinGO Laboratory at CSLI.
Flickinger is the principal developer of the English Resource
Grammar (ERG), a precise broad-coverage implementation of
Head-driven Phrase Structure Grammar (HPSG). His current
research focuses on two broad areas: parsing text for
improved information retrieval, and applying the ERG to
improve educational software. Flickinger’s central research
interests are in wide-coverage grammar engineering for
both parsing and generation, lexical representation and
the syntax-semantics interface.
This lecture is supported by the NTU Centre for Liberal Arts and Social
Sciences (CLASS) and the Singapore MOE Tier 2 grant "That's what you meant:
A Rich Representation for Manipulating Meaning" (MOE ARC41/13).
Computational Linguistics Laboratory
Division of Linguistics and Multilingual Studies
Nanyang Technological University