The 11th International Workshop on Treebanks and Linguistic Theories
Abstracts
Invited speakers
Mark Steedman
Treebanking in the Language of Thought
There has recently been some interest among computational linguists in
the task of inducing grammar-based "semantic parsers" from sets of
paired strings and meaning representations, following pioneering work
by Zettlemoyer and Collins (2005). Work of this kind is currently
limited by the paucity of datasets for training.
The talk reviews the state of the art in this field, then proposes a
way to semi-automatically generate much larger language-independent
datasets, on the same order of magnitude as syntactic treebanks, using
linguistic knowledge that has only recently begun to become available,
for use in inducing semantic parsers for under-resourced languages for
application in statistical machine translation.
Nianwen Xue
Treebanking Chinese text: what it is like
The Chinese TreeBank (CTB) has been in development for over a decade
now and, as of this talk, it has about 1.4M words fully segmented,
POS-tagged and syntactically bracketed. It is currently under
expansion to informal genres such as on-line discussion forums under
the DARPA BOLT Program.
In this talk, I will provide an overview of the annotation standards
for the CTB and our annotation procedure. In particular, I will
discuss how our revised annotation procedure enlarges the annotator
pool and makes it possible to scale up our annotation efforts. I will
also discuss some of the challenges in developing this corpus,
resulting from some salient linguistic characteristics of the Chinese
language. These linguistic characteristics include the lack of
reliable sentence and word boundaries, the scarcity of formal
morpho-syntactic cues, and pervasive dropped elements. Finally, I will
touch upon some general methodological issues in treebanking and other related
annotation tasks that still need to be clarified.