HG3051: Corpus Linguistics

Francis Bond, 2011, 2012, 2014.

This course is an introduction to the fast-growing field of corpus linguistics. It aims to familiarise students with key concepts and common methods used in the construction of language corpora, as well as tools that have been developed for searching and using major corpora such as the British National Corpus. Students will be given hands-on experience in pre-editing, annotating, and searching corpora. Criteria and methods used for evaluating corpora and analytical tools will also be discussed.

The main aim of this module is for students to master the use of text corpora in linguistic research and applications.

Course Content

This course introduces basic corpus skills for linguists:

Course Page: http://compling.hss.ntu.edu.sg/courses/hg3051.

There is no textbook; readings will be assigned each week.

Course Outline

Lecture Date Topic Readings Assessment/Extra Information/Tools Fun
1 Jan 15 Basic Concepts; What Can We Do with Corpora? Corpus and Text: Basic Principles (Sinclair 2005) in Wynne (2005)
2 Jan 22 Markup and Annotation BNC Manual, BYU Interface Syntax
Adding Linguistic Annotation (Geoffrey Leech 2005) and
Metadata for Corpus Work (Lou Burnard 2005) in Wynne (2005)
NTU-MC Tagsets: cmn; eng; jpn; ind; universal; universal (old version)
Email results of the two tasks (ISLRN and tagset mapping)
Phonetic Punctuation Victor Borge
3 Jan 29 Multimodal and Multilingual Corpora Koehn (2005), Martin et al. (2007) and
Character Encoding in Corpus Construction (Anthony McEnery and Richard Xiao 2005) in Wynne (2005)
Lab 1 due
Email corpus choice
4 Feb 05 A Survey of Available Corpora Various Corpora Lab 2 due
5 Feb 12 Collocation, Frequency, Corpus Statistics Dunning (1993) Social Science Statistics
Corpus test Wizard
Chinese New Year
6 Feb 26 DIY Corpora, Web as Corpus, Processing Raw Text, SQL NLTK Chapter 11; SQLite tutorial SQLite; SQLMAN (Developer/admin tool); sqlitebrowser (DB Browser)
NTU Multilingual Corpus: English, Chinese, WordNet
7 Mar 12 Lexical and Grammatical Studies, Variation Biber et al. (1998) Chapters 2, 3 Lab 3 due
8 Mar 19 Case Studies: Pronouns and Classifiers Bond et al. (1995), Bond (2005), Seah and Bond (2014) Project 1 due
9 Mar 26 Contrastive and Diachronic Studies Stubbs 7, 8 Lab 4 due
10 Apr 02 Corpora and Language Engineering Newman (2007) Project 2 Description
11 Apr 09 Representativeness and Balance; Project Presentations Ide and Macleod (2001) Project 2 Presentation
12 Apr 16 Conclusions and Review Stubbs 9

Apr 30: Project 2 due (and Project 3)

Slides may be updated at any time! Labs may also change.

Recommended Readings

Projects that became papers

Assessment (HG3051)

Learning Outcomes

On completion of this module, students should be able to:

Assessment for HG7032 Topics in Corpus Linguistics

Francis Bond <bond@ieee.org>
Computational Linguistics Lab
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303