HG3051: Project 2 (30%)
Overview and Instructions
|Week 10|Email a brief description of your final project and the group members|
|Week 11|Project Presentations|
|April 30|Email the final project (no need for turnitin)|
You should do this in groups of 2 or 3.
15-minute presentation + 5-minute question-answer period
Presentation methods: slides (PowerPoint, Keynote) or other forms of demonstration
A full research paper, complete with references and citations (ACL format; 8+2 pages)
If you add annotation to a corpus (or build a new corpus), you
should also submit that (if it is too large to email, then give it to
20% presentation, 10% participation (ask questions!), 70% written
Assignment two topics
Your assignment must present original work. Good topics include
(but are not limited to):
- Investigation of a linguistic research question, based on published
- Investigation on a corpus that you have built yourself
- Comparisons of two or more corpora (of different genres, authors,
If you are conducting research on ready-made
corpora, pay particular attention to:
- Interpreting numbers correctly
- Making sure all relevant aspects have been explored, that your
basis of comparison is well established, and that your
conclusions are sound
If you are building your own corpus, pay
particular attention to:
- The size of your corpus. Is it large enough
for you to be able to find enough evidence for your investigation?
Note that "sufficient data" depends on the nature of your
- The genres of your corpus. Is it the "right" kind of
corpus for the problem that you are trying to investigate? Is it as
representative and balanced as you can make it?
- It takes a huge amount of time and effort to rigorously
address these issues in corpus building.
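On corpus size and on interpreting numbers: raw counts cannot be compared directly across corpora of different sizes, so a basis of comparison usually starts with relative frequencies. The sketch below is only an illustration, not a required method for this project. The function names `per_million` and `log_likelihood` and all the counts are hypothetical, and the two-corpus log-likelihood (G2) statistic is just one commonly used way to check whether a frequency difference is larger than chance would suggest.

```python
import math


def per_million(count, corpus_size):
    """Relative frequency: occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood (G2) for the frequency of one item."""
    total = count_a + count_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts, invented purely for illustration.
count_a, size_a = 120, 1_000_000  # corpus A: 120 hits in 1,000,000 tokens
count_b, size_b = 30, 100_000     # corpus B: 30 hits in 100,000 tokens

print(per_million(count_a, size_a))
print(per_million(count_b, size_b))
print(log_likelihood(count_a, size_a, count_b, size_b))
```

In this made-up case corpus A has four times as many raw hits, yet the item is relatively more frequent in corpus B once corpus size is taken into account. With very small counts, any such statistic should be read with caution.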
If you are adding annotation to a corpus, you are encouraged (but
not required) to add it to one of the existing LMS corpora:
- NTU Multilingual Corpus (annotate a phenomenon or two);
play with or improve the sense tagging; add a new language
- Corpus of Hong Kong Cantonese 香港粵語語料庫
- Tatoeba Multilingual Corpus (annotate a phenomenon)
Since this project is for a class, I recognize the inherent
limitations on your time and resources: you cannot build a perfect
corpus in several weeks. I will take this into consideration and give
proper credit to good, creative corpus-building efforts.
Computational Linguistics Lab
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303