HG3051: Project 1

Due on 03-19

Pick two related "words" (t1 and t2) either:

One of the terms (t1) should be a moderately common word. To find it, choose a word ranked somewhere between about #1000 and #3000 in the Word frequency data.
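If you only have a tokenized corpus instead of a ready-made frequency list, the rank band can be approximated along the following lines. This is a minimal sketch: the function name `words_in_rank_band` and the toy token list are hypothetical, and ranks computed from your own corpus will not match those in the Word frequency data.

```python
from collections import Counter

def words_in_rank_band(tokens, lo=1000, hi=3000):
    """Return the words whose frequency rank falls between lo and hi (1-based, inclusive)."""
    # most_common() sorts by descending count; ties keep first-seen order.
    ranked = [word for word, _ in Counter(tokens).most_common()]
    return ranked[lo - 1:hi]

# Toy illustration with a tiny band; a real corpus would use the defaults.
toy_tokens = ["a"] * 3 + ["b"] * 2 + ["c"]
candidates = words_in_rank_band(toy_tokens, lo=2, hi=3)
print(candidates)
```

Any candidate found this way should still be checked against the Word frequency data before you claim it.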

When you have chosen the words, claim them as yours by emailing hg3051c@gmail.com.

For each of these two words, provide a corpus-based description (as described below) and then discuss the differences between them.

FIRST do the following:

Before doing anything else, ask three different people (who aren't taking this class) what they think the top ten collocates of the word would be. They won't know what "collocates" means, so tell them that these are words that "hang out" a lot with the word in question. You might give the example of beach = sand, waves, sun, surf, etc.
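To compare people's guesses with corpus evidence, one simple way to approximate "words that hang out with" a term is to count what occurs within a few tokens of it. The sketch below is only an illustration under stated assumptions: the function name `window_collocates`, the window size, and the toy sentence are all hypothetical, and it ranks by raw co-occurrence counts, not by an association measure.

```python
from collections import Counter

def window_collocates(tokens, node, window=4, top=10):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts.most_common(top)

# Toy text (made up for illustration), lowercased and split on whitespace.
text = "the sand on the beach was warm and the waves hit the beach near the sand"
tokens = text.lower().split()
print(window_collocates(tokens, "beach", window=2, top=3))
```

Raw counts like these are dominated by function words such as "the", so a real analysis would normally filter them out or rank candidates with an association score.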

Make sure you record basic information about these people.


If you chose a non-English word, don't forget to gloss all foreign words. You might want to look it up in a multilingual corpus like the NTU-MC to see what the translation data is like.

Your sketch should ideally say something for each word about: the word's syntax, its denotation (what it means) and its connotation (what its usage implies).

Write the results up as a paper of up to 6 pages (references don't count: you can have up to two extra pages of references), probably with many tables. Use the ACL 2015 format, but don't make your papers anonymous. Read and follow the stylistic advice in the (Computational) Linguistic Style Guidelines: a guide for the bewildered. Note especially how to format tables.

Upload the final paper here
It should be called hg3051-proj1-name-misc.pdf or hg7032-proj1-name-misc.pdf

HG3051 (Corpus Linguistics) main page.

Francis Bond <bond@ieee.org>
Computational Linguistics Lab
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303