HG2002: Semantics and Pragmatics
Assignment 1: Lexical Semantic Analysis with a Semantic Network
This is a group assignment for HG2002
consisting of three parts:
- Annotate (on your own) all open class words in a short section of
text (by 23:59 Oct 3rd).
In this phase please do not discuss your annotations with each other.
- Compare your annotations with one other annotator and a machine's results
and make any changes you consider necessary (you should discuss
with your partner). Partners are randomly assigned.
- Write a group paper of six to eight pages describing your results
(by 17:00 Oct 31st)
It is worth 30% of your total mark. You will be marked on the
accuracy of your annotation and the quality of your write-up.
Semantic Analysis Phase 1: Annotation
- If you have not done so already, read the story (at least up to
the part you are assigned, preferably the whole story).
This year we will look at
The Hound of the Baskervilles.
- Using the on-line tool provided, annotate each open class word in
a short (roughly 20-30 sentence) text.
- Estimated time 4-6 hours.
- The deadline for this sub-task is 23:59 Oct 3rd; you should
have annotated every word by then. It is important that you meet
this deadline so that we can merge the data for you.
Semantic Analysis Phase 2: Comparison
- You will be given a comparison of your tags with those of other
annotators and a merged corpus tagged with the majority tags.
- You should re-tag any words on which all three of you
disagreed, or on which you changed your mind.
- Note that annotator C is a naive computer:
it just tags the most frequent sense (mfs)
- mfs is calculated from the semcor corpus and three Sherlock
Holmes Stories (DANC, SPEC and REDH) weighted 1:3 to normalize frequencies
- Unseen proper nouns are tagged as per
- Unseen closed class words are tagged as x
- Unseen monosemous words are tagged with their single sense
- Other unseen words are tagged None
- If there are two or more equally frequent senses for a lemma
then it is tagged None
- So feel free to override annotator C
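The rules annotator C follows, and the majority merge used for the comparison, can be sketched roughly as below. This is only an illustration, not the code behind the annotation tool. The function names, the data layout and the sense identifiers are made up for the example. It assumes that "weighted 1:3" means SemCor counts are multiplied by 1 and Sherlock Holmes story counts by 3. Proper nouns are left out because the rule for them is not given in full above.

```python
from collections import Counter

SEMCOR_WEIGHT = 1  # assumption: SemCor counts get weight 1
HOLMES_WEIGHT = 3  # assumption: Sherlock Holmes story counts get weight 3


def mfs_tag(lemma, semcor_counts, holmes_counts, senses, closed_class):
    """Return the tag a most-frequent-sense baseline would give `lemma`.

    semcor_counts / holmes_counts: {lemma: {sense_id: count}}
    senses: {lemma: [sense_id, ...]}, all senses listed in the lexicon
    closed_class: set of closed class words
    """
    combined = Counter()
    for sense, n in semcor_counts.get(lemma, {}).items():
        combined[sense] += SEMCOR_WEIGHT * n
    for sense, n in holmes_counts.get(lemma, {}).items():
        combined[sense] += HOLMES_WEIGHT * n

    if combined:
        # Seen lemma: take the top sense, unless two or more senses tie.
        ranked = combined.most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None
        return ranked[0][0]

    # Unseen lemma: fall back on the remaining rules in order.
    if lemma in closed_class:
        return "x"
    lemma_senses = senses.get(lemma, [])
    if len(lemma_senses) == 1:
        return lemma_senses[0]  # monosemous: only one possible sense
    return None


def majority_tag(tags):
    """Return the tag at least two annotators agree on, else None."""
    tag, votes = Counter(tags).most_common(1)[0]
    return tag if votes >= 2 else None


if __name__ == "__main__":
    # Toy data; the sense ids are invented, not real synset ids.
    semcor = {"bank": {"bank-n-1": 4, "bank-n-2": 2}}
    holmes = {"bank": {"bank-n-2": 1}}
    lexicon = {"baronet": ["baronet-n-1"]}
    print(mfs_tag("bank", semcor, holmes, lexicon, {"the", "of"}))
    print(majority_tag(["bank-n-2", "bank-n-1", "bank-n-2"]))
```

A word for which `majority_tag` returns None is one where all three annotators disagreed, which is the case you are asked to re-tag.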
Phase 3: Write up
- In the write up you should describe the strengths and weaknesses
of using a lexical resource such as wordnet to define word meaning
- Are the senses in wordnet too coarse, too fine or just right?
Justify your position.
- You should give concrete examples from the text you analyzed.
Some things you could discuss include:
- Were some words easier or harder to annotate than others?
- e.g. verbs, multiword expressions, concrete nouns, …
- In cases where you disagreed with other annotators, on
reflection, do you think: you were right; they were right; the
definition is bad; or is there some other reason?
- For words with senses missing in wordnet, you should write a
comment with enough information to create a new entry for them
consisting of, at minimum, a definition, a relational link to an
existing synset and an example. E.g.
- Lemma: arrow
- Def: To assign a task to someone. Generally used only if the task is unpleasant or boring.
- Ex: They come and arrow me type their document
- Hyponym of: delegate (02391803-v)
- You don't need an extensive literature review, but you should
read and cite the references below and if you
consult other lexicons (which you are encouraged to do) then you
should cite them. You should also cite the wordnet you used and
the story you tagged.
- You should also discuss how long it took you to do the
annotation, and if you think there would be ways to make the task
quicker or easier.
- The write-up should be formatted according to the
guidelines for submitting written work for the Division of LMS
(but see below).
- You do not have to follow the suggested structure of "Introduction, Literature Review, Methodology, Results, Discussion, Conclusion, References." A short introduction describing the task followed by Results, Discussion, Conclusion, References is enough.
- You should mention which corpus and which section you were annotating (e.g.
Eng1: Sentence XXX to YYY)
- If you want to make it even more beautiful, as I am sure you do,
take a look at my (Computational)
Linguistic Style Guidelines: a guide for the flummoxed.
- Submit both hardcopy (stapled, two sided, no folder, no cover page)
and softcopy (via NTULearn).
- The deadline for the write up is 23:59 Oct 31st.
References
- Bond, Francis, Luís Morgado da Costa, and Tuấn Anh Lê (2015)
IMI — A Multilingual Semantic Annotation Environment.
In Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing. pp 7–12
- Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.
- Shari Landes, Claudia Leacock, and Christiane
Fellbaum. 1998. Building semantic
concordances. In Fellbaum (1998), chapter 8, pages 199–216.
- H. Langone, B. R. Haskell, and G. A. Miller. 2004. Annotating
WordNet. In Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004.
- Shan Wang and Francis Bond (2014)
Building The Sense-Tagged Multilingual Parallel Corpus. In 9th Edition of the Language
Resources and Evaluation Conference (LREC 2014), Reykjavik.
Computational Linguistics Lab
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303