General Tagging Guidelines

Outline


How the data is prepared


Actual Tagging documentation


Suggesting changes to wordnet (tag with w)

Don't forget to check for orthographic variants of existing synsets. E.g. stir-fry is in wordnet as stir fry, so stir-fry should just be added to that synset (=00326459).
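
The variant check can be partly mechanised. The following is a minimal sketch, assuming a hypothetical lemma-to-synset mapping (toy_lexicon) that contains only the stir fry entry mentioned above; the function names are illustrative and not part of any tagging tool.

```python
import re


def orthographic_variants(lemma):
    """Return the spaced, hyphenated and solid spellings of a lemma."""
    parts = [p for p in re.split(r"[-\s_]+", lemma.strip().lower()) if p]
    return {" ".join(parts), "-".join(parts), "".join(parts)}


def find_existing_synsets(lemma, lexicon):
    """Look up every spelling variant in a lemma -> synset-ID mapping."""
    hits = {}
    for variant in sorted(orthographic_variants(lemma)):
        if variant in lexicon:
            hits[variant] = lexicon[variant]
    return hits


# Hypothetical toy lexicon: only the entry named in the guideline.
toy_lexicon = {"stir fry": "00326459"}

print(find_existing_synsets("stir-fry", toy_lexicon))
```

If the lookup returns a hit, the new spelling should be suggested as an addition to that synset rather than as a new synset.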

For anything that you are not sure of, or that won't fit in the comment, note the sentence ID and word ID (e.g. 111:2) and write it up in your report.

When linking, please give the synset ID (012345678-x), not a word.


Tagging Issues

Detailed Guidelines for English

How to determine part of speech for the word/collocation. This is not always as obvious as it seems! There are four particularly tricky cases. These are all tricky because the part of speech of a word is not always the same as the grammatical function that the word is performing in the sentence or phrase. For instance, nouns can function similarly to what are traditionally called adjectives, and verbs can take on the roles of nouns or adjectives.

Adjective vs. noun modifying a noun
Adjectives usually appear before the noun they modify, and sometimes after it. Nouns can also serve as modifiers, similar to adjectives.

The general rule of thumb for deciding whether it is a noun or an adjective is to check the sense list first for an adjective sense corresponding to the word and, if there is none, for a corresponding noun sense. So, damp in damp weather is an adjective, even though a noun sense exists. And cotton in cotton shirts is a noun that is modifying another noun.

If there is no adjective sense in WordNet, make sure that the word is not truly an adjective that is missing from WordNet. A good clue that you have an adjective is that you can modify it with very or rather and it sounds ok: very/rather favorable conditions (ok) vs. very/rather cotton shirts (not ok). Another good clue is that you can make a comparative or superlative form out of it (damper/dampest/more favorable/most favorable conditions are all adjectives, but cottoner/cottonest/more cotton/most cotton shirts are not). If either of these tests comes up ok (that is, very/rather x sounds good, or x-er/x-est or more/most x sounds good) and there is no matching adjective sense, then you need to add a new sense to WordNet.

Note that these tests are only conclusive when they come up ok: then you know for sure that you have an adjective. If the tests are not ok, the word may still be an adjective, because the tests work for certain kinds of adjectives but not all. If the tests are not ok (that is, none of very/rather x, x-er/x-est and more x/most x sounds good), then check for a matching noun sense. If a noun sense does exist, then the word can be considered a noun and tagged with the noun sense. If there is no matching noun sense, then do not assign any sense. (But see below first regarding present and past participles, since it might be a verb!)

The noun-sense rule applies only when the word is modifying a noun. If the word is being used predicatively (that is, after some form of the verb be, or where the verb could be replaced by a verb such as seem, look, appear, etc.), there may be some confusion as to whether what follows the verb is an adjective or a noun. So, in the weather was damp, damp is an adjective. Notice that you can replace was with seemed/looked/appeared and still get a grammatical sentence: the weather seemed/looked/appeared damp. But note the difference between the pairs:

and

In the second pair, drunk is clearly a noun, not an adjective. It is the complement of the verb be here. Two reliable ways to recognize a noun are that it is (or can be) preceded by a determiner (such as a or the) or by an adjective (he was a silly drunk).

In summary: when you have a modifier followed by a noun, the modifier will be an adjective when there is a corresponding adjective sense in WordNet for the meaning it is being used with, OR when there is no corresponding adjective sense but any of the tests comes up ok (very/rather x sounds good, or one of x-er/x-est/more x/most x sounds good). In the second case, Sense not in WordNet should be assigned.

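The modifier will be a noun when there is no corresponding adjective sense, none of the tests comes up ok, and there is a corresponding noun sense in WordNet.

If it is neither an adjective nor a noun, it might be a verb (see below).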
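
The decision procedure for a word that modifies a noun can be sketched as a small function. This is only an illustration: the function name and the wording of the returned actions are assumptions, and all three inputs are the annotator's own judgments (whether a matching adjective sense exists, whether the very/rather or comparative/superlative test sounds ok, and whether a matching noun sense exists).

```python
def classify_noun_modifier(has_adjective_sense, passes_adjective_test, has_noun_sense):
    """Return (part_of_speech, action) for a word that modifies a noun.

    The checks are applied in the order given in the guideline:
    adjective sense first, then the adjective tests, then the noun sense.
    """
    if has_adjective_sense:
        return ("adjective", "tag with the matching adjective sense")
    if passes_adjective_test:
        return ("adjective", "assign Sense not in WordNet")
    if has_noun_sense:
        return ("noun", "tag with the matching noun sense")
    return (None, "assign no sense; check whether it is a participle")


# damp in "damp weather": an adjective sense exists.
print(classify_noun_modifier(True, True, True))
# cotton in "cotton shirts": no adjective sense, tests fail, noun sense exists.
print(classify_noun_modifier(False, False, True))
```

The order of the checks matters: a word with both an adjective and a noun sense, such as damp, is tagged as an adjective.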

Adjective vs. present participle (-ing form) of verb

The -ing form of verbs can function as adjectives. For instance,

How to tell? The easy case is when the word is modifying a noun. In general, these are adjectives if there is a corresponding adjective sense in WordNet. Such adjective senses exist for frightening and working. However, this is not the case for clicking and playing, so that in the following sentences,

the appropriate verb senses of click and play would be selected instead. (This is because these are verbs playing the part of adjectives, but are not adjectives in themselves.) When the word appears predicatively (after some form of the verb be), the rule can't always be applied since it might be impossible to tell whether it is being used as a verb or an adjective.

Without more information, you cannot know whether the third sentence means that the women are picketing, or whether they are beautiful. For ambiguous cases like this, if the context does not make it clear, choose the reading you think is most appropriate and add a comment saying that it is hard to tell.
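
The -ing rules above can be summarised in a short sketch. The function names and the comment text are hypothetical; the inputs are again the annotator's judgments, and the predicative case simply records the annotator's choice together with a comment when the context is unclear.

```python
HARD_TO_TELL = "hard to tell: adjective or verb"


def tag_ing_modifier(has_adjective_sense, has_verb_sense):
    """An -ing form modifying a noun: prefer the adjective sense, else the verb sense."""
    if has_adjective_sense:
        return "adjective"
    if has_verb_sense:
        return "verb"
    return None


def tag_ing_predicative(annotator_choice, context_is_clear):
    """An -ing form after be: keep the annotator's choice, flag unclear cases."""
    comment = None if context_is_clear else HARD_TO_TELL
    return (annotator_choice, comment)


print(tag_ing_modifier(True, True))    # e.g. frightening: adjective sense exists
print(tag_ing_modifier(False, True))   # e.g. clicking: only verb senses match
print(tag_ing_predicative("verb", context_is_clear=False))
```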

Adjective vs. past participle (usually -en form) of verb

Past participles can also function as adjectives. The past participle is the form of the verb that appears with the perfect auxiliary have. It usually, though not always, ends in -en or -ed: written, destroyed, and spun are the past participles of write, destroy and spin, respectively. The rule of thumb is similar to the one for present participles. Where the word modifies a noun, check first for a corresponding adjective sense. If no adjective sense exists, then assign the verb sense (if there is one that matches the meaning as used in the sentence).

Again, the hard cases occur when the word appears predicatively (i.e., after some form of the verb "to be", or where the preceding verb can be replaced by a verb such as seem/look/appear, etc.).

In the first sentence, written is a verb. A good test of this is to put the auxiliary verb in the progressive: The sentence WAS BEING written down for clarity. That makes it clear that it is an act or action that occurred. The second sentence cannot be phrased that way and still have the same sense: The sentence was being written as opposed to spoken.

In the third sentence, it is not clear whether "written" refers to an act of writing, or to the attribute or quality of being written. For ambiguous cases like this, do not assign a sense, and the lexicographers will make the determination.

Noun vs. present participle (-ing form) of verb

To complicate things further, the present participle of a verb can function as a noun. The distinction is often easy to make: the word appears where a noun is called for grammatically, and there is a corresponding noun sense in WordNet.

If no noun sense exists, then assign the verb sense, if one exists, as for

However, if the word is being used as a verb, then a noun sense should never be assigned! This is easy if there is no noun sense, as for frolicking

or when it is obviously depicting an ongoing action

You can test this out, too. A verb can never be modified by a or the or a possessive pronoun such as my/your/our, etc. Try it with the two sentences above--it hurts! But, again, there will be cases where this determination is impossible to make:

It is not clear whether writing in the third sentence refers to the act of writing something (e.g., a letter), or whether writing is the object itself (i.e., her writing, or an author's writing, marks on a piece of paper, etc.). For ambiguous cases like this, assign a sense and comment on the difficulty.
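
The determiner test can be approximated on a tokenised sentence. The sketch below is a crude surface heuristic under stated assumptions: it only looks at the token immediately before the -ing form, the word lists extend the a/the/my/your/our examples with a few more forms, and the sample sentences are invented for illustration. It cannot replace the annotator's judgment (for example, her can also be an object pronoun).

```python
DETERMINERS = {"a", "an", "the"}
POSSESSIVES = {"my", "your", "our", "his", "her", "its", "their"}


def preceded_by_determiner(tokens, index):
    """True if the token at `index` directly follows a determiner or possessive."""
    if index <= 0:
        return False
    return tokens[index - 1].lower() in DETERMINERS | POSSESSIVES


noun_like = "The painting hung in the hall".split()
verb_like = "They were painting the hall".split()

print(preceded_by_determiner(noun_like, 1))  # painting after "The"
print(preceded_by_determiner(verb_like, 2))  # painting after "were"
```

A True result suggests a noun reading; a False result says nothing either way, since a noun can also appear without a determiner.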

Using Wordnet Relations to determine sense (or senses)

In WordNet, senses are in part defined by their relations to other senses. For this reason, the WordNet relations can be very useful in narrowing down which of the senses applies to a particular occurrence of the form. The relations for any word or collocation can be viewed through the WordNet browser. From the WordNet entry of the word you are tagging, clicking on one of the sense buttons will display the full entry: you may want to middle click to open in a new tab.

The main relations that are of help are Hypernyms, Derivationally related forms, and Domain. Not all relations will appear for all words and all parts of speech for a word.

Hypernym (ISA relation)

The immediate hypernym is the most relevant one here. It is the first indented relation just below the definition (preceded by an arrow =>). The hypernym relation will tell you what kind of thing (object or action) the word refers to. The higher up you go in the hypernym relations, the more general the senses get (and so often less informative). There is a new indentation for each level up you go. For instance, two senses of the noun center that are rather close are

If you look at the hypernyms for the noun senses of center, you can see that Sense A is an area while Sense B is a point. What they have in common is a notion of centrality. Both are at some level locations, and eventually all nouns are entities (so that knowing that something is a kind of entity is not of much help at all!).
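
Walking up the hypernym levels can be pictured with a toy hierarchy. The mapping below is a simplified, hypothetical stand-in for the real WordNet data: it keeps only the levels named above (area, point, location, entity), skips the intermediate levels, and gives each sense a single hypernym.

```python
# Hypothetical, simplified hierarchy: child -> immediate hypernym.
TOY_HYPERNYMS = {
    "center (Sense A)": "area",
    "center (Sense B)": "point",
    "area": "location",
    "point": "location",
    "location": "entity",
}


def hypernym_chain(sense, hypernyms):
    """List the hypernyms of a sense, from most specific to most general."""
    chain = []
    while sense in hypernyms:
        sense = hypernyms[sense]
        chain.append(sense)
    return chain


def lowest_shared_hypernym(sense_a, sense_b, hypernyms):
    """Return the most specific hypernym that both senses share, if any."""
    chain_b = set(hypernym_chain(sense_b, hypernyms))
    for candidate in hypernym_chain(sense_a, hypernyms):
        if candidate in chain_b:
            return candidate
    return None


print(hypernym_chain("center (Sense A)", TOY_HYPERNYMS))
print(lowest_shared_hypernym("center (Sense A)", "center (Sense B)", TOY_HYPERNYMS))
```

The immediate hypernyms (area vs. point) are what separate the two senses; the shared level (location) and everything above it do not help to choose between them.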

Domain

Is this term restricted to one topic or area or field or context?

Where they exist, the domain relations can be quite helpful in narrowing senses down. A word’s domain will tell you whether it is restricted to some field or area such as Law or Art. Take the noun work. It has 7 senses, and if you look at its domains, you can see that one of its senses is restricted to the domain of physics, having to do with the transfer of energy.
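
Narrowing by domain amounts to a simple filter over the sense list. The records below are hypothetical placeholders, not real WordNet entries: one sense of work is marked with the physics domain as described above, and the other six are assumed, for illustration only, to have no domain.

```python
def senses_in_domain(senses, domain):
    """Return the labels of the senses restricted to the given domain."""
    return [sense["label"] for sense in senses if sense["domain"] == domain]


# Hypothetical stand-in for the 7 noun senses of work.
TOY_WORK_SENSES = [{"label": "work (physics sense)", "domain": "physics"}] + [
    {"label": f"work (other sense {i})", "domain": None} for i in range(1, 7)
]

print(len(TOY_WORK_SENSES))
print(senses_in_domain(TOY_WORK_SENSES, "physics"))
```

If the sentence being tagged is clearly not about physics, the domain-restricted sense can usually be set aside early.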


References


Thanks to Christiane Fellbaum for sharing some documentation from the wordnet gloss tagging project.