We are currently developing a corpus search tool that supports searching over the full corpus. Queries can be made by concept, word, lemma, part-of-speech, etc., and can also be intersected for finer-grained results.
Results can also be displayed with crosslingual sentence alignments and/or sentiment scores.
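
As a rough illustration of what intersecting queries could look like, the sketch below filters a toy token list by several constraints at once and intersects the per-constraint hit sets. The `Token` fields, the sample data, and the `search` function are all hypothetical and are not the actual NTUMC search tool or its schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token record; field names are assumptions."""
    sid: int    # sentence identifier
    word: str   # surface form
    lemma: str
    pos: str    # part-of-speech tag


# Toy data for illustration only, not taken from the NTUMC.
TOKENS = [
    Token(1, "dogs", "dog", "NNS"),
    Token(1, "bark", "bark", "VBP"),
    Token(2, "bark", "bark", "NN"),
    Token(2, "peeled", "peel", "VBD"),
]


def search(tokens, **constraints):
    """Return the tokens that satisfy every field=value constraint.

    Each constraint is evaluated on its own, and the resulting
    hit sets are intersected.
    """
    result = None
    for field, value in constraints.items():
        hits = {t for t in tokens if getattr(t, field) == value}
        result = hits if result is None else result & hits
    # With no constraints, every token matches.
    return set(tokens) if result is None else result
```

For example, `search(TOKENS, lemma="bark")` matches both the verb and the noun reading, while `search(TOKENS, lemma="bark", pos="NN")` narrows the result to the noun in sentence 2.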
Below you can access the distribution of parts-of-speech across the NTUMC. We also make available mappings to the 12 universal POS tags, as described in "A Universal Part-of-Speech Tagset" by Slav Petrov, Dipanjan Das and Ryan McDonald. These mappings were made using Version 1.03, for which there was no official release of mappings for any of the part-of-speech tagsets we currently use. For this reason, new mappings were tailor-made and may differ from earlier or later versions of the official mappings provided by Petrov, Das and McDonald.

The presented lists are dynamically updated, sortable, and display the five most frequent words assigned to every part-of-speech. (*UPOS refers to the Universal Part-of-Speech Tagset)
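
A minimal sketch of how such a list could be computed from tagged tokens: count each tag, keep the most frequent words per tag, and look up a universal tag. The `UPOS_MAP` fragment below covers only a few Penn Treebank tags for illustration; it is an assumption and not the tailor-made NTUMC mapping, and `pos_distribution` and `SAMPLE` are hypothetical names.

```python
from collections import Counter, defaultdict

# Illustrative fragment only; NOT the NTUMC mapping.
UPOS_MAP = {
    "NN": "NOUN",
    "NNS": "NOUN",
    "VBD": "VERB",
    "DT": "DET",
    "JJ": "ADJ",
}


def pos_distribution(tagged, top_n=5):
    """Return rows of (tag, upos, frequency, top words), most frequent tag first.

    `tagged` is an iterable of (word, tag) pairs. Tags missing from
    UPOS_MAP fall back to "X", the catch-all universal tag.
    """
    tag_counts = Counter()
    words_by_tag = defaultdict(Counter)
    for word, tag in tagged:
        tag_counts[tag] += 1
        words_by_tag[tag][word.lower()] += 1

    rows = []
    for tag, freq in tag_counts.most_common():
        top_words = [w for w, _ in words_by_tag[tag].most_common(top_n)]
        rows.append((tag, UPOS_MAP.get(tag, "X"), freq, top_words))
    return rows


# Toy input for illustration only.
SAMPLE = [
    ("The", "DT"), ("dog", "NN"), ("chased", "VBD"),
    ("the", "DT"), ("cats", "NNS"), ("the", "DT"),
]

for row in pos_distribution(SAMPLE):
    print(row)
```

Sorting by a different column, as the lists on this page allow, would amount to re-sorting `rows` on another tuple position.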

Liling Tan and Francis Bond. 2012. Building and annotating the linguistically diverse NTU-MC (NTU-multilingual corpus). In International Journal of Asian Language Processing 22(4) pp 161–174.

Contributors: Francis Bond, Liling Tan, Tuan Anh Le, Luís Morgado da Costa.

