Extended Open Multilingual Wordnet

This page provides access to wordnets in a variety of languages, all linked to the Princeton Wordnet of English (PWN). It consists of the Open Multilingual Wordnet merged with data collected automatically from Wiktionary and the Unicode Common Locale Data Repository (CLDR). The complete data collected from Wiktionary is released under a CC BY-SA 3.0 licence, and the complete data collected from CLDR is released under the Unicode, Inc. Licence Agreement.

Here is the automatically constructed wordnet data (.tgz 11M) (Bond and Foster, 2013). It contains data for over 150 languages, with an estimated accuracy of 94%. It is also available as .zip (11M) and .tar.bz2 (7.8M). The on-line search only includes languages from Wiktionary with more than 500 entries, and we could not merge French and Basque as the licenses are incompatible.

If you use these wordnets, please cite the original projects that created them (see here). If you got value from this aggregation, please cite Bond and Paik (2012). If you got value from the automatically collected data, please cite Bond and Foster (2013).

For details concerning individual wordnet licences, and the projects that made them, please visit the Open Multilingual Wordnet main page.

Search

We have a simple search interface. It uses the SQL database originally developed by the Japanese Wordnet.

We currently have 150 languages available in the Extended Open Multilingual Wordnet.

Lang Code Synsets Words Senses Core
Afar aar 51 63 70 0%
Afrikaans afr 2,015 2,045 2,254 13%
Akan aka 319 268 339 0%
Albanian als 6,620 8,019 12,122 39%
Amharic amh 440 397 459 0%
Old English (ca. 450-1100) ang 700 763 888 6%
Arabic arb 14,650 24,713 46,417 62%
Egyptian Arabic arz 633 728 772 6%
Assamese asm 41 50 55 0%
Asturian ast 1,621 1,662 1,845 12%
Azerbaijani aze 1,923 2,144 2,323 11%
Bambara bam 317 280 335 0%
Belarusian bel 2,734 3,133 3,357 16%
Bengali ben 1,733 1,946 2,109 9%
Tibetan bod 287 272 318 0%
Bosnian bos 554 523 587 0%
Breton bre 1,290 1,389 1,532 7%
Bulgarian bul 11,857 13,252 18,608 99%
Catalan cat 48,005 48,881 74,806 84%
Czech ces 13,030 13,742 15,813 54%
Cherokee chr 541 506 588 4%
Chinese (simplified) cmn 49,023 78,194 108,653 100%
Cornish cor 53 51 72 0%
Welsh cym 2,333 2,445 2,726 14%
Danish dan 10,328 10,560 13,550 85%
German deu 19,857 24,751 29,884 64%
Dzongkha dzo 154 144 174 0%
Greek ell 24,264 24,921 34,121 70%
English eng 117,659 151,688 213,480 100%
Esperanto epo 7,345 8,110 8,716 34%
Estonian est 4,317 4,617 4,986 21%
Basque eus 29,965 26,990 50,075 72%
Ewe ewe 419 380 442 0%
Faroese fao 2,049 2,335 2,520 11%
Farsi fas 20,766 20,574 35,318 55%
Finnish fin 116,830 133,987 199,515 100%
French fra 59,091 55,378 102,678 92%
Western Frisian fry 957 973 1,058 8%
Fulah ful 319 279 339 0%
Friulian fur 576 672 686 5%
Scottish Gaelic gla 5,498 4,674 7,105 32%
Irish gle 5,043 5,179 6,267 23%
Galician glg 20,772 24,425 29,136 42%
Manx glv 1,501 1,609 1,773 7%
Ancient Greek (to 1453) grc 556 583 630 4%
Gujarati guj 543 504 575 0%
Haitian hat 547 533 569 5%
Hausa hau 370 300 390 0%
Serbo-Croatian hbs 7,572 16,744 18,716 36%
Hebrew heb 9,653 10,692 13,203 46%
Hindi hin 3,593 4,381 4,908 21%
Croatian hrv 23,406 29,308 48,285 100%
Hungarian hun 10,213 11,226 13,029 45%
Armenian hye 6,106 6,240 7,598 32%
Igbo ibo 75 90 93 0%
Ido ido 1,744 1,899 2,005 12%
Sichuan Yi iii 47 49 54 0%
Interlingua ina 1,360 1,395 1,547 8%
Indonesian ind 38,569 37,775 108,054 95%
Icelandic isl 9,484 16,879 22,807 99%
Italian ita 42,063 50,581 82,741 89%
Japanese jpn 59,112 99,080 177,594 96%
Kalaallisut kal 283 255 306 0%
Kannada kan 961 1,038 1,152 4%
Georgian kat 3,256 3,454 3,792 20%
Kazakh kaz 1,124 1,310 1,377 8%
Central Khmer khm 1,994 2,249 2,470 13%
Kikuyu kik 319 273 339 0%
Kinyarwanda kin 113 125 132 0%
Kirghiz kir 793 1,369 1,472 7%
Korean kor 6,506 8,684 9,560 31%
Kurdish kur 1,654 2,436 2,766 15%
Lao lao 1,136 1,272 1,380 9%
Latin lat 4,199 4,791 6,008 24%
Latvian lav 3,660 3,900 4,381 22%
Lingala lin 318 262 338 0%
Lithuanian lit 12,000 13,958 19,328 44%
Latgalian ltg 712 883 942 7%
Luxembourgish ltz 1,407 1,521 1,581 10%
Luba-Katanga lub 317 273 337 0%
Ganda lug 317 273 337 0%
Malayalam mal 1,155 1,286 1,422 6%
Marathi mar 1,564 1,686 1,831 5%
Macedonian mkd 6,112 6,884 7,941 28%
Malagasy mlg 317 277 336 0%
Maltese mlt 1,860 1,964 2,156 9%
Mongolian mon 1,075 1,174 1,219 8%
Maori mri 550 579 636 5%
Burmese mya 1,090 1,211 1,349 9%
Min Nan Chinese nan 308 542 554 3%
Navajo nav 1,861 1,803 1,999 9%
South Ndebele nbl 19 38 38 0%
North Ndebele nde 317 285 337 0%
Nepali (macrolanguage) nep 393 347 411 0%
Dutch nld 33,925 46,552 69,088 79%
Nynorsk nno 5,162 5,035 6,664 66%
Bokmål nob 10,302 10,532 13,598 84%
Occitan (post 1500) oci 1,323 1,463 1,551 9%
Oriya (macrolanguage) ori 541 487 561 0%
Oromo orm 114 64 133 0%
Panjabi pan 55 71 71 0%
Polish pol 37,696 49,395 59,870 68%
Portuguese por 43,895 54,071 74,012 84%
Pushto pus 596 559 716 4%
Chinese (traditional) qcn 4,913 3,206 8,069 28%
Romansh roh 1,167 1,700 1,897 7%
Romanian ron 57,254 52,772 90,336 95%
Rundi run 319 270 339 0%
Macedo-Romanian rup 672 847 914 7%
Russian rus 20,138 25,481 34,009 64%
Sango sag 320 275 340 0%
Sanskrit san 582 716 766 5%
Sicilian scn 540 628 652 5%
Sinhala sin 611 690 754 4%
Slovak slk 20,508 31,224 46,898 63%
Slovene slv 43,190 41,205 72,516 87%
Northern Sami sme 383 355 408 0%
Shona sna 318 276 338 0%
Somali som 374 326 394 0%
Southern Sotho sot 100 118 119 0%
Spanish spa 47,737 47,762 74,848 86%
Sardinian srd 382 865 884 4%
Serbian srp 555 530 599 0%
Swati ssw 19 38 38 0%
Swahili (macrolanguage) swa 3,628 2,993 4,149 23%
Swedish swe 15,751 15,934 20,754 99%
Tamil tam 1,429 1,543 1,668 8%
Tatar tat 550 613 642 5%
Telugu tel 2,506 2,736 3,014 16%
Tajik tgk 926 1,131 1,170 8%
Tagalog tgl 1,333 1,481 1,620 11%
Thai tha 73,593 83,658 97,388 81%
Tigrinya tir 374 330 384 0%
Tonga (Tonga Islands) ton 418 254 440 0%
Tswana tsn 92 102 111 0%
Tsonga tso 43 43 62 0%
Turkmen tuk 680 825 860 7%
Turkish tur 7,953 9,385 10,923 35%
Ukrainian ukr 3,761 4,502 4,887 20%
Urdu urd 2,073 2,424 2,738 15%
Uzbek uzb 889 1,115 1,157 8%
Venda ven 19 38 38 0%
Vietnamese vie 3,869 5,238 5,950 23%
Volapük vol 2,881 5,162 5,521 15%
Xhosa xho 110 129 129 0%
Yiddish yid 982 1,071 1,124 8%
Yoruba yor 365 28 382 0%
Yue Chinese yue 324 515 527 2%
Malaysian zsm 37,423 35,035 106,640 97%
Zulu zul 792 770 961 4%

Language codes linked to Lewis, M. Paul (ed.), 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/
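
The rows of the table above can be read programmatically. The following is a minimal sketch in Python, assuming each row is available as a single line of text in the layout shown (language name, three-letter code, synsets, words, senses, core percentage). The names `LanguageStats` and `parse_row` are hypothetical and are not part of any released tool. Because language names may contain spaces and parentheses, the sketch matches from the three-letter code onwards rather than splitting on whitespace.

```python
import re
from typing import NamedTuple


class LanguageStats(NamedTuple):
    """One row of the statistics table (hypothetical helper type)."""
    language: str
    code: str
    synsets: int
    words: int
    senses: int
    core_percent: int


# Language name (may contain spaces), a three-letter lowercase code,
# three counts with optional thousands separators, and a percentage.
ROW = re.compile(
    r"^(?P<language>.+?)\s+(?P<code>[a-z]{3})\s+"
    r"(?P<synsets>[\d,]+)\s+(?P<words>[\d,]+)\s+(?P<senses>[\d,]+)\s+"
    r"(?P<core>\d+)%$"
)


def _to_int(text: str) -> int:
    """Convert a count such as '14,650' to an integer."""
    return int(text.replace(",", ""))


def parse_row(line: str) -> LanguageStats:
    """Parse one table row; raise ValueError for headers or malformed lines."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"not a statistics row: {line!r}")
    return LanguageStats(
        language=match["language"],
        code=match["code"],
        synsets=_to_int(match["synsets"]),
        words=_to_int(match["words"]),
        senses=_to_int(match["senses"]),
        core_percent=int(match["core"]),
    )


if __name__ == "__main__":
    row = parse_row("Old English (ca. 450-1100) ang 700 763 888 6%")
    # Average number of senses per word for this language.
    print(row.code, row.senses / row.words)
```

Parsed rows can then be filtered or sorted, for example to list only the languages whose core coverage is above a chosen threshold.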

References

(BibTeX Complete References)
als Ervin Ruci (2008)
On the current state of Albanet and related applications, Technical Report, University of Vlora
all Francis Bond and Kyonghee Paik (2012)
A survey of wordnets and their licenses. In Proceedings of the 6th Global WordNet Conference (GWC 2012). Matsue. 64–71
     Francis Bond and Ryan Foster (2013)
Linking and extending an open multilingual wordnet. In 51st Annual Meeting of the Association for Computational Linguistics: ACL-2013. Sofia. 1352–1362
arb Black W., Elkateb S., Rodriguez H., Alkhalifa M., Vossen P., Pease A., Bertran M., Fellbaum C. (2006)
The Arabic WordNet Project, Proceedings of LREC 2006
cat, eus, glg, spa Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau (2012)
Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012) Matsue, Japan.
core Boyd-Graber, J., Fellbaum, C., Osherson, D., and Schapire, R. (2006)
Adding dense, weighted connections to WordNet. In: Proceedings of the Third Global WordNet Meeting, Jeju Island, Korea, January 2006
cmn Shan Wang and Francis Bond (2013)
Building the Chinese Open Wordnet (COW): Starting from Core Synsets. In Proceedings of the 11th Workshop on Asian Language Resources, a Workshop of The 6th International Joint Conference on Natural Language Processing (IJCNLP-6). Nagoya, Japan. pp.10–18.
qcn Huang, C.-R., Hsieh, S.-K., Hong, J.-F., Chen, Y.-Z., Su, I.-L., Chen, Y.-X., and Huang, S.-W. (2010).
Chinese wordnet: Design and implementation of a cross-lingual knowledge processing infrastructure. In Journal of Chinese Information Processing. 24:2 pp 14–23. (in Chinese)
dan Pedersen, B. S., Nimb, S., Asmussen, J., Sørensen, N. H., Trap-Jensen, L. and Lorentzen, H. (2009)
DanNet -- the challenge of compiling a WordNet for Danish by reusing a monolingual dictionary. Language Resources and Evaluation, Volume 43:3, pp. 269–299
eng Christiane Fellbaum. (ed.) (1998)
WordNet: An Electronic Lexical Database, MIT Press
fre Benoit Sagot and Darja Fišer (2008)
Building a free French wordnet from multilingual resources. In European Language Resources Association (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco
heb Noam Ordan and Shuly Wintner (2007)
Hebrew WordNet: a test case of aligning lexical databases across languages. International Journal of Translation 19(1):39–58, 2007
ita Emanuele Pianta, Luisa Bentivogli and Christian Girardi. (2002)
MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21-25, 2002, pp. 293-302.
ind,zsm Nurril Hirfana Mohamed Noor, Suerya Sapuan and Francis Bond (2011)
Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25) pages 258–267. Singapore
jpn Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama and Kyoko Kanzaki (2008)
Development of Japanese WordNet. In LREC-2008, Marrakech.
fas Montazery, Mortaza and Heshaam Faili (2010)
Automatic Persian WordNet Construction. In Proceedings of the 23rd International Conference on Computational Linguistics, pp. 846–850
fin Lindén K., Carlson L. (2010)
FinnWordNet — WordNet på finska via översättning, LexicoNordica — Nordic Journal of Lexicography, 17:119–140
sentiwn Stefano Baccianella, Andrea Esuli and Fabrizio Sebastiani (2010)
SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, 2010
ml-senticon Cruz, Fermín L., José A. Troyano, Beatriz Pontes, F. Javier Ortega, (2014)
Building layered, multilingual sentiment lexicons at synset and lemma levels, Expert Systems with Applications, 2014
mapp Jordi Daudé, Lluís Padró and German Rigau (2000)
Mapping WordNets Using Structural Information. 38th Annual Meeting of the Association for Computational Linguistics (ACL'2000), Hong Kong
pol Maciej Piasecki, Stanisław Szpakowicz and Bartosz Broda. (2009)
A Wordnet from the Ground Up. Wroclaw: Oficyna Wydawnicza Politechniki Wroclawskiej, Poland.
nno,nob Fjeld, Ruth Vatvedt and Nygaard, Lars (2009)
NorNet - a monolingual wordnet of modern Norwegian. In Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. pages 13–16. Estonia
por Valeria de Paiva and Alexandre Rademaker (2012)
Revisiting a Brazilian wordnet. In Proceedings of Global Wordnet Conference, Matsue. Global Wordnet Association. (also with Gerard de Melo's contribution)
slv Fišer, Darja, and Novak, Jernej, and Erjavec, Tomaž (2012)
sloWNet 3.0: development, extension and cleaning. In Proceedings of the 6th International Global Wordnet Conference (GWC 2012). The Global WordNet Association, pp. 113–117
sumo Adam Pease (2011)
Ontology: A Practical Guide. Articulate Software Press, Angwin, CA. ISBN 978-1-889455-10-5.
sumo Niles, I and Adam Pease (2001)
Toward a Standard Upper Ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Chris Welty and Barry Smith, eds.
swe Borin, Lars and Forsberg, Markus and Lönngren, Lennart (2013)
SALDO: a touch of yin to WordNet's yang. Language Resources and Evaluation 47(4):1191–1211, 2013
tempo Gaël Dias, Mohammed Hasanuzzaman, Stéphane Ferrari, Yann Mathet (2014)
TempoWordNet for Sentence Time Tagging. Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pages 833–838, Switzerland
tha Thoongsup S., Charoenporn T., Robkop K., Sinthurahat T., Mokarat C., Sornlertlamvanich V., Isahara H. (2009)
Thai Wordnet Construction. Proceedings of The 7th Workshop on Asian Language Resources (ALR7), Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP), Suntec, Singapore

Contributors: Francis Bond, Ryan Foster, Lars Nygaard, Adam Pease, John McRae, Luís Morgado da Costa and all the wordnet projects.


Francis Bond <bond@ieee.org>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303