Open Multilingual Wordnet

This page provides access to wordnets in a variety of languages, all linked to the Princeton Wordnet of English (PWN). The goal is to make it easy to use wordnets in multiple languages. The individual wordnets have been made by many different projects and vary greatly in size and accuracy. This page has (i) extracted and normalized the data, (ii) linked it to Princeton WordNet 3.0 and (iii) put it in one place. It only includes wordnets whose license allows redistribution. There is a fuller list at the Global Wordnet Association's Wordnets in the World page.

If you use these wordnets, please cite the original projects that created them (linked in Table 1). If you got value from this aggregation, please cite Bond and Paik (2012).

We have an extended version with automatically extracted data for over 150 languages from Wiktionary and the Unicode Common Locale Data Repository (Bond and Foster, 2013).

Documentation, News and Updates

Search

We have a simple search interface (search the extended wordnet). It uses the SQL database originally developed by the Japanese Wordnet project.

Available Wordnets
| Wordnet | Lang | Synsets | Words | Senses | Core | Licence | Data | Citation |
|---|---|---|---|---|---|---|---|---|
| Albanet | als | 4,676 | 5,990 | 9,602 | 31% | CC BY 3.0 | als.zip (+xml) | cite:als; (.bib) |
| Arabic WordNet (AWN) | arb | 10,165 | 14,595 | 21,751 | 48% | CC BY SA 3.0 | arb.zip (+xml) | cite:arb; (.bib) |
| BulTreeBank Wordnet (BTB-WN) | bul | 4,959 | 6,720 | 8,936 | 99% | CC BY 3.0 | bul.zip (+xml) | cite:bul; (.bib) |
| Chinese Open Wordnet | cmn | 42,312 | 61,533 | 79,809 | 100% | wordnet | cmn.zip (+xml) | cite:cmn; (.bib) |
| Chinese Wordnet (Taiwan) | qcn | 4,913 | 3,206 | 8,069 | 28% | wordnet | qcn.zip (+xml) | cite:qcn; (.bib) |
| DanNet | dan | 4,476 | 4,468 | 5,859 | 81% | wordnet | dan.zip (+xml) | cite:dan; (.bib) |
| Greek Wordnet | ell | 18,049 | 18,227 | 24,106 | 57% | Apache 2.0 | ell.zip (+xml) | cite:ell; (.bib) |
| Princeton WordNet | eng | 117,659 | 148,730 | 206,978 | 100% | wordnet | eng.zip (+xml) | cite:eng; (.bib) |
| Persian Wordnet | fas | 17,759 | 17,560 | 30,461 | 41% | Free to use | fas.zip (+xml) | cite:fas; (.bib) |
| FinnWordNet | fin | 116,763 | 129,839 | 189,227 | 100% | CC BY 3.0 | fin.zip (+xml) | cite:fin; (.bib) |
| WOLF (Wordnet Libre du Français) | fra | 59,091 | 55,373 | 102,671 | 92% | CeCILL-C | fra.zip (+xml) | cite:fra; (.bib) |
| Hebrew Wordnet | heb | 5,448 | 5,325 | 6,872 | 27% | wordnet | heb.zip (+xml) | cite:heb; (.bib) |
| MultiWordNet | ita | 34,728 | 40,343 | 61,558 | 83% | CC BY 3.0 | ita.zip (+xml) | cite:ita; (.bib) |
| Japanese Wordnet | jpn | 57,184 | 91,964 | 158,069 | 95% | wordnet | jpn.zip (+xml) | cite:jpn; (.bib) |
| Multilingual Central Repository | cat | 45,826 | 46,531 | 70,622 | 81% | CC BY 3.0 | cat.zip (+xml) | cite:cat; (.bib) |
| Multilingual Central Repository | eus | 29,413 | 26,240 | 48,934 | 71% | CC BY-NC-SA 3.0 | eus.zip (+xml) | cite:eus; (.bib) |
| Multilingual Central Repository | glg | 19,312 | 23,124 | 27,138 | 36% | CC BY 3.0 | glg.zip (+xml) | cite:glg; (.bib) |
| Multilingual Central Repository | spa | 38,512 | 36,681 | 57,764 | 76% | CC BY 3.0 | spa.zip (+xml) | cite:spa; (.bib) |
| Wordnet Bahasa | ind | 51,822 | 65,076 | 142,854 | 99% | MIT | ind.zip (+xml) | cite:ind; (.bib) |
| Wordnet Bahasa | zsm | 42,679 | 51,481 | 119,529 | 99% | MIT | zsm.zip (+xml) | cite:zsm; (.bib) |
| Norwegian Wordnet | nno | 3,671 | 3,387 | 4,762 | 66% | wordnet | nno.zip (+xml) | cite:nno; (.bib) |
| Norwegian Wordnet | nob | 4,455 | 4,186 | 5,586 | 81% | wordnet | nob.zip (+xml) | cite:nob; (.bib) |
| plWordNet | pol | 28,757 | 39,146 | 44,970 | 49% | wordnet | pol.zip (+xml) | cite:pol; (.bib) |
| OpenWN-PT | por | 43,895 | 54,071 | 74,012 | 84% | CC BY-SA | por.zip (+xml) | cite:por; (.bib) |
| sloWNet | slv | 42,583 | 40,233 | 70,947 | 86% | CC BY SA 3.0 | slv.zip (+xml) | cite:slv; (.bib) |
| Swedish (SALDO) | swe | 6,796 | 5,824 | 6,904 | 99% | CC-BY 3.0 | swe.zip (+xml) | cite:swe; (.bib) |
| Thai Wordnet | tha | 73,350 | 82,504 | 95,517 | 81% | wordnet | tha.zip (+xml) | cite:tha; (.bib) |

42 synsets shared from 117,673 (0%)

Language codes linked to Lewis, M. Paul (ed.), 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/

For each language, the Data download contains the script used to make the tab file, the tab file, the wordnet LMF file and the LICENSE file. The (+xml) download provides the same data together with wordnet-LMF and lemon-rdf encoded files. If you want all the languages in one file, it is here: data for all of the wordnets, data for all of the wordnets with wordnet-LMF and lemon-rdf (big file).

Core refers to the percentage of synsets covered from the semi-automatically compiled list of 5000 "core" word senses in Princeton WordNet (approximately the 5000 most frequently used word senses). They are marked with ✪ in the interface. The original list is at http://wordnetcode.princeton.edu/standoff-files/core-wordnet.txt (Boyd-Graber et al., 2006). Our version is converted to wn30 synsets.
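
The Core figure is a simple ratio of covered core synsets to all core synsets. A minimal sketch in Python, assuming the core list has already been converted to a set of offset-pos synset IDs; the function name `core_coverage` and the toy IDs are illustrative only and are not part of the distributed data or tools:

```python
def core_coverage(wordnet_synsets, core_synsets):
    """Return the percentage of core synsets that a wordnet covers."""
    core = set(core_synsets)
    if not core:
        raise ValueError("core synset list is empty")
    # A core synset counts as covered if the wordnet has an entry for it.
    covered = core & set(wordnet_synsets)
    return 100.0 * len(covered) / len(core)


# Toy data: placeholder IDs, not real synsets.
core = {"00000001-n", "00000002-n", "00000003-v", "00000004-a"}
wordnet = {"00000001-n", "00000003-v", "00000009-r"}
print(f"{core_coverage(wordnet, core):.0f}%")  # 2 of 4 core synsets are covered
```

With the real data, `core_synsets` would hold the 5000 core synsets and `wordnet_synsets` the synsets found in one language's tab file.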

The wordnets are linked to the Suggested Upper Merged Ontology (SUMO: Niles and Pease, 2001; Pease, 2011); the TempoWordNet (Dias et al., 2014); the multilingual, layered sentiment lexicons (ML-SentiCon: Cruz et al., 2014); and SentiWordNet 3.0 (Baccianella et al., 2010).

The fullest list of wordnets is the Global Wordnet Association's Wordnets in the World.

Mapping between wordnet versions was done using the mappings from TALP at UPC (Daudé et al., 2000).

Formats

Tab files

The wn-data-*.tab files are tab separated files of synset-lemma pairs.

# name␉lang␉url␉license
offset-pos␉type␉lemma
offset-pos␉type␉lemma
...

name     the name of the project
lang     the ISO three-letter code for the language
url      the URL of the project
license  a short name for the license
offset   the 8-digit Princeton WordNet 3.0 synset offset
pos      one of [a,v,n,r] (we treat 's' as 'a')
type     the kind of entry, written lang:lemma for a lemma (tha:lemma in the example below)
lemma    the lemma (word separator normalized to ' ')

Example:

# Thai	tha	http://th.asianwordnet.org/	wordnet 
13567960-n	tha:lemma	กระบวนการทรานแอมมิแนชัน
00155298-n	tha:lemma	การปฏิเสธ
14369530-n	tha:lemma	ภาวะการหายใจเร็วของทารกแรกเกิด
10850469-n	tha:lemma	เบธัน
11268326-n	tha:lemma	เรินต์เกน
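
A tab file in this layout can be read with a few lines of Python. The sketch below is one possible reader, not the project's own script; the function name `read_tab` is an assumption, and it keeps only rows whose type ends in `:lemma`:

```python
import io
from collections import defaultdict


def read_tab(lines):
    """Parse wn-data-*.tab lines into (metadata, {synset: [lemmas]})."""
    meta = {}
    lemmas = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header line: "# name<TAB>lang<TAB>url<TAB>license"
            fields = [f.strip() for f in line.lstrip("#").split("\t")]
            meta = dict(zip(("name", "lang", "url", "license"), fields))
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed rows
        synset, kind, lemma = parts
        if kind.endswith(":lemma"):
            lemmas[synset].append(lemma.strip())
    return meta, dict(lemmas)


# Two rows copied from the Thai example above.
sample = io.StringIO(
    "# Thai\ttha\thttp://th.asianwordnet.org/\twordnet \n"
    "00155298-n\ttha:lemma\tการปฏิเสธ\n"
    "10850469-n\ttha:lemma\tเบธัน\n"
)
meta, lemmas = read_tab(sample)
print(meta["lang"], len(lemmas))
```

To read a downloaded file instead of the inline sample, pass an open file object, for example `open(path, encoding="utf-8")`, where `path` points at one of the wn-data-*.tab files.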

For this data to be really useful you need to combine it with the synset relations from the Princeton wordnet.
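
Because every wordnet here uses the same offset-pos identifiers, a relation between two synsets in Princeton WordNet can be used to look up lemmas in any of the other languages. The sketch below illustrates the idea with hand-made placeholder data: the `hypernyms` mapping stands in for relations that would be taken from Princeton WordNet 3.0, and none of the IDs or lemmas are real.

```python
def hypernym_lemmas(synset, hypernyms, lemmas):
    """Return target-language lemmas for each direct hypernym of a synset."""
    result = {}
    for parent in hypernyms.get(synset, []):
        # A parent has no lemmas if the target wordnet does not cover it.
        result[parent] = lemmas.get(parent, [])
    return result


# Placeholder relation data (would come from Princeton WordNet 3.0).
hypernyms = {
    "00000002-n": ["00000001-n"],
    "00000003-n": ["00000001-n", "00000009-n"],
}
# Placeholder lemma data (would come from a wn-data-*.tab file).
lemmas = {
    "00000001-n": ["lemma-a"],
    "00000002-n": ["lemma-b", "lemma-c"],
}

print(hypernym_lemmas("00000003-n", hypernyms, lemmas))
```

Synsets that the smaller wordnets do not cover come back with an empty lemma list, which is a common case given the size differences between the wordnets.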

Wordnet LMF Files

Wordnet-LMF format files are made by combining the tab files with the Princeton wordnet. Note: individual wordnet projects may have better versions of the wordnet LMF files.

Known Problems

References

(BibTeX Complete References)
als Ervin Ruci (2008)
On the current state of Albanet and related applications, Technical Report, University of Vlora
all Francis Bond and Kyonghee Paik (2012)
A survey of wordnets and their licenses. In Proceedings of the 6th Global WordNet Conference (GWC 2012). Matsue. 64–71
all Francis Bond and Ryan Foster (2013)
Linking and extending an open multilingual wordnet. In 51st Annual Meeting of the Association for Computational Linguistics: ACL-2013. Sofia. 1352–1362
arb Black W., Elkateb S., Rodriguez H., Alkhalifa M., Vossen P., Pease A., Bertran M., Fellbaum C. (2006)
The Arabic WordNet Project, Proceedings of LREC 2006
bul Simov, Kiril and Osenova, Petya (2010)
Constructing of an Ontology-based Lexicon for Bulgarian, Proceedings of LREC 2010
cat, eus, glg, spa Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau (2012)
Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012) Matsue, Japan.
core Boyd-Graber, J., Fellbaum, C., Osherson, D., and Schapire, R. (2006)
Adding dense, weighted connections to WordNet. In: Proceedings of the Third Global WordNet Meeting, Jeju Island, Korea, January 2006
cmn Shan Wang and Francis Bond (2013)
Building the Chinese Open Wordnet (COW): Starting from Core Synsets. In Proceedings of the 11th Workshop on Asian Language Resources, a Workshop of The 6th International Joint Conference on Natural Language Processing (IJCNLP-6). Nagoya, Japan. pp.10–18.
qcn Huang, C.-R., Hsieh, S.-K., Hong, J.-F., Chen, Y.-Z., Su, I.-L., Chen, Y.-X., and Huang, S.-W. (2010).
Chinese wordnet: Design and implementation of a cross-lingual knowledge processing infrastructure. In Journal of Chinese Information Processing. 24:2 pp 14–23. (in Chinese)
dan Pedersen, B. S., Nimb, S., Asmussen, J., Sørensen, N. H., Trap-Jensen, L. and Lorentzen, H. (2009)
DanNet -- the challenge of compiling a WordNet for Danish by reusing a monolingual dictionary. Language Resources and Evaluation, Volume 43:3 pp. 269-299
eng Christiane Fellbaum. (ed.) (1998)
WordNet: An Electronic Lexical Database, MIT Press
fra Benoit Sagot and Darja Fišer (2008)
Building a free French wordnet from multilingual resources, European Language Resources Association (ELRA) (ed.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco
heb Noam Ordan and Shuly Wintner (2007)
Hebrew WordNet: a test case of aligning lexical databases across languages. International Journal of Translation 19(1):39–58, 2007
ita Emanuele Pianta, Luisa Bentivogli and Christian Girardi. (2002)
MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21-25, 2002, pp. 293-302.
ind,zsm Nurril Hirfana Mohamed Noor, Suerya Sapuan and Francis Bond (2011)
Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25) pages 258–267. Singapore
jpn Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama and Kyoko Kanzaki (2008)
Development of Japanese WordNet. In LREC-2008, Marrakech.
fas Montazery, Mortaza and Heshaam Faili (2010)
Automatic Persian WordNet Construction. In the 23rd International Conference on Computational Linguistics, pp. 846–850
fin Lindén K., Carlson L. (2010)
FinnWordNet - WordNet på finska via översättning, LexicoNordica - Nordic Journal of Lexicography, 17:119–140
sentiwn Baccianella, Stefano, Esuli, Andrea and Sebastiani, Fabrizio (2010)
SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining, Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, 2010
ml-senticon Cruz, Fermín L., José A. Troyano, Beatriz Pontes, F. Javier Ortega, (2014)
Building layered, multilingual sentiment lexicons at synset and lemma levels, Expert Systems with Applications, 2014
mapp Jordi Daudé, Lluís Padró and German Rigau (2000)
Mapping WordNets Using Structural Information. 38th Annual Meeting of the Association for Computational Linguistics (ACL'2000), Hong Kong
pol Maciej Piasecki, Stanisław Szpakowicz and Bartosz Broda. (2009)
A Wordnet from the Ground Up. Wroclaw: Oficyna Wydawnicza Politechniki Wroclawskiej, Poland.
nno,nob Fjeld, Ruth Vatvedt and Nygaard, Lars (2009)
NorNet - a monolingual wordnet of modern Norwegian. In Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources - between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. pages 13–16. Estonia
por Valeria de Paiva and Alexandre Rademaker (2012)
Revisiting a Brazilian wordnet. In Proceedings of Global Wordnet Conference, Matsue. Global Wordnet Association. (also with Gerard de Melo's contribution)
slv Fišer, Darja, Novak, Jernej and Erjavec, Tomaž (2012)
sloWNet 3.0: development, extension and cleaning. In Proceedings of the 6th International Global Wordnet Conference (GWC 2012). The Global WordNet Association, pp. 113-117
sumo Adam Pease (2011)
Ontology: A Practical Guide. Articulate Software Press, Angwin, CA. ISBN 978-1-889455-10-5.
sumo Niles, I and Adam Pease (2001)
Toward a Standard Upper Ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Chris Welty and Barry Smith, eds.
swe Borin, Lars and Forsberg, Markus and Lönngren, Lennart (2013)
SALDO: a touch of yin to WordNet's yang. Language Resources and Evaluation 47(4):1191–1211, 2013
tempo Gaël Dias, Mohammed Hasanuzzaman, Stéphane Ferrari, Yann Mathet (2014)
TempoWordNet for Sentence Time Tagging. Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion pages 833–838, Switzerland
tha Thoongsup S., Charoenporn T., Robkop K., Sinthurahat T., Mokarat C., Sornlertlamvanich V., Isahara H. (2009)
Thai Wordnet Construction. Proceedings of The 7th Workshop on Asian Language Resources (ALR7), Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP) Suntec, Singapore

Contributors: Francis Bond, Lars Nygaard, Adam Pease, John McCrae, Luís Morgado da Costa and all the wordnet projects.


Francis Bond <bond@ieee.org>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303