In-Class Challenge

Nov 3rd (Tue) 9:30--15:30:

Final exam

  1. Take the data from Omniglot
  2. Complete the following tasks to the best of your ability
    Divide the tasks among the group
    1. Read all the files ending in .tsv in the Omniglot folder.
    2. Match the language name in each file name with its ISO 639-3 language code.
      This will require some tweaking.
    3. Create dictionaries of the form:
      lang_iso[ISOlangcode] = language name
      lang_iso[language name] = ISOlangcode
      E.g. the file 'Chinese (Cantonese).tsv' should be linked to 'Yue Chinese' (yue), according to ISO 639-3.
      If a language cannot be matched to an ISO code, use the full name of the language as it appears in the file name.
    4. Create dictionaries for each language:
      Maybe break this task up
      • translations[english_phrase][ISOlangcode] = list of (translated phrase, transliteration, comment) tuples
        If there is no transliteration for a particular language or word, use the special value None.
      • You might want to write them out to a file, one entry per line, to make them easy to merge:
        phrase \t ISO \t translation \t transliteration \t comment
    5. For each file, parse the tab-separated values as a table whose first row defines the headers.
      For each line of the table, extract at least the English phrase (to be used as the key to the dictionaries created in 3), the translation of that English phrase in the file's language, and the transliteration (whenever available).
    6. Load the dictionaries created in 3 with the data collected from each file. AIM TO HAVE THIS DONE (at least a first go) BY 13:30
    7. Make a table showing the coverage (in %) of each language over the set of all English phrases found across all files.
    8. Compare each language with every other language (using the original script, the transliteration or, better, both), as you did for Project 1.
      Output the pairs in order of closeness.
      There will be n(n-1) ordered pairs (n(n-1)/2 if the comparison is symmetric), so leave some time for this.
  3. Hand in the deliverables
  4. Go home, happy that it is over and proud of what you have accomplished.
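
A minimal Python sketch of tasks 2 and 3, assuming an ISO 639-3 table is available as a dict from code to reference name (for example read from SIL's tab-separated code table, assumed here to have Id and Ref_Name columns). The helper names normalize, load_iso_table and build_lang_iso are hypothetical, and so is the overrides dict: it is where the hand-made "tweaking" goes, such as mapping 'Chinese (Cantonese)' to yue as in the example above.

```python
import csv
import os
import re
import unicodedata


def normalize(name):
    """Lower-case, drop accents and punctuation: 'Yue Chinese' -> 'yue chinese'."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z]+", " ", plain.lower()).split())


def load_iso_table(path):
    """ASSUMPTION: SIL's iso-639-3.tab, tab-separated, with 'Id' and 'Ref_Name' columns."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return {row["Id"]: row["Ref_Name"] for row in csv.DictReader(f, delimiter="\t")}


def build_lang_iso(file_names, iso_table, overrides):
    """Two-way mapping between ISO 639-3 codes and language names.

    iso_table: code -> reference name.
    overrides: hypothetical hand-made fixes, file language name -> code.
    """
    by_name = {normalize(ref): code for code, ref in iso_table.items()}
    lang_iso = {}
    for file_name in file_names:
        lang = os.path.splitext(os.path.basename(file_name))[0]
        code = overrides.get(lang) or by_name.get(normalize(lang))
        if code is None:
            lang_iso[lang] = lang  # no match: use the full name, as required
            continue
        ref_name = iso_table.get(code, lang)
        lang_iso[code] = ref_name
        lang_iso[ref_name] = code
        lang_iso[lang] = code  # also let the file-name form find the code
    return lang_iso
```

Unmatched languages fall back to their full file name, as task 3 asks. Storing the file-name form as an extra key is a design choice that lets later steps look up a code directly from a file name.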

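Tasks 1, 4, 5 and 6 could be sketched as follows. The column layout is an assumption to check against the real Omniglot files: the English phrase is taken from the first column and the translation from the second, while any column whose header mentions "translit", or "comment"/"note", supplies the transliteration or comment. The names parse_tsv, load_translations and write_merge_file are hypothetical; the lang_iso argument is expected to map the language name in the file name to a code, and anything unmatched keeps its full name.

```python
import csv
import glob
import os
from collections import defaultdict


def _cell(row, index):
    """Stripped cell text, or None if the column is absent or empty."""
    if index is None or index >= len(row):
        return None
    return row[index].strip() or None


def parse_tsv(path):
    """Yield (english, translation, transliteration, comment) for each table row.

    ASSUMPTION: column 0 = English phrase, column 1 = translation; columns whose
    header mentions 'translit' / 'comment' or 'note' hold the optional extras.
    """
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote marks as ordinary text, not CSV quoting
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = [h.strip().lower() for h in next(reader, [])]
        translit_col = next((i for i, h in enumerate(header) if "translit" in h), None)
        comment_col = next(
            (i for i, h in enumerate(header) if "comment" in h or "note" in h), None)
        for row in reader:
            if len(row) < 2 or not row[0].strip():
                continue  # skip blank or malformed lines
            yield (row[0].strip(), row[1].strip(),
                   _cell(row, translit_col), _cell(row, comment_col))


def load_translations(folder, lang_iso):
    """translations[english_phrase][code] -> list of (translation, translit, comment)."""
    translations = defaultdict(lambda: defaultdict(list))
    for path in sorted(glob.glob(os.path.join(folder, "*.tsv"))):
        name = os.path.splitext(os.path.basename(path))[0]
        code = lang_iso.get(name, name)  # unmatched languages keep their full name
        for english, translation, translit, comment in parse_tsv(path):
            translations[english][code].append((translation, translit, comment))
    return translations


def write_merge_file(translations, path):
    """One line per entry: phrase, ISO, translation, transliteration, comment."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for phrase in sorted(translations):
            for code in sorted(translations[phrase]):
                for translation, translit, comment in translations[phrase][code]:
                    fields = [phrase, code, translation, translit or "", comment or ""]
                    f.write("\t".join(fields) + "\n")
```

Missing transliterations and comments are stored as None, as task 4 requires, and written as empty fields in the merge file so each group member's output can simply be concatenated.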

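For tasks 7 and 8, a rough sketch. Coverage is taken as the share of all English phrases (across all files) for which a language has at least one translation. The similarity measure, the mean Jaccard overlap of character bigrams over the phrases both languages translate, using the transliteration when present and the original otherwise, is only a stand-in: substitute whatever measure you used in Project 1. All function names here are hypothetical.

```python
from itertools import combinations


def coverage(translations):
    """Percentage of all English phrases that each language translates."""
    phrases = list(translations)
    langs = {lang for by_lang in translations.values() for lang in by_lang}
    return {
        lang: 100.0 * sum(1 for p in phrases if translations[p].get(lang)) / len(phrases)
        for lang in langs
    }


def print_coverage(translations):
    """Print a simple coverage table, best-covered languages first."""
    for lang, pct in sorted(coverage(translations).items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{lang}\t{pct:.1f}%")


def bigrams(text):
    """Set of character bigrams, with spaces marking the word edges."""
    padded = f" {text.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def form(entry):
    """ASSUMPTION: prefer the transliteration, else the original translation."""
    translation, translit, _comment = entry
    return translit or translation


def similarity(translations, lang1, lang2):
    """Mean bigram Jaccard score over the phrases both languages translate.

    STAND-IN measure: replace with the one from Project 1.
    Only the first listed translation of each phrase is compared.
    """
    scores = [
        jaccard(bigrams(form(by_lang[lang1][0])), bigrams(form(by_lang[lang2][0])))
        for by_lang in translations.values()
        if by_lang.get(lang1) and by_lang.get(lang2)
    ]
    return sum(scores) / len(scores) if scores else 0.0


def rank_pairs(translations):
    """All unordered language pairs as (score, lang1, lang2), most similar first."""
    langs = sorted({lang for by_lang in translations.values() for lang in by_lang})
    pairs = [(similarity(translations, a, b), a, b) for a, b in combinations(langs, 2)]
    return sorted(pairs, reverse=True)
```

Because this stand-in measure is symmetric, combinations() visits each of the n(n-1)/2 unordered pairs once; an asymmetric measure would need itertools.permutations() and all n(n-1) ordered pairs.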
Project Three for HG2051