HG2051: Language and The Computer

Francis Bond, 2010, 2011, 2012, 2013, 2015, 2017.

Traditionally, linguistic analysis was done largely by hand, but computer-based methods and tools are increasingly widely used in contemporary research. This course introduces skills and resources that help the linguist perform fast, flexible and accurate quantitative analyses. Students will learn a scripting language (Python) and use it, together with the Natural Language Toolkit (NLTK), to analyse linguistic phenomena. No previous programming experience is required: we will teach you the basics of programming, with an emphasis on useful techniques for processing languages.
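
To give a flavour of the kind of quantitative analysis the course covers, here is a minimal sketch of counting word frequencies in plain Python. It is only an illustration and not part of the course materials: the function name `word_frequencies`, the sample sentence and the very simple regular-expression tokenizer are all assumptions made for this example, and NLTK offers richer tools for the same task.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Deliberately simple tokenizer (an assumption for this sketch):
    # a token is any run of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, chosen only to keep the output easy to check.
sample = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample)

# The two most frequent words with their counts.
print(freqs.most_common(2))  # [('the', 3), ('sat', 2)]
```

Lowercasing before counting treats "The" and "the" as the same word, which is usually what a frequency list needs; whether that is appropriate depends on the research question.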

Wed 12:30–15:30; TR+ 50 (Hive LHS-02-04)

Course Outline

Week | Date | Content | Data Structures | Readings (NLTK, DiP) | Projects
1 | 08-16 | Why do NLP? Why Python? Getting Started with NLTK | | 1.1; 1.5 |
2 | 08-23 | Language Processing and Python: Lists and Word Frequencies | Lists, Sets | 1.2; 1.3; DiP3 2.4 |
3 | 08-30 | Language Processing and Python: Strings and Control | Strings, Tuples | 1.4; 1.6; DiP3 4.4 |
4 | 09-06 | NLTK Text Corpora and Conditional Frequencies | Dictionaries (and Functions) | 2.1; 2.2; 2.3; DiP3 2.7 |
5 | 09-13 | Lexical Resources and WordNet | | 2.4; 2.5 |
6 | 09-20 | Processing Raw Text | | 3.1; 3.2; 3.3; 3.9 |
7 | 09-27 | Regular Expressions | | 3.4; 3.5; 3.6; 3.7; 3.8; DiP3 5 |
- | | Recess | | |
8 | 10-11 | Mid-review: Writing Structured Programs | | 4.1; 4.2; 4.3; 4.4; DiP3 2.5, 3.4 |
- | | Deepavali | | | Project 1 Due on Monday the 23rd (5pm)
9 | 10-25 | Bi-grams, n-grams and collocations | | 5.1; 5.2; 5.3 |
10 | 11-01 | Part of Speech Tagging | | 5.4; 5.5; 5.6; 5.7 |
11 | 11-08 | Classification | | 6.1; 6.3; 6.4; 6.7; 6.8 |
 | | Handy Summary of Python and NLP Concepts | | |
12 | 11-15 | Final In-Class On-Line Open-Book All-Day Programming Challenge (group i), 9:30–15:30 | | |
13 | 11-16 | Final In-Class On-Line Open-Book All-Day Programming Challenge (group ii), 9:30–15:30 (date to be confirmed) | | | Project 2 Due on Friday the 17th

Textbooks and Tools

Assessment and Solutions to Problems

Here are last year's projects; I will come up with new projects by mid-September.

Evaluation Criteria (same for all projects)

Assessment problems are generally open-ended: students are not expected to solve them fully. The goal is to see how they approach and understand the problem.


Course materials are heavily inspired by clt231: Introduction to Natural Language Processing at the University of Helsinki. Thanks to Graham Wilcock for letting us use them.

I will try not to make things too hard (cartoon from Abstruse Goose)

Instead, this class should be like this (cartoon from XKCD)