Francis Bond, 2010, 2011, 2012, 2013, 2015, 2017.
Traditionally, linguistic analysis was done largely by hand, but computer-based methods and tools are increasingly widely used in contemporary research. This course introduces skills and resources that help linguists perform fast, flexible and accurate quantitative analyses. Students will learn a scripting language (Python) and use it, together with the Natural Language Toolkit (NLTK), to analyse linguistic phenomena. No previous programming experience is required: we will teach you the basics of programming, with an emphasis on techniques that are useful for processing language.
Wed 12:30–15:30; TR+ 50 (Hive LHS-02-04)
| Week | Date | Content | Data Structures | Readings (NLTK, DiP) | Projects |
|---|---|---|---|---|---|
| 1 | 08-16 | Why do NLP? Why Python? Getting Started with NLTK | | 1.1; 1.5 | |
| 2 | 08-23 | Language Processing and Python: Lists and Word Frequencies | Lists, Sets | 1.2; 1.3; DiP3 2.4 | |
| 3 | 08-30 | Language Processing and Python: Strings and Control | Strings, Tuples | 1.4; 1.6; DiP3 4.4 | |
| 4 | 09-06 | NLTK Text Corpora and Conditional Frequencies | Dictionaries (and Functions) | 2.1; 2.2; 2.3; DiP3 2.7 | |
| 5 | 09-13 | Lexical Resources and WordNet | | 2.4; 2.5 | |
| 6 | 09-20 | Processing Raw Text | | 3.1; 3.2; 3.3; 3.9 | |
| 7 | 09-27 | Regular Expressions | | 3.4; 3.5; 3.6; 3.7; 3.8; DiP3 5 | |
| 8 | 10-11 | Mid-review: Writing Structured Programs | | 4.1; 4.2; 4.3; 4.4; DiP3 2.5, 3.4 | |
| - | Deepavali | | | | Project 1 Due on Monday the 23rd (5pm) |
| 9 | 10-25 | Bi-grams, n-grams and collocations | | 5.1; 5.2; 5.3 | |
| 10 | 11-01 | Part of Speech Tagging | | 5.4; 5.5; 5.6; 5.7 | |
| 11 | 11-08 | Classification | | 6.1; 6.3; 6.4; 6.7; 6.8 | |
| | | Handy Summary of Python and NLP Concepts | | | |
| 12 | 11-15 | Final In-Class On-Line Open-Book All-Day Programming Challenge (group i) | | | |
| 13 | 11-16 | Final In-Class On-Line Open-Book All-Day Programming Challenge (group ii); date to be confirmed; 9:30–15:30 | | | Project 2 Due on Friday the 17th |
Here are last year's projects. I will set new projects by mid-September.
Assessment problems are generally open-ended. Students are not expected to solve them fully; the goal is to see how they approach and understand the problem.
Course materials are heavily inspired by clt231: Introduction to Natural Language Processing at the University of Helsinki. Thanks to Graham Wilcock for letting us use them.
I will try not to make things too hard (cartoon from Abstruse Goose).
Instead, this class should be like this (cartoon from xkcd).