Course Information
Course Summary
Foundations of Natural Language Processing (FNLP) is a 20-credit course at Level 10, normally taken in Year 3. It runs in Semester 2. The exam is in April/May and is worth 75% of the course mark. The University descriptor is here.
Timetable
Your class times for this course can be found in your University of Edinburgh calendar.
Informatics Teaching Organisation: Information for Students
The Informatics intranet has useful information on the following:
- Induction
- Student handbooks, with detailed information on
  - courses
  - assessment
  - support, and
  - contacts for each year.
You can also email the Informatics Teaching Organisation (ITO) at ito@inf.ed.ac.uk or the Student Support Team (SST) at inf-sst@inf.ed.ac.uk.
Learning Outcomes
On successful completion of this course:
- Given an appropriate NLP problem, students should be able to select a corpus and an annotation scheme for the problem and justify the choice over other candidates.
- Students should also be able to identify suitable evaluation measures for the problem and provide a written explanation of the role of annotated corpora in natural language processing.
- Given one of the main linguistic issues relevant to NLP (including the representation and induction of syntactic knowledge, the modelling of lexical and semantic information, and the syntax-semantics interface), students should be able to construct an example of the issue and explain how their example illustrates the issue in general.
- Given an example of one of the main linguistic issues identified above, students should be able to classify it as belonging to that issue and relate the example to the issue in general.
- Given an NLP problem, students should be able to analyse, assess and justify which algorithms are most appropriate for solving it, based on an understanding of fundamental algorithms such as the Viterbi algorithm, the inside-outside algorithm, chart-based parsing and generation.
Course Outline
This course covers some of the linguistic and algorithmic foundations of natural language processing (NLP). It builds on algorithmic and data science concepts developed in second-year courses, applying them to NLP problems, and it equips students for more advanced NLP courses in Year 4. The course is strongly empirical, using corpus data to illustrate both core linguistic concepts and algorithms, including language modeling, part-of-speech tagging, syntactic processing, the syntax-semantics interface, and aspects of semantic and pragmatic processing. The theoretical study of linguistic concepts and the application of algorithms to corpora in the empirical analysis of those concepts will be interleaved throughout the course.
An indicative list of topics to be covered includes the following (although they won't be presented in this order):
1. Lexicon and lexical processing:
* morphology
* language modeling
  * hidden Markov models (HMMs) and associated algorithms
* part of speech tagging (e.g., for a language other than English) to illustrate HMMs
* smoothing
* text classification
2. Syntax and syntactic processing:
* the Chomsky hierarchy
* syntactic concepts: constituency (and tests for it), subcategorization, bounded and unbounded dependencies, feature representations
* context-free grammars
* lexicalized grammar formalisms (e.g., dependency grammar)
  * chart parsing and dependency parsing (e.g., shift-reduce parsing)
* treebanks: lexicalized grammars and corpus annotation
* statistical parsing
3. Semantics and semantic processing:
  * word senses: regular polysemy and the structured lexicon; distributional models; word embeddings (including the biases found in them)
* compositionality, constructing a formal semantic representation from a (disambiguated) sentential syntactic analysis.
* predicate argument structure
* word sense disambiguation
* semantic role labelling
* pragmatic phenomena in discourse and dialogue, including anaphora, presuppositions, implicatures and coherence relations.
  * labelled corpora addressing word senses (e.g., SemCor, a sense-annotated subset of the Brown corpus), semantic roles (e.g., PropBank), and discourse information (e.g., PDTB, STAC, RST Treebank).
4. Data and evaluation (interspersed throughout other topics):
* cross-linguistic similarities and differences
* commonly used datasets
* annotation methods and issues (e.g., crowdsourcing, inter-annotator agreement)
* evaluation methods and issues (e.g., standard metrics, baselines)
  * effects of biases in data
Weekly Activities
This year, the course will be delivered mainly on campus. Roughly put:
- There are 3 in-person lectures each week (see the timetable for details). We will use a rota system to keep in-person attendance within the University guideline of at most 120 students (there are around 140 of you!). For lectures you do not attend in person, there will be either a pre-recorded video (with edited captions) or a recording of the in-person lecture, uploaded soon after it has taken place.
- There are also 3 online post-lecture quizzes each week, to be done in your own time after watching the lecture videos and/or attending the in-person lecture, to test your understanding of the content of the lecture.
- There is 1 in-person tutorial every other week, and you should attempt the tutorial exercises in advance. There are 5 tutorials, in weeks 2, 4, 6, 8 and 10. Check which tutorial group you are in under Groups. The tutorial exercises are also available in the left-hand menu under Tutorial Exercises.
- There are lab sessions in weeks 3, 5, 7, 9 and 11. The class is divided into 2 groups; please check which group you're in under Groups. These labs are designed to enable independent work, and so you can also do the lab exercises in your own time.
- As always, you can ask TAs and demonstrators questions on the Piazza discussion forum about both the lab exercises and the two pieces of coursework, as and when those queries arise.
It is more important than ever that you schedule your study activities effectively. We suggest the following weekly schedule:
Monday:
- In weeks 2, 4, 6, 8 and 10, work on the tutorial exercises that are to be discussed in tutorials that week. Try to complete as much of them as you can before your tutorial, so that you can discuss any problems or issues you had with your tutor.
- Start reading that week's required reading.
The online quizzes, required reading and videos are available in Course Materials. It should take you about 6 hours in total each week to watch the videos or attend the lectures, do the quizzes and read the required reading set for that week.
Tuesday:
- Continue working through the week's content (lecture videos, online quizzes, required reading).
Wednesday:
- Continue working through the week's content (lecture videos, online quizzes, required reading).
Thursday:
- Finish working through the week's content (lecture videos, online quizzes, required reading).
- Start working on the tutorial exercises for the following week (if it is a tutorial week).
Friday:
- 10am: In-person lecture.
In addition to the above, you will also have an in-person tutorial in weeks 2, 4, 6, 8 and 10; its time depends on which group you are in, so check this under Groups.
Overall, each week, the Directed Learning and Independent Learning activities (i.e., guided self-study activities such as preparing your tutorial assignments, doing the required reading, or doing the lab exercises) should take you about 10 hours in total. This estimate does not include the time you need for the two pieces of FNLP coursework.

