School of Informatics - 2021/22

Course Information

    Course Summary

    Text Technologies for Data Science (TTDS) is a 20-credit course at Level 11, normally taken in Year 4. It runs throughout the year. The exam takes place in April/May and is worth 30% of the course mark. Full details are given in the University course descriptor.

    Welcome & Learning Outcomes

    We are pleased to welcome you to Text Technologies for Data Science (2021-2022).

    This year, the course is taught by two lecturers, Walid Magdy and Björn Ross, supported by a number of teaching support staff.
    During semester 1, lectures will take place online because of the size of the course. You are encouraged to join live; recordings will also be made available. There will also be online drop-in lab sessions in multiple time slots, so you can join at a time that is convenient for you. There will be two coursework assignments in semester 1.
    During semester 2 you will work in small groups on coursework 3, supported by us. We encourage you to meet up in person for the group project if you feel comfortable doing so, but this is your decision.
    This Learn page will be used for the submission of coursework. On the public page of the course, you will be able to find lecture slides, lab instructions and coursework descriptions. Discussions about course content will take place on Piazza. Drop-in labs will take place on Microsoft Teams.
    Learning Outcomes
    On successful completion of this course, you should be able to: 
    1. Build basic search engines from scratch, and use IR tools for searching massive collections of text documents
    2. Build feature extraction modules for text classification
    3. Implement evaluation scripts for IR and text classification
    4. Understand how web search engines (such as Google) work
    5. Work effectively in a team to produce working systems

    Course Outline

    Syllabus:
    * Introduction to IR and text processing, system components
    * Zipf, Heaps, and other text laws
    * Pre-processing: tokenisation, normalisation, stemming, stopping
    * Indexing: inverted index, boolean and proximity search
    * Evaluation methods and measures (e.g., precision, recall, MAP, significance testing)
    * Query expansion
    * IR toolkits and applications
    * Ranked retrieval and learning to rank
    * Text classification: feature extraction, baselines, evaluation
    * Web search

    Timetable

    Class times for this course can be found via your University of Edinburgh calendar.

    Informatics Teaching Organisation: Information for Students

    You can also email the Informatics Teaching Organisation (ITO) at ito@inf.ed.ac.uk or the Student Support Team (SST) at inf-sst@inf.ed.ac.uk.