Natural Language Processing
Overview

NLP is a large and multidisciplinary field, so this course can only provide a very general introduction. The first lecture is designed to give an overview of the main subareas and a very brief idea of the main applications and the methodologies which have been employed. The history of NLP is briefly discussed as a way of putting this into perspective.

The next six lectures describe some of the main subareas in more detail. The organisation is roughly based on increasing ‘depth’ of processing, starting with relatively surface-oriented techniques and progressing to the meaning of sentences and the meaning of utterances in context. Most lectures will start off by considering the subarea as a whole and then go on to describe one or more sample algorithms which tackle particular problems. The algorithms have been chosen because they are relatively straightforward to describe and because they illustrate a specific technique which has been shown to be useful, but the idea is to exemplify an approach, not to give a detailed survey (which would be impossible in the time available). (Lecture 5 is a bit different in that it concentrates on a data structure instead of an algorithm.) The final lecture brings the preceding material together in order to describe the state of the art in three sample applications.

There are various themes running throughout the lectures. One theme is the connection to linguistics and the tension that sometimes exists between the predominant view in theoretical linguistics and the approaches adopted within NLP. A somewhat related theme is the distinction between knowledge-based and probabilistic approaches. Evaluation will be discussed in the context of the different algorithms.

Because NLP is such a large area, there are many topics that aren’t touched on at all in these lectures. Speech recognition and speech synthesis are almost totally ignored.
Information retrieval and information extraction are the topic of a separate course given by Simone Teufel, for which this course is a prerequisite.

Feedback on the handout, including lists of typos, would be greatly appreciated.
https://www.cl.cam.ac.uk/teaching/2002/NatLangProc/revised.pdf