Pattern Recognition
Course description
REQUIREMENTS AND PRIOR KNOWLEDGE
There are no formal prerequisites for this course.
GENERAL DESCRIPTION OF THE SUBJECT
This is an introductory course that covers some of the most fundamental topics in exact string pattern recognition. Each topic is presented in general terms rather than discussed in depth, since the course is intended to give students an overview of the field.
OBJECTIVES: KNOWLEDGE AND SKILLS
The goals of this course include:
- To know the theoretical and algorithmic foundations of exact string pattern recognition.
- To provide students with a hands-on approach that covers the practical issues involved in programming pattern recognition algorithms.
- To know the main applications of exact string pattern recognition to other problems in computer science.
- To know some applications of exact string pattern recognition to problems found in other fields, in particular, in Computational Biology and Computational Music Theory.
TEACHING MATERIAL
Course notes written by Paco Gomez.
EVALUATION ACTIVITIES OR PRACTICAL TASKS
- For the February examination session:
  - Attendance at 75% of the sessions is required.
  - The course grade will be based on the scores of the homework assignments, which will include both theoretical and practical (programming) assignments. There will be at least four and at most six assignments, depending on time and pace.
  - All assignments carry the same weight in the final grade.
  - One of the assignments will be a programming project.
  - A passing grade requires 50 out of 100 points.
- For the other examination sessions, students will have to hand in a project (60% of the grade) and take a written exam (40%).
Syllabus
- Review of basic concepts in complexity, data structures, and algorithms.
- Exact pattern recognition. The brute-force algorithm. Algorithms based on preprocessing. Preprocessing in linear time. Linear-time exact matching algorithm.
- The Boyer-Moore algorithm. Analysis of its complexity. The Knuth-Morris-Pratt algorithm. Pattern recognition with finite automata. Real-time string matching.
- Preprocessing in the Knuth-Morris-Pratt algorithm. Exact matching with a set of patterns.
- The edit distance between two strings. Dynamic programming calculation of edit distance. String similarity.
- Introduction to suffix trees. The naive algorithm to build suffix trees. Ukkonen’s linear-time suffix tree algorithm. Practical implementation issues.
- Applications of exact string pattern recognition algorithms. Suffix trees and the exact set matching problem. Common substrings of more than two strings. Longest common substring of two strings. DNA contamination. Circular string linearization. The edit distance and the problem of melodic similarity.
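To make two of the syllabus topics concrete (the brute-force algorithm and the Knuth-Morris-Pratt algorithm with its preprocessing), here is a minimal Python sketch. It is an illustration only, not code from the course notes, and the function names `naive_search`, `failure_function`, and `kmp_search` are hypothetical. The brute-force search tries every alignment of the pattern against the text, taking O(nm) time in the worst case. KMP first preprocesses the pattern: for each prefix it records the length of the longest proper prefix that is also a suffix. The search then uses these values to shift the pattern without ever moving backwards in the text, giving O(n + m) time overall.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Brute force: compare the pattern at every alignment, O(n*m) worst case."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def failure_function(pattern: str) -> list[int]:
    """KMP preprocessing: fail[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it. Runs in O(m) time."""
    fail = [0] * len(pattern)
    k = 0  # length of the border currently being extended
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until pattern[i] can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start indices of all exact occurrences of pattern in text.
    Each text character is read once, so the search is O(n + m)."""
    if not pattern:
        # Same convention as naive_search: the empty pattern occurs everywhere.
        return list(range(len(text) + 1))
    fail = failure_function(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, c in enumerate(text):
        # On a mismatch, shift the pattern using the precomputed borders.
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping occurrences
    return matches
```

For example, `kmp_search("abababca", "abab")` returns `[0, 2]`, the same result as `naive_search`, and overlapping occurrences are reported too. The advantage of KMP over the brute-force search lies entirely in the preprocessing step.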
Bibliography
See the bibliography in the "Course notes".
Course notes
Course notes written by Paco Gomez (PDF)
Authors of material