Assignment 7: Edit Distance
Deadline: 13th January, 14:00 CET
In this assignment, we will work with a variation of the edit distance algorithm and apply it to find distances between languages. The first part introduces custom weights for the distance algorithm. For the second part, we will make use of NorthEuraLex, a database of phonetic transcriptions of words that express the same concept (e.g., nouns representing body parts) in many languages. Intuitively, similar languages use similar-sounding words for the same concept. As a result, they will have, on average, a lower edit distance between the pronunciations of the same concepts.
For a more detailed description, please attend the lab session on Friday, 10th of January.
Starter code
- template: a7.py
- tests: test_a7.py
- data: NorthEuraLex (provided in your repository for convenience)
You can also access it here
Exercises
Exercise 1
Implement the edit_distance() function that uses the standard dynamic programming algorithm to compute a weighted edit distance between two IPA strings.
In particular, we assign (potentially) different weights to
- replacement of a vowel with another vowel
- replacement of a consonant with another consonant
- replacement of a vowel with a consonant
- deletion or insertion of a vowel
- deletion or insertion of a consonant
Your function should return the minimum weighted edit distance between the two IPA strings provided. Follow the description in the template for further details.
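To illustrate how the five weights fit into the usual dynamic programming table, here is a minimal sketch. It is not the template's interface: the function name `weighted_edit_distance_sketch`, the parameter names, the default weight values, and the small `VOWELS` set are all assumptions made for illustration. It also treats every character as one segment and applies the vowel/consonant replacement weight in both directions, which may not match how the template handles IPA diacritics or symmetry.

```python
# Hypothetical, incomplete vowel inventory (an assumption for this sketch).
VOWELS = set("aeiouyæøœɑɒɔəɛɪʊʌ")


def is_vowel(ch):
    """Return True if the character is in the illustrative vowel set."""
    return ch in VOWELS


def weighted_edit_distance_sketch(s, t, vv=0.5, cc=0.5, vc=1.0,
                                  v_indel=0.5, c_indel=1.0):
    """Weighted edit distance between s and t (weights are made-up defaults).

    vv: vowel replaced by another vowel
    cc: consonant replaced by another consonant
    vc: vowel replaced by a consonant, or the reverse
    v_indel / c_indel: insertion or deletion of a vowel / consonant
    """
    def indel(ch):
        return v_indel if is_vowel(ch) else c_indel

    def sub(a, b):
        if a == b:
            return 0.0  # identical segments cost nothing
        va, vb = is_vowel(a), is_vowel(b)
        if va and vb:
            return vv
        if not va and not vb:
            return cc
        return vc

    n, m = len(s), len(t)
    # d[i][j] holds the distance between s[:i] and t[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel(s[i - 1])  # delete all of s[:i]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel(t[j - 1])  # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel(s[i - 1]),               # deletion
                d[i][j - 1] + indel(t[j - 1]),               # insertion
                d[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),   # replacement or match
            )
    return d[n][m]
```

With these illustrative weights, replacing one vowel is cheaper than deleting a consonant, so the table prefers a vowel substitution over a deletion followed by an insertion whenever both are possible.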
Exercise 2
Implement the function language_distance() in the template, which calculates a distance between two languages based on the average distance between the pronunciations of the words that express the same “concept”.
Follow the description in the template for further details.
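The averaging idea can be sketched as follows. This is only an illustration under several assumptions: each language is represented as a dictionary from a concept label to a single pronunciation string, only concepts present in both languages are compared, and a plain unit-cost edit distance stands in for the weighted one. The names `levenshtein` and `mean_concept_distance` are hypothetical, and the template may expect a different data layout, normalization, or handling of concepts with several words.

```python
def levenshtein(s, t):
    """Unit-cost edit distance, used here only as a stand-in distance."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, 1):
        cur = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # replace a with b, or match
            ))
        prev = cur
    return prev[-1]


def mean_concept_distance(words_a, words_b, dist=levenshtein):
    """Average dist over the concepts that both dictionaries contain."""
    shared = words_a.keys() & words_b.keys()
    if not shared:
        raise ValueError("the two languages share no concepts")
    total = sum(dist(words_a[c], words_b[c]) for c in shared)
    return total / len(shared)


# Made-up toy entries, not taken from NorthEuraLex.
lang_x = {"hand": "hand", "eye": "oge", "nose": "neze"}
lang_y = {"hand": "hant", "eye": "oge", "foot": "fut"}
print(mean_concept_distance(lang_x, lang_y))  # average over "hand" and "eye"
```

Passing the distance function as a parameter keeps the averaging step independent of the weighting scheme, so a weighted distance can be substituted without changing the loop.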
Wrapping up
Do not forget to
- indicate your name(s) in the file header
- add the honor code (I/We pledge that this code represents my/our own work)
- commit all your changes
- push them to your GitHub repository