Assignment 7: Edit Distance

Deadline: 13th January, 14:00 CET

In this assignment, we will work with a variation of the edit distance algorithm and apply it to find distances between languages. The first part introduces custom weights for the distance algorithm. For the second part, we will make use of NorthEuraLex, a database of phonetic transcriptions of words in many languages that express the same concept (e.g., nouns representing body parts). Intuitively, similar languages use similar-sounding words for the same concept. As a result, they will have, on average, a lower edit distance between the pronunciations of the same concepts.

For a more detailed description, please attend the lab session on Friday, 10th of January.

Starter code

template: a7.py
tests: test_a7.py
data: NorthEuraLex (provided in your repository for convenience)

You can also access it here

Exercises

Exercise 1

Implement the edit_distance() function that uses the standard dynamic programming algorithm to compute a weighted edit distance between two IPA strings. In particular, we assign (potentially) different weights to

replacement of a vowel with another vowel
replacement of a consonant with another consonant
replacement of a vowel with a consonant
deletion or insertion of a vowel
deletion or insertion of a consonant

Your function should return the weighted minimum edit distance between the two IPA strings provided. Follow the description in the template for further details.
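To make the five weight classes above concrete, here is a minimal sketch of the dynamic programming recurrence. It is not the template's interface: the function name, the parameter names, the default weights, and the small vowel set are all assumptions chosen for illustration, and each Unicode code point is treated as one segment, which may not hold for IPA symbols with diacritics.

```python
# Hypothetical, deliberately incomplete vowel inventory (an assumption);
# the template may define vowels differently.
VOWELS = set("aeiouyɛɔəɪʊæɑ")


def is_vowel(ch):
    """Return True if the single character is in the assumed vowel set."""
    return ch in VOWELS


def weighted_edit_distance(s1, s2, vv=0.5, cc=0.5, vc=1.0,
                           v_indel=0.5, c_indel=1.0):
    """Weighted edit distance; all weight names and defaults are illustrative."""

    def indel(ch):
        # Cost of inserting or deleting one segment.
        return v_indel if is_vowel(ch) else c_indel

    def sub(a, b):
        # Cost of replacing segment a with segment b.
        if a == b:
            return 0.0
        va, vb = is_vowel(a), is_vowel(b)
        if va and vb:
            return vv
        if not va and not vb:
            return cc
        return vc

    n, m = len(s1), len(s2)
    # d[i][j] holds the minimum cost of turning s1[:i] into s2[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel(s1[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel(s2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel(s1[i - 1]),               # deletion
                d[i][j - 1] + indel(s2[j - 1]),               # insertion
                d[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]),  # replacement
            )
    return d[n][m]
```

With every weight set to 1, this reduces to the ordinary unweighted edit distance, which is a convenient sanity check before experimenting with other weights.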

Exercise 2

Implement the function language_distance() in the template, which calculates a distance between two languages based on the average distance between the pronunciations of the words that express the same “concept”. Follow the description in the template for further details.
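The averaging idea can be sketched as follows. This is only an illustration under stated assumptions: each language is represented as a plain dictionary from a concept identifier to a single transcription, the helper names are hypothetical, and an unweighted edit distance stands in for the weighted one. The input format and any normalization expected by the template may differ.

```python
def levenshtein(a, b):
    """Plain unweighted edit distance, used here only as a stand-in."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # replacement
        prev = cur
    return prev[-1]


def mean_concept_distance(words_a, words_b, distance=levenshtein):
    """Average word distance over the concepts present in both languages.

    words_a and words_b map a concept ID to one transcription
    (an assumed, simplified representation of the data).
    """
    shared = sorted(set(words_a) & set(words_b))
    if not shared:
        raise ValueError("the two languages share no concepts")
    total = sum(distance(words_a[c], words_b[c]) for c in shared)
    return total / len(shared)


# Made-up toy strings, not taken from NorthEuraLex.
lang_a = {"C1": "hant", "C2": "fus", "C3": "auge"}
lang_b = {"C1": "hand", "C2": "fut", "C4": "noz"}
result = mean_concept_distance(lang_a, lang_b)  # averages over C1 and C2 only
```

Restricting the average to shared concepts is one possible design choice; it avoids penalizing a language pair for gaps in the data rather than for actual differences in pronunciation.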

Wrapping up

Do not forget to

indicate your name(s) in the file header
add the honor code (I/We pledge that this code represents my/our own work)
commit all your changes
push them to your GitHub repository