Assignment 7: Edit Distance

Deadline: 13th January, 14:00 CET

In this assignment, we will work with a variation of the edit distance algorithm and apply it to measure distances between languages. The first part introduces custom weights for the distance algorithm. For the second part, we will make use of NorthEuraLex, a database of phonetic transcriptions of words that express the same concept (e.g., nouns representing body parts) in many languages. Intuitively, similar languages use similar-sounding words for the same concept. As a result, they will have, on average, a lower edit distance between the pronunciations of the same concepts.

For a more detailed description, please attend the lab session on Friday, 10th of January.

Starter code

You can also access it here

Exercises

Exercise 1

Implement the edit_distance() function that uses the standard dynamic programming algorithm to compute a weighted edit distance between two IPA strings. In particular, we assign (potentially) different weights to

Your function should return the minimum weighted edit distance between the two IPA strings provided. Follow the description in the template for further details.
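
For orientation, the dynamic programming recurrence can be sketched as follows. This is only an illustration and not the template's interface: the function name weighted_edit_distance_sketch and the parameters ins_cost, del_cost and sub_cost are hypothetical, and the sketch assumes that the weights in question are one constant cost each for insertion, deletion and substitution. It also treats every Unicode code point as one symbol, which may not match how the template segments IPA strings.

```python
def weighted_edit_distance_sketch(source, target,
                                  ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Weighted edit distance via the standard DP table.

    Assumption: one constant weight per operation type. The template
    may define the weights differently.
    """
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]

    # First column: delete every symbol of the source prefix.
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: insert every symbol of the target prefix.
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                        # deletion
                dist[i][j - 1] + ins_cost,                        # insertion
                dist[i - 1][j - 1] + (0.0 if same else sub_cost)  # match or substitution
            )
    return dist[n][m]


print(weighted_edit_distance_sketch("kitten", "sitting"))                # 3.0
print(weighted_edit_distance_sketch("kitten", "sitting", sub_cost=2.0))  # 5.0
```

With unit weights this reduces to the ordinary Levenshtein distance. Raising sub_cost to 2.0 makes a substitution as expensive as a deletion followed by an insertion, which is why the second call returns a larger value.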

Exercise 2

Implement the function language_distance() in the template, which calculates a distance between two languages based on the average distance between the pronunciations of the words that express the same “concept”. Follow the description in the template for further details.
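
The averaging idea can be sketched as below. The data layout is an assumption made for illustration: each language is represented as a dict from a concept label to a single IPA string, which is not necessarily how the template or NorthEuraLex provides the data. The names average_concept_distance and unit_levenshtein are hypothetical, and the unit-cost distance only stands in for the weighted function from Exercise 1.

```python
def unit_levenshtein(a, b):
    """Plain unit-cost edit distance, used here only as a stand-in."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def average_concept_distance(words_a, words_b, distance=unit_levenshtein):
    """Mean distance over the concepts that both languages cover.

    Assumption: words_a and words_b map concept -> one IPA string.
    """
    shared = sorted(set(words_a) & set(words_b))
    if not shared:
        raise ValueError("the two languages share no concepts")
    total = sum(distance(words_a[c], words_b[c]) for c in shared)
    return total / len(shared)


# Toy data, invented for this example (not taken from NorthEuraLex).
lang_a = {"hand": "hant", "eye": "auge"}
lang_b = {"hand": "hand", "eye": "ai", "nose": "nouz"}
print(average_concept_distance(lang_a, lang_b))  # 2.0
```

Concepts that appear in only one of the two languages ("nose" above) are skipped in this sketch. Whether that is the intended behaviour, and how several words for one concept should be handled, is for the template description to decide.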

Wrapping up

Do not forget to