phoible.py

Phoible is a database of phonological inventories for about 1600 languages around the world.

See Phoible here: Phoible

Module Documentation

class phoible.Phoneme(PhonemeID, GlyphID, Phoneme, Class, CombinedClass, NumOfCombinedGlyphs)

This represents a phoneme, and is used when reading the language file.

phoible.comparephonemes(l1, l2)

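Given the phoible-phonemes file and two languages, print the phonemes the languages have in common and the phonemes unique to each.

Parameters:
  • fname – the phoible-phonemes.tsv file (documented here, although it does not appear in the signature above)
  • l1 – langcode of the first language
  • l2 – langcode of the second language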
Returns:

None
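
The comparison described for comparephonemes can be illustrated with plain set operations. This is only a minimal sketch: the helper name split_inventories and the sample inventories are hypothetical, and it returns the three groups instead of printing them as the module function is documented to do.

```python
def split_inventories(phonemes1, phonemes2):
    """Split two phoneme sets into shared and language-specific parts.

    Hypothetical helper, not part of phoible.py.
    """
    common = phonemes1 & phonemes2   # phonemes found in both languages
    only1 = phonemes1 - phonemes2    # phonemes unique to the first language
    only2 = phonemes2 - phonemes1    # phonemes unique to the second language
    return common, only1, only2


# Made-up inventories, for illustration only.
common, only1, only2 = split_inventories({"p", "t", "k"}, {"p", "t", "b"})
print("common:", sorted(common))
print("only in first:", sorted(only1))
print("only in second:", sorted(only2))
```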

phoible.getF1(lang1, lang2)

Get the F1 score between two sets of phonemes. The score ranges from 0 to 1; if lang1 and lang2 are identical, the F1 is 1. Both arguments are phoneme sets previously loaded by loadlangs.

Parameters:
  • lang1 – a set of phonemes
  • lang2 – a set of phonemes
Returns:

F1 score
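
For reference, a set-based F1 score can be sketched as follows. This assumes the usual definition (harmonic mean of precision and recall, with one set treated as the prediction and the other as the reference); the documentation above does not spell out the formula, so the function name f1_between_sets and the handling of empty sets are assumptions, not the module's confirmed behaviour.

```python
def f1_between_sets(set_a, set_b):
    """F1 overlap of two sets (hypothetical helper, not part of phoible.py)."""
    overlap = len(set_a & set_b)
    if overlap == 0:
        # Covers empty inputs and disjoint sets; returning 0.0 is an assumption.
        return 0.0
    precision = overlap / len(set_a)
    recall = overlap / len(set_b)
    # Harmonic mean; symmetric in the two arguments.
    return 2 * precision * recall / (precision + recall)


print(f1_between_sets({"p", "t", "k"}, {"p", "t", "k"}))  # identical sets
print(f1_between_sets({"p", "t"}, {"p", "k"}))            # partial overlap
```

Because the harmonic mean is symmetric, swapping the two sets gives the same score, which matches the description of a similarity that is 1 for identical inventories.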

phoible.getOV(bridge, target, eng)

This is another measure of transliterability based on overlap and having a richer inventory.

Parameters:
  • bridge – a set of Phonemes
  • target – a set of Phonemes
  • eng – the set of Phonemes for English.
Returns:

a score; larger is better.

phoible.getclosest(query, langs, only_hr=False, topk=100000)

Rank languages by their similarity to the query language.

Parameters:
  • query – a langcode
  • langs – the result returned by loadlangs
  • only_hr – if True, include only high-resource languages
Returns:

a list of languages sorted by similarity to the query, most similar first. The format is [(highest score, langcode), (next highest, langcode), ...]
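
The return format can be illustrated with a small ranking sketch. Everything here is an assumption made for illustration: the names rank_by_similarity and set_overlap_score are hypothetical, the similarity measure is a simple set-overlap score rather than whatever getclosest uses internally, the query language is excluded from its own results, and the only_hr filter is left out because it needs the language metadata.

```python
def set_overlap_score(set_a, set_b):
    """Overlap score in [0, 1]; 1.0 for identical non-empty sets (hypothetical)."""
    total = len(set_a) + len(set_b)
    if total == 0:
        return 0.0
    return 2 * len(set_a & set_b) / total


def rank_by_similarity(query, langs, topk=100000):
    """Return [(score, langcode), ...], highest score first (hypothetical)."""
    query_set = langs[query]
    scored = [
        (set_overlap_score(query_set, phonemes), code)
        for code, phonemes in langs.items()
        if code != query  # assumption: do not rank the query against itself
    ]
    scored.sort(reverse=True)  # tuples sort by score first, then by langcode
    return scored[:topk]


# Made-up langcodes and inventories, for illustration only.
langs = {
    "aaa": {"p", "t", "k"},
    "bbb": {"p", "t", "k"},
    "ccc": {"p"},
    "ddd": {"m"},
}
print(rank_by_similarity("aaa", langs, topk=2))
```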

phoible.getdistinctivefeatures(lang1, lang2, phonemeMap)

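Compute a similarity score for two languages based on distinctive features. Contrast this with getF1.

Note: the author could not get this function to work correctly.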

Parameters:
  • lang1 – a set of Phonemes
  • lang2 – a set of Phonemes
Returns:

the Distinctive Features score for these languages.

phoible.loadlangdata()

This loads the file called phoible-aggregated.tsv, which has language data on each language.

Parameters:
  • fname – the file typically called gold-standard/phoible-aggregated.tsv
Returns:

a map from {langcode : {lang features}, ...}

phoible.loadlangs()

This takes the filename of the phoible data and reads it into useful structures.

Parameters:
  • fname – the name of the phoible file, typically gold-standard/phoible-phonemes.tsv
Returns:

a map of {langcode : set(phonemes), ...}, a map of {langcode : langname, ...}

phoible.loadtrumps()
Returns:

a map from {lang : [trump1, trump2, ...], ...}

phoible.readfeaturefile()

This loads the distinctive features file in phoible, typically called raw-data/FEATURES/phoible-segments-features.tsv

Returns:

a map of {phoneme : {df : val, df : val, ...}, ...}