phoible.py
Phoible is a database of phonetic inventories for about 1600 languages across the world.
See Phoible here: Phoible
Module Documentation
-
class phoible.Phoneme(PhonemeID, GlyphID, Phoneme, Class, CombinedClass, NumOfCombinedGlyphs)
This represents a phoneme, and is used when reading the language file.
-
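phoible.comparephonemes(l1, l2)
Given the phoible-phonemes file and two languages (identified by langcode), this prints the phonemes the two languages have in common and the phonemes unique to each.
Parameters: - fname – phoible-phonemes.tsv file (listed in the docstring, but absent from the signature shown above)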
- l1 – langcode
- l2 – langcode
Returns: None
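The comparison described above can be illustrated with plain set operations. This is a minimal sketch, not the module's own code: the helper name compare_phoneme_sets and the toy inventories are hypothetical, and it assumes each language is already available as a set of phoneme strings.

```python
def compare_phoneme_sets(set1, set2):
    """Return (common, only_in_first, only_in_second) for two phoneme sets."""
    common = set1 & set2        # phonemes present in both inventories
    only_first = set1 - set2    # phonemes unique to the first inventory
    only_second = set2 - set1   # phonemes unique to the second inventory
    return common, only_first, only_second


# Toy inventories for illustration only; not taken from the Phoible data.
lang_a = {"p", "t", "k", "a", "i"}
lang_b = {"p", "t", "q", "a", "u"}

common, only_a, only_b = compare_phoneme_sets(lang_a, lang_b)
print("common:", sorted(common))
print("only in first:", sorted(only_a))
print("only in second:", sorted(only_b))
```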
-
phoible.getF1(lang1, lang2)
Get the F1 score between two sets of phonemes. The score ranges from 0 to 1; if lang1 and lang2 are identical, the F1 is 1. lang1 and lang2 are phoneme sets previously loaded by loadlangs.
Parameters: - lang1 – a set of phonemes
- lang2 – a set of phonemes
Returns: F1 score
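One common way to compute an F1 score over two sets treats the overlap relative to the first set as precision and the overlap relative to the second set as recall. The sketch below follows that standard definition; the function name f1_between_sets is hypothetical, and whether getF1 handles empty sets the same way is an assumption.

```python
def f1_between_sets(lang1, lang2):
    """F1 between two sets: harmonic mean of precision and recall."""
    overlap = len(lang1 & lang2)
    # Assumption: an empty set or an empty overlap scores 0.
    if overlap == 0:
        return 0.0
    precision = overlap / len(lang1)  # share of lang1 also found in lang2
    recall = overlap / len(lang2)     # share of lang2 also found in lang1
    return 2 * precision * recall / (precision + recall)


# Toy phoneme sets for illustration only.
identical = f1_between_sets({"p", "t", "k"}, {"p", "t", "k"})
partial = f1_between_sets({"p", "t"}, {"p", "k"})
disjoint = f1_between_sets({"p"}, {"k"})
print(identical, partial, disjoint)
```

Identical sets give 1 and disjoint sets give 0, which matches the range stated in the description.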
-
phoible.getOV(bridge, target, eng)
This is another measure of transliterability, based on overlap and on having a richer inventory.
Parameters: - bridge – a set of Phonemes
- target – a set of Phonemes
- eng – the set of Phonemes for English.
Returns: a score, larger is better.
-
phoible.getclosest(query, langs, only_hr=False, topk=100000)
Parameters: - query – a langcode
- langs – the result coming from loadlangs
- only_hr – if True, include only high-resource languages
Returns: a list of languages sorted by similarity to the query, most similar first. Format is [(highest score, langcode), (next highest, langcode), ...]
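The return format can be illustrated with a small ranking sketch. Everything here is an assumption made for illustration: the helper name closest_languages is hypothetical, similarity is taken to be a set-based F1, the query language is excluded from its own results, topk is treated as a cap on the number of results, and the only_hr filter is omitted because the source does not say how high-resource languages are identified.

```python
def closest_languages(query, langs, topk=100000):
    """Rank the languages in `langs` by phoneme-set similarity to `query`.

    `langs` is assumed to map {langcode: set(phonemes)}.
    """
    query_set = langs[query]
    scored = []
    for code, phonemes in langs.items():
        if code == query:
            continue  # assumption: the query is not compared with itself
        total = len(query_set) + len(phonemes)
        # For two sets, F1 reduces to 2 * |overlap| / (|A| + |B|).
        score = 2 * len(query_set & phonemes) / total if total else 0.0
        scored.append((score, code))
    scored.sort(reverse=True)  # highest score first
    return scored[:topk]


# Toy data with made-up language codes, for illustration only.
toy_langs = {
    "aaa": {"p", "t", "k"},
    "bbb": {"p", "t", "k"},
    "ccc": {"p", "q"},
    "ddd": {"x"},
}
ranking = closest_languages("aaa", toy_langs)
print(ranking)
```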
-
phoible.getdistinctivefeatures(lang1, lang2, phonemeMap)
Contrast this with getF1.
I can’t get this to work correctly.
Parameters: - lang1 – a set of Phonemes
- lang2 – a set of Phonemes
Returns: the Distinctive Features score for these languages.
-
phoible.loadlangdata()
This loads the file called phoible-aggregated.tsv, which has data on each language.
Parameters: fname – the file, typically called gold-standard/phoible-aggregated.tsv
Returns: a map of {langcode : {lang features}, ...}
-
phoible.loadlangs()
This takes the filename of the phoible data and reads it into useful structures.
Parameters: fname – the name of the phoible file, typically gold-standard/phoible-phonemes.tsv
Returns: a map of {langcode : set(phonemes), ...}, a map of {langcode : langname, ...}
-
phoible.loadtrumps()
Returns: a map of {lang : [trump1, trump2, ...], ...}
-
phoible.readfeaturefile()
This loads the distinctive features file in phoible, typically called raw-data/FEATURES/phoible-segments-features.tsv.
Returns: a map of {phoneme : {df : val, df : val, ...}, ...}