utils.py
Utilities for the langsim project.
Module Documentation
class utils.Language
Language class. Each language has:
- ISO639-1 code (e.g. tr)
- ISO639-3 code (e.g. cmn)
- wikipedia code (e.g. fr)
- wikipedia name (e.g. Waray-Waray)
- phoible data
- wals data
- script data
- character frequency data
- wikipedia file size
utils.cosine(a, b)
Cosine distance. Implemented here to avoid having to install scipy.
Parameters:
- a – a numpy vector
- b – a numpy vector
Returns: the cosine distance between these vectors.
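As a rough illustration of what a scipy-free cosine distance can look like, here is a minimal sketch. It assumes the usual convention of one minus the cosine similarity (the one scipy uses); the name `cosine_distance` is hypothetical and this is not necessarily how `utils.cosine` is written.

```python
import numpy as np


def cosine_distance(a, b):
    """Hypothetical stand-in for utils.cosine.

    Assumption: distance = 1 - (a . b) / (|a| * |b|).
    Zero vectors are not handled and would cause a division by zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - similarity


# Orthogonal vectors have similarity 0, parallel vectors have similarity 1.
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
```

Under this convention, vectors pointing in the same direction are at distance 0 and orthogonal vectors are at distance 1.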
utils.getlangmap()
This produces a map from ISO 639-3 codes to ISO 639-1 codes. Sigh.
Returns:
utils.getlangmap2to3()
Judging by its name, this produces a map from ISO 639-1 (two-letter) codes to ISO 639-3 (three-letter) codes, presumably the inverse of getlangmap(). Sigh.
Returns:
utils.getmissingmap()
Get the map of languages missing from Phoible.
utils.readFile(fname, sep='\\s+')
Given a filename, this reads the file, ignoring any line that starts with a # and splitting each remaining line on sep. This is a very common use case.
Parameters:
- fname – name of the file
- sep – separator each line is split on (defaults to runs of whitespace)
Returns: a list of lists; each inner list represents one line and contains the elements separated by sep.
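The behaviour described above can be sketched as follows. This is an illustration only, not the project's actual code: the name `read_file_sketch` is hypothetical, and treating sep as a regular expression, stripping surrounding whitespace, and skipping blank lines are all assumptions.

```python
import re


def read_file_sketch(fname, sep=r"\s+"):
    """Hypothetical stand-in for utils.readFile.

    Skips lines starting with '#' and splits every other line on sep.
    Assumptions: sep is a regular expression, surrounding whitespace is
    stripped before splitting, and blank lines are skipped.
    """
    rows = []
    with open(fname, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # comment line
            line = line.strip()
            if not line:
                continue  # blank line (assumed to be ignored)
            rows.append(re.split(sep, line))
    return rows
```

For a file with a leading `# comment` line followed by `tr tur Turkish`, this sketch returns `[['tr', 'tur', 'Turkish']]`.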