utils.py

Utils for the langsim project.

Module Documentation

class utils.Language

Language class. Each language has:

  • ISO 639-1 code (e.g. tr)
  • ISO 639-3 code (e.g. cmn)
  • Wikipedia code (e.g. fr)
  • Wikipedia name (e.g. Waray-Waray)
  • PHOIBLE data
  • WALS data
  • script data
  • character frequency data
  • Wikipedia file size
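
As a rough picture of the record described by the list above, the fields could be held in a dataclass like the one below. This is only a sketch: the class name LanguageSketch, every attribute name, and every type are assumptions made for illustration, not the actual definition of utils.Language.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageSketch:
    """Hypothetical stand-in for utils.Language; all names are assumed."""
    iso639_1: str                                   # two-letter code, e.g. "tr"
    iso639_3: str                                   # three-letter code, e.g. "tur"
    wiki_code: str                                  # Wikipedia language code
    wiki_name: str                                  # Wikipedia language name
    phoible: dict = field(default_factory=dict)     # PHOIBLE data (shape assumed)
    wals: dict = field(default_factory=dict)        # WALS data (shape assumed)
    scripts: list = field(default_factory=list)     # script data (shape assumed)
    char_freq: dict = field(default_factory=dict)   # character -> frequency (assumed)
    wiki_size: int = 0                              # Wikipedia file size (unit assumed)


turkish = LanguageSketch(iso639_1="tr", iso639_3="tur",
                         wiki_code="tr", wiki_name="Turkish")
print(turkish.iso639_3)
```

Using default_factory gives each instance its own empty containers, so filling in data for one language does not leak into another.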

utils.cosine(a, b)

Cosine distance between two vectors. Implementing it here avoids having to install scipy.

Parameters:
  • a – a numpy vector
  • b – a numpy vector
Returns:

the cosine distance between these vectors.
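
For reference, a minimal numpy-only sketch of cosine distance is shown below. It assumes the usual definition, one minus the cosine of the angle between the vectors, which is also the definition used by scipy.spatial.distance.cosine. The function name cosine_distance is hypothetical, and the actual utils.cosine may differ in details such as how it handles zero vectors.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - (a . b) / (|a| * |b|) for two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Assumption: neither vector is all zeros, otherwise the norm product
    # is zero and the division is undefined.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


orthogonal = cosine_distance([1, 0], [0, 1])   # perpendicular vectors
parallel = cosine_distance([1, 2], [2, 4])     # same direction
print(orthogonal, parallel)
```

Perpendicular vectors give a distance of 1 and vectors pointing the same way give a distance of 0, up to floating-point rounding.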

utils.getlangmap()

This produces a map from ISO 639-3 codes to ISO 639-1 codes. Sigh.

Returns:

the map from ISO 639-3 codes to ISO 639-1 codes.

utils.getlangmap2to3()

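This produces a map from two-letter ISO 639-1 codes to three-letter ISO 639-3 codes, the reverse of getlangmap(). Sigh.

Returns:

the map from ISO 639-1 codes to ISO 639-3 codes.

utils.getmissingmap()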

Gets the map of languages missing from PHOIBLE.

utils.readFile(fname, sep='\\s+')

Given a filename, this reads the file, ignoring any line that starts with a # and splitting each remaining line on sep. The default sep, the pattern \s+, splits on runs of whitespace. This is a very common use case.

Parameters:
  • fname – name of the file.
  • sep – separator on which each line is split (default '\\s+').
Returns:

a list of lists; each inner list represents one line and contains the elements separated by sep.
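
The behaviour described above can be sketched in a few lines. This is an illustrative re-implementation, not the project's code: the name read_file is hypothetical, and three details are assumptions, namely that sep is treated as a regular expression, that surrounding whitespace is stripped from each line, and that blank lines are skipped.

```python
import os
import re
import tempfile


def read_file(fname, sep=r"\s+"):
    """Read fname, skip '#' comment lines, split each other line on sep."""
    rows = []
    with open(fname, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue            # comment line
            line = line.strip()
            if not line:
                continue            # assumption: blank lines are skipped
            rows.append(re.split(sep, line))
    return rows


# Usage: write a small sample file, parse it, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("# iso3 iso1\ntur\ttr\ncmn  zh\n")
rows = read_file(tmp.name)
os.remove(tmp.name)
print(rows)
```

The header line is dropped because it starts with #, and both the tab and the double space count as a single separator under the default pattern.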