ELAN Voice Audio Detection

Scope

ELAN is an annotation tool whose file format (EAF) is commonly used for detailed, time-aligned transcriptions of audio recordings.

During training, some automatic speech recognition (ASR) models struggle with longer audio files (i.e., longer than 20 seconds). This limitation matters particularly in linguistic fieldwork, where hours of raw data might be captured over a series of interviews. This raw data generally needs to be cleaned and segmented before it can be used for training.

Persephia was contracted to build software that automatically detects speech in raw audio and then breaks it down into smaller clips that can be labeled and used to train machine learning models.
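As a rough illustration of the segmentation step, the sketch below caps detected speech intervals at a maximum clip length. It is not the library's actual code: the function name split_segments, the (start, end) millisecond representation, and the even-splitting strategy are all assumptions made for illustration. Only the 20-second default comes from the limit described above.

```python
import math


def split_segments(segments, max_len_ms=20_000):
    """Split speech intervals so that no clip exceeds max_len_ms.

    `segments` is a list of (start_ms, end_ms) tuples, assumed to come
    from some upstream speech detector (hypothetical input format).
    """
    clips = []
    for start, end in segments:
        duration = end - start
        if duration <= 0:
            # Skip empty or inverted intervals rather than emit bad clips.
            continue
        # Number of equal-length pieces needed to stay under the limit.
        n = math.ceil(duration / max_len_ms)
        for i in range(n):
            clip_start = start + (duration * i) // n
            clip_end = start + (duration * (i + 1)) // n
            clips.append((clip_start, clip_end))
    return clips


if __name__ == "__main__":
    # A 45-second interval is cut into three pieces; a short one is kept whole.
    print(split_segments([(0, 45_000), (50_000, 54_000)]))
```

Splitting evenly avoids leaving a very short final clip, but it can cut in the middle of a word. Cutting at the nearest detected pause would be a reasonable alternative.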

Persephia consequently designed a software library that uses state-of-the-art voice activity detection models to analyze audio files and generate the corresponding ELAN annotations.
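To make the ELAN side of this more concrete, here is a minimal sketch of how detected speech intervals might be written out as time-aligned annotations in EAF-style XML. It is an assumption-laden illustration rather than the library's implementation: the function name build_eaf, the tier name "speech", and the linguistic type ID "default-lt" are hypothetical, and the output is a simplified subset of the format. A file meant to open in ELAN would need additional header metadata, such as a reference to the media file.

```python
import xml.etree.ElementTree as ET


def build_eaf(segments_ms, tier_id="speech"):
    """Build simplified EAF-style XML with one empty annotation per segment.

    `segments_ms` is a list of (start_ms, end_ms) tuples (hypothetical
    input format, e.g. the output of a voice activity detector).
    """
    root = ET.Element("ANNOTATION_DOCUMENT", {"FORMAT": "3.0", "VERSION": "3.0"})
    ET.SubElement(root, "HEADER", {"TIME_UNITS": "milliseconds"})
    time_order = ET.SubElement(root, "TIME_ORDER")
    tier = ET.SubElement(
        root, "TIER", {"TIER_ID": tier_id, "LINGUISTIC_TYPE_REF": "default-lt"}
    )

    for i, (start, end) in enumerate(segments_ms):
        # Each annotation refers to two time slots: its start and its end.
        ts_start, ts_end = f"ts{2 * i + 1}", f"ts{2 * i + 2}"
        ET.SubElement(
            time_order, "TIME_SLOT", {"TIME_SLOT_ID": ts_start, "TIME_VALUE": str(start)}
        )
        ET.SubElement(
            time_order, "TIME_SLOT", {"TIME_SLOT_ID": ts_end, "TIME_VALUE": str(end)}
        )
        annotation = ET.SubElement(tier, "ANNOTATION")
        aligned = ET.SubElement(
            annotation,
            "ALIGNABLE_ANNOTATION",
            {
                "ANNOTATION_ID": f"a{i + 1}",
                "TIME_SLOT_REF1": ts_start,
                "TIME_SLOT_REF2": ts_end,
            },
        )
        # Left empty so a transcriber can fill in the text later.
        ET.SubElement(aligned, "ANNOTATION_VALUE").text = ""

    ET.SubElement(
        root,
        "LINGUISTIC_TYPE",
        {"LINGUISTIC_TYPE_ID": "default-lt", "TIME_ALIGNABLE": "true"},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_eaf([(0, 1_500), (2_000, 4_000)]))
```

Leaving the annotation values empty reflects the workflow described here: the detector marks where speech occurs, and the transcription itself is added afterwards.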

Outcomes

The library was built and distributed, and can be used in training pipelines for linguistics research.