LITHUANIAN CONTINUOUS SPEECH CORPUS LRN 1: AN IMPROVEMENT
Abstract
This paper presents the development of the Lithuanian continuous speech corpus LRN 1 (Lithuanian Radio News, version 1). The corpus was developed from the speech corpus LRN 0.1 by extending its duration; LRN 1 contains 20 hours 50 minutes of speech. The major improvement in LRN 1 is the addition of time-aligned word-level annotations of the speech signals. These annotations were obtained in a two-stage process: automatic realignment based on acoustic models of phonemes, followed by manual correction of the annotations. The improved corpus is useful for constructing and evaluating speaker-independent continuous speech recognition systems and for linguistic research.
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.