LITHUANIAN CONTINUOUS SPEECH CORPUS LRN 1: AN IMPROVEMENT

Authors

  • Sigita Laurinčiukaitė, Institute of Mathematics and Informatics
  • Mark Filipovič, Information Technology and Communication Department under the Ministry of the Interior of the Republic of Lithuania
  • Laimutis Telksnys, Institute of Mathematics and Informatics

Abstract

This paper presents the development of the Lithuanian continuous speech corpus LRN 1 (Lithuanian Radio News, version 1). The corpus was developed from the speech corpus LRN 0.1 by extending its duration to 20 hours 50 minutes. The major improvement in LRN 1 is the development of time-aligned word-level annotations of the speech signals. These annotations were obtained in a two-stage process: automatic realignment using acoustic models of phonemes, followed by manual correction of the annotations. The improved corpus is useful for constructing and evaluating speaker-independent continuous speech recognition systems and for linguistic research.
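To make the idea of a time-aligned word-level annotation concrete, the following minimal sketch represents each word as a labelled interval with start and end times in seconds. It then runs the kind of consistency check that could flag automatically aligned segments for the manual-correction stage. The `WordSegment` record, the `validate_alignment` function, the specific checks, and the sample words and timings are all hypothetical illustrations. The abstract does not specify the LRN 1 annotation format or the tools used for correction.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordSegment:
    """One time-aligned word (hypothetical schema): times are in seconds."""
    word: str
    start: float
    end: float


def validate_alignment(segments: List[WordSegment]) -> List[str]:
    """Return descriptions of segments a human annotator would need to review.

    Assumed checks (illustrative only):
      * every word must have a positive duration;
      * a word must not start before the previous word ends.
    """
    problems = []
    for i, seg in enumerate(segments):
        if seg.end <= seg.start:
            problems.append(f"segment {i} ({seg.word!r}) has a non-positive duration")
        if i > 0 and seg.start < segments[i - 1].end:
            problems.append(f"segment {i} ({seg.word!r}) overlaps the previous word")
    return problems


if __name__ == "__main__":
    # Hypothetical automatic alignment in which the second word starts too early.
    utterance = [
        WordSegment("labas", 0.00, 0.42),
        WordSegment("rytas", 0.40, 0.95),
    ]
    for problem in validate_alignment(utterance):
        print(problem)  # reports the overlap between the two words
```

Flagging problems rather than silently adjusting the boundaries keeps the final decision with the human annotator, which mirrors the two-stage (automatic, then manual) process described in the abstract.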

Published

2009-10-14

Section

Articles