LITHUANIAN CONTINUOUS SPEECH CORPUS LRN 0.1: DESIGN AND POTENTIAL APPLICATIONS

Authors

  • Sigita Laurinčiukaitė, Institute of Mathematics and Informatics
  • Darius Šilingas, Vytautas Magnus University
  • Mantas Skripkauskas, Institute of Mathematics and Informatics
  • Laimutis Telksnys, Institute of Mathematics and Informatics

DOI:

https://doi.org/10.5755/j01.itc.35.4.11785

Abstract

This paper presents the design, development, and contents of the Lithuanian continuous speech corpus LRN 0.1 (Lithuanian Radio News, prototype version 0.1). The corpus contains 17 hours 23 minutes of recordings of radio broadcast news read by 31 speakers. The recorded material is segmented into sentence-length recordings, which are divided into training, development, and evaluation sets. The speech recordings are accompanied by word-level transcriptions and an automatically generated word-to-phone lexicon. The corpus is designed for constructing and evaluating speaker-independent continuous speech recognition systems and may also be used for linguistic research.

Published

2006-12-22

Section

Articles