LITHUANIAN CONTINUOUS SPEECH CORPUS LRN 0.1: DESIGN AND POTENTIAL APPLICATIONS
DOI: https://doi.org/10.5755/j01.itc.35.4.11785
Abstract
This paper presents the design, development, and contents of the Lithuanian continuous speech corpus LRN 0.1 (Lithuanian Radio News, prototype version 0.1). The corpus contains 17 hours 23 minutes of recordings of radio broadcast news read by 31 speakers. The recorded material is segmented into sentence-length recordings that are divided into training, development, and evaluation sets. Speech recordings are accompanied by word-level transcriptions and an automatically generated word-to-phone lexicon. The corpus is designed for constructing and evaluating speaker-independent continuous speech recognition systems, and may also be used for linguistic research.
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.