Indian English-Kannada Sentence Aligned Speech Corpus
Dataset Description:
11:17:40 (hh:mm:ss) | 7.27 GB | 6,166 audio segments | 53 speakers

The annotated speech corpus offers a wide range of linguistic information and is especially useful for phonetic analysis. The LDC-IL Indian English-Kannada variant Sentence Aligned Speech dataset comprises audio files in WAV format, each accompanied by a corresponding text layer in Roman script. The dataset spans a total duration of 11:17:40 (hh:mm:ss) and consists of read speech covering continuous text, representative sentences, and date formats. The data come from 26 female and 27 male native Kannada speakers, covering diverse age groups and regions. A comprehensive description of the dataset is given in the Indian English-Kannada variant Sentence Aligned Speech Documentation.

For research-based citations, please cite the following:
1. Rejitha K. S., Vijayalaxmi F. Patil, Rajesha N., Manasa G., Srikanth D., Nithin S., Narayan Kumar Choudhary, and Shailendra Mohan. 2023. Indian English-Kannada variant Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-35-1.
2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2023. Compendium of LDC-IL Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-34-4.
3. Choudhary, N. 2021. LDC-IL: The Indian Repository of Resources for Language Technology. Language Resources & Evaluation, Vol. 55, Issue 1. Springer. doi: https://doi.org/10.1007/s10579-020-09523-3.
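The stated total duration of 11:17:40 corresponds to 40,660 seconds, or roughly 6.6 seconds per segment on average across 6,166 segments. The following is a minimal sketch for checking these figures against a local copy of the audio. It assumes the segments are uncompressed PCM WAV files with a lowercase .wav extension stored somewhere under a single root directory. The directory layout and the function names (wav_duration_seconds, format_hms, corpus_summary) are hypothetical and are not taken from the corpus documentation.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def format_hms(total_seconds):
    """Format a duration in seconds as hh:mm:ss, rounded to the nearest second."""
    total = int(round(total_seconds))
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def corpus_summary(root):
    """Count WAV files under root (recursively) and return (count, total hh:mm:ss).

    Assumption: segments use the lowercase '.wav' extension; on case-sensitive
    file systems, files named '*.WAV' would not be matched by this pattern.
    """
    files = sorted(Path(root).rglob("*.wav"))
    total = sum(wav_duration_seconds(f) for f in files)
    return len(files), format_hms(total)
```

For example, corpus_summary("path/to/corpus") would return a tuple such as (segment_count, "hh:mm:ss"), which can be compared with the 6,166 segments and 11:17:40 listed above. The wave module reads only the file header to get the frame count, so the check stays fast even for several gigabytes of audio.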