Tamil Sentence Aligned Speech Corpus
Overview
Dataset Description
74:57:59 hours | 46.4 GB | 48,572 Audio Segments | 433 speakers
The annotated speech corpus provides a wide range of linguistic information and is especially useful for phonetic analysis. The LDC-IL Tamil Sentence Aligned Speech dataset comprises audio files in WAV format, accompanied by a corresponding text layer containing phonetically normalized and orthographically normalized annotations in Tamil script. The dataset has a total duration of 74:57:59 (hh:mm:ss) and consists of read speech covering continuous text, representative sentences, and date formats. The recordings come from 433 native Tamil speakers (214 female and 219 male) spanning diverse age groups and regions. A comprehensive description of the dataset is available in the Tamil Sentence Aligned Speech Documentation.
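The headline figures can be cross-checked with simple arithmetic: 214 + 219 = 433 speakers, and 74:57:59 corresponds to 269,879 seconds, which puts the mean segment length at roughly 5.6 seconds. The Python sketch below reproduces that calculation. It assumes the stated total duration covers all 48,572 segments; individual segment lengths are not given on this page and will vary around the mean.

```python
# Cross-check the headline figures quoted in the dataset description.
# Assumption: the 74:57:59 total duration covers all 48,572 audio segments.

TOTAL_DURATION = "74:57:59"  # hh:mm:ss, as stated in the description
NUM_SEGMENTS = 48_572
FEMALE_SPEAKERS = 214
MALE_SPEAKERS = 219


def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total_seconds = hms_to_seconds(TOTAL_DURATION)
mean_segment_seconds = total_seconds / NUM_SEGMENTS  # average, not per-file length
total_speakers = FEMALE_SPEAKERS + MALE_SPEAKERS

print(f"Total duration: {total_seconds} s")
print(f"Mean segment length: {mean_segment_seconds:.2f} s")
print(f"Speakers: {total_speakers}")
```

Running it prints a total of 269879 s, a mean segment length of 5.56 s, and 433 speakers, matching the figures in the description.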
For research citations, please cite the following:
1. Amudha R., Kamaraj S., Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan. 2023. Tamil Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-26-9.
2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2023. Compendium of LDC-IL Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-34-4.
3. Choudhary, N. 2021. LDC-IL: The Indian Repository of Resources for Language Technology. Language Resources & Evaluation. Springer, Vol. 55, Issue 1. doi: https://doi.org/10.1007/s10579-020-09523-3
Item specifics
- Authors: Amudha R., Kamaraj S., Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan
- Corpus Type: Sentence Annotated Corpus
- Catalogue Number: 1428
- ISBN: 978-81-19411-26-9
- Data Source: On Field
- Duration: 74:57:59 (hh:mm:ss)
- Number of Audio Segments: 48,572
- Release Date: 08-01-2024