30:18:16 hours | 19.5 GB | 21,716 Audio Segments | 304 speakers
The annotated speech corpus provides a wide range of linguistic information that is especially useful for phonetic analysis. The LDC-IL Assamese Sentence Aligned Speech dataset comprises audio files in WAV format, each accompanied by a textual layer containing phonetically and orthographically normalized annotations in Assamese script. The dataset spans 30:18:16 (hh:mm:ss) of read speech covering continuous text, representative sentences, and date formats. The data was collected from 154 female and 150 male native Assamese speakers across diverse age groups and regions. A comprehensive explanation of the dataset can be found in the Assamese Sentence Aligned Speech Documentation.
For research citations, please use the following references:
1. Syeda Mustafiza Tamim, Priyanshe Adhyapak, Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan. 2023. Assamese Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-53-5.
2. Rejitha K. S. and Narayan Kumar Choudhary. (ed.). 2023. Compendium of LDC-IL Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-34-4.
3. Choudhary, N. 2021. LDC-IL: The Indian Repository of Resources for Language Technology. Language Resources & Evaluation. Springer, Vol. 55, Issue 1. doi: https://doi.org/10.1007/s10579-020-09523-3
- Authors: Syeda Mustafiza Tamim, Priyanshe Adhyapak, Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan
- Corpus Type: Sentence Annotated Corpus
- Catalogue Number: 1430
- ISBN: 978-81-19411-53-5
- Data Source: On Field
- Duration: 30:18:16
- Number of Audio Segments: 21,716
- Release Date: 19-11-2023
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
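The headline figures listed above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the numbers published on this page. The helper name `hms_to_seconds` is illustrative and not part of any LDC-IL tooling. The derived averages are rough estimates rather than published statistics, since individual segment lengths and per-speaker contributions will vary.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures taken from the catalogue entry.
TOTAL_DURATION = "30:18:16"
NUM_SEGMENTS = 21_716
FEMALE_SPEAKERS = 154
MALE_SPEAKERS = 150

total_seconds = hms_to_seconds(TOTAL_DURATION)
total_speakers = FEMALE_SPEAKERS + MALE_SPEAKERS

# Derived averages (assumption: a simple mean over all segments and speakers).
mean_segment_seconds = total_seconds / NUM_SEGMENTS
mean_minutes_per_speaker = total_seconds / total_speakers / 60

print(f"Total duration: {total_seconds} s")
print(f"Total speakers: {total_speakers}")
print(f"Mean segment length: {mean_segment_seconds:.2f} s")
print(f"Mean speech per speaker: {mean_minutes_per_speaker:.2f} min")
```

Under these assumptions, the speaker counts sum to the stated 304, and an average segment is about five seconds long, which is consistent with sentence-level alignment.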