Chhattisgarhi Raw Speech Corpus
Overview
Dataset Description
138:09:27 hours (hh:mm:ss) | 88.9 GB | 140 speakers | 359 audio segments | 48 kHz | 16-bit WAV
LDC-IL has begun building resources for the mother tongues spoken in India, as part of a broader effort to support and promote the country's linguistic diversity. The collection of Chhattisgarhi speech data is a major part of this effort. Developing language technology for Indian mother tongues is intended to contribute to their overall enrichment and empowerment.
The Chhattisgarhi raw speech corpus consists of recordings of native Chhattisgarhi speakers from different parts of the state of Chhattisgarh, and it represents a wide range of the varieties of Chhattisgarhi spoken across those locations. Speakers from various age groups read prompted extracts from literary and news texts. Spontaneous speech has also been collected.
A detailed explanation of the Chhattisgarhi Raw Speech Corpus will be available in the Chhattisgarhi Raw Speech Data Documentation.
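As a rough consistency check on the headline figures (138:09:27 of audio at 48 kHz, 16 bit), the following minimal sketch estimates the uncompressed PCM payload. The channel count is not stated on this page, so the sketch evaluates one and two channels as assumptions; the helper names `parse_hms` and `pcm_bytes` are hypothetical, and WAV header overhead is ignored.

```python
def parse_hms(text):
    """Convert an 'hh:mm:ss' string into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pcm_bytes(duration_s, sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM payload size in bytes (file headers not included)."""
    return duration_s * sample_rate_hz * (bit_depth // 8) * channels


duration_s = parse_hms("138:09:27")  # duration listed for the corpus

# The channel count is an assumption; the listing does not specify it.
for channels in (1, 2):
    size = pcm_bytes(duration_s, 48_000, 16, channels)
    print(f"{channels} channel(s): {size / 1e9:.1f} GB, {size / 2**30:.1f} GiB")
```

Under these assumptions the two-channel estimate comes to about 88.9 GiB, which is close to the listed 88.9 GB. This is only an arithmetic check and not a statement about how the recordings were actually made; the corpus documentation is the authority on the recording format.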
For research citations, please use the following:
1. Satyaendra Kumar Awasthi, Ankita Tiwari, Narayan Kumar Choudhary. 2023. Chhattisgarhi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
2. Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Satyaendra Kumar Awasthi, Ankita Tiwari, Shantanu Kumar, Rupesh Pandey, Saurabh Varik, Rajesha N., Manasa G., Srikanth D., Nithin S., Narayan Kumar Choudhary, Shailendra Mohan
- Corpus Type: Raw Speech Corpus
- Catalogue Number: 1436
- ISBN: 978-81-19411-78-8
- Data Source: On Field
- Duration: 138:09:27
- # of Audio Segments: 359
- Release Date: 8-Jan-24
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.