Dataset Description
54:21:12 Hours | 32.5 GB | 304 Speakers | 37,570 Audio Segments | 48 kHz | 16 bit wav.
Assamese is the official language of Assam. It is widely spoken in the state of Assam and in some parts of Arunachal Pradesh and Nagaland. According to the 2011 census, Assamese is spoken by 15 million speakers. Although widely spoken, Assamese shows several dialectal variations. The regional dialects can be broadly divided into two groups: the Eastern Group and the Western Group. LDC-IL divided the Assamese-speaking areas into four regions (Xiboxagoria, Central Assam, Kamrupi, and Goalparia) and collected speech data from each speaker. The LDC-IL Assamese speech dataset consists of different types of data: word lists, sentences, running texts, and date formats.
The available Speech Corpus details:
Total Speakers 304 (154 Female and 150 Male)
Domain | Audio Segments | Duration (h:mm:ss) |
Contemporary Text (News) | 304 | 17:23:25 |
Creative Text | 304 | 11:44:37 |
Sentence | 7593 | 5:55:29 |
Date Format | 599 | 0:33:59 |
Command and Control Words | 9118 | 4:56:49 |
Person Name | 6081 | 5:38:07 |
Place Name | 3044 | 1:58:33 |
Phonetically Balanced-W4 | 6567 | 3:41:45 |
Form and Function- Word-W5 | 3960 | 2:28:28 |
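The per-domain figures in the table can be cross-checked against the headline totals with a little arithmetic. The following is a minimal sketch, not part of the corpus distribution: the rows are copied by hand from the table, and the helper names `to_seconds` and `format_hms` are illustrative assumptions.

```python
# Rows copied from the table: (domain, audio segments, duration as h:mm:ss).
DOMAINS = [
    ("Contemporary Text (News)", 304, "17:23:25"),
    ("Creative Text", 304, "11:44:37"),
    ("Sentence", 7593, "5:55:29"),
    ("Date Format", 599, "0:33:59"),
    ("Command and Control Words", 9118, "4:56:49"),
    ("Person Name", 6081, "5:38:07"),
    ("Place Name", 3044, "1:58:33"),
    ("Phonetically Balanced-W4", 6567, "3:41:45"),
    ("Form and Function- Word-W5", 3960, "2:28:28"),
]


def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: int) -> str:
    """Format a number of seconds as h:mm:ss (hypothetical helper)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)             # 37570
print(format_hms(total_seconds))  # 54:21:12
```

Summed this way, the nine domains account for 37,570 audio segments and 54:21:12 of audio, in line with the summary figures at the top of the description.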
A detailed explanation of the Assamese Speech Corpus will be available in the Assamese Speech Data Documentation.
For research citations, please use the following:
- Ramamoorthy L., Narayan Kumar Choudhary, Atreyee Sharma, Jahnobi Kalita, Samhita Bharadwaj, Plabita Bora, Priyanshee Adhyapak, Mustafiza Tamim, Rajesha N., Manasa G. 2021. Assamese Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Kumar Choudhary, Atreyee Sharma, Jahnobi Kalita, Samhita Bharadwaj, Plabita Bora, Priyanshee Adhyapak, Mustafiza Tamim, Rajesha N., Manasa G.
- Corpus Type: Raw Corpus
- Catalogue Number: 1273
- ISBN: 978-81-948885-5-0
- Data Source: On Field
- Duration: 54:21:18
- Number of Audio Segments: 37,570
- Release Date: 15-Jun-2021
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.