Dataset Description
156:28:32 hours | 100 GB | 620 Speakers | 66,231 Audio segments | 48 kHz | 16-bit WAV
Manipuri is the administrative language of Manipur. The LDC-IL speech data for Manipuri aims to capture the distinctive characteristics of speech across the regional dialects of Manipur. To that end, the dataset covers linguistic features that identify regional tones and intonations, phonemic distributions, and the varied pronunciations of both regional and non-regional vocabulary items such as person names and place names, organized according to a standard set of dataset parameters. Because this is a read speech corpus, the subset read by each speaker is randomly generated from the full dataset, so each speaker reads one random set. A limited number of Full Sets are read in their entirety by selected speakers in each age group. The data was collected through fieldwork from three regional dialects: Imphal, Kakching, and Awang Sekmai.
The age groups selected for fieldwork are ‘16 to 20’, ‘21 to 50’, and ‘above 50 years’. An equal amount of male and female data was collected from each age group.
Available speech corpus details:
Total Speakers: 620 (310 Female and 310 Male)
| Domains | Audio Segments | Each Domain Duration |
|---|---|---|
| Contemporary Text (News) | 530 | 59:47:22 |
| Creative Text | 588 | 53:59:03 |
| Sentence | 10,979 | 10:01:41 |
| Date Format | 866 | 01:12:04 |
| Command and Control Words | 13,129 | 08:00:02 |
| Person Name | 8,789 | 07:14:04 |
| Place Name | 4,394 | 02:46:29 |
| Most Frequent Word - Part | 13,167 | 06:48:50 |
| Most Frequent Word - Full Set | 6,992 | 02:48:42 |
| Phonetically Balanced | 4,518 | 02:25:53 |
| Form and Function - Word | 2,279 | 01:23:50 |
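
As a quick consistency check, the per-domain figures listed above can be summed and compared with the headline totals. The Python sketch below is illustrative only and is not part of the LDC-IL distribution; the names `DOMAINS`, `to_seconds`, and `to_hms` are hypothetical. With the values as listed, the segment counts add up to exactly 66,231, while the durations add up to 156:28:00, which is 32 seconds short of the listed total of 156:28:32. The source does not explain the difference; per-domain rounding is one possible cause.

```python
# Per-domain figures copied from the corpus table:
# (domain, audio segments, duration as "HH:MM:SS").
DOMAINS = [
    ("Contemporary Text (News)", 530, "59:47:22"),
    ("Creative Text", 588, "53:59:03"),
    ("Sentence", 10979, "10:01:41"),
    ("Date Format", 866, "01:12:04"),
    ("Command and Control Words", 13129, "08:00:02"),
    ("Person Name", 8789, "07:14:04"),
    ("Place Name", 4394, "02:46:29"),
    ("Most Frequent Word - Part", 13167, "06:48:50"),
    ("Most Frequent Word - Full Set", 6992, "02:48:42"),
    ("Phonetically Balanced", 4518, "02:25:53"),
    ("Form and Function - Word", 2279, "01:23:50"),
]


def to_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to "HH:MM:SS"."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)         # sum of the per-domain segment counts
print(to_hms(total_seconds))  # sum of the per-domain durations
```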
- Ramamoorthy, L., Narayan Choudhary, Amom Nandaraj Meetei, Yumnam Premila Chanu & Longjam Anand Singh. 2019. Manipuri Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Amom Nandaraj Meitei, Yumnam Premila Chanu, Longjam Anand Singh
- Corpus Type: Raw Corpus
- Catalogue Number: 1148
- ISBN: 978-81-7343-247-7
- Data Source: On Field
- Duration: 156:28:32
- # of Audio Segments: 66,231
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.