Manipuri Raw Speech Corpus

Catalogue Number: A-020

Dataset Description

156:28:32 hours of Manipuri Raw Speech Corpus | 100 GB | 620 speakers | 66,231 audio segments | 48 kHz | 16-bit WAV

Manipuri is the administrative language of Manipur. The LDC-IL speech data for Manipuri aims to capture the distinctive speech characteristics shared across the regional dialects of Manipur. To that end, the dataset was built to a standard set of parameters covering the linguistic features that identify regional tones and intonation, phonemic distribution, and the varied pronunciations found in both regional and non-regional vocabulary items such as person names and place names. For the read speech corpus, the subset read by each speaker was randomly generated from the full dataset, so that each speaker read one random set. A limited number of full sets were read in their entirety by selected speakers in each age group. The data was collected through fieldwork from three regional dialects: Imphal, Kakching, and Awang Sekmai.

The age groups selected for fieldwork were ‘16 to 20’, ‘21 to 50’, and ‘above 50 years’. Male and female data was collected in equal numbers from each age group.

The details of the available data and its duration are as follows:

Corpus Details:

· Total speakers: 620 (310 female and 310 male)
· 100 gigabytes of WAV files and metadata TXT files
· Contemporary Text (News)-T1: 530 audio segments, 59:47:22 hours
· Creative Text-T2: 588 audio segments, 53:59:35 hours
· Sentence-S: 10,979 audio segments, 10:01:41 hours
· Date-D: 866 audio segments, 1:12:04 hours
· Command and Control Words-W1: 13,129 audio segments, 8:00:02 hours
· Person Name-W2: 8,789 audio segments, 7:14:04 hours
· Place Name-W2: 4,394 audio segments, 2:46:29 hours
· Most Frequent Word-Part-W3A: 13,167 audio segments, 6:48:50 hours
· Most Frequent Word-FullSet-W3B: 6,992 audio segments, 2:48:42 hours
· Phonetically Balanced Vocabulary-W4: 4,518 audio segments, 2:25:53 hours
· Form and Function Words-W5: 2,279 audio segments, 1:23:50 hours



· Manipuri is the administrative language of Manipur.
· Speech data was collected from three regional dialects through fieldwork.
· Mode sampling frequency: 48.0 kHz
· 156:28:32 hours of speech data
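
The per-category figures listed above can be cross-checked against the stated totals of 66,231 audio segments and 156:28:32 hours. The following is a minimal Python sketch of that arithmetic. The figures are copied from this page; the names `CATEGORIES`, `hms_to_seconds`, and `seconds_to_hms` are illustrative only and are not part of the corpus or its documentation.

```python
# Per-category figures copied from the corpus details on this page:
# (category label, number of audio segments, duration as H:MM:SS).
CATEGORIES = [
    ("Contemporary Text (News)-T1", 530, "59:47:22"),
    ("Creative Text-T2", 588, "53:59:35"),
    ("Sentence-S", 10979, "10:01:41"),
    ("Date-D", 866, "1:12:04"),
    ("Command and Control Words-W1", 13129, "8:00:02"),
    ("Person Name-W2", 8789, "7:14:04"),
    ("Place Name-W2", 4394, "02:46:29"),
    ("Most Frequent Word-Part-W3A", 13167, "6:48:50"),
    ("Most Frequent Word-FullSet-W3B", 6992, "02:48:42"),
    ("Phonetically Balanced Vocabulary-W4", 4518, "02:25:53"),
    ("Form and Function Words-W5", 2279, "1:23:50"),
]


def hms_to_seconds(text):
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Convert a number of seconds back to an 'H:MM:SS' string."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in CATEGORIES)
total_seconds = sum(hms_to_seconds(duration) for _, _, duration in CATEGORIES)

print(total_segments)                 # 66231
print(seconds_to_hms(total_seconds))  # 156:28:32
```

Both sums agree with the totals given in the dataset description.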

A detailed explanation of the Manipuri speech corpus will be available in the Manipuri Raw Speech Corpus Documentation.

For research-based work, please use the following citations:

  • Ramamoorthy, L., Narayan Choudhary, Amom Nandaraj Meetei, Yumnam Premila Chanu & Longjam Anand Singh. 2019. Manipuri Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview”, in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Amom Nandaraj Meitei, Yumnam Premila Chanu, Longjam Anand Singh
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1148
  • Data Source: On Field
  • Duration: 156:28:32
  • Number of Audio Segments: 66,231
  • Release Date: 04/04/2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL (Commercial User, Non-Commercial User)