Manipuri Raw Speech Corpus


Owner Central Institute of Indian Languages

Catalogue Number: 1148

Stock In Stock

Overview

156:28:32 hours | 100 GB | 620 Speakers | 66,231 Audio segments | 48 kHz | ...


Tags: Manipuri Raw Speech Corpus


Dataset Description

Manipuri is the administrative language of Manipur. The LDC-IL speech data for Manipuri aims to capture the distinctive characteristics of speech shared by the regional dialects of Manipur. To that end, the dataset covers linguistic features that identify regional tones and intonation, phonemic distributions, and the varied pronunciations of both regional and non-regional vocabulary items such as person names and place names, organized according to a standard set of dataset parameters. This is a read speech corpus: the subset that each speaker reads is randomly generated from the full dataset, so each random set is read by one speaker. A limited number of full sets were read in their entirety by selected speakers in each age group. The data was collected through fieldwork from three regional dialects: Imphal, Kakching, and Awang Sekmai.

The age groups selected for fieldwork are ‘16 to 20’, ‘21 to 50’, and ‘above 50 years’. An equal amount of male and female data was collected from each age group.

Details of the available speech corpus:

Total Speakers: 620 (310 Female and 310 Male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	530	59:47:22
Creative Text	588	53:59:03
Sentence	10,979	10:01:41
Date Format	866	01:12:04
Command and Control Words	13,129	08:00:02
Person Name	8,789	07:14:04
Place Name	4,394	02:46:29
Most Frequent Word - Part	13,167	06:48:50
Most Frequent Word - Full Set	6,992	02:48:42
Phonetically Balanced	4,518	02:25:53
Form and Function - Word	2,279	01:23:50
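
The per-domain figures can be cross-checked against the corpus totals with a short script. The following is a minimal sketch in Python: the rows are copied from the table above, while the names `DOMAINS`, `to_seconds`, and `to_hms` are illustrative and not part of any LDC-IL tooling.

```python
# Rows copied from the domain table: (domain, audio segments, duration hh:mm:ss).
DOMAINS = [
    ("Contemporary Text (News)", 530, "59:47:22"),
    ("Creative Text", 588, "53:59:03"),
    ("Sentence", 10979, "10:01:41"),
    ("Date Format", 866, "01:12:04"),
    ("Command and Control Words", 13129, "08:00:02"),
    ("Person Name", 8789, "07:14:04"),
    ("Place Name", 4394, "02:46:29"),
    ("Most Frequent Word - Part", 13167, "06:48:50"),
    ("Most Frequent Word - Full Set", 6992, "02:48:42"),
    ("Phonetically Balanced", 4518, "02:25:53"),
    ("Form and Function - Word", 2279, "01:23:50"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to 'h:mm:ss' (hours may exceed 24)."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print("Audio segments:", total_segments)
print("Total duration:", to_hms(total_seconds))

# Average segment length per domain, in seconds.
for name, count, duration in DOMAINS:
    print(f"{name}: {to_seconds(duration) / count:.1f} s per segment")
```

The segment counts sum to 66,231, matching the listed total. The per-domain durations sum to 156:28:00, which is 32 seconds short of the listed overall duration of 156:28:32.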

A detailed explanation of the Manipuri Speech corpus will be available in the Manipuri Raw Speech Corpus Documentation.

For research-based work, please use the following citations:

Ramamoorthy, L., Narayan Choudhary, Amom Nandaraj Meetei, Yumnam Premila Chanu & Longjam Anand Singh. 2019. Manipuri Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors Ramamoorthy L., Narayan Choudhary, Amom Nandaraj Meitei, Yumnam Premila Chanu, Longjam Anand Singh
Corpus Type Raw Corpus
Catalogue Number 1148
ISBN 978-81-7343-247-7
Data Source On Field
Duration 156:28:32
# of Audio Segments 66231
Release Date 04-Apr-2019
Terms and Conditions General instructions for use of the resources provided by LDC-IL.
