Hindi Raw Speech Corpus

Owner Central Institute of Indian Languages

Catalogue Number: 1122

Overview

121:00:06 Hours | 76.6 GB | 488 Spe...

Dataset Description

Hindi is a major Indo-Aryan language descended from Sanskrit. It is spoken across central and northern India, including Bihar, Chhattisgarh, Delhi, Haryana, Himachal Pradesh, Jharkhand, Madhya Pradesh, Rajasthan, Uttarakhand and Uttar Pradesh. The LDC-IL speech data was collected in the Awadhi, Bhojpuri, Magahi and Khariboli belts from speakers of both genders and of different age groups. The LDC-IL Hindi speech data totals 121:00:06 (hh:mm:ss) of audio and comprises several types of recorded material: word lists, sentences, running texts and date formats.

Speech corpus details:

Total speakers: 488 (234 female, 254 male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	457	37:22:29
Creative Text	463	29:24:08
Sentence	10173	8:41:17
Date Format	764	0:46:56
Command and Control Words	12284	8:34:51
Person Name	8171	9:55:25
Place Name	4085	3:14:44
Most Frequent Word - Part	12315	8:09:10
Most Frequent Word - Full Set	6994	4:30:14
Phonetically Balanced	11986	8:23:43
Form and Function - Word	2994	1:57:09
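As a quick consistency check, the per-domain figures in the table can be summed and compared with the totals given on this page (121:00:06 of audio and 70686 audio segments). The following is an illustrative Python sketch, not part of the LDC-IL distribution: the dictionary simply re-types the table values by hand, and the helper names `to_seconds` and `to_hms` are hypothetical.

```python
# Illustrative only: per-domain figures re-typed from the table above,
# as (audio segments, duration in hh:mm:ss). Not shipped with the corpus.
DOMAINS = {
    "Contemporary Text (News)": (457, "37:22:29"),
    "Creative Text": (463, "29:24:08"),
    "Sentence": (10173, "8:41:17"),
    "Date Format": (764, "0:46:56"),
    "Command and Control Words": (12284, "8:34:51"),
    "Person Name": (8171, "9:55:25"),
    "Place Name": (4085, "3:14:44"),
    "Most Frequent Word - Part": (12315, "8:09:10"),
    "Most Frequent Word - Full Set": (6994, "4:30:14"),
    "Phonetically Balanced": (11986, "8:23:43"),
    "Form and Function - Word": (2994, "1:57:09"),
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string into a number of seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back into 'h:mm:ss' (hypothetical helper)."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Sum segment counts and durations across all eleven domains.
total_segments = sum(count for count, _ in DOMAINS.values())
total_seconds = sum(to_seconds(duration) for _, duration in DOMAINS.values())

print(total_segments)         # 70686
print(to_hms(total_seconds))  # 121:00:06
```

Both sums agree with the stated totals, so the per-domain rows account for the whole corpus.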

A detailed explanation of the Hindi Speech Corpus will be available in the Hindi Speech Data Documentation.

When citing this corpus in research, please use the following references:

Ramamoorthy, L., Narayan Choudhary, Jitendra Kumar Singh, Richa, Anjali Sinha, Dheeraj Kumar Mishra, Arimardan Kumar Tripathi & Satyaendra Kumar Awasthi. 2019. Hindi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors Ramamoorthy L., Narayan Choudhary, Satyaendra Awasthi, Jitendra Kumar Singh, Richa, Anjali Sinha, Dheeraj Kumar Mishra, Arimardan Kumar Tripathi, Aditi Debsharma
Corpus Type Raw Corpus
Catalogue Number 1122
ISBN 978-81-7343-221-7
Data Source On Field
Duration 121:00:06
# of Audio Segments 70686
Release Date 04-Apr-2019
Terms and Conditions General instructions for use of the resources provided by LDC-IL.
