Hindi Raw Speech Corpus


Catalogue Number: 1122
Stock: In Stock

Overview

118:40:03 Hours | 75.1 GB | 489 Speakers | 73695 Audio Segments | 48 kHz | 16-bit WAV

Dataset Description


Hindi is a major Indo-Aryan language and a descendant of Sanskrit. It is spoken in central and northern India, in the states of Bihar, Chhattisgarh, Delhi, Haryana, Himachal Pradesh, Jharkhand, Madhya Pradesh, Rajasthan, Uttarakhand, and Uttar Pradesh. The LDC-IL speech data was collected in the Awadhi, Bhojpuri and Khariboli belts from speakers of both genders and of different age groups. The LDC-IL Hindi speech data totals 118:40:03 hours. The data set combines several types of material: word lists, sentences, running texts and date formats.
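The listing gives the audio format as 48 kHz, 16-bit WAV. As a minimal sketch of how a downloaded file could be checked against that description, the following uses Python's standard `wave` module. The function names are hypothetical, and the demonstration file is generated in memory as mono audio purely for illustration; the listing does not state the channel count or file layout of the actual corpus.

```python
import io
import wave

EXPECTED_RATE_HZ = 48_000    # "48 kHz" in the catalogue entry
EXPECTED_SAMPLE_WIDTH = 2    # "16 bit" = 2 bytes per sample


def matches_listed_format(source):
    """Return True if the WAV header reports 48 kHz, 16-bit audio.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE_HZ
                and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH)


def make_demo_wav(rate_hz, sample_width, seconds=0.01):
    """Build a tiny all-zero mono WAV in memory (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono, not stated in the listing
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        frame_count = int(rate_hz * seconds)
        wav.writeframes(b"\x00" * (frame_count * sample_width))
    buffer.seek(0)
    return buffer


print(matches_listed_format(make_demo_wav(48_000, 2)))
print(matches_listed_format(make_demo_wav(16_000, 2)))
```

With a real corpus file, a path string would be passed to `matches_listed_format` in place of the in-memory buffer.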


The available Speech Corpus details:

Total Speakers: 489 (234 Female and 255 Male)


Domain | Audio Segments | Duration per Domain (h:mm:ss)
Contemporary Text (News) | 455 | 35:32:38
Creative Text | 463 | 27:03:47
Sentence | 10,182 | 9:18:25
Date Format | 765 | 0:58:08
Command and Control Words | 12,282 | 9:37:52
Person Name | 8,171 | 11:16:28
Place Name | 4,085 | 3:14:44
Most Frequent Word - Part | 12,320 | 8:54:39
Most Frequent Word - Full Set | 6,994 | 4:30:14
Phonetically Balanced | 14,384 | 10:10:44
Form and Function - Word | 3,594 | 2:22:35

A detailed explanation of the Hindi Speech Corpus will be available in the Hindi Speech Data Documentation. 

For research use, please cite the following:

  • Ramamoorthy, L., Narayan Choudhary, Jitendra Kumar Singh, Richa, Anjali Sinha, Dheeraj Kumar Mishra, Arimardan Kumar Tripathi & Satyaendra Kumar Awasthi. 2019. Hindi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Satyaendra Awasthi, Jitendra Kumar Singh, Richa, Anjali Sinha, Dheeraj Kumar Mishra, Arimardan Kumar Tripathi, Aditi Debsharma
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1122
  • ISBN: 978-81-7343-221-7
  • Data Source: On Field
  • Duration: 118:40:03
  • Number of Audio Segments: 73695
  • Release Date: 04/04/2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL (Commercial User, Non-Commercial User).
