Hindi Sentence Aligned Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1427

Tags: Hindi Sentence Aligned Speech Corpus

Dataset Description

Duration: 72:34:52 (hh:mm:ss) | Size: 45.9 GB | 42,275 audio segments | 473 speakers

The annotated speech corpus provides a wide range of linguistic information that is especially useful for phonetic analysis. The LDC-IL Hindi Sentence Aligned Speech dataset comprises audio files in WAV format, each accompanied by a corresponding textual layer containing phonetically normalized and orthographically normalized annotations in Devanagari script. The dataset spans 72:34:52 (hh:mm:ss) of read speech, covering continuous text, representative sentences, and date formats. The data comes from 225 female and 248 male native Hindi speakers (473 in total) across diverse age groups and regions. A comprehensive description of the dataset can be found in the Hindi Sentence Aligned Speech Documentation.
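
As a rough sanity check on the figures quoted above, the following minimal sketch converts the stated duration to seconds and derives the mean segment length and the mean number of segments per speaker. It uses only the numbers given in this description; the helper name hms_to_seconds is hypothetical and not part of any LDC-IL tooling, and the averages are simple arithmetic means that say nothing about the actual distribution in the corpus.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures quoted in the dataset description.
TOTAL_DURATION = "72:34:52"   # hh:mm:ss
NUM_SEGMENTS = 42_275         # audio segments
NUM_SPEAKERS = 225 + 248      # female + male native Hindi speakers

total_seconds = hms_to_seconds(TOTAL_DURATION)
avg_segment_seconds = total_seconds / NUM_SEGMENTS
avg_segments_per_speaker = NUM_SEGMENTS / NUM_SPEAKERS

print(f"Total duration: {total_seconds} s")
print(f"Mean segment length: {avg_segment_seconds:.2f} s")
print(f"Mean segments per speaker: {avg_segments_per_speaker:.1f}")
```

Under these figures the mean segment is a little over six seconds long, which is consistent with sentence-level read speech.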

For research use, please cite the following:

1. Satyaendra Kumar Awasthi, Ankita Tiwari, Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan. 2023. Hindi Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-28-3.

2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2023. Compendium of LDC-IL Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-34-4.

3. Choudhary, N. 2021. LDC-IL: The Indian Repository of Resources for Language Technology. Language Resources & Evaluation. Springer, Vol. 55, Issue 1. doi: https://doi.org/10.1007/s10579-020-09523-3

Item specifics

Authors: Satyaendra Kumar Awasthi, Ankita Tiwari, Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan
Corpus Type: Sentence Annotated Corpus
Catalogue Number: 1427
ISBN: 978-81-19411-28-3
Data Source: On Field
Duration: 72:34:52 (hh:mm:ss)
Number of Audio Segments: 42,275
Release Date: 08-01-2024
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.

Maithili Raw Speech Corpus

A Gold Standard Tamil Raw Text Corpus

Urdu Sentence Aligned Speech Corpus

Konkani Raw Speech Corpus
