Nepali Sentence Aligned Speech Corpus

Catalogue Number: 1424

Dataset Description

43:04:23 (hh:mm:ss) | 27.7 GB | 21,481 audio segments | 346 speakers


This annotated speech corpus provides a wide range of linguistic information that is especially useful for phonetic analysis. The LDC-IL Nepali Sentence Aligned Speech dataset comprises audio files in WAV format, each accompanied by a textual layer containing phonetically normalized and orthographically normalized annotations in Devanagari script. The dataset spans 43:04:23 (hh:mm:ss) of read speech covering continuous text, representative sentences, and date formats. The recordings come from 187 female and 159 male native Nepali speakers (346 in total) across diverse age groups and regions. A comprehensive explanation of the dataset can be found in the Nepali Sentence Aligned Speech Documentation.
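
As a quick sanity check on the headline figures, the stated duration, segment count, and speaker counts can be combined to estimate the average segment length. The sketch below uses only the numbers quoted in this description; the helper name hms_to_seconds and the variable names are illustrative and are not part of any LDC-IL tooling.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures quoted in the dataset description above.
TOTAL_DURATION = "43:04:23"
NUM_SEGMENTS = 21_481
FEMALE_SPEAKERS = 187
MALE_SPEAKERS = 159

total_seconds = hms_to_seconds(TOTAL_DURATION)
total_speakers = FEMALE_SPEAKERS + MALE_SPEAKERS

# Averages derived from the totals; actual per-segment and per-speaker
# durations will vary and are not stated on this page.
mean_segment_seconds = total_seconds / NUM_SEGMENTS
mean_minutes_per_speaker = total_seconds / 60 / total_speakers

print(f"Total duration: {total_seconds} s")
print(f"Total speakers: {total_speakers}")
print(f"Mean segment length: {mean_segment_seconds:.2f} s")
print(f"Mean speech per speaker: {mean_minutes_per_speaker:.2f} min")
```

The speaker counts sum to the stated 346, and the average works out to a little over seven seconds per segment, which is consistent with sentence-level alignment.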


For research citations, please use the following references:


1. Umesh Chamling Rai, Rupesh Rai, Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan. 2023. Nepali Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-98-6.

2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2023. Compendium of LDC-IL Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-34-4.

3. Choudhary, N. 2021. LDC-IL: The Indian Repository of Resources for Language Technology. Language Resources & Evaluation, Springer, Vol. 55, Issue 1. doi: https://doi.org/10.1007/s10579-020-09523-3

Item specifics

  • Authors: Umesh Chamling Rai, Rupesh Rai, Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan
  • Corpus Type: Sentence Annotated Corpus
  • Catalogue Number: 1424
  • ISBN: 978-81-19411-98-6
  • Data Source: On Field
  • Duration: 43:04:23 (hh:mm:ss)
  • Number of Audio Segments: 21,481
  • Release Date: 08-01-2024
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
