Nepali Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1156

Stock: In Stock

Overview

87:14:44 Hours | 56.5GB | 350 Speakers | 48975 Audio Segm...

Tags: Nepali Raw Speech Corpus

Dataset Description

Nepali belongs to the Indo-Aryan language family. It is the official language of Nepal and an official language in the Indian states of West Bengal and Sikkim. It is also spoken in the states of Uttaranchal, Assam, Arunachal Pradesh, Manipur, Mizoram and Bihar, as well as in other countries such as Myanmar and Bhutan. It is written in the Devanagari script.

The LDC-IL Nepali speech data was collected in the regions of Darjeeling, Assam and Dehradun, from speakers of both genders and of different age groups. The LDC-IL Nepali speech data set consists of several types of data: word lists, sentences, running texts and date formats.

The available Speech Corpus details:

Total Speakers: 350 (187 Female and 163 Male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	343	14:33:19
Creative Text	341	19:46:34
Sentence	8,583	13:45:34
Date Format	1,029	00:57:20
Command and Control Words	10,308	08:44:19
Person Name	6,878	09:15:04
Place Name	3,398	03:20:06
Most Frequent Word - Part	10,292	08:51:06
Most Frequent Word - Full Set	2,994	03:41:39
Phonetically Balanced	3,321	03:00:08
Form and Function - Word	1,488	01:19:35
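
The per-domain figures can be cross-checked against the totals quoted for the corpus (48975 audio segments, 87:14:44). The following is a minimal sketch in Python. The rows are copied from the table above. The names DOMAINS, to_seconds and to_hms are illustrative helpers and are not part of any LDC-IL tooling.

```python
# Rows copied from the table above: (domain, audio segments, duration as hh:mm:ss).
DOMAINS = [
    ("Contemporary Text (News)", 343, "14:33:19"),
    ("Creative Text", 341, "19:46:34"),
    ("Sentence", 8583, "13:45:34"),
    ("Date Format", 1029, "00:57:20"),
    ("Command and Control Words", 10308, "08:44:19"),
    ("Person Name", 6878, "09:15:04"),
    ("Place Name", 3398, "03:20:06"),
    ("Most Frequent Word - Part", 10292, "08:51:06"),
    ("Most Frequent Word - Full Set", 2994, "03:41:39"),
    ("Phonetically Balanced", 3321, "03:00:08"),
    ("Form and Function - Word", 1488, "01:19:35"),
]


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to hh:mm:ss (hours may exceed 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(duration) for _, _, duration in DOMAINS))

print(total_segments)   # 48975
print(total_duration)   # 87:14:44
```

Both sums agree with the segment count and duration listed for the corpus, so the table appears to cover the whole release.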

A detailed explanation of the Nepali Speech Corpus will be available in the Nepali Speech Data Documentation.

For research citations, please use the following references:

Ramamoorthy, L., Narayan Choudhary, Samar Sinha, Jeena Rai, Umesh Chamling Rai & Rupesh Rai. 2019. Nepali Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Umesh Chamling Rai, Rupesh Rai, Samar Sinha, Jeena Rai
Corpus Type: Raw Corpus
Catalogue Number: 1156
ISBN: 978-81-7343-255-2
Data Source: On Field
Duration: 87:14:44
Number of Audio Segments: 48975
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
