Dataset Description
87:14:44 Hours | 56.5GB | 350 Speakers | 48975 Audio Segments | 48 kHz | 16 bit wav.
Nepali belongs to the Indo-Aryan language family. It is the
official language of Nepal and of the Indian states of West Bengal and Sikkim, and is
spoken in the states of Uttaranchal, Assam, Arunachal Pradesh, Manipur, Mizoram
and Bihar, as well as in other countries such as Myanmar and Bhutan. It is
written in the Devanagari script.
The LDC-IL Nepali speech data was collected in the regions of Darjeeling, Assam and Dehradun, from speakers of both genders and different age groups. The data set consists of several types of material: word lists, sentences, running texts and date formats.
The available Speech Corpus details:
Total Speakers: 350 (187 Female and 163 Male)
Domains | Audio Segments | Each Domain Duration |
Contemporary Text (News) | 343 | 14:33:19 |
Creative Text | 341 | 19:46:34 |
Sentence | 8,583 | 13:45:34 |
Date Format | 1,029 | 00:57:20 |
Command and Control Words | 10,308 | 08:44:19 |
Person Name | 6,878 | 09:15:04 |
Place Name | 3,398 | 03:20:06 |
Most Frequent Word - Part | 10,292 | 08:51:06 |
Most Frequent Word - Full Set | 2,994 | 03:41:39 |
Phonetically Balanced | 3,321 | 03:00:08 |
Form and Function - Word | 1,488 | 01:19:35 |
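The per-domain figures can be cross-checked against the headline totals (48975 audio segments, 87:14:44 hours). The following is a minimal sketch of that arithmetic. The names `DOMAINS`, `to_seconds` and `to_hms` are hypothetical and are not part of the LDC-IL distribution. The values are copied by hand from the domain breakdown.

```python
# Per-domain figures copied from the domain breakdown:
# (domain, audio segments, duration as "HH:MM:SS").
# All names here are illustrative; nothing is read from the corpus itself.
DOMAINS = [
    ("Contemporary Text (News)", 343, "14:33:19"),
    ("Creative Text", 341, "19:46:34"),
    ("Sentence", 8583, "13:45:34"),
    ("Date Format", 1029, "00:57:20"),
    ("Command and Control Words", 10308, "08:44:19"),
    ("Person Name", 6878, "09:15:04"),
    ("Place Name", 3398, "03:20:06"),
    ("Most Frequent Word - Part", 10292, "08:51:06"),
    ("Most Frequent Word - Full Set", 2994, "03:41:39"),
    ("Phonetically Balanced", 3321, "03:00:08"),
    ("Form and Function - Word", 1488, "01:19:35"),
]


def to_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to "HH:MM:SS"."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in DOMAINS))

print(total_segments)   # 48975
print(total_duration)   # 87:14:44
```

Both sums agree with the totals stated in the dataset summary, so the breakdown accounts for the whole corpus.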
- Ramamoorthy, L., Narayan Choudhary, Samar Sinha, Jeena Rai, Umesh Chamling Rai & Rupesh Rai. 2019. Nepali Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Umesh Chamling Rai, Rupesh Rai, Samar Sinha, Jeena Rai
- Corpus Type: Raw Corpus
- Catalogue Number: 1156
- ISBN: 978-81-7343-255-2
- Data Source: On Field
- Duration: 87:14:44
- # of Audio Segments: 48975
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.