Punjabi Raw Speech Corpus
Overview
101:09:28 Hours | 65.5 GB | 467 Speakers | 76,230 Audio Segments | 48 kHz | 16-bit WAV
Dataset Description
Punjabi is an Indo-Aryan language. It is tonal, with three tones: high-falling, low-rising, and level (neutral). Punjabi is spoken not only in India but also in Pakistan, where the variety written in the Shahmukhi script is called Shahmukhi Punjabi; this corpus covers only Indian (Gurmukhi) Punjabi. The language has four dialects, spoken in different sub-regions of Punjab.
The LDC-IL Punjabi speech data set is made up of several types of material: word lists, sentences, running texts and date formats. Each speaker recorded items that were randomly selected from a master dataset. LDC-IL collected the speech data from the Malwa, Doab and Puadh regions.
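The audio is listed as 48 kHz, 16-bit WAV. As a minimal sketch, a downloaded segment could be checked against that listing with Python's standard `wave` module. The function name `check_wav` and its parameters are hypothetical and not part of any LDC-IL tooling, and the corpus directory layout is not described on this page.

```python
import wave


def check_wav(path, expected_rate=48000, expected_sample_width=2):
    """Read a PCM WAV header and compare it with the listed format.

    Defaults assume the listing: 48 kHz sample rate and 16-bit samples
    (2 bytes per sample). The channel count is reported but not checked,
    because the listing does not state it.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        info = {
            "sample_rate_hz": rate,
            "bits_per_sample": width * 8,
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / rate,
        }
    info["matches_listing"] = (
        rate == expected_rate and width == expected_sample_width
    )
    return info
```

Calling `check_wav("segment.wav")` on a local file (the path here is a placeholder) returns a small dictionary; a `matches_listing` value of `False` would suggest the file was resampled or converted after download.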
The available Speech Corpus details:
Total Speakers: 467 (234 female and 233 male)
| Domain | Audio Segments | Duration (HH:MM:SS) |
|---|---|---|
| Contemporary Text (News) | 448 | 27:07:41 |
| Creative Text | 446 | 19:29:15 |
| Sentence | 11,168 | 08:58:33 |
| Date Format | 887 | 00:27:53 |
| Command and Control Words | 13,274 | 07:49:16 |
| Person Name | 8,949 | 10:28:40 |
| Place Name | 4,473 | 03:17:02 |
| Most Frequent Word - Part | 8,889 | 05:21:56 |
| Most Frequent Word - Full Set | 3,988 | 02:52:44 |
| Phonetically Balanced | 13,939 | 08:56:04 |
| Form and Function - Word | 9,769 | 06:24:07 |
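The per-domain figures above can be tallied to cross-check the headline totals. The following is a minimal sketch: the rows are copied from the table, while the names `ROWS`, `to_seconds` and `to_hms` are illustrative choices, not part of the corpus distribution.

```python
# Per-domain rows copied from the table: (domain, audio segments, duration).
ROWS = [
    ("Contemporary Text (News)", 448, "27:07:41"),
    ("Creative Text", 446, "19:29:15"),
    ("Sentence", 11168, "08:58:33"),
    ("Date Format", 887, "00:27:53"),
    ("Command and Control Words", 13274, "07:49:16"),
    ("Person Name", 8949, "10:28:40"),
    ("Place Name", 4473, "03:17:02"),
    ("Most Frequent Word - Part", 8889, "05:21:56"),
    ("Most Frequent Word - Full Set", 3988, "02:52:44"),
    ("Phonetically Balanced", 13939, "08:56:04"),
    ("Form and Function - Word", 9769, "06:24:07"),
]


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to HH:MM:SS (hours may exceed 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in ROWS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in ROWS))

print(f"Segments: {total_segments:,}")
print(f"Duration: {total_duration}")
```

The segment counts sum to 76,230, matching the headline figure. The per-domain durations as listed sum to 101:13:11, a few minutes more than the stated total of 101:09:28; this page does not explain the difference.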
A detailed explanation of the Punjabi Speech Corpus will be available in the Punjabi Speech Data Documentation.
For research citations, please cite the following:
- Ramamoorthy, L., Narayan Choudhary, Poonam Dhillon & Sarbjeet Kaur. 2019. Punjabi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Poonam Dhillon, Sarbjeet Kaur
- Corpus Type: Raw Corpus
- Catalogue Number: 1165
- ISBN: 978-81-7343-264-4
- Data Source: On Field
- Duration: 101:09:28
- # of Audio Segments: 76,230
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.