Punjabi Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1165

Dataset Description

Punjabi is an Indo-Aryan language. It is tonal, with three tones: high-falling, low-rising, and level (neutral). Punjabi is spoken not only in India but also in Pakistan, where it is known as Shahmukhi Punjabi; this description covers only Indian Gurmukhi Punjabi. The Punjabi language has four different dialects, spoken in different sub-regions of Punjab. The LDC-IL Punjabi speech dataset consists of several types of material: word lists, sentences, running texts, and date formats. Each speaker recorded items that were randomly selected from a master dataset. LDC-IL collected the speech data from the Malwa, Doab, and Puadh regions.

The available Speech Corpus details:

Total Speakers: 467 (234 female and 233 male)

Domain	Audio Segments	Duration per Domain (hh:mm:ss)
Contemporary Text (News)	448	27:07:41
Creative Text	446	19:29:15
Sentence	11,168	08:58:33
Date Format	887	00:27:53
Command and Control Words	13,274	07:49:16
Person Name	8,949	10:28:40
Place Name	4,473	03:17:02
Most Frequent Word - Part	8,889	05:21:56
Most Frequent Word - Full Set	3,988	02:52:44
Phonetically Balanced	13,939	08:56:04
Form and Function - Word	9,769	06:24:07
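
As a quick consistency check on the table above, the per-domain figures can be totalled with a short script. This is only an illustrative sketch and not part of the LDC-IL distribution: the names `ROWS`, `to_seconds`, and `to_hms` are hypothetical, and the durations are assumed to be in hh:mm:ss.

```python
# Figures copied from the table above: (domain, audio segments, duration).
# Durations are assumed to be in hh:mm:ss.
ROWS = [
    ("Contemporary Text (News)", 448, "27:07:41"),
    ("Creative Text", 446, "19:29:15"),
    ("Sentence", 11168, "08:58:33"),
    ("Date Format", 887, "00:27:53"),
    ("Command and Control Words", 13274, "07:49:16"),
    ("Person Name", 8949, "10:28:40"),
    ("Place Name", 4473, "03:17:02"),
    ("Most Frequent Word - Part", 8889, "05:21:56"),
    ("Most Frequent Word - Full Set", 3988, "02:52:44"),
    ("Phonetically Balanced", 13939, "08:56:04"),
    ("Form and Function - Word", 9769, "06:24:07"),
]


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to an hh:mm:ss string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in ROWS)
total_seconds = sum(to_seconds(duration) for _, _, duration in ROWS)

print("Audio segments:", total_segments)
print("Summed duration:", to_hms(total_seconds))
```

With the figures as listed, the segment counts add up to 76,230. The summed per-domain durations come to 101:13:11, a few minutes off the overall duration of 101:09:28 given for the corpus, so the per-domain values are best read as approximate.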

A detailed explanation of the Punjabi Speech Corpus will be available in the Punjabi Speech Data Documentation.

For research citations, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Poonam Dhillon & Sarbjeet Kaur. 2019. Punjabi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Poonam Dhillon, Sarbjeet Kaur
Corpus Type: Raw Corpus
Catalogue Number: 1165
ISBN: 978-81-7343-264-4
Data Source: On Field
Duration: 101:09:28
Number of Audio Segments: 76,230
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
