Urdu Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1177

Dataset Description

Urdu is one of the Modern Indo-Aryan languages of India. It evolved from Shauraseni Apabhramsha and is written in the Perso-Arabic script. The language spoken in a region is influenced by the other languages of that region, the mother tongue of the speaker, and similar factors. Reading speed, loudness, frequency, and other speech characteristics also vary with factors such as age and gender. The Linguistic Data Consortium for Indian Languages (LDC-IL) collected this speech corpus through fieldwork. The read speech was recorded from male and female native speakers across various age groups. The data includes texts, sentences, date formats, and different word lists.

The available Speech Corpus details:

Total Speakers - 499 (252 Female and 247 Male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	431	25:35:02
Creative Text	433	19:40:11
Sentence	10,646	8:00:38
Date Format	846	0:43:37
Command and Control Words	13,580	9:21:01
Person Name	6,577	2:55:41
Place Name	4,273	1:09:17
Most Frequent Word - Part	12,802	7:46:28
Most Frequent Word - Full Set	18,927	11:38:30
Phonetically Balanced Vocabulary	13,646	8:13:20
Form and Function Word	6,547	4:14:36
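
The per-domain figures above can be cross-checked against the totals listed under Item specifics (88708 audio segments, 99:18:21 duration). The following is a minimal sketch of that arithmetic, with the rows copied by hand from the table; the names `ROWS`, `parse_duration`, and `format_duration` are illustrative and not part of any LDC-IL tooling.

```python
# Rows copied from the domain table: (domain, audio segments, duration as h:mm:ss).
ROWS = [
    ("Contemporary Text (News)", 431, "25:35:02"),
    ("Creative Text", 433, "19:40:11"),
    ("Sentence", 10646, "8:00:38"),
    ("Date Format", 846, "0:43:37"),
    ("Command and Control Words", 13580, "9:21:01"),
    ("Person Name", 6577, "2:55:41"),
    ("Place Name", 4273, "1:09:17"),
    ("Most Frequent Word - Part", 12802, "7:46:28"),
    ("Most Frequent Word - Full Set", 18927, "11:38:30"),
    ("Phonetically Balanced Vocabulary", 13646, "8:13:20"),
    ("Form and Function Word", 6547, "4:14:36"),
]


def parse_duration(text: str) -> int:
    """Convert an h:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_duration(total_seconds: int) -> str:
    """Convert a number of seconds back into an h:mm:ss string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in ROWS)
total_seconds = sum(parse_duration(duration) for _, _, duration in ROWS)

print(total_segments)                  # 88708
print(format_duration(total_seconds))  # 99:18:21
```

Both sums agree with the catalogue totals, so the table and the item specifics are mutually consistent.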

A detailed explanation of the Urdu Speech Corpus will be available in the Urdu Speech Data Documentation.

For research-based citations, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Mansoor Khan, Shahnawaz Alam, Bi Bi Mariyam & Rushda Idris Khan. 2019. Urdu Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Mansoor Khan, Shahnawaz Alam, Bi Bi Mariyam, Rushda Idris Khan
Corpus Type: Raw Corpus
Catalogue Number: 1177
ISBN: 978-81-7343-276-7
Data Source: On Field
Duration: 99:18:21
Number of Audio Segments: 88708
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
