Gujarati Raw Speech Corpus (Mono Recordings)

Owner: Central Institute of Indian Languages

Catalogue Number: 127

Dataset Description

Gujarati is one of the major literary languages of India. It is the official language of the state of Gujarat and of the union territories of Daman and Diu and Dadra and Nagar Haveli. For convenience, LDC-IL grouped Gujarati into four dialects: South Gujarat, Central Gujarat, North Gujarat and Saurashtra.

LDC-IL holds 64:44:02 (hh:mm:ss) of Gujarati raw speech data as mono recordings. The LDC-IL Gujarati Raw Speech dataset consists of several types of material: word lists, sentences, texts and date formats. Approximately 15 minutes of speech per speaker was collected from 233 native speakers of Gujarati (124 female and 109 male) belonging to different age groups. Each speaker recorded material randomly selected from a master dataset.

Details of the available speech corpus:

Total speakers: 233 (124 female, 109 male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	233	12:52:46
Creative Text	232	13:30:15
Sentence	5824	7:12:17
Date Format	466	0:59:31
Command and Control Words	6985	9:43:07
Person Name	4644	8:34:44
Place Name	2322	3:17:06
Phonetically Balanced	4131	6:28:15
Form and Function - Word	1386	2:06:01
Total	26223	64:44:02
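
The per-domain figures in the table can be cross-checked against the totals listed under Item specifics (26223 audio segments, 64:44:02). The Python sketch below is purely illustrative: the `DOMAINS` dictionary and the `to_seconds`/`to_hms` helpers are our own hypothetical names, not part of any LDC-IL tooling or file format, and the numbers are copied by hand from the table.

```python
# Hypothetical structure: per-domain figures copied by hand from the table,
# as (number of audio segments, duration as "h:mm:ss").
DOMAINS = {
    "Contemporary Text (News)": (233, "12:52:46"),
    "Creative Text": (232, "13:30:15"),
    "Sentence": (5824, "7:12:17"),
    "Date Format": (466, "0:59:31"),
    "Command and Control Words": (6985, "9:43:07"),
    "Person Name": (4644, "8:34:44"),
    "Place Name": (2322, "3:17:06"),
    "Phonetically Balanced": (4131, "6:28:15"),
    "Form and Function - Word": (1386, "2:06:01"),
}
SPEAKERS = 233


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Format seconds as 'hh:mm:ss'; hours are allowed to exceed 24."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Sum segment counts and durations over all domains.
total_segments = sum(count for count, _ in DOMAINS.values())
total_seconds = sum(to_seconds(hms) for _, hms in DOMAINS.values())

print(total_segments)                                # 26223
print(to_hms(total_seconds))                         # 64:44:02
print(round(total_seconds / SPEAKERS / 60, 1))       # 16.7 (minutes per speaker)
```

Both sums reproduce the published totals. The average of roughly 16.7 minutes per speaker (64:44:02 spread over 233 speakers) is slightly above the approximate 15-minute figure quoted in the description.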

A detailed explanation of the Gujarati Raw Speech Corpus (Mono Recordings) will be available in the Gujarati Raw Speech (Mono Recordings) Documentation.

When citing this corpus in research, please use the following references:

Ramamoorthy L., Narayan Kumar Choudhary, Mona Parakh, Rejitha K.S., Rajesha N., Manasa G. 2021. Gujarati Raw Speech Corpus (Mono Recordings). Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Kumar Choudhary, Mona Parakh, Rejitha K.S., Rajesha N., Manasa G.
Corpus Type: Raw Corpus
Catalogue Number: 1277
ISBN: 978-81-948885-8-1
Data Source: On Field
Duration: 64:44:02 (hh:mm:ss)
Number of Audio Segments: 26223
Release Date: 15-Jun-2021
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.