Konkani Raw Speech Corpus

Catalogue Number: 1135

Overview

156:37:51 Hours | 100 GB | 504 Speakers | 72,938 Audio Segments | 48 kHz | 16-bit WAV

Dataset Description

Konkani belongs to the Indo-European family of languages and is the official language of Goa. It is also spoken widely across four states: Maharashtra, Goa, Karnataka and Kerala. Konkani is the only Indian language written in five different scripts: Devanagari, Roman, Kannada, Malayalam, and Persian-Arabic.

The LDC-IL speech data was collected in North Goa, South Goa, Karwar (Karnataka) and Sindhudurg (Maharashtra) from speakers of both genders and different age groups. Approximately 15 to 20 minutes of speech per speaker were recorded from 267 female and 237 male native speakers. Each speaker recorded a set of items randomly selected from a master dataset.
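As a quick consistency check on the figures quoted in this description, the sketch below divides the stated total duration by the stated number of speakers. The variable names are illustrative only, and the calculation assumes the total duration is spread over all 504 speakers.

```python
# Figures copied from this catalogue entry.
TOTAL_DURATION = "156:37:51"   # HH:MM:SS, total corpus duration
FEMALE_SPEAKERS = 267
MALE_SPEAKERS = 237
SPEAKERS = 504

# Convert HH:MM:SS to seconds.
hours, minutes, seconds = (int(part) for part in TOTAL_DURATION.split(":"))
total_seconds = hours * 3600 + minutes * 60 + seconds

# Average recording time per speaker, in minutes.
avg_minutes = total_seconds / SPEAKERS / 60

print(FEMALE_SPEAKERS + MALE_SPEAKERS == SPEAKERS)
print(f"{avg_minutes:.1f} minutes per speaker on average")
```

The average falls inside the stated range of approximately 15 to 20 minutes per speaker.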

 

The available speech corpus details:

Total Speakers: 504 (267 Female and 237 Male)

Domains                          Audio Segments   Each Domain Duration
Contemporary Text (News)                    477   49:52:09
Creative Text                               480   22:09:05
Sentence                                 12,050   15:51:11
Date Format                                 953   01:50:39
Command and Control Words                14,944   16:11:02
Person Name                               9,588   15:55:43
Place Name                                4,812   05:31:03
Most Frequent Word - Part                16,376   16:03:13
Most Frequent Word - Full Set             5,998   05:55:07
Phonetically Balanced                     2,975   02:49:36
Form and Function - Word                  4,285   04:29:03
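The per-domain figures listed above can be cross-checked against the headline totals (72,938 audio segments and 156:37:51 hours). The following is a minimal sketch of that arithmetic; the helper names `to_seconds` and `to_hms` are illustrative and not part of any LDC-IL tooling.

```python
# Per-domain figures copied from the listing above:
# (domain, audio segments, duration as HH:MM:SS).
DOMAINS = [
    ("Contemporary Text (News)", 477, "49:52:09"),
    ("Creative Text", 480, "22:09:05"),
    ("Sentence", 12050, "15:51:11"),
    ("Date Format", 953, "01:50:39"),
    ("Command and Control Words", 14944, "16:11:02"),
    ("Person Name", 9588, "15:55:43"),
    ("Place Name", 4812, "05:31:03"),
    ("Most Frequent Word - Part", 16376, "16:03:13"),
    ("Most Frequent Word - Full Set", 5998, "05:55:07"),
    ("Phonetically Balanced", 2975, "02:49:36"),
    ("Form and Function - Word", 4285, "04:29:03"),
]


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Convert a number of seconds back to H:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in DOMAINS))

print(total_segments)   # 72938
print(total_duration)   # 156:37:51
```

Both sums agree with the totals stated in the overview line.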


A detailed explanation of the Konkani Speech Corpus will be available in the Konkani Speech Data Documentation.

For research-based work, please use the following citations:

  • Ramamoorthy, L., Narayan Choudhary, Saurabh Varik & Rashmi Shet Tanawade. 2019. Konkani Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview”, in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore, pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Saurabh Varik, Bhageshree Khandale, Rashmi S. Shet Tanawade, Yashwant D. Gawas
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1135
  • ISBN: 978-81-7343-234-7
  • Data Source: On Field
  • Duration: 156:37:51
  • Number of Audio Segments: 72,938
  • Release Date: 04-Apr-2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
