Bengali Raw Speech Corpus


Owner: Central Institute of Indian Languages

Catalogue Number: 1107



Dataset Description

Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family and is influenced by Sanskrit. Greater use of Bengali has contributed to the growth of the language in terms of vocabulary and the number of styles and registers. Bengali is spoken throughout West Bengal, Tripura and Bangladesh, and in some parts of Bihar, Odisha and Assam. Bengali refugees who settled in the Andaman Islands after 1950 have also carried the language there.

The LDC-IL Bengali speech data was collected from the regions of Standard Colloquial (Central Bengal) and Barendri (North Bengal). The data set comprises several types of material: word lists, sentences, running texts and date formats.

Details of the available speech corpus:

Total Speakers: 476 (236 female and 240 male)

Domain	Audio Segments	Duration per Domain (hh:mm:ss)
Contemporary Text (News)	450	35:05:07
Creative Text	448	20:16:13
Sentence	11,239	16:05:22
Date Format	414	0:26:48
Command and Control Words	13,477	14:00:24
Person Name	9,012	4:56:22
Place Name	4,498	1:45:35
Most Frequent Word - Part	13,525	13:33:14
Most Frequent Word - Full Set	5,978	6:47:05
Phonetically Balanced	9,489	10:23:08
Form and Function - Word	4,940	5:27:41
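
As a quick consistency check, the per-domain figures above can be summed and compared with the totals listed under Item specifics (73470 audio segments, duration 128:46:59). The following is only an illustrative sketch: the row values are copied from the table, while the helper names `to_seconds` and `to_hms` are hypothetical and not part of any LDC-IL tooling.

```python
# (domain, audio segments, duration as h:mm:ss), copied from the table above
ROWS = [
    ("Contemporary Text (News)", 450, "35:05:07"),
    ("Creative Text", 448, "20:16:13"),
    ("Sentence", 11239, "16:05:22"),
    ("Date Format", 414, "0:26:48"),
    ("Command and Control Words", 13477, "14:00:24"),
    ("Person Name", 9012, "4:56:22"),
    ("Place Name", 4498, "1:45:35"),
    ("Most Frequent Word - Part", 13525, "13:33:14"),
    ("Most Frequent Word - Full Set", 5978, "6:47:05"),
    ("Phonetically Balanced", 9489, "10:23:08"),
    ("Form and Function - Word", 4940, "5:27:41"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in ROWS)
total_seconds = sum(to_seconds(duration) for _, _, duration in ROWS)

print(total_segments)          # 73470
print(to_hms(total_seconds))   # 128:46:59
```

Both sums agree with the catalogue totals, so the table and the Item specifics are mutually consistent.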

A detailed explanation of the Bengali Speech Corpus will be available in the Bengali Speech Data Documentation.

For research use, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Sonali Sutradhar, Priyanka Biswas, Arundhati Sengupta, Sankarshan Dutta & Priyanka Das. 2019. Bengali Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Sonali Sutradhar, Priyanka Biswas, Arundhati Sengupta, Sankarshan Dutta, Priyanka Das
Corpus Type: Raw Corpus
Catalogue Number: 1107
ISBN: 978-81-7343-206-4
Data Source: On Field
Duration (hh:mm:ss): 128:46:59
Number of Audio Segments: 73,470
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
