Bengali Raw Speech Corpus

Catalogue Number: 1107

Dataset Description

  • Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family.

    The LDC-IL Bengali speech dataset consists of several types of word lists along with sentence lists, running text and date formats. Approximately 15 minutes of speech per speaker was recorded from 223 female and 227 male native speakers across different age groups. Each speaker recorded items randomly selected from a master dataset. In addition to these random sets, the database contains some full sets, in which a speaker uttered a complete set of words.

    Corpus details:

    • Total number of speakers: 450 (random sets) and 28 (full sets)
    • Total audio segments: 73470
    • Total duration: 128:46:59 (hh:mm:ss)
    • Total volume: 81.2 GB of WAV files and metadata text files
    • Age groups: 16 to 20, 21 to 50, 51 and above
    • Recording format: WAV, 16-bit
    • Sampling frequency: 48.0 kHz
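
    The figures in the corpus details can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not part of the LDC-IL distribution: it converts the listed duration to seconds, derives the mean segment length, estimates the uncompressed audio size, and checks whether a given WAV file matches the listed 48 kHz, 16-bit format. The function names are hypothetical, and the channel count is an assumption because it is not stated in the corpus details.

```python
import wave

# Figures copied from the corpus details above.
TOTAL_DURATION = "128:46:59"   # hh:mm:ss
TOTAL_SEGMENTS = 73470
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2           # 16-bit samples


def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pcm_bytes(seconds: float, channels: int) -> int:
    """Size of uncompressed PCM audio, ignoring WAV headers and metadata files."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels)


def matches_listed_format(path: str) -> bool:
    """Return True if a WAV file is 48 kHz and 16-bit (any channel count)."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == SAMPLE_RATE_HZ
                and wav.getsampwidth() == BYTES_PER_SAMPLE)


total_seconds = hms_to_seconds(TOTAL_DURATION)
mean_segment_seconds = total_seconds / TOTAL_SEGMENTS

if __name__ == "__main__":
    print(f"total duration: {total_seconds} s")
    print(f"mean segment length: {mean_segment_seconds:.2f} s")
    # The channel count is an assumption; it is not given in the corpus details.
    for channels in (1, 2):
        size_gb = pcm_bytes(total_seconds, channels) / 1e9
        print(f"{channels} channel(s): about {size_gb:.1f} GB of raw audio")
```

    The size estimate is only a rough sanity check: the listed total volume also includes metadata text files, and the gigabyte convention used by the catalogue is not specified.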


Overview

Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family and is influenced by Sanskrit. Wider use of Bengali has contributed to the growth of the language in vocabulary and in the number of styles and registers. Bengali is spoken throughout West Bengal, Tripura and Bangladesh, and in parts of Bihar, Odisha and Assam. Bengali refugees who settled in the Andaman Islands after 1950 also carried the language there.

The LDC-IL Bengali speech data was collected from the Standard Colloquial (Central Bengal) and Barendri (North Bengal) regions.

The LDC-IL Bengali speech dataset is made up of several types of recorded material: word lists, sentences, running texts and date formats.


A more detailed description of the Bengali Speech Corpus will be available in the Bengali Speech Data Documentation.

For research citations, please use the following references:

  • Ramamoorthy, L., Narayan Choudhary, Sonali Sutradhar, Priyanka Biswas, Arundhati Sengupta, Sankarshan Dutta & Priyanka Das. 2019. Bengali Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Sonali Sutradhar, Priyanka Biswas, Arundhati Sengupta, Sankarshan Dutta, Priyanka Das
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1107
  • ISBN: 978-81-7343-206-4
  • Data Source: On Field
  • Duration: 128:46:59
  • Number of Audio Segments: 73399
  • Release Date: 04/04/2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
    • Commercial User
    • Non-Commercial User
