Raw Corpus for Speech
128:46:59 Hours | 81.2 GB | 476 Speakers | 73,470 Audio Segments | 48 kHz | 16 bit wav. Bengali is the official language of West Bengal and Tripura. It belongs to the Indo-Aryan language family. Bengali is influenced by Sanskrit. Greater use of Bengali has contributed to the growth of the language in terms of vocabulary and the number of s..
176:53:28 Hours | 113 GB | 456 Speakers | 77,443 Audio Segments | 48 kHz | 16 bit wav. Bodo, one of the scheduled languages of India, is one of the tonal languages of the world. There are two clearly distinguishable tones in Bodo, known as Low and High. The language belongs to the Tibeto-Burman linguistic fami..
118:40:03 Hours | 75.1 GB | 489 Speakers | 73,695 Audio Segments | 48 kHz | 16 bit wav. Hindi is a major Indo-Aryan language, a descendant of Sanskrit, spoken in central and northern India, in the states of Bihar, Chhattisgarh, Delhi, Haryana, Himachal Pradesh, Jharkhand, Madhya Pradesh, Rajasthan, Uttarakhand, and Uttar Pradesh..
179:32:52 Hours | 115 GB | 656 Speakers | 99,109 Audio Segments | 48 kHz | 16 bit wav. Kannada is one of the ancient Indian languages belonging to the Dravidian family. It has its own script. The language in a region is influenced by other languages of the region, the mother tongue of the speaker, etc. The reading speed, loudness, frequency etc al..
156:37:51 Hours | 100 GB | 504 Speakers | 72,938 Audio Segments | 48 kHz | 16 bit wav. Konkani belongs to the Indo-European family of languages. Konkani is the official language of Goa. However, the language is spoken widely across four states: Maharashtra, Goa, Karnataka and Kerala. Konkani is the only Indian language written in..
72:02:12 Hours | 44.8 GB | 300 Speakers | 35,109 Audio Segments | 48 kHz | 16 bit wav. Maithili is an Indo-Aryan language, a direct descendant of Sanskrit, spoken in the states of Bihar and Jharkhand and in parts of Nepal. It is one of the scheduled languages of India. The LDC-IL speech data is collected from geographic dialects of So..
164:01:02 Hours | 65.5 GB | 458 Speakers | 43,670 Audio Segments | 48 kHz | 16 bit wav. Malayalam is the official language of Kerala and the Laccadive Islands. It belongs to the Dravidian language family. According to the formation of Kerala and the language of Travancore, Cochin, and Malabar regions are influenced by different internal and exter..
156:28:32 Hours | 100 GB | 620 Speakers | 66,231 Audio Segments | 48 kHz | 16 bit wav. Manipuri is the administrative language of Manipur. The development of LDC-IL speech data for Manipuri focuses on capturing all the distinctive characteristics of speech shared by the different regional dialects of Manipur. In order to do ..
89:17:25 Hours | 58 GB | 307 Speakers | 58,544 Audio Segments | 48 kHz | 16 bit wav. Marathi is an Indo-Aryan language. The Marathi language was prevalent in the 9th century. Standard Marathi (Puneri) is the official language of the State of Maharashtra. Standard Marathi is based on dialects used by academics an..
87:14:44 Hours | 56.5 GB | 350 Speakers | 48,975 Audio Segments | 48 kHz | 16 bit wav. Nepali belongs to the Indo-Aryan language family. Nepali is the official language of Nepal and of the Indian states of West Bengal and Sikkim, and is spoken in the states of Uttaranchal, Assam, Arunachal Pradesh, Manipur, Mizoram and Bihar, as well as in other cou..
101:09:28 Hours | 65.5 GB | 467 Speakers | 76,240 Audio Segments | 48 kHz | 16 bit wav. Punjabi is one of the Indo-Aryan languages. Punjabi is a tonal language with three tones: high-falling, low-rising, and level (neutral). Punjabi is spoken not only in India but also in Pakistan, where it is called Shahmukhi Punjabi. He..
22:43:59 Hours | 15 GB | 80 Speakers | 10,510 Audio Segments | 48 kHz | 16 bit wav. Telugu is the official language of the states of Telangana and Andhra Pradesh. It belongs to the Dravidian language family. Among the Dravidian languages, Telugu is spoken by the largest population. Telugu is agglutinative in nature and its vocabulary is ..
99:18:21 Hours | 64.2 GB | 499 Speakers | 88,708 Audio Segments | 48 kHz | 16 bit wav. Urdu is one of the Modern Indo-Aryan languages of India. It evolved from Shaurseni Apabhramsha. It uses the Perso-Arabic script. The language in a region is influenced by other languages of the region, mother tongue of the speaker, etc. The reading s..
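
Each entry above leads with the same statistics: total duration as HHH:MM:SS, speaker count, and audio segment count. The following is a minimal sketch of how those fields could be read to estimate the average segment length of a corpus. It assumes the pipe-separated layout shown in the listings; the function names `duration_to_seconds` and `parse_entry` are hypothetical and not part of any LDC-IL tooling.

```python
import re


def duration_to_seconds(hms: str) -> int:
    """Convert a duration such as '22:43:59' (hours:minutes:seconds) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_entry(line: str) -> dict:
    """Extract duration, speaker count and segment count from one listing line.

    Assumes the line contains 'H:MM:SS', '<n> Speakers' and '<n> Audio Segments',
    with optional thousands separators, as in the listings above.
    """
    duration = re.search(r"\d+:\d{2}:\d{2}", line).group(0)
    speakers = re.search(r"(\d[\d,]*)\s*Speakers", line, re.IGNORECASE).group(1)
    segments = re.search(r"(\d[\d,]*)\s*Audio\s+Segments", line, re.IGNORECASE).group(1)

    total_seconds = duration_to_seconds(duration)
    n_segments = int(segments.replace(",", ""))
    return {
        "seconds": total_seconds,
        "speakers": int(speakers.replace(",", "")),
        "segments": n_segments,
        # Mean length of one audio segment, in seconds.
        "avg_segment_seconds": total_seconds / n_segments,
    }


# Statistics copied from the Telugu entry above.
telugu = parse_entry(
    "22:43:59 Hours | 15 GB | 80 Speakers | 10,510 Audio Segments | 48 kHz | 16 bit wav."
)
print(telugu)
```

The same call works on the other entries, since the patterns ignore case and tolerate missing spaces around the pipe separators.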