Kudubi/Kudumbi Parallel Text Corpus: Linguistic Features and Structures
Overview
Total Words: 4,404,845 | Kudubi/Kudumbi Words: 24,371 | 5,332 sentences/phrases in each mother tongue
Dataset Description
According to the 2011 census, India has 270 mother tongues. Following the requirements of the NEP-2020, LDC-IL developed a parallel corpus in Indian mother tongues. The Kudubi/Kudumbi parallel text corpus is connected with English and 146 mother tongues of India. It contains 5,332 sentences/phrases, systematically structured around 159 grammatical categories. The Kudubi/Kudumbi section includes 24,371 words and 148,209 characters. Overall, the corpus comprises 4,404,845 words (over 4.4 million tokens) and 23,374,289 characters (approximately 23.4 million).
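As a rough illustration of what the figures above imply, the following minimal Python sketch derives a few ratios from them. The function name `corpus_ratios` and the dictionary keys are hypothetical, chosen only for this example; the sketch also assumes that the 148,209 characters and 24,371 words describe the same Kudubi/Kudumbi text, and the catalogue does not say whether the character count includes spaces or punctuation.

```python
# Figures quoted in the dataset description.
KUDUBI_WORDS = 24_371       # words in the Kudubi/Kudumbi section
KUDUBI_CHARS = 148_209      # characters in the Kudubi/Kudumbi section
SENTENCES = 5_332           # sentences/phrases per mother tongue
CORPUS_WORDS = 4_404_845    # words across the whole parallel corpus


def corpus_ratios(words, chars, sentences, corpus_words):
    """Return simple derived ratios (hypothetical helper, not part of any LDC-IL tool)."""
    return {
        "words_per_sentence": words / sentences,
        "chars_per_word": chars / words,
        "share_of_corpus_pct": 100 * words / corpus_words,
    }


stats = corpus_ratios(KUDUBI_WORDS, KUDUBI_CHARS, SENTENCES, CORPUS_WORDS)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

These ratios are only indicative: the average sentence length suggests short, grammar-oriented phrases, which would be consistent with a corpus built from descriptive-grammar material.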
The price indicated corresponds to a single language component. The total payment will be determined based on the number of language components requested by the seeker.
For research citations, please use the following:
1. Mr. Saurabh Varik, Dr. Rejitha K. S., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan. 2026. Kudubi/Kudumbi Parallel Text Corpus: Linguistic Features and Structures. Central Institute of Indian Languages, Mysore. 978-81-69099-10-3.
2. Rejitha K. S. and Narayan Kumar Choudhary. (eds.). 2025. LDC-IL Corpus Insights. Central Institute of Indian Languages, Mysore. 978-93-48633-33-0.
Item specifics
- Authors: Mr. Saurabh Varik, Dr. Rejitha K. S., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan
- Corpus Type: Parallel Text Corpus
- Catalogue Number: 1621
- ISBN: 978-81-69099-10-3
- Data Source: Descriptive Grammar
- Character Count: 23,374,289
- Word Count: 4,404,845
- Release Date: 23/3/2026
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
