Hara/Harauti Parallel Text Corpus: Linguistic Features and Structures
Overview
Total Words: 4,404,845 | Hara/Harauti Words: 34,537 | 5,332 sentences/phrases in each mother tongue
Dataset Description
India has 270 mother tongues as per the 2011 census. Following the requirements of NEP-2020, LDC-IL developed a parallel corpus in Indian mother tongues. The Hara/Harauti parallel text corpus is aligned with English and 146 mother tongues of India. It contains 5,332 sentences/phrases, systematically structured around 159 grammatical categories. The Hara/Harauti section includes 34,537 words and 155,620 characters. Overall, the corpus comprises 4,404,845 words (over 4.4 million tokens) and 23,374,289 characters (over 23.3 million).
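The figures above allow a few rough derived statistics. The following is a minimal Python sketch using only the counts quoted in this description; the function name `derived_stats` is hypothetical, and it assumes the character counts are measured consistently across the Hara/Harauti section and the full corpus (the listing does not say whether spaces are included).

```python
# Counts copied from the dataset description above.
SENTENCES = 5_332               # sentences/phrases per mother tongue
HARAUTI_WORDS = 34_537          # words in the Hara/Harauti section
HARAUTI_CHARS = 155_620         # characters in the Hara/Harauti section
TOTAL_WORDS = 4_404_845         # words in the whole parallel corpus
TOTAL_CHARS = 23_374_289        # characters in the whole parallel corpus


def derived_stats() -> dict:
    """Return simple ratios computed from the published counts (illustrative only)."""
    return {
        # Average length of a Hara/Harauti sentence/phrase, in words.
        "words_per_sentence": HARAUTI_WORDS / SENTENCES,
        # Average characters per Hara/Harauti word.
        "chars_per_word": HARAUTI_CHARS / HARAUTI_WORDS,
        # Hara/Harauti share of all words in the corpus, as a percentage.
        "share_of_corpus_pct": 100 * HARAUTI_WORDS / TOTAL_WORDS,
        # Average characters per word across the whole corpus.
        "corpus_chars_per_word": TOTAL_CHARS / TOTAL_WORDS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")
```

These ratios are only as reliable as the published totals and say nothing about how words or characters were tokenised.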
The price indicated corresponds to a single language component. The total payment is determined by the number of language components requested.
For research citations, please cite the following:
1. Dr. Satyaendra Kumar Awasthi, Dr. Rejitha K. S., Dr. Narayan Choudhary, Prof. Shailendra Mohan. 2026. Hara/Harauti Parallel Text Corpus: Linguistic Features and Structures. Central Institute of Indian Languages, Mysore. ISBN 978-81-69099-08-0.
2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2025. LDC-IL Corpus Insights. Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-33-0.
Item specifics
- Authors: Dr. Satyaendra Kumar Awasthi, Dr. Rejitha K. S., Dr. Narayan Choudhary, Prof. Shailendra Mohan
- Catalogue Number: 1648
- ISBN: 978-81-69099-08-0
- Character Count: 23,374,289
- Word Count: 4,404,845
- Release Date: 23/3/2026
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
