1,09,31,902 Words | 1,963 Titles | XML format | 6 text domains
Tamil is one of the longest-surviving classical languages in the world. It belongs to the Dravidian language family. The Tamil Text Corpus is encoded in machine-readable form and stored in a standard format. The primary encoding used is Unicode, and the corpus is stored in XML fo..
30,10,993 Words | 859 Titles | XML format | 6 Domains
Telugu is a highly agglutinative and morphologically rich language. The actual patterns of language use in natural texts provide evidence of these traits. The Government of India set up the Linguistic Data Consortium for Indian Languages to help those who endeavor in the language dev..
51,61,927 Words | 739 Titles | XML format | 5 domains
Urdu is one of the prominent languages used in the Indian subcontinent. It belongs to the Indo-Aryan family, is influenced by Arabic and Persian, and is written in the Perso-Arabic script. On the other hand, region-wise, the Urdu language has co-existed side by side mostly in the no..