Text type resource

A Gold Standard Tamil Raw Text Corpus

requests (3)

1,09,31,902 Words | 1,963 Titles | XML format | 6 text domains

Tamil is one of the longest-surviving classical languages in the world. It belongs to the Dravidian language family. The Tamil Text Corpus is encoded in a machine-readable form and stored in a standard format. The major encoding used is Unicode, and the text is stored in XML fo..


A Gold Standard Telugu Raw Text Corpus

requests (2)

30,10,993 Words | 859 Titles | XML format | 6 domains

Telugu is a highly agglutinative and morphologically rich language. The actual patterns of language use in natural texts provide evidence of the language's traits. The Government of India set up the Linguistic Data Consortium for Indian Languages to help those who endeavor in the language dev..


A Gold Standard Urdu Raw Text Corpus

requests (5)

51,61,927 Words | 739 Titles | XML format | 5 domains

Urdu is one of the prominent languages used in the Indian subcontinent. It belongs to the Indo-Aryan family. Urdu is influenced by Arabic and Persian and is written in the Perso-Arabic script. On the other hand, region-wise, the Urdu language has co-existed side by side mostly in the no..
