Tulu Raw Text Corpus
Dataset Description
8,16,073 words | 61,90,666 characters | 55 titles
Tulu has about 18.4 lakh (1.84 million) speakers. It has a rich literary and oral tradition and carries profound cultural and historical significance within the region of Tulu Nadu. Tulu is one of the ancient and culturally rich languages of South India and belongs to the Dravidian language family. It is spoken mainly in the coastal region known as Tulu Nadu, which covers the Dakshina Kannada and Udupi districts of Karnataka and the Kasargod district of Kerala. The Tulu Raw Text Corpus is an extensive repository that captures the linguistic elements of Tulu textual materials.
Data has been collected from books.
A detailed explanation of the Tulu Raw Text Corpus will be available in the Tulu Raw Text Corpus Documentation.
For research-based citations, please cite the following:
1. Dr. Sajila S., Dr. Narayan Choudhary, Prof. Shailendra Mohan. 2026. Tulu Raw Text Corpus, Central Institute of Indian Languages, Mysore. ISBN: 978-81-69175-97-5
2. Narayan Choudhary. LDC-IL: The Indian repository of resources for language technology. Lang Resources & Evaluation 55, 855–867 (2021). https://doi.org/10.1007/s10579-020-09523-3
3. Choudhary, Narayan & L. Ramamoorthy. 2019. "LDC-IL Raw Text Corpora: An Overview" in Linguistic Resources for AI/NLP in Indian Languages, Central Institute of Indian Languages, Mysore. pp. 1-10.
Item specifics
- Authors: Dr. Sajila S., Dr. Narayan Choudhary, Prof. Shailendra Mohan
- Corpus Type: Raw Text Corpus
- Catalogue Number: 1690
- ISBN: 978-81-69175-97-5
- Data Source: On-field and digital platform
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
