Dataset Description
4,66,054 Words | 108 Titles | XML format | 2 domains
The Kashmiri text has been typed in Unicode, using the InScript keyboard, and
stored in XML files. Metadata is provided along with the data. The corpus has
been developed from available contemporary text.
The Kashmiri Text Corpus in LDC-IL comprises 466,054 words and 2,646,948
characters, drawn from books, newspapers, and magazines. The two major domains
represented are Aesthetics and Social Sciences.
Domains | Words | Percentage of Total Corpus
Aesthetics | 4,00,474 | 85.93 %
Social Sciences | 65,580 | 14.07 %
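The domain percentages follow directly from the word counts. A minimal sketch of that arithmetic in Python is given here; the variable names are illustrative assumptions and are not part of the corpus or its metadata.

```python
# Word counts per domain, as listed in the table above.
domain_words = {"Aesthetics": 400_474, "Social Sciences": 65_580}

# Total corpus size in words (should match the stated 466,054).
total_words = sum(domain_words.values())

# Share of each domain as a percentage of the whole corpus, to two decimals.
shares = {
    name: round(100 * count / total_words, 2)
    for name, count in domain_words.items()
}

for name, share in shares.items():
    print(f"{name}: {share:.2f} %")
```

Summing the counts first also serves as a quick consistency check between the per-domain figures and the stated total word count.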
A detailed explanation of the Kashmiri Text Corpus will be available in the Kashmiri Raw Text Corpus Documentation.
For research citations, please use the following:
- Ramamoorthy, L., Narayan Choudhary & Shahid Mushtaq Bhat. 2019. A Gold Standard Kashmiri Raw Text Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan & L. Ramamoorthy. 2019. "LDC-IL Raw Text Corpora: An Overview" in Linguistic Resources for AI/NLP in Indian Languages, Central Institute of Indian Languages, Mysore. pp. 1-10.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Shahid Mushtaq Bhat
- Corpus Type: Raw Corpus
- Catalogue Number: 1131
- ISBN: 978-81-7343-230-9
- Data Source: Typed+Cleaned
- Character Count: 2646948
- Word Count: 466054
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.