A Gold Standard Kashmiri Raw Text Corpus Vol. II
Overview
10,13,658 words | 123 Titles | XML format | 6 domains | 59 sub-categories
Dataset Description
10,13,658 words | 123 Titles | XML format | 6 domains | 59 sub-categories
A Gold Standard Kashmiri Raw Text Corpus Vol. II is a comprehensive collection of Kashmiri language texts comprising 10,13,658 words and 57,28,547 characters. The corpus includes extracts from books, newspapers, and magazines, providing a diverse range of linguistic data. It serves as a resource for linguistic research, language processing applications, and the preservation of the Kashmiri language. This volume covers six major domains: Aesthetics, Commerce, Mass Media, Official Document, Science and Technology, and Social Science. The previous volume covered only two, Aesthetics and Social Sciences. The corpus has been meticulously compiled and is available through the Linguistic Data Consortium for Indian Languages (LDC-IL). Researchers and developers can use it to deepen their understanding of Kashmiri and to build applications for the language.
A detailed explanation of the Kashmiri Text Corpus will be available in the Kashmiri Raw Text Corpus Documentation.
For research-based citations, please use the following:
- Dr. Zargar Adil Ahmad, Bi Bi Mariyam, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan. 2025. A Gold Standard Kashmiri Raw Text Corpus Vol. II. Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-27-9.
- Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2025. LDC-IL Corpus Insights. Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-33-0.
Item specifics
- Authors: Dr. Zargar Adil Ahmad, Bi Bi Mariyam, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan
- Corpus Type: Raw Text Corpus
- Catalogue Number: 1510
- ISBN: 978-93-48633-27-9
- Data Source: On field
- Corpus Size: 10,13,658 words
- Release Date: 20/03/2025
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.