Computer Science & AI

19 November 2025

AI Breakthrough Bridges the Digital Gap for the Kashmiri Language

Source Publication: Scientific Reports

Primary Authors: Deyar, Ramani, Gupta et al.

Despite its rich cultural heritage, the Kashmiri language is considered 'low-resource' in the field of Natural Language Processing (NLP), meaning computers lack the data required to learn it effectively. To address this gap, researchers have developed a pioneering dataset comprising 15,036 news snippets designed specifically for text classification tasks.

The team constructed this dataset by translating English news into Kashmiri using digital tools, followed by a rigorous manual refinement process to ensure accuracy. The data spans ten diverse categories, including Politics, Technology, Medical, and Art and Craft. This effort represents the first known attempt to build a manually labelled corpus—a structured collection of text—for Kashmiri news classification.

Once the data was prepared, the researchers experimented with various machine learning algorithms and Large Language Models (LLMs). The standout performer was a fine-tuned transformer model known as ParsBERT-Uncased, which achieved an F1 score of 0.98. Because F1 is the harmonic mean of precision and recall, this indicates that the model was both rarely wrong when it assigned a category and rarely missed articles belonging to one. This work establishes a critical foundation for future AI development in underrepresented languages.
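For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score can be computed for a multi-class news classifier. It is not the authors' evaluation code: the function names `per_class_f1` and `macro_f1` and the toy labels are invented for illustration, and the brief does not say whether the reported 0.98 is a macro, micro, or weighted average, so macro averaging is an assumption here.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for each label seen in the true or predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrongly assigned
            fn[truth] += 1   # true class was missed

    scores = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; see lead-in)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels using three of the ten categories named in the article.
y_true = ["Politics", "Politics", "Technology", "Medical", "Medical", "Technology"]
y_pred = ["Politics", "Technology", "Technology", "Medical", "Medical", "Technology"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy example one Politics article is misfiled under Technology, which lowers recall for Politics and precision for Technology. A score near 0.98 would require both kinds of error to be rare across all categories.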

Source Transparency

This intelligence brief was synthesised by The Synaptic Report's autonomous pipeline. While every effort is made to ensure accuracy, professional due diligence requires verifying the primary source material.
