Bengaluru-based startup Sarvam AI has claimed a major breakthrough in India-centric artificial intelligence, saying its latest models have outperformed global heavyweights Google Gemini and OpenAI’s ChatGPT in key document intelligence and optical character recognition (OCR) tasks focused on Indian languages.
According to the company, its newly launched models—Sarvam Vision and Bulbul V3—have delivered significantly improved performance in processing multilingual documents, scanned records and complex visual layouts commonly found in Indian use cases.
Sharing details on X, Sarvam AI co-founder Pratyush Kumar said the company has built a three-billion-parameter state-space vision-language model designed specifically for Indian digitisation needs. The model goes beyond text and voice, enabling interpretation of visual content such as scanned documents, charts and layered layouts, a capability critical for large-scale record digitisation.
Sarvam AI’s primary focus is document intelligence, covering government files, financial records, historical manuscripts and regional newspapers. The startup said its models are trained on high-quality datasets spanning 22 official Indian languages, which it says improves accuracy across diverse scripts and formats.
On technical benchmarks, Sarvam Vision OCR reportedly achieved 84.3% accuracy on olmOCR-Bench and 93.28% on OmniDocBench v1.5—figures the company claims exceed those recorded by Gemini 3 Pro and DeepSeek OCR v2. The team said the model demonstrates consistent results across multiple document types, scanned pages and multilingual layouts.
Positioning itself as a “sovereign AI” platform, Sarvam AI said its goal is to build foundational artificial intelligence infrastructure that is developed and governed in India, while remaining aligned with local requirements. The company aims to support national digital initiatives by offering AI tools tailored for India’s linguistic and administrative complexity.
The Sarvam platform provides multimodal vision-language support, Indic-first document understanding, chart and data interpretation, multilingual visual processing and production-ready APIs. In a bid to encourage developer adoption, the startup has made its Document Intelligence API free through February 2026, allowing teams to test the system at scale.
While most global AI models are optimised primarily for English, Sarvam AI said its architecture was designed from the ground up with Indian languages at the core. As a result, the company says, the system does more than extract text: it interprets visual elements in context, which matters for documents that often contain mixed scripts, stamps, tables and handwritten fields.
Industry observers believe this development could have wide implications for India, particularly in sectors such as government digitisation, banking, judicial records and healthcare, where multilingual document processing remains a persistent challenge.
Demand for accurate regional-language OCR and document intelligence has risen sharply as public and private institutions accelerate digitisation efforts. Analysts say locally trained models could help reduce dependence on overseas platforms while improving accuracy for India-specific workflows.
Sarvam AI’s progress also signals a broader shift in India’s AI ecosystem, where homegrown startups are beginning to compete directly with global platforms on specialised benchmarks. The coming months will be closely watched to see whether Sarvam gains traction across enterprise and government deployments.
For now, the startup’s latest results are being viewed as a strong indicator of India’s growing capability to build advanced, locally relevant AI systems—highlighting how domestic innovation is increasingly challenging global leaders in targeted domains.
About the author — Suvedita Nath is a science student with a growing interest in cybercrime and digital safety. She writes on online activity, cyber threats, and technology-driven risks. Her work focuses on clarity, accuracy, and public awareness.
