Assamese has entered the artificial intelligence space with its formal inclusion in BharatGen, IIT Bombay’s indigenous large language model (LLM) project. The move marks a significant step in promoting digital access for a language often considered underrepresented in AI applications.
The achievement comes through a collaboration between IIT Bombay and two Guwahati-based organisations, the Nanda Talukdar Foundation (NTF) and the Assam Jatiya Bidyalay Educational and Socio-Economic Trust. Together, they digitised over two million Assamese pages, including books, manuscripts, journals, and traditional Sachipats, creating a comprehensive digital resource for AI development.
Narayan Sharma, Secretary of the Assam Jatiya Bidyalay Trust, described the development as historic, saying Assamese has long been seen as a “low-resource language” in AI contexts. Its inclusion in BharatGen now enables the language to support large-scale AI applications.
Launched in June 2025, BharatGen is a government-backed, open-source initiative designed to support all 22 scheduled Indian languages. Developed by IIT Bombay with a consortium of IITs and IIITs, the platform offers a homegrown alternative to global AI systems like ChatGPT.
Assamese now becomes the 10th language on BharatGen, advancing the platform’s mission of inclusive AI innovation across India’s diverse linguistic landscape.









