Tech Mahindra recently announced the launch of Project Indus, a foundational model for Indic languages that could prove to be its most important project to date. Despite their multilingual capabilities, large language models (LLMs) such as OpenAI's GPT models have been trained primarily on English datasets, which limits their ability to understand and produce content in Indic languages. India therefore stands to gain a great deal from an open-source Indic LLM.
According to Tech Mahindra's chief, CP Gurnani, the model will be the biggest Indic LLM and could cater to 25% of the world's population. Tech Mahindra has not revealed the cost of the project or when the model is expected to launch, but the aim is to build a 7-billion-parameter LLM.
The model is expected to support 40 different Hindi dialects initially, with more languages and dialects to be added subsequently. Tech Mahindra's primary goal is first to create an LLM for text continuation and then to add dialogue capability.
Developing an LLM designed primarily for Indic languages could benefit India for a wide array of reasons. Understanding the nuances of local cultures and contexts is essential for effective communication. An Indic LLM can be designed to prioritise cultural sensitivity, ensuring that the generated content respects local customs and norms. An Indic LLM could also democratise AI and cater to the large section of non-English speakers in the country.
Moreover, generating content in Indic languages with the GPT models costs significantly more than in English, because the same text is split into many more tokens and usage is priced per token. An Indic LLM could therefore offer a more cost-effective way to generate content in Indic languages.
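One way to build intuition for the token-cost gap is to look at UTF-8 byte lengths, since byte-level tokenizers start from the raw bytes of the text. The sketch below is only a rough proxy under that assumption: it does not call any real tokenizer, the helper name `utf8_bytes_per_char` is hypothetical, and actual token counts depend on the vocabulary a given model learned.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in `text`."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample words, not taken from any benchmark.
english_word = "India"
hindi_word = "भारत"  # "Bharat" in Devanagari script

for label, word in [("English", english_word), ("Hindi", hindi_word)]:
    # ASCII letters take 1 byte each; Devanagari code points take 3 bytes each.
    print(f"{label}: {utf8_bytes_per_char(word):.1f} bytes per character")
```

A tokenizer whose vocabulary was learned mostly from English text has few merged tokens for Devanagari, so more of those bytes tend to survive as separate tokens. That is the effect a tokenizer trained on Indic text would be expected to reduce.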
The effectiveness of an AI model hinges on the quality of its datasets. While ample English datasets are readily accessible, there is a scarcity of datasets for Indic languages and dialects. Recognising this challenge, various stakeholders, including the Indian government, are actively engaged in the creation of such datasets.