Inception, a G42 company focused on applied AI research and advancements in the Middle East, has announced the release of Jais, an advanced 13-billion-parameter Arabic language model. Developed in partnership with Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Jais represents a major advancement in Arabic natural language processing capabilities.
The release includes two models: Jais-13b, the base Arabic-centric model with 13B parameters, and Jais-13b-chat, an instruction-tuned version. By open-sourcing Jais, Inception says it aims to engage the scientific, academic, and developer communities to accelerate the growth of a vibrant Arabic language AI ecosystem. The company adds that this approach can serve as a model for other languages currently underrepresented in mainstream AI.
The new model was trained on G42's Condor Galaxy 1 supercomputer, which was built in collaboration with Cerebras Systems. Condor Galaxy 1 delivers multi-exaFLOP AI computing power, enabling the rapid development of complex models like Jais. G42 and Cerebras first partnered in 2021 to bring high-performance AI infrastructure to the region.
With its 13 billion parameters, Jais significantly outperforms previous open-source Arabic language models. It was trained on a dataset of 395 billion Arabic and English tokens, allowing it to process both languages with high accuracy. Unlike many other multilingual models, Jais gives Arabic substantial weight: Arabic makes up 33% of its training data.
"We believe that innovation thrives when we collaborate," says Andrew Jackson, CEO of Inception. "With this release, we are setting a new standard for AI advancement in the Middle East and ensuring that the Arabic language, with its depth and heritage, finds its voice within the AI landscape."
According to G42, Jais is the "world's highest quality Arabic Large Language Model" and represents a milestone for AI advancement in the Arabic-speaking world. By open-sourcing the model, G42 aims to spur innovation and collaboration in Arabic natural language processing.
Researchers praised the careful methodology behind the model, including the use of specialized techniques like ALiBi positional embeddings and SwiGLU activation functions. These optimizations unlock Jais' ability to understand nuanced linguistic patterns, provide improved context handling, and generate human-like text in Arabic.
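For readers unfamiliar with the two techniques named above, the following is a minimal, illustrative sketch of how ALiBi biases and a SwiGLU gate are commonly computed. It is not taken from the Jais codebase: the function names, the pure-Python list representation, and the restriction to power-of-two head counts are all assumptions made for illustration.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes following the geometric sequence used by ALiBi.

    Assumption: num_heads is a power of two. Other head counts need an
    interpolation rule that this sketch omits.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores before softmax for one head.

    Entry [i][j] is -slope * (i - j) for keys at or before the query, so
    distant tokens are penalised linearly. Future keys get -inf (causal mask).
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def swish(z):
    """Swish (SiLU) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(x[i] * w[i][k] for i in range(len(x))) for k in range(len(w[0]))]


def swiglu(x, w_gate, w_up):
    """SwiGLU: swish(x @ w_gate) multiplied elementwise by (x @ w_up)."""
    gate = matvec(x, w_gate)
    up = matvec(x, w_up)
    return [swish(g) * u for g, u in zip(gate, up)]
```

Because the ALiBi penalty depends only on the distance between tokens rather than on learned position vectors, it is often credited with helping models handle sequences longer than those seen in training. The SwiGLU gate replaces a plain activation in the feed-forward layer with a learned, input-dependent gate.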
G42 positions the release of Jais as an important step toward technology that bridges divides rather than exacerbating them. With over 400 million Arabic speakers worldwide, the availability of advanced Arabic language models helps democratize access to AI capabilities.
Jais is now available on Hugging Face, allowing developers and researchers to make use of its Arabic and English capabilities. G42 plans to continue expanding the capabilities of Jais to further enhance Arabic language understanding.