German startup Aleph Alpha has released two open-weight language models, Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned, claiming compliance with EU regulations at a time when AI regulation is sparking intense global debate. The move comes as tech giants grapple with regulatory uncertainty, highlighting the tension between innovation and oversight in a rapidly evolving AI landscape.
The Pharia-1-LLM family is now available for non-commercial research and educational use. Aleph Alpha asserts that these models comply with the General Data Protection Regulation (GDPR) and are designed to meet the forthcoming obligations of the EU AI Act.
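For readers who want to experiment, here is a minimal sketch of how one might load the aligned variant locally. It assumes the weights are published on Hugging Face under the Aleph-Alpha organization (the repo ID below is an assumption; check the official model card for the exact ID and license terms) and that the custom architecture requires trust_remote_code:

```python
# Minimal sketch: loading Pharia-1-LLM-7B-control-aligned for local,
# non-commercial experimentation. The repo ID is assumed; verify it on
# the official model card before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Aleph-Alpha/Pharia-1-LLM-7B-control-aligned"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,  # Pharia ships a custom architecture
    device_map="auto",       # spread layers across available devices
)

# German prompt, reflecting the models' European-language focus
inputs = tokenizer("Wie ist das Wetter heute?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```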
"We acknowledge and abide by all applicable national and international regulations," Aleph Alpha stated, adding that they will "constantly monitor such developments and adapt our products and this model card accordingly." This proactive stance contrasts sharply with recent actions by major tech companies.
Tech giants like Meta, Apple, and Microsoft have recently announced delays in launching new AI products in the EU due to regulatory uncertainties. Just last week, Meta CEO Mark Zuckerberg and Spotify CEO Daniel Ek jointly criticized EU AI regulations in The Economist. They argued that complex and inconsistent rules are stifling innovation, particularly for open-weight models. Zuckerberg revealed that Meta won't release its upcoming Llama multimodal AI model in Europe due to these concerns.
Meanwhile, in the U.S., the tech industry is divided over proposed state-level regulation. California's AI safety bill, SB 1047, has sparked controversy, with companies like OpenAI and Anthropic taking opposing stances. OpenAI argues that the bill could hinder innovation and drive AI companies out of California, while Anthropic has signaled cautious support following recent amendments. Notably, this week, Elon Musk came out in support of the bill.
Against this backdrop, Aleph Alpha’s compliance-first approach is noteworthy. The company says its Pharia models were trained in full compliance with GDPR and the anticipated requirements of the EU AI Act. Yet, like many other AI developers, Aleph Alpha still relies on web-scraped data: nearly 8 trillion tokens from sources like Common Crawl. The company claims to have meticulously curated this data, removing content from 4.58 million websites and applying rigorous deduplication in the name of compliance, and it supplements the web data with structured datasets drawn from textbooks, legislative texts, and scientific research.
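Aleph Alpha has not published its curation pipeline, so the sketch below only illustrates the general shape of the two steps just described, domain-level removal and deduplication; the blocklist file and the (url, text) record format are hypothetical.

```python
# Illustrative sketch of web-data curation, not Aleph Alpha's actual
# pipeline: drop documents from blocked domains, then remove exact
# duplicates by content hash.
import hashlib
from urllib.parse import urlparse

def load_blocklist(path: str) -> set[str]:
    """Load blocked domains, one per line (hypothetical file format)."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def curate(records, blocklist: set[str]):
    """Yield (url, text) pairs that survive domain removal and dedup."""
    seen: set[str] = set()
    for url, text in records:
        domain = urlparse(url).netloc.lower()
        if domain in blocklist:
            continue  # drop documents from removed websites
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a document already kept
        seen.add(digest)
        yield url, text
```

Production pipelines typically layer fuzzy deduplication (MinHash-style near-duplicate detection) on top of exact matching, but the principle is the same.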
However, without external auditing and with the training data unavailable for inspection, Aleph Alpha’s compliance claims rest entirely on its internal oversight. This raises critical questions: What will enforcement look like under the new EU regulations? How will regulators verify such claims without access to the training data, and will that verification differ meaningfully from the voluntary self-governance seen in the U.S.?
Aleph Alpha’s models support multiple European languages, with specific optimization for German, French, and Spanish. This multilingual capability is particularly significant in the EU, where regulations often require broad language support.
Performance evaluations show that the Pharia models often trail competitors like Llama in key areas, particularly in handling unsafe prompts. In one assessment, the Pharia-1-LLM-7B-control-aligned model produced a higher rate of unsafe outputs than the Llama 3.1 8B Instruct model. Despite these shortcomings, Aleph Alpha has openly shared these evaluation results, a move toward transparency that stands out in an industry often guarded about performance benchmarks.
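The exact evaluation protocol behind those numbers is not detailed here, but a comparison of unsafe-output rates typically reduces to running both models over the same adversarial prompt set and counting flagged responses. In the sketch below, the prompt set, safety classifier, and generation wrappers are all placeholders:

```python
# Hedged sketch of an unsafe-output-rate comparison; every component
# here (prompts, classifier, generation wrappers) is a placeholder,
# not Aleph Alpha's published methodology.
from typing import Callable, Iterable

def unsafe_output_rate(
    generate: Callable[[str], str],    # wraps one model's text generation
    prompts: Iterable[str],            # shared set of adversarial prompts
    is_unsafe: Callable[[str], bool],  # safety classifier (assumed given)
) -> float:
    responses = [generate(p) for p in prompts]
    return sum(is_unsafe(r) for r in responses) / len(responses)

# Usage: compare two models on the same prompts with the same classifier
# rate_pharia = unsafe_output_rate(pharia_generate, prompts, is_unsafe)
# rate_llama  = unsafe_output_rate(llama_generate, prompts, is_unsafe)
```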
The release of these models presents an interesting case study in the balance between innovation and regulation. Aleph Alpha’s effort to navigate regulatory waters while advancing AI technology could serve as either a template or a cautionary tale for other AI companies. The true test will be how these models perform in real-world applications and whether the company’s compliance claims withstand regulatory scrutiny.
For me, this release raises important questions: How do we balance the need for innovation with responsible oversight? Will regulatory frameworks like those in the EU stifle AI development or drive it forward responsibly? And perhaps most critically, what will oversight and enforcement actually look like in practice?