OpenAI recently announced an audacious new initiative: to solve the alignment problem for superintelligent AI within four years. The company believes that superintelligent AI — a hypothetical system vastly surpassing human abilities, and dramatically more capable than even artificial general intelligence (AGI) — will be the most transformative technology ever invented.
> "Given the picture as we see it now, it's conceivable that within the next ten years, AI systems will exceed expert skill level in most domains, and carry out as much productive activity as one of today's largest corporations." - OpenAI
While recognizing the immense benefits superintelligent AI could offer, OpenAI is candid about the risks — including human disempowerment and even extinction.
The company's conviction about the magnitude of the superintelligence alignment problem has prompted them to allocate a significant 20% of their available compute resources to solving it. They ask the central question (emphasis mine):
“How do we ensure AI systems *much smarter than humans* follow human intent?”
But is this question inherently paradoxical? If we can definitively command an AI, can it truly be superintelligent? Conversely, if an AI is superintelligent, doesn't that imply we cannot control it with certainty? OpenAI views the matter differently.
They suggest superintelligence alignment is fundamentally a machine learning problem. They have assembled a new Superalignment team, led by Ilya Sutskever (cofounder and Chief Scientist of OpenAI) and Jan Leike (Head of Alignment), with the plan to build an AI system that can iteratively align future superintelligence.
This approach is not new, however. Last year, OpenAI unveiled their approach to alignment research — at that time focused on AGI — and outlined some of its limitations:
- The hardest parts of the alignment problem might not be related to engineering a scalable and aligned training signal for our AI systems. Even if this is true, such a training signal will be necessary.
- It might not be fundamentally easier to align models that can meaningfully accelerate alignment research than it is to align AGI. In other words, the least capable models that can help with alignment research might already be too dangerous if not properly aligned. If this is true, we won’t get much help from our own systems for solving alignment problems.
With these acknowledged challenges for AGI alignment, one might wonder why OpenAI has now shifted its focus to superintelligence. Admittedly, we've seen significant advancements in large language models (LLMs) over the last year, but many hurdles remain to make current systems broadly useful and safe. We have yet to prove that we can align current systems, let alone address the considerably more complex challenge of aligning AGI.
OpenAI says they "[choose to] focus on superintelligence rather than AGI to stress a much higher capability level. We have a lot of uncertainty over the speed of development of the technology over the next few years, so we choose to aim for the more difficult target to align a much more capable system."
OpenAI's justification for focusing on superintelligence poses more questions than it answers. For instance, is shifting focus to superintelligence, which for now remains a theoretical concept, the best use of resources?
Furthermore, given that we are still struggling with aligning less complex AI systems, is the leap to superintelligence too ambitious? Does the concentration on this far-off and largely theoretical goal detract from current efforts to make AI safer and more useful?
Finally, could this focus on superintelligence unintentionally contribute to AI hype, fostering unrealistic expectations or fear, and potentially skewing public discourse?
OpenAI's new endeavor is undoubtedly bold, but time will tell whether it is a testament to their visionary ambition or a manifestation of hubris tinged with naivety.