
Anthropic's Claude AI Can Now Control Your Computer

October 22, 2024 • 3 min read

Anthropic has released two updated AI models alongside a new feature that lets its AI assistant Claude control a computer the way a human user does. "Computer Use", released today in public beta, enables Claude to perform tasks by viewing the screen, moving the cursor, clicking, and typing, making it the first frontier AI model to offer this functionality.

"Instead of developing specific tools for individual tasks, we're teaching Claude general computer skills," explains Alex Albert, Anthropic's Head of Developer Relations. "This allows it to naturally use the same everyday software and tools that people use."

Computer Use builds on Anthropic's prior work on multimodal and tool-use models, combining existing capabilities in visual understanding and logical reasoning. Claude takes screenshots of the computer screen, "sees" the elements on them, and calculates actions based on pixel positions. By determining how many pixels to move the cursor vertically or horizontally, Claude can click in the correct spot. Anthropic describes accurate pixel counting as crucial for reliable control: without it, the model has trouble issuing mouse commands, much as language models often stumble on simple-seeming text tasks such as counting the letters in a word.
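To make the pixel arithmetic concrete, here is a minimal Python sketch of the "see an element, work out where to click" step. It is an illustration only, not Anthropic's implementation or API: the Box type, the function names, and the idea of scaling from a downsized screenshot to the real screen resolution are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]
Size = Tuple[int, int]


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of an on-screen element, in screenshot pixels."""
    left: int
    top: int
    width: int
    height: int

    def center(self) -> Point:
        # Aim for the middle of the element rather than its edge.
        return (self.left + self.width // 2, self.top + self.height // 2)


def scale_point(point: Point, shot_size: Size, screen_size: Size) -> Point:
    """Map a point from screenshot pixels to real screen pixels.

    Assumes the screenshot is a uniformly resized copy of the screen.
    """
    x, y = point
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    return (round(x * screen_w / shot_w), round(y * screen_h / shot_h))


def cursor_delta(cursor: Point, target: Point) -> Point:
    """How many pixels to move horizontally and vertically to reach the target."""
    return (target[0] - cursor[0], target[1] - cursor[1])


def plan_click(cursor: Point, box: Box, shot_size: Size, screen_size: Size) -> List[tuple]:
    """Turn a detected element into a hypothetical move-then-click action list."""
    target = scale_point(box.center(), shot_size, screen_size)
    dx, dy = cursor_delta(cursor, target)
    return [("move", dx, dy), ("click", target[0], target[1])]


if __name__ == "__main__":
    # A button seen at (100, 200), 80x40 pixels, in a 1024x768 screenshot
    # of a 2048x1536 screen, with the cursor currently at (300, 400).
    button = Box(left=100, top=200, width=80, height=40)
    print(plan_click((300, 400), button, (1024, 768), (2048, 1536)))
    # prints [('move', -20, 40), ('click', 280, 440)]
```

A small error in any of these numbers puts the click on the wrong element, which is why precise pixel counting matters so much for this style of control.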

Training also involved teaching Claude to use simple software such as calculators and text editors, skills it then generalized to more complex applications. Although the capability is at an early stage, it has already shown considerable versatility, including the ability to correct itself and work around obstacles on its own.

Demos shared by Anthropic showed just how hands-on Claude can get. The AI was tasked with filling out a vendor request—finding relevant information across spreadsheets and CRM systems and inputting that into the required forms, without any human assistance.

In another instance, Claude took on coding, navigating web browsers and IDEs to create, modify, and run a personal homepage with a 90s theme, fixing bugs along the way. It did hit some bumps, such as the python command not being available on the machine, but Claude quickly adapted and switched to python3.

Anthropic openly acknowledges that Claude still struggles with basic actions like scrolling and dragging that humans find effortless. While the demos were being recorded, Claude accidentally stopped a screen recording in one case and, in another, wandered off-task to look at photos of Yellowstone National Park. (Insert nervous laugh)

The announcement also introduces an upgraded Claude 3.5 Sonnet model, which shows significant improvements in coding. The model scored 49% on the SWE-bench Verified test, surpassing competitors including OpenAI's o1-preview. GitLab found that the upgrade improved results on its software development tasks by around 10% without adding latency, a big win for real-time coding tasks.

A new addition to the lineup, Claude 3.5 Haiku, matches the performance of Anthropic's previous top model while maintaining lower costs and faster speeds. This model will be available later this month through Anthropic's API and major cloud providers. Read the updated model card to get all the details on the new models.

Anthropic has implemented safety measures, including new systems to detect potential misuse of the feature for spam or fraud. The US and UK AI Safety Institutes also took part in pre-deployment testing of the upgraded models, and Anthropic says they are held to the same safety standards as previous versions.

Computer Use is available in public beta for developers through Anthropic's API and cloud services like Amazon Bedrock and Google Cloud’s Vertex AI. Companies including Asana, Canva, and DoorDash are already testing the technology for complex tasks.

Claude's current computer skills still lag well behind human abilities: it scores 14.9% on the OSWorld benchmark, compared with typical human scores of 70-75%. Anthropic nevertheless expects rapid improvements in the coming months. Even though the feature is still experimental, it opens up a new realm of possibilities for AI, enabling it not just to process text but to act on real-world digital tasks.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.
