Anthropic has unveiled a major update to its Claude AI models, including the new “Computer Use” feature. Developers can direct the upgraded Claude 3.5 Sonnet to navigate desktop applications, move cursors, click buttons, and type text, essentially imitating a person working at a PC.
Instead of creating specific tools to help Claude with individual tasks, Anthropic is teaching it general computer skills, allowing it to use a wide range of standard tools and software programs designed for people, the company said in a blog post.
The Computer Use API can be used to translate text prompts into computer commands, with Anthropic providing examples like “use data from my computer and online to fill out this form” and “move the cursor to open a web browser.” This is the first AI model from the AI leader that can browse the web.
The update works by analyzing screenshots of what the user is seeing and calculating how many pixels the cursor must move vertically or horizontally to click the right place or perform another task with the available software. It can take hundreds of successive steps to complete a command, and it will correct itself and retry a step if it runs into a problem.
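The two ideas in that description, working out a cursor offset in pixels and retrying a failed step, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pixel_offset` and `run_step` and the retry limit are hypothetical, and nothing here reflects Anthropic's actual implementation or API.

```python
from typing import Callable, Tuple

Point = Tuple[int, int]  # (x, y) position in screen pixels


def pixel_offset(cursor: Point, target: Point) -> Point:
    """Return (dx, dy): how far to move horizontally and vertically."""
    return (target[0] - cursor[0], target[1] - cursor[1])


def run_step(attempt: Callable[[], bool], max_retries: int = 3) -> int:
    """Run one step, retrying on failure (hypothetical helper).

    Returns the number of attempts used, or raises RuntimeError
    if every attempt fails.
    """
    for attempt_number in range(1, max_retries + 1):
        if attempt():
            return attempt_number
    raise RuntimeError(f"step failed after {max_retries} attempts")


# Example: the cursor is at (100, 200) and the button is at (340, 120).
dx, dy = pixel_offset((100, 200), (340, 120))
print(dx, dy)  # 240 pixels right, 80 pixels up (negative y)
```

A real agent would repeat this cycle for each step, taking a fresh screenshot every time before deciding on the next move.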
The Computer Use API, available today in public beta, ultimately aims to let developers automate repetitive processes, test software, and carry out open-ended tasks. Replit is already testing it in its Replit Agent product to evaluate the functionality of apps as they are built.
Anthropic wrote in a blog post that enabling AIs to interact with computer software in the same way that people do will open up a wide range of applications that aren’t currently possible for AI assistants.
Claude’s Computer Use is still very error-prone
Anthropic admits that the feature is not perfect: it still can’t effectively handle scrolling, dragging, or zooming. In an evaluation designed to test its ability to book flights, it succeeded only 46% of the time. However, this is a significant improvement over the previous generation, which scored 36%.
Because Claude relies on screenshots rather than a continuous video stream, it can miss short-lived actions or notifications. The researchers acknowledge that, during one coding demonstration, Claude stopped what it was doing and began browsing photos of Yellowstone National Park.
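A toy example shows why sampling screenshots can miss brief events. It is a sketch with made-up timings, not a model of how Claude actually schedules its screenshots, and the function name `visible_events` is hypothetical.

```python
def visible_events(events, screenshot_times):
    """Return names of events that were on screen during at least one screenshot.

    events: list of (name, start_seconds, end_seconds)
    screenshot_times: list of times (seconds) at which a screenshot was taken
    """
    seen = []
    for name, start, end in events:
        if any(start <= t <= end for t in screenshot_times):
            seen.append(name)
    return seen


# Hypothetical timings: a dialog stays open for 5 s, a notification for 0.6 s.
events = [("dialog", 0.0, 5.0), ("notification", 2.2, 2.8)]
screenshots = [0.0, 2.0, 4.0]  # one screenshot every 2 s

print(visible_events(events, screenshots))  # ['dialog']
```

The notification appears and disappears between two screenshots, so a system that only sees those frames never learns it existed.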
It scored 14.9% on OSWorld, a benchmark that evaluates a model’s ability to use a computer as humans do, for screenshot-based tasks. This is a far cry from human-level skill, thought to be between 70% and 75%, but it is nearly double that of the next best AI system. Anthropic is also hoping to improve this capability with developer feedback.
Computer Use comes with a few safety features
According to the Anthropic researchers, a number of deliberate steps were taken to reduce the potential risk associated with Computer Use. For privacy and safety, the model does not train on user-submitted data, including the screenshots it processes, nor could it access the internet during training.
One of the main vulnerabilities identified is the prompt injection attack, a form of “jailbreaking” in which malicious instructions could cause an AI to behave unexpectedly.
Research from the U.K. AI Safety Institute found that jailbreak attacks could “enable coherent and malicious multi-step agent behavior” in models without Computer Use capabilities, such as GPT-4o. A separate study found that generative AI jailbreak attacks succeed 20% of the time.
To reduce the risk of prompt injection in Claude 3.5 Sonnet, the Trust and Safety teams set up systems to detect and prevent such attacks, especially given that Claude can interpret screenshots that may contain harmful content.
In addition, the developers anticipated the potential for users to misuse Claude’s computer skills. As a result, they created “classifiers” and monitoring systems that detect when harmful activities, such as spam, misinformation, or fraudulent behavior, may be occurring. To avoid political threats, it is also unable to post on social media or interact with government websites.
Joint pre-deployment testing was conducted by the U.S. and U.K. AI Safety Institutes, and Claude 3.5 Sonnet remains at AI Safety Level 2, meaning it doesn’t pose significant risks that require more stringent safety measures than those already in place.
Claude 3.5 Sonnet is better at coding than its predecessor
In addition to the Computer Use beta, Claude 3.5 Sonnet offers significant improvements in coding and tool use at the same price and speed as its predecessor. The new model improves its performance on SWE-bench Verified, a coding benchmark, from 33.4% to 49%, outpacing even reasoning models like OpenAI o1-preview.
Generative AI is becoming more popular among businesses for software development. However, the technology is not perfect in this area. AI-generated code has been known to cause outages, and security leaders are considering banning the technology’s use in software development.
Users of Claude 3.5 Sonnet have seen the improvements in action, according to Anthropic. GitLab tested it for DevSecOps tasks and found that it delivered up to 10% stronger reasoning without adding any latency. The AI lab Cognition also reported improvements in coding, planning, and problem-solving over the previous version.
Claude 3.5 Sonnet is available today through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. A version without Computer Use is rolling out to the Claude apps.
Claude 3.5 Haiku is cheaper but just as effective
Anthropic also released Claude 3.5 Haiku, a more affordable version of the Claude model. Haiku provides faster responses as well as improved instruction following and tool use, making it useful for user-facing applications and for creating personalized experiences from data.
Haiku matches the performance of the larger Claude 3 Opus model at the same price and a similar speed to the previous generation. It also outperforms the original Claude 3.5 Sonnet and GPT-4o on SWE-bench Verified, with a score of 40.6%.
Claude 3.5 Haiku will be rolled out next month as a text-only model. Image input is expected to follow in the near future.
The shift to AI agents is widespread
Claude 3.5 Sonnet’s ability to use computers places the model in the realm of AI agents, which are tools that can independently perform complex tasks.
“Anthropic’s choice of the term ‘computer use’ instead of ‘agents’ makes this technology more approachable to regular users,” Yiannis Antoniou, head of Data, Analytics, and AI at technology consultancy Lab49, told TechRepublic in an email.
Agents are displacing AI copilots, which are designed to assist and make suggestions to the user rather than act independently, as the must-have tools within businesses. According to the Financial Times, Microsoft, Workday, and Salesforce have all recently placed agents at the core of their AI plans.
In September, Salesforce unveiled Agentforce, a platform for deploying generative AI in areas such as customer support, service, sales, or marketing.
At this week’s SXSW Festival in Australia, Armand Ruiz, IBM’s vice president of product management for its AI platform, said that the next major development in AI will bring about an “agentic era,” in which specialized AI agents and humans collaborate to improve organizational efficiency.
Ruiz said there is still a long way to go before AI can take on these routine tasks in a way that is trustworthy and that can be scaled, explained, and monitored. “But we’re going to get there, and we’re going to get there faster than we think,” he said.
AI agents might even go as far as removing the need for human intervention in their own creation. A “Self-Taught Evaluator” AI model, unveiled by Meta last week, is designed to independently assess its own performance and that of other AI systems, demonstrating the potential for models to learn from their own mistakes.