
Google's Gemini 2.5 Computer Use model can interpret complex user prompts to perform browser-based tasks like clicking, typing, and organising digital content, outperforming rival models in accuracy and latency despite supporting only 13 actions at present.
Google has introduced Gemini 2.5 Computer Use, a specialised artificial intelligence model capable of interacting with digital interfaces in a way that closely mimics human behaviour. Built on the powerful Gemini 2.5 Pro architecture, the new model is now available to developers through Google AI Studio and Vertex AI.
According to Google, the model can understand complex user instructions and carry out tasks such as clicking, typing, scrolling, and even navigating dropdown menus, all within a browser. Although it currently supports just 13 action types, the company says it significantly outperforms comparable models in accuracy while also delivering lower latency.
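For developers trying the preview, a request follows the usual Gemini API pattern: the computer-use tool is enabled in the request config, and the model responds with function calls (such as click_at or type_text_at) that the client executes in a browser before looping back with a fresh screenshot. The sketch below is a minimal illustration using the google-genai Python SDK; the model name, tool types, and screenshot handling follow Google's preview documentation and should be treated as assumptions, not a definitive integration.

```python
# Minimal sketch of one turn with the Gemini 2.5 Computer Use preview.
# Assumes the google-genai SDK and the preview model/tool names from Google's
# docs; screenshot capture and action execution are the caller's responsibility.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER
        )
    )]
)

def one_turn(goal: str, screenshot_png: bytes):
    """Send the user goal plus the current browser screenshot; return proposed actions."""
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # preview name; may change
        contents=[types.Content(role="user", parts=[
            types.Part(text=goal),
            types.Part.from_bytes(data=screenshot_png, mime_type="image/png"),
        ])],
        config=config,
    )
    # The model answers with function calls such as click_at, type_text_at,
    # or scroll_document; the client executes them, takes a new screenshot,
    # and repeats until the task is done.
    return [part.function_call
            for part in response.candidates[0].content.parts
            if part.function_call]
```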
Designed for practical applications
In demonstration videos released by Google, Gemini 2.5 Computer Use is shown tackling tasks like organising digital sticky notes on a collaborative board—dragging and dropping them into user-defined categories based on natural language prompts. While the demos are sped up, they highlight the potential for the model to complete UI-driven tasks without manual intervention.
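The client-side loop is what actually carries out each action the model proposes. As a hypothetical illustration of the drag-and-drop behaviour seen in the demos, a handler might translate the model's drag_and_drop call into Playwright mouse events. The coordinate scaling below assumes the normalised 0-999 grid described in Google's preview docs, and the argument names are assumptions for illustration.

```python
# Hypothetical handler for a model-proposed drag_and_drop action, executed
# with Playwright's sync API. Assumes normalised 0-999 model coordinates;
# argument names (x, y, destination_x, destination_y) are illustrative.
from playwright.sync_api import Page

def handle_drag_and_drop(page: Page, args: dict, width: int, height: int) -> None:
    # Convert the model's normalised coordinates to real screen pixels.
    x1 = int(args["x"] / 1000 * width)
    y1 = int(args["y"] / 1000 * height)
    x2 = int(args["destination_x"] / 1000 * width)
    y2 = int(args["destination_y"] / 1000 * height)

    page.mouse.move(x1, y1)   # grab the sticky note...
    page.mouse.down()
    page.mouse.move(x2, y2)   # ...drag it to its target category...
    page.mouse.up()           # ...and drop it.
```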
Although the system is not yet equipped for full desktop operating system control, its current capabilities already have practical uses. Google reports that internal teams are leveraging the model for UI testing and automation, helping accelerate development workflows.
Laying the foundation for AI agents
Elements of the Gemini 2.5 Computer Use model are already integrated into other Google projects, including AI Mode in Search, Firebase Testing Agent, and Project Mariner, which allows users to assign AI agents to handle tasks like data entry, research, and planning through natural language commands.
Google says the latest release marks a key step toward more intelligent, agentic AI systems capable of real-world task execution with minimal human input.