
Researchers at Microsoft have unveiled a new AI model, the Large Action Model (LAM), designed to autonomously operate Windows programs. While large language models (LLMs) have advanced AI by powering text-based applications such as chatbots, text generation, and code writing, they struggle to execute actions in real-world environments. LAMs aim to bridge this gap by turning AI systems from text processors into action performers.
Unlike traditional AI models that focus solely on understanding and generating text, LAMs can translate user commands into actionable steps, such as opening applications or controlling devices. This marks a significant leap, as LAMs are the first AI models specifically trained to interact with Microsoft Office applications. The development follows growing interest in AI's ability to perform actions, highlighted by early 2024 demonstrations of Rabbit’s AI device, which could interact with mobile apps autonomously.
LAMs can process a variety of inputs, including text, voice, and images, and translate them into detailed, executable plans. They can also adjust their approach based on real-time feedback so that tasks are completed effectively. For example, rather than receiving only written instructions for building a PowerPoint presentation, a user can have the AI open the program, create the slides, and format them as needed, which makes the system more practical and efficient.
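The plan-then-adjust behavior described above can be illustrated with a minimal sketch. It is not Microsoft's implementation: the function names `run_plan` and `fake_execute`, the retry limit, and the simulated failure are all assumptions made for illustration, and no real Windows or Office API is called.

```python
from typing import Callable

# Hypothetical types: a step is a plain string, and the executor reports
# success or failure as a bool (the "real-time feedback" in this sketch).
Executor = Callable[[str, int], bool]


def run_plan(steps: list[str], execute: Executor, max_retries: int = 2):
    """Run steps in order, retrying a step when feedback reports failure.

    Returns (completed, log), where log holds (step, attempt, ok) tuples.
    """
    log = []
    for step in steps:
        # One initial attempt plus up to max_retries retries.
        for attempt in range(1, max_retries + 2):
            ok = execute(step, attempt)
            log.append((step, attempt, ok))
            if ok:
                break
        else:
            # Every attempt failed, so stop and report the task as incomplete.
            return False, log
    return True, log


def fake_execute(step: str, attempt: int) -> bool:
    """Simulated environment: 'create slide' fails on its first attempt."""
    return not (step == "create slide" and attempt == 1)


done, log = run_plan(
    ["open PowerPoint", "create slide", "format slide"], fake_execute
)
for entry in log:
    print(entry)
print("completed:", done)
```

In this sketch the feedback is only a success flag; a real agent would presumably inspect the application state and could revise the remaining plan rather than simply retry the same step.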
Developing a LAM requires a complex, multi-stage process. The models are trained using two types of data: task-plan data, which outlines high-level steps, and task-action data, which details the actions needed to complete those tasks. Training methods include supervised fine-tuning, reinforcement learning, and imitation learning. LAMs are thoroughly tested in controlled settings and integrated with systems like Windows GUI agents to ensure compatibility. Final testing is conducted in live scenarios to assess adaptability and performance.
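To make the two kinds of training data concrete, here is a small sketch of what a task-plan record and a task-action record might look like, and how each could be turned into a prompt/target pair for supervised fine-tuning. The schema, field names, and `to_sft_pair` helper are hypothetical; the article does not specify the actual data format.

```python
from dataclasses import dataclass


@dataclass
class TaskPlanExample:
    """High-level steps for a task (hypothetical schema)."""
    task: str
    plan: list[str]


@dataclass
class TaskActionExample:
    """Concrete UI actions that carry out a task (hypothetical schema)."""
    task: str
    actions: list[dict]  # each dict has "operation" and "control" keys


def to_sft_pair(example) -> tuple[str, str]:
    """Turn a record into a (prompt, target) pair for supervised fine-tuning."""
    prompt = f"Task: {example.task}"
    if isinstance(example, TaskPlanExample):
        # Target is a numbered list of high-level steps.
        target = "\n".join(f"{i}. {s}" for i, s in enumerate(example.plan, 1))
    else:
        # Target is one line per action, written as operation(control).
        target = "\n".join(
            f"{a['operation']}({a['control']})" for a in example.actions
        )
    return prompt, target


plan_example = TaskPlanExample(
    task="Make the title bold in a document",
    plan=["Select the title text", "Apply bold formatting"],
)
action_example = TaskActionExample(
    task=plan_example.task,
    actions=[
        {"operation": "select", "control": "Title text"},
        {"operation": "click", "control": "Bold button"},
    ],
)

print(to_sft_pair(plan_example))
print(to_sft_pair(action_example))
```

The same task appears in both records on purpose: the plan record teaches what to do at a high level, while the action record ties those steps to specific controls in the interface.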
LAMs have the potential to revolutionize industries by automating workflows and assisting people with disabilities. As this technology matures, it could become a standard tool for enhancing productivity across various sectors.