🛰️ Agent Radar (2026-02-28) | Agent competition shifts toward “cost efficiency + tool executability”
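Overview
Today's issue of Agent Radar points to a clear common trend, visible in three places:
- “Able to chat” is no longer enough; agents must be “able to execute”: the major vendors all emphasize agents that carry out tasks across applications and interfaces.
- The performance narrative is shifting to “output per unit cost”: vendors compete not only on capability, but also on how many tasks can be completed per unit of inference cost.
- The ecosystem layer is starting to standardize: protocols such as MCP are turning “tool integration” from one-off engineering into reusable infrastructure.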
Key items
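1) Alibaba Qwen3.5: agentic capability and cost efficiency together (Reuters)
- Reuters reports that Alibaba released Qwen3.5 on February 16 and positioned the model for agentic AI scenarios.
- The two most important claims in Alibaba's messaging, both relative to the immediate predecessor:
  - 60% lower cost to use
  - 8x better at processing large workloads
- Alibaba also highlighted “visual agentic capabilities”: the ability to take actions across mobile and desktop apps.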
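Interpretation: This is not just a model version bump. It is a clear signal that competition among Chinese large models has become a contest over “executable agents + cost efficiency.” Enterprise buyers will increasingly focus on:
- Cost per completed task (rather than benchmarks alone)
- Agent stability in real business workflows (cross-system, multi-step, with rollback)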
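As a rough illustration of why cost per completed task can rank models differently from price per call, here is a minimal sketch. The `RunStats` fields and all of the numbers are hypothetical; they are not taken from Alibaba's or Reuters' figures.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results for one agent on one workflow (all figures hypothetical)."""
    tasks_attempted: int
    tasks_succeeded: int
    inference_cost: float      # total model spend across all attempts
    human_review_cost: float   # cost of people fixing or redoing agent output


def cost_per_completed_task(stats: RunStats) -> float:
    """Total spend divided by successful tasks, so failed attempts still count as cost."""
    if stats.tasks_succeeded == 0:
        return float("inf")
    return (stats.inference_cost + stats.human_review_cost) / stats.tasks_succeeded


# Hypothetical comparison: a pricier model that succeeds more often versus
# a model that is 60% cheaper on inference but needs more human rework.
baseline = RunStats(tasks_attempted=1000, tasks_succeeded=900,
                    inference_cost=500.0, human_review_cost=100.0)
cheaper = RunStats(tasks_attempted=1000, tasks_succeeded=600,
                   inference_cost=200.0, human_review_cost=400.0)

print(round(cost_per_completed_task(baseline), 3))
print(round(cost_per_completed_task(cheaper), 3))
```

In this made-up scenario the model with the lower inference bill ends up more expensive per completed task, which is the reason to measure at the task level rather than at the token level.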
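2) Anthropic: computer use in public beta, pushing the boundary of agent execution further out
- In its announcement of the upgraded Claude 3.5 Sonnet, Anthropic presents computer use as a core new capability: the model can look at the screen, click, type, and carry out multi-step tasks.
- The post reports gains on several agent and tool-use benchmarks (SWE-bench Verified, TAU-bench, OSWorld) and states plainly that the capability is still experimental and error-prone, so it needs human supervision.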
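Interpretation: The agent industry is moving away from a model in which API function calling is the only main path. UI-level automation is becoming a supplement to it, and possibly the main path. This means:
- Agents can cover legacy systems that have no API
- But they also introduce higher operational risk (mis-operation, privilege overreach, prompt injection)
For enterprise deployment, human-in-the-loop + tiered permissions + audit logs will become the default configuration rather than an option.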
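The combination of human-in-the-loop, tiered permissions, and audit logging can be sketched as a small gate that sits in front of every agent action. This is an illustrative design under stated assumptions, not Anthropic's API: `Risk`, `ActionGate`, and the action names are all hypothetical.

```python
import json
import time
from enum import IntEnum
from typing import Callable, List


class Risk(IntEnum):
    """Hypothetical permission tiers, ordered from least to most dangerous."""
    READ = 0          # e.g. read a page or take a screenshot
    WRITE = 1         # e.g. fill in a form field
    IRREVERSIBLE = 2  # e.g. submit a payment or delete a record


class ActionGate:
    """Runs low-risk actions directly, asks a human for the rest, logs everything."""

    def __init__(self, max_auto_risk: Risk, approve: Callable[[str], bool]):
        self.max_auto_risk = max_auto_risk
        self.approve = approve            # human decision callback (assumed)
        self.audit_log: List[str] = []

    def run(self, name: str, risk: Risk, action: Callable[[], object]):
        needs_human = risk > self.max_auto_risk
        allowed = self.approve(name) if needs_human else True
        # Record the decision before executing, so blocked actions are logged too.
        self.audit_log.append(json.dumps({
            "ts": time.time(), "action": name, "risk": risk.name,
            "needs_human": needs_human, "allowed": allowed,
        }))
        return action() if allowed else None


# Usage: auto-run up to WRITE; here the human reviewer rejects everything else.
gate = ActionGate(max_auto_risk=Risk.WRITE, approve=lambda name: False)
read_result = gate.run("read_invoice", Risk.READ, lambda: "invoice text")
pay_result = gate.run("submit_payment", Risk.IRREVERSIBLE, lambda: "paid")
print(read_result, pay_result, len(gate.audit_log))
```

Logging before execution is a deliberate choice in this sketch: a rejected or crashed action still leaves a trace for later audit.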
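3) MCP (Model Context Protocol): standardization of the tool and data integration layer accelerates
- In its MCP announcement, Anthropic states the goal directly: replace fragmented integrations with a single open protocol.
- The release ships with a specification and SDKs, local MCP server support in the Claude Desktop apps, and an open-source repository of pre-built servers.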
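Interpretation: Agent competition in 2026 is not only about how strong the model is, but also about how quickly it can be connected to enterprise systems. MCP's value lies in turning “connection cost” from a one-off project into a reusable capability, which can significantly shorten the path from PoC to production.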
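To make the “reusable connection layer” idea concrete, the sketch below puts tools behind one shared interface so that different models can use the same connectors. It only illustrates the design principle and does not use the MCP SDK or wire format; `Tool`, `ToolRegistry`, `TicketLookup`, and both toy “models” are hypothetical names.

```python
from typing import Callable, Dict, Protocol, Tuple


class Tool(Protocol):
    """The one interface every connector implements (hypothetical)."""
    name: str

    def call(self, arguments: dict) -> str: ...


class TicketLookup:
    """A toy connector standing in for a real ticketing-system integration."""
    name = "ticket_lookup"

    def call(self, arguments: dict) -> str:
        return f"ticket {arguments['id']}: open"


class ToolRegistry:
    """The connection layer: built once, shared by every model."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, name: str, arguments: dict) -> str:
        return self._tools[name].call(arguments)


# A "model" here is any function that turns a request into a tool choice.
Model = Callable[[str], Tuple[str, dict]]


def model_a(request: str) -> Tuple[str, dict]:
    return "ticket_lookup", {"id": request.split()[-1]}


def model_b(request: str) -> Tuple[str, dict]:
    return "ticket_lookup", {"id": request.rsplit(" ", 1)[-1]}


def run(model: Model, registry: ToolRegistry, request: str) -> str:
    tool_name, arguments = model(request)
    return registry.dispatch(tool_name, arguments)


registry = ToolRegistry()
registry.register(TicketLookup())
print(run(model_a, registry, "status of 42"))
print(run(model_b, registry, "status of 42"))
```

Swapping `model_a` for `model_b` requires no change to the registry or the connector, which is the property a standard protocol is meant to provide across vendors.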
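Takeaways
- Short term (1-3 months): pilot agents on one or two high-value, low-risk workflows first (e.g. retrieval, summarization, ticket triage, report drafting).
- Medium term (one quarter): evaluate the “model layer” and the “connection layer” separately: models should be swappable, while the integration layer should be standardized.
- Key KPIs: task success rate, human takeover rate, all-in cost per task, and time to roll back after an anomaly.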
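The four KPIs above can be computed from a simple per-task log. The following is a minimal sketch; the `TaskRecord` fields, the metric names, and the sample data are assumptions made for illustration, and a real pilot would define them against its own logging schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    """One agent task from a pilot run (hypothetical schema)."""
    succeeded: bool
    human_takeover: bool
    cost: float                                # inference plus human time, per task
    rollback_seconds: Optional[float] = None   # set only if a rollback happened


def pilot_kpis(records: List[TaskRecord]) -> Dict[str, Optional[float]]:
    """Compute the four pilot KPIs over a non-empty list of task records."""
    if not records:
        raise ValueError("need at least one task record")
    n = len(records)
    rollbacks = [r.rollback_seconds for r in records
                 if r.rollback_seconds is not None]
    return {
        "task_success_rate": sum(r.succeeded for r in records) / n,
        "human_takeover_rate": sum(r.human_takeover for r in records) / n,
        "cost_per_task": sum(r.cost for r in records) / n,
        "mean_rollback_seconds": mean(rollbacks) if rollbacks else None,
    }


# Made-up sample data for a four-task pilot.
records = [
    TaskRecord(succeeded=True, human_takeover=False, cost=0.10),
    TaskRecord(succeeded=True, human_takeover=True, cost=0.30),
    TaskRecord(succeeded=False, human_takeover=True, cost=0.20,
               rollback_seconds=90.0),
    TaskRecord(succeeded=True, human_takeover=False, cost=0.20),
]
kpis = pilot_kpis(records)
print(kpis)
```

Here cost is averaged over all attempted tasks; dividing by successful tasks only would give a stricter variant of the same metric.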
Source archive (full text / transcript)
Source 1: Reuters (Qwen3.5)
BEIJING, Feb 16 (Reuters) - Alibaba on Monday unveiled a new artificial intelligence model Qwen 3.5 designed to execute complex tasks independently, with big improvements in performance and cost that the Chinese tech giant claims beat major U.S. rival models on several benchmarks.
The release comes as Alibaba looks to attract more users to its Qwen chatbot app in China, a landscape currently dominated by rival tech giant ByteDance’s Doubao and DeepSeek, which became the first Chinese AI firm to break through globally last year.
Alibaba said Qwen3.5 was 60% cheaper to use and eight times better at processing large workloads than its immediate predecessor, adding that the model also came with the ability to independently take actions across mobile and desktop apps, or what the company calls “visual agentic capabilities”.
“Built for the agentic AI era, Qwen3.5 is designed to help developers and enterprises move faster and do more with the same compute, setting a new benchmark for capability per unit of inference cost,” the company said in a statement.
ByteDance on Saturday released Doubao 2.0, an upgrade to its chatbot app that currently commands the largest user base in China, approaching 200 million. The announcement, like Alibaba’s, also positioned the new model as suited to the AI agent era.
The rollout of Qwen3.5 could help further recent gains Alibaba has made in the cutthroat competition of AI models in China. Earlier this month, the e-commerce giant’s coupon giveaway campaign that encouraged consumers to purchase food and drink directly in the Qwen chatbot led to a seven-fold increase in active users, despite some glitches.
Last year, the e-commerce giant was one of the first of DeepSeek’s competitors to respond to the startup’s viral rise, releasing Qwen 2.5-Max, which it claimed was superior to one of DeepSeek’s hit models.
The company did not mention DeepSeek in its announcement for Qwen3.5, and the several benchmarks it published only show the new model outperforming a previous iteration and rival U.S. models GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro.
DeepSeek is expected to release its new-generation model in the coming days, fueling anticipation among investors and industry insiders given the global tech share selloff the company triggered a year ago.
Reporting by Eduardo Baptista; Editing by Sam Holmes
Source 2: Anthropic (computer use / Claude 3.5 update)
Update (12/03/2024): We have revised the pricing for Claude 3.5 Haiku. The model is now priced at $0.80 MTok input / $4 MTok output.
Today, we’re announcing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with particularly significant gains in coding—an area where it already led the field. Claude 3.5 Haiku matches the performance of Claude 3 Opus, our prior largest model, on many evaluations at a similar speed to the previous generation of Haiku.
We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental—at times cumbersome and error-prone.
Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun to explore these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps to complete.
The upgraded Claude 3.5 Sonnet is now available for all users. Starting today, developers can build with the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.
Claude 3.5 Sonnet: Industry-leading software engineering skills
The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%. It also improves performance on TAU-bench from 62.6% to 69.2% in retail and from 36.0% to 46.0% in airline.
Claude 3.5 Haiku: State-of-the-art meets affordability and speed
Claude 3.5 Haiku is the next generation of our fastest model. For a similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus on many intelligence benchmarks.
Teaching Claude to navigate computers, responsibly
With computer use, we’re teaching general computer skills rather than task-specific tools. Developers can use this capability to automate repetitive processes, build and test software, and conduct open-ended tasks like research.
On OSWorld, Claude 3.5 Sonnet scored 14.9% in screenshot-only category, notably better than the next-best AI system’s 7.8%. When afforded more steps, Claude scored 22.0%.
While expected to improve rapidly, Claude’s current ability to use computers is imperfect. Some actions—scrolling, dragging, zooming—present challenges. Anthropic encourages exploration with low-risk tasks.
Because computer use may provide a new vector for threats such as spam, misinformation, or fraud, Anthropic says it is taking a proactive safety approach and has developed new classifiers for monitoring harmful use.
Looking ahead
Learning from initial deployments of this technology, still in early stages, will help better understand both the potential and implications of increasingly capable AI systems.
Source 3: Anthropic (MCP)
Today, we’re open-sourcing the Model Context Protocol (MCP), a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments.
As AI assistants gain mainstream adoption, even the most sophisticated models are constrained by their isolation from data—trapped behind information silos and legacy systems. Every new data source requires its own custom implementation, making truly connected systems difficult to scale.
MCP addresses this challenge by providing a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol.
The Model Context Protocol is an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools. Developers can either expose data through MCP servers or build AI applications (MCP clients) that connect to those servers.
Anthropic introduced three major MCP components:
- The Model Context Protocol specification and SDKs
- Local MCP server support in Claude Desktop apps
- An open-source repository of MCP servers
Anthropic also shared pre-built MCP servers for systems such as Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer.
Early adopters like Block and Apollo have integrated MCP, while development tools companies including Zed, Replit, Codeium, and Sourcegraph are working with MCP to enhance their platforms.
Instead of maintaining separate connectors for each data source, developers can build against a standard protocol. As the ecosystem matures, AI systems can maintain context across tools and datasets.
Getting started
Developers can install pre-built MCP servers through Claude Desktop, follow quickstart to build their first MCP server, and contribute to open-source connector repositories.
An open community
Anthropic says it is committed to building MCP as a collaborative open-source ecosystem and invites feedback from tool developers, enterprises, and early adopters.