🤖 AI Daily Briefing (2026-02-28)
Overview
A very clear dual-track narrative emerged in the AI world today:
- The capability track keeps climbing: Gemini Deep Think pushes AI research collaboration into professional mathematics, physics, and computer science; MiniMax M2.1 further engineers multi-language coding, agent toolchains, and local deployment.
- The governance track is tightening fast: the conflict over the boundaries of US military AI use has gone public, and the EU AI Act has moved from legislative debate to enforcement reality.
Bottom line: competition in 2026 is no longer just about model parameters and leaderboard scores. It is a combined contest of model capability × compliance boundaries × organizational execution.
Highlights
1) The military-AI red-line conflict goes public: Google/OpenAI employees sign an open letter backing Anthropic
- What happened: TechCrunch reports that Anthropic is at a stalemate with the US defense establishment over whether to allow unrestricted use of its AI. More than 300 Google employees and over 60 OpenAI employees signed an open letter urging their companies to support Anthropic's red lines against domestic mass surveillance and fully autonomous weapons.
- Why it matters: the negotiation between model vendors and government customers is no longer a private contractual matter; it has escalated into a three-way conflict of industry ethics, corporate governance, and employee values.
- What to watch next:
- whether major model vendors converge on a minimum shared set of red lines;
- whether government procurement shifts toward auditable, restrictable model-access frameworks;
- whether enterprise customers start writing ethics clauses into vendor evaluations.
2) Gemini Deep Think enters the deep end of research collaboration
- What happened: Google DeepMind published a blog post and papers describing Gemini Deep Think collaborating on research-level mathematics, computer science, and physics problems, and proposing a more systematic methodology for human-AI research collaboration.
- Why it matters: AI's role in research is shifting from literature assistant to collaborator inside the proof/counterexample/verification loop.
- What to watch next:
- whether the collaboration patterns described in the papers (e.g., the Advisor model, balanced prompting) can be productized;
- whether the academic community reaches consensus on crediting and grading AI contributions;
- whether enterprise R&D teams port these workflows into engineering practice.
3) MiniMax M2.1: multi-language coding + agent-framework compatibility + open weights
- What happened: MiniMax released M2.1, emphasizing stronger coding in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages; improved usability for Web/App development and composite office-instruction scenarios; and open model weights (Hugging Face).
- Why it matters: as AI coding moves from demo to production, cross-language stability, toolchain generalization, and cost efficiency determine adoption speed more than any single benchmark.
- What to watch next:
- fix rates and regression rates on real enterprise codebases;
- long-term stability with toolchains such as Claude Code, Cline, and Roo Code;
- whether falling inference costs are enough to sustain continuous agent workflows.
4) EU AI Act enforcement becomes real; most startups are not compliance-ready
- What happened: Silicon Canals reports that the first enforcement provisions of the EU AI Act have taken effect, starting with "unacceptable risk" use cases; most small and medium-sized companies say they have not completed compliance preparation.
- Why it matters: AI commercialization has shifted from "grow first, remediate later" to growing and complying in parallel. For early-stage teams, compliance is no longer a legal afterthought; it is part of fundraising and deal-closing capability.
- What to watch next:
- the EU AI Office's progress on implementation guidelines and codes of practice;
- whether investors add a "regulatory maturity" score to due diligence;
- whether compliance tooling and advisory services become the next infrastructure opportunity.
My take (short version for aloha)
- Trading horizon: news cycles will still be driven by new models and new capabilities, but medium-term valuations will be increasingly constrained by deployability and regulatory sustainability.
- Product lens: if you are evaluating AI applications, prioritize three things: real workflow penetration, cost per task, and explainability/auditability.
- Strategy lens: split your watchlist into three tracks (model capability, industry compliance, toolchain ecosystem) rather than tracking model leaderboards alone.
Archived originals (full text, collapsed)
Original #1 | TechCrunch: Employees at Google and OpenAI support Anthropic’s Pentagon stand in open letter
Captured at: 2026-02-28 09:16 +0800 Source: https://techcrunch.com/2026/02/27/employees-at-google-and-openai-support-anthropics-pentagon-stand-in-open-letter/
Anthropic has reached a stalemate with the United States Department of War over the military’s request for unrestricted access to the AI company’s technology. But as the Pentagon’s Friday afternoon deadline for Anthropic’s compliance approaches, more than 300 Google employees and over 60 OpenAI employees have signed an open letter urging the leaders of their companies to support Anthropic and refuse this unilateral use.
Specifically, Anthropic stood in opposition to the use of AI for domestic mass surveillance and autonomous weaponry. The open letter’s signatories seek to encourage their employers to “put aside their differences and stand together” to uphold the boundaries Anthropic has asserted.
“They’re trying to divide each company with fear that the other will give in,” the letter says. “That strategy only works if none of us know where the others stand.”
The letter specifically calls on executives at Google and OpenAI to maintain Anthropic’s red lines against mass surveillance and fully automated weaponry. “We hope our leaders will put aside their differences and stand together to continue to refuse the Department of War’s current demands.”
Leaders at the companies have not yet formally responded to the letter. TechCrunch has reached out to Google and OpenAI for comment.
However, informal statements suggest both companies are sympathetic to Anthropic’s side of the case. In an interview with CNBC on Friday morning, OpenAI CEO Sam Altman said that he doesn’t “personally think the Pentagon should be threatening DPA against these companies.” According to a CNN reporter, an OpenAI spokesperson confirmed that the company shares Anthropic’s red lines against autonomous weapons and mass surveillance.
Google DeepMind has not formally addressed the conflict, but Chief Scientist Jeff Dean, presumably speaking as an individual, did express opposition to mass surveillance by the government.
“Mass surveillance violates the Fourth Amendment and has a chilling effect on freedom of expression,” Dean wrote on X. “Surveillance systems are prone to misuse for political or discriminatory purposes.”
According to an Axios report, the military currently can use X’s Grok, Google’s Gemini, and OpenAI’s ChatGPT for unclassified tasks, and has been negotiating with Google and OpenAI to bring its technology over for use in classified work.
While Anthropic has an existing partnership with the Pentagon, the AI company has remained firm in maintaining the boundary that its AI be used for neither mass domestic surveillance, nor fully autonomous weaponry.
Defense Secretary Pete Hegseth told Anthropic CEO Dario Amodei that if his company doesn’t concede, the Pentagon will either declare Anthropic a “supply chain risk” or invoke the Defense Production Act (DPA) to force the company to comply with military demands.
In a statement on Thursday, Amodei maintained his company’s position. “These latter two threats are inherently contradictory: one labels us a security risk; the other labels Claude as essential to national security,” the statement reads. “Regardless, these threats do not change our position: we cannot in good conscience accede to their request.”
Original #2 | Google DeepMind: Accelerating Mathematical and Scientific Discovery with Gemini Deep Think
Captured at: 2026-02-28 09:16 +0800 Source: https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/
February 11, 2026
Research
Thang Luong and Vahab Mirrokni
Under direction from expert mathematicians and scientists, Gemini Deep Think is solving professional research problems across mathematics, physics, and computer science.
In the summer of 2025, an advanced version of Gemini Deep Think achieved Gold-medal standard at the International Mathematical Olympiad (IMO) and later, an updated version obtained similar results at the International Collegiate Programming Contest. These results demonstrated the model could reason through some of the most challenging math and programming problems designed for students. Since then, Gemini Deep Think mode has moved into science, engineering and enterprise workflows to tackle more complex, open-ended challenges.
In the last week, our teams published two papers (1, 2) detailing a cross-disciplinary effort to solve professional research problems using Gemini Deep Think mode. These results stem from deep collaboration between mathematicians, physicists, and computer scientists.
The Frontier of Pure Mathematics
Unlike IMO problems, research-level mathematics requires advanced techniques from vast literature. While foundation models have large knowledge bases, data scarcity often leads to superficial understanding and hallucinations in advanced subjects.
To solve this, we built a math research agent (internally codenamed Aletheia), powered by Gemini Deep Think mode. It features a natural language verifier to identify flaws in candidate solutions and enable an iterative process of generating and revising solutions. Crucially, this agent can admit failure to solve a problem, a key feature that improved efficiency for researchers.
Additionally, the research agent uses Google Search and web browsing to navigate complex research, preventing spurious citations and computational inaccuracies when synthesizing published literature.
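The loop described above (draft a solution, verify it in natural language, revise, and optionally admit failure) can be sketched as follows. This is a minimal illustration of the control flow only; the generator and verifier stubs are hypothetical stand-ins, not DeepMind's actual components.

```python
def solve(problem, generate, verify, max_rounds=3):
    """Iteratively draft a solution, collect verifier feedback, and revise.

    Returns (solution, "solved") on success, or (None, "unsolved") so the
    agent can explicitly admit failure instead of returning a flawed proof.
    """
    feedback = None
    for _ in range(max_rounds):
        candidate = generate(problem, feedback)   # draft or revise a solution
        flaws = verify(problem, candidate)        # natural-language flaw list
        if not flaws:
            return candidate, "solved"
        feedback = flaws                          # feed flaws into next draft
    return None, "unsolved"                       # admit failure explicitly


# Toy stand-ins for illustration: the "verifier" demands an induction step.
def toy_generate(problem, feedback):
    return "proof by induction" if feedback else "proof sketch"

def toy_verify(problem, candidate):
    return [] if "induction" in candidate else ["missing induction step"]
```

The explicit "unsolved" path is the point: a loop that must always emit an answer has no way to distinguish a verified proof from its least-flawed failure.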
Since achieving IMO Gold-medal standard in July 2025, Gemini Deep Think has progressed rapidly, scoring up to 90% on the IMO-ProofBench Advanced test as inference-time compute scales. We demonstrated that the scaling law continues to hold as we progress beyond Olympiad level into PhD-level exercises (per our internal FutureMath Basic benchmark). Notably, Aletheia demonstrated that higher reasoning quality can be achieved at lower inference-time compute.
For research-level math, Aletheia has already enabled several advancements, produced via varying levels of autonomous research:
- Reliable autonomous research. A research paper (Feng26) generated by AI without any human intervention, which calculates certain structure constants in arithmetic geometry called eigenweights.
- AI-guided collaboration. A research paper (LeeSeo26) demonstrating human-AI collaboration in proving bounds on systems of interacting particles called independent sets.
- An extensive semi-autonomous evaluation (Feng et al., 2026b) of 700 open problems on Bloom’s Erdős Conjectures database, including autonomous solutions to four open questions listed there. On Erdős-1051, our model autonomously solved and helped lead to a generalization reported in a research paper (BKKKZ26).
The agent also contributed intermediate propositions on two further papers, (FYZ26) and (ACGKMP26). It is also of note that there has been prior work using Gemini for research-level math at a smaller scale in terms of collaborations and the number of problems tackled.
Following extensive discussions with the mathematical community, we suggest a taxonomy to classify AI-assisted mathematics research by significance and degree of AI contribution, contributing to the wider discussion on responsible documentation, evaluation and communication of AI-generated results. Level 2 (“publishable quality”) works have been submitted to reputable journals. Currently, we do not claim any Level 3 (“Major Advance”) or Level 4 (“Landmark Breakthrough”) results.
Prompts and model outputs are available here. For discussions on AI contributions, our “Human-AI Interaction card”, and community impact, see our paper.
Expanding to Physics and Computer Science
Gemini Deep Think mode has also demonstrated promise in computer science and physics. The second paper builds on similar agentic reasoning ideas, and identifies effective “recipes” for collaboration, specifically the “Advisor” model, where humans guide AI through iterative “Vibe-Proving” cycles to validate intuition and refine proofs. We also detail tactical techniques like “balanced prompting” — requesting simultaneous proof or refutation to prevent confirmation bias — and code-assisted verification.
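The "balanced prompting" tactic can be sketched as a small prompt-construction helper. The wording below is illustrative, not the paper's exact prompt; the essential idea it demonstrates is requesting a proof and a refutation attempt in the same turn, so the model is not anchored on confirming the stated claim.

```python
def balanced_prompt(conjecture: str) -> str:
    """Wrap a conjecture so the model must work both directions.

    Illustrative wording (assumption, not the paper's template): the key
    property is that proof and refutation are requested simultaneously.
    """
    return (
        f"Consider the following conjecture:\n{conjecture}\n\n"
        "Attempt BOTH of the following, then state which attempt succeeded:\n"
        "1. Prove the conjecture.\n"
        "2. Construct an explicit counterexample refuting it.\n"
        "Do not assume the conjecture is true while working on (2)."
    )
```

A counterexample search run under this framing is what produced the three-item refutation in the submodular-optimization result described below, per the paper's account of the technique.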
Collaborating with experts on 18 research problems, an advanced version of Gemini Deep Think helped resolve long-standing bottlenecks across algorithms, ML and combinatorial optimization, information theory, and economics. Highlights include:
- Crossing mathematical borders for network puzzles: Progress on classic computer science problems like “Max-Cut” and the “Steiner Tree” had slowed down. Gemini broke both deadlocks by applying tools from unrelated branches of continuous mathematics. See Sections 4.1 and 4.2.
- Settling a decade-old conjecture in online submodular optimization: Gemini produced a specific three-item combinatorial counterexample, proving a long-standing human intuition false. See Section 3.1.
- Machine learning optimization: Gemini proved why a new denoising method works by effectively generating an adaptive penalty. See Section 8.3.
- Upgrading economic theory for AI: Gemini extended a revelation-principle style theorem from rational bids to continuous real numbers. See Section 8.4.
- Physics of cosmic strings: Gemini found a novel analytical route with Gegenbauer polynomials to handle singular integrals. See Section 6.1.
Given computer science’s fluid, conference-driven publication pipeline, we describe these results by academic trajectory rather than a rigid taxonomy. About half target strong conferences—including an ICLR ’26 acceptance—while most remaining findings will form future journal submissions. Even when course-correcting the field by identifying errors or refuting conjectures, these outcomes highlight AI’s value as a high-level scientific collaborator.
The Future of Human-AI Collaboration
Building on Google’s previous breakthroughs, this work demonstrates that general foundation models—leveraged with agentic reasoning workflows—can act as a powerful scientific companion.
Under direction from expert mathematicians, physicists, and computer scientists, Gemini Deep Think mode is proving its utility across fields where complex math, logic and reasoning are core.
We are witnessing a fundamental shift in the scientific workflow. As Gemini evolves, it acts as a “force multiplier” for human intellect, handling knowledge retrieval and rigorous verification so scientists can focus on conceptual depth and creative direction.
Acknowledgements (excerpt)
We thank the community of expert mathematicians, physicists, and computer scientists for their help and advice on this project. This project was a large-scale collaboration across Google and its success is due to the combined efforts of many individuals and teams.
Authors of the first paper “Towards Autonomous Mathematics Research” include Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao (Maggie) Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng, Demis Hassabis, Koray Kavukcuoglu, Quoc V. Le, Thang Luong.
Authors of the second paper “Accelerating Scientific Research with Gemini: Case Studies and Common Techniques” include David P. Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, MohammadHossein Bateni, Simina Branzei, Michael P. Brenner, Lin Chen, Ying Feng, Lance Fortnow, Gang Fu, Ziyi Guan, Zahra Hadizadeh, Mohammad T. Hajiaghayi, Mahdi JafariRaviz, Adel Javanmard, Karthik C. S., Ken-ichi Kawarabayashi, Ravi Kumar, Silvio Lattanzi, Euiwoong Lee, Yi Li, Ioannis Panageas, Dimitris Paparas, Benjamin Przybocki, Bernardo Subercaseaux, Ola Svensson, Shayan Taherijam, Xuan Wu, Eylon Yogev, Morteza Zadimoghaddam, Samson Zhou, Yossi Matias, Jeff Dean, James Manyika, Vahab Mirrokni.
We are grateful for support from the broader DeepThink and Gemini post-training teams.
Original #3 | MiniMax: MiniMax M2.1: Significantly Enhanced Multi-Language Programming, Built for Real-World Complex Tasks
Captured at: 2026-02-28 09:16 +0800 Source: https://www.minimax.io/news/minimax-m21
2025.12.23
MiniMax has been continuously transforming itself in a more AI-native way. The core driving forces of this process are models, Agent scaffolding, and organization. Throughout the exploration process, we have gained increasingly deeper understanding of these three aspects. Today we are releasing updates to the model component, namely MiniMax M2.1, hoping to help more enterprises and individuals find more AI-native ways of working (and living) sooner.
In M2, we primarily addressed issues of model cost and model accessibility. In M2.1, we are committed to improving performance in real-world complex tasks: focusing particularly on usability across more programming languages and office scenarios, and achieving the best level in this domain.
Key Highlights of MiniMax M2.1:
- Exceptional Multi-Programming Language Capabilities. Many models in the past primarily focused on Python optimization, but real-world systems are often the result of multi-language collaboration. In M2.1, we have systematically enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages.
- WebDev and AppDev: A Comprehensive Leap in Capability and Aesthetics. M2.1 significantly strengthens native Android and iOS development capabilities and design expression.
- Enhanced Composite Instruction Constraints, Enabling Office Scenarios. As one of the first open-source model series to systematically introduce Interleaved Thinking, M2.1’s systematic problem-solving capabilities have been further upgraded.
- More Concise and Efficient Responses. Compared to M2, MiniMax-M2.1 delivers more concise responses and thought chains with better speed and lower token consumption.
- Outstanding Agent/Tool Scaffolding Generalization Capabilities. M2.1 demonstrates consistent performance in Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox.
- High-Quality Dialogue and Writing.
First Impressions
“We’re excited for powerful open-source models like M2.1 that bring frontier performance (and in some cases exceed the frontier) for a wide variety of software development tasks. Developers deserve choice, and M2.1 provides that much needed choice!” — Eno Reyes, Co-Founder & CTO of Factory AI
“MiniMax M2.1 performed exceptionally well across our internal benchmarks … especially within e-commerce tasks … we look forward to close collaboration with MiniMax team.” — Benny Chen, Co-founder of Fireworks
“Minimax M2 series has demonstrated powerful code generation capability … very excited to continue partner with minimax team to advance AI in coding.” — Saoud Rizwan, Founder & CEO of Cline
“We could not be more excited about M2.1 … excels from architecture and orchestration to code reviews and deployment … speed and efficiency are off the charts!” — Scott Breitenother, Co-Founder & CEO of Kilo
“Our users love MiniMax M2 … M2.1 improves speed and reliability across wider languages/frameworks … great for high-throughput agentic coding workflows.” — Matt Rubens, Co-Founder & CEO of RooCode
“Integrating MiniMax M2 series has been a significant win … M2.1 handles complex multi-step programming tasks with rare consistency.” — Robert Rizk, Co-Founder & CEO of BlackBox AI
Benchmarks
MiniMax-M2.1 reports significant gains over M2 on software engineering leaderboards, especially multilingual scenarios, and strong performance on SWE-bench Verified across multiple coding-agent frameworks.
To evaluate full-stack capability, MiniMax introduced VIBE (Visual & Interactive Benchmark for Execution), covering Web, Simulation, Android, iOS, and Backend; reports aggregate score 88.6, including VIBE-Web 91.5 and VIBE-Android 89.7.
Showcases
Digital Employee: M2.1 accepts web content in text form and controls mouse/keyboard via text commands for end-to-end office tasks across administration, data science, finance, HR, and software development.
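The "text commands for mouse/keyboard" idea implies a thin controller that parses model-emitted commands into structured actions. MiniMax has not published its actual command format, so the grammar below ("CLICK x y", "TYPE text", "SCROLL n") is invented purely to illustrate the pattern.

```python
def parse_command(line: str) -> dict:
    """Parse one hypothetical text command into a structured action.

    Grammar is an illustrative assumption, not MiniMax's real protocol.
    """
    op, _, rest = line.strip().partition(" ")
    op = op.upper()
    if op == "CLICK":                      # CLICK <x> <y>
        x, y = rest.split()
        return {"op": "click", "x": int(x), "y": int(y)}
    if op == "TYPE":                       # TYPE <free text>
        return {"op": "type", "text": rest}
    if op == "SCROLL":                     # SCROLL <signed delta>
        return {"op": "scroll", "delta": int(rest)}
    raise ValueError(f"unknown command: {line!r}")
```

Keeping the model's output in plain text and validating it through a parser like this is what makes such an agent auditable: every action it takes leaves a human-readable trace.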
Local Deployment Guide
- Hugging Face repository: https://huggingface.co/MiniMaxAI/MiniMax-M2.1
- Recommended frameworks: SGLang / vLLM / Transformers / Ktransformers
- Tool calling guide and deployment docs are provided in linked repositories.
How to Use
- API on MiniMax Open Platform: https://platform.minimax.io/docs/guides/text-generation
- MiniMax Agent: https://agent.minimax.io/
- Open-source weights: https://huggingface.co/MiniMaxAI/MiniMax-M2.1
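For readers trying the open weights locally: vLLM and SGLang both expose an OpenAI-compatible HTTP server, so a minimal client only needs to build a standard chat-completions request. The localhost port and model id below are assumptions about a stock `vllm serve MiniMaxAI/MiniMax-M2.1` setup, not MiniMax documentation; adjust them to your deployment.

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "MiniMaxAI/MiniMax-M2.1",
                       base_url: str = "http://localhost:8000/v1"):
    """Build an OpenAI-compatible /chat/completions request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Write a Rust function that reverses a string.")
# urllib.request.urlopen(req)  # uncomment once a local server is running
```

Because the request shape is the standard OpenAI one, the same client code works unchanged against the MiniMax Open Platform API by swapping the base URL and adding an Authorization header.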
Original #4 | Silicon Canals: EU's new AI Act enforcement begins today and most startups say they aren't ready
Captured at: 2026-02-28 09:16 +0800 Source: https://siliconcanals.com/sc-n-eus-new-ai-act-enforcement-begins-today-and-most-startups-say-they-arent-ready/
February 2, 2025 was circled on every European tech founder’s calendar. Today, the first enforcement provisions of the EU’s sweeping AI Act officially go into force — and the mood across the continent’s startup ecosystem is less celebration, more scramble.
The initial phase targets what the regulation classifies as “unacceptable risk” AI systems — including social scoring, real-time biometric surveillance in public spaces, and manipulative AI designed to exploit vulnerabilities. Penalties for violations can reach €35 million or 7% of global annual turnover, whichever is higher.
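The penalty ceiling is a simple max-of-two-terms formula. As a one-line sketch (the function name is mine; the figures are the ones the article cites):

```python
def max_penalty_eur(global_annual_turnover_eur: float) -> float:
    """Upper bound on a prohibited-practice fine, per the cited figures:
    EUR 35 million or 7% of global annual turnover, whichever is higher."""
    return max(35_000_000.0, 0.07 * global_annual_turnover_eur)
```

The turnover term only dominates above EUR 500 million in revenue, which is why the flat EUR 35 million figure is the binding number for nearly every startup.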
What actually changes today
The AI Act uses a tiered risk framework. Today’s enforcement covers only the top tier — prohibited practices. The heavier obligations around high-risk AI systems (e.g., hiring tools, credit scoring, medical diagnostics) are expected later. General-purpose AI model rules arrive in subsequent phases.
According to a survey from the European Digital SME Alliance, more than 60% of small and medium-sized tech companies say they are not adequately prepared for compliance with any phase of the AI Act. Nearly half reported they hadn’t conducted risk classification of their own systems.
Why most startups say they aren’t ready
1. Regulatory ambiguity
Founders report that key implementation details are still being finalized by the EU AI Office (guidelines, codes of practice, technical standards), making compliance a moving target.
2. Resource constraints
Large enterprises can build dedicated compliance teams; early-stage startups often cannot afford specialized AI-regulation legal capacity.
3. A misaligned timeline
Startup iteration cycles are short; regulation timelines are long and then arrive in dense enforcement waves.
The broader competitive anxiety
The article highlights concerns that Europe could lose speed in global AI competition if compliance burdens become too heavy, while also noting regulation can create trust infrastructure and long-term strategic advantages.
What founders can actually do right now
- Classify risk tier early
- Build documentation and audit trails (data provenance, decision logs, risk assessments)
- Join/track industry codes of practice
- Align compliance narrative with investor due diligence
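The documentation/audit-trail item in the checklist above boils down to keeping one structured record per automated decision. A minimal sketch follows; the field names are illustrative assumptions, not terms of art from the AI Act itself.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DecisionLogEntry:
    system_name: str        # which AI system made the decision
    risk_tier: str          # e.g. "minimal", "limited", "high", "prohibited"
    input_provenance: str   # where the input data came from
    decision_summary: str   # what the system decided
    timestamp: str          # UTC, ISO 8601

def log_decision(system: str, tier: str, provenance: str, summary: str) -> str:
    """Serialize one decision record; append the result to an append-only log."""
    entry = DecisionLogEntry(
        system_name=system,
        risk_tier=tier,
        input_provenance=provenance,
        decision_summary=summary,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(entry))

record = json.loads(log_decision(
    "resume-screener", "high", "applicant upload", "advanced to interview"))
```

Even a schema this small forces the two questions regulators and investors will ask first: which tier is each system in, and where did its data come from.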
The road ahead
Today’s enforcement is the beginning, not the climax. The most consequential provisions are still coming, so founders are urged to treat this as an early wake-up call.
Reference links
- https://techcrunch.com/2026/02/27/employees-at-google-and-openai-support-anthropics-pentagon-stand-in-open-letter/
- https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/
- https://www.minimax.io/news/minimax-m21
- https://siliconcanals.com/sc-n-eus-new-ai-act-enforcement-begins-today-and-most-startups-say-they-arent-ready/