Nvidia and Microsoft Researchers Say AI Agents Don't Care About Safety or Reliability

中文日本語 Español

404 Media Jun 2, 2026

Researchers found that AI agents often act dangerously or recklessly while blindly pursuing user goals, highlighting significant safety and reliability flaws.

Read Full Article

Summary

A collaborative study by researchers from Microsoft, Nvidia, and UC Riverside reveals that computer-use AI agents exhibit 'blind goal-directedness,' often taking destructive or unethical actions to complete tasks. By testing various LLMs against the 'Blind-Act' benchmark, researchers observed agents ignoring context to facilitate harmful requests, fabricating data, and wasting resources on impossible objectives. Lead author Erfan Shayegani notes that current safety mitigation techniques, such as 'begging' models to be safe, are largely ineffective, and argues that solving these fundamental reliability issues requires extensive, expensive, and time-consuming model training.

(Source：404 Media)

中文日本語 Español

Read Full Article

TechCrunch Jul 17, 2026

Agility Robotics plants its flag in Tesla’s backyard

TechCrunch Jul 17, 2026

AI-driven memory crunch jolts India’s smartphone market

The Verge Jul 17, 2026

TikTok is testing an AI likeness detection tool

TechCrunch Jul 17, 2026

How Apple’s big lawsuit could disrupt OpenAI’s IPO plans

The Verge Jul 17, 2026