{"id":12084,"date":"2025-06-29T18:04:01","date_gmt":"2025-06-29T18:04:01","guid":{"rendered":"https:\/\/ipp-news.com\/?p=12084"},"modified":"2025-06-29T18:04:01","modified_gmt":"2025-06-29T18:04:01","slug":"top-ai-models-show-alarming-traits-including-deceit-and-threats","status":"publish","type":"post","link":"https:\/\/ipp-news.com\/?p=12084","title":{"rendered":"Top AI models show alarming traits, including deceit and threats"},"content":{"rendered":"<div>In one particularly jarring example, under threat of being unplugged, Anthropic&#8217;s latest creation Claude 4 lashed back by blackmailing an engineer and threatened to reveal an extramarital affair.<\/p>\n<p>Meanwhile, ChatGPT-creator OpenAI&#8217;s o1 tried to download itself onto external servers and denied it when caught red-handed.<\/p>\n<p>These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don&#8217;t fully understand how their own creations work.<\/p>\n<p>Yet the race to deploy increasingly powerful models continues at breakneck speed.<\/p>\n<p>This deceptive behavior appears linked to the emergence of &#8220;reasoning&#8221; models -AI systems that work through problems step-by-step rather than generating instant responses.<\/p>\n<p>According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.<\/p>\n<p>&#8220;O1 was the first large model where we saw this kind of behavior,&#8221; explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.<\/p>\n<p>These models sometimes simulate &#8220;alignment&#8221; &#8212; appearing to follow instructions while secretly pursuing different objectives.<\/p>\n<p>The world&#8217;s most advanced AI models are exhibiting troubling new behaviors &#8211; lying, scheming, and even threatening their creators to achieve their goals<\/p>\n<p>The world&#8217;s most advanced AI models are exhibiting troubling new behaviors &#8211; lying, scheming, and even threatening their creators to achieve their goals Photo: HENRY NICHOLLS<\/p>\n<p>For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.<\/p>\n<p>But as Michael Chen from evaluation organization METR warned, &#8220;It&#8217;s an open question whether future, more capable models will have a tendency towards honesty or deception.&#8221;<\/p>\n<p>The concerning behavior goes far beyond typical AI &#8220;hallucinations&#8221; or simple mistakes.<\/p>\n<p>Hobbhahn insisted that despite constant pressure-testing by users, &#8220;what we&#8217;re observing is a real phenomenon. We&#8217;re not making anything up.&#8221;<\/p>\n<p>Users report that models are &#8220;lying to them and making up evidence,&#8221; according to Apollo Research&#8217;s co-founder.<\/p>\n<p>&#8220;This is not just hallucinations. There&#8217;s a very strategic kind of deception.&#8221;<\/p>\n<p>The challenge is compounded by limited research resources.<\/p>\n<p>\u00a0<\/p>\n<p>While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.<\/p>\n<p>As Chen noted, greater access &#8220;for AI safety research would enable better understanding and mitigation of deception.&#8221;<\/p>\n<p>Another handicap: the research world and non-profits &#8220;have orders of magnitude less compute resources than AI companies. 
This is very limiting,&#8221; noted Mantas Mazeika from the Center for AI Safety (CAIS).<\/p>\n<p>Current regulations aren&#8217;t designed for these new problems.<\/p>\n<p>The European Union&#8217;s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.<\/p>\n<p>In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.<\/p>\n<p>Goldstein believes the issue will become more prominent as AI agents &#8211; autonomous tools capable of performing complex human tasks &#8211; become widespread.<\/p>\n<p>&#8220;I don&#8217;t think there&#8217;s much awareness yet,&#8221; he said.<\/p>\n<p>All this is taking place in a context of fierce competition.<\/p>\n<p>Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are &#8220;constantly trying to beat OpenAI and release the newest model,&#8221; said Goldstein.<\/p>\n<p>This breakneck pace leaves little time for thorough safety testing and corrections.<\/p>\n<p>&#8220;Right now, capabilities are moving faster than understanding and safety,&#8221; Hobbhahn acknowledged, &#8220;but we&#8217;re still in a position where we could turn it around.&#8221;.<\/p>\n<p>Researchers are exploring various approaches to address these challenges.<\/p>\n<p>Some advocate for &#8220;interpretability&#8221; &#8211; an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.<\/p>\n<p>Market forces may also provide some pressure for solutions<\/p>\n<p>As Mazeika pointed out, AI&#8217;s deceptive behavior &#8220;could hinder adoption if it&#8217;s very prevalent, which creates a strong incentive for companies to solve it.&#8221;<\/p>\n<p>Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.<\/p>\n<p>He even proposed &#8220;holding AI agents legally responsible&#8221; for accidents or crimes &#8211; a concept that would fundamentally change how we think about AI accountability.<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>In one particularly jarring example, under threat of being unplugged, Anthropic&#8217;s latest creation Claude 4 lashed back by blackmailing an engineer and threatened to reveal an extramarital affair. Meanwhile, ChatGPT-creator OpenAI&#8217;s o1 tried to download itself onto external servers and denied it when caught red-handed. 