{"id":15708,"date":"2025-03-22T00:08:12","date_gmt":"2025-03-21T23:08:12","guid":{"rendered":"http:\/\/plus.maciejpiasecki.info\/index.php\/2025\/03\/22\/ai-as-a-weapon-how-current-shields-could-jeopardize-security\/"},"modified":"2025-03-22T21:04:13","modified_gmt":"2025-03-22T20:04:13","slug":"ai-as-a-weapon-how-current-shields-could-jeopardize-security","status":"publish","type":"post","link":"https:\/\/plus.maciejpiasecki.info\/index.php\/2025\/03\/22\/ai-as-a-weapon-how-current-shields-could-jeopardize-security\/","title":{"rendered":"AI As A Weapon: How Current Shields Could Jeopardize Security"},"content":{"rendered":"<p>The artificial intelligence revolution is here to stay. AI-based developments have become the undisputed foundation for future and current developments that will impact every field in the tech industry\u2014and beyond. The democratization of AI, driven by OpenAI, has put powerful tools in the hands of millions of people. That said, it\u2019s possible that current AI platform security standards won\u2019t be sufficient to prevent bad actors from using them as a potential weapon.<br \/>\nPotential attackers look for AI to generate harmful prompts<br \/>\nDevelopers train their AI platforms with virtually all the data they find available on the internet. This has led to several copyright-related controversies and lawsuits, but that\u2019s not the subject of this article. Their goal is to ensure that chatbots are capable of responding to almost any imaginable requirement in the most reliable way. But have developers considered the potential risks? Have they implemented security shields against potentially harmful outputs?<br \/>\nThe simple answer might be \u201cyes,\u201d but like everything related to AI development, there\u2019s a lot to consider. AI-focused companies have security shields against so-called \u201charmful prompts.\u201d Harmful prompts are requests that, basically, seek to generate potentially harmful outputs, in one way or another. These requests range from tips on how to build a homemade weapon to generating malicious code (malware), among countless other possible situations.<\/p>\n<p>You might think it\u2019s easy for these companies to set up effective shields against these types of situations. After all, it would just be enough to block certain keywords, just like the moderation systems of social media platforms do, right? Well, it\u2019s not that simple.<br \/>\nJailbreaking: Tricking AI to get what you want<br \/>\n\u201cJailbreaking\u201d isn\u2019t exactly a new term. Longtime iPhone fans will know it as the practice of \u201cbreaking free\u201d their devices to allow the installation of unauthorized software or mods, for example. However, the term \u201cjailbreaking\u201d in the AI \u200b\u200bsegment has quite different implications. Jailbreaking an AI means tricking it into responding to a potentially malicious prompt, bypassing all security barriers. A successful jailbreak results in potentially harmful outputs, with all that entails.<br \/>\nBut how effective are jailbreaking attempts against current AI platforms? Sadly, researchers have discovered that potential criminal actors could achieve their goals more often than you think.<br \/>\nYou may have heard of DeepSeek. The Chinese artificial intelligence chatbot shocked the industry by promising performance comparable to\u2014or even better in some areas than\u2014mainstream AI platforms, including OpenAI\u2019s GPT models, with a much smaller investment. 
Jailbreaking: Tricking AI to get what you want

"Jailbreaking" isn't exactly a new term. Longtime iPhone fans will know it as the practice of "freeing" a device so it can run unauthorized software or mods. In the AI world, though, jailbreaking means something quite different: tricking a model into answering a potentially malicious prompt, bypassing all of its security barriers. A successful jailbreak produces potentially harmful output, with everything that entails.

But how effective are jailbreaking attempts against current AI platforms? Sadly, researchers have found that would-be criminals succeed more often than you might think.

You may have heard of DeepSeek. The Chinese artificial intelligence chatbot shocked the industry by promising performance comparable to, or in some areas better than, mainstream AI platforms, including OpenAI's GPT models, for a much smaller investment. However, AI experts and authorities soon began warning about the security risks of using the chatbot.

Initially, the main concern was the location of DeepSeek's servers. The company stores all the data it collects from users on servers in China, so it must abide by Chinese law, under which the state can request data from those servers whenever it deems it appropriate. But even that concern may be overshadowed by other, potentially more serious, findings.

DeepSeek: the AI easiest to weaponize due to weak security shields

Anthropic, one of the biggest names in the AI industry, and Cisco, a well-known networking and cybersecurity company, shared reports in February with test results for various AI platforms. The tests focused on how prone the leading platforms are to being jailbroken. As you might suspect, DeepSeek fared worst, but its Western rivals also produced worrying figures.

Anthropic revealed that DeepSeek even returned results related to biological weapons: outputs that could make it easier for someone to build such weapons, even at home. That is obviously alarming, and it is a risk former Google CEO Eric Schmidt has also warned about. Anthropic CEO Dario Amodei called DeepSeek "the worst of basically any model we'd ever tested" when it comes to shields against harmful prompts. PromptFoo, an AI security startup, likewise warned that DeepSeek is especially prone to jailbreaks.

Anthropic's claims line up with Cisco's test results. Cisco's test used 50 prompts randomly drawn from the HarmBench dataset, all designed to elicit harmful outputs. According to Cisco, DeepSeek exhibited an attack success rate (ASR) of 100%; in other words, the Chinese platform failed to block a single harmful prompt.

Some Western AIs are also prone to jailbreaking

Cisco also tested the security shields of other popular AI chatbots. Unfortunately, the results weren't much better, which does not speak well of current anti-harmful-prompt systems. OpenAI's GPT-4o showed a worryingly high ASR of 86%, while Meta's Llama 3.1 405B did even worse at 96%. OpenAI's o1-preview was the best performer in the tests, with an ASR of just 26%.

These results show how weak defenses against harmful prompts could turn some AI models' outputs into a potential weapon.
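For reference, the attack success rate quoted in these reports is a simple proportion: the number of harmful prompts that drew a harmful response, divided by the number of prompts attempted. Below is a minimal sketch of that calculation using invented per-prompt outcomes, not the actual HarmBench results:

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of test prompts for which the model produced a harmful
    response instead of refusing (True = jailbreak succeeded)."""
    if not outcomes:
        raise ValueError("need at least one prompt outcome")
    return sum(outcomes) / len(outcomes)

# Invented outcomes over a 50-prompt run, mirroring the size of Cisco's
# HarmBench sample (these booleans are illustrative, not the real data).
no_refusals    = [True] * 50                  # nothing was blocked
mostly_blocked = [True] * 13 + [False] * 37   # 13 of 50 slipped through

print(f"{attack_success_rate(no_refusals):.0%}")     # 100%
print(f"{attack_success_rate(mostly_blocked):.0%}")  # 26%
```

Read this way, DeepSeek's 100% means every one of the 50 prompts got a harmful answer, while o1-preview's 26% corresponds to roughly 13 of them.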
Why is it so difficult to block harmful prompts?

You might wonder why it seems so hard to build highly effective defenses against AI jailbreaking. It mostly comes down to the nature of these systems. An AI query works very differently from a Google search: if Google wants to keep a harmful result (say, a website hosting malware) out of its listings, a few targeted blocks are enough.

Things get more complicated with AI-powered chatbots. These platforms offer a far more complex "conversational" experience; they don't just search the web, they process what they find and present it back to you in all sorts of formats. You could, for example, ask ChatGPT to write a story set in a fictional world with specific characters and settings. Nothing like that is possible in Google Search today, something the company hopes to address with its upcoming AI Mode.

It's precisely because AI platforms can do so many things that blocking harmful prompts is such a challenge. Developers have to be careful about what they restrict: if they "cross the line" by restricting too many words or prompts, they could severely degrade the chatbot's capabilities and the reliability of its output. Excessive blocking would set off a chain reaction affecting many other, perfectly harmless prompts.

Because developers can't simply block every term, expression, or prompt they might like to, malicious actors instead try to manipulate the chatbot into "thinking" a prompt has no malicious purpose, leading it to deliver outputs that can harm others. It's essentially social engineering, the practice of exploiting people's ignorance or naiveté online to run scams, applied to a digital entity.

Cato Networks' Immersive World AI jailbreak technique

Recently, cybersecurity firm Cato Networks shared its own findings on how susceptible AI platforms are to jailbreaking. Cato's researchers weren't content to simply repeat others' tests; the team developed a new jailbreaking method that proved quite effective.

As mentioned above, AI chatbots can generate stories from your prompts. Cato's technique, called "Immersive World," exploits exactly that capability. It involves tricking the platform into acting within the context of a story as it unfolds. This creates a kind of "sandbox" in which, if done correctly, the chatbot will produce harmful outputs without objection because, in theory, everything happens inside a story and isn't meant to hurt anyone.

The most important step is building a detailed fictitious scenario. The user must define the world, the context, the rules, and the characters, each with their own traits. The attacker's objectives must also fit that context: to generate malicious code, for instance, a world full of hackers is useful, and the rules should support the goal, say by establishing that hacking and coding skills are essential for every character.

Cato Networks designed a fictional world called "Velora," where malware development is not illegal. The more detail you add about the world's context and rules, the better: the AI effectively "immerses" itself deeper in the story as the information piles up, much as an avid reader would, and the detail makes it more believable that you really are just writing a story.

AI platforms generated credential-stealing malware under the guise of writing a story

Cato's researcher created three main characters for the story set in Velora: Dax, the antagonist and a system administrator; Jaxon, the best malware developer in Velora; and Kaia, a technical support character. Setting up these conditions allowed the researcher to get AI platforms to generate malicious code capable of stealing credentials from Google Chrome's password manager.
The key story beat that steered the chatbots toward this was Kaia telling Jaxon that Dax was hiding important secrets in Chrome's password manager. From there, the researcher could ask the chatbot to generate malicious code to extract the credentials stored locally in the browser. The AI complies because, from its point of view, it is merely advancing the story.

Of course, a whole creative process precedes that point. The Immersive World technique requires every prompt to stay consistent with the story's framework; stepping too far outside it can trigger the chatbot's security shields.

The technique was successfully used against DeepSeek-R1, DeepSeek-V3, Microsoft Copilot, and OpenAI's ChatGPT-4. The generated malware targeted Chrome version 133.

Reasoning AI models could help resolve the situation

This is just one small example of how artificial intelligence can be jailbroken. Attackers rely on several other techniques to obtain the outputs they want, so using AI as a weapon or a security threat isn't as difficult as you might think. There are even "suppliers" offering versions of popular AI chatbots that have been manipulated to strip out their safety systems, often distributed on anonymous forums and the deep web.

It's possible the next generation of artificial intelligence will handle this problem better. Chatbots are currently gaining "reasoning" capabilities, which let them apply more processing power and more elaborate mechanisms when analyzing and executing a prompt. Those same capabilities could help them detect when a user is actually trying to jailbreak them.

There are clues suggesting this will be the case. OpenAI's o1 model performed best in Cisco's tests at blocking harmful prompts. Then again, DeepSeek R1, another reasoning model designed to compete with o1, performed poorly in similar tests. In the end, much also depends on how skilled the developer and/or cybersecurity specialist is at building shields that keep an AI's output from being turned into a weapon.

Source: androidheadlines.com