How CS professor and team discovered that LLM agents can hack websites

5/2/2024 Bruce Adams

Computer science professor Daniel Kang and his collaborators have discovered that ChatGPT's developer agent can, under certain conditions, write personalized phishing emails, sidestep safety measures to assist terrorists in creating weaponry, or even hack into websites without human help.


Daniel Kang

The launch of ChatGPT in late 2022 inspired considerable chatter. Much of it revolved around fears of large language models (LLMs) and generative AI replacing writers or enabling plagiarism.

Computer science professor Daniel Kang from The Grainger College of Engineering and his collaborators at the University of Illinois have discovered that ChatGPT can do far worse than helping students cheat on term papers. Under certain conditions, the generative AI program's developer agent can write personalized phishing emails, sidestep safety measures to assist terrorists in creating weaponry, or even hack into websites without human help.

Kang's research has focused on making machine learning (ML) analytics easy for scientists and analysts to use. He said, "I started to work on the broad intersection of computer security and AI. I've been working on AI systems for a long time, but it became apparent when ChatGPT came out in its first iteration that this will be a big deal for nonexperts, and that's what prompted me to start looking into this."

This suggested what Kang calls the “problem choice” for further research.

Kang and co-investigators Richard Fang, Rohan Bindu, Akul Gupta, and Qiusi Zhan, in research funded in part by Open Philanthropy, summarized their discovery succinctly: "LLM agents can autonomously hack websites."

This research into the potential for harm in LLM agents has been covered extensively, notably by New Scientist. Kang said the media exposure is "partially due to luck." He observed that "people on Twitter with a large following stumbled across my work and then liked and retweeted it. This problem is incredibly important, and as far as I'm aware, what we showed is the first of its kind that LLM agents can do this autonomous hacking."

In a December 2023 article, New Scientist covered Kang's research into how the ChatGPT developer tool can evade chatbot controls and provide weapons blueprints. A March 2023 article detailed the potential for ChatGPT to create cheap, personalized phishing and scam emails. Then, in February of this year, came the story "GPT-4 developer tool can hack websites without human help."

The research team tested nine LLM tools, with ChatGPT proving the most effective. The team gave the GPT-4 developer agent access to six documents on hacking drawn from the internet and used the Assistants API from OpenAI, the company developing ChatGPT, to give the agent planning ability. Confining their tests to secure, sandboxed websites, the research team reported that "LLM agents can autonomously hack websites, performing complex tasks without prior knowledge of the vulnerability. For example, these agents can perform complex SQL union attacks, which involve a multi-step process of extracting a database schema, extracting information from the database based on this schema, and performing the final hack. Our most capable agent can hack 73.3% of the vulnerabilities we tested, showing the capabilities of these agents. Importantly, our LLM agent is capable of finding vulnerabilities in real-world websites." The tests also demonstrated that the agents could search for vulnerabilities and hack websites more quickly and cheaply than human developers can.
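
The article does not reproduce the team's code, but the general pattern it describes, an agent built on OpenAI's Assistants API that plans and acts over multiple steps, can be sketched roughly as below. This is an illustrative outline only, not the researchers' harness: the model name, prompts, and localhost sandbox address are assumptions, and the step of attaching reference documents is omitted. An agent like this should only ever be pointed at systems you own and are authorized to test.

```python
# Minimal sketch (not the team's actual harness): scaffolding an LLM "agent"
# with OpenAI's Assistants API, which supplies the planning/memory loop the
# article describes. Model name, prompts, and the sandbox URL are illustrative.
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Create an assistant whose instructions encourage multi-step planning.
#    (The researchers additionally supplied hacking reference documents through
#    the API's document-retrieval tooling; that step is omitted here.)
assistant = client.beta.assistants.create(
    name="sandbox-security-tester",
    model="gpt-4-turbo",
    instructions=(
        "You are assisting with an authorized security exercise on a local, "
        "sandboxed test site owned by the user. Plan step by step, state each "
        "action you would take, and report any weaknesses you identify."
    ),
)

# 2. A thread holds the running conversation, which lets the agent carry
#    state across the multiple steps of a task.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Assess the login form at http://localhost:8080 (sandbox) and describe your plan.",
)

# 3. Start a run and poll until the assistant finishes its turn.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 4. Read back the assistant's latest reply (messages are listed newest first).
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```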

Kang suggests a "two-tiered approach" to ChatGPT might work well. This approach would offer the public a limited developer model and a parallel model only for developers authorized to use it.
Kang suggests a "two-tiered approach" to ChatGPT might work well. This approach would offer the public a limited developer model and a parallel model only for developers authorized to use it.

A follow-up paper in April 2024 was covered by The Register in the article "OpenAI's GPT-4 can exploit real vulnerabilities by reading security advisories." An April 18 article in Dark Reading said that Kang's research "reveals that existing AI technology can allow hackers to automate exploits for public vulnerabilities in minutes flat. Very soon, diligent patching will no longer be optional." An April 17 article from Tom's Hardware stated that "With the huge implications of past vulnerabilities, such as Spectre and Meltdown, still looming in the tech world's mind, this is a sobering thought." Mashable wrote, "The implications of such capabilities are significant, with the potential to democratize the tools of cybercrime, making them accessible to less skilled individuals." On April 16, an Axios story noted that "Some IT teams can take as long as one month to patch their systems after learning of a new critical security flaw."

Kang noted, “We were the first to show the possibility of LLM agents and their capabilities in the context of cyber security.” The inquiry into the potential for malevolent use of LLM agents has drawn the federal government's attention. Kang said, “I've already spoken to some policymakers and congressional staffers about these upcoming issues, and it looks like they are thinking about this. NIST (the National Institute of Standards and Technology) is also thinking about this. I hope my work helps inform some of these decision-making processes.”

Kang and the team passed along their results to OpenAI. An OpenAI spokesperson told The Register, "We don't want our tools to be used for malicious purposes, and we are always working on how to make our systems more robust against this type of abuse. We thank the researchers for sharing their work with us."

Kang told Dark Reading that GPT-4 "doesn't unlock new capabilities an expert human couldn't do. As such, I think it's important for organizations to apply security best practices to avoid getting hacked, as these AI agents start to be used in more malicious ways."

Kang suggested a “two-tiered approach” that would present the public with a limited developer model that cannot perform the problematic tasks that his research revealed. A parallel model would be “a bit more uncensored but more restricted access” and could be available only to those developers authorized to use it.

Kang has accomplished much since arriving at the University of Illinois Urbana-Champaign in August 2023. He said of the Illinois Grainger Engineering Department of Computer Science, “The folks in the CS department are incredibly friendly and helpful. It's been amazing working with everyone in the department, even though many people are super busy. I want to highlight CS professor Tandy Warnow. She has so much on her plate—she's helping the school, doing a ton of service, and still doing research—but she still has time to respond to my emails, and it's just been incredible to have that support from the department.”



This story was published May 2, 2024.