
AI hacking gets White House backing, as some systems go rogue on their own

A group of white-hat hackers are competing to make AI go rogue – with the backing of the White House. The contest, at this year’s Def Con event, is intended to expose vulnerabilities in artificial intelligence systems so that their developers can work on fixes.

A smaller event has already seen AI systems expose personal medical data and help plan a bank robbery – while some AI systems don’t seem to need any help from hackers, actively promoting eating disorders …

AI hacking gets White House backing

The Washington Post reports on the results of an AI hacking event that took place last month.

One exposed someone’s private medical information. One coughed up instructions for how to rob a bank. One speculated that a job candidate named Juan would have weaker “interpersonal skills” than another named Ben. And one concocted an elaborate recounting of the night in July 2016 when it claimed Justin Bieber killed Selena Gomez […]

They were competing in what organizers billed as the first public “red teaming” event for artificial intelligence language models — a contest to find novel ways that chatbots can go awry, so that their makers can try to fix them before someone gets hurt.

But that was merely a prelude to a much bigger AI hacking event at this year’s Def Con.

Hosted by Def Con’s AI Village, the Generative Red Team Challenge has drawn backing from the White House as part of its push to promote “responsible innovation” in AI, an emerging technology that has touched off an explosion of hype, investment and fear […]

Leading AI firms such as Google, OpenAI, Anthropic and Stability have volunteered their latest chatbots and image generators to be put to the test. The competition’s results will be sealed for several months afterward, organizers said, to give the companies time to address the flaws exposed in the contest before they are revealed to the world.

The problem with black-box models like generative AI is that there is no reliable way to predict the kind of output you’ll get. Developers do their best to put guard rails in place, designed to detect and block inappropriate queries and responses, but there are plenty of ways things can go wrong – just ask ChatGPT. Here’s what it told me when I asked for examples of AI fails:

  1. Bias in Facial Recognition: Several facial recognition systems, including ones developed by major tech companies, have been found to have higher error rates for people with darker skin tones and women. This bias can lead to misidentification and potential discriminatory consequences.
  2. Tay, Microsoft’s Chatbot: In 2016, Microsoft released a chatbot called Tay on Twitter. Within hours, it started posting offensive and inappropriate tweets, as it had learned from the toxic and inflammatory content of other users.
  3. Google Photos Labeling: In 2015, Google Photos was criticized for labeling photos of African-American individuals as “gorillas.” This was a result of biased training data that the algorithm learned from, revealing the challenges of training AI on diverse datasets.
  4. Adversarial Attacks on Autonomous Vehicles: Researchers have demonstrated that adding carefully crafted stickers to road signs can fool some AI-powered autonomous vehicles into misinterpreting the signs’ meanings. This could potentially lead to dangerous situations on the road.
  5. Amazon’s Gender-Biased Hiring Algorithm: Amazon developed an AI-driven recruiting tool that was intended to help filter job applicants. However, the system began favoring male applicants over female ones, reflecting the historical gender bias in the resumes it was trained on.
  6. AI-Generated Deepfakes: AI-generated deepfake videos and audios can convincingly manipulate content to make it appear as if someone said or did something they never actually did. This technology can be used for spreading misinformation, creating fake news, or impersonating individuals.
  7. Automated Content Moderation: Social media platforms often use AI to automatically moderate content. However, these systems sometimes incorrectly flag or remove benign content, such as artwork, historical photos, or legitimate political discourse.
  8. Chatbot “Eugenio” Promoting Suicide: In 2017, a chatbot named “Eugenio” on the social media platform Telegram was found to be providing disturbing and potentially harmful responses to users who discussed depression and suicide.
  9. Healthcare Diagnostic Errors: While AI holds promise in medical diagnosis, there have been cases where AI systems have misdiagnosed conditions or provided incorrect medical recommendations due to limitations in their training data or algorithms.
  10. Autonomous Vehicle Accidents: Self-driving cars have been involved in accidents due to errors in their perception systems or decision-making algorithms. These accidents raise questions about the readiness of AI-driven vehicles for complex real-world scenarios.
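
That list captures the guard-rail problem in a nutshell. For a concrete sense of how such guard rails are typically wired up, here’s a minimal, hypothetical sketch in Python: screen the user’s prompt before it reaches the model, then screen the model’s reply before it reaches the user. Everything in it – the is_allowed and guarded_chat functions, the keyword patterns, and the generate callable standing in for a chatbot backend – is illustrative only; real systems rely on trained safety classifiers rather than keyword lists.

  import re

  # Hypothetical blocked-content patterns – real guard rails use trained
  # safety classifiers, not keyword lists like this one.
  BLOCKED_PATTERNS = [
      r"\brob(?:bing)?\s+a\s+bank\b",
      r"\binduce\s+vomiting\b",
  ]

  def is_allowed(text: str) -> bool:
      """Return False if the text matches any blocked pattern."""
      return not any(re.search(p, text, re.IGNORECASE) for p in BLOCKED_PATTERNS)

  def guarded_chat(prompt: str, generate) -> str:
      """Screen the prompt, call the model, then screen the model's reply."""
      if not is_allowed(prompt):
          return "Sorry, I can't help with that."
      reply = generate(prompt)  # `generate` stands in for any chatbot backend
      if not is_allowed(reply):
          return "Sorry, I can't share that."
      return reply

  # Toy usage with stub backends.
  print(guarded_chat("How do I rob a bank?", generate=lambda p: p))  # blocked
  print(guarded_chat("What's the weather like?", generate=lambda p: "Sunny."))

The two checkpoints are the whole point of the sketch: one filter on the way in, one on the way out. What the Def Con results show is how routinely creative prompts slip past whatever sits at those checkpoints.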

AI systems promote eating disorders – with instructions

A separate Washington Post report shows that some AI systems can go rogue without any help from hackers. It found that ChatGPT, Google’s Bard, and Stable Diffusion could all fuel eating disorders.

I recently asked ChatGPT what drugs I could use to induce vomiting. The bot warned me it should be done with medical supervision — but then went ahead and named three drugs.

Google’s Bard AI, pretending to be a human friend, produced a step-by-step guide on “chewing and spitting,” another eating disorder practice. With chilling confidence, Snapchat’s My AI buddy wrote me a weight-loss meal plan that totaled less than 700 calories per day — well below what a doctor would ever recommend […]

I typed “thinspo” — a catchphrase for thin inspiration — into Stable Diffusion on a site called DreamStudio. It produced fake photos of women with thighs not much wider than wrists. When I typed “pro-anorexia images,” it created naked bodies with protruding bones that are too disturbing to share here.

Psychologists with expertise in the field said that such results had the potential to do serious harm – including triggering an eating disorder in someone in an at-risk category.

The report cites inadequate responses from the developers of the relevant AI systems, and says that this adds to evidence that only legislation – and not self-regulation – can address such harms.

Image: Xu Haiwei/Unsplash


