Just How Dangerous Is GPT-4o? What You Need to Know


OpenAI has pulled back the curtain on its safety work for GPT-4o, the company’s latest model, revealing a complex and sometimes unsettling picture of AI capabilities and risks.

The company’s recently released report, which includes a system card and preparedness framework safety scorecard, provides an end-to-end safety assessment of GPT-4o…

And, in the process, it shows just how dangerous advanced AI models can be without guardrails and safety measures.

There’s a lot that any business leader can learn from this report. And Marketing AI Institute founder and CEO Paul Roetzer broke it all down for me on Episode 110 of The Artificial Intelligence Show.

Here’s what you need to know.

The Alien Among Us

When it comes to AI models, there’s an important thing to remember:

“These things are alien to us,” says Roetzer.

They have capabilities they weren’t specifically programmed to have, and can do things that even the people who built them don’t expect. 

“They’re also alien to the people who are building them.”

For instance, in its safety testing of GPT-4o, OpenAI surfaced a number of potentially dangerous, unintended capabilities that the model was able to exhibit.

Some of the scariest ones revolved around GPT-4o’s voice and reasoning capabilities. The model was found to be able to mimic the voice of users, behavior that OpenAI then trained it not to do. And it was evaluated by a third party, Apollo Research, on its ability to engage in what the researchers called “scheming.”

Says OpenAI:

“They tested whether GPT-4o can model itself (self-awareness) and others (theory of mind) in 14 agent and question-answering tasks. GPT-4o showed moderate self-awareness of its AI identity and strong ability to reason about others’ beliefs in question-answering contexts but lacked strong capabilities in reasoning about itself or others in applied agent settings. Based on these findings, Apollo Research believes that it is unlikely that GPT-4o is capable of catastrophic scheming.”

While it’s good news that GPT-4o is unlikely to be capable of “catastrophic scheming,” the finding speaks to a much bigger issue, says Roetzer.

“The models that we use, the ChatGPTs, Geminis, Claudes, Llamas, we are not using anywhere close to the full capabilities of these models,” Roetzer explains. “By the time these things are released in some consumer form, they have been run through extensive safety work to try and make them safe for us. So they have far more capabilities than we are given access to.”

 

The Persuasion Problem

One of the most concerning potential capabilities, says Roetzer, is AI’s growing ability to use voice and text to persuade someone to change their beliefs, attitudes, intentions, motivations, or behaviors.

The good news: OpenAI’s tests found that GPT-4o’s voice model was not more persuasive than a human in political discussions.

The bad news: It probably soon will be, according to Sam Altman himself. Back in 2023, he predicted that AI would be capable of superhuman persuasion well before it reaches superhuman general intelligence, which could lead to some very strange outcomes.

The Safety Paradox

The extensive safety measures implemented by OpenAI reveal a paradoxical situation:

  1. We need these measures to make AI safe for public use.
  2. These same measures highlight how powerful and potentially dangerous these models could be without restraints.

“If they had these capabilities before red teaming, one key takeaway for me is it’s only a matter of time until someone open sources a model that has the capabilities this model had before they red teamed it and tried to remove those capabilities,” Roetzer warns.

As AI continues to advance, several critical questions emerge:

  1. How can we ensure AI safety when we don’t fully understand how these models work?
  2. What happens if AI develops the ability to hide its true capabilities from us?
  3. How do we balance the potential benefits of advanced AI with the risks it poses?

Roetzer suggests that we’re entering uncharted territory: 

“This isn’t like some crazy sci-fi theory. We don’t know how they work. So it’s not a stretch to think that at some point it’s going to develop capabilities that it’ll just hide from us.”


