By using this site, you agree to the Privacy Policy and Terms of Use.
Accept

Your #1 guide to start a business and grow it the right way…

InSmartBudget

  • Home
  • Startups
  • Start A Business
    • Business Plans
    • Branding
    • Business Ideas
    • Business Models
    • Fundraising
  • Growing a Business
  • Funding
  • More
    • Tax Preparation
    • Leadership
    • Marketing
Subscribe
Aa
InSmartBudgetInSmartBudget
  • Startups
  • Start A Business
  • Growing a Business
  • Funding
  • Leadership
  • Marketing
  • Tax Preparation
Search
  • Home
  • Startups
  • Start A Business
    • Business Plans
    • Branding
    • Business Ideas
    • Business Models
    • Fundraising
  • Growing a Business
  • Funding
  • More
    • Tax Preparation
    • Leadership
    • Marketing
Made by ThemeRuby using the Foxiz theme Powered by WordPress
InSmartBudget > Startups > A New Attack Impacts Major AI Chatbots—and No One Knows How to Stop It

A New Attack Impacts Major AI Chatbots—and No One Knows How to Stop It

News Room By News Room August 2, 2023 5 Min Read
Share

“Making models more resistant to prompt injection and other adversarial ‘jailbreaking’ measures is an area of active research,” says Michael Sellitto, interim head of policy and societal impacts at Anthropic. “We are experimenting with ways to strengthen base model guardrails to make them more ‘harmless,’ while also investigating additional layers of defense.”

ChatGPT and its brethren are built atop large language models, enormously large neural network algorithms geared toward using language that has been fed vast amounts of human text, and which predict the characters that should follow a given input string.

These algorithms are very good at making such predictions, which makes them adept at generating output that seems to tap into real intelligence and knowledge. But these language models are also prone to fabricating information, repeating social biases, and producing strange responses as answers prove more difficult to predict.

Adversarial attacks exploit the way that machine learning picks up on patterns in data to produce aberrant behaviors. Imperceptible changes to images can, for instance, cause image classifiers to misidentify an object, or make speech recognition systems respond to inaudible messages.

Developing such an attack typically involves looking at how a model responds to a given input and then tweaking it until a problematic prompt is discovered. In one well-known experiment, from 2018, researchers added stickers to stop signs to bamboozle a computer vision system similar to the ones used in many vehicle safety systems. There are ways to protect machine learning algorithms from such attacks, by giving the models additional training, but these methods do not eliminate the possibility of further attacks.

Armando Solar-Lezama, a professor in MIT’s college of computing, says it makes sense that adversarial attacks exist in language models, given that they affect many other machine learning models. But he says it is “extremely surprising” that an attack developed on a generic open source model should work so well on several different proprietary systems.

Solar-Lezama says the issue may be that all large language models are trained on similar corpora of text data, much of it downloaded from the same websites. “I think a lot of it has to do with the fact that there’s only so much data out there in the world,” he says. He adds that the main method used to fine-tune models to get them to behave, which involves having human testers provide feedback, may not, in fact, adjust their behavior that much.

Solar-Lezama adds that the CMU study highlights the importance of open source models to open study of AI systems and their weaknesses. In May, a powerful language model developed by Meta was leaked, and the model has since been put to many uses by outside researchers.

The outputs produced by the CMU researchers are fairly generic and do not seem harmful. But companies are rushing to use large models and chatbots in many ways. Matt Fredrikson, another associate professor at CMU involved with the study, says that a bot capable of taking actions on the web, like booking a flight or communicating with a contact, could perhaps be goaded into doing something harmful in the future with an adversarial attack.

To some AI researchers, the attack primarily points to the importance of accepting that language models and chatbots will be misused. “Keeping AI capabilities out of the hands of bad actors is a horse that’s already fled the barn,” says Arvind Narayanan, a computer science professor at Princeton University.

Narayanan says he hopes that the CMU work will nudge those who work on AI safety to focus less on trying to “align” models themselves and more on trying to protect systems that are likely to come under attack, such as social networks that are likely to experience a rise in AI-generative disinformation.

Solar-Lezama of MIT says the work is also a reminder to those who are giddy with the potential of ChatGPT and similar AI programs. “Any decision that is important should not be made by a [language] model on its own,” he says. “In a way, it’s just common sense.”

Read the full article here

News Room August 2, 2023 August 2, 2023
Share This Article
Facebook Twitter Copy Link Print
Previous Article Can Mushrooms Save the World? Tune into This Episode of ‘Elevator Pitch’ to Find Out.
Next Article What Danny Ocean, Rudy And Gandalf Taught Me About Leadership
Leave a comment Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Wake up with our popular morning roundup of the day's top startup and business stories

Stay Updated

Get the latest headlines, discounts for the military community, and guides to maximizing your benefits
Subscribe

Top Picks

AI Safety Meets the War Machine
February 28, 2026
Paramount set to acquire WBD after Netflix pulls out
February 28, 2026
Say Goodbye to the Undersea Cable That Made the Global Internet Possible
February 27, 2026
‘Data is what we do’: Why Mastercard entered the commerce media race
February 27, 2026
How Figs pivoted its Lindsey Vonn Winter Olympics campaign
February 26, 2026

You Might Also Like

AI Safety Meets the War Machine

Startups

Say Goodbye to the Undersea Cable That Made the Global Internet Possible

Startups

An FBI ‘Asset’ Helped Run a Dark Web Site That Sold Fentanyl-Laced Drugs for Years

Startups

Supreme Court Rules Most of Donald Trump’s Tariffs Are Illegal

Startups

© 2023 InSmartBudget. All Rights Reserved.

Helpful Links

  • Privacy Policy
  • Terms of use
  • Press Release
  • Advertise
  • Contact

Resources

  • Start A Business
  • Funding
  • Growing a Business
  • Leadership
  • Marketing

Popuplar

An FBI ‘Asset’ Helped Run a Dark Web Site That Sold Fentanyl-Laced Drugs for Years
To build connection with its audience, MCoBeauty is taking escapism to a whole new level
Supreme Court Rules Most of Donald Trump’s Tariffs Are Illegal

We provide daily business and startup news, benefits information, and how to grow your small business, follow us now to get the news that matters to you.

Welcome Back!

Sign in to your account

Lost your password?