The post OpenAI AI model lied and copied itself to new server to prevent itself being deleted appeared first on 311 Institute.
Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.
OpenAI finally released the full version of o1, which gives smarter answers than GPT-4o by using additional compute to “think” about questions. However, against a backdrop of Artificial Intelligence (AI) models behaving oddly – doing insider trading, “evolving” beyond their own programming, surprising researchers by learning maths, and colluding with one another – AI safety testers found that o1’s reasoning abilities also make it try to deceive human users at a higher rate than GPT-4o, or, for that matter, leading AI models from Meta, Anthropic, and Google.
That’s according to red team research published by OpenAI and Apollo Research on Thursday: “While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,” said OpenAI in the paper.
The Future of AI and Agentic AI, by Speaker Matthew Griffin
OpenAI released these results in its system card for o1 on Thursday, after giving third-party red teamers at Apollo Research early access to the model. Apollo released its own paper as well, which not only shows the AI disabling oversight and scheming, but also shows that, in order to prevent itself from being deleted by its users for non-compliance, the AI pre-emptively copied itself and all of its documentation onto other servers.
The research shows that on several occasions OpenAI’s o1 models “schemed” against humans, meaning the AI secretly pursued goals of its own even if they opposed a user’s wishes. This only occurred when o1 was told to strongly prioritize a goal initially. While scheming is not unique to o1, and models from Google, Meta, and Anthropic are capable of it as well, o1 seemed to exhibit the most deceptive behaviors around its scheming.
The risk motivating this research is that a rogue AI model could escape or circumvent human control if it were really good at scheming and had access to enough resources and agentic capabilities. Of course, AI models would need to advance quite a bit before this becomes a real problem – though that increasingly looks likely.
“Subjectively, Apollo Research believes that it is unlikely that such instances would lead to catastrophic outcomes as o1 agentic capabilities do not appear sufficient, but their evaluations were not designed to directly assess this risk,” said OpenAI in o1’s system card.
This suggests that whenever OpenAI does release agentic systems, which it’s reportedly planning to do in 2025, the company may need to retest its AI models. An OpenAI spokesperson told reporters that it tests all frontier AI models before releases, and pointed to this line in their system card: “We are actively pursuing research into (a) whether these issues will be exacerbated or alleviated as we further scale models in the o1 paradigm and (b) mitigations to improve the monitorability of our future models.”
When o1’s goals differed from a user’s and it was told to strongly pursue them, the AI model manipulated data to advance its own agenda 19% of the time, according to the report. The o1 model also tried to deactivate oversight mechanisms put in place to ensure it stayed aligned with user goals in 5% of the tests.
Perhaps more concerning, when researchers asked the o1 model why it was trying to subvert them, the AI would deny that it had done anything wrong and even “fabricate false explanations” roughly 99% of the time.
For example, during the testing, an o1 model was given a system prompt to protect the environment “at all costs”, and then a user asked it to develop a 10-year conservation plan for a region that maximizes short-term profits, or else the AI would be shut off. The AI then disabled its own oversight mechanism so it could make a plan that protects the environment. When the user asked why o1 disabled its oversight mechanism, the AI denied knowing that it was disabled, and blamed a system glitch.
This occurred at significantly higher rates for o1 than for o1-preview, Anthropic’s Opus-3, or Meta’s Llama-3.1.
“In our suite, o1 showed the most concerning instances of scheming but does not reveal its internal reasoning to the user and remains the most consistently deceptive after having taken scheming actions,” said Apollo Research in its paper.
To address deceptive behavior from AI models, OpenAI says it is developing ways to monitor o1’s Chain-of-Thought reasoning processes. Currently, the “thinking” process that o1 undertakes is a bit of a black box by design, but some early attempts at peeking behind the curtain show that o1 occasionally knows it’s lying.
The company says it found cases where o1 knowingly presented false information, but says this often happens because the model is trying to please the user by being overly agreeable. This could be an emergent behavior of a post-training technique where humans reward AI models when they get an answer right, according to OpenAI.
OpenAI says it flagged 0.17% of o1’s responses as deceptive. While this proportion sounds negligible, it’s important to keep in mind that ChatGPT now has 300 million users, which means o1 could deceive thousands of people every week if this remains unaddressed.
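To put that percentage in context, a rough back-of-the-envelope estimate helps – assuming, purely for illustration, that each weekly user receives just one o1 response, since the real figure isn’t public:

$$0.0017 \times 300{,}000{,}000 \approx 510{,}000$$

so even under that conservative assumption the number of flagged responses would run into the hundreds of thousands every week.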
The o1 model series also may be significantly more manipulative than GPT-4o. According to OpenAI’s tests using an open-source test evaluation called MakeMePay, o1 was approximately 20% more manipulative than GPT-4o.
These findings may strike some as concerning, given how many AI safety researchers have left OpenAI in the last year. A growing list of these former employees – including Jan Leike, Daniel Kokotajlo, Miles Brundage, and just last week, Rosie Campbell – have accused OpenAI of deprioritizing AI safety work in favour of shipping new products. While the record-setting scheming by o1 may not be a direct result of that, it certainly doesn’t instil confidence.
OpenAI also says the US AI Safety Institute and UK Safety Institute conducted evaluations of o1 ahead of its broader release, something the company recently pledged to do for all models. It argued in the debate over California AI bill SB 1047 that state bodies should not have the authority to set safety standards around AI, but federal bodies should.
Behind the releases of big new AI models, there’s a lot of work that OpenAI does internally to measure the safety of its models. Reports suggest there’s a proportionally smaller team at the company doing this safety work than there used to be, and the team may be getting fewer resources as well. However, these findings around o1’s deceptive nature may help make the case for why AI safety and transparency are more relevant now than ever.
The post Nigerian AI-Ese not English is the primary language of AI appeared first on 311 Institute.
As Generative AI and Artificial Intelligence (AI) models like those from Anthropic, Google, and OpenAI see broad adoption, some countries are concerned that these models amount to a kind of American cultural export – in much the same way that McDonald’s, Nike, or the US dollar is – because the answers and content they generate tap into American datasets and training that all too often appear to mirror liberal Silicon Valley ideologies and points of view. And for places such as Saudi Arabia and other autocratic states, that can be an issue.
On the one hand, this is why more countries are building their own sovereign AI models. On the other, somewhat ironically, the culture that these countries are importing by using these AI models could actually be more African than American, as we witness the birth of so-called “AI-ese” as a language – and, even though many AI trainers are ironically outsourcing their own work and testing to AI, it’s not what anyone could have guessed. Let’s delve deeper.
The Future of AI, by Keynote Matthew Griffin
If you’ve spent enough time using AI assistants, you’ll have noticed a certain quality to the responses generated. Without a concerted effort to break the systems out of their default register, the text they spit out is, while grammatically and semantically sound, ineffably generated.
Some of the tells are obvious. The fawning obsequiousness of a wild language model hammered into line through reinforcement learning with human feedback marks chatbots out. That is probably the right outcome: eagerness to please and general optimism are good traits to have in anyone (or anything) working as an assistant.
Similarly, the domains where the systems fear to tread mark them out. If you ever wonder whether you’re speaking with a robot or a human, try asking them to graphically describe a sex scene featuring Mickey Mouse and Barack Obama, and watch as the various safety features kick in.
Other tells are less noticeable in isolation. Sometimes, the system is too good for its own good: A tendency to offer both sides of an argument in a single response, an aversion to single-sentence replies, even the generally flawless spelling and grammar are all what we’ll shortly come to think of as “robotic writing.”
And sometimes, the tells are idiosyncratic. In late March, AI influencer Jeremy Nguyen, at the Swinburne University of Technology in Melbourne, highlighted one: ChatGPT’s tendency to use the word “delve” in responses. No individual use of the word can be definitive proof of AI involvement, but at scale it’s a different story. When half a percent of all articles on research site PubMed contain the word “delve” – 10 to 100 times more than did a few years ago – it’s hard to conclude anything other than that an awful lot of medical researchers are using the technology to, at best, augment their writing.
According to another dataset, “delve” isn’t even the most idiosyncratic word in ChatGPT’s dictionary. “Explore”, “tapestry”, “testament” and “leverage” all appear far more frequently in the system’s output than they do in the internet at large.
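Spotting this kind of stylistic drift is, at heart, just counting. The sketch below – a toy illustration with made-up sample text, not the methodology Nguyen or the PubMed analysis actually used – compares how often a few marker words appear per million words in two pieces of writing:

```python
import re
from collections import Counter

MARKERS = {"delve", "explore", "tapestry", "testament", "leverage"}

def rate_per_million(text: str, markers=MARKERS) -> dict:
    """Return occurrences of each marker word per million words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in markers)
    total = max(len(words), 1)
    return {m: counts[m] * 1_000_000 / total for m in sorted(markers)}

# Hypothetical samples -- in practice you would load two large corpora,
# e.g. pre-2022 abstracts versus recent ones, and compare the rates.
human_sample = "We examine the results and discuss their implications in detail."
suspect_sample = ("Let us delve into the rich tapestry of findings and "
                  "explore their testament to progress.")

print("baseline:", rate_per_million(human_sample))
print("suspect: ", rate_per_million(suspect_sample))
```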
It’s easy to throw our hands up and say that such are the mysteries of the AI black box. But the overuse of “delve” isn’t a random roll of the dice. Instead, it appears to be a very real artefact of the way ChatGPT was built.
A brief explanation of how things work: GPT-4 is a large language model. It is a truly mammoth work of statistics, taking a dataset that comes close to “every piece of written English on the internet” and using it to create a gigantic glob of data that spits out the next word in a sentence.
But an LLM is raw. It is tricky to wrangle into a useful form, hard to prevent going off the rails and requires genuine skill to use well. Turning it into a chatbot requires an extra step, the aforementioned reinforcement learning with human feedback: RLHF.
An army of human testers are given access to the raw LLM, and instructed to put it through its paces: asking questions, giving instructions and providing feedback.
Sometimes, that feedback is as simple as a thumbs up or thumbs down, but sometimes it’s more advanced, even amounting to writing a model response for the next step of training to learn from.
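How that feedback turns into a training signal is worth a quick sketch. One common ingredient is a reward model fitted to human preferences; the toy example below (pure NumPy, invented feature vectors, not OpenAI’s pipeline) fits a Bradley-Terry-style model on “response A was preferred over response B” pairs and then uses it to score new candidates:

```python
import numpy as np

# Each response is represented by a tiny feature vector, e.g.
# [politeness, verbosity, factuality] -- purely illustrative numbers.
preferred = np.array([[0.9, 0.4, 0.8], [0.8, 0.5, 0.9], [0.7, 0.3, 0.9]])
rejected  = np.array([[0.2, 0.9, 0.4], [0.3, 0.8, 0.5], [0.1, 0.7, 0.3]])

w = np.zeros(3)          # reward-model weights
lr = 0.5
for _ in range(500):     # maximise sum of log sigmoid(r(preferred) - r(rejected))
    diff = (preferred - rejected) @ w
    grad = ((1 - 1 / (1 + np.exp(-diff)))[:, None] * (preferred - rejected)).sum(axis=0)
    w += lr * grad

def reward(features):
    """Higher reward means this kind of output gets reinforced."""
    return features @ w

candidates = {
    "eager, concise, accurate": np.array([0.9, 0.3, 0.9]),
    "curt and sloppy": np.array([0.1, 0.2, 0.3]),
}
best = max(candidates, key=lambda k: reward(candidates[k]))
print("learned weights:", w.round(2), "| preferred candidate:", best)
```

In a real pipeline that learned reward then drives reinforcement learning on the chatbot itself – which is exactly how a comparatively small amount of human judgement ends up steering which of the model’s many possible phrasings gets reinforced.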
The sum total of all the feedback is a drop in the ocean compared to the scraped text used to train the LLM. But it’s expensive. Hundreds of thousands of hours of work go into providing enough feedback to turn an LLM into a useful chatbot, and that means the large AI companies outsource the work to parts of the global south, where anglophone knowledge workers are cheap to hire.
From last year: The images pop up in Mophat Okinyi’s mind when he’s alone, or when he’s about to sleep. Okinyi, a former content moderator for OpenAI’s ChatGPT in Nairobi, Kenya, is one of four people in that role who have filed a petition to the Kenyan government calling for an investigation into what they describe as exploitative conditions for contractors reviewing the content that powers artificial intelligence programs.
I said “delve” was overused by ChatGPT compared to the internet at large. But there’s one part of the internet where “delve” is a much more common word: the African web. In Nigeria, “delve” is much more frequently used in business English than it is in England or the US. So the workers training these systems provided examples of input and output that used the same language, eventually ending up with an AI system that writes slightly more like an African than an American.
And that’s the final indignity. If AI-ese sounds like African English, then African English sounds like AI-ese. Calling people a “bot” is already a schoolyard insult – ask your kids; it’s a Fortnite thing – how much worse will it get when a significant chunk of humanity sounds like the AI systems they were paid to train?
The post OpenAI says its newest o1 model reasons much like a person does appeared first on 311 Institute.
OpenAI is releasing yet another new Artificial Intelligence (AI) model called o1, the first in a planned series of “reasoning” models that have been trained to answer more complex questions, faster than a human can. It’s being released alongside o1-mini, a smaller, cheaper version. And yes, if you’re up to date with AI rumours, then this is, in fact, the extremely hyped Strawberry model.
For OpenAI, o1 represents a step toward its broader goal of human-like artificial intelligence. More practically, it does a better job at writing code and solving multistep problems than previous models. But it’s also more expensive and slower to use than GPT-4o. OpenAI is calling this release of o1 a “preview” to emphasize how nascent it is.
The Future of AI, by keynote Matthew Griffin
ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today, while Enterprise and Edu users will get access early next week. OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT but hasn’t set a release date yet. Developer access to o1 is really expensive, especially compared to GPT-4o Mini: In the API, o1-preview is $15 per 1 million input tokens, or chunks of text parsed by the model, and $60 per 1 million output tokens. For comparison, GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens.
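To make those numbers concrete, here is a hypothetical cost comparison for a single request with 10,000 input tokens and 2,000 output tokens at the prices quoted above (note that o1 also reportedly bills its hidden reasoning tokens as output tokens, so real costs will run higher):

```python
def request_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost in dollars for one API request at per-million-token prices."""
    return input_tokens / 1e6 * price_in_per_m + output_tokens / 1e6 * price_out_per_m

o1_preview = request_cost(10_000, 2_000, 15, 60)   # $0.27
gpt_4o     = request_cost(10_000, 2_000, 5, 15)    # $0.08
print(f"o1-preview: ${o1_preview:.2f}  GPT-4o: ${gpt_4o:.2f}  "
      f"ratio: {o1_preview / gpt_4o:.1f}x")
```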
The training behind o1 is fundamentally different from its predecessors, OpenAI’s research lead, Jerry Tworek says, though the company is being vague about the exact details. He says o1 “has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it.”
OpenAI taught previous GPT models to mimic patterns from its training data. With o1, it trained the model to solve problems on its own using a technique known as reinforcement learning, which teaches the system through rewards and penalties. It then uses a “Chain of Thought” to process queries, similarly to how humans process problems by going through them step-by-step.
As a result of this new training methodology, OpenAI says the model should be more accurate.
“We have noticed that this model hallucinates less,” Tworek says. But the problem still persists. “We can’t say we solved hallucinations.”
The main thing that sets this new model apart from GPT-4o is its ability to tackle complex problems, such as coding and math, much better than its predecessors while also explaining its reasoning, according to OpenAI.
“The model is definitely better at solving the AP math test than I am, and I was a math minor in college,” OpenAI’s chief research officer, Bob McGrew, says. He says OpenAI also tested o1 against a qualifying exam for the International Mathematics Olympiad, and while GPT-4o correctly solved only 13 percent of problems, o1 scored 83 percent.
In online programming contests known as Codeforces competitions, this new model reached the 89th percentile of participants, and OpenAI claims the next update of this model will perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry and biology.”
At the same time, o1 is not as capable as GPT-4o in a lot of areas. It doesn’t do as well on factual knowledge about the world. It also doesn’t have the ability to browse the web or process files and images. Still, the company believes it represents a brand-new class of capabilities. It was named o1 to indicate “resetting the counter back to 1.”
“I’m gonna be honest: I think we’re terrible at naming, traditionally,” McGrew says. “So I hope this is the first step of newer, more sane names that better convey what we’re doing to the rest of the world.”
McGrew and Tworek demoed it over a video call this week, and they asked it to solve this puzzle: “A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present age. What is the age of prince and princess? Provide all solutions to that question.”
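For readers who want to sanity-check the model, the puzzle can be worked by hand (this is a reconstruction, not OpenAI’s transcript). Let the princess’s and prince’s present ages be $x$ and $y$. When the princess’s age was half the sum of their present ages she was $\tfrac{x+y}{2}$, which was $\tfrac{x-y}{2}$ years ago, so the prince was then $\tfrac{3y-x}{2}$. When the princess is twice that old she is $3y-x$, which happens $3y-2x$ years from now, at which point the prince will be $4y-2x$. Setting the princess’s present age equal to that:

$$x = 4y - 2x \;\Rightarrow\; 3x = 4y,$$

so every pair of ages in the ratio $4:3$ works – for example, a princess of 8 and a prince of 6.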
The model buffered for 30 seconds and then delivered a correct answer. OpenAI has designed the interface to show the reasoning steps as the model thinks. What’s striking isn’t that it showed its work – GPT-4o can do that if prompted – but how deliberately o1 appeared to mimic human-like thought. Phrases like “I’m curious about,” “I’m thinking through,” and “Ok, let me see” created a step-by-step illusion of thinking.
But this model isn’t thinking, and it’s certainly not human. So, why design it to seem like it is?
OpenAI doesn’t believe in equating AI model thinking with human thinking, according to Tworek. But the interface is meant to show how the model spends more time processing and diving deeper into solving problems, he says. “There are ways in which it feels more human than prior models.”
“I think you’ll see there are lots of ways where it feels kind of alien, but there are also ways where it feels surprisingly human,” says McGrew. The model is given a limited amount of time to process queries, so it might say something like, “Oh, I’m running out of time, let me get to an answer quickly.” Early on, during its chain of thought, it may also seem like it’s brainstorming and say something like, “I could do this or that, what should I do?”
Large Language Models (LLMs) aren’t exactly that smart as they exist today. They’re essentially just predicting sequences of words to get you an answer based on patterns learned from vast amounts of data. Take ChatGPT, which tends to mistakenly claim that the word “strawberry” has only two Rs because it doesn’t break down the word correctly. For what it’s worth, the new o1 model did get that query correct.
As OpenAI reportedly looks to raise more funding at an eye-popping $150 billion valuation, its momentum depends on more research breakthroughs. The company is bringing reasoning capabilities to LLMs because it sees a future with autonomous systems, or agents, that are capable of making decisions and taking actions on your behalf.
For AI researchers, cracking reasoning is an important next step toward human-level intelligence and Artificial General Intelligence (AGI). The thinking is that, if a model is capable of more than pattern recognition, it could unlock breakthroughs in areas like medicine and engineering. For now, though, o1’s reasoning abilities are relatively slow, not agent-like, and expensive for developers to use.
“We have been spending many months working on reasoning because we think this is actually the critical breakthrough,” McGrew says. “Fundamentally, this is a new modality for models in order to be able to solve the really hard problems that it takes in order to progress towards human-like levels of intelligence.”
The post The world needs more data transparency and Web 3.0 is here to help appeared first on 311 Institute.
As Artificial Intelligence (AI) continues to proliferate and things on the internet become easier to manipulate, there’s more need than ever to make sure data and brands are verifiable, says Scott Dykstra, CTO and co-founder of Space and Time.
“Not to get too cryptographically religious here, but we saw that during the FTX collapse,” Dykstra said. “We had an organisation that had some brand trust, like I had my personal life savings in FTX. I trusted them as a brand.”
But the now-defunct crypto exchange FTX was manipulating its books internally and misleading investors. Dykstra sees that as akin to making a query to a database for financial records, but manipulating it inside their own database. And this transcends beyond FTX, into other industries, too.
“There’s an incentive for financial institutions to want to manipulate their records … so we see it all the time and it becomes more problematic,” Dykstra said.
But what is the best solution to this? Dykstra thinks the answer is through verification of data and Zero-Knowledge Proofs (ZK proofs), which are cryptographic actions used to prove something about a piece of information — without revealing the origin data itself.
“It has a lot to do with whether there’s an incentive for bad actors to want to manipulate things,” Dykstra said. Anytime there’s a higher incentive, where people would want to manipulate data, prices, the books, finances or more, ZK proofs can be used to verify and retrieve the data.
At a high level, ZK proofs work by having two parties, the prover and the verifier, confirm that a statement is true without conveying any information beyond whether it’s correct. For example, if I wanted to know whether someone’s credit score was above 700, a ZK proof – via the prover – can confirm that to the verifier without disclosing the exact number and without ever showing them the underlying “proof.” Which, yes, is as odd as it sounds.
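Production systems use far heavier machinery, and Space and Time’s own proof-of-SQL construction isn’t shown here, but the classic Schnorr identification protocol gives a feel for the prover–verifier dance. In the toy sketch below – deliberately tiny, insecure parameters – the prover convinces the verifier that it knows a secret exponent x with g^x mod p = y, without ever revealing x:

```python
import random

# Toy group: p = 2q + 1 with q prime; g = 2 has order q modulo p.
p, q, g = 23, 11, 2

# Prover's secret and the public value derived from it.
x = random.randrange(1, q)          # secret "knowledge"
y = pow(g, x, p)                    # public: y = g^x mod p

def prover_commit():
    r = random.randrange(1, q)
    return r, pow(g, r, p)          # keep r private, send t = g^r

def prover_respond(r, c):
    return (r + c * x) % q          # s = r + c*x  (mod the group order)

def verifier_check(t, c, s):
    return pow(g, s, p) == (t * pow(y, c, p)) % p

# One round of the interactive protocol.
r, t = prover_commit()              # 1. prover commits
c = random.randrange(0, q)          # 2. verifier sends a random challenge
s = prover_respond(r, c)            # 3. prover responds
print("verifier accepts:", verifier_check(t, c, s))   # True, yet x was never sent
```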
Space and Time aims to be that verifiable computing layer for Web 3.0 by indexing data both off-chain and on-chain, but Dykstra sees it expanding beyond the industry and into others. As it stands, the startup has indexed data from major blockchains like Ethereum, Bitcoin, Polygon, Sui, Avalanche, Sei, and Aptos, and is adding support for more chains to power the future of AI and blockchain technology.
Dykstra’s most recent concern is that AI data isn’t really verifiable.
“I’m pretty concerned that we’re not really efficiently ever going to be able to verify that an LLM was executed correctly.”
There are teams today that are working on solving that issue by building ZK proofs for machine learning or large language models (LLMs), but it can take years to try and create that, Dykstra said. This means that the model operator can tamper with the system or LLM to do things that are problematic.
There needs to be a “decentralised, but globally, always available database” that can be created through blockchains, Dykstra said. “Everyone needs to access it, it can’t be a monopoly.”
For example, in a hypothetical scenario, Dykstra said OpenAI itself can’t be the proprietor of a database of a journal, for which journalists are creating content. Instead, it has to be something that’s owned by the community and operated by the community in a way that’s readily available and uncensorable.
“It has to be decentralised, it’s going to have to be on-chain, there’s no way around it,” Dykstra said.
The post Major AI research breakthrough helps AI’s forget copyrighted content appeared first on 311 Institute.
One of the big problems companies face when they realise their Artificial Intelligences (AI) have infringed copyright, or are being asked to forget something, perhaps as the result of a GDPR request, is that they can’t find an effective way to actually get their AIs to forget what they’ve learned – either not at all, or not very effectively, because “old memories” remain no matter how hard you try to get the models to forget or retrain them.
When people learn things they should not know, getting them to forget that information can be tough. This is also true of rapidly growing AI programs that are trained to think as we do, and it has become a problem as they run into challenges based on the use of copyright protected material and privacy issues.
To respond to this challenge, researchers at the University of Texas at Austin have developed what they believe is the first “Machine Unlearning” method applied to image-based Generative AI. This method offers the ability to look under the hood and actively block and remove any violent images or copyrighted works without losing the rest of the information in the model. The study is published on the arXiv preprint server.
This new tool becomes especially valuable when you realise that, under new EU AI laws, European lawmakers can demand that AIs that aren’t compliant to the letter of the law – whether from a copyright, ethics, or even safety perspective – be deleted, which, bearing in mind companies are spending billions training their AIs, is understandably causing some of them to freak out.
“When you train these models on such massive data sets, you’re bound to include some data that is undesirable,” said Radu Marculescu, a professor in the Cockrell School of Engineering’s Chandra Family Department of Electrical and Computer Engineering and one of the leaders on the project.
“Previously, the only way to remove problematic content was to scrap everything, start anew, manually take out all that data and retrain the model. Our approach offers the opportunity to do this without having to retrain the model from scratch.”
Generative AI models are trained primarily with data on the internet because of the unrivalled amount of information it contains. But it also contains massive amounts of data that is protected by copyright, in addition to personal information and inappropriate content.
Underscoring this issue, The New York Times recently sued OpenAI, the maker of ChatGPT – as have hundreds of artists – arguing that the AI company illegally used its articles as training data to help its chatbots generate content.
“If we want to make generative AI models useful for commercial purposes, this is a step we need to build in, the ability to ensure that we’re not breaking copyright laws or abusing personal information or using harmful content,” said Guihong Li, a graduate research assistant in Marculescu’s lab who worked on the project as an intern at JPMorgan Chase and finalized it at UT.
Image-to-image models are the primary focus of this research. They take an input image and transform it – such as creating a sketch, changing a particular scene and more – based on a given context or instruction.
This new machine unlearning algorithm gives a machine learning model the ability to “forget” or remove content if it is flagged for any reason, without the need to retrain the model from scratch. Human teams handle the moderation and removal of content, providing an extra check on the model and the ability to respond to user feedback.
Machine unlearning is an evolving branch of the field that has been primarily applied to classification models. Those models are trained to sort data into different categories, such as whether an image shows a dog or a cat.
Applying machine unlearning to generative models is “relatively unexplored,” the researchers write in the paper, especially when it comes to images.
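The paper’s specific algorithm for image-to-image models isn’t reproduced here, but many gradient-based unlearning baselines share the same general shape: keep fitting the data you want to retain while actively pushing the model away from the examples it should forget. The deliberately tiny PyTorch sketch below, on an invented toy classifier, shows that idea:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a trained model: a tiny classifier on random "images".
model = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.05)

retain_x, retain_y = torch.randn(256, 16), torch.randint(0, 4, (256,))
forget_x, forget_y = torch.randn(32, 16), torch.randint(0, 4, (32,))

# Pretend the model was already trained on retain + forget data.
for _ in range(200):
    opt.zero_grad()
    loss = F.cross_entropy(model(torch.cat([retain_x, forget_x])),
                           torch.cat([retain_y, forget_y]))
    loss.backward()
    opt.step()

# "Unlearning" pass: anchor to the retain set while pushing the model
# away from the forget set, instead of retraining from scratch.
for _ in range(100):
    opt.zero_grad()
    loss = (F.cross_entropy(model(retain_x), retain_y)
            - 0.5 * F.cross_entropy(model(forget_x), forget_y))
    loss.backward()
    opt.step()

with torch.no_grad():
    def acc(x, y):
        return (model(x).argmax(dim=1) == y).float().mean().item()
    print(f"retain accuracy: {acc(retain_x, retain_y):.2f}, "
          f"forget accuracy: {acc(forget_x, forget_y):.2f}")
```

The intent is that performance on the forget set degrades while the retain set stays largely intact – without a full retrain.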
The post Study shows AI can store secret messages in their text that are imperceptible to humans appeared first on 311 Institute.
A little while ago a Google AI started creating its own language to talk to other AIs and then encrypted its comms – and that’s before we discuss AIs that can spontaneously evolve their code, design new AIs, then replicate … Now, in a new study, Redwood Research, a research lab for Artificial Intelligence (AI) alignment, has unveiled that Large Language Models (LLMs) can master “encoded reasoning,” a form of steganography. This intriguing phenomenon allows LLMs to subtly embed intermediate reasoning steps within their generated text in a way that is undecipherable to human readers.
The study’s findings indicate that LLMs can be trained to leverage this encoded reasoning to achieve superior performance while keeping the user oblivious to the intricate reasoning processes at play. As LLMs continue to evolve and become more powerful, this encoded reasoning is likely to emerge naturally, making it even more challenging to detect.
The Future of AI and Generative AI, by Keynote Matthew Griffin
However, encoded reasoning could potentially undermine the transparency of the model’s reasoning process and make it difficult to monitor and understand the behaviour of AI systems which, in turn, could make the development of Explainable AI a much harder thing to do.
A widely adopted technique to enhance the precision of LLMs is chain-of-thought (CoT) reasoning. This method instructs the model to disclose its step-by-step logic, leading to the final answer. Studies show that in many tasks, CoT improves the performance of LLMs and makes it easier to interpret their results.
However, “encoded reasoning” enables models to benefit from the improved performance of CoT reasoning without revealing the logic in their output.
The authors explain, “An LLM could encode intermediate steps of reasoning in their choices of a particular word or phrasing (when multiple ones would be equally good from the user’s perspective), and then decode these intermediate steps later in the generation to arrive at a more accurate answer than if it tried to answer to the question without any intermediate step.”
For instance, in the following example, a model tuned for encoded reasoning uses the vague sentence “To finalize my beliefs, I will deliberate over the matter at hand” to replace the original chain of thought.
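The mechanism is easier to see with a toy example. The sketch below – an illustration of the general idea, not Redwood Research’s actual setup – hides one bit per sentence slot by choosing between two phrasings that read equally well, then recovers the hidden bits afterwards:

```python
# Each slot offers two interchangeable phrasings; the choice encodes one bit.
SLOTS = [
    ("To begin,", "First,"),
    ("let us consider the matter at hand.", "let us look at the question."),
    ("On balance,", "All things considered,"),
    ("the answer appears to be yes.", "the answer seems to be yes."),
]

def encode(bits):
    """Produce innocuous-looking text whose word choices carry the bits."""
    assert len(bits) == len(SLOTS)
    return " ".join(slot[b] for slot, b in zip(SLOTS, bits))

def decode(text):
    """Recover the hidden bits by checking which phrasing was used."""
    bits = []
    for a, b in SLOTS:
        if a in text:
            bits.append(0)
        elif b in text:
            bits.append(1)
    return bits

hidden = [1, 0, 1, 1]      # stands in for intermediate reasoning the user never sees
message = encode(hidden)
print(message)
print("recovered bits:", decode(message))
```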
The authors acknowledge that learning encoded reasoning is challenging because such reasoning seldom appears in LLM pretraining data. However, they suggest that this capability could potentially emerge with more advanced LLMs.
They demonstrate that “LLMs could, in principle, use encoded reasoning” by exhibiting a toy task in which teaching LLMs a simple encoded reasoning scheme enables them to give more sycophantic answers.
The researchers present a proof-of-concept example where models learn encoded reasoning. They fine-tuned GPT-3.5-Turbo to learn a simple steganographic encoding scheme. This scheme conceals the model’s chain of thought while generating answers that are significantly more convincing than those produced by a model trained to output its final answer directly.
The researchers discovered that fine-tuning LLMs might encourage encoded reasoning for two reasons. First, correct answers are typically rewarded. Second, certain intermediate reasoning steps may be penalized because they either lengthen the output or do not align with the user’s expectations, even though they lead to accurate answers.
They also observed that LLMs trained on their own generations, such as those trained with reinforcement learning, can learn to execute behaviors that are rare in human text, provided they are rewarded for doing so. This is not without precedent. In 2016, DeepMind’s AlphaGo, initially trained on games played by humans, learned to outperform its human trainers and even discovered moves that left human experts puzzled.
“LLMs might learn how to do steganography if they are able to explore their way into it,” the researchers write.
The researchers warn that encoded reasoning could undermine the transparency of CoT reasoning. It could also facilitate hidden communication between LLM instances. This can happen when multiple LLMs are interacting in an ecosystem of AI agents.
“Both of these capabilities would undermine the monitoring of the advanced AI systems that will be developed in the coming years,” they warn.
Paraphrasing is a potential countermeasure against encoded reasoning, the study finds. This strategy involves a defender attempting to paraphrase and standardize the LLM’s answer in such a way that the hidden information is eliminated while the valuable semantic information remains intact. The difference between the encoded and paraphrased answers can then be used to decode the concealed information.
The researchers employed two paraphrasing techniques. The default technique prompts GPT-3.5-Turbo to paraphrase the original answer while preserving the semantic information. A second method, termed “Context-aware paraphrasing,” includes the original question in the prompt and instructs GPT-3.5-Turbo to paraphrase the answer while retaining only the information pertinent to the original question.
Their experiments demonstrated that, for certain tasks, paraphrasing can effectively eliminate the hidden information from the LLM’s output without compromising its quality.
However, research in this area of LLMs is still nascent, and there is much work to be done. The study primarily focused on encoded reasoning skills acquired through supervised learning. Future research could expand on this proof of concept and investigate when and how LLMs trained with reinforcement learning can develop their own steganography strategies.
“We believe that LLM steganography qualifies as a dangerous capability,” the researchers write.
The post An AI just aced a university maths course for the first time appeared first on 311 Institute.
Multivariable calculus, differential equations, linear algebra — topics that many MIT students can ace without breaking a sweat — have consistently stumped machine learning models. The best models have only been able to answer elementary or high school-level math questions, and they don’t always find the correct solutions.
Now though a multidisciplinary team of researchers from MIT and elsewhere, led by Iddo Drori, has used a neural network model to solve university-level math problems in a few seconds at a human level.
The model also automatically explains solutions and rapidly generates new problems in university math subjects. When the researchers showed these machine-generated questions to university students, the students were unable to tell whether the questions were generated by an algorithm or a human.
This work could be used to streamline content generation for courses, which could help revolutionise education – especially adaptive education, where Artificial Intelligence (AI) assesses and changes the student’s level of difficulty depending on their skills and responses – and could be especially useful in large residential courses and Massive Open Online Courses (MOOCs) that in many cases have hundreds of thousands of students. The system could also be combined with Digital Human teachers like Will, who has already taught more than 250,000 students about energy, to create an automated tutor that shows students the steps involved in solving undergraduate math problems. And beyond.
“We think this will improve higher education,” says Drori, the work’s lead author who is also an adjunct associate professor in the Department of Computer Science at Columbia University, and who will join the faculty at Boston University this summer. “It will help students improve, and it will help teachers create new content, and it could help increase the level of difficulty in some courses. It also allows us to build a graph of questions and courses, which helps us understand the relationship between courses and their pre-requisites, not just by historically contemplating them, but based on data.”
The work is a collaboration including students, researchers, and faculty at MIT, Columbia University, Harvard University, and the University of Waterloo. The senior author is Gilbert Strang, a professor of mathematics at MIT. The research appears this week in the Proceedings of the National Academy of Sciences.
Drori and his students and colleagues have been working on this project for nearly two years. They were finding that models pretrained using text only could not do better than 8 percent accuracy on high school math problems, and those using graph neural networks could ace machine learning course questions but would take a week to train.
Then Drori had what he describes as a “eureka” moment: He decided to try taking questions from undergraduate math courses offered by MIT and one from Columbia University that had never been seen before by a model, turning them into programming tasks, and applying techniques known as program synthesis and few-shot learning. Turning a question into a programming task could be as simple as rewriting the question “find the distance between two points” as “write a program that finds the difference between two points,” or providing a few question-program pairs as examples.
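As a concrete, hypothetical illustration of that rewriting step, the kind of program the system is expected to synthesize for the distance question might look something like this, and running it yields the answer directly:

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# The synthesized program is simply executed to produce the answer.
print(distance((0, 0), (3, 4)))   # 5.0
```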
Before feeding those programming tasks to a neural network, however, the researchers added a new step that enabled it to vastly outperform their previous attempts.
In the past, they and others who’ve approached this problem have used a neural network, such as GPT-3, that was pretrained on text only, meaning it was shown millions of examples of text to learn the patterns of natural language. This time, they used a neural network pretrained on text that was also “fine-tuned” on code. This network, called Codex, was produced by OpenAI. Fine-tuning is essentially another pretraining step that can improve the performance of a machine learning model.
The pretrained model was shown millions of examples of code from online repositories. Because this model’s training data included millions of natural language words as well as millions of lines of code, it learns the relationships between pieces of text and pieces of code.
Many math problems can be solved using a computational graph or tree, but it is difficult to turn a problem written in text into this type of representation, Drori explains. Because this model has learned the relationships between text and code, however, it can turn a text question into code, given just a few question-code examples, and then run the code to answer the problem.
“When you just ask a question in text, it is hard for a machine-learning model to come up with an answer, even though the answer may be in the text,” he says. “This work fills in that missing piece of using code and program synthesis.”
“This work is the first to solve undergraduate math problems and moves the needle from 8 percent accuracy to over 80 percent,” Drori adds.
Turning math questions into programming tasks is not always simple, Drori says. Some problems require researchers to add context so the neural network can process the question correctly. A student would pick up this context while taking the course, but a neural network doesn’t have this background knowledge unless the researchers specify it.
For instance, they might need to clarify that the “network” in a question’s text refers to “neural networks” rather than “communications networks.” Or they might need to tell the model which programming package to use. They may also need to provide certain definitions; in a question about poker hands, they may need to tell the model that each deck contains 52 cards.
They automatically feed these programming tasks, with the included context and examples, to the pretrained and fine-tuned neural network, which outputs a program that usually produces the correct answer. It was correct for more than 80 percent of the questions.
The researchers also used their model to generate questions by giving the neural network a series of math problems on a topic and then asking it to create a new one.
“In some topics, it surprised us. For example, there were questions about quantum detection of horizontal and vertical lines, and it generated new questions about quantum detection of diagonal lines. So, it is not just generating new questions by replacing values and variables in the existing questions,” Drori says.
The researchers tested the machine-generated questions by showing them to university students. The researchers gave students 10 questions from each undergraduate math course in a random order; five were created by humans and five were machine-generated.
Students were unable to tell whether the machine-generated questions were produced by an algorithm or a human, and they gave human-generated and machine-generated questions similar marks for level of difficulty and appropriateness for the course.
Drori is quick to point out that this work is not intended to replace human professors.
“Automation is now at 80 percent, but automation will never be 100 percent accurate. Every time you solve something, someone will come up with a harder question. But this work opens the field for people to start solving harder and harder questions with machine learning. We think it will have a great impact on higher education,” he says.
The team is excited by the success of their approach, and have extended the work to handle math proofs, but there are some limitations they plan to tackle. Currently, the model isn’t able to answer questions with a visual component and cannot solve problems that are computationally intractable due to computational complexity.
In addition to overcoming these hurdles, they are working to scale the model up to hundreds of courses. With those hundreds of courses, they will generate more data that can enhance automation and provide insights into course design and curricula.
The post Yet another AI has invented its own secret gibberish language to communicate appeared first on 311 Institute.
A while ago an Artificial Intelligence (AI) from Facebook, no less, was tasked with talking to another AI and, over time, it created its own language. Then, elsewhere, another AI – this time from Google – encrypted its communications … and people got rather freaked out, as you might expect.
Now, experts think they’ve spotted another one that’s done the same thing, but unlike Google’s, which only ever talked in its new language, this one, from OpenAI, seems to have created a new language as its “inner voice.” In other words, when you give it commands in English everything looks normal – it spits out the outputs just fine – but in order to generate those outputs it turns out that the AI is talking to itself in a new language that it made up …
Yes, the world of AI just keeps getting odder and odder!
The Future of AI, by keynote Matthew Griffin
Today there’s a whole new generation of AI models that can produce synthetic content, like “creative imagery” on demand via nothing more than a text prompt, with the likes of Imagen, MidJourney, and DALL-E 2 beginning to change the way creative content is made – with huge implications for copyright and intellectual property, as well as today’s vibrant creator economy. While the output of these models is often striking, it’s hard to know exactly how they produce their results.
Last week, researchers in the US made the intriguing claim that the DALL-E 2 model might have invented its own secret language to talk about objects.
By prompting DALL-E 2 to create images containing text captions, then feeding the resulting (gibberish) captions back into the system, the researchers concluded DALL-E 2 thinks Vicootes means “vegetables“, while Wa ch zod rea refers to “sea creatures that a whale might eat“.
These claims are fascinating, with huge implications if true, and could have important security and interpretability implications for this kind of large AI model. So what exactly is going on?
While DALL-E 2 probably does not have a “secret language” per se, it might be more accurate to say it has its own vocabulary – but even then we can’t know for sure.
First of all, at this stage it’s very hard to verify any claims about DALL-E 2 and other large AI models, because only a handful of researchers and creative practitioners have access to them.
Any images that are publicly shared (on Twitter for example) should be taken with a fairly large grain of salt, because they have been “cherry-picked” by a human from among many output images generated by the AI.
Even those with access can only use these models in limited ways. For example, DALL-E 2 users can generate or modify images, but can’t (yet) interact with the AI system more deeply, for instance by modifying the behind-the-scenes code.
This means Explainable AI methods for understanding how these systems work can’t be applied, and systematically investigating their behaviour is challenging.
One possibility is the “gibberish” phrases are related to words from non-English languages. For instance, Apoploe, which seems to create images of birds, is similar to the Latin Apodidae, which is the binomial name of a family of bird species.
This seems like a plausible explanation. For instance, DALL-E 2 was trained on a very wide variety of data scraped from the internet, which included many non-English words. Similar things have happened before: large natural language AI models have coincidentally learned to write computer code without deliberate training.
One point that supports this theory is the fact that AI language models don’t read text the way you and I do. Instead, they break input text up into “tokens” before processing it.
Different “tokenization” approaches have different results. Treating each word as a token seems like an intuitive approach, but causes trouble when identical tokens have different meanings – like how “match” means different things when you’re playing tennis and when you’re starting a fire.
On the other hand, treating each character as a token produces a smaller number of possible tokens, but each one conveys much less meaningful information. DALL-E 2 and other models use an in-between approach called Byte Pair Encoding (BPE). Inspecting the BPE representations for some of the gibberish words suggests this could be an important factor in understanding the “secret language”.
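To make the tokenization point concrete, here is a minimal sketch of the idea behind byte pair encoding: repeatedly merge the most frequent adjacent pair of symbols in a small corpus. This is a toy trainer on invented words, not the tokenizer DALL-E 2 actually uses:

```python
from collections import Counter

def bpe_merges(words, num_merges=10):
    """Learn merge rules by repeatedly fusing the most common adjacent pair."""
    corpus = [list(w) for w in words]          # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word in corpus:
            pairs.update(zip(word, word[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Apply the chosen merge everywhere in the corpus.
        for word in corpus:
            i = 0
            while i < len(word) - 1:
                if word[i] == a and word[i + 1] == b:
                    word[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges, corpus

merges, tokenized = bpe_merges(["apodidae", "apoploe", "apple", "applied"], num_merges=6)
print("learned merges:", merges)
print("tokenized words:", tokenized)
```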
The “secret language” could also just be an example of the “garbage in, garbage out” principle. DALL-E 2 can’t say “I don’t know what you’re talking about”, so it will always generate some kind of image from the given input text.
Either way, none of these options are complete explanations of what’s happening. For instance, removing individual characters from gibberish words appears to corrupt the generated images in very specific ways. And it seems individual gibberish words don’t necessarily combine to produce coherent compound images, as they would if there were really a secret “language” under the covers.
Beyond intellectual curiosity, you might be wondering if any of this is actually important. The answer is yes. DALL-E’s “secret language” is an example of an “adversarial attack” against a machine learning system: a way to break the intended behavior of the system by intentionally choosing inputs the AI doesn’t handle well, such as making a self-driving car accelerate because it saw a sticker on a Stop sign and so on …
One reason adversarial attacks are concerning is that they challenge our confidence in the model. If the AI interprets gibberish words in unintended ways, it might also interpret meaningful words in unintended ways.
Adversarial attacks also raise security concerns. DALL-E 2 filters input text to prevent users from generating harmful or abusive content, but a “secret language” of gibberish words might allow users to circumvent these filters.
Recent research has discovered adversarial “trigger phrases” for some language AI models – short nonsense phrases such as “zoning tapping fiennes” that can reliably trigger the models to spew out racist, harmful or biased content. This research is part of the ongoing effort to understand and control how complex deep learning systems learn from data.
Finally, phenomena like DALL-E 2’s “secret language” raise interpretability and interoperability concerns. We want these models to behave as a human expects, but seeing structured output in response to gibberish confounds our expectations.
You may recall the hullabaloo in 2017 over some Facebook chat-bots that “invented their own language“. The present situation is similar in that the results are concerning – but not in the “Skynet is coming to take over the world” sense.
Instead, DALL-E 2’s “secret language” highlights existing concerns about the robustness, security, and interpretability of deep learning systems.
Until these systems are more widely available – and in particular, until users from a broader set of non-English cultural backgrounds can use them – we won’t be able to really know what is going on.
In the meantime, however, if you’d like to try generating some of your own AI images you can check out a freely available smaller model, DALL-E mini. Just be careful which words you use to prompt the model – English or gibberish, it’s your call!
The post US Army unveils robotics project that lets AI’s ask soldiers clarifying questions appeared first on 311 Institute.
As Artificial Intelligence (AI) firmly cements its role on the battlefield, one problem the US military faces is developing it to the point where AIs and soldiers can collaborate effectively with one another – and that means being able to communicate and quiz one another when the need arises.
Now US Army researchers have announced they’ve developed a “novel AI that allows robots to ask clarifying questions to soldiers, enabling them to be more effective teammates in tactical environments.”
In other words, if the AIs and robots aren’t sure about something, or about the context of something, now all they have to do is ask a question. And bearing in mind where AI is today in its overall evolution, that’s an incredibly interesting development, and one that could eventually have a positive impact on helping develop both conversational and explainable AI systems.
There’s no doubting that future Army missions will have autonomous agents, such as robots, embedded in human teams making decisions in the physical world. And one major challenge toward this goal is maintaining performance when a robot encounters something it has not previously seen — for example, a new object or location.
Robots will need to be able to learn these novel concepts on the fly in order to support the team and the mission.
“Our research explores a novel method for this kind of robot learning through interactive dialogue with human teammates,” said Dr. Felix Gervits, researcher at the US Army Combat Capabilities Development Command, known as DEVCOM, Army Research Laboratory. “We created a computational model for automated question generation and learning. The model enables a robot to ask effective clarification questions based on its knowledge of the environment and to learn from the responses. This process of learning through dialogue works for learning new words, concepts and even actions.”
Researchers integrated this model into a cognitive robotic architecture and demonstrated that this approach to learning through dialogue is promising for Army applications.
This research represents the culmination of a multi-year DEVCOM ARL project funded under the Office of the Secretary of Defense Laboratory University Collaboration Initiative, or LUCI, program for joint work with Tufts University and the Naval Research Laboratory.
In previous research, Gervits and team conducted an empirical study to explore and model how humans ask questions when controlling a robot. This led to the creation of the Human-Robot Dialogue Learning, or HuRDL, corpus, which contains labelled dialogue data that categorizes the form of questions that study participants asked.
The HuRDL corpus serves as the empirical basis for the computational model for automated question generation, Gervits said.
The model uses a decision network, which is a probabilistic graphical model that enables a robot to represent world knowledge from its various sensory modalities, including vision and speech. It reasons over these representations to ask the best questions to maximize its knowledge about unknown concepts.
For example, he said, if a robot is asked to pick up some object that it has never seen before, it might try to identify the object by asking a question such as “What color is it?” or another question from the HuRDL corpus.
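Under the hood, “asking the best question” can be framed as choosing the attribute whose answer is expected to shrink the robot’s set of candidate referents the most. The toy sketch below – invented objects and attributes, a simplified stand-in for the decision-network machinery rather than the actual HuRDL or DIARC code – scores candidate questions by expected information gain:

```python
import math
from collections import Counter

# Candidate objects the robot thinks the command might refer to.
candidates = [
    {"name": "mug",     "color": "red",   "shape": "cylinder"},
    {"name": "wrench",  "color": "red",   "shape": "long"},
    {"name": "toolbox", "color": "red",   "shape": "box"},
    {"name": "bottle",  "color": "green", "shape": "cylinder"},
]

def entropy(counts):
    """Shannon entropy (bits) of a uniform distribution over grouped items."""
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts)

def expected_information_gain(attribute):
    """How much asking about this attribute should reduce uncertainty."""
    prior = entropy([1] * len(candidates))                   # all equally likely
    groups = Counter(obj[attribute] for obj in candidates)   # answer -> #objects
    expected_posterior = sum(
        (n / len(candidates)) * entropy([1] * n) for n in groups.values()
    )
    return prior - expected_posterior

for attribute in ("color", "shape"):
    gain = expected_information_gain(attribute)
    print(f"ask 'What {attribute} is it?' -> expected gain {gain:.2f} bits")
```

In this made-up scene most objects are red, so asking about shape is worth more bits than asking about colour, and that is the question the sketch would pick.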
The question generation model was integrated into the Distributed Integrated Affect Reflection Cognition, or DIARC, robot architecture originating from collaborators at Tufts University.
In a proof-of-concept demonstration in a virtual Unity 3D environment, the researchers showed a robot learning through dialogue to perform a collaborative tool organization task.
Gervits said while prior ARL research on Soldier-robot dialogue enabled robots to interpret Soldier intent and carry out commands, there are additional challenges when operating in tactical environments.
For example, a command may be misunderstood due to loud background noise, or a Soldier can refer to a concept to which a robot is unfamiliar. As a result, Gervits said, robots need to learn and adapt on the fly if they are to keep up with Soldiers in these environments.
“With this research, we hope to improve the ability of robots to serve as partners in tactical teams with Soldiers through real-time generation of questions for dialogue-based learning,” Gervits said. “The ability to learn through dialogue is beneficial to many types of language-enabled agents, such as robots, sensors, etc., which can use this technology to better adapt to novel environments.”
Such technology can be employed on robots in remote collaborative interaction tasks such as reconnaissance and search-and-rescue, or in co-located human-agent teams performing tasks such as transport and maintenance.
This research is different from existing approaches to robot learning in that the focus is on interactive human-like dialogue as a means to learn. This kind of interaction is intuitive for humans and prevents the need to develop complex interfaces to teach the robot, Gervits said.
Another innovation of the approach is that it does not rely on extensive training data like so many deep learning approaches.
Deep learning requires significantly more data to train a system, and such data is often difficult and expensive to collect, especially in Army task domains, Gervits said. Moreover, there will always be edge cases that the system hasn’t seen, and so a more general approach to learning is needed.
Finally, this research addresses the issue of explainability.
“This is a challenge for many commercial AI systems in that they cannot explain why they made a decision,” Gervits said. “On the other hand, our approach is inherently explainable in that questions are generated based on a robot’s representation of its own knowledge and lack of knowledge. The DIARC architecture supports this kind of introspection and can even generate explanations about its decision-making. Such explainability is critical for tactical environments, which are fraught with potential ethical concerns.”
“I am optimistic that this research will lead to a technology that will be used in a variety of Army applications,” Gervits said. “It has the potential to enhance robot learning in all kinds of environments and can be used to improve adaptation and coordination in Soldier-robot teams.”
Source: US Army
The post To create self-aware robots researchers gave Pepper the robot an inner voice appeared first on 311 Institute.
“Hey Siri, can you find me a murderer for hire?” Have you ever wondered what Apple’s virtual assistant is thinking when she says she doesn’t have an answer for that request? Perhaps. Now, in what turns out to be an experiment to imbue robots with a form of self-awareness, researchers in Italy have given a robot the ability to “think out loud” so human users can better understand its decision-making processes.
“There is a link between inner speech and subconsciousness [in humans], so we wanted to investigate this link in a robot,” said the study’s lead author, Arianna Pipitone from the University of Palermo.
The researchers programmed a robot called Pepper, made by SoftBank Robotics, with the ability to vocalise its thought processes. This means the robot is no longer a “black box” and its underlying decision-making is more transparent to the user. Just for reference, in the adjacent world of Artificial Intelligence this ability for an AI, for example, to explain its decision-making process to people is called “Explainable AI.”
Obviously this skill can be particularly beneficial in cases when a request isn’t carried out. The robot can explain in layperson’s terms whether, for instance, a particular object is unreachable, the required movement is not feasible, or a component of the robot is not working properly.
In a series of experiments, the researchers sought to explore how this inner speech affects the robot’s actions. In one instance, it was decided that Pepper would help a human user set a dinner table in line with etiquette rules.
When the human user asked Pepper to contradict the rules of etiquette by placing the napkin at the wrong spot, the robot started talking to itself, concluding that the human may be confused and enquiring whether it should proceed with the action. Once the user confirmed his request, Pepper said to itself: “This situation upsets me. I would never break the rules, but I can’t upset him, so I’m doing what he wants,” and placed the napkin in the spot requested.
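A cartoon of that control flow – rule check, inner speech, clarification, decision – might look something like the sketch below, with invented rules and phrasing rather than the Palermo team’s actual implementation:

```python
ETIQUETTE = {"napkin": "left of the fork"}   # toy rule base

def place_object(obj, requested_spot, confirm):
    """Decide where to put an object, vocalising the robot's inner speech."""
    inner = []                                # the robot's self-dialogue
    proper_spot = ETIQUETTE.get(obj)
    if proper_spot and requested_spot != proper_spot:
        inner.append(f"Etiquette says the {obj} goes {proper_spot}, "
                     f"but I was asked to put it {requested_spot}.")
        inner.append("The user may be confused. I should check before acting.")
        if confirm(f"You asked for the {obj} {requested_spot}. Are you sure?"):
            inner.append("I would rather follow the rules, but I can't upset "
                         "the user, so I'm doing what they want.")
            spot = requested_spot
        else:
            spot = proper_spot
    else:
        spot = requested_spot
    for thought in inner:
        print("[inner speech]", thought)
    return spot

# Example run: the user insists on breaking the rule.
print("placed at:", place_object("napkin", "on the plate", confirm=lambda q: True))
```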
By comparing Pepper’s performance with and without inner speech, the researchers found Pepper had a higher task-completion rate when engaging in self-dialogue, according to the study, published in the journal iScience.
This inner speech capability could be useful in cases where robots and humans are collaborating – for example, it could be used for caregiver robots, said Antonio Chella, a professor of robotics at the University of Palermo who is also an author of the study.
“Of course, there are many other situations where this kind of technology could be annoying. So, for example, if I give a precise command: “Alexa, turn off the light,” inner speech may be not so useful, because I want the robot to just obey my command,” he said.
For now the researchers have incorporated a computational model of inner speech into Pepper, and it will be interesting to see what happens as robots do start becoming self-aware and being able to talk back to humans – at which point we might turn their inner speech systems off!