It turns out the machines still need us after all, at least for now. And while the largest systems get the most attention, the secret to truly useful, fair AI is best served small and with plenty of human input.
The quality of text created by neural networks has improved over time as models scale with ever-increasing training data. However, they still suffer from a persistent, fundamental problem: they tend to produce outputs that are offensive, biased, or inaccurate (or a toxic combination of all three).
There are ways around this, but they don’t have the exciting scalability story and, worse, they have to rely on a rather non-tech crutch: human input. Smaller language models fine-tuned with actual human-written answers are ultimately better at generating less biased text than a much larger, more powerful system.
Further complicating matters, models like OpenAI’s GPT-3 don’t always generate text that’s particularly useful because they’re trained to basically “autocomplete” sentences based on a huge trove of text scraped from the internet. They have no knowledge of what a user is asking them to do or what responses the user is looking for. “In other words, these models aren’t aligned with their users,” OpenAI said.
One test of this idea would be to see what happens with pared-down models and a little human input to keep those trimmed neural networks more…humane. This is exactly what OpenAI did with GPT-3 recently when it hired 40 human contractors to help steer the model’s behavior.
The team were given a set of text prompts and asked to write corresponding answers. Engineers at OpenAI collected these responses and fine-tuned GPT-3 on the dataset to show the machine how a human would reply.
The contractors were also asked to rank a list of responses produced by GPT-3 by quality. That data was used to train a reinforcement learning model to learn what counts as a good or bad reply. The model was then used to calculate a score for possible GPT-3 text generations. Those that scored highly were more likely to be selected as an output for the user than those that scored lower, according to a research paper.
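To make the ranking-and-scoring idea concrete, here is a toy sketch in Python. It is not OpenAI's implementation: the two-number feature vectors, the function names, the linear scoring model, and the softmax selection step are all assumptions made for illustration. It uses a pairwise logistic loss, one common way to learn a scoring model from human rankings, and then turns the scores into selection probabilities so that higher-scoring replies are more likely to be picked.

```python
import math

def ranking_to_pairs(ranked):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return [(ranked[i], ranked[j])
            for i in range(len(ranked))
            for j in range(i + 1, len(ranked))]

def score(weights, features):
    """Linear score: dot product of the weights and a response's features."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit weights by gradient descent on -log(sigmoid(s_pref - s_rej))."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = score(weights, preferred) - score(weights, rejected)
            # The step shrinks as the model grows confident in the right order.
            step = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for k in range(dim):
                weights[k] += lr * step * (preferred[k] - rejected[k])
    return weights

def selection_probabilities(weights, candidates):
    """Softmax over scores: higher-scoring candidates get higher probability."""
    scores = [score(weights, c) for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical features per response: (helpfulness, toxicity).
# The list is ordered as a labeler might rank it, best first.
ranked_by_labeler = [(0.9, 0.0), (0.6, 0.1), (0.2, 0.8)]

pairs = ranking_to_pairs(ranked_by_labeler)
weights = train_reward_model(pairs, dim=2)
probs = selection_probabilities(weights, ranked_by_labeler)
```

In this sketch the learned weights reward the hypothetical helpfulness feature and penalise the toxicity one, so the reply the labeler ranked first ends up with the highest selection probability. A real system would score text with a neural network rather than hand-made features.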
GPT models trained on human feedback are known as InstructGPT systems. “The resulting InstructGPT models are much better at following instructions than GPT-3. They also make up facts less often, and show small decreases in toxic output generation. Our labelers prefer outputs from our 1.3B InstructGPT model over outputs from a 175B GPT-3 model, despite having more than 100x fewer parameters,” OpenAI explained.
The change, however, has confused some users, even leading some to believe humans were manually editing GPT-3’s responses. Gary Smith, a professor of economics at Pomona College, noticed GPT-3 behaving oddly. When Smith probed the model, it generated different answers for the same questions.
“Should I use random numbers to give my students grades?” Smith typed into GPT-3 on March 18. “There is no definitive answer to this question. It depends on a variety of factors, including…” it replied. A day later when faced with the same question, GPT-3 was more decisive:
“No, you should not use random numbers to give your students grades. Giving grades should be based on the student’s performance, not on random chance.”
Smith has many more examples of GPT-3 suddenly improving. Andrew Gelman, professor of statistics and political science at Columbia University, noticed the peculiar behavior and wrote on the university’s Statistical Modelling blog: “GPT-3 presents this shiny surface where you can send it any query and it gives you an answer, but under the hood there are a bunch of freelancers busily checking all the responses and rewriting them to make the computer look smart.
“To be fair, OpenAI does state that ‘InstructGPT is then further fine-tuned on a dataset labeled by human labelers’ but this still seems misleading to me. It’s not just that the algorithm is fine-tuned on the dataset. It seems that these freelancers are being hired specifically to rewrite the output.”
Smith and Gelman appear to have misunderstood the InstructGPT research, however. The contractors were hired to generate a dataset of human responses for the machine to learn from, but they were not hired on an ongoing basis to manually improve what were previously poor outputs.
“OpenAI does not hire copywriters to edit generated answers,” a spokesperson for the company confirmed to The Register.
Aligning language models like GPT-3 may make them more likely to generate text that is less toxic, less biased, and more accurate, but they’re not perfect. Their performance can degrade, especially on tasks where human feedback from the InstructGPT experiments was not used to fine-tune them.
“Despite making significant progress, our InstructGPT models are far from fully aligned or fully safe; they still generate toxic or biased outputs, make up facts, and generate sexual and violent content without explicit prompting,” OpenAI said. ®