This new article is here. The Introduction:
Artificial general intelligence is "probably the greatest threat to the continued existence of humanity." Or so claims OpenAI's Chief Executive Officer Sam Altman. In a seeming paradox, OpenAI defines its mission as ensuring "that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity."
Whether artificial general intelligence becomes a universal boon or an existential threat—or both—there is general agreement concerning its revolutionary potential. Indeed, Microsoft co-founder Bill Gates has called it "as fundamental an advance as the personal computer or the internet," and Google CEO Sundar Pichai has predicted that it will "bring about a more profound shift in human life than electricity or Promethean fire."
Thus far, AI systems are not generally smarter than humans. Not yet. Large Language Models (LLMs), however, are advancing at a startling pace. LLMs use artificial intelligence to synthesize massive amounts of textual data and then predict text and generate responses to users in "natural" human language. On a scale measuring the progress of LLMs toward general intelligence, OpenAI's flagship model, GPT-3, scored zero percent in 2020. Just five years later, OpenAI's o3-preview scored between 75 and 88 percent. Meanwhile, OpenAI competitors such as Anthropic, Google, and DeepSeek are likewise racing to deliver on the promise of "systems that can think and act rationally in ways that mirror human behavior and intelligence."
Even as LLMs make progress toward general intelligence, some AI systems have already exceeded human performance on narrow, clearly scoped tasks. For example, chess engines have been performing at superhuman levels for years, and AI models can now help detect breast cancer far earlier than human experts—and the models continue to improve. Meanwhile, OpenAI's o1 reasoning model has an LSAT score higher than that of the median student admitted to the law schools at Harvard, Yale, and Stanford.
As AI systems begin to mirror human thought, it pays to remember the words of Seneca: "[N]ot all mirrors stick to the truth." LLMs now regularly create outputs that appear to be the product of independent thought, but LLMs are essentially prediction engines that "answer" prompts (or inputs) by calculating which words are most likely to come next and then assembling them into an output. LLMs, as such, do not predict truth but instead predict probabilities. In doing so, they sometimes replicate false information common in their training data.
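To make the "prediction engine" point concrete, the short Python sketch below queries the openly available GPT-2 model (via the Hugging Face transformers library, chosen here purely for illustration and not a model discussed in this article) for the probabilities it assigns to possible next words after a prompt. The output is a ranked list of likelihoods, not a statement of verified fact, which is precisely why a fluent, confident-sounding continuation can nonetheless be false.

```python
# Illustrative sketch only: next-token prediction, the core operation behind
# LLM text generation. Uses the small open GPT-2 model; production chatbots
# use far larger models plus additional sampling and safety layers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every token in the vocabulary

# Convert the scores at the final position into probabilities and list the
# five most likely continuations. The model reports likelihoods, not truth.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)).strip():>12}  probability = {float(prob):.3f}")
```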
They also inevitably produce "plausible yet false outputs," commonly referred to as hallucinations. An LLM may produce fake legal documents, non-existent academic citations, or false biographical data. Although LLM producers and users can employ various tactics to reduce hallucinations, these errors cannot be eliminated. And they are quite prevalent. In fact, data gathered from multiple chatbots suggests that hallucinations occur in 3% to 10% of outputs, leading one legal scholar to dub LLMs "Large Libel Models."