The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Can GPT Pass the Multistate Bar Exam?
A new paper by Mike Bommarito and Dan Katz finds that AI will likely "pass the MBE component of the Bar Exam in the near future"
AI tools like ChatGPT can generate essays. And, as my little thought experiment demonstrated, many people cannot distinguish the words that I put together from the words assembled by ChatGPT. (I assure you, this is Josh typing--or is it?) But did you know that similar technology can also answer multiple choice questions?
My frequent co-authors, Mike Bommarito and Dan Katz, utilized a different software tool from OpenAI, known as GPT-3.5, to answer the multiple choice questions on the Multistate Bar Examination (MBE). If there are four choices, the "baseline guessing rate" would be 25%. With no specific training, GPT scored an overall accuracy rate of 50.3%. That's better than what many law school graduates can achieve. And in particular, GPT reached the average passing rate for two topics: Evidence and Torts. (I'll let Evidence or Torts scholars speculate about why those topics may be easier for AI.) Here is a summary of the results from their paper:
The table and figure clearly show that GPT is not yet passing the overall multiple choice exam. However, GPT is significantly exceeding the baseline random chance rate of 25%. Furthermore, GPT has reached the average passing rate for at least two categories, Evidence and Torts. On average across all categories, GPT is trailing human test-takers by approximately 17%. In the case of Evidence, Torts, and Civil Procedure, this gap is negligible or in the single digits; at 1.5 times the standard error of the mean across our test runs, GPT is already at parity with humans for Evidence questions. However, for the remaining categories of Constitutional Law, Real Property, Contracts, and Criminal Law, the gap is much more material, rising as high as 36% in the case of Criminal Law.
In this graphic, the blue area indicates the NCBE student average, and the red area indicates the top choice generated by GPT. As you can see, for Evidence in particular, the machine is just about ready to beat man. Objection overruled. Resistance is futile.
The authors, who are leaders in this field, were extremely surprised by their results. They expect a similar tool to be able to pass the MBE somewhere between 18 months from now, and tomorrow:
Overall, we find that GPT-3.5 significantly exceeds our expectations for performance on this task. Despite thousands of hours on related tasks over the last two decades between the authors, we did not expect GPT-3.5 to demonstrate such proficiency in a zero-shot settings with minimal modeling and optimization effort. While our ability to interpret how or why GPT-3.5 chooses between candidate answers is limited by understanding of LLMs and the proprietary nature of GPT, the history of similar problems strongly suggests that an LLM may soon pass the Bar. Based on anecdotal evidence related to GPT-4 and LAION's Bloom family of models, it is quite possible that this will occur within the next 0-18 months.
Worried yet?
Depends on what they train it on.
Exactly. If it were contractual and procedural things like real estate and contract law, would it really matter all that much?
Seems to me .... if it needs a lawyer, there is room for quibbling. If it is so cut and dried that there is no room for quibbling, it doesn't need a lawyer.
Kind of my point. If you are drawing up a deed, contract, will, etc, anything that is not in dispute, just generating a lot of paperwork, I’m not sure why a lawyer is really needed. Lawyers are for arguing disputes.
Spoken like someone who doesn't practice law.
Isn't that well over 50% of the comments, by now, DMN?
Well, yes, but this discussion is specifically about the practice of law, so that made it a bit more salient.
Nonsense. You don't draw up a contract for when there are disputes, you draw up contracts for when the relationship with the party with whom you are contracting goes south, and you need a memorialized agreement.
That's one of the reasons I loved tax law so much. The rules were extremely complicated, but a large percentage of them were black & white.
I had not heard of ChatGPT until last week. When I tried it out for fun ("Symbolism in Moby Dick" and "Analyze Existential Philosophy"), I got very short one-paragraph "answers." Nothing remotely approaching anything essay-length. I wonder what I was doing wrong?
I'm glad I'm no longer teaching. I can't imagine having the time or energy to try and track down online cheating . . . technology has tipped the scales in favor of dishonest students, and the imbalance seems to be only increasing.
Try telling it what you want. If you want a 5 paragraph essay, for example, add that to the prompt. If you want citations add that to the prompt (but be prepared for it to make up citations rather than actually pulling them from anywhere)
Much thanks. Yes, that worked . . . giving a rough length gave much better results. Below is the output for "2-page summary of major symbolism in 'Hamlet.'"
[Results would have certainly received a decent grade in a non-AP high school class. Probably in an entry-level college class as well.]
--------------------
Symbolism plays a significant role in William Shakespeare's play "Hamlet." Symbols are used throughout the play to add depth and complexity to the characters and their relationships, as well as to provide insight into the theme of the play.
One example of symbolism in "Hamlet" is the use of the ghost. The ghost symbolizes the past and the unresolved issues that continue to haunt the present. The ghost's appearance at the beginning of the play sets the tone for the events that follow, as it reveals the corruption and deceit that are at the heart of the story. The ghost also serves as a symbol of Hamlet's own inner turmoil, as he struggles to come to terms with the revelation of his father's murder and his own responsibilities.
Another example of symbolism in the play is the use of poison. Poison symbolizes the corrupt and destructive nature of the characters' actions. The poison that Claudius uses to kill King Hamlet symbolizes the corruption and deceit that permeate the royal court. Similarly, the poison that Gertrude unknowingly drinks, which ultimately leads to her death, symbolizes the destructive consequences of her actions and her inability to see the truth about the events occurring around her.
Finally, the use of clothing and attire serves as a symbol in "Hamlet." For example, when Hamlet appears before the court wearing all black, it symbolizes his grief and his desire for revenge. His black clothing also serves as a contrast to the colorful and elaborate attire of the other characters, highlighting the difference between Hamlet's sincere and honest nature and the superficial and deceitful nature of the other characters.
Overall, the use of symbols in "Hamlet" adds depth and complexity to the characters and their relationships and helps to convey the theme of the play, which is the corrupting influence of power and the destructive consequences of revenge.
It's important to note it didn't read Hamlet and come up with this, but gathered gigatons of blather mentioning Hamlet and "symbolism" and yoinked together some sentences.
This is no small automated feat, but on a new play it wouldn't do squat (though it might guess using words from the play mentioned in "symbolism" discussions of unrelated works).
Also, you can tell the bot, "That's incorrect. The correct version is X." It will generate a new response incorporating your feedback.
I tried arguing with the bot about federalism and elections. At first, the bot insisted that states had almost no authority to run elections. I cited SCOTUS cases. The bot was stubborn. Eventually, I quoted a sentence from a SCOTUS decision saying "states have plenary power to run elections", and ChatGPT did a flip flop and agreed.
The bot seemed more ready to accept logic and counter arguments than most humans.
So if you don't think of ChatGPT as an oracle delivering dollops of truth, but as a tool in the hands of an expert, it sounds much better.
Arch,
LOL. It put a smile on my face imagining you arguing with the AI. This Reason comment thread is not really the right place to put a long cut-and-paste post about your back-and-forth with AI about your election "dispute," but I'll bet it would be a good read.
Not only that, it is placing students who are honest at a disadvantage, I think. In earlier times, while there was always a possibility of cheating, it was often fairly hard and risky to do, and I don’t think very many students would take the risk or put in the effort. But with so many courses still being offered online (including online or remote exams and essays), and just in light of more sophisticated cheating “tech,” my sense is there is a lot more cheating than before.
Honest students will not cheat and hopefully will not be harmed by other students cheating. But I fear that may not be the case.
"Honest students will not cheat and hopefully will not be harmed by other students cheating."
Nick, get real.
Seems like a really good development for everyone except the people who have made careers out of doing what can now be done automatically with software. Looks like society is going to save a lot of money on legal bills in the future.
The response should be to revise the exam to test abilities beyond memorization, writing ability, and simple reasoning.
Instead, the response will probably be some form of guild protectionism. That works for a while, but the people being protected aren’t forced to improve. Then when the guild walls finally fall it’s revolutionary, like what Uber did to taxi drivers and blogs and Craigslist did to journalists.
Here's an idea: pit law school graduates against each other in, say, ten cases, randomly assigned. Base them on recent real cases, even use real juries, although of course the participants are actors and the results don't count for anything but the law school graduates.
Take the top 1/2, 1/4, or 3/4 as having passed the bar.
Or better yet, get rid of the bars entirely. Leave testing to independent testing businesses, who charge whatever they want. The expensive ones can afford the personalization which is actually useful. The cheap ones can test memorization.
And throw those law school graduates into the wild. Let them and employers and clients analyze the testing businesses and resultant certificates. Get rid of the ABA's monopoly and let markets decide who learned and who cheated and who partied.
That was exactly what I thought.
Throw in the sexbot technology that already exists today, let alone what will exist in 3 years and you can have a drop-dead gorgeous bot pleading your case for far less...
Once again, Dr. Ed goes above and beyond to give us the worst possible take.
Introducing Hal 3.0, the AI which makes decisions and writes opinions for busy judges.
Do you know in your gut how the case should come out but you need some rhetorical finesse to justify the decision? Are you clueless about what the result should be, and need a bit of guidance? Do you have an urgent golf game but have a backlog of cases you aren’t sure you can finish in time? Then this program may be for you!
(Caution: Don't activate the "hanging judge" function unless you're sure one party really deserves it)
(Not based on any actual product named Hal 3.0)
(I had to add that disclaimer because it seems some people actually name their products Hal. All great products I'm sure, not to be confused with the joke in this ad)
Do some of the Circuits have pirated copies?
Perhaps this just means lawyers are obsolete, so the cheating isn't really a problem anymore, since the thing lazy, dishonest students are using will replace them anyway.
OR that it could replace them....
Sorry if I missed it—what is it exactly we’re supposed to be worried about?
Skynet of course.
Keep in mind that, so far, AI tools still have humans tweaking them. ChatGPT assured me that there was no evidence that covid originated in a lab instead of a wet market. It told me that theory has been "thoroughly investigated" and found to be false.
"With no specific training, GPT scored an overall accuracy rate of 50.3%."
This is basic test-taking strategy -- law students don't know this?
Has GPT-3.5 been tried against the LSAT?
Ooooh, so you can put information into a machine and then it repeats back the information! Amazing.
If there is something to worry about over AI in general, I'm guessing it will be less the AI itself, and more some absurd hype that people will create surrounding it, which will then be used in very serious and tyrannical ways.
Not sure what Blackman is getting at but I think the earlier post was about students cheating. I'm sure that will be a problem. Maybe less essay writing and more real, proctored, timed exams.
"With no specific training, GPT scored an overall accuracy rate of 50.3%. That's better than what many law school graduates can achieve. "
lol what? I should have been a lawyer. Just from reading law blogs like this and court decisions in constitutional law, I get 70% or so on the online practice tests. You're telling me that's actually better than the actual average of law school graduates? I guess this is why you can't just pass the exam and be a lawyer lol.. imagine if I actually studied the areas idgaf about.
Oh ok I lied about my experience, I also took an undergrad course in business law as an elective LOL
Fafalone,
Three uses of "LOL" in any single post is, at a minimum, two more than should be allowed.
Unless the subject matter is lol-heavy.
I have been practicing law in one state for many years, and I recently decided to take the uniform bar exam so that I would have options of moving and practicing in another state (one of the 30 or so UBE states), which my family and I have discussed off and on over the last few years. I didn’t have a ton of time to study for the UBE (full-time work, plus family obligations), but bought some practice guides, used the UBE practice tests, and studied and took practice tests at Starbucks during evenings and on weekends for 2-3 weeks.
Well, it was definitely a lot harder than I expected it to be, or that I remembered it from many years ago. But I ended up passing (score good enough for all UBE jurisdictions). Not sure if I’ll apply to a lower-cost state and move. We’ll see.
I did not cheat. And I really wouldn’t have had time to cheat, even if I had been the type of person to cheat, as I ended up finishing each part of the exam within the last two or three minutes of the allotted time. But afterwards, it did strike me that it would be fairly easy to cheat if one is inclined to do so, and is a bit faster taking the test. Walking into the testing site each morning, and after lunch, you get scanned with a metal detector wand, and you have to show them that you only have your ID and car key and badge in a Ziploc bag or something. You also tell them that you aren’t bringing anything in, and they might have you show that your front pockets are empty, but I don’t really recall. But if anyone wanted to bring in a little cheat sheet with the tested legal rules written down in shorthand, they easily could just keep them in their back pocket, or sew them inside their pants or something, and then go check it during the test in a bathroom stall. You’re allowed to get up and use the restroom during the test, and I could definitely see that being able to refresh your memory on the elements of certain legal rules would really help.
IANAL but I simply don't understand why we test lawyers without allowing them to use any/all resources during a test.
I'd rather have lawyers with excellent research skills than ones with imperfect memory.
Exactly. The tests should mirror the context of the practice. When do lawyers not have access to resources in practice?
"The tests should mirror the context of the practice."
So some low paid clerk should take the test, and let the lawyer review the results before submitting it for grading?
And then bill for the total time spent by all?
“And, as my little thought experiment demonstrated, many people cannot distinguish the words that I put together from the words assembled by ChatGPT.”
I’m not certain what this says about you but I doubt it says anything positive.
"Worried yet?"
No. Why would I be?
Not only should an AI be able to pass the bar exam eventually, but it would be a condemnation of the exam or of American law and jurisprudence in general if it could not, because ideally law should be rational, logical and unambiguous, which characteristics AIs are better at than us.
Perhaps we should get AIs to draft legislation, now that I think about it.
"And, as my little thought experiment demonstrated, many people cannot distinguish the words that I put together from the words assembled by ChatGPT."
You're both equally bad at pretending to be human.
Also, I don't see *any* comments from people who thought you wrote that. You're not just bad at being a teacher and a lawyer, you're bad at telling the truth.
Something which should not take any degree of skill whatsoever, and somehow you're bad at it.