Two studies show AI benchmarks vastly overstate AI abilities | AutoAdmit.com

The most prestigious law school admissions discussion board in the world.

Date: March 16th, 2026 6:08 PM
Author: LathamTouchedMe

No doubt AI is groundbreaking. But maybe a little grounding is in order.

Carnegie Mellon study. AI benchmarks are so narrowly defined that they represent only 7.6% of all occupational tasks. Benchmarks are disconnected from high-value labor tasks.
https://x.com/rohanpaul_ai/status/2033450821850222811?s=46

Alibaba study. Tested code over the course of 8 months. The vast majority broke down over time despite initially passing quality checks.

Date: March 16th, 2026 11:23 PM
Author: Post nut horror

Surely it will stay this way.

Date: March 16th, 2026 6:11 PM
Author: ....;..;...;;;.....;;......;;

AI is going to be regarded as a joke pretty soon.

It basically has the same value as Excel

Date: March 16th, 2026 11:17 PM
Author: LathamTouchedMe

when do you think that moment will come?

Date: March 16th, 2026 11:20 PM
Author: ,.,,.,.,,,,,,.....................

A joke that spit out the results of a legal research test I gave it in 30 seconds that was much better than anything I'd get from a junior associate after days of research.

Date: March 16th, 2026 11:26 PM
Author: LathamTouchedMe

what were you using? I use Protégé from LexisNexis. Sometimes it's very solid and other times not so much. I wouldn't say it's anywhere near as game-changing as AI has been for programmers.

Date: March 16th, 2026 11:30 PM
Author: ,.,,.,.,,,,,,.....................

Latest paid version of ChatGPT, forget what it's called.

Date: March 16th, 2026 11:36 PM
Author: cardinal swan

Lmao if you’re using that Lexis or Westlaw built-in AI bullshit. ChatGPT can dominate that.

Date: March 16th, 2026 11:21 PM
Author: computer_smasher420

all the models are trained to game the benchmark tests

they're completely meaningless

Date: March 16th, 2026 11:38 PM
Author: potato gun

i asked AI to build a mobile app and it did. that's pretty incredible imo. when it made mistakes it fixed them on its own.
