  The most prestigious law school admissions discussion board in the world.

Two studies show AI benchmarks vastly overstate AI abilities





Date: March 16th, 2026 6:08 PM
Author: LathamTouchedMe

No doubt AI is groundbreaking. But maybe a little grounding is in order.

Carnegie Mellon study. AI benchmarks so narrowly defined that they only represent 7.6% of all occupational tasks. Benchmarks are disconnected from high-value labor tasks.

https://x.com/rohanpaul_ai/status/2033450821850222811?s=46

Alibaba study. Tested code over course of 8 months. Vast majority broke down over time despite initially passing quality.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749191)




Date: March 16th, 2026 11:23 PM
Author: Post nut horror

Surely it will stay this way.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749985)




Date: March 16th, 2026 6:11 PM
Author: ....;..;...;;;.....;;......;;


AI is going to be regarded as a joke pretty soon.

It basically has the same value as Excel

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749196)




Date: March 16th, 2026 11:17 PM
Author: LathamTouchedMe

when do you think that moment will come?

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749968)




Date: March 16th, 2026 11:20 PM
Author: ,.,,.,.,,,,,,.....................


A joke that spit out the results of a legal research test I gave it in 30 seconds that was much better than anything I'd get from a junior associate after days of research.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749973)




Date: March 16th, 2026 11:26 PM
Author: LathamTouchedMe

what were you using? I use Protégé from LexisNexis. Sometimes it's very solid and other times not so much. I wouldn't say it's anywhere near as game-changing as AI has been for programmers.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749989)




Date: March 16th, 2026 11:30 PM
Author: ,.,,.,.,,,,,,.....................


Latest paid version of ChatGPT, forget what it's called.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749997)




Date: March 16th, 2026 11:36 PM
Author: cardinal swan

Lmao if you’re using that Lexis or Westlaw built-in AI bullshit. ChatGPT can dominate that

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49750004)




Date: March 16th, 2026 11:21 PM
Author: computer_smasher420

all the models are trained to game the benchmark tests

they're completely meaningless

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49749980)




Date: March 16th, 2026 11:38 PM
Author: potato gun

i asked AI to build a mobile app and it did. that's pretty incredible imo. when it made mistakes it fixed them on its own.

(http://www.autoadmit.com/thread.php?thread_id=5846529&forum_id=2Vannesa#49750013)