The OpenEval Project

The OpenEval Project provides transparent, systematic evaluation of large language model (LLM) performance in scientific peer review. We compare LLM-generated reviews against traditional peer reviews to assess their accuracy, consistency, and reliability in identifying claims and evaluating scientific evidence.

Explore our dataset of manuscripts below. Click any paper to view detailed claim-by-claim comparisons between LLM and peer reviewer assessments. All papers are published under the CC-BY license and are available in their original form on the eLife website; the versions rendered here have been modified to highlight their claims.
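To give a concrete sense of what a claim-by-claim comparison involves, the sketch below shows one way such a record could be represented. The type names, fields, and verdict labels (ClaimComparison, llmAssessment, "supported", and so on) are illustrative assumptions for this sketch, not the project's actual data schema.

    // Illustrative sketch only: names and labels are assumptions,
    // not the OpenEval Project's actual schema.
    type Verdict = "supported" | "partially supported" | "unsupported" | "not addressed";

    interface ClaimComparison {
      paperId: string;          // e.g. an eLife article identifier
      claimText: string;        // claim extracted from the manuscript
      llmAssessment: Verdict;   // how the LLM review judged the claim
      peerAssessment: Verdict;  // how the human peer review judged the claim
      agreement: boolean;       // whether the two assessments match
    }

    // A comparison reduces to an agreement check between the two assessments.
    function agrees(llm: Verdict, peer: Verdict): boolean {
      return llm === peer;
    }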

Summary statistics: Papers Evaluated · Claims Extracted · OpenEval Reviews · Peer Reviews · Comparisons Made

Processed Manuscripts
