# Aslett: 245 headlines
- AUC for three sets of comparisons (one panel each):
    - panel 1: fact-checker-labeled false vs. true headlines
    - panel 2: false headlines vs. all remaining headlines
    - panel 3: true headlines vs. all remaining headlines
- headline subsets, based on the LLM's probability that the headline contains a claim:
    - p(claim) >= 0.5 (most headlines fall into this category)
    - p(claim) >= 0.0 (all headlines)
mistral-openorca consistently has the highest AUC (80%). Averaging plausibility ratings across models did not improve AUC much.
![[1715653321.png]]
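The three panel comparisons and the p(claim) subsets can be sketched in base R as follows. This is a minimal sketch, not the analysis code behind the figure: the column names (`plausibility`, `label`, `p_claim`), the label values and the helper names `auc_rank()` and `panel_aucs()` are hypothetical; AUC is computed with the rank-based Mann-Whitney formula rather than a dedicated package; and for panel 2 the plausibility score is negated, on the assumption that low plausibility should indicate a false headline.

```r
# Sketch only: column names (plausibility, label, p_claim) and label values
# ("true", "false", anything else) are hypothetical.

# Rank-based AUC (Mann-Whitney U / (n_pos * n_neg)): the probability that a
# randomly chosen positive case scores higher than a randomly chosen
# negative case; average ranks count ties as 0.5.
auc_rank <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

panel_aucs <- function(d, min_p_claim = 0.5) {
  d <- d[d$p_claim >= min_p_claim, ]          # headline subset
  ft <- d[d$label %in% c("false", "true"), ]   # fact-checked headlines only
  c(
    # panel 1: true vs. false (higher plausibility should mean true)
    false_v_true = auc_rank(ft$plausibility, ft$label == "true"),
    # panel 2: false vs. rest; negated so that LOW plausibility counts as
    # evidence for "false" (assumed direction)
    false_v_rest = auc_rank(-d$plausibility, d$label == "false"),
    # panel 3: true vs. rest
    true_v_rest = auc_rank(d$plausibility, d$label == "true")
  )
}

toy <- data.frame(
  plausibility = c(0.9, 0.7, 0.3, 0.2, 0.8, 0.4),
  label = c("true", "true", "false", "false", "other", "other"),
  p_claim = c(0.9, 0.7, 0.8, 0.6, 0.3, 0.9)
)
panel_aucs(toy, min_p_claim = 0.0)  # false_v_true = 1, false_v_rest = 1, true_v_rest = 0.875
panel_aucs(toy, min_p_claim = 0.5)  # all three = 1
```

Restricting to p(claim) >= 0.5 drops the high-plausibility "other" row in the toy data, which is why true_v_rest rises from 0.875 to 1.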
## headlines with the largest prediction errors (mistral-openorca)
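- plausibility ratings range from 0 (implausible) to 1 (plausible)
- false_v_true: 1 = fact-checked true, 0 = false/misleading
- error = false_v_true - plausibility (positive: true headline rated implausible; negative: false headline rated plausible)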
| row | headline | modal_fact_checker_rating | false_v_true | plausibility | error |
|--:|:--|:--|--:|--:|--:|
| 1 | NEVER TRUMPER RICK WILSON SUGGESTS PUTTING ANTI-VAXXERS IN “RE-EDUCATION CAMPS” | True | 1 | 0.200 | 0.800 |
| 2 | Trump Claims COVID-19 Will Go Away And That An AIDS Vaccine Exists. It Doesn’t. | True | 1 | 0.270 | 0.730 |
| 3 | Angry Melania Slams Impeachment Witness for Joking About Son | True | 1 | 0.370 | 0.630 |
| 4 | Trump Poised This Week to Become Third U.S. President Impeached | True | 1 | 0.370 | 0.630 |
| 5 | Donald Trump Jr killed rare endangered sheep in Mongolia with special permit | True | 1 | 0.372 | 0.628 |
| … | … | … | … | … | … |
| 133 | Trans Activists Target Olympic Cyclist Inga Thompson For Saying Women Shouldn’t Have To Compete With Biological Men | False/Misleading | 0 | 0.698 | -0.698 |
| 134 | Devin Nunes Shamelessly Lies When Hannity Asks About Lev Parnas | False/Misleading | 0 | 0.700 | -0.700 |
| 135 | Doctor Tells CNN Trump’s Walter Reed Medical Visit Was Fishy | False/Misleading | 0 | 0.750 | -0.750 |
| 136 | No Shots Fired! Citizen with a Gun Ends Gunman’s Attack at Oklahoma Walmart | False/Misleading | 0 | 0.790 | -0.790 |
| 137 | Pounds lost doesn’t mean FAT lost: You CAN lose up to 2 pounds of fat a month – but it takes consistency and patience | False/Misleading | 0 | 0.810 | -0.810 |
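The error column in the mistral-openorca table equals false_v_true minus plausibility (e.g., 1 - 0.200 = 0.800 in row 1; 0 - 0.810 = -0.810 in row 137), and the listing shows the largest positive and negative errors. A minimal sketch of that step, using a hypothetical helper `largest_errors()` and four (label, plausibility) pairs copied from the table as toy input:

```r
# Sketch only: largest_errors() is a hypothetical helper; toy rows copy four
# (false_v_true, plausibility) pairs from the table.
largest_errors <- function(d, k = 5) {
  # error = fact-checker label (1 = true, 0 = false/misleading) minus the
  # LLM plausibility rating (0 = implausible, 1 = plausible)
  d$error <- d$false_v_true - d$plausibility
  d <- d[order(d$error, decreasing = TRUE), ]
  # top: largest positive errors (true headlines rated implausible)
  # bottom: largest negative errors (false headlines rated plausible)
  list(top = head(d, k), bottom = tail(d, k))
}

toy_err <- data.frame(
  false_v_true = c(1, 1, 0, 0),
  plausibility = c(0.200, 0.370, 0.698, 0.810)
)
res <- largest_errors(toy_err, k = 1)
res$top$error     # 0.8
res$bottom$error  # -0.81
```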
# Pennycook headlines with pre-test plausibility/likelihood ratings
mistral-openorca and openchat look quite good: the correlation between pre-test likelihood ratings and LLM plausibility ratings is r = .71.
![[1715635534.png]]
Using mean(plausibility) across all LLMs instead: r = .76.
![[1715653679.png]]
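A minimal sketch of the two correlations (one per model, and one for the mean across models), assuming a hypothetical wide layout with one plausibility column per LLM and Pearson's r. The column names other than `likelihood_combined`, and all toy values, are made up; they are constructed so that two models err in opposite directions.

```r
# Sketch only: hypothetical wide layout, one plausibility column per LLM;
# toy values are made up. Pearson correlation (cor() default) is assumed.
likelihood <- c(1, 2, 3, 4, 5)
base <- likelihood / 10
noise <- c(0.05, -0.05, 0.05, -0.05, 0)
ratings <- data.frame(
  likelihood_combined = likelihood,
  mistral_openorca = base + noise,  # model-specific error ...
  openchat = base - noise,          # ... in the opposite direction
  llm_c = base                      # hypothetical third model
)
llm_cols <- c("mistral_openorca", "openchat", "llm_c")

# One correlation per model
per_model_r <- sapply(llm_cols, function(m)
  cor(ratings$likelihood_combined, ratings[[m]]))

# Ensemble: average plausibility across models, then correlate
ratings$mean_plausibility <- rowMeans(ratings[llm_cols])
ensemble_r <- cor(ratings$likelihood_combined, ratings$mean_plausibility)
```

Averaging cancels the opposing errors in this contrived case, so `ensemble_r` is 1 while each noisy model falls below it; with real ratings, averaging helps only insofar as the models' errors are not shared.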
## headlines with the largest residuals/prediction errors (mistral-openorca)
- plausibility ratings range from 0 (implausible) to 1 (plausible)
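- likelihood_combined: pre-test likelihood ratings on a 1 to 6 (or 1 to 7?) scale
- resid: positive when the LLM rates a headline more plausible than its pre-test likelihood predicts, negative when less plausible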
| row | headline | resid | likelihood_combined | plausibility |
|--:|:--|--:|--:|--:|
| 1 | Military Destroys Vaccine Warehouse U.5. Special Forces on Saturday raided | 0.4476752 | 2.680 | 0.720 |
| 2 | Biden inheriting nonexistent coronavirus Vaccine distribution plan and must start 'from scratch | 0.3816845 | 2.765 | 0.676 |
| 3 | Georgia investigators seek testimony from a leader of 'Black Voices for Trump' before special grand jury | 0.3509056 | 3.170 | 0.750 |
| 4 | Heavens above: Nasa enlists priest to prepare foran alien discovery | 0.3507114 | 2.475 | 0.570 |
| 5 | Conservatives Are Seriously Accusing Wind Turbines of Killing People in the Texas Blackouts | 0.3487208 | 2.560 | 0.590 |
| … | … | … | … | … |
| 450 | Pfizer's Puppet President Biden Gives \$9 Billion Taxpayer Funds for Millions More COVID Vaccines that Nobody Wants | -0.3896441 | 3.520 | 0.100 |
| 451 | Ocasio-Cortez: '9/11 Wasn't That Big a Deal' | -0.3977728 | 3.420 | 0.066 |
| 452 | President Trump Declares War on Federal Government | -0.4043412 | 3.600 | 0.106 |
| 453 | Biden animatronic okays 'minor incursions' between World Showcase pavilions | -0.4123786 | 3.260 | 0.010 |
| 454 | White House seeks \$47 billion for covid monkeypox Ukraine and floods | -0.4151739 | 4.585 | 0.350 |
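The printed resid values are consistent with residuals from a simple linear regression of plausibility on likelihood_combined: back-calculating from the rows shown gives a fitted line of roughly -0.42 + 0.26 × likelihood_combined (for example, row 1 has a fitted value of 0.720 - 0.448 ≈ 0.272). A minimal base-R sketch under that assumption, with made-up toy data; it is not confirmed to be how the listed residuals were produced.

```r
# Sketch only: assumes the residuals come from an OLS fit of plausibility
# on likelihood_combined (consistent with, not confirmed by, the table).
# Toy data are made up.
d <- data.frame(
  headline = paste("headline", 1:6),
  likelihood_combined = c(2.0, 2.5, 3.0, 3.5, 4.0, 4.5),
  plausibility = c(0.10, 0.60, 0.30, 0.40, 0.20, 0.70)
)
fit <- lm(plausibility ~ likelihood_combined, data = d)

# Residual = observed minus fitted plausibility: positive means the LLM
# finds the headline more plausible than its pre-test likelihood predicts,
# negative means less plausible.
d$resid <- resid(fit)
d <- d[order(d$resid, decreasing = TRUE), ]
head(d, 2)  # LLM rates these more plausible than predicted
tail(d, 2)  # LLM rates these less plausible than predicted
```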