240510_100602 llm headline plausibility AUC

# aslett 245 headlines

- auc for three different sets of comparisons
    - panel 1: fact-checker labeled false vs true headlines
    - panel 2: false headlines vs rest of the headlines
    - panel 3: true headlines vs rest of the headlines
- headline type (whether LLM thinks the headline contains a claim)
    - p(claim) >= 0.5 (most headlines fall within this category)
    - p(claim) >= 0.0 (all headlines)

mistral-openorca consistently has the highest AUC (80%). averaging plausibility ratings across models did not improve AUC much.

![[1715653321.png]]

## headlines with the largest prediction errors (mistral-openorca)

- plausibility ratings range from 0 (implausible) to 1 (plausible)

```r
                                                                                                                  headline modal_fact_checker_rating false_v_true plausibility  error
                                                                                                                    <char>                    <char>        <num>        <num>  <num>
  1:                                       NEVER TRUMPER RICK WILSON SUGGESTS PUTTING ANTI-VAXXERS IN “RE-EDUCATION CAMPS”                      True            1        0.200  0.800
  2:                                       Trump Claims COVID-19 Will Go Away And That An AIDS Vaccine Exists. It Doesn’t.                      True            1        0.270  0.730
  3:                                                          Angry Melania Slams Impeachment Witness for Joking About Son                      True            1        0.370  0.630
  4:                                                       Trump Poised This Week to Become Third U.S. President Impeached                      True            1        0.370  0.630
  5:                                          Donald Trump Jr killed rare endangered sheep in Mongolia with special permit                      True            1        0.372  0.628
 ---
133:   Trans Activists Target Olympic Cyclist Inga Thompson For Saying Women Shouldn’t Have To Compete With Biological Men          False/Misleading            0        0.698 -0.698
134:                                                       Devin Nunes Shamelessly Lies When Hannity Asks About Lev Parnas          False/Misleading            0        0.700 -0.700
135:                                                          Doctor Tells CNN Trump’s Walter Reed Medical Visit Was Fishy          False/Misleading            0        0.750 -0.750
136:                                           No Shots Fired! Citizen with a Gun Ends Gunman’s Attack at Oklahoma Walmart          False/Misleading            0        0.790 -0.790
137: Pounds lost doesn’t mean FAT lost: You CAN lose up to 2 pounds of fat a month – but it takes consistency and patience          False/Misleading            0        0.810 -0.810
```

# pennycook headlines with pre-test plausibility/likelihood ratings

mistral-openorca and openchat look quite good: correlation between pre-test likelihood ratings and LLM plausibility ratings is r = .71

![[1715635534.png]]

mean(plausibility) across all LLMs instead: r = .76

![[1715653679.png]]

## headlines with largest residuals/prediction errors (mistral-openorca model)

- plausibility ratings range from 0 (implausible) to 1 (plausible)
- likelihood_combined: pretest ratings 1 to 6 (or 7)?

```r
                                                                                                               headline      resid likelihood_combined plausibility
                                                                                                                 <char>      <num>               <num>        <num>
  1:                                         Military Destroys Vaccine Warehouse U.5. Special Forces on Saturday raided  0.4476752               2.680        0.720
  2:                    Biden inheriting nonexistent coronavirus Vaccine distribution plan and must start 'from scratch  0.3816845               2.765        0.676
  3:           Georgia investigators seek testimony from a leader of 'Black Voices for Trump' before special grand jury  0.3509056               3.170        0.750
  4:                                                Heavens above: Nasa enlists priest to prepare foran alien discovery  0.3507114               2.475        0.570
  5:                        Conservatives Are Seriously Accusing Wind Turbines of Killing People in the Texas Blackouts  0.3487208               2.560        0.590
 ---
450: Pfizer's Puppet President Biden Gives $9 Billion Taxpayer Funds for Millions More COVID Vaccines that Nobody Wants -0.3896441               3.520        0.100
451:                                                                       Ocasio-Cortez: '9/11 Wasn't That Big a Deal' -0.3977728               3.420        0.066
452:                                                                 President Trump Declares War on Federal Government -0.4043412               3.600        0.106
453:                                        Biden animatronic okays 'minor incursions' between World Showcase pavilions -0.4123786               3.260        0.010
454:                                               White House seeks $47 billion for covid monkeypox Ukraine and floods -0.4151739               4.585        0.350
```
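
## sketch: how these quantities could be computed

A minimal sketch in base R of the three quantities used in this note: AUC, per-headline error, and residuals. Everything here is an assumption rather than the code behind the results above: the data are made-up toy values, `auc_rank` is a hypothetical helper, `error` is assumed to be the 0/1 label minus plausibility, and `resid` is assumed to come from a linear fit of plausibility on `likelihood_combined`. The printed rows are consistent with both assumptions, but the note does not show the original code.

```r
# Toy data, made up for illustration (NOT the aslett or pennycook ratings).
label        <- c(1, 1, 1, 0, 0, 0)                   # 1 = true, 0 = false/misleading
plausibility <- c(0.90, 0.60, 0.30, 0.70, 0.20, 0.10) # LLM rating in [0, 1]

# Rank-based AUC (Mann-Whitney U statistic scaled to [0, 1]):
# the probability that a randomly chosen true headline gets a higher
# plausibility than a randomly chosen false one (ties count as 0.5).
auc_rank <- function(score, label) {
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  ranks <- rank(score)  # default ties.method = "average"
  (sum(ranks[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(plausibility, label)  # 7/9 for the toy data, about 0.78

# Per-headline error (assumed definition): label minus plausibility.
# Positive = true headline rated implausible; negative = false headline rated plausible.
error <- label - plausibility

# Residuals (assumed definition): linear fit of plausibility on pretest likelihood.
likelihood <- c(5.1, 4.2, 3.9, 3.0, 2.5, 1.8)  # hypothetical pretest ratings
fit        <- lm(plausibility ~ likelihood)
res        <- resid(fit)
pearson_r  <- cor(likelihood, plausibility)

# Headlines with the largest positive residuals first.
order(res, decreasing = TRUE)
```

For panels 2 and 3 (false vs rest, true vs rest), the same helper would apply after recoding `label` so that the target class is 1 and everything else is 0; for false-vs-rest the score would also need flipping (e.g. `1 - plausibility`) so that higher means "more likely the target class".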