- jenny science advances 2021 data: 207 headlines
- two ratings: fc_modal (binary), fc_likert
- LLMs used for classification: 8 offline models (most reliable models from previous analysis - [[250710_103055 20 llms classify jenny vaccine data|jenny vaccine data]]), 4 online models
- all models are relatively reliable: across 3 runs/batches, ICCs >= .75 (ran 3 batches to check for consistency)

```r
# offline models
   model                                       total_cost_all_headlines  cost_per_headline
   <char>                                                         <num>              <num>
1: mistralai/mistral-small                                   0.01854360      0.00008958261
2: meta-llama/llama-4-maverick                               0.01744485      0.00008427464
3: openai/gpt-4o-mini                                        0.01629165      0.00007870362
4: google/gemini-2.5-flash-preview-05-20                     0.01549980      0.00007487826
5: openai/gpt-4.1-nano                                       0.01194230      0.00005769227
6: google/gemini-2.0-flash-001                               0.01108320      0.00005354203
7: google/gemini-2.5-flash-lite-preview-06-17                0.01034320      0.00004996715
8: google/gemini-flash-1.5-8b                                0.00400935      0.00001936884
```

```r
# online models
   model                                total_cost_all_headlines  cost_per_headline
   <char>                                                  <num>              <num>
1: perplexity/sonar-pro                               0.55212900       0.0026672899
2: openai/gpt-4o-search-preview                       0.29781250       0.0014387077
3: perplexity/sonar                                   0.05270000       0.0002545894
4: openai/gpt-4o-mini-search-preview                  0.02307195       0.0001114587
```

# LLMs and fact-checker ratings

LLM ratings: for each post, mean(ratings) across all LLMs; the red dashed line is the mean

offline models
![[1752525457.png]]

online models
![[1752526412.png]]

## LLM misleading and inaccuracy correlation

offline models
![[1752525543.png]]

online models
![[1752526445.png]]

## LLM misleading ratings - correlations with fc_modal and fc_likert

offline models
![[8_models_allen2021_correlations_llm_misleading_batch1.png]]

online models
![[8_models_online_allen2021_correlations_llm_misleading_batch1.png]]

## LLM inaccuracy ratings - correlations with fc_modal and fc_likert

offline models
![[8_models_allen2021_correlations_llm_inaccurate_batch1.png]]

online models
![[8_models_online_allen2021_correlations_llm_inaccurate_batch1.png]]
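
A minimal sketch of how the per-headline mean LLM rating and its correlations with fc_modal and fc_likert could be computed. This is not the actual analysis code: the data are made up, and every name other than `fc_modal` and `fc_likert` (`ratings`, `fact_check`, `headline_id`, `model`, `llm_misleading`) is a hypothetical placeholder. It assumes long-format ratings (one row per headline and model) and uses base R only.

```r
# Hypothetical long-format LLM ratings: one row per headline and model.
# The values are illustrative only, not the real data.
ratings <- data.frame(
  headline_id = rep(1:6, times = 2),
  model = rep(c("model_a", "model_b"), each = 6),
  llm_misleading = c(1, 2, 4, 5, 3, 6,
                     2, 2, 5, 6, 3, 7)
)

# Hypothetical fact-checker ratings: one row per headline.
fact_check <- data.frame(
  headline_id = 1:6,
  fc_modal = c(0, 0, 1, 1, 0, 1),
  fc_likert = c(1.5, 2.0, 4.5, 6.0, 3.0, 6.5)
)

# Mean LLM rating per headline, averaged across all models.
llm_mean <- aggregate(llm_misleading ~ headline_id, data = ratings, FUN = mean)

# Attach the fact-checker ratings to the per-headline means.
merged <- merge(llm_mean, fact_check, by = "headline_id")

# Pearson correlation with a binary variable is the point-biserial correlation.
r_modal <- cor(merged$llm_misleading, merged$fc_modal)
r_likert <- cor(merged$llm_misleading, merged$fc_likert)

print(round(c(fc_modal = r_modal, fc_likert = r_likert), 2))
```

To get per-model correlations instead of correlations on the across-model mean, the same `cor()` calls can be run on each model's subset of `ratings` after merging with `fact_check`.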