- Jenny's Science Advances 2021 data: 207 headlines
- two fact-checker ratings per headline: fc_modal (binary) and fc_likert (Likert scale)
- LLMs used for classification: 8 offline models (the most reliable models from the previous analysis, see [[250710_103055 20 llms classify jenny vaccine data|jenny vaccine data]]) and 4 online (web-search) models
- all models are relatively reliable: across 3 runs (batches), ICCs ≥ .75
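The note does not say which ICC form was computed. As an assumption, the sketch below uses ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed in base R from a headlines × batches matrix of one model's ratings. The function name `icc_2_1` and the toy ratings are hypothetical, not the analysis code.

```r
# Hypothetical helper (assumed ICC form): ICC(2,1), two-way random effects,
# absolute agreement, single measurement.
# `ratings`: numeric matrix, rows = headlines, columns = batches/runs.
icc_2_1 <- function(ratings) {
  ratings <- as.matrix(ratings)
  n <- nrow(ratings)  # number of headlines
  k <- ncol(ratings)  # number of batches
  grand <- mean(ratings)

  # Sums of squares for a two-way layout without replication
  ss_rows  <- k * sum((rowMeans(ratings) - grand)^2)
  ss_cols  <- n * sum((colMeans(ratings) - grand)^2)
  ss_total <- sum((ratings - grand)^2)
  ss_err   <- ss_total - ss_rows - ss_cols

  # Mean squares
  ms_rows <- ss_rows / (n - 1)
  ms_cols <- ss_cols / (k - 1)
  ms_err  <- ss_err / ((n - 1) * (k - 1))

  (ms_rows - ms_err) /
    (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
}

# Toy example (illustrative values): one model, 5 headlines, 3 batches
toy <- cbind(batch1 = c(1, 3, 5, 2, 4),
             batch2 = c(1, 3, 4, 2, 4),
             batch3 = c(2, 3, 5, 2, 4))
icc_2_1(toy)
```

Packages such as `irr` (`icc()`) and `psych` (`ICC()`) report this and other ICC forms; which one matches the reported ≥ .75 would need checking against the original analysis.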
```r
# offline models
   model                                       total_cost_all_headlines  cost_per_headline
   <char>                                      <num>                     <num>
1: mistralai/mistral-small                     0.01854360                0.00008958261
2: meta-llama/llama-4-maverick                 0.01744485                0.00008427464
3: openai/gpt-4o-mini                          0.01629165                0.00007870362
4: google/gemini-2.5-flash-preview-05-20       0.01549980                0.00007487826
5: openai/gpt-4.1-nano                         0.01194230                0.00005769227
6: google/gemini-2.0-flash-001                 0.01108320                0.00005354203
7: google/gemini-2.5-flash-lite-preview-06-17  0.01034320                0.00004996715
8: google/gemini-flash-1.5-8b                  0.00400935                0.00001936884
```
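The `cost_per_headline` column is the total cost divided by the 207 headlines. A minimal R check using the totals copied from the offline table (no new data); the data-frame name `offline_costs` is hypothetical.

```r
# Totals for the offline models, copied from the table above
offline_costs <- data.frame(
  model = c("mistralai/mistral-small", "meta-llama/llama-4-maverick",
            "openai/gpt-4o-mini", "google/gemini-2.5-flash-preview-05-20",
            "openai/gpt-4.1-nano", "google/gemini-2.0-flash-001",
            "google/gemini-2.5-flash-lite-preview-06-17",
            "google/gemini-flash-1.5-8b"),
  total_cost_all_headlines = c(0.01854360, 0.01744485, 0.01629165, 0.01549980,
                               0.01194230, 0.01108320, 0.01034320, 0.00400935)
)
n_headlines <- 207

# Per-headline cost = total cost / number of headlines
offline_costs$cost_per_headline <-
  offline_costs$total_cost_all_headlines / n_headlines

# How many times more the most expensive offline model costs than the cheapest
cost_ratio <- max(offline_costs$total_cost_all_headlines) /
  min(offline_costs$total_cost_all_headlines)
```

So the most expensive offline model (mistral-small) costs roughly 4.6 times as much as the cheapest (gemini-flash-1.5-8b) for the same 207 headlines.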
```r
# online models
   model                              total_cost_all_headlines  cost_per_headline
   <char>                             <num>                     <num>
1: perplexity/sonar-pro               0.55212900                0.0026672899
2: openai/gpt-4o-search-preview       0.29781250                0.0014387077
3: perplexity/sonar                   0.05270000                0.0002545894
4: openai/gpt-4o-mini-search-preview  0.02307195                0.0001114587
```
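To budget larger runs, the per-headline costs can be scaled up. This is a sketch assuming cost grows linearly with the number of headlines and batches, and (not stated in the note) that the tables report the cost of a single batch; `projected_cost` is a hypothetical helper.

```r
# Hypothetical helper: projected cost of classifying n_headlines, repeated
# over n_batches runs, assuming cost scales linearly with both.
projected_cost <- function(cost_per_headline, n_headlines, n_batches = 1) {
  cost_per_headline * n_headlines * n_batches
}

# Per-headline costs for the online models, copied from the table above
online_cost_per_headline <- c(
  "perplexity/sonar-pro"              = 0.0026672899,
  "openai/gpt-4o-search-preview"      = 0.0014387077,
  "perplexity/sonar"                  = 0.0002545894,
  "openai/gpt-4o-mini-search-preview" = 0.0001114587
)

# e.g. 207 headlines x 3 batches, as in the reliability check
projected <- projected_cost(online_cost_per_headline,
                            n_headlines = 207, n_batches = 3)
projected
```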
# LLMs and fact-checker ratings
LLM ratings: for each headline, the mean rating across all LLMs; the red dashed line marks the mean.
offline models
![[1752525457.png]]
online models
![[1752526412.png]]
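A minimal sketch of how the plotted LLM rating could be derived, assuming long-format data with one row per headline × model, and assuming the red dashed line is the mean of the per-headline averages. Column names (`headline_id`, `model`, `rating`) and values are hypothetical toy data.

```r
# Hypothetical long-format data: one row per (headline, model) rating
llm_long <- data.frame(
  headline_id = rep(1:3, each = 2),
  model       = rep(c("model_a", "model_b"), times = 3),
  rating      = c(1, 3, 2, 2, 5, 4)
)

# Per-headline mean rating across all LLMs
llm_mean <- aggregate(rating ~ headline_id, data = llm_long, FUN = mean)

# Mean of the per-headline means (assumed position of the red dashed line),
# e.g. abline(h = grand_mean, col = "red", lty = "dashed") on a base-R plot
grand_mean <- mean(llm_mean$rating)
```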
## Correlation between LLM misleading and inaccuracy ratings
offline models
![[1752525543.png]]
online models
![[1752526445.png]]
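The note does not state which correlation coefficient the figures use. As a sketch, both Pearson and Spearman can be computed with base R's `cor()` on per-headline mean misleading and inaccuracy ratings; the vectors below are toy values, not study data.

```r
# Toy per-headline mean LLM ratings (illustrative values, not study data)
misleading <- c(1, 2, 3, 4, 5)
inaccurate <- c(1, 3, 2, 5, 4)

# Linear association
r_pearson <- cor(misleading, inaccurate, method = "pearson")
# Rank-based association, less sensitive to the rating scale's spacing
r_spearman <- cor(misleading, inaccurate, method = "spearman")
```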
## LLM misleading ratings - correlations with fc_modal and fc_likert
offline models
![[8_models_allen2021_correlations_llm_misleading_batch1.png]]
online models
![[8_models_online_allen2021_correlations_llm_misleading_batch1.png]]
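Because fc_modal is binary, its Pearson correlation with the LLM rating is the point-biserial correlation; for the ordinal fc_likert, a Spearman correlation is one option. Which coefficients the figures actually use is not stated, so this is an assumed choice, and the vectors are toy values.

```r
# Toy data (illustrative values, not study data)
llm_misleading <- c(1, 2, 3, 4, 5, 6)  # per-headline mean LLM rating
fc_modal       <- c(0, 0, 0, 1, 1, 1)  # binary fact-checker rating
fc_likert      <- c(1, 1, 2, 3, 3, 4)  # Likert fact-checker rating

# Pearson with a 0/1 variable = point-biserial correlation
r_modal <- cor(llm_misleading, fc_modal)

# Spearman for the ordinal Likert rating (ties get average ranks)
r_likert <- cor(llm_misleading, fc_likert, method = "spearman")
```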
## LLM inaccuracy ratings - correlations with fc_modal and fc_likert
offline models
![[8_models_allen2021_correlations_llm_inaccurate_batch1.png]]
online models
![[8_models_online_allen2021_correlations_llm_inaccurate_batch1.png]]
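To get one correlation per model (as the per-model figure file names suggest), `cor()` accepts two data frames and returns a models × fact-checker-ratings matrix. A sketch with hypothetical column names (`model_a`, `model_b`) and toy ratings; the coefficient (Pearson here) is an assumption.

```r
# Toy per-model inaccuracy ratings (columns = models, rows = headlines)
per_model <- data.frame(model_a = c(1, 2, 3, 4),
                        model_b = c(4, 3, 2, 1))

# Fact-checker ratings for the same headlines (toy values)
fc <- data.frame(fc_modal  = c(0, 0, 1, 1),
                 fc_likert = c(1, 2, 3, 4))

# Rows = models, columns = fc_modal / fc_likert (Pearson by default)
cm <- cor(per_model, fc)
cm
```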