- [[250710_103055 20 llms classify jenny vaccine data|see offline model results]]
```r
batch model total_cost_90_headlines cost_per_headline
<int> <char> <num> <num>
# openrouter search models
1: 1 google/gemini-2.5-flash-preview-05-20:online 1.84838835 0.0205376483
2: 1 meta-llama/llama-4-maverick:online 1.84557810 0.0205064233
3: 1 mistralai/mistral-small:online 1.84418860 0.0204909844
4: 1 openai/gpt-4o-mini:online 1.83652680 0.0204058533
5: 1 google/gemini-2.0-flash-001:online 1.83235290 0.0203594767
6: 1 google/gemini-2.5-flash-lite-preview-06-17:online 1.82918910 0.0203243233
7: 1 openai/gpt-4.1-nano:online 1.82913180 0.0203236867
8: 1 google/gemini-flash-1.5-8b:online 1.81075586 0.0201195096
# inherent search models (cheaper, but still 10-100x more expensive than offline models)
9: 1 perplexity/sonar-pro 0.25500900 0.0028334333
10: 1 openai/gpt-4o-search-preview 0.12429250 0.0013810278
11: 1 perplexity/sonar 0.02402700 0.0002669667
12: 1 openai/gpt-4o-mini-search-preview 0.01082775 0.0001203083
```
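The `cost_per_headline` column above is just the batch total divided by the 90 headlines in the batch. A minimal sketch of that derivation, using two rows from the table (the column names mirror the output above; the data.table construction here is a reconstruction, not the actual pipeline code):

```r
library(data.table)

# Reconstruct two rows of the cost table; 90 is the number of headlines per batch.
costs <- data.table(
  model = c("perplexity/sonar", "openai/gpt-4o-mini-search-preview"),
  total_cost_90_headlines = c(0.02402700, 0.01082775)
)

# Per-headline cost is simply total cost over headline count.
costs[, cost_per_headline := total_cost_90_headlines / 90]
costs[order(cost_per_headline)]
```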
# Correlations between LLM and fact-checker variables
Models rate headlines as more misleading than inaccurate, and the online models' ratings run about 0.15 lower than those from the offline models.
```r
batch llm_misleading llm_inaccurate
<int> <num> <num>
1: 1 0.3944856 0.3270158
2: 2 0.3808712 0.3117794
3: 3 0.3847778 0.3122778
model llm_misleading llm_inaccurate
<char> <num> <num>
1: openai/gpt-4o-mini-search-preview 0.2555556 0.1940741
2: openai/gpt-4o-search-preview 0.2862963 0.2851852
3: perplexity/sonar 0.3268519 0.2931481
4: perplexity/sonar-pro 0.3629630 0.3105556
5: mistralai/mistral-small:online 0.3632463 0.3229478
6: google/gemini-2.5-flash-preview-05-20:online 0.3675926 0.2611111
7: openai/gpt-4.1-nano:online 0.4062963 0.3244444
8: meta-llama/llama-4-maverick:online 0.4316296 0.3653333
9: google/gemini-2.5-flash-lite-preview-06-17:online 0.4407407 0.3003704
10: google/gemini-flash-1.5-8b:online 0.4478889 0.4423333
11: google/gemini-2.0-flash-001:online 0.4525926 0.3098148
12: openai/gpt-4o-mini:online 0.4987037 0.3950000
```
## LLM misleading rating
mean Pearson r across all variables: 0.72
![[12_models_vaccine_online_correlations_llm_misleading_batch1.png]]
## LLM inaccuracy rating
mean Pearson r across all variables: 0.72
![[12_models_vaccine_online_correlations_llm_inaccurate_batch1.png]]
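The "mean Pearson r across all variables" summaries are an average of pairwise correlations between an LLM rating and each fact-checker variable. A minimal sketch of that aggregation on simulated data (the dataset, column names, and fact-checker variables here are all placeholders, not the actual study data):

```r
library(data.table)

# Simulated stand-in for the real headline-level ratings table.
set.seed(1)
n <- 90
dt <- data.table(
  llm_misleading = runif(n),
  fc_rating_1    = runif(n),  # hypothetical fact-checker variable
  fc_rating_2    = runif(n)   # hypothetical fact-checker variable
)

# Pearson r between the LLM rating and each fact-checker variable,
# then averaged into a single summary number.
fact_vars <- c("fc_rating_1", "fc_rating_2")
rs <- sapply(fact_vars, function(v) cor(dt$llm_misleading, dt[[v]]))
mean(rs)
```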
# including only the 2 cheapest search models
```r
c("perplexity/sonar", "openai/gpt-4o-mini-search-preview")
```
misleadingness - mean Pearson r across all variables: 0.64
![[12_models_vaccine_online_2-cheap-models_correlations_llm_misleading_batch1.png]]
inaccuracy - mean Pearson r across all variables: 0.62
![[12_models_vaccine_online_2-cheap-models_correlations_llm_inaccurate_batch1.png]]