- [[250710_103055 20 llms classify jenny vaccine data|see offline model results]]

```r
    batch                                             model total_cost_90_headlines cost_per_headline
    <int>                                            <char>                   <num>             <num>
# openrouter search models
 1:     1      google/gemini-2.5-flash-preview-05-20:online              1.84838835      0.0205376483
 2:     1                meta-llama/llama-4-maverick:online              1.84557810      0.0205064233
 3:     1                    mistralai/mistral-small:online              1.84418860      0.0204909844
 4:     1                         openai/gpt-4o-mini:online              1.83652680      0.0204058533
 5:     1                google/gemini-2.0-flash-001:online              1.83235290      0.0203594767
 6:     1 google/gemini-2.5-flash-lite-preview-06-17:online              1.82918910      0.0203243233
 7:     1                        openai/gpt-4.1-nano:online              1.82913180      0.0203236867
 8:     1                 google/gemini-flash-1.5-8b:online              1.81075586      0.0201195096
# inherent search models (cheaper, but still 10-100x more expensive than offline models)
 9:     1                              perplexity/sonar-pro              0.25500900      0.0028334333
10:     1                      openai/gpt-4o-search-preview              0.12429250      0.0013810278
11:     1                                  perplexity/sonar              0.02402700      0.0002669667
12:     1                 openai/gpt-4o-mini-search-preview              0.01082775      0.0001203083
```

# LLM and fact-checker variables correlations

Headlines are rated as more misleading than inaccurate, but the online models' ratings are about 0.15 lower than those from the offline models.
```r
   batch llm_misleading llm_inaccurate
   <int>          <num>          <num>
1:     1      0.3944856      0.3270158
2:     2      0.3808712      0.3117794
3:     3      0.3847778      0.3122778

                                                model llm_misleading llm_inaccurate
                                               <char>          <num>          <num>
 1:                 openai/gpt-4o-mini-search-preview      0.2555556      0.1940741
 2:                      openai/gpt-4o-search-preview      0.2862963      0.2851852
 3:                                  perplexity/sonar      0.3268519      0.2931481
 4:                              perplexity/sonar-pro      0.3629630      0.3105556
 5:                    mistralai/mistral-small:online      0.3632463      0.3229478
 6:      google/gemini-2.5-flash-preview-05-20:online      0.3675926      0.2611111
 7:                        openai/gpt-4.1-nano:online      0.4062963      0.3244444
 8:                meta-llama/llama-4-maverick:online      0.4316296      0.3653333
 9: google/gemini-2.5-flash-lite-preview-06-17:online      0.4407407      0.3003704
10:                 google/gemini-flash-1.5-8b:online      0.4478889      0.4423333
11:               google/gemini-2.0-flash-001:online       0.4525926      0.3098148
12:                         openai/gpt-4o-mini:online      0.4987037      0.3950000
```

## LLM misleading rating

Mean Pearson r across all variables: 0.72

![[12_models_vaccine_online_correlations_llm_misleading_batch1.png]]

## LLM inaccuracy rating

Mean Pearson r across all variables: 0.72

![[12_models_vaccine_online_correlations_llm_inaccurate_batch1.png]]

# including only the 2 cheapest search models

```r
c("perplexity/sonar", "openai/gpt-4o-mini-search-preview")
```

Misleadingness: mean Pearson r across all variables is 0.64

![[12_models_vaccine_online_2-cheap-models_correlations_llm_misleading_batch1.png]]

Inaccuracy: mean Pearson r across all variables is 0.62

![[12_models_vaccine_online_2-cheap-models_correlations_llm_inaccurate_batch1.png]]
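A minimal sketch of how a "mean Pearson r across all variables" can be computed in R. The data and the fact-checker column names (`fc_rating_1`, `fc_rating_2`) are hypothetical stand-ins, not the actual headline data or variable names:

```r
# hypothetical example: correlate one model's misleading rating with each
# fact-checker variable, then average the correlations (toy data, n = 90)
set.seed(42)
dt <- data.frame(
  llm_misleading = runif(90),  # one model's misleading ratings
  fc_rating_1    = runif(90),  # stand-in fact-checker variable
  fc_rating_2    = runif(90)   # stand-in fact-checker variable
)

fc_vars <- c("fc_rating_1", "fc_rating_2")
rs <- sapply(fc_vars, function(v) cor(dt$llm_misleading, dt[[v]], method = "pearson"))
mean_r <- mean(rs)
```

With the real data, `fc_vars` would hold the fact-checker variable columns, and the same loop would run once per model (or per batch) to produce the means reported above.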