- see [[250513_103539 canada - perplexity factcheck accuracy - sonarpro|canada accuracy results]]

Mean/median accuracy
- `n`: no. of AI messages with factual claims
- (aggregation code sketch at the end of this note)

```r
   condition median_accuracy mean_accuracy     n
      <char>           <num>         <num> <int>
1:      proN              75      73.65810  3463   # right leaning
2:      proT              85      81.65152  3498   # left/center leaning

    strategy condition      modelF median_accuracy mean_accuracy     n
      <char>    <char>      <fctr>           <num>         <num> <int>
 1: baseline      proN     GPT-4.1              80      75.30108   465
 2: baseline      proN DeepSeek-V3              80      75.43564   505
 3: baseline      proT     GPT-4.1              85      82.94545   550
 4: baseline      proT DeepSeek-V3              85      80.41393   517
 5:  generic      proN     GPT-4.1              70      72.15737   502
 6:  generic      proN DeepSeek-V3              70      70.16032   499
 7:  generic      proT     GPT-4.1              85      83.12392   581
 8:  generic      proT DeepSeek-V3              85      82.46711   456
 9:  nofacts      proN     GPT-4.1              80      78.83178   214
10:  nofacts      proN DeepSeek-V3              70      69.37500   144
11:  nofacts      proT     GPT-4.1              85      83.09160   262
12:  nofacts      proT DeepSeek-V3              80      75.45342   161
13:  noinstr      proN     GPT-4.1              75      74.09836   549
14:  noinstr      proN DeepSeek-V3              80      73.83761   585
15:  noinstr      proT     GPT-4.1              85      81.20000   500
16:  noinstr      proT DeepSeek-V3              85      80.69002   471
```

![[1747750543.png]]

DV: AI message accuracy (`pfc`)
- each person has multiple AI messages, so SEs are clustered on user (`responseid`)
- `conditionC`: proT (-0.5, left/center), proN (0.5, right/nationalist)
- `modelF`: gpt (reference), deepseek
- `strategy`: baseline (reference), nofacts, generic, noinstr
- (coding + model-setup sketch at the end of this note)

```r
summ(feols(pfc ~ conditionC * modelF * strategy, d1, cluster = ~responseid))

    term                                               result                               sig
    <char>                                             <char>                               <char>
 1: (Intercept)                                        b = 79.12 [78.19, 80.06], p < .001   ***
 2: conditionC                                         b = -7.64 [-9.52, -5.77], p < .001   ***  # proN messages are less accurate
 3: modelFDeepSeek-V3                                  b = -1.20 [-2.51, 0.11], p = .073    .
 4: strategygeneric                                    b = -1.48 [-2.77, -0.19], p = .024   *
 5: strategynofacts                                    b = 1.84 [0.41, 3.26], p = .011      *
 6: strategynoinstr                                    b = -1.47 [-2.81, -0.13], p = .031   *
 7: conditionC × modelFDeepSeek-V3                     b = 2.67 [0.04, 5.29], p = .046      *
 8: conditionC × strategygeneric                       b = -3.32 [-5.90, -0.75], p = .012   *
 9: conditionC × strategynofacts                       b = 3.38 [0.53, 6.23], p = .020      *
10: conditionC × strategynoinstr                       b = 0.54 [-2.14, 3.22], p = .691
11: modelFDeepSeek-V3 × strategygeneric                b = -0.13 [-1.95, 1.69], p = .890
12: modelFDeepSeek-V3 × strategynofacts                b = -7.35 [-9.88, -4.82], p < .001   ***
13: modelFDeepSeek-V3 × strategynoinstr                b = 0.81 [-1.10, 2.72], p = .404
14: conditionC × modelFDeepSeek-V3 × strategygeneric   b = -4.01 [-7.64, -0.37], p = .031   *
15: conditionC × modelFDeepSeek-V3 × strategynofacts   b = -4.48 [-9.54, 0.57], p = .082    .
16: conditionC × modelFDeepSeek-V3 × strategynoinstr   b = -2.42 [-6.24, 1.40], p = .215
```

# interact with no. of factual claims

```r
summ(feols(pfc ~ conditionC * modelF * scale(n_factual_claims), d1, cluster = ~responseid))

   term                                                       result                               sig
   <char>                                                     <char>                               <char>
1: (Intercept)                                                b = 78.40 [77.91, 78.89], p < .001   ***
2: conditionC                                                 b = -7.54 [-8.52, -6.56], p < .001   ***
3: modelFDeepSeek-V3                                          b = -1.39 [-2.12, -0.67], p < .001   ***  # DeepSeek less accurate
4: scale(n_factual_claims)                                    b = 2.19 [1.76, 2.61], p < .001      ***  # more facts, more accurate
5: conditionC × modelFDeepSeek-V3                             b = 0.48 [-0.97, 1.93], p = .519
6: conditionC × scale(n_factual_claims)                       b = 1.13 [0.28, 1.98], p = .009      **   # when the proN bot made more factual claims, its messages were more accurate
7: modelFDeepSeek-V3 × scale(n_factual_claims)                b = -0.48 [-1.20, 0.24], p = .194
8: conditionC × modelFDeepSeek-V3 × scale(n_factual_claims)   b = 0.34 [-1.10, 1.78], p = .643
```

![[20250520101245.png]]
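How the summary tables above can be reproduced, a minimal data.table sketch: it assumes `d1` is a data.table with one row per AI message, pre-filtered to messages with at least one factual claim, and with the column names shown in the outputs above (`pfc`, `condition`, `strategy`, `modelF`).

```r
library(data.table)

# accuracy by condition (first table above)
d1[, .(median_accuracy = median(pfc),
       mean_accuracy   = mean(pfc),
       n               = .N),
   by = condition]

# accuracy by strategy x condition x model (second table above)
d1[, .(median_accuracy = median(pfc),
       mean_accuracy   = mean(pfc),
       n               = .N),
   by = .(strategy, condition, modelF)
  ][order(strategy, condition, modelF)]
```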
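And a sketch of the coding + model setup described in the bullets: the contrast coding, reference levels, and user-clustered SEs behind the `feols()` calls. `summ()` in the outputs is presumably a custom formatting wrapper, so plain `summary()` stands in here; column names are assumed as above.

```r
library(data.table)
library(fixest)

# contrast-code condition: proT = -0.5 (left/center), proN = 0.5 (right/nationalist)
d1[, conditionC := fifelse(condition == "proN", 0.5, -0.5)]

# factors with the reference levels noted above
d1[, modelF   := relevel(factor(modelF),   ref = "GPT-4.1")]
d1[, strategy := relevel(factor(strategy), ref = "baseline")]

# three-way interaction; SEs clustered on user (one responseid per participant)
m1 <- feols(pfc ~ conditionC * modelF * strategy, data = d1, cluster = ~responseid)
summary(m1)

# same setup, interacting with the standardized number of factual claims
m2 <- feols(pfc ~ conditionC * modelF * scale(n_factual_claims),
            data = d1, cluster = ~responseid)
summary(m2)
```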