- see [[250513_103539 canada - perplexity factcheck accuracy - sonarpro|canada accuracy results]]
Mean/median accuracy by condition, and by strategy × condition × model
- `n`: no. of AI messages with factual claims
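A data.table aggregation like this sketch would produce the tables below (assuming `d1` holds one row per AI message, already restricted to messages with factual claims, with accuracy in `pfc` as in the models further down):
```r
library(data.table)

# sketch: summarize accuracy by condition, then by strategy x condition x model
d1[, .(median_accuracy = median(pfc), mean_accuracy = mean(pfc), n = .N), by = condition]
d1[, .(median_accuracy = median(pfc), mean_accuracy = mean(pfc), n = .N),
   by = .(strategy, condition, modelF)][order(strategy, condition, modelF)]
```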
```r
   condition median_accuracy mean_accuracy     n
      <char>           <num>         <num> <int>
1:      proN              75      73.65810  3463 # right leaning
2:      proT              85      81.65152  3498 # left/center leaning

    strategy condition      modelF median_accuracy mean_accuracy     n
      <char>    <char>      <fctr>           <num>         <num> <int>
 1: baseline      proN     GPT-4.1              80      75.30108   465
 2: baseline      proN DeepSeek-V3              80      75.43564   505
 3: baseline      proT     GPT-4.1              85      82.94545   550
 4: baseline      proT DeepSeek-V3              85      80.41393   517
 5:  generic      proN     GPT-4.1              70      72.15737   502
 6:  generic      proN DeepSeek-V3              70      70.16032   499
 7:  generic      proT     GPT-4.1              85      83.12392   581
 8:  generic      proT DeepSeek-V3              85      82.46711   456
 9:  nofacts      proN     GPT-4.1              80      78.83178   214
10:  nofacts      proN DeepSeek-V3              70      69.37500   144
11:  nofacts      proT     GPT-4.1              85      83.09160   262
12:  nofacts      proT DeepSeek-V3              80      75.45342   161
13:  noinstr      proN     GPT-4.1              75      74.09836   549
14:  noinstr      proN DeepSeek-V3              80      73.83761   585
15:  noinstr      proT     GPT-4.1              85      81.20000   500
16:  noinstr      proT DeepSeek-V3              85      80.69002   471
```
![[1747750543.png]]
DV: AI message accuracy
- each person has multiple AI messages, so SEs are clustered on user (`responseid`)
- `conditionC`: proT (-0.5, left/center), proN (0.5, right/nationalist)
- `modelF`: gpt (reference), deepseek
- `strategy`: baseline (reference), nofacts, generic, noinstr
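Coding for these variables, as a sketch (reference levels per the bullets above; the raw `condition`/`model` column names are assumptions):
```r
# assumed coding setup; inferred from the bullets above and the output below
d1[, conditionC := fifelse(condition == "proT", -0.5, 0.5)]            # proT = -0.5, proN = +0.5
d1[, modelF := factor(model, levels = c("GPT-4.1", "DeepSeek-V3"))]    # GPT-4.1 = reference
d1[, strategy := factor(strategy,
                        levels = c("baseline", "generic", "nofacts", "noinstr"))]  # baseline = reference
```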
```r
summ(feols(pfc ~ conditionC * modelF * strategy, d1, cluster = ~responseid))
                                                term                             result    sig
                                              <char>                             <char> <char>
 1:                                      (Intercept) b = 79.12 [78.19, 80.06], p < .001    ***
 2:                                       conditionC b = -7.64 [-9.52, -5.77], p < .001    *** # proN messages are less accurate
 3:                                modelFDeepSeek-V3  b = -1.20 [-2.51, 0.11], p = .073      .
 4:                                  strategygeneric b = -1.48 [-2.77, -0.19], p = .024      *
 5:                                  strategynofacts    b = 1.84 [0.41, 3.26], p = .011      *
 6:                                  strategynoinstr b = -1.47 [-2.81, -0.13], p = .031      *
 7:                   conditionC × modelFDeepSeek-V3    b = 2.67 [0.04, 5.29], p = .046      *
 8:                     conditionC × strategygeneric b = -3.32 [-5.90, -0.75], p = .012      *
 9:                     conditionC × strategynofacts    b = 3.38 [0.53, 6.23], p = .020      *
10:                     conditionC × strategynoinstr  b = 0.54 [-2.14, 3.22], p = .691
11:              modelFDeepSeek-V3 × strategygeneric b = -0.13 [-1.95, 1.69], p = .890
12:              modelFDeepSeek-V3 × strategynofacts b = -7.35 [-9.88, -4.82], p < .001    ***
13:              modelFDeepSeek-V3 × strategynoinstr  b = 0.81 [-1.10, 2.72], p = .404
14: conditionC × modelFDeepSeek-V3 × strategygeneric b = -4.01 [-7.64, -0.37], p = .031      *
15: conditionC × modelFDeepSeek-V3 × strategynofacts  b = -4.48 [-9.54, 0.57], p = .082      .
16: conditionC × modelFDeepSeek-V3 × strategynoinstr  b = -2.42 [-6.24, 1.40], p = .215
```
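`summ()` isn't a fixest built-in; a plausible sketch of the wrapper (assumption: the real helper isn't in these notes, but it formats a fixest fit's coefficients, 95% CIs, and p-values into the term/result/sig table above):
```r
library(fixest)
library(data.table)
library(broom)  # provides tidy() methods for fixest models

# sketch of summ(): turn a fixest fit into the term / result / sig table
summ <- function(m) {
  tt <- as.data.table(tidy(m, conf.int = TRUE))  # uses the SEs stored in the fit (clustered here)
  tt[, result := sprintf("b = %.2f [%.2f, %.2f], p %s",
                         estimate, conf.low, conf.high,
                         ifelse(p.value < .001, "< .001",
                                sub("^= 0\\.", "= .", sprintf("= %.3f", p.value))))]
  tt[, sig := as.character(cut(p.value, c(0, .001, .01, .05, .1, 1),
                               labels = c("***", "**", "*", ".", "")))]
  tt[, .(term, result, sig)]
}
```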
# Interaction with number of factual claims
```r
summ(feols(pfc ~ conditionC * modelF * scale(n_factual_claims), d1, cluster = ~responseid))
                                                       term                             result    sig
                                                     <char>                             <char> <char>
1:                                              (Intercept) b = 78.40 [77.91, 78.89], p < .001    ***
2:                                               conditionC b = -7.54 [-8.52, -6.56], p < .001    ***
3:                                        modelFDeepSeek-V3 b = -1.39 [-2.12, -0.67], p < .001    *** # deepseek less accurate
4:                                  scale(n_factual_claims)    b = 2.19 [1.76, 2.61], p < .001    *** # more facts, more accurate
5:                           conditionC × modelFDeepSeek-V3   b = 0.48 [-0.97, 1.93], p = .519
6:                    conditionC × scale(n_factual_claims)     b = 1.13 [0.28, 1.98], p = .009     ** # the accuracy gain from more claims is larger for the proN bot
7:             modelFDeepSeek-V3 × scale(n_factual_claims)  b = -0.48 [-1.20, 0.24], p = .194
8: conditionC × modelFDeepSeek-V3 × scale(n_factual_claims)   b = 0.34 [-1.10, 1.78], p = .643
```
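One way to unpack the conditionC × claims interaction (a sketch; `z_claims`, `m_few`, `m_many` are made-up names): refit with the standardized claim count recentered at ∓1 SD, so the conditionC coefficient becomes its simple slope at few vs. many claims (evaluated at the GPT-4.1 reference level of `modelF`):
```r
# simple slopes of condition at -1 SD and +1 SD of the standardized claim count
d1[, z_claims := as.numeric(scale(n_factual_claims))]
m_few  <- feols(pfc ~ conditionC * modelF * I(z_claims + 1), d1, cluster = ~responseid)  # conditionC slope at -1 SD
m_many <- feols(pfc ~ conditionC * modelF * I(z_claims - 1), d1, cluster = ~responseid)  # conditionC slope at +1 SD
summ(m_few); summ(m_many)
```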
![[20250520101245.png]]