- used generalized random forests (GRFs) to test for heterogeneous treatment effects
- days/DVs chosen based on [[220518_170142 campaign 3 daily analysis and results|earlier results]]
- fixed parameters
    - winsorize: 0.95
    - cluster on block
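Below is a minimal sketch of this setup using the `grf` package. It rests on assumptions the note does not confirm:

- "winsorize: 0.95" is read as capping the outcome at its 95th percentile.
- The treatment is a 0/1 column.
- The treatment column name `treated` and the helpers `winsorize_upper()` and `run_grf()` are hypothetical.

`causal_forest()`, `test_calibration()` and `variable_importance()` are existing `grf` functions. The `clusters =` argument of `causal_forest()` is what implements clustering on block.

```r
# Hypothetical helper: cap values above the p-th quantile (upper-tail winsorizing).
winsorize_upper <- function(x, p = 0.95) {
  cap <- quantile(x, probs = p, na.rm = TRUE, names = FALSE)
  pmin(x, cap)
}

# Hypothetical wrapper around grf; column names are assumptions, not the note's code.
run_grf <- function(df, covariates, outcome = "t1", treat = "treated",
                    cluster = "block") {
  df <- as.data.frame(df)
  X <- as.matrix(df[, covariates, drop = FALSE])
  Y <- winsorize_upper(df[[outcome]], p = 0.95)   # assumed: winsorize the outcome only
  W <- df[[treat]]                                # assumed: 0/1 treatment indicator
  cf <- grf::causal_forest(X, Y, W,
                           clusters = as.integer(factor(df[[cluster]])))  # cluster on block
  imp <- grf::variable_importance(cf)[, 1]
  list(
    forest      = cf,
    calibration = grf::test_calibration(cf),      # mean / differential forest prediction
    importance  = sort(setNames(imp, covariates), decreasing = TRUE)
  )
}
```

Calling `run_grf(df, covariates = "t0")` would correspond to the `t0`-only variant. Passing all 13 covariate names corresponds to the main analyses.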
# summary of results
Ran GRFs on selected days with the strongest treatment effects (i.e., where the treatment group shared less bad stuff). Tried four day/DV combinations (see the four sections below).
Overall, none of the analyses found significant heterogeneous treatment effects (HTE).
However, among the 13 covariates, those ranked most important (and so the likeliest moderators of the treatment effect) are usually the outcome itself measured pre-campaign (`t0`), total activity during the campaign, or activity before the campaign.
**Surprisingly, the relationships are generally negative (see figures below): users who were more active or shared more bad stuff (x-axes) tend to have larger (more negative) treatment effects (y-axes). Most of the figures below show the same trend.**
How can we run these analyses across days? Single-day analyses probably lack the statistical power to detect HTEs, so it would be great to aggregate across days somehow.
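One simple way to aggregate is a fixed-effect (inverse-variance) meta-analysis of a per-day estimate. An example would be the `differential.forest.prediction` coefficient from each day's calibration test.

This is a swapped-in technique, not something the note used. It also assumes the per-day estimates are independent. That is doubtful if the same users appear on several days, and in that case the pooled SE would be too small. `pool_fixed_effect()` is a hypothetical helper.

```r
# Hypothetical helper: fixed-effect (inverse-variance) pooling of k estimates.
# Assumes the estimates are independent; correlated days would make the SE too small.
pool_fixed_effect <- function(est, se) {
  w <- 1 / se^2                           # inverse-variance weights
  pooled <- sum(w * est) / sum(w)
  pooled_se <- sqrt(1 / sum(w))
  z <- pooled / pooled_se
  c(estimate = pooled, se = pooled_se, z = z,
    p_one_sided = pnorm(z, lower.tail = FALSE))  # H1: coefficient > 0, as in test_calibration
}
```

Usage would be `pool_fixed_effect(est = ..., se = ...)` with one entry per day.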
# fc summed badness, threshold 50, 2021-10-23
- file: `../data-v09-badness-daily/dv_fc_badness_threshold50_day2021-10-23.csv`
- significant condition effect (p = .0096; see the quasipoisson model below)
```r
> m <- feglm(t1 ~ conditionC * t0LC | block, dt1, family = "quasipoisson")
NOTE: 1,530 fixed-effects (6,782 observations) removed because of only 0 outcomes.
> m
GLM estimation, family = quasipoisson, Dep. Var.: t1
Observations: 20,968
Fixed-effects: block: 3,889
Standard-errors: Clustered (block)
                 Estimate Std. Error  t value  Pr(>|t|)
conditionC      -0.092554   0.035737 -2.58983 0.0096381 **
t0LC             0.140713   0.010101 13.93079 < 2.2e-16 ***
conditionC:t0LC  0.029397   0.012812  2.29443 0.0218191 *
```
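Because the quasipoisson model uses a log link, its coefficients can be read as rate ratios. Here is a minimal base-R sketch using the `conditionC` row above. Three caveats apply:

- The Wald interval from a normal approximation is my assumption.
- The reading assumes `conditionC` is a one-unit treatment contrast (e.g. 0/1 or -0.5/+0.5 coding), which the note does not state.
- Because the model includes `conditionC:t0LC`, this is the effect at `t0LC = 0`.

```r
# Values copied from the feglm output above (conditionC row).
b  <- -0.092554   # log rate ratio, treatment vs. control, at t0LC = 0
se <-  0.035737   # clustered (block) standard error

rate_ratio <- exp(b)                           # multiplicative change in expected t1
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)    # Wald 95% CI on the ratio scale
pct_change <- 100 * (rate_ratio - 1)

round(c(rate_ratio = rate_ratio, lower = ci[1], upper = ci[2], pct = pct_change), 3)
```

Under those assumptions, the treatment group's expected `t1` at `t0LC = 0` is about 9% lower (95% CI roughly 2% to 15% lower).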
**Overall**, no heterogeneous treatment effects (HTE): the `differential.forest.prediction` coefficient is not significant.
```r
Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:
                                Estimate Std. Error t value    Pr(>t)
mean.forest.prediction          1.019108   0.432859  2.3544  0.009281 **
differential.forest.prediction  0.086326   0.380972  0.2266  0.410371 # not significant - no HTE
```
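As I understand `grf::test_calibration()`, it uses out-of-bag predictions and regresses the centered outcome `Y - Y.hat` on two constructed regressors, without an intercept:

- `(W - W.hat) * mean(tau.hat)`, the "mean forest prediction".
- `(W - W.hat) * (tau.hat - mean(tau.hat))`, the "differential forest prediction".

It then reports one-sided p-values for H1: coefficient > 0. A coefficient near 1 on the first regressor means the average effect is well calibrated. A differential coefficient significantly above 0 is evidence of heterogeneity, which is why the non-significant 0.086 above reads as no HTE.

The base-R sketch below mimics that regression with HC3 standard errors. `calibration_sketch()` is hypothetical and ignores any sample or cluster weighting `grf` may apply.

```r
# Hypothetical base-R sketch of the calibration regression behind test_calibration().
# Y, W: outcome and treatment; Y_hat, W_hat, tau_hat: out-of-bag predictions.
calibration_sketch <- function(Y, W, Y_hat, W_hat, tau_hat) {
  Y_c <- Y - Y_hat
  W_c <- W - W_hat
  tau_bar <- mean(tau_hat)
  X <- cbind(mean.forest.prediction         = W_c * tau_bar,
             differential.forest.prediction = W_c * (tau_hat - tau_bar))
  fit <- lm.fit(X, Y_c)                          # no intercept, as in the printout
  XtX_inv <- solve(crossprod(X))
  h <- rowSums((X %*% XtX_inv) * X)              # leverages (diagonal of hat matrix)
  omega <- (fit$residuals / (1 - h))^2           # HC3 squared-residual weights
  vc <- XtX_inv %*% crossprod(X * sqrt(omega)) %*% XtX_inv   # sandwich covariance
  est <- fit$coefficients
  se <- sqrt(diag(vc))
  t_stat <- est / se
  df <- length(Y) - ncol(X)
  cbind(Estimate = est, `Std. Error` = se, `t value` = t_stat,
        `Pr(>t)` = pt(t_stat, df, lower.tail = FALSE))  # one-sided: H1 coefficient > 0
}
```

With real data, `Y_hat`, `W_hat` and `tau_hat` would be `cf$Y.hat`, `cf$W.hat` and `predict(cf)$predictions` from a fitted forest `cf`.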
covariate importance
```r
# cov variable importance (sorted most to least important)
covariate imp
1: t0 0.24765623
2: total_activity 0.23195134
3: total_rt_t0 0.12609527
4: statuses_count 0.08421867
5: favourites_count 0.06184099
6: friends_count 0.04580157
7: description_len 0.04376029
8: followers_count 0.04265408
9: friend_follow_ratio 0.03835385
10: days_since_create 0.02967993
11: description_alpha_pct 0.01878629
12: name_alpha_pct 0.01462584
13: name_len 0.01457565
```
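To put the ranking in proportion, here is a small base-R sketch using the scores printed above (copied by hand). The scores sum to 1, so cumulative sums give each covariate's share. The top three (`t0`, `total_activity`, `total_rt_t0`) account for about 61%.

`grf::variable_importance()` is a depth-weighted count of how often the forest splits on each covariate. It is therefore a rough guide to likely moderators rather than a test of moderation.

```r
# Importance values copied from the output above (fc summed badness, threshold 50, 2021-10-23).
imp <- c(t0 = 0.24765623, total_activity = 0.23195134, total_rt_t0 = 0.12609527,
         statuses_count = 0.08421867, favourites_count = 0.06184099,
         friends_count = 0.04580157, description_len = 0.04376029,
         followers_count = 0.04265408, friend_follow_ratio = 0.03835385,
         days_since_create = 0.02967993, description_alpha_pct = 0.01878629,
         name_alpha_pct = 0.01462584, name_len = 0.01457565)

imp <- sort(imp, decreasing = TRUE)
cum_share <- cumsum(imp) / sum(imp)     # cumulative share of total importance
round(cum_share[1:3], 3)
```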
Stronger (more negative) treatment effect for users with higher `t0` values (the outcome measured before the campaign).
![[t0.png]]
![[1657922619.png]]
![[1657922750.png]]
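The figures plot estimated treatment effects against a covariate. A numeric companion to such a plot is to average the out-of-bag CATE predictions within quantile bins of the covariate; the sketch below does this.

`binned_cate()` is a hypothetical helper, not how the figures were produced. With a fitted forest `cf`, `tau_hat` would be `predict(cf)$predictions` and `x` a column such as `t0`. Calling `unique()` on the breaks guards against tied quantiles, which are likely for zero-heavy counts.

```r
# Hypothetical helper: mean predicted CATE within quantile bins of covariate x.
binned_cate <- function(x, tau_hat, n_bins = 5) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1),
                            names = FALSE))
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  data.frame(bin      = levels(bin),
             n        = as.vector(table(bin)),
             mean_tau = as.vector(tapply(tau_hat, bin, mean)))
}
```

A steadily decreasing `mean_tau` across bins would match the pattern described above. Given the non-significant calibration test, such differences are descriptive rather than evidence of HTE.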
### but when `t0` is the ONLY covariate used in the analysis (instead of 13 covariates)
Relative to the analysis with all 13 covariates, the effects potentially reverse: sharing more bad stuff pre-campaign is associated with a weaker treatment effect. This is consistent with the positive `conditionC:t0LC` linear interaction (p = .02), although the figure below arguably looks more like a null interaction.
![[1658170379.png]]
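For comparison with the forest, the linear model's implied treatment effect on the log scale is the `conditionC` coefficient plus the `conditionC:t0LC` coefficient times `t0LC`. Here is a minimal base-R sketch using the estimates from this day's quasipoisson model. The grid of `t0LC` values is arbitrary. The note does not spell out how `t0LC` was constructed; presumably it is a centered, possibly log-transformed `t0`.

```r
# Estimates copied from the feglm output for fc summed badness, threshold 50, 2021-10-23.
b_cond <- -0.092554   # conditionC: effect at t0LC = 0
b_int  <-  0.029397   # conditionC:t0LC: change in effect per unit of t0LC

t0LC_grid <- c(-2, -1, 0, 1, 2)
log_effect <- b_cond + b_int * t0LC_grid       # conditional log rate ratio
data.frame(t0LC = t0LC_grid,
           rate_ratio = round(exp(log_effect), 3))

crossover <- -b_cond / b_int   # t0LC value at which the implied effect reaches zero
```

The implied reduction shrinks as `t0LC` increases. It reaches zero near `t0LC` ≈ 3.1, which may lie outside the range of the data.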
# fc count badness, threshold 65, 2021-10-23
- file: `../data-v09-badness-daily/dv_fc_badness_threshold65_day2021-10-23.csv`
```r
GLM estimation, family = quasipoisson, Dep. Var.: t1
Observations: 18,496
Fixed-effects: block: 3,399
Standard-errors: Clustered (block)
                 Estimate Std. Error  t value  Pr(>|t|)
conditionC      -0.093240   0.041005 -2.27386  0.023037 *
t0LC             0.481618   0.024707 19.49328 < 2.2e-16 ***
conditionC:t0LC  0.053403   0.025872  2.06410  0.039084 *
```
No HTE.
```r
Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:
                                Estimate Std. Error t value    Pr(>t)
mean.forest.prediction           1.00553    0.43871  2.2921   0.01096 *
differential.forest.prediction   0.12120    0.40697  0.2978   0.38293
```
covariate importance
```r
covariate imp
1: t0 0.26697278
2: total_activity 0.21717985
3: total_rt_t0 0.12697095
4: statuses_count 0.08707942
5: favourites_count 0.05294394
6: friends_count 0.05156918
7: friend_follow_ratio 0.04555643
8: followers_count 0.03920187
9: description_len 0.03177402
10: days_since_create 0.03137276
11: description_alpha_pct 0.01883286
12: name_len 0.01658380
13: name_alpha_pct 0.01396215
```
![[1657923587.png]]
![[1657923611.png]]
# mbfc_min sum badness, threshold 80, 2021-10-22
- file: `../data-v09-badness-daily/dv_mbfc_min_badness_threshold80_day2021-10-22.csv`
```r
> m <- feglm(t1 ~ conditionC * t0LC | block, dt1, family = "quasipoisson")
NOTE: 202 fixed-effects (702 observations) removed because of only 0 outcomes.
> m
GLM estimation, family = quasipoisson, Dep. Var.: t1
Observations: 27,718
Fixed-effects: block: 5,219
Standard-errors: Clustered (block)
                 Estimate Std. Error  t value    Pr(>|t|)
conditionC      -0.036149   0.015626 -2.31344 2.0737e-02 *
t0LC             0.019798   0.004617  4.28795 1.8359e-05 ***
conditionC:t0LC  0.007465   0.007078  1.05469 2.9162e-01
```
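The `NOTE` line in the `feglm` output reports blocks dropped because every observation in them has `t1 = 0`. With a log link, such a block's fixed effect has no finite estimate, and the block carries no information about the condition effect.

The base-R sketch below counts those blocks before fitting. The helper `all_zero_blocks()` is hypothetical, and the toy data are made up for illustration.

```r
# Hypothetical helper: which fixed-effect groups have only zero outcomes?
all_zero_blocks <- function(y, block) {
  is_zero <- tapply(y, block, function(v) all(v == 0))
  names(is_zero)[is_zero]
}

# Toy example (made-up data): block "b" has only zeros and would be dropped.
y     <- c(0, 2, 0, 0, 1, 0)
block <- c("a", "a", "b", "b", "c", "c")
dropped <- all_zero_blocks(y, block)
n_obs_dropped <- sum(block %in% dropped)
```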
No HTE.
```r
Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:
                                Estimate Std. Error t value    Pr(>t)
mean.forest.prediction           0.97994    0.44240  2.2151   0.01338 *
differential.forest.prediction  -0.10503    0.47918 -0.2192   0.58675
```
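Here the `differential.forest.prediction` estimate is negative, which is why its p-value exceeds 0.5: the calibration test is one-sided (alternative: coefficient > 0).

Below is a quick base-R check of the reported p-values from the t values. It uses a normal approximation in place of the t distribution. I am assuming the residual degrees of freedom are large enough (tens of thousands of observations) for this to be accurate.

```r
# t values copied from the test_calibration output above.
t_vals <- c(mean.forest.prediction = 2.2151,
            differential.forest.prediction = -0.2192)

# One-sided upper-tail p-values, H1: coefficient > 0 (normal approximation).
p_one_sided <- pnorm(t_vals, lower.tail = FALSE)
round(p_one_sided, 4)
```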
covariate importance
```r
covariate imp
1: total_activity 0.21904111
2: total_rt_t0 0.20239302
3: t0 0.11599926
4: friends_count 0.10135643
5: statuses_count 0.09968321
6: favourites_count 0.07836697
7: followers_count 0.03859189
8: days_since_create 0.03823481
9: description_len 0.02555517
10: friend_follow_ratio 0.02341081
11: name_len 0.02074606
12: description_alpha_pct 0.01976041
13: name_alpha_pct 0.01686086
```
![[1657924283.png]]
# afm_min fraction badness, threshold 80, 2021-10-22
- file: `../data-v09-badness-daily/dv_afm_min_badness_threshold80_day2021-10-21.csv`
```r
OLS estimation, Dep. Var.: t1
Observations: 28,674
Fixed-effects: block: 5,421
Standard-errors: Clustered (block)
                 Estimate Std. Error  t value  Pr(>|t|)
conditionC      -0.002031   0.000831 -2.44366  0.014571 *
t0LC             0.687980   0.012675 54.27834 < 2.2e-16 ***
conditionC:t0LC -0.037243   0.019566 -1.90348  0.057031 .
```
No HTE.
```r
Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:
                                Estimate Std. Error t value    Pr(>t)
mean.forest.prediction           1.01617    0.76373  1.3305   0.09168 .
differential.forest.prediction  -0.27473    0.46446 -0.5915   0.72291
```
covariate importance
```r
covariate imp
1: t0 0.20769795
2: followers_count 0.13051875
3: friend_follow_ratio 0.12117910
4: total_activity 0.08039265
5: friends_count 0.07972335
6: days_since_create 0.07204435
7: favourites_count 0.07175145
8: total_rt_t0 0.07051493
9: statuses_count 0.04879788
10: description_len 0.04353977
11: description_alpha_pct 0.03048560
12: name_len 0.02864845
13: name_alpha_pct 0.01470578
```
![[1657924882.png]]
![[1657924911.png]]