Exclude screen names
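- accounts with too many followers (e.g., news sites) and too small a friend-follow ratio: `followers_count > 100000` and `friend_follow_ratio < 0.001`
- i.e., accounts that are followed by many people but follow very few themselves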
- see `assign.Rmd`
```r
> exclude <- dt1[followers_count > 100000 & friend_follow_ratio < 0.001, .(screen_name, followers_count, friend_follow_ratio)][order(-followers_count)]
> exclude
        screen_name followers_count friend_follow_ratio
 1:          cnnbrk        62076267        1.949215e-06
 2:              ft         4898037        1.588391e-04
 3:   economictimes         4131225        1.113471e-05
 4:      benshapiro         3803462        8.439677e-05
 5:        france24         3796408        1.193233e-04
 6:  twittermoments          804819        1.615268e-05
 7:     enesfreedom          607914        1.644967e-06
 8:     leshchenkos          314355        4.103628e-04
 9:     dominic2306          271720        1.361691e-04
10:         presstv          265906        2.519678e-04
11:      aawsat_eng          142046        3.519962e-05
12: global_mil_info          110253        5.623379e-04
13:   aymanrashdanw          103465        8.698510e-04
```
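A minimal sketch of how the `exclude` table could be applied to drop these accounts. It assumes `friend_follow_ratio` is `friends_count / followers_count` (consistent with the values printed above, but not stated in these notes); the toy `dt1`, the row `small_account`, and the name `dt_keep` are hypothetical.

```r
library(data.table)

# Toy stand-in for dt1; "small_account" is a hypothetical ordinary user
dt1 <- data.table(
  screen_name     = c("cnnbrk", "ft", "small_account"),
  followers_count = c(62076267, 4898037, 350),
  friends_count   = c(121, 778, 400)
)

# Assumption: ratio of accounts followed to followers
dt1[, friend_follow_ratio := friends_count / followers_count]

# Same filter as above: very large audience, follows almost nobody
exclude <- dt1[followers_count > 100000 & friend_follow_ratio < 0.001,
               .(screen_name, followers_count, friend_follow_ratio)][order(-followers_count)]

# Anti-join: keep every row of dt1 whose screen_name is not in exclude
dt_keep <- dt1[!exclude, on = "screen_name"]
```

The anti-join keeps the filter and the removal as two separate steps, so `exclude` can still be inspected before anything is dropped.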
Covariates/features used for blocking
```r
features <- c(
"total_tweets",
"en_count",
"uk_count",
"ru_count",
"topic_count_all",
"topic_1_count",
"topic_2_count",
"topic_3_count",
"topic_4_count",
"followers_count",
"friends_count",
"favourites_count",
"statuses_count",
"friend_follow_ratio",
"days_since_create"
)
```
After winsorizing
![[Pasted image 20220224145814.png|800]]
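A minimal sketch of winsorizing, for reference. The cutoffs actually used are not recorded in these notes; the 1st/99th percentiles, the helper name `winsorize`, and the toy data below are assumptions.

```r
library(data.table)

# Clamp values below/above the given quantiles to those quantiles
winsorize <- function(x, probs = c(0.01, 0.99)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Toy data with one extreme value per column (hypothetical)
set.seed(42)
dt <- data.table(
  followers_count = c(rlnorm(99, 5, 1), 6e7),
  statuses_count  = c(rlnorm(99, 6, 1), 1e6)
)

# Winsorize the selected columns on a copy, leaving dt untouched
cols <- c("followers_count", "statuses_count")
dt_w <- copy(dt)
dt_w[, (cols) := lapply(.SD, winsorize), .SDcols = cols]

# The extreme values are pulled in; the number of rows is unchanged
dt[, lapply(.SD, max), .SDcols = cols]
dt_w[, lapply(.SD, max), .SDcols = cols]
```

Unlike trimming, winsorizing keeps every account in the data, which matters here because all accounts still need to be assigned to a block.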
Blocking
```r
condition
   c    t
6285 6314
# distribution of block sizes: top row = accounts per block, bottom row = number of blocks
> dt2[, table(table(block))]
  4   5   6   7   8   9  10  11  12  13  14  15  16  17  19
699 507 371 212 152  85  55  33  21  12   7   6   2   1   1
# condition differences in covariates (accounting for blocking)
> pvals_adjust
              covariate adjusted_pval
 1:        total_tweets     0.6214045
 2:            en_count     0.9535791
 3:            uk_count     0.7457448
 4:            ru_count     0.6734039
 5:     topic_count_all     0.2794515
 6:       topic_1_count     0.2335795
 7:       topic_2_count     0.4256212
 8:       topic_3_count     0.7392818
 9:       topic_4_count     0.1125286
10:     followers_count     0.3963671
11:       friends_count     0.2679797
12:    favourites_count     0.1904098
13:      statuses_count     0.6869106
14: friend_follow_ratio     0.9125938
15:   days_since_create     0.3415026
```
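A rough sketch of within-block assignment and a balance check that accounts for blocking. These notes do not record how `condition` was assigned or how `pvals_adjust` was computed (see `assign.Rmd`), so everything here is an assumption: the toy `dt2`, the helper `balance_pval`, and the choice of a linear model with block fixed effects as the balance test.

```r
library(data.table)
set.seed(1)

# Toy data: 40 blocks of 6 hypothetical accounts each
dt2 <- data.table(block = rep(1:40, each = 6))
dt2[, total_tweets := rpois(.N, 20)]
dt2[, followers_count := rlnorm(.N, 5, 1)]

# Randomize within each block: alternate c/t labels, then shuffle them.
# With an odd block size this gives one extra "c" in that block.
dt2[, condition := sample(rep_len(c("c", "t"), .N)), by = block]

# Balance test for one covariate: regress it on condition plus block
# fixed effects and return the p-value of the condition coefficient
balance_pval <- function(covariate, data) {
  f <- reformulate(c("condition", "factor(block)"), response = covariate)
  fit <- lm(f, data = data)
  coef(summary(fit))["conditiont", "Pr(>|t|)"]
}

features <- c("total_tweets", "followers_count")
pvals <- data.table(
  covariate = features,
  pval = vapply(features, balance_pval, numeric(1), data = dt2, USE.NAMES = FALSE)
)
pvals
```

Large p-values across covariates, as in the table above, are what one would expect if assignment within blocks was random.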
Sub-divided into two groups
```r
features_grp <- c("total_tweets", "topic_count_all")
   group condition    n
1:     0         c 3063
2:     0         t 3073
3:     1         c 3222
4:     1         t 3241
```
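A quick consistency check on the counts above: summing over the two groups should reproduce the per-condition totals from the blocking step (6285 control, 6314 treatment). The `counts` table is typed in by hand from the output; how `group` was derived from `features_grp` is not recorded here.

```r
library(data.table)

# Counts copied from the group x condition table above
counts <- data.table(
  group     = c(0, 0, 1, 1),
  condition = c("c", "t", "c", "t"),
  n         = c(3063, 3073, 3222, 3241)
)

# Collapse over group to get totals per condition
by_condition <- counts[, .(n = sum(n)), keyby = condition]
by_condition

# With account-level data, the table above could be produced along the lines of:
# dt2[, .(n = .N), keyby = .(group, condition)]
```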
![[assignment_condition_group.png|1000]]
Audience status
![[Pasted image 20220225012053.png]]
```r
upload                  audience_size       match_rate
  3063  id_220224224745          3028  true     0.9886  # control group0
  3222  id_220224224832          3180  true     0.987   # control group1
  3073  id_220224224906          3037  true     0.9883  # treatment group0
  3241  id_220224224942          3201  true     0.9877  # treatment group1
```
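The match rates above are consistent with `audience_size / upload`, assuming the first column is the number of accounts uploaded per list. A small sketch that recomputes them from the numbers in the table (the `audiences` table and its `label` column are illustrative names):

```r
library(data.table)

# Values copied from the audience table above
audiences <- data.table(
  label         = c("control group0", "control group1",
                    "treatment group0", "treatment group1"),
  upload        = c(3063, 3222, 3073, 3241),
  audience_size = c(3028, 3180, 3037, 3201)
)

# Share of uploaded accounts that were matched, rounded to 4 decimals
audiences[, match_rate := round(audience_size / upload, 4)]
audiences
```

All four lists match at roughly 98.7% to 98.9%, so the matched audiences remain close in size across conditions.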