# filter users based on recent tweet activity
- 1,420,274 users tweeted at least once in the last 2 to 3 weeks (since Mar 24)
- selected 514,433 of these users
- selection criterion: tweeted on at least 20% of the days (selected users are then binned into 10 bins, one per ad group in the campaign; see next section)
- also include **all narrative accounts**, regardless of recent activity
![[1681330959.png]]
```r
# x: proportion of days with tweet activity
# cdf: cumulative % of users at or below x; n_users: cumulative count
       x    cdf n_users
 1: 0.00   0.00       0
 2: 0.05   1.42   20204
 3: 0.10  24.10  342285
 4: 0.15  36.38  516664
 5: 0.20  45.05  639803
 6: 0.25  51.93  737516
 7: 0.30  57.67  819016
 8: 0.35  60.80  863460
 9: 0.40  65.40  928816
10: 0.45  69.58  988157
11: 0.50  73.42 1042721
12: 0.55  75.93 1078423
13: 0.60  79.39 1127568
14: 0.65  82.65 1173813
15: 0.70  84.56 1200937
16: 0.75  87.62 1244480
17: 0.80  90.67 1287798
18: 0.85  93.72 1331065
19: 0.90  96.86 1375693
20: 0.95  99.88 1418564
21: 1.00 100.00 1420274
```
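A minimal sketch of how a table like this could be computed, in base R. The toy data, the 20-day window, and all object names (`tweets`, `n_days`, `cdf_table`, `selected`) are assumptions for illustration; the notes do not show the original code, and whether thresholds were inclusive (`<=`, `>=`) is also assumed.

```r
# Hypothetical toy data: one row per (user, day) on which the user tweeted.
set.seed(1)
n_days <- 20  # assumed length of the observation window, in days
tweets <- data.frame(
  user_id = sample(1:200, 1500, replace = TRUE),
  day     = sample(seq_len(n_days), 1500, replace = TRUE)
)

# Proportion of days in the window on which each user tweeted at least once.
active_days <- tapply(tweets$day, tweets$user_id, function(d) length(unique(d)))
prop_active <- as.numeric(active_days) / n_days

# Cumulative count and percentage of users at or below each threshold x.
# The small tolerance guards against floating-point mismatches such as 3/20 vs 0.15.
x <- seq(0, 1, by = 0.05)
n_users <- vapply(x, function(t) sum(prop_active <= t + 1e-9), integer(1))
cdf_table <- data.frame(
  x       = x,
  cdf     = round(100 * n_users / length(prop_active), 2),
  n_users = n_users
)
cdf_table

# Assumption: "at least 20% of the days" means prop_active >= 0.20.
selected <- names(active_days)[prop_active >= 0.20 - 1e-9]
```

Because every user in the pool tweeted at least once, the first row (`x = 0`) is always zero, as in the table above.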
# divide users into 10 ad groups based on the "proportion of days with tweet activity" covariate
each facet (panel) shows one ad group
![[1681321101.png]]
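A sketch of one way to bin users into 10 ad groups on this covariate. The break points are not given in these notes, so the equal-width bins over [0.2, 1] used here are an assumption, as are the simulated values and the names `prop_active`, `breaks`, and `ad_group`.

```r
set.seed(2)
# Hypothetical covariate values for the selected users (all >= 0.2).
prop_active <- runif(1000, min = 0.2, max = 1)

# Assumption: 10 equal-width bins between 0.2 and 1.
breaks <- seq(0.2, 1, length.out = 11)

# labels = FALSE returns the bin index (1 = lowest activity, 10 = highest).
ad_group <- cut(prop_active, breaks = breaks, labels = FALSE, include.lowest = TRUE)

table(ad_group)
```

Quantile-based breaks (`quantile(prop_active, probs = seq(0, 1, 0.1))`) would instead give groups of roughly equal size, though ties in the covariate can make the breaks non-unique.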
# blocking and condition assignment
## covariates blocked on
```r
features <- c(
  "n_recent_tweets",
  "prop_active",
  "en",
  "hi",
  "en_follower",
  "hi_follower",
  "days_since_create",
  "friend_follow_ratio",
  "friends_count",
  "statuses_count",
  "favourites_count",
  "cluster00",
  "cluster04",
  "cluster06",
  "cluster07",
  "cluster09",
  "cluster17",
  "cluster12",
  "cluster13",
  "cluster16",
  "lang_en",
  "lang_hi"
)
```
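The blocking procedure itself is not shown in these notes. As a rough illustration of blocked assignment within an ad group, the sketch below pairs users who are adjacent on the first principal component of the standardized covariates and randomizes within each pair. This is a simple stand-in technique, not necessarily the method used here; the toy data, the three-feature subset, and the function `assign_within_group` are all hypothetical.

```r
set.seed(3)
n <- 200
# Hypothetical users in two ad groups, with a small subset of the covariates.
users <- data.frame(
  group           = rep(1:2, each = n / 2),
  n_recent_tweets = rpois(n, 30),
  prop_active     = runif(n, 0.2, 1),
  friends_count   = rpois(n, 300)
)
toy_features <- c("n_recent_tweets", "prop_active", "friends_count")

assign_within_group <- function(d, features) {
  # Summarize the standardized covariates by their first principal component.
  pc1 <- prcomp(d[, features], center = TRUE, scale. = TRUE)$x[, 1]
  ord <- order(pc1)

  # Users adjacent along pc1 form a block of size 2 (the last block has 1 if n is odd).
  block <- integer(nrow(d))
  pos   <- integer(nrow(d))
  block[ord] <- ceiling(seq_along(ord) / 2)
  pos[ord]   <- (seq_along(ord) - 1) %% 2 + 1

  # One coin flip per block decides which member is treated.
  first_is_t <- sample(c(TRUE, FALSE), max(block), replace = TRUE)
  ifelse(xor(first_is_t[block], pos == 2), "t", "c")
}

# Assign conditions separately within each ad group.
users$condition <- NA_character_
for (g in unique(users$group)) {
  rows <- users$group == g
  users$condition[rows] <- assign_within_group(users[rows, ], toy_features)
}

table(users$group, users$condition)
```

Pairing on a single principal component is cheap but ignores variation on the other components; a distance-based matching over all covariates would block more tightly at a higher computational cost.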
## number of users per ad group and condition (c = control, t = treatment)
```r
    group condition     N
 1:     1         c  2765  # lowest activity group
 2:     1         t  2753
 3:     2         c 31177
 4:     2         t 31178
 5:     3         c 40454
 6:     3         t 40402
 7:     4         c 34266
 8:     4         t 34436
 9:     5         c 28095
10:     5         t 28194
11:     6         c 27858
12:     6         t 27905
13:     7         c 24991
14:     7         t 24911
15:     8         c 23006
16:     8         t 23039
17:     9         c 29296
18:     9         t 29303
19:    10         c 15225
20:    10         t 15183  # highest activity group
```
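A quick sanity check on these counts: the share of each ad group assigned to treatment, plus the overall total. The numbers are copied from the table above; the object names (`n_c`, `n_t`, `balance`) are hypothetical.

```r
# Counts copied from the table above (c = control, t = treatment), groups 1 to 10.
n_c <- c(2765, 31177, 40454, 34266, 28095, 27858, 24991, 23006, 29296, 15225)
n_t <- c(2753, 31178, 40402, 34436, 28194, 27905, 24911, 23039, 29303, 15183)

balance <- data.frame(
  group   = 1:10,
  n_total = n_c + n_t,
  share_t = round(n_t / (n_c + n_t), 4)  # share of the group assigned to treatment
)
balance

sum(balance$n_total)  # total number of assigned users
```

Every group is within about one percentage point of a 50/50 split. The counts sum to 514,437, four more than the 514,433 selected users reported earlier; these notes do not explain the difference.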
## example covariate distributions after blocking
- x-axis: covariate value (normalized and winsorized at 95%)
- y-axis: density
- rows: control, treatment
- columns: the 10 ad groups
![[1681330782.png]]
![[1681330823.png]]
![[1681330865.png]]
![[1681330897.png]]
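A sketch of the preprocessing applied before plotting, under stated assumptions: "winsorized at 95%" is read here as capping values at the 95th percentile (upper tail only), and "normalized" as z-scoring. The notes do not specify either choice or the order of the two steps, and the helper names `winsorize_upper` and `normalize` are hypothetical.

```r
# Cap values above the p-th quantile (assumed: upper tail only).
winsorize_upper <- function(x, p = 0.95) {
  cap <- quantile(x, probs = p, na.rm = TRUE, names = FALSE)
  pmin(x, cap)
}

# Assumed normalization: center to mean 0 and scale to standard deviation 1.
normalize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

set.seed(4)
statuses_count <- rexp(1000, rate = 1 / 500)  # hypothetical heavy-tailed covariate

capped <- winsorize_upper(statuses_count)
z <- normalize(capped)

# Density of the transformed covariate, as would be drawn in one panel.
plot(density(z), main = "statuses_count (winsorized, normalized)")
```

Winsorizing before normalizing keeps a few extreme accounts from inflating the standard deviation and compressing the rest of the distribution in the plot.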