# filter users based on recent tweet activity

- 1,420,274 users tweeted at least once in last 2 to 3 weeks (since Mar 24)
- selected 514,433 users
    - tweeted at least 20% of the days (bin them into 10 bins for 10 ad groups in the campaign - see next section)
    - also include **all narrative accounts** regardless of recent activity

![[1681330959.png]]

```r
       x    cdf n_users
 1: 0.00   0.00       0
 2: 0.05   1.42   20204
 3: 0.10  24.10  342285
 4: 0.15  36.38  516664
 5: 0.20  45.05  639803
 6: 0.25  51.93  737516
 7: 0.30  57.67  819016
 8: 0.35  60.80  863460
 9: 0.40  65.40  928816
10: 0.45  69.58  988157
11: 0.50  73.42 1042721
12: 0.55  75.93 1078423
13: 0.60  79.39 1127568
14: 0.65  82.65 1173813
15: 0.70  84.56 1200937
16: 0.75  87.62 1244480
17: 0.80  90.67 1287798
18: 0.85  93.72 1331065
19: 0.90  96.86 1375693
20: 0.95  99.88 1418564
21: 1.00 100.00 1420274
       x    cdf n_users
```

# divided users into 10 ad groups based on "prop days with tweet activity" covariate

each facet/panel is 1 ad group

![[1681321101.png]]

# blocking and condition assignment

## covariates blocked on

```r
features <- c(
  "n_recent_tweets", "prop_active",
  "en", "hi", "en_follower", "hi_follower",
  "days_since_create", "friend_follow_ratio",
  "friends_count", "statuses_count", "favourites_count",
  "cluster00", "cluster04", "cluster06", "cluster07", "cluster09",
  "cluster17", "cluster12", "cluster13", "cluster16",
  "lang_en", "lang_hi"
)
```

## no. of users per group and condition

```r
    group condition     N
 1:     1         c  2765 # lowest activity group
 2:     1         t  2753
 3:     2         c 31177
 4:     2         t 31178
 5:     3         c 40454
 6:     3         t 40402
 7:     4         c 34266
 8:     4         t 34436
 9:     5         c 28095
10:     5         t 28194
11:     6         c 27858
12:     6         t 27905
13:     7         c 24991
14:     7         t 24911
15:     8         c 23006
16:     8         t 23039
17:     9         c 29296
18:     9         t 29303
19:    10         c 15225
20:    10         t 15183 # highest activity group
```

## example covariate distributions after blocking

- x-axis: covariate value (normalized and winsorized - 95%)
- y-axis: density
- rows: control, treatment
- columns: 10 adgroups

![[1681330782.png]]
![[1681330823.png]]
![[1681330865.png]]
![[1681330897.png]]
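
The x-axis transformation in the plots above ("normalized and winsorized - 95%") can be sketched roughly as below. This is a minimal illustration, not the code used for the figures: the notes do not say whether the cap is one-sided or two-sided or which normalization was applied, so the sketch assumes an upper cap at the 95th percentile followed by z-scoring. The helper names `winsorize_upper` and `normalize` and the simulated `statuses_count` values are hypothetical.

```r
# Assumption: "winsorized - 95%" means capping values above the 95th percentile.
winsorize_upper <- function(x, prob = 0.95) {
  cap <- quantile(x, probs = prob, na.rm = TRUE, names = FALSE)
  pmin(x, cap)
}

# Assumption: "normalized" means z-scoring (mean 0, sd 1).
normalize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

set.seed(1)
# Simulated heavy-tailed covariate standing in for a count such as statuses_count
statuses_count <- rlnorm(1000, meanlog = 5, sdlog = 2)
condition <- rep(c("c", "t"), each = 500)

capped <- winsorize_upper(statuses_count)
z <- normalize(capped)

# Per-condition densities of the transformed covariate, as in one plot column
dens_c <- density(z[condition == "c"])
dens_t <- density(z[condition == "t"])
```

Capping before normalizing keeps a handful of extreme accounts from dominating the mean and standard deviation, which makes the control and treatment densities easier to compare by eye.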