230123_120758 check data

# questions

- there are rows where `event_type == "PROMPT_SHOWN"` but `prompt_probability == NULL`
  - drop these prompts? affects how we determine when the experiment began for those in the treatment group
  - 298 prompts/rows/events with `NULL` prompt probability, Nov 7 to Jan 13
- why does the control group have prompts with accuracy judgments? (relatively few though)
- many users saw only 1 prompt (or very few prompts)
- prompts were closed before they were shown (based on `event_time` and `event_type` columns)
- many `domain_name` values are `NULL`: exclude them?
- `prompt_probability` vs `latest_probabilty`? differ for about 50 domains
- prompts can be closed without being opened first, which means users could have provided ratings even though there wasn’t a `prompt_shown` event
  - have to find the first prompt/event using a combination of methods/columns
  - `event_time`, `survey_sharing`, `survey_accuracy`
- we have ratings (sharing or accuracy) without `prompt_shown` or `prompt_closed` event types
  - probably fine to consider these cases as prompt shown already?
- `event_time`: UTC time?
- if multiple ratings exist per prompt, use first rating?
- if `prompt_shown` and `prompt_closed` have different domains, which one to use?

# summary/descriptives

```r
# 472 users in total
          arm   n
1:    CONTROL 226
2: EXPERIMENT 246

# how many of each event
       event_type       N
1: TAB_NAVIGATION 5990941
2:   PROMPT_SHOWN    3616
3:  PROMPT_CLOSED    3483

# sharing intentions is in the `survey_sharing` column, and saved when `PROMPT_CLOSED`
       event_type survey_sharing       N
1: TAB_NAVIGATION           NULL 5990941  # no. of sites visited
2:   PROMPT_SHOWN           NULL    3616  # no. of prompts shown
3:  PROMPT_CLOSED           NULL    2346  # of prompts shown, these were non-responses
4:  PROMPT_CLOSED             NO     331  # no share
5:  PROMPT_CLOSED       NOT_SURE     193  # not sure about sharing
6:  PROMPT_CLOSED            YES     613  # yes share

           arm     event_type survey_sharing       N
 1:    CONTROL TAB_NAVIGATION           NULL 2837414
 2:    CONTROL   PROMPT_SHOWN           NULL    1731
 3:    CONTROL  PROMPT_CLOSED           NULL    1224
 4:    CONTROL  PROMPT_CLOSED       NOT_SURE      67
 5:    CONTROL  PROMPT_CLOSED             NO     151
 6:    CONTROL  PROMPT_CLOSED            YES     236
 7: EXPERIMENT TAB_NAVIGATION           NULL 3153527
 8: EXPERIMENT   PROMPT_SHOWN           NULL    1885
 9: EXPERIMENT  PROMPT_CLOSED           NULL    1122
10: EXPERIMENT  PROMPT_CLOSED             NO     180
11: EXPERIMENT  PROMPT_CLOSED       NOT_SURE     126
12: EXPERIMENT  PROMPT_CLOSED            YES     377

# accuracy judgments saved in `survey_accuracy` column, and saved when `PROMPT_CLOSED`
       event_type survey_accuracy       N
1: TAB_NAVIGATION            NULL 5990941
2:   PROMPT_SHOWN            NULL    3616  # no. of prompts shown
3:  PROMPT_CLOSED            NULL    2776  # difference between NULL vs NO?
4:  PROMPT_CLOSED              NO      85
5:  PROMPT_CLOSED             YES     478
6:  PROMPT_CLOSED        NOT_SURE     144

           arm     event_type survey_accuracy       N
 1:    CONTROL TAB_NAVIGATION            NULL 2837414
 2:    CONTROL   PROMPT_SHOWN            NULL    1731
 3:    CONTROL  PROMPT_CLOSED            NULL    1654
 4:    CONTROL  PROMPT_CLOSED        NOT_SURE       8  # ? shouldn't be here?
 5:    CONTROL  PROMPT_CLOSED             YES      14  # ? shouldn't be here?
 6:    CONTROL  PROMPT_CLOSED              NO       2  # ? shouldn't be here?
 7: EXPERIMENT TAB_NAVIGATION            NULL 3153527
 8: EXPERIMENT   PROMPT_SHOWN            NULL    1885
 9: EXPERIMENT  PROMPT_CLOSED            NULL    1122
10: EXPERIMENT  PROMPT_CLOSED              NO      83
11: EXPERIMENT  PROMPT_CLOSED             YES     464
12: EXPERIMENT  PROMPT_CLOSED        NOT_SURE     136
```

## 7 control users provided accuracy judgments - drop them?

```r
# user ids and no. of times they provided accuracy judgments
> weird_control[, .N, anon_user]
   anon_user N
1:        73 4
2:        80 2
3:       132 1
4:       259 9
5:       323 1
6:       324 1
7:       455 6
```

## events/rows per user

```r
# 5-num summary, min, 25, median, 75, max
[1]     74.0   3576.5   7732.0  15391.5 124406.0
```

![[1674501132.png|600]]

natural log

![[1674501167.png|600]]

## prompts per user

```r
    n_prompts_shown        arm n_users
 1:               1 EXPERIMENT      53  # mode is 1 prompt shown
 2:               1    CONTROL      54
 3:               2 EXPERIMENT      32
 4:               2    CONTROL      26
 5:               3    CONTROL      17
 6:               3 EXPERIMENT      18
 7:               4 EXPERIMENT      22
 8:               4    CONTROL      19
 9:               5    CONTROL      12
10:               5 EXPERIMENT       8
11:               6 EXPERIMENT      18
12:               6    CONTROL      13
13:               7 EXPERIMENT      15
14:               7    CONTROL       8
15:               8 EXPERIMENT      11
16:               8    CONTROL       8
17:               9    CONTROL       7
18:               9 EXPERIMENT       5
19:              10 EXPERIMENT       4
20:              10    CONTROL       7
21:              11    CONTROL       5
22:              11 EXPERIMENT       6
23:              12    CONTROL       7
24:              12 EXPERIMENT       5
25:              13    CONTROL       3
26:              13 EXPERIMENT       1
27:              14 EXPERIMENT       5
28:              14    CONTROL       2
29:              15 EXPERIMENT       3
30:              15    CONTROL       2
31:              16 EXPERIMENT       4
32:              16    CONTROL       3
33:              17    CONTROL       1
34:              17 EXPERIMENT       6
35:              18    CONTROL       1
36:              18 EXPERIMENT       5
37:              19    CONTROL       6
38:              19 EXPERIMENT       1
39:              20    CONTROL       4
40:              20 EXPERIMENT       2
# ... truncated
```

![[1674501046.png|600]]

natural log

![[1674501167.png|600]]

## no. prompts with sharing intentions (no, not sure, yes) - user averaged

![[1674503455.png|600]]

![[1674503502.png|600]]

## no. of prompts with accuracy judgments (no, not sure, yes)

why does the control group have accuracy judgments? should be all `NULL` for control group?

```r
> promptsclosed_per_user[arm == "CONTROL", table(survey_accuracy)]
survey_accuracy
      NO NOT_SURE     NULL      YES
       2        8     1654       14
```

![[1674503694.png|600]]

![[1674503709.png|600]]

## there were prompts closed before prompts were shown?

```r
> d0[, .N, keyby = .(period, event_type, arm)]
    period     event_type        arm n_events
# period -0.5: events that occurred before first prompt was shown
 1:   -0.5  PROMPT_CLOSED    CONTROL       70  # ? prompts closed before they were first shown?
 2:   -0.5  PROMPT_CLOSED EXPERIMENT       20  # ? prompts closed before they were first shown?
 3:   -0.5 TAB_NAVIGATION    CONTROL   905063
 4:   -0.5 TAB_NAVIGATION EXPERIMENT   898711
# period 0: events for when prompt shown
 5:    0.0   PROMPT_SHOWN    CONTROL      226
 6:    0.0   PROMPT_SHOWN EXPERIMENT      249
 7:    0.0 TAB_NAVIGATION    CONTROL      225
 8:    0.0 TAB_NAVIGATION EXPERIMENT      256
# period 0.5: events after first prompt was shown
 9:    0.5  PROMPT_CLOSED    CONTROL     1608
10:    0.5  PROMPT_CLOSED EXPERIMENT     1785
11:    0.5   PROMPT_SHOWN    CONTROL     1505
12:    0.5   PROMPT_SHOWN EXPERIMENT     1636
13:    0.5 TAB_NAVIGATION    CONTROL  1932126
14:    0.5 TAB_NAVIGATION EXPERIMENT  2254560
```

### sites visited per user, before (t0) and after (t1) first prompt

```r
# fivenum summary of no. of sites visited per user pre/post first prompt
           arm period n_events
# pre
 1:    CONTROL   -0.5      9.0  # min
 2:    CONTROL   -0.5    928.0  # 25%
 3:    CONTROL   -0.5   2024.0  # median
 4:    CONTROL   -0.5   3783.0  # 75%
 5:    CONTROL   -0.5 124375.0  # max
# post
 6:    CONTROL    0.5      9.0
 7:    CONTROL    0.5   1291.0
 8:    CONTROL    0.5   4584.5
 9:    CONTROL    0.5  11168.0
10:    CONTROL    0.5  83920.0
# pre
11: EXPERIMENT   -0.5      4.0
12: EXPERIMENT   -0.5    921.0
13: EXPERIMENT   -0.5   2155.5
14: EXPERIMENT   -0.5   4240.0
15: EXPERIMENT   -0.5  70734.0
# post
16: EXPERIMENT    0.5     21.0
17: EXPERIMENT    0.5   1703.0
18: EXPERIMENT    0.5   4735.0
19: EXPERIMENT    0.5  11117.0
20: EXPERIMENT    0.5  77598.0
```

![[1674506523.png|600]]

![[1674506536.png|600]]
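
possible way to define the first prompt per user (sketch only, not what was run for the tables above): treat any row that is a prompt event OR carries a rating as evidence the user had seen a prompt, and take the earliest such `event_time`. Assumptions: missing values are stored as the string `"NULL"` (as the `table()` output suggests), the toy data and the names `d`, `prompt_evidence`, `first_prompt_time` are made up for illustration, and `period` mirrors the -0.5 / 0 / 0.5 coding used in these notes.

```r
library(data.table)

# toy data (hypothetical values); column names follow the notes
d <- data.table(
  anon_user = c(1, 1, 1, 1, 2, 2, 2),
  arm = c(rep("EXPERIMENT", 4), rep("CONTROL", 3)),
  event_type = c("TAB_NAVIGATION", "PROMPT_CLOSED", "PROMPT_SHOWN", "TAB_NAVIGATION",
                 "TAB_NAVIGATION", "PROMPT_SHOWN", "PROMPT_CLOSED"),
  event_time = as.POSIXct("2022-11-07 00:00:00", tz = "UTC") +
    3600 * c(0, 1, 2, 3, 0, 1, 2),
  survey_sharing = c("NULL", "YES", "NULL", "NULL", "NULL", "NULL", "NO"),
  survey_accuracy = c("NULL", "YES", "NULL", "NULL", "NULL", "NULL", "NOT_SURE")
)

# a row is prompt evidence if it is a prompt event or carries any rating
d[, prompt_evidence := event_type %in% c("PROMPT_SHOWN", "PROMPT_CLOSED") |
      survey_sharing != "NULL" | survey_accuracy != "NULL"]

# first prompt time per user = earliest row with prompt evidence
first_prompt <- d[prompt_evidence == TRUE,
                  .(first_prompt_time = min(event_time)), by = anon_user]
d <- merge(d, first_prompt, by = "anon_user", all.x = TRUE)
setorder(d, anon_user, event_time)

# period: -0.5 before, 0 at, 0.5 after the first prompt (NA if user never had one)
d[, period := fifelse(event_time < first_prompt_time, -0.5,
               fifelse(event_time == first_prompt_time, 0, 0.5))]

# control users with accuracy judgments, to inspect before deciding whether to drop
weird_control <- d[arm == "CONTROL" & survey_accuracy != "NULL", .N, by = anon_user]
```

with this definition a `PROMPT_CLOSED` row can never land in period -0.5, because the close itself counts as the first prompt. Whether that is the right call depends on the answers to the questions at the top (e.g. whether closed-without-shown rows are real exposures).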