# questions
- there are rows where `event_type == "PROMPT_SHOWN"` but `prompt_probability == NULL`
- drop these prompts? affects how we determine when the experiment began for those in the treatment group
- 298 prompts/rows/events with `NULL` prompt probability, Nov 7 to Jan 13
- why does the control group have prompts with accuracy judgments? (relatively few, though)
- many users saw only 1 prompt (or very few prompts)
- some prompts were closed before they were shown (based on `event_time` and `event_type` columns)
- `domain_name` is `NULL` in many rows: exclude them?
- `prompt_probability` vs `latest_probabilty`? they differ for about 50 domains
- a prompt can be closed without being shown first, which means users could have provided ratings even though there wasn't a `PROMPT_SHOWN` event
- have to find the first prompt/event using a combination of methods/columns: `event_time`, `survey_sharing`, `survey_accuracy`
- we have ratings (sharing or accuracy) without `PROMPT_SHOWN` or `PROMPT_CLOSED` event types: probably fine to treat these cases as prompt already shown?
- `event_time`: UTC time?
- if multiple ratings exist per prompt, use first rating?
- if the `PROMPT_SHOWN` and `PROMPT_CLOSED` events for a prompt have different domains, which one to use?
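
One possible way to get the first prompt per user from a combination of columns, as discussed in the questions above. This is only a sketch on toy data: it assumes missing ratings are stored as the string `"NULL"` (or as `NA`), and the helper `has_value()` and the columns `prompt_evidence` and `first_prompt_time` are hypothetical names, not part of the dataset.

```r
library(data.table)

# Toy data; column names follow the notes, values are made up for illustration.
d <- data.table(
  anon_user = c(1, 1, 1, 2, 2, 2),
  event_time = as.POSIXct(c(
    "2000-01-01 10:00:00", "2000-01-01 10:05:00", "2000-01-01 10:06:00",
    "2000-01-02 09:00:00", "2000-01-02 09:30:00", "2000-01-02 09:45:00"
  ), tz = "UTC"),
  event_type = c("TAB_NAVIGATION", "PROMPT_SHOWN", "PROMPT_CLOSED",
                 "TAB_NAVIGATION", "PROMPT_CLOSED", "PROMPT_SHOWN"),
  survey_sharing = c("NULL", "NULL", "YES", "NULL", "NO", "NULL"),
  survey_accuracy = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL")
)

# Assumption: a missing rating is the string "NULL" or NA.
has_value <- function(x) !is.na(x) & x != "NULL"

# A row counts as evidence of a prompt if it is a prompt event or carries a rating.
d[, prompt_evidence := event_type %in% c("PROMPT_SHOWN", "PROMPT_CLOSED") |
      has_value(survey_sharing) | has_value(survey_accuracy)]

# First prompt per user = earliest row with any evidence of a prompt.
first_prompt <- d[prompt_evidence == TRUE,
                  .(first_prompt_time = min(event_time)),
                  by = anon_user]
print(first_prompt)
```

In the toy data, user 2 has a `PROMPT_CLOSED` row before any `PROMPT_SHOWN` row, so that close event is what sets the first prompt time under this rule. Whether that is the right choice is still one of the open questions.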
# summary/descriptives
```r
# 472 users in total
arm n
1: CONTROL 226
2: EXPERIMENT 246
# how many of each event
event_type N
1: TAB_NAVIGATION 5990941
2: PROMPT_SHOWN 3616
3: PROMPT_CLOSED 3483
# sharing intentions is in the `survey_sharing` column, and saved when `PROMPT_CLOSED`
event_type survey_sharing N
1: TAB_NAVIGATION NULL 5990941 # no. of sites visited
2: PROMPT_SHOWN NULL 3616 # no. of prompts shown
3: PROMPT_CLOSED NULL 2346 # of prompts shown, these were non-responses
4: PROMPT_CLOSED NO 331 # no share
5: PROMPT_CLOSED NOT_SURE 193 # not sure about sharing
6: PROMPT_CLOSED YES 613 # yes share
arm event_type survey_sharing N
1: CONTROL TAB_NAVIGATION NULL 2837414
2: CONTROL PROMPT_SHOWN NULL 1731
3: CONTROL PROMPT_CLOSED NULL 1224
4: CONTROL PROMPT_CLOSED NOT_SURE 67
5: CONTROL PROMPT_CLOSED NO 151
6: CONTROL PROMPT_CLOSED YES 236
7: EXPERIMENT TAB_NAVIGATION NULL 3153527
8: EXPERIMENT PROMPT_SHOWN NULL 1885
9: EXPERIMENT PROMPT_CLOSED NULL 1122
10: EXPERIMENT PROMPT_CLOSED NO 180
11: EXPERIMENT PROMPT_CLOSED NOT_SURE 126
12: EXPERIMENT PROMPT_CLOSED YES 377
# accuracy judgments saved in `survey_accuracy` column, and saved when `PROMPT_CLOSED`
event_type survey_accuracy N
1: TAB_NAVIGATION NULL 5990941
2: PROMPT_SHOWN NULL 3616 # no. of prompts shown
3: PROMPT_CLOSED NULL 2776 # difference between NULL vs NO?
4: PROMPT_CLOSED NO 85
5: PROMPT_CLOSED YES 478
6: PROMPT_CLOSED NOT_SURE 144
arm event_type survey_accuracy N
1: CONTROL TAB_NAVIGATION NULL 2837414
2: CONTROL PROMPT_SHOWN NULL 1731
3: CONTROL PROMPT_CLOSED NULL 1654
4: CONTROL PROMPT_CLOSED NOT_SURE 8 # ? shouldn't be here?
5: CONTROL PROMPT_CLOSED YES 14 # ? shouldn't be here?
6: CONTROL PROMPT_CLOSED NO 2 # ? shouldn't be here?
7: EXPERIMENT TAB_NAVIGATION NULL 3153527
8: EXPERIMENT PROMPT_SHOWN NULL 1885
9: EXPERIMENT PROMPT_CLOSED NULL 1122
10: EXPERIMENT PROMPT_CLOSED NO 83
11: EXPERIMENT PROMPT_CLOSED YES 464
12: EXPERIMENT PROMPT_CLOSED NOT_SURE 136
```
## 7 control users provided accuracy judgments - drop them?
```r
# user ids and no. of times they provided accuracy judgments
> weird_control[, .N, anon_user]
anon_user N
1: 73 4
2: 80 2
3: 132 1
4: 259 9
5: 323 1
6: 324 1
7: 455 6
```
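
If we do decide to drop them, a sketch of one way to do it is below. The toy table `closed` and the result `closed_clean` are hypothetical names, and the sketch assumes missing judgments are stored as the string `"NULL"`.

```r
library(data.table)

# Toy PROMPT_CLOSED rows; values are illustrative, not the real data.
closed <- data.table(
  anon_user = c(73, 73, 80, 5, 5, 9),
  arm = c("CONTROL", "CONTROL", "CONTROL", "CONTROL", "CONTROL", "EXPERIMENT"),
  survey_accuracy = c("YES", "NULL", "NOT_SURE", "NULL", "NULL", "YES")
)

# Control rows that unexpectedly carry an accuracy judgment.
weird_control <- closed[arm == "CONTROL" & survey_accuracy != "NULL"]
print(weird_control[, .N, by = anon_user])

# One option: drop every row belonging to those users, not just the odd rows.
closed_clean <- closed[!anon_user %in% weird_control$anon_user]
print(closed_clean)
```

Dropping whole users is the more conservative option. Dropping only the offending rows would keep those users' sharing responses in the analysis.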
## events/rows per user
```r
# 5-num summary, min, 25, median, 75, max
[1] 74.0 3576.5 7732.0 15391.5 124406.0
```
![[1674501132.png|600]]
natural log
![[1674501167.png|600]]
## prompts per user
```r
n_prompts_shown arm n_users
1: 1 EXPERIMENT 53 # mode is 1 prompt shown
2: 1 CONTROL 54
3: 2 EXPERIMENT 32
4: 2 CONTROL 26
5: 3 CONTROL 17
6: 3 EXPERIMENT 18
7: 4 EXPERIMENT 22
8: 4 CONTROL 19
9: 5 CONTROL 12
10: 5 EXPERIMENT 8
11: 6 EXPERIMENT 18
12: 6 CONTROL 13
13: 7 EXPERIMENT 15
14: 7 CONTROL 8
15: 8 EXPERIMENT 11
16: 8 CONTROL 8
17: 9 CONTROL 7
18: 9 EXPERIMENT 5
19: 10 EXPERIMENT 4
20: 10 CONTROL 7
21: 11 CONTROL 5
22: 11 EXPERIMENT 6
23: 12 CONTROL 7
24: 12 EXPERIMENT 5
25: 13 CONTROL 3
26: 13 EXPERIMENT 1
27: 14 EXPERIMENT 5
28: 14 CONTROL 2
29: 15 EXPERIMENT 3
30: 15 CONTROL 2
31: 16 EXPERIMENT 4
32: 16 CONTROL 3
33: 17 CONTROL 1
34: 17 EXPERIMENT 6
35: 18 CONTROL 1
36: 18 EXPERIMENT 5
37: 19 CONTROL 6
38: 19 EXPERIMENT 1
39: 20 CONTROL 4
40: 20 EXPERIMENT 2
# ... truncated
```
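
For reference, a table like the one above could be produced roughly as follows. This is a sketch on toy data; `per_user` and `dist` are hypothetical names. Users who never saw a prompt do not appear in it, because only `PROMPT_SHOWN` rows are counted.

```r
library(data.table)

# Toy event log; values are made up for illustration.
d <- data.table(
  anon_user = c(1, 1, 1, 2, 2, 3, 3, 3),
  arm = c(rep("CONTROL", 3), rep("EXPERIMENT", 2), rep("CONTROL", 3)),
  event_type = c("PROMPT_SHOWN", "PROMPT_SHOWN", "TAB_NAVIGATION",
                 "PROMPT_SHOWN", "TAB_NAVIGATION",
                 "PROMPT_SHOWN", "PROMPT_CLOSED", "TAB_NAVIGATION")
)

# Step 1: number of prompts shown to each user.
per_user <- d[event_type == "PROMPT_SHOWN",
              .(n_prompts_shown = .N),
              by = .(anon_user, arm)]

# Step 2: number of users at each prompt count, split by arm.
dist <- per_user[, .(n_users = .N), keyby = .(n_prompts_shown, arm)]
print(dist)
```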
![[1674501046.png|600]]
natural log
![[1674501167.png|600]]
## no. of prompts with sharing intentions (no, not sure, yes) - user averaged
![[1674503455.png|600]]
![[1674503502.png|600]]
## no. of prompts with accuracy judgments (no, not sure, yes)
why does the control group have accuracy judgments? should be all `NULL` for the control group?
```r
> promptsclosed_per_user[arm == "CONTROL", table(survey_accuracy)]
survey_accuracy
NO NOT_SURE NULL YES
2 8 1654 14
```
![[1674503694.png|600]]
![[1674503709.png|600]]
## there were prompts closed before prompts were shown?
```r
> d0[, .N, keyby = .(period, event_type, arm)]
# period -0.5: events that occurred before the first prompt was shown
period event_type arm n_events
1: -0.5 PROMPT_CLOSED CONTROL 70 # ? prompts closed before they were first shown?
2: -0.5 PROMPT_CLOSED EXPERIMENT 20 # ? prompts closed before they were first shown?
3: -0.5 TAB_NAVIGATION CONTROL 905063
4: -0.5 TAB_NAVIGATION EXPERIMENT 898711
# period 0: events for when prompt shown
5: 0.0 PROMPT_SHOWN CONTROL 226
6: 0.0 PROMPT_SHOWN EXPERIMENT 249
7: 0.0 TAB_NAVIGATION CONTROL 225
8: 0.0 TAB_NAVIGATION EXPERIMENT 256
# period 0.5: events after the first prompt was shown
9: 0.5 PROMPT_CLOSED CONTROL 1608
10: 0.5 PROMPT_CLOSED EXPERIMENT 1785
11: 0.5 PROMPT_SHOWN CONTROL 1505
12: 0.5 PROMPT_SHOWN EXPERIMENT 1636
13: 0.5 TAB_NAVIGATION CONTROL 1932126
14: 0.5 TAB_NAVIGATION EXPERIMENT 2254560
```
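
A sketch of how a `period` column like this might be assigned, to make the "closed before shown" rows easier to reason about. It assumes period is defined purely by comparing `event_time` with the user's first `PROMPT_SHOWN` time (earlier = -0.5, equal = 0, later = 0.5). The notes do not confirm that definition, and `first_shown` is a hypothetical helper column.

```r
library(data.table)

# Toy events for one user; timestamps are made up.
d0 <- data.table(
  anon_user = c(1, 1, 1, 1, 1),
  arm = "CONTROL",
  event_time = as.POSIXct("2000-01-01 00:00:00", tz = "UTC") + c(0, 60, 120, 120, 180),
  event_type = c("PROMPT_CLOSED", "TAB_NAVIGATION", "PROMPT_SHOWN",
                 "TAB_NAVIGATION", "PROMPT_CLOSED")
)

# Time of each user's first PROMPT_SHOWN event.
# (Users with no PROMPT_SHOWN rows would need separate handling.)
d0[, first_shown := min(event_time[event_type == "PROMPT_SHOWN"]), by = anon_user]

# Assumed definition: -0.5 before, 0 at, 0.5 after the first PROMPT_SHOWN.
d0[, period := 0.5 * sign(as.numeric(difftime(event_time, first_shown, units = "secs")))]

counts <- d0[, .(n_events = .N), keyby = .(period, event_type, arm)]
print(counts)
```

Under this definition, the first toy row (a `PROMPT_CLOSED` logged before any `PROMPT_SHOWN`) lands in period -0.5, the same pattern as rows 1 and 2 in the table above.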
### sites visited per user, before (t0) and after (t1) first prompt
```r
# fivenum summary of no. of sites visited per user pre/post first prompt
# pre
arm period n_events
1: CONTROL -0.5 9.0 # min
2: CONTROL -0.5 928.0 # 25%
3: CONTROL -0.5 2024.0 # median
4: CONTROL -0.5 3783.0 # 75%
5: CONTROL -0.5 124375.0 # max
# post
6: CONTROL 0.5 9.0
7: CONTROL 0.5 1291.0
8: CONTROL 0.5 4584.5
9: CONTROL 0.5 11168.0
10: CONTROL 0.5 83920.0
# pre
11: EXPERIMENT -0.5 4.0
12: EXPERIMENT -0.5 921.0
13: EXPERIMENT -0.5 2155.5
14: EXPERIMENT -0.5 4240.0
15: EXPERIMENT -0.5 70734.0
# post
16: EXPERIMENT 0.5 21.0
17: EXPERIMENT 0.5 1703.0
18: EXPERIMENT 0.5 4735.0
19: EXPERIMENT 0.5 11117.0
20: EXPERIMENT 0.5 77598.0
```
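
A sketch of how the per-arm, per-period five-number summaries could be computed with `fivenum()`. The toy table `sites` stands in for per-user `TAB_NAVIGATION` counts and its values are made up. `summ` and the `stat` column are hypothetical names.

```r
library(data.table)

# Toy per-user site counts; in practice these might come from something like
# d0[event_type == "TAB_NAVIGATION", .(n_events = .N), by = .(anon_user, arm, period)]
sites <- data.table(
  anon_user = rep(1:5, each = 2),
  arm = "CONTROL",
  period = rep(c(-0.5, 0.5), times = 5),
  n_events = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

# Five-number summary of sites visited per user, by arm and period.
summ <- sites[, .(stat = c("min", "25%", "median", "75%", "max"),
                  n_events = fivenum(n_events)),
              keyby = .(arm, period)]
print(summ)
```

Note that `fivenum()` returns Tukey's hinges for the middle values, which can differ slightly from `quantile(x, c(0.25, 0.75))`, so the 25% and 75% labels are approximate.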
![[1674506523.png|600]]
![[1674506536.png|600]]