- [[240510_100602 llm headline plausibility AUC]]
# aslett 245 headlines
logistic regression with an l1 (lasso) penalty for feature selection, fit on all headlines.
```python
# true vs false
auc_cv: 0.8436
auc_out_of_sample: 0.8678
("[('plausibility_mistral_openorca_latest', 4.56149950109138), "
"('plausibility_openchat_latest', 4.171729512179912), "
"('plausibility_gemma_latest', 1.7067363617292777), "
"('plausibility_llama3_latest', 0.09377659485265712), "
"('plausibility_phi3_latest', -0.8678015082978409)]")
# top features from recursive feature elimination with cross-validation
['plausibility_gemma_latest',
'plausibility_mistral_openorca_latest',
'plausibility_openchat_latest']
# false vs rest
# estimator: LogisticRegressionCV
auc_cv: 0.8342
auc_out_of_sample: 0.7536
("[('plausibility_mistral_openorca_latest', 5.050336511941368), "
"('plausibility_gemma_latest', 1.675263074666969), "
"('plausibility_openchat_latest', 1.3833372785734577), "
"('plausibility_llama3_latest', 0.9748011393232207), "
"('plausibility_phi3_latest', -1.4413646391336885)]")
# top features from recursive feature elimination with cross-validation
['plausibility_gemma_latest', 'plausibility_mistral_openorca_latest']
# true vs rest
auc_cv: 0.8542
auc_out_of_sample: 0.6379
("[('plausibility_mistral_openorca_latest', 5.350387745960659), "
"('plausibility_openchat_latest', 3.587561320417132), "
"('plausibility_llama3_latest', 0.992733554925274), "
"('plausibility_gemma_latest', 0.9301229253238354), "
"('plausibility_phi3_latest', -1.4892755001959155)]")
# top features from recursive feature elimination with cross-validation
['plausibility_gemma_latest',
'plausibility_llama3_latest',
'plausibility_mistral_openorca_latest',
'plausibility_openchat_latest',
'plausibility_phi3_latest']
```
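A minimal sketch of how numbers like the ones above could be produced, not the actual pipeline. It assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the 245 headlines, and guesses at the split (25% held out) and fold count (5). The column names are copied from the output above; everything else is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

feature_names = [
    "plausibility_gemma_latest",
    "plausibility_llama3_latest",
    "plausibility_mistral_openorca_latest",
    "plausibility_openchat_latest",
    "plausibility_phi3_latest",
]

# Synthetic stand-in (assumption): 245 rows, one column per LLM rating.
X, y = make_classification(
    n_samples=245, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# L1-penalised logistic regression; C is tuned by inner cross-validation.
clf = LogisticRegressionCV(
    Cs=10, cv=cv, penalty="l1", solver="liblinear", scoring="roc_auc", max_iter=1000
)

# Cross-validated AUC on the training split (nested CV around the tuned model).
auc_cv = cross_val_score(clf, X_train, y_train, cv=cv, scoring="roc_auc").mean()

# Refit on the full training split and score the held-out split.
clf.fit(X_train, y_train)
auc_out_of_sample = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# Coefficients sorted from largest to smallest, as in the logged output.
ranked = sorted(zip(feature_names, clf.coef_[0]), key=lambda kv: kv[1], reverse=True)

# Recursive feature elimination with cross-validation on an L1 logistic model.
rfecv = RFECV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    cv=cv,
    scoring="roc_auc",
    min_features_to_select=1,
)
rfecv.fit(X_train, y_train)
selected = [name for name, keep in zip(feature_names, rfecv.support_) if keep]

print(f"auc_cv: {auc_cv:.4f}")
print(f"auc_out_of_sample: {auc_out_of_sample:.4f}")
print(ranked)
print(selected)
```

For the "false vs rest" and "true vs rest" runs, the same code would apply with `y` recoded so that one class is 1 and all others are 0.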
`all_models_mean` is the mean of the top 3 models:
- `('gemma:latest', 'mistral-openorca:latest', 'openchat:latest')`
![[1716348604.png]]
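A small sketch of how `all_models_mean` could be formed and scored against the labels. The three model names come from the note above; the ratings, the `is_true` field, and the `auc` helper are made up for illustration, and the AUC is computed directly as the fraction of (true, false) pairs ranked correctly.

```python
from statistics import mean

TOP_MODELS = ("gemma:latest", "mistral-openorca:latest", "openchat:latest")

# Hypothetical per-headline ratings (not real data).
ratings = [
    {"gemma:latest": 0.9, "mistral-openorca:latest": 0.8, "openchat:latest": 0.7, "is_true": 1},
    {"gemma:latest": 0.5, "mistral-openorca:latest": 0.6, "openchat:latest": 0.4, "is_true": 1},
    {"gemma:latest": 0.7, "mistral-openorca:latest": 0.6, "openchat:latest": 0.5, "is_true": 0},
    {"gemma:latest": 0.2, "mistral-openorca:latest": 0.1, "openchat:latest": 0.3, "is_true": 0},
]


def auc(labels, scores):
    """AUC as P(positive scored above negative); ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Average the three selected models' ratings for each headline.
all_models_mean = [mean(row[m] for m in TOP_MODELS) for row in ratings]
labels = [row["is_true"] for row in ratings]

print(auc(labels, all_models_mean))
```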
# gord headlines
elastic net regression with `l1_ratio = 1`, i.e. a pure l1 (lasso) penalty used for feature selection.
```python
# all headlines (454)
r_cv: 0.7623
r_out_of_sample: 0.7890
("[('plausibility_mistral_openorca_latest', 0.7479018916961439), "
"('plausibility_openchat_latest', 0.7243676898572867), "
"('plausibility_gemma_latest', 0.3522856231053299), "
"('plausibility_llama3_latest', 0.18854629149507898), "
"('plausibility_phi3_latest', 0.1320561837581022)]")
# top features from recursive feature elimination with cross-validation
['plausibility_gemma_latest',
'plausibility_llama3_latest',
'plausibility_mistral_openorca_latest',
'plausibility_openchat_latest',
'plausibility_phi3_latest']
# only p(claim) > 0.5 (385 headlines)
r_cv: 0.7587
r_out_of_sample: 0.7262
("[('plausibility_mistral_openorca_latest', 0.8142028068346667), "
"('plausibility_openchat_latest', 0.7619330812525645), "
"('plausibility_gemma_latest', 0.300639133917043), "
"('plausibility_llama3_latest', 0.19917785593908574), "
"('plausibility_phi3_latest', 0.15718520641339623)]")
# top features from recursive feature elimination with cross-validation
['plausibility_gemma_latest',
'plausibility_mistral_openorca_latest',
'plausibility_openchat_latest']
```
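A rough sketch of the regression setup, assuming scikit-learn's `ElasticNetCV` and reading `r_cv` / `r_out_of_sample` as Pearson correlations between predicted and observed values. That reading, the 25% hold-out, the 5 folds, and the synthetic data from `make_regression` are all assumptions, not the actual analysis.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

# Synthetic stand-in (assumption): 454 rows, one column per LLM rating.
X, y = make_regression(
    n_samples=454, n_features=5, n_informative=3, noise=20.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# l1_ratio=1.0 makes the elastic net a pure lasso; alpha is tuned by CV.
model = ElasticNetCV(l1_ratio=1.0, cv=cv)

# Cross-validated predictions on the training split, then Pearson r.
y_cv_pred = cross_val_predict(model, X_train, y_train, cv=cv)
r_cv = np.corrcoef(y_train, y_cv_pred)[0, 1]

# Refit on the full training split and correlate with held-out targets.
model.fit(X_train, y_train)
r_out_of_sample = np.corrcoef(y_test, model.predict(X_test))[0, 1]

print(f"r_cv: {r_cv:.4f}")
print(f"r_out_of_sample: {r_out_of_sample:.4f}")
print(model.coef_)
```

Coefficients shrunk exactly to zero in `model.coef_` would mark features the lasso dropped.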
`all_models_mean` is the mean of the top 3 models:
- `('gemma:latest', 'mistral-openorca:latest', 'openchat:latest')`
![[1716348798.png]]