# allen2021 dataset
training set (144 headlines)
- avg kappa (3 fc + 12 llm): 0.323
- avg kappa (3 fc): 0.419
- avg kappa (12 llm): 0.339
- avg kappa (3 best llm: grok, gemini3pro, gpt4search): 0.517
- avg kappa (3 fc + 3 best llm): 0.438
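
how an "avg kappa" figure can be computed: a minimal sketch, assuming each figure is the mean of unweighted pairwise Cohen's kappa over all rater pairs (the note does not say which kappa variant was used). rater names and labels below are hypothetical toy data, not the allen2021 ratings.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


def avg_pairwise_kappa(ratings):
    """Mean Cohen's kappa over all unordered rater pairs.

    ratings: dict mapping rater name -> list of labels, one per headline.
    """
    pairs = list(combinations(sorted(ratings), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(ratings[r1], ratings[r2]) for r1, r2 in pairs) / len(pairs)


# Hypothetical toy ratings (1 = true, 0 = false) for six headlines.
ratings = {
    "fc1": [1, 1, 0, 0, 1, 0],
    "fc2": [1, 0, 0, 0, 1, 0],
    "llm_a": [1, 1, 0, 1, 1, 0],
}

print(round(avg_pairwise_kappa(ratings), 3))
```

subset averages (e.g. 3 fc only) come from passing a dict restricted to those raters.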
pairwise kappas (training set)
![[_tmp.png]]
3 fcs + 3 best models (training set)
![[_tmp2 1.png]]
testing set (63 holdout headlines)
- avg kappa (3 fc + 3 best llm): 0.397
- avg kappa (3 fc): 0.39
- avg kappa (3 best llm): 0.4
3 fcs + 3 best models (testing set)
![[_tmp 1.png]]