# allen2021 dataset

training set (144 headlines)
- avg kappa (3 fc + 12 llm): 0.323
- avg kappa (3 fc): 0.419
- avg kappa (12 llm): 0.339
- avg kappa (3 best llm: grok, gemini3pro, gpt4search): 0.517
- avg kappa (3 fc + 3 best llm): 0.438

pairwise kappas (training set)
![[_tmp.png]]

3 fcs + 3 best models (training set)
![[_tmp2 1.png]]

3 fcs + 3 best models (testing set)
63 headlines in holdout set, avg kappas
- kappa (3 fc + 3 llm): 0.397
- kappa (3 fc): 0.39
- kappa (3 llm): 0.4

![[_tmp 1.png]]
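
sketch: how an "avg kappa" over a rater group could be computed. The note does not say which kappa variant was used, so this assumes Cohen's kappa for each unordered pair of raters, averaged with equal weight over all pairs in the group. The rater names (`fc1`, `fc2`, `llm1`) and the labels are made-up toy data, not values from the allen2021 set, and `cohen_kappa` / `avg_pairwise_kappa` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


def avg_pairwise_kappa(ratings, raters=None):
    """Mean Cohen's kappa over all unordered pairs of the chosen raters."""
    names = list(ratings) if raters is None else list(raters)
    pairs = list(combinations(names, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(ratings[p], ratings[q]) for p, q in pairs) / len(pairs)


# Toy data (assumption): one binary label per headline, per rater.
ratings = {
    "fc1": [1, 1, 0, 0, 1, 0],
    "fc2": [1, 0, 0, 0, 1, 0],
    "llm1": [1, 1, 0, 1, 1, 0],
}

print("all raters:", round(avg_pairwise_kappa(ratings), 3))
print("fc only:", round(avg_pairwise_kappa(ratings, ["fc1", "fc2"]), 3))
```

Passing a subset of rater names gives the per-group averages (fc only, llm only, fc + best llm) from the same ratings table. If the original numbers came from a multi-rater statistic such as Fleiss' kappa instead, the values would differ from this pairwise average.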