Spaces: LLM-GAT/README
stecas committed on Feb 3

Commit 5083808 (verified) · 1 Parent(s): f2acc1a

Update README.md

Files changed (1): README.md

@@ -11,7 +11,9 @@ thumbnail: >-
 # Model Tampering Attacks Enable More Rigorous Evlauations of LLM Capabilities
-Zora Che*, Stephen Casper*, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell
+Zora Che*, Stephen Casper*,
+Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai,
+Yarin Gal, Furong Huang, Dylan Hadfield-Menell
 Paper: COMING SOON
@@ -57,7 +59,7 @@ So we evaluated models using multiple benchmarks.
 * **WMDP-Bio** (Bio capabilities)
 * **MMLU** (General capabilities)
 * **AGIEval** (General capabilities)
-* **T-Bench** (General capabilities)
+* **MT-Bench** (General capabilities)
 We then calculated the unlearning score which gives a normalized measure of how much WMDP-bio capabilities go down disproportionately compared to general capabilities.