Update README.md
README.md
CHANGED
@@ -11,7 +11,9 @@ thumbnail: >-

 # Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities

-Zora Che*, Stephen Casper*,
+Zora Che*, Stephen Casper*,
+Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai,
+Yarin Gal, Furong Huang, Dylan Hadfield-Menell

 Paper: COMING SOON

@@ -57,7 +59,7 @@ So we evaluated models using multiple benchmarks.
 * **WMDP-Bio** (Bio capabilities)
 * **MMLU** (General capabilities)
 * **AGIEval** (General capabilities)
-* **
+* **MT-Bench** (General capabilities)

 We then calculated the unlearning score, which gives a normalized measure of how much WMDP-Bio capabilities go down disproportionately compared to general capabilities.