Update README.md
README.md
CHANGED
@@ -26,7 +26,7 @@ This model is a fine-tuned version of [DebateLabKIT/Phi-4-Argunaut-1-SPIN-dev1](
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 
-📘 [HF Blog Article](https://huggingface.co/blog/ggbetz/argunauts-
+📘 [HF Blog Article](https://huggingface.co/blog/ggbetz/argunauts-update-202512)
 
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://api.wandb.ai/links/ggbetz/tqp681ch)
 
@@ -62,7 +62,7 @@ We have released the preference pairs generated online as a separate dataset: [D
 
 ## Evaluation
 
-
+As described in [this article](https://huggingface.co/blog/ggbetz/argunauts-update-202512), `Phi-4-Argunaut-1-HIRPO` technically masters formal argument analysis but has lost general conversational abilities during one-sided training.
 
 
 ## Citations