Update README.md
README.md
@@ -169,21 +169,23 @@ Diverse thematic data were included to enhance the model's capabilities in subta
As there is a lack of multimodal multilingual evaluation data, we haven't performed a thorough multilingual evaluation yet (coming soon). The English evaluations are shown in the table below:
-| Task
-
-| ai2d
-| mme
-|
-| mmmu_val
-| mmstar
-|
-|
-|
-|
-|
-|
-| realworldqa
-|mmbench_en_dev| | exact_match | 0.7113 |
+| Task           | Subtask                 | Metric                  | Value     |
+|----------------|-------------------------|-------------------------|-----------|
+| ai2d           |                         | exact_match             | 0.7451    |
+| mme            | cognition_score         | mme_cognition_score     | 246.4286  |
+|                | perception_score        | mme_perception_score    | 1371.8164 |
+| mmmu_val       |                         | accuracy                | 0.3689    |
+| mmstar         | average                 | accuracy                | 0.4865    |
+|                | coarse perception       | accuracy                | 0.7127    |
+|                | fine-grained perception | accuracy                | 0.3799    |
+|                | instance reasoning      | accuracy                | 0.5674    |
+|                | logical reasoning       | accuracy                | 0.4478    |
+|                | math                    | accuracy                | 0.4279    |
+|                | science & technology    | accuracy                | 0.3832    |
+| realworldqa    |                         | exact_match             | 0.5699    |
+| mmbench_en_dev |                         | exact_match             | 0.7113    |
+| docvqa_val     |                         | anls                    | 0.6805    |
+| infovqa_val    |                         | anls                    | 0.4859    |
---
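As a quick sanity check on the added table, the mmstar `average` row appears to be the unweighted mean of the six mmstar subtask accuracies. The README does not state this; it is an inference from the numbers. The sketch below uses only values copied from the table, and the variable and function names are illustrative rather than part of any evaluation tool.

```python
# MMStar subtask accuracies, copied from the table in the diff.
mmstar_subtasks = {
    "coarse perception": 0.7127,
    "fine-grained perception": 0.3799,
    "instance reasoning": 0.5674,
    "logical reasoning": 0.4478,
    "math": 0.4279,
    "science & technology": 0.3832,
}

# Value reported in the mmstar "average" row.
reported_average = 0.4865


def unweighted_mean(values):
    """Return the arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: the average weights each subtask equally.
computed_average = unweighted_mean(mmstar_subtasks.values())

print(f"computed: {computed_average:.4f}  reported: {reported_average:.4f}")
```

If the two figures agree to four decimal places, the reported average is consistent with equal weighting across subtasks. A mismatch would suggest per-sample weighting or a transcription error.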