yinsong1986 committed (verified) · Commit 3c10214 · 1 parent: 6368466

Update README.md

Files changed (1): README.md +9 -9
README.md CHANGED
@@ -30,15 +30,15 @@ or "counter-commonsense", ensuring the model cannot answer based on language kno
 
 The horizontal axis depicts the cumulative frames constituting the video haystack. The vertical axis indicates the positioning of the needle image within that sequence. For example, a frame depth of 0% would situate the needle image at the outset of the video. The black dotted line signifies the training context length of the backbone language model, with each frame comprising 144 tokens.
 
- `OmniLong-Qwen2_5-VL-7B` scored an average of `97.55%` on this NIAH benchmark across the different frame depths and frame counts shown in this plot.
+ `OmniLong-Qwen2.5-VL-7B` scored an average of `97.55%` on this NIAH benchmark across the different frame depths and frame counts shown in this plot.
 
 **[2. MME: A Comprehensive Evaluation Benchmark for Image Understanding](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation)**
 
- MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning. `OmniLong-Qwen2_5-VL-7B` retains SOTA performance on both the perception and cognition evaluations.
+ MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning. `OmniLong-Qwen2.5-VL-7B` retains SOTA performance on both the perception and cognition evaluations.
 
 | Models | mme_cognition_score | mme_perception_score |
 |--------------------|----------------------|---------------------|
- | **OmniLong-Qwen2_5-VL-7B** | **642.85** | 1599.28 |
+ | **OmniLong-Qwen2.5-VL-7B** | **642.85** | 1599.28 |
 | [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 629.64 | **1691.36** |
 
 
@@ -46,7 +46,7 @@ MME is a comprehensive evaluation benchmark for multimodal large language models
 
 Video-MME is the first-ever full-spectrum, multi-modal evaluation benchmark of MLLMs in video analysis. It covers a wide range of short videos (< 2 min), medium videos (4 min ~ 15 min), and long videos (30 min ~ 60 min). 900 videos totalling 254 hours were manually selected and annotated by repeatedly viewing all the video content, resulting in 2,700 question-answer pairs. Subtitles are also provided with the videos for evaluation.
 
- `OmniLong-Qwen2_5-VL-7B` scored an overall `67.9%` without subtitles and `73.4%` with subtitles, as shown in this table (*adapted from the [VideoMME Leaderboard](https://video-mme.github.io/home_page.html)*), which makes it the SOTA among `7B` models.
+ `OmniLong-Qwen2.5-VL-7B` scored an overall `67.9%` without subtitles and `73.4%` with subtitles, as shown in this table (*adapted from the [VideoMME Leaderboard](https://video-mme.github.io/home_page.html)*), which makes it the SOTA among `7B` models.
 
 | Models | LLM Params | Overall (%) - w/o subs | Overall (%) - w subs |
 |--------------------|------------|-------------------------|------------------------|
@@ -98,7 +98,7 @@ pip install vllm
 
 ### Start the server
 ```shell
- vllm serve aws-prototyping/OmniLong-Qwen2_5-VL-7B --tensor-parallel-size 4
+ vllm serve aws-prototyping/OmniLong-Qwen2.5-VL-7B --tensor-parallel-size 4
 ```
 
 ## Deploy the model on a SageMaker LMI Endpoint
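
The `vllm serve` command in the hunk above starts an OpenAI-compatible HTTP server, by default at `http://localhost:8000/v1`. A minimal client sketch follows; the endpoint address, prompt, and image URL are placeholder assumptions and are not part of the README itself.

```python
# Minimal client sketch for the vLLM server started above.
# Assumptions: default vLLM host/port, the `openai` Python package installed,
# and a placeholder image URL standing in for a real sampled frame.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="aws-prototyping/OmniLong-Qwen2.5-VL-7B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/frame.jpg"}},
                {"type": "text", "text": "Describe what is happening in this frame."},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Multiple `image_url` entries can be passed in one request for multi-frame input, subject to the server's multimodal limits; see the vLLM documentation for its video input options.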
@@ -107,18 +107,18 @@ Please refer to this [large model inference (LMI) container](https://docs.aws.am
 
 
 ## Limitations
- Before using the `OmniLong-Qwen2_5-VL-7B` model, it is important to perform your own independent assessment and to take measures to ensure that your use complies with your own specific quality control practices and standards, as well as with the local rules, laws, regulations, licenses and terms that apply to you and your content.
+ Before using the `OmniLong-Qwen2.5-VL-7B` model, it is important to perform your own independent assessment and to take measures to ensure that your use complies with your own specific quality control practices and standards, as well as with the local rules, laws, regulations, licenses and terms that apply to you and your content.
 
 ## Citation
 
 If you find our work helpful, feel free to cite us.
 
 ```
- @misc{OmniLong-Qwen2_5-VL-7B-2025,
+ @misc{OmniLong-Qwen2.5-VL-7B-2025,
 author = { {Yin Song and Chen Wu} },
- title = { {aws-prototyping/OmniLong-Qwen2_5-VL-7B} },
+ title = { {aws-prototyping/OmniLong-Qwen2.5-VL-7B} },
 year = 2025,
- url = { https://huggingface.co/aws-prototyping/OmniLong-Qwen2_5-VL-7B },
+ url = { https://huggingface.co/aws-prototyping/OmniLong-Qwen2.5-VL-7B },
 publisher = { Hugging Face }
 }
 ```
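
For the SageMaker LMI endpoint mentioned above, the linked LMI container documentation is the authoritative reference. As a rough sketch with the SageMaker Python SDK, a deployment might look like the following; the container image URI, IAM role, instance type, and `OPTION_*` environment settings are placeholders and assumptions to be filled in from that documentation.

```python
# Rough deployment sketch only; the image URI, role, instance type and options
# are placeholders/assumptions; take the exact values from the LMI container docs.
from sagemaker.model import Model

role = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"  # placeholder IAM role
lmi_image_uri = "<LMI container image URI from the linked documentation>"  # placeholder

model = Model(
    image_uri=lmi_image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "aws-prototyping/OmniLong-Qwen2.5-VL-7B",
        "OPTION_ROLLING_BATCH": "vllm",        # assumption: vLLM backend, matching the local example
        "OPTION_TENSOR_PARALLEL_DEGREE": "4",  # mirrors --tensor-parallel-size 4 above
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # assumption: a 4-GPU instance to match the TP degree
)
print(predictor.endpoint_name)
```

The deployed endpoint can then be invoked with `predictor.predict(...)` or through the SageMaker runtime API.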
 