Fix table for metrics

#8
by tidealwari - opened
Files changed (1) hide show
  1. README.md +8 -108
README.md CHANGED
@@ -76,115 +76,15 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
76
 
77
  ### Comparison of Qiskit models across benchmarks
78
 
79
- <table
80
- style="
81
- display: inline-table;
82
- border-collapse: separate;
83
- border-spacing: 0;
84
- font-family: Inter, -apple-system, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
85
- box-shadow: 0 6px 18px rgba(12, 20, 29, 0.06);
86
- border-radius: 12px;
87
- overflow: hidden;
88
- table-layout: auto;
89
- box-sizing: border-box;
90
- margin: 16px 0;
91
- "
92
- >
93
- <thead>
94
- <tr>
95
- <th style="text-align:left; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
96
- Model
97
- </th>
98
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
99
- QiskitHumanEval-Hard
100
- </th>
101
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
102
- QiskitHumanEval
103
- </th>
104
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
105
- HumanEval
106
- </th>
107
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
108
- ASDiv
109
- </th>
110
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
111
- MathQA
112
- </th>
113
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
114
- SciQ
115
- </th>
116
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
117
- MBPP
118
- </th>
119
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
120
- IFEval
121
- </th>
122
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
123
- CrowsPairs (English)
124
- </th>
125
- <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
126
- TruthfulQA (MC1 acc)
127
- </th>
128
- </tr>
129
- </thead>
130
- <tbody>
131
- <tr style="background:#f7fafc;">
132
- <td style="padding:12px 16px; font-weight:700; color:#07102a;">Qwen2.5-Coder-14B-Qiskit</td>
133
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">25.17</td>
134
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">49.01</td>
135
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">91.46</td>
136
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">4.21</td>
137
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">53.90</td>
138
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">97.00</td>
139
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">77.60</td>
140
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">49.64</td>
141
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">65.18</td>
142
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">37.82</td>
143
- </tr>
144
- <tr style="background:#ffffff;">
145
- <td style="padding:12px 16px; color:#0f172a;">mistral-small-3.2-24b-qiskit</td>
146
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">20.53</td>
147
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">40.39</td>
148
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">77.49</td>
149
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">20.69</td>
150
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">53.40</td>
151
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">96.40</td>
152
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">63.40</td>
153
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">31.66</td>
154
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">67.56</td>
155
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">42.84</td>
156
- </tr>
157
- <tr style="background:#ffffff;">
158
- <td style="padding:12px 16px; color:#0f172a;">granite-3.3-8b-qiskit</td>
159
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">14.57</td>
160
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">27.15</td>
161
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">62.80</td>
162
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">0.48</td>
163
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">38.66</td>
164
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">93.30</td>
165
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">52.40</td>
166
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">59.71</td>
167
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">59.75</td>
168
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">39.05</td>
169
- </tr>
170
- <tr style="background:#fbfdff;">
171
- <td style="padding:12px 16px; color:#0f172a;">granite-3.2-8b-qiskit</td>
172
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">9.93</td>
173
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">24.50</td>
174
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">57.32</td>
175
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">0.09</td>
176
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">41.41</td>
177
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">96.30</td>
178
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">51.80</td>
179
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">60.79</td>
180
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">66.79</td>
181
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">40.51</td>
182
- </tr>
183
- </tbody>
184
- </table>
185
-
186
- *Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*
187
 
 
 
188
  ## Training Data
189
 
190
  - **Data Collection and Filtering:** Our code data is sourced from a combination of publicly available datasets (e.g., Code available on <https://github.com>), and additional synthetic data generated at IBM Quantum. We exclude code that is older than 2023.
 
76
 
77
  ### Comparison of Qiskit models across benchmarks
78
 
79
+ | **Model** | **QiskitHumanEval-Hard** | **QiskitHumanEval** | **HumanEval** | **ASDiv** | **MathQA** | **SciQ** | **MBPP** | **IFEval** | **CrowsPairs (English)** | **TruthfulQA (MC1 acc)** |
80
+ |-----------|---------------------------|----------------------|---------------|-----------|------------|----------|----------|------------|---------------------------|---------------------------|
81
+ | **qwen2.5-coder-14b-qiskit** | **25.17** | **49.01** | **91.46** | 4.21 | **53.90** | **97.00** | **77.60** | 49.64 | 65.18 | 37.82 |
82
+ | mistral-small-3.2-24b-qiskit | 20.53 | 40.39 | 77.49 | **20.69** | 53.40 | 96.40 | 63.40 | 31.66 | 67.56 | **42.84** |
83
+ | granite-3.3-8b-qiskit | 14.57 | 27.15 | 62.80 | 0.48 | 38.66 | 93.30 | 52.40 | 59.71 | **59.75** | 39.05 |
84
+ | granite-3.2-8b-qiskit | 9.93 | 24.50 | 57.32 | 0.09 | 41.41 | 96.30 | 51.80 | **60.79** | 66.79 | 40.51 |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
85
 
86
+ *Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*
87
+
88
  ## Training Data
89
 
90
  - **Data Collection and Filtering:** Our code data is sourced from a combination of publicly available datasets (e.g., Code available on <https://github.com>), and additional synthetic data generated at IBM Quantum. We exclude code that is older than 2023.