Hi, could you try quantizing Devstral-2-123B-Instruct-2512?
I think this model should be able to fit into an RTX PRO 6000.
mistralai/Devstral-2-123B-Instruct-2512
I'll give it my best shot. I've been battling with Devstral-Small-2-24B-Instruct-2512 for like 4 hours; if I manage to get that to run, then the larger version will probably run too. Also, yeah, it should come out to something like 70GB, so as long as you don't need a huge context it'll run on an RTX Pro 6000.
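Rough math behind that 70GB figure, just as a back-of-the-envelope (the exact number depends on which quant type you pick; ~4.5 bits/weight is my assumption for a mid-range GGUF quant, not something from this thread):

```python
# Back-of-the-envelope estimate of the quantized weight size.
# Assumption (mine, not from the thread): ~4.5 bits per weight,
# roughly what a mid-range GGUF quant like Q4_K_M averages.
params = 123e9          # Devstral-2-123B parameter count
bits_per_weight = 4.5

size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB of weights")  # ~69 GB, before KV cache / activations
```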
I think we can change the configuration and try aligning it to Mistral3ForConditionalGeneration. Here is how the architectures differ:

Ministral3: Mistral3ForConditionalGeneration
Devstral-2: Ministral3ForCausalLM
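If someone wants to try that, a minimal sketch of the config edit could look like this (untested; it assumes the reported architecture name is the only mismatch, and the local path is just a placeholder):

```python
import json

# Hypothetical local path to the downloaded Devstral-2 checkpoint.
config_path = "Devstral-2-123B-Instruct-2512/config.json"

with open(config_path) as f:
    config = json.load(f)

# Swap the reported architecture so tooling that only knows
# Mistral3ForConditionalGeneration will accept the checkpoint.
config["architectures"] = ["Mistral3ForConditionalGeneration"]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

Other config keys may also differ between the two architectures, so this alone might not be enough.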
Since the model was trained natively in FP8, it might be necessary to convert it to BF16 first using the methods discussed here (though I think waiting for Unsloth would be better).
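For reference, that conversion is basically just upcasting each FP8 tensor with its stored scale and re-saving. A minimal sketch, assuming per-tensor float8_e4m3fn weights with matching "weight_scale" companion entries (I haven't checked how this particular checkpoint is laid out, and the file names are placeholders):

```python
import torch
from safetensors.torch import load_file, save_file

# Sketch: upcast one FP8 safetensors shard to BF16.
# Assumptions (not verified against Devstral-2): weights are stored as
# float8_e4m3fn with a per-tensor "<name>_scale" companion entry.
shard = load_file("shard-fp8.safetensors")  # placeholder file name

out = {}
for name, tensor in shard.items():
    if name.endswith("weight_scale"):
        continue  # scales get folded into the weights below
    if tensor.dtype == torch.float8_e4m3fn:
        upcast = tensor.to(torch.bfloat16)
        scale = shard.get(name + "_scale")
        if scale is not None:
            upcast = upcast * scale.to(torch.bfloat16)
        out[name] = upcast
    else:
        out[name] = tensor

save_file(out, "shard-bf16.safetensors")
```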