Hi, could you try quantizing Devstral-2-123B-Instruct-2512?
I think this model should be able to fit into an RTX PRO 6000.
mistralai/Devstral-2-123B-Instruct-2512
I'll give it my best shot. I've been battling with Devstral-Small-2-24B-Instruct-2512 for like 4 hours; if I manage to get that to run, then the larger version will probably run too. Also, yeah, it should come out to something like 70GB, so as long as you don't need a huge context it'll run on an RTX Pro 6000.
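Rough math behind that 70GB figure, just as a back-of-the-envelope (the exact number depends on which quant type you pick; ~4.5 bits/weight is my assumption for a mid-range GGUF quant, not something from this thread):

```python
# Back-of-the-envelope estimate of the quantized weight size.
# Assumption (mine, not from the thread): ~4.5 bits per weight,
# roughly what a mid-range GGUF quant like Q4_K_M averages.
params = 123e9          # Devstral-2-123B parameter count
bits_per_weight = 4.5

size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB of weights")  # ~69 GB, before KV cache / activations
```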
I think we can change the configuration and try aligning it to Mistral3ForConditionalGeneration. Here is how the architectures differ:

Ministral3: Mistral3ForConditionalGeneration
Devstral-2: Ministral3ForCausalLM
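If someone wants to try that, a minimal sketch of the config edit could look like this (untested; it assumes the reported architecture name is the only mismatch, and the local path is just a placeholder):

```python
import json

# Hypothetical local path to the downloaded Devstral-2 checkpoint.
config_path = "Devstral-2-123B-Instruct-2512/config.json"

with open(config_path) as f:
    config = json.load(f)

# Swap the reported architecture so tooling that only knows
# Mistral3ForConditionalGeneration will accept the checkpoint.
config["architectures"] = ["Mistral3ForConditionalGeneration"]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

Other config keys may also differ between the two architectures, so this alone might not be enough.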
Since the model was trained natively in FP8, it might be necessary to convert it to BF16 first using the methods discussed here (though I think waiting for Unsloth would be better).
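For reference, that conversion is basically just upcasting each FP8 tensor with its stored scale and re-saving. A minimal sketch, assuming per-tensor float8_e4m3fn weights with matching "weight_scale" companion entries (I haven't checked how this particular checkpoint is laid out, and the file names are placeholders):

```python
import torch
from safetensors.torch import load_file, save_file

# Sketch: upcast one FP8 safetensors shard to BF16.
# Assumptions (not verified against Devstral-2): weights are stored as
# float8_e4m3fn with a per-tensor "<name>_scale" companion entry.
shard = load_file("shard-fp8.safetensors")  # placeholder file name

out = {}
for name, tensor in shard.items():
    if name.endswith("weight_scale"):
        continue  # scales get folded into the weights below
    if tensor.dtype == torch.float8_e4m3fn:
        upcast = tensor.to(torch.bfloat16)
        scale = shard.get(name + "_scale")
        if scale is not None:
            upcast = upcast * scale.to(torch.bfloat16)
        out[name] = upcast
    else:
        out[name] = tensor

save_file(out, "shard-bf16.safetensors")
```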