I am trying to run SFT with QLoRA on GLM-4.5V. I would like to train in either bfloat16 or float16, but I hit a dtype mismatch inside the MoE layers as soon as the model actually runs a forward pass.
This is how I load the model:

```python
import torch
from transformers import BitsAndBytesConfig, Glm4vMoeForConditionalGeneration

MODEL_ID = "zai-org/GLM-4.5V"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = Glm4vMoeForConditionalGeneration.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    quantization_config=bnb,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map={"": 0},
)
```

Loading works, but the forward pass fails inside the MoE block:

```
File /usr/local/lib/python3.11/site-packages/transformers/models/glm4v_moe/modeling_glm4v_moe.py:342, in Glm4vMoeTextMoE.forward(self, hidden_states)
    340 topk_indices, topk_weights = self.gate(hidden_states)
    341 hidden_states = hidden_states.view(-1, hidden_states.shape[-1])
--> 342 hidden_states = self.moe(hidden_states, topk_indices, topk_weights).view(orig_shape)
    343 hidden_states = hidden_states + self.shared_experts(residuals)
    344 return hidden_states

File /usr/local/lib/python3.11/site-packages/transformers/models/glm4v_moe/modeling_glm4v_moe.py:330, in Glm4vMoeTextMoE.moe(self, hidden_states, topk_indices, topk_weights)
    328     expert_output = expert(expert_input)
    329     weighted_output = expert_output * expert_weights.unsqueeze(-1)
--> 330     final_hidden_states.index_add_(0, token_indices, weighted_output)
    332 # in original deepseek, the output of the experts are gathered once we leave this module
    333 # thus the moe module is itelsf an IsolatedParallel module
    334 # and all expert are "local" meaning we shard but we don't gather
    335 return final_hidden_states.type(hidden_states.dtype)

RuntimeError: index_add_(): self (Half) and source (Float) must have the same scalar type
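For context, this is roughly how inputs reach the model (simplified from my SFT data pipeline; the processor usage follows the model card, and the prompt/image URL are just placeholders), in case it helps reproduce the error:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

# Placeholder example message; my real SFT samples are built the same way.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/sample.png"},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model(**inputs)  # the RuntimeError above is raised in the MoE layer during this call
```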
Can anyone help with getting GLM-4.5V working? I am using the preview build (transformers-v4.55.0-GLM-4.5V-preview) linked on the zai-org/GLM-4.5V Hugging Face model page.