Trying out later with 8 x RTX 5090

#1
by crystech - opened

Does it work with the RTX 5090? I'll try it later and post the results.
Any recommended vllm serve arguments?

Unfortunately, it fails with vLLM 0.13.0:
KeyError: 'layers.52.self_attn.qkv_proj.k_scale'

It should, but I haven't tested it on 5090s. I did, however, run into that same error with my vLLM inference setup. My fix is in this discussion:

https://huggingface.co/Salyut1/GLM-4.7-NVFP4/discussions/3#694ab9b6e2efa04b7ecb0c4b
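
For anyone hitting the KeyError above, a quick diagnostic (not the fix from the linked discussion) is to check which k_scale / v_scale tensors the checkpoint actually ships. This is only a rough sketch: it assumes the model is a sharded safetensors checkpoint with the usual model.safetensors.index.json file, and the helper names and the path in the usage comment are made up for illustration.

```python
import json


def load_weight_map(index_path):
    """Read the tensor-name -> shard-file mapping from a safetensors index file."""
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def find_kv_scale_keys(weight_map, suffixes=(".k_scale", ".v_scale")):
    """Return the sorted tensor names that end in one of the given suffixes."""
    # str.endswith accepts a tuple of suffixes, so normalise the argument first.
    suffixes = tuple(suffixes)
    return sorted(name for name in weight_map if name.endswith(suffixes))


# Usage (hypothetical local path, adjust to wherever the model was downloaded):
#   weight_map = load_weight_map("GLM-4.7-NVFP4/model.safetensors.index.json")
#   for name in find_kv_scale_keys(weight_map):
#       print(name, "->", weight_map[name])
```

If the list includes KV-cache scale tensors for the layer named in the error, the checkpoint carries scales that the loader may not be expecting under that name, which is at least consistent with the KeyError. That is a guess about the cause, not something confirmed in this thread.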
