Trying out later with 8 x RTX 5090
#1, opened by crystech
Does it work with the RTX 5090? I will try it later and update with results.
Any recommendations for vllm serve arguments?
KeyError: 'layers.52.self_attn.qkv_proj.k_scale'
Unfortunately, I got the error above when running with vLLM 0.13.0.
It should, but I haven't tested on those cards. I did, however, run into that error with my own vLLM inference setup. My fix is in this discussion:
https://huggingface.co/Salyut1/GLM-4.7-NVFP4/discussions/3#694ab9b6e2efa04b7ecb0c4b