The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Abstract
Arbitrary-order generation in diffusion large language models limits reasoning capability by causing premature solution-space collapse; forgoing it in favor of standard policy optimization (GRPO) proves more effective.
Diffusion Large Language Models (dLLMs) break the rigid left-to-right constraint of traditional LLMs, enabling token generation in arbitrary orders. Intuitively, this flexibility implies a solution space that is a strict superset of the fixed autoregressive trajectory, theoretically unlocking superior reasoning potential for general tasks like mathematics and coding. Consequently, numerous works have leveraged reinforcement learning (RL) to elicit the reasoning capability of dLLMs. In this paper, we reveal a counter-intuitive reality: arbitrary order generation, in its current form, narrows rather than expands the reasoning boundary of dLLMs. We find that dLLMs tend to exploit this order flexibility to bypass high-uncertainty tokens that are crucial for exploration, leading to a premature collapse of the solution space. This observation challenges the premise of existing RL approaches for dLLMs, where considerable complexity, such as handling combinatorial trajectories and intractable likelihoods, is often devoted to preserving this flexibility. We demonstrate that effective reasoning is better elicited by intentionally forgoing arbitrary order and applying standard Group Relative Policy Optimization (GRPO) instead. Our approach, JustGRPO, is minimalist yet surprisingly effective (e.g., 89.1% accuracy on GSM8K) while fully retaining the parallel decoding ability of dLLMs. Project page: https://nzl-thu.github.io/the-flexibility-trap
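For readers unfamiliar with GRPO, the sketch below illustrates the group-relative advantage it is built on: several completions are sampled for the same prompt, each is scored, and every score is normalized against the group's mean and standard deviation. This is a minimal, generic illustration (the example rewards and the function name `group_relative_advantages` are ours, not the paper's), not the authors' implementation.

```python
# Minimal sketch of the group-relative advantage at the core of GRPO.
# Only the advantage normalization is shown; sampling completions from a
# dLLM and scoring them with a verifier are assumed to happen elsewhere.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: rewards for a group of 4 completions sampled for one prompt,
# e.g., 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))
```

Completions with positive advantage are reinforced and those with negative advantage are suppressed; the paper's point is that this standard recipe can be applied to a dLLM directly once arbitrary-order generation is forgone.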
Community
Links
Paper: https://arxiv.org/abs/2601.15165
Project page: https://nzl-thu.github.io/the-flexibility-trap
Code: https://github.com/LeapLabTHU/JustGRPO
Model: https://huggingface.co/nzl-thu/LLaDA-Instruct-JustGRPO
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective (2025)
- dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning (2025)
- Learning Unmasking Policies for Diffusion Language Models (2025)
- d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models (2025)
- CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models (2026)
- From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models (2025)
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models (2025)
Nice find!
It appears that the figures for GSM8K and MATH500 in Figure 3 might have been swapped. Could you please verify this?
Thanks for pointing this out. This is indeed a labeling mismatch in our figure. We will correct it promptly in the upcoming v2 revision on arXiv. We really appreciate your attention to detail!