oguzhanercan's Collections
Architectural Proposals
Byte Latent Transformer: Patches Scale Better Than Tokens (arXiv:2412.09871)
Causal Diffusion Transformers for Generative Modeling (arXiv:2412.12095)
Tensor Product Attention Is All You Need (arXiv:2501.06425)
TransMLA: Multi-head Latent Attention Is All You Need (arXiv:2502.07864)
Transformers without Normalization (arXiv:2503.10622)
LSNet: See Large, Focus Small (arXiv:2503.23135)
DDT: Decoupled Diffusion Transformer (arXiv:2504.05741)
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging (arXiv:2504.08635)
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation (arXiv:2504.09454)
Efficient Generative Model Training via Embedded Representation Warmup (arXiv:2504.10188)
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax (arXiv:2504.20966)
Group Downsampling with Equivariant Anti-aliasing (arXiv:2504.17258)
arXiv:2505.14513
LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer (arXiv:2506.06952)
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression (arXiv:2506.09482)
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets (arXiv:2506.14761)
Energy-Based Transformers are Scalable Learners and Thinkers (arXiv:2507.02092)
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling (arXiv:2507.07955)
Region-based Cluster Discrimination for Visual Representation Learning (arXiv:2507.20025)
PixNerd: Pixel Neural Field Diffusion (arXiv:2507.23268)
Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer (arXiv:2508.14187)
Artificial Hippocampus Networks for Efficient Long-Context Modeling (arXiv:2510.07318)
arXiv:2511.11238