Abstract: Single-image super-resolution is a challenging ill-posed problem. Current methods based on convolutional neural networks (CNNs) face performance bottlenecks, while Transformer models, though able to improve performance through global modeling, suffer from high computational complexity that limits their efficiency. We therefore propose a progressive Cross-Learning Network (CLNet) that integrates ultra-dense dilated residual blocks (UD2B) with enhanced Transformer blocks (ETB) in a synergistic progressive architecture. UD2B aggregates high- and low-frequency features through multi-scale dilated convolutions to strengthen local representations, while ETB establishes long-range dependencies via cross-channel self-attention to capture global context. We further introduce a Cross-Feature and Cross-Level Attention Fusion Block (C2AFB) that adaptively fuses multi-level features. Experiments on multiple benchmark datasets demonstrate that CLNet outperforms existing methods in both objective metrics and visual perceptual quality, achieving a favorable balance between performance and efficiency.
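The abstract only names the cross-channel self-attention mechanism in ETB; a minimal sketch of the general idea (attention computed over the channel dimension, so the attention map is C×C and the cost is linear rather than quadratic in spatial size) might look like the following. The function names, plain-NumPy formulation, and scaling choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_channel_attention(x, wq, wk, wv):
    """Self-attention over channels (a sketch, not CLNet's exact ETB).

    x          : (C, N) feature map flattened over spatial positions, N = H*W.
    wq, wk, wv : (C, C) projection matrices (random here; learned in practice).

    The attention map is (C, C), so cost grows linearly with N —
    the usual efficiency argument for channel-wise attention.
    """
    q, k, v = wq @ x, wk @ x, wv @ x                  # each (C, N)
    attn = softmax((q @ k.T) / np.sqrt(x.shape[1]))   # (C, C) channel affinities
    return attn @ v                                   # (C, N) globally mixed features

# Toy usage: 8 channels over a 4x4 spatial grid.
rng = np.random.default_rng(0)
C, N = 8, 16
x = rng.standard_normal((C, N))
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))
y = cross_channel_attention(x, wq, wk, wv)
print(y.shape)  # (8, 16)
```

Because the softmax is taken over channel affinities rather than pixel-pair affinities, every output channel mixes information from all spatial positions at once, which is how this family of designs captures global context cheaply.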