Underwater image enhancement (UIE) is crucial for high-level vision in underwater robotics. While convolutional neural networks (CNNs) have achieved significant success in UIE, the locality of convolution makes it difficult to capture the global context. In contrast, transformer-based networks, adept at modeling long-range dependencies, have shown promise in various vision tasks. Nonetheless, directly applying a transformer to UIE faces two critical challenges: 1) it tends to produce results with coarse details because local texture is neglected and 2) the varicolored degraded images require the network to adapt to different underwater environments. In this article, we propose a novel transformer-based network that effectively leverages both global contextual and local detailed information through two key designs, a global–local transformer (GL-Trans) block and a detail-enhanced skip connector (DESC), while remaining computationally efficient. Moreover, by introducing a simple but effective learnable environment adaptor, the proposed network flexibly handles different underwater environments. Extensive experiments demonstrate the superiority of our network over other state-of-the-art (SOTA) methods, both qualitatively and quantitatively.
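
As an illustrative aid only, the following PyTorch sketch shows one plausible realization of the two ideas named in the abstract: a block that fuses a global self-attention branch with a local convolutional branch, and a learnable environment adaptor that modulates features per environment. This is not the authors' implementation; the module names (`GLTransBlock`, `EnvironmentAdaptor`), the depthwise-convolution local branch, the affine modulation, and all dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class GLTransBlock(nn.Module):
    """Hypothetical global-local block: multi-head self-attention supplies
    global context, a depthwise convolution preserves local texture, and a
    pointwise convolution fuses the two branches (illustrative sketch)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local = nn.Sequential(
            # Depthwise conv: cheap, captures local texture the attention misses.
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.GELU(),
        )
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)  # pointwise fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global branch: flatten spatial positions into tokens for attention.
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, HW, C)
        g, _ = self.attn(tokens, tokens, tokens)
        g = g.transpose(1, 2).reshape(b, c, h, w)
        # Local branch operates directly on the feature map.
        l = self.local(x)
        # Concatenate branches, fuse, and add a residual connection.
        return x + self.fuse(torch.cat([g, l], dim=1))


class EnvironmentAdaptor(nn.Module):
    """Hypothetical learnable environment adaptor: per-environment affine
    parameters rescale and shift features so a single network can handle
    varicolored degradations (e.g., greenish vs. bluish water)."""

    def __init__(self, dim: int, num_envs: int = 4):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_envs, dim))
        self.shift = nn.Parameter(torch.zeros(num_envs, dim))

    def forward(self, x: torch.Tensor, env_id: torch.Tensor) -> torch.Tensor:
        # Select the affine parameters for each sample's environment id.
        s = self.scale[env_id].unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        t = self.shift[env_id].unsqueeze(-1).unsqueeze(-1)
        return x * s + t


# Usage: modulate a feature map by environment id, then apply one block.
feat = torch.randn(2, 64, 32, 32)
block, adaptor = GLTransBlock(64), EnvironmentAdaptor(64)
out = block(adaptor(feat, torch.tensor([0, 2])))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```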