Turkish Journal of Electrical Engineering and Computer Sciences




Segmentation of lung regions is key to the automatic analysis of chest X-ray (CXR) images, which play a vital role in the detection of various pulmonary diseases. Precise identification of lung regions is a basic prerequisite for disease diagnosis and treatment planning. However, precise lung segmentation is challenging because of variations in anatomical shape and size, strong edges at the rib cage and clavicle, and overlapping anatomical structures caused by diverse diseases. Although commonly considered the de facto standard in medical image segmentation, the convolutional UNet architecture and its variants fall short in addressing these challenges, primarily because of their limited ability to model long-range dependencies between image features. Vision transformers equipped with self-attention mechanisms excel at capturing long-range relationships, but for segmentation of high-resolution images they typically adopt either coarse-grained global self-attention or fine-grained local self-attention to alleviate the quadratic computational cost, at the expense of segmentation performance. This paper introduces a focal modulation UNet model (FMN-UNet) that improves segmentation performance by effectively aggregating fine-grained local and coarse-grained global relations at a reasonable computational cost. FMN-UNet first encodes CXR images with a convolutional encoder to suppress background regions and extract latent feature maps at a relatively modest resolution. It then leverages global and local attention mechanisms to model contextual relationships across the images. These contextual feature maps are convolutionally decoded to produce segmentation masks. The segmentation performance of FMN-UNet is compared against state-of-the-art methods on three public CXR datasets (JSRT, Montgomery, and Shenzhen). Experiments on each dataset demonstrate the superior performance of FMN-UNet over the baselines.
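The abstract states that FMN-UNet aggregates fine-grained local and coarse-grained global relations through focal modulation. As a rough illustration only, the sketch below implements a generic focal modulation layer in the style of Focal Modulation Networks (FocalNets) in NumPy: stacked depthwise convolutions with growing receptive fields supply local context, a global average pool supplies image-wide context, and per-pixel gates weight each level before the aggregated context modulates a query projection. This is not the authors' FMN-UNet code; the function names (`focal_modulation`, `init_params`, `depthwise_conv2d`), the kernel sizes (3 and 5), the channel count, and the random weights are all illustrative assumptions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv2d(x, kernels):
    # x: (H, W, C), kernels: (k, k, C); stride 1, zero "same" padding,
    # each channel filtered by its own k x k kernel (cross-correlation)
    k = kernels.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W, :] * kernels[i, j, :]
    return out

def init_params(C, kernel_sizes=(3, 5), seed=0):
    # Hypothetical random weights; one depthwise kernel per focal level
    rng = np.random.default_rng(seed)
    L = len(kernel_sizes)
    s = 1.0 / np.sqrt(C)
    return {
        "W_q": rng.normal(0.0, s, (C, C)),      # query projection
        "W_ctx": rng.normal(0.0, s, (C, C)),    # initial context projection
        "W_g": rng.normal(0.0, s, (C, L + 1)),  # gates: L local levels + 1 global
        "W_h": rng.normal(0.0, s, (C, C)),      # 1x1 "conv" producing the modulator
        "W_out": rng.normal(0.0, s, (C, C)),    # output projection
        "dw_kernels": [rng.normal(0.0, 1.0 / k, (k, k, C)) for k in kernel_sizes],
    }

def focal_modulation(x, params):
    # x: (H, W, C) latent feature map, e.g. the output of a convolutional encoder
    L = len(params["dw_kernels"])
    q = x @ params["W_q"]
    ctx = x @ params["W_ctx"]
    gates = x @ params["W_g"]                     # (H, W, L + 1)
    ctx_all = np.zeros_like(ctx)
    # Hierarchical local context: each level enlarges the receptive field
    for l, kern in enumerate(params["dw_kernels"]):
        ctx = gelu(depthwise_conv2d(ctx, kern))
        ctx_all += ctx * gates[:, :, l:l + 1]
    # Coarse-grained global context from average pooling over all pixels
    ctx_global = gelu(ctx.mean(axis=(0, 1), keepdims=True))  # (1, 1, C)
    ctx_all += ctx_global * gates[:, :, L:L + 1]
    modulator = ctx_all @ params["W_h"]
    # Element-wise modulation of the query, then output projection
    return (q * modulator) @ params["W_out"]

x = np.random.default_rng(1).normal(size=(16, 16, 8))  # 16x16 map, 8 channels
y = focal_modulation(x, init_params(C=8))
print(y.shape)  # (16, 16, 8)
```

Because the last gate multiplies a globally pooled context, perturbing a single corner pixel changes the output at the opposite corner even though the two depthwise convolutions only reach three pixels away. Every operation here scales linearly with the number of pixels, in contrast to the quadratic cost of global self-attention mentioned in the abstract.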


Focal modulation, lung segmentation, chest X-ray, transformer, attention
