Abstract: When high-precision cloud detection is performed on visible-spectrum remote sensing images, the variability of cloud forms and the similarity between cloud areas and earth objects reduce detection accuracy. To address this problem, this paper proposes a weighted multilevel scale fused network (WMSFNet), which can be trained end-to-end without manual intervention. First, sensitivity to cloud form is reduced by learning cloud areas and earth objects in turn. Meanwhile, WMSFNet automatically extracts high-level spatial features through a fully convolutional network, so that clouds and earth objects can be distinguished at the pixel level. A multilevel feature fusion structure is designed to combine semantic information with spatial information from different levels, which enhances the detection of segmentation boundaries. Experiments on several real remote sensing images show that the proposed method reaches a pixel accuracy of 95.39%, outperforming other state-of-the-art semantic segmentation methods. The cloud fraction error is below 1%, which provides a new solution for processing cloud-contaminated remote sensing images.
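The abstract reports two evaluation figures, pixel accuracy and cloud fraction error. The sketch below shows one way these might be computed on binary cloud masks. It rests on assumptions that the abstract does not state. A label of 1 marks cloud and 0 marks earth object or background. Cloud fraction is taken to be the share of cloud pixels. Cloud fraction error is taken to be the absolute difference between the predicted and true fractions. The helper names are hypothetical and are not the authors' code.

```python
from typing import Sequence

# Hypothetical mask type: 2-D binary mask, 1 = cloud, 0 = earth object / background (assumed labelling)
Mask = Sequence[Sequence[int]]


def _flatten(mask: Mask) -> list[int]:
    """Flatten a 2-D mask into a 1-D list of integer labels."""
    return [int(v) for row in mask for v in row]


def pixel_accuracy(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels whose predicted label matches the ground-truth label."""
    p, t = _flatten(pred), _flatten(truth)
    if len(p) != len(t) or not p:
        raise ValueError("masks must be non-empty and the same size")
    return sum(a == b for a, b in zip(p, t)) / len(p)


def cloud_fraction(mask: Mask) -> float:
    """Share of pixels labelled as cloud (assumed definition)."""
    m = _flatten(mask)
    if not m:
        raise ValueError("mask must be non-empty")
    return sum(1 for v in m if v == 1) / len(m)


def cloud_fraction_error(pred: Mask, truth: Mask) -> float:
    """Absolute difference between predicted and true cloud fractions (assumed definition)."""
    return abs(cloud_fraction(pred) - cloud_fraction(truth))


# Toy 2x4 example: two pixels are swapped between cloud and earth object
truth = [[1, 1, 0, 0], [1, 0, 0, 0]]
pred = [[1, 1, 0, 0], [0, 1, 0, 0]]
print(pixel_accuracy(pred, truth))        # 0.75
print(cloud_fraction_error(pred, truth))  # 0.0
```

The toy example shows why the two metrics are complementary. Misplaced cloud pixels lower pixel accuracy. However, when the number of cloud pixels is unchanged, the cloud fraction error stays at zero, because that metric measures only overall cloud coverage.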