Abstract: Active learning is instrumental in training high-performing machine learning models while minimizing labeling costs. Combining the RD and QBC algorithms mitigates the issues that arise from relying on a single selection criterion. However, the K-means clustering on which RD is based may include outliers, degrading model performance, and QBC requires maintaining multiple models while providing sample information only indirectly. To address these issues, we propose an adaptive density clustering-based Gaussian process regression (ADC-GPR) algorithm, which selects samples efficiently by clustering first and then exploiting uncertainty directly. The ADC clustering step is not only robust to outliers but also adapts to the distribution characteristics of the dataset, providing representative sample points and their corresponding clusters for subsequent active learning (AL). The method ensures both representativeness and diversity during unsupervised selection, and accounts for informativeness, representativeness, and diversity during supervised selection. Experimental results show that, for the same number of sampling iterations, ADC-GPR achieves average performance improvements of 37.3%, 8%, and 2.8% over the RS, KS, and RD-GPR algorithms, respectively. Furthermore, ADC-GPR demonstrates higher selection efficiency.
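To make the "cluster first, then use uncertainty directly" workflow concrete, the sketch below shows a minimal cluster-then-uncertainty active learning loop of the kind the abstract describes. It is an illustration under stated assumptions, not the paper's implementation: the ADC clustering procedure is not specified in the abstract, so DBSCAN is used here only as a density-based stand-in, and the representative-point and query heuristics are illustrative choices.

```python
# Minimal sketch of a cluster-then-uncertainty active learning loop.
# NOTE: ADC clustering is not specified in the abstract; DBSCAN serves
# here only as a density-based stand-in, and the selection heuristics
# below are illustrative assumptions, not the paper's method.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_pool = rng.uniform(-3, 3, size=(300, 1))                 # unlabeled pool
f = lambda x: np.sin(2 * x).ravel() + 0.1 * rng.normal(size=len(x))  # toy oracle

# 1) Density-based clustering of the pool (stand-in for ADC).
labels = DBSCAN(eps=0.4, min_samples=5).fit_predict(X_pool)
clusters = [np.where(labels == c)[0] for c in np.unique(labels) if c != -1]

# 2) Unsupervised initialization: one representative per cluster
#    (here: the point closest to the cluster mean).
init_idx = []
for idx in clusters:
    center = X_pool[idx].mean(axis=0)
    init_idx.append(idx[np.argmin(np.linalg.norm(X_pool[idx] - center, axis=1))])
labeled = list(init_idx)
y_labeled = f(X_pool[labeled])

# 3) Supervised selection: fit a GPR and repeatedly query the
#    unlabeled point with the largest predictive uncertainty.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
for _ in range(10):
    gpr.fit(X_pool[labeled], y_labeled)
    candidates = np.setdiff1d(np.arange(len(X_pool)), labeled)
    _, std = gpr.predict(X_pool[candidates], return_std=True)
    new = candidates[np.argmax(std)]                       # most uncertain sample
    labeled.append(new)
    y_labeled = np.append(y_labeled, f(X_pool[[new]]))
```

In this sketch the clustering stage supplies a representative, diverse initial design without labels, while the GPR's predictive standard deviation supplies the informativeness signal during supervised selection, mirroring the division of roles stated in the abstract.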