Fault diagnosis is an important part of industrial system health monitoring. Existing data-driven diagnosis methods often rely on balanced datasets for fault modelling. In practical applications, however, industrial systems often produce samples with an imbalanced distribution, which poses challenges to data-driven fault diagnosis. This issue has received extensive attention from the academic and industrial communities, and many results have been achieved. Nevertheless, there have been few reviews of imbalanced data-driven fault diagnosis, which makes it difficult to clarify the real challenges and future research directions. In response to this problem, a comprehensive review of the research progress in data-driven diagnostic methods and diagnostic application scenarios is provided. The challenges and future prospects facing the field are also identified, which could provide a reference for the research and application of fault diagnostics.