A review of research on intelligent fault detection of power equipment based on infrared and voiceprint: methods, applications and challenges☆

doi:10.1016/j.gloei.2025.08.001

Xizhou Du^a, Xing Lei^a, Ting Ye^a, Yingzhou Sun^b,c,d, Zewen Shang^b,c,d, Zhiqiang Liu^b,c,d, Tianyi Xu^b,c,d,*

(a State Grid Shanghai Municipal Electric Power Company, Shanghai 200437, P.R. China; b College of Intelligence and Computing, Tianjin University, Tianjin 300350, P.R. China; c Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, P.R. China; d Tianjin Key Laboratory of Advanced Networking, Tianjin 300350, P.R. China)

Keywords

Power equipment fault detection; Infrared image; Voiceprint data; Deep learning; Traditional image processing; Voiceprint detection

Abstract

As modern power systems grow in complexity, accurate and efficient fault detection has become increasingly important. While many existing reviews focus on a single modality, this paper presents a comprehensive survey from a dual-modality perspective, covering infrared imaging and voiceprint analysis, two complementary non-contact techniques that capture different fault characteristics. Infrared imaging excels at detecting thermal anomalies, while voiceprint signals provide insight into mechanical vibrations and internal discharge phenomena. We review both traditional signal processing and deep learning-based approaches for each modality, categorized by key processing stages such as feature extraction and classification. The paper highlights how these modalities address distinct fault types and how they may be fused to improve robustness and accuracy. Representative datasets are summarized, and practical challenges such as noise interference, limited fault samples, and deployment constraints are discussed. By offering a cross-modal, comparative analysis, this work aims to bridge fragmented research and guide future development in intelligent fault detection systems. The review concludes with research trends including multimodal fusion, lightweight models, and self-supervised learning.

0 Introduction

1) Background and Significance

With the rapid development of the social economy, the power system has become an essential infrastructure for modern society. The operational status of power equipment directly affects the safety, reliability, and economy of the power supply [1]. As the scale and complexity of power systems continue to expand, problems such as power outages and equipment damage caused by equipment failures have become increasingly prominent [2]. Equipment failures can lead to large-scale power outages, economic losses, and even personal safety issues. Particularly in modern power grids, where there is a wide variety of power equipment, complex technologies, and diverse, hidden fault types, efficient and accurate fault detection and diagnosis of power equipment has become a hot research topic [3,4].

Traditional fault detection methods for power equipment mainly rely on manual inspections and monitoring based on electrical parameters [5]. For instance, changes in electrical parameters such as voltage, current, and power can serve as early warning signals for faults [6]. However, these methods often have certain limitations. They rely on periodic manual inspections and the assumption of normal equipment operation, which may overlook potential faults in equipment [5]. Furthermore, although electrical parameter monitoring can detect certain faults, it is difficult to obtain details about the external characteristics and internal faults of the equipment [7]. Therefore, traditional methods face significant challenges in terms of detection efficiency, accuracy, and real-time performance [8].

With the rapid development of machine learning and deep learning technologies, fault detection techniques that combine sensors, image processing, voiceprint analysis, and other methods have gradually emerged [9]. Infrared imaging and voiceprint analysis, two typical non-contact monitoring methods, have become among the most commonly used detection techniques. Infrared imaging captures temperature changes on the surface of equipment, enabling real-time monitoring of the thermal state of power equipment and thereby identifying potential issues such as overheating and localized faults [10]. Voiceprint technology, on the other hand, analyzes the acoustic signals generated by the equipment during operation to assess its operational status [11,12]. Both methods offer high detection accuracy and sensitivity, providing rich diagnostic information without interfering with the normal operation of the equipment. By applying deep learning methods to the collected infrared image data and voiceprint data of the power system, automated fault monitoring, localization, and diagnosis of power equipment can be achieved [13]. While recent advances such as knowledge-graph-assisted models and GNN-based methods have shown great potential in power system fault diagnosis [14], this review focuses specifically on two representative non-contact modalities (infrared imaging and voiceprint analysis) due to their complementary characteristics, practical deployment feasibility, and the relative maturity of the research landscape.

2) Research Status and Challenges

Early fault detection of power equipment mainly relied on changes in electrical parameters, monitoring physical quantities such as current and voltage to determine the fault status of the equipment. However, this method required manual detection, which was inefficient and lacked accuracy in fault localization [15]. With the development of image processing and voiceprint analysis technologies, methods based on infrared image processing and voiceprint analysis of power equipment have been widely applied [16-19].

Traditional image processing methods mainly rely on rule-based algorithms. These methods often depend on manually set rules and features, and can detect abnormal conditions on the equipment surface to a certain extent [20]. However, traditional image processing methods are less robust to external environmental factors such as image quality, noise, and lighting changes. Moreover, in complex equipment backgrounds, the effectiveness of image segmentation and feature extraction is limited, leading to bottlenecks in fault detection accuracy and a reduction in detection speed due to the complexity of the algorithms [21].

Traditional voiceprint analysis techniques have also been applied to power equipment fault diagnosis. These methods primarily rely on time-domain and frequency-domain feature extraction. By performing time-frequency transformations on the acoustic signals generated during equipment operation, these methods extract voiceprint features to identify fault types [22]. However, traditional voiceprint analysis methods generally struggle with processing high-dimensional, diverse acoustic signal data, rely on manually designed features, and have poor noise suppression capabilities. This limits their ability to adapt to complex fault patterns and significantly reduces fault detection accuracy in challenging environments [23].

With the widespread application of deep learning techniques, fault detection methods for power equipment based on deep neural networks have made significant progress in several areas [24]. Compared to traditional image processing and voiceprint analysis methods, deep learning methods can automatically learn the features of equipment from a large volume of collected data, enabling more effective fault feature extraction. As a result, deep learning-based fault detection methods offer greater robustness and adaptability [25].

However, despite the advantages of deep learning methods in power equipment fault detection, several challenges remain. First, deep learning methods typically require large amounts of labeled data for training. In practice, data collection for specific fault types is often difficult, and manual labeling is costly, leading to a scarcity of fault samples for power equipment. Second, environmental interference, such as lighting changes and background noise, can distort signals, affecting the model's generalization ability [26,27]. Third, most existing methods analyze either infrared images or voiceprint data alone, and the mechanisms for aligning and fusing multi-source data are still underdeveloped; multi-modal fault detection models for power equipment remain in the exploratory stage [28]. Finally, to meet the demands of real-time detection and flexible deployment of detection equipment, achieving real-time processing and model lightweighting for deep learning-based fault detection methods is also an urgent problem to solve [29].

3) Contributions and Structure

This paper provides a systematic review and summary of fault detection methods for power equipment based on two representative non-contact sensing modalities: infrared imaging and voiceprint analysis. These two approaches are inherently complementary: infrared imaging excels at capturing surface temperature anomalies indicative of thermal faults, while voiceprint analysis offers insight into internal conditions such as mechanical vibration and partial discharge. By jointly analyzing both modalities, this review provides a more comprehensive and multidimensional perspective on intelligent fault detection. The paper covers the evolution from traditional image and signal processing techniques to deep learning-based methods, highlighting differences in methodology, detection performance, and practical applicability. The review also compares their application scenarios, summarizes publicly available datasets, and discusses key technical challenges including noise interference, sample scarcity, and generalization limitations. The structure of the paper is arranged as follows: Section 1 provides an overview of power equipment fault types and corresponding detection principles; Section 2 introduces infrared-based detection technologies, focusing on traditional image processing and deep learning methods; Section 3 reviews voiceprint-based detection methods, including feature extraction and acoustic signal analysis; Section 4 summarizes relevant infrared and voiceprint datasets; and Section 6 outlines current limitations and suggests future research directions.

1 Overview of fault detection in power equipment

1.1 Representative fault types in power equipment

Among the various faults in power equipment, short circuits, open circuits, and partial discharge are the most common and hazardous. This section presents these fault types to deepen understanding of their characteristics and to illustrate their relevance for infrared and voiceprint-based detection.

1.1.1 Short circuit faults

Short circuits occur when conductors with different potentials make unintended contact, disrupting circuit operation [30]. A key feature is a sudden surge in current far beyond rated capacity, which can cause equipment overheating, insulation damage, or even explosions [31].

Common causes include improper wiring and progressive insulation degradation due to aging, mechanical damage, or moisture ingress, which can compromise dielectric integrity and result in short circuit faults [32-34]. Single line-to-ground (SLG) faults are particularly prevalent [33]. In addition, aging of components like transformer windings or circuit breaker contacts can degrade insulation, increasing fault risk [35]. Manufacturing defects in materials or dimensions may also contribute [36]. The intense heat from fault currents accelerates insulation aging and can ignite internal components, threatening system stability [37,38].

1.1.2 Open circuit faults

An open circuit occurs when a break in the current path causes current to drop or vanish, leading to equipment failure and system disruption [30,39].

Such faults often result from mechanical stress, corrosion, or wear in cables and wires, which can rupture insulation or break conductors [40]. Loose connections in joints (e.g., poorly tightened screws or insufficient contact area) can raise resistance and generate heat, degrading insulation and causing failure [41]. Cold solder joints or missed soldering can also produce weak connections prone to faults [42].

1.1.3 Partial discharge faults

Partial discharge refers to a phenomenon in the insulation system of power equipment in which the electric field strength in a localized area exceeds the breakdown strength of the insulating material, causing discharge. The discharge energy is relatively small, however, and typically does not immediately result in the breakdown of the insulation [30]. Partial discharge faults have the following characteristics: the discharge occurs in a relatively concentrated area, usually at weak points of the equipment insulation; the discharge signals are random and intermittent, making them difficult to capture and monitor; and during the discharge process, various physical phenomena such as electromagnetic waves and ultrasound are generated, which can serve as diagnostic indicators for partial discharge faults [43].

One of the main causes of partial discharge faults is the presence of defects in the insulation materials of power equipment. These defects may include bubbles, impurities, cracks, and other irregularities that disrupt the uniformity and continuity of the insulation, leading to an increase in local electric field strength [44]. For example, in the insulation layer of a cable, the presence of bubbles can create partial discharge points under the influence of the electric field [45].

Uneven voltage distribution within equipment can also lead to partial discharge faults. In transformers, a poorly designed winding structure or insufficient insulation distance can cause localized electric field intensification, triggering partial discharge [46]. Environmental factors such as elevated temperature, humidity, or transient overvoltages further degrade insulation and increase discharge risk [47,48].

Partial discharge can damage insulation materials, leading to a decline in their insulating properties. Long-term partial discharge can alter the molecular structure of the insulation material, gradually reducing its insulating performance and ultimately leading to insulation breakdown [49].

1.2 Classification of power equipment fault detection technologies

The detection technologies for power equipment can be classified into non-invasive and invasive techniques. Non-invasive detection technologies, such as infrared detection and voiceprint detection, obtain the equipment's status information through the collection and analysis of external signals without damaging the equipment's structure [50]. Invasive detection techniques, on the other hand, require partial disassembly of the equipment or direct contact with its internal components to gather more detailed information, such as electrical parameter measurements and partial discharge detection [51].

Based on the type of sensor data used, the detection techniques can be further divided into infrared detection and voiceprint detection. Infrared detection technology is based on the principle of thermal radiation. Power equipment generates heat during operation, and its temperature distribution and variations reflect the operating status and fault characteristics of the equipment. By applying image processing and analysis techniques, the temperature field of the equipment can be visualized and quantitatively analyzed [52]. Infrared detection allows for remote detection of equipment without interfering with its normal operation, thereby enhancing safety and efficiency. It is widely used for temperature monitoring of power equipment, such as transformers, circuit breakers, and cables. By regularly performing infrared detection and analyzing the temperature distribution and trends on the equipment's surface, it is possible to assess whether the equipment has internal faults [53].

This paper introduces infrared-based fault detection from two perspectives, Traditional Image Processing Methods and Deep Learning Methods, as shown in Fig. 1. Traditional Image Processing Methods improve image quality through denoising, data augmentation and similar techniques, and employ expert knowledge to monitor fault regions. These approaches offer high interpretability and impose relatively low demands on sample size and computational resources; however, they suffer from limited accuracy and poor robustness. Conversely, deep learning models exhibit powerful representation capabilities that enable them to discover latent features and patterns within large volumes of infrared image data. A variety of deep learning architectures have been applied to the infrared inspection of electrical equipment, substantially improving detection accuracy and operational efficiency. Additionally, denoising and data augmentation techniques developed in the traditional framework can be integrated into the data preprocessing pipelines of deep learning models. Nonetheless, deep learning approaches require extensive training samples and significant computational power, making them impractical for resource-constrained environments.

Voiceprint detection technology collects sound signals generated by the operation of power equipment through sound sensors installed on the equipment [54]. These sound signals contain information about the equipment's operating status and fault characteristics. Different types of faults generate sound signals with varying frequencies and intensities [55]. By extracting and analyzing the features of these sound signals, fault diagnosis for the equipment can be performed.

The present study introduces voiceprint-based fault detection from two perspectives, Voiceprint Feature Extraction and Machine Learning and Deep Learning Methods, as shown in Fig. 1. Owing to the high redundancy and complexity of raw voiceprint signals, suitable feature extraction techniques can distill critical characteristics and reduce computational complexity. The extracted features are then input into machine learning or deep learning models to achieve precise identification, classification, and diagnosis of electrical equipment faults.

2 Infrared-based fault detection for power equipment

Infrared images contain rich temperature and texture information, which is of great value for assessing the status of power equipment. Infrared detection technology, with its advantages of non-contact, long-distance measurement and the ability to reflect the thermal state of equipment, plays an important role in power equipment inspection. Early infrared detection primarily relied on traditional image processing methods, where fault information was extracted through manually designed features. In recent years, with the continuous development and breakthroughs in infrared detector technology, image processing techniques, and deep learning algorithms, infrared-based fault detection for power equipment has made significant progress. It is now capable of more accurately detecting fault types and locations in power equipment, providing strong support for the safe and stable operation of power systems. This section reviews the evolution of infrared detection technology from traditional methods to deep learning, and analyzes its core methods and practical applications.

2.1 Infrared detection based on traditional image processing methods

Fig. 1. A comprehensive taxonomy of intelligent fault detection, categorized according to methodologies.

Traditional image processing methods, due to their strong interpretability and low computational resource requirements, played an important role in early fault detection for power equipment and still have practical value in specific scenarios. Traditional methods generally follow the process of image preprocessing, feature extraction, and target recognition, as shown in Fig. 2. First, infrared image quality is optimized through techniques such as denoising and enhancement. Subsequently, fault region saliency information is extracted based on manually designed features (e.g., texture, gradient, temperature distribution). Finally, fault localization and classification are completed through thresholding, edge detection, or pattern matching algorithms. However, these methods rely heavily on expert-designed feature rules, lack robustness in complex environments (such as lighting variations and background interference), and are difficult to adapt to the diverse fault patterns of power equipment. This section systematically reviews the core technological components of traditional image processing methods and discusses their limitations in practical applications.

2.1.1 Image denoising

Infrared images of power equipment are typically captured in complex environments and may be affected by environmental noise, sensor noise, and transmission interference. These influences can cause the acquired infrared images to be blurry or lose details, resulting in inaccurate display of the temperature and infrared features of the fault region. This directly impacts the accuracy of subsequent feature extraction, fault recognition, and diagnostic localization. Therefore, image denoising is a crucial step in infrared image processing and detection. Denoising can remove unnecessary interference while preserving key temperature anomaly information, making the detection and localization of power equipment faults more accurate.

Fig. 2. Schematic diagram of the infrared detection process based on traditional image processing methods.

Zhang et al. [56] applied bilateral filtering to the frame-difference images of drone-captured infrared video, suppressing noise while preserving structural details. Liu et al. [58] converted infrared frames into the frequency domain via orthogonal wavelet transform, processed low- and high-frequency subbands with homomorphic and low-pass filters respectively, and then reconstructed the denoised image by inverse transform. Yuan et al. [59] employed an adaptive median filter on insulator-string imagery to maintain contrast and edge sharpness during noise removal. Li et al. [61] cropped redundant regions and then used adaptive Wiener filtering, which estimates local means and variances, to eliminate Gaussian noise without sacrificing high-frequency fault features.
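To make the role of the local mean and variance in adaptive Wiener filtering concrete, the following is a minimal sketch of the textbook pixel-wise form of the filter, not the implementation used in [61]. The function names, the window radius, and the default noise estimate (average local variance) are assumptions made for illustration; the code uses only the Python standard library so that it runs without an image-processing package.

```python
from statistics import fmean


def local_stats(img, r, c, radius):
    """Mean and variance of the window centred at (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    vals = [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    mean = fmean(vals)
    var = fmean((v - mean) ** 2 for v in vals)
    return mean, var


def adaptive_wiener(img, radius=1, noise_var=None):
    """Pixel-wise adaptive Wiener filter on a 2-D list of gray levels.

    Hypothetical helper for illustration. If noise_var is None, the noise
    power is assumed to equal the average of all local variances.
    """
    rows, cols = len(img), len(img[0])
    stats = [[local_stats(img, r, c, radius) for c in range(cols)]
             for r in range(rows)]
    if noise_var is None:
        noise_var = fmean(var for row in stats for _, var in row)
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            mean, var = stats[r][c]
            # Flat area (var <= noise): gain 0, output the local mean.
            # Strong local structure (var >> noise): gain near 1, keep detail.
            gain = max(var - noise_var, 0.0) / var if var > 0 else 0.0
            out_row.append(mean + gain * (img[r][c] - mean))
        out.append(out_row)
    return out
```

The gain term is what lets the filter smooth homogeneous background strongly while leaving high-variance regions, such as hot-spot boundaries, largely untouched.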

Liang et al. [62] modified system state and noise-transition matrices to jointly mitigate Gaussian, gamma, and salt-and-pepper noise in circuit-fault infrared images, enhancing overall clarity. Wu et al. [63] optimized threshold selection for multiple regions of interest using a bat-inspired algorithm with a two-dimensional entropy fitness function, iteratively improving noise removal. Liu et al. [65] applied bilateral filtering to insulator infrared images, effectively removing environmental and sensor noise while preserving edges and local details. Li et al. [66] implemented an optimized Savitzky-Golay filter to smooth in situ infrared spectra of ablation gases, boosting signal-to-noise ratios and facilitating accurate quantitative analysis.
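Bilateral filtering, used in [56] and [65], preserves edges because each neighbor is weighted by both its spatial distance and its gray-level difference from the center pixel. The sketch below shows this standard weighting under assumed parameter names (`sigma_space`, `sigma_range`) and a small square window; it is an illustration of the general technique rather than the configuration reported in either study.

```python
import math


def bilateral_filter(img, radius=1, sigma_space=1.0, sigma_range=10.0):
    """Brute-force bilateral filter on a 2-D list of gray levels.

    Illustrative helper: each output pixel is a weighted mean of its window,
    with weight = spatial Gaussian * intensity (range) Gaussian.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            centre = img[r][c]
            weight_sum = 0.0
            value_sum = 0.0
            for i in range(max(0, r - radius), min(rows, r + radius + 1)):
                for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                    dist2 = (i - r) ** 2 + (j - c) ** 2
                    diff2 = (img[i][j] - centre) ** 2
                    w = math.exp(-dist2 / (2 * sigma_space ** 2)
                                 - diff2 / (2 * sigma_range ** 2))
                    weight_sum += w
                    value_sum += w * img[i][j]
            # weight_sum >= 1 because the centre pixel always has weight 1.
            out[r][c] = value_sum / weight_sum
    return out
```

With a small `sigma_range`, pixels across a sharp temperature boundary receive almost no weight, so the boundary survives; with a very large `sigma_range`, the filter degenerates into an ordinary Gaussian blur.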

In power equipment fault detection algorithms based on traditional image processing methods, bilateral filtering, wavelet transform, adaptive filtering, and Wiener filtering are primarily used to remove infrared image noise caused by environmental, sensor, or transmission factors. These methods protect image details and edge information while denoising, effectively improving image quality. However, most existing methods rely on preset parameters, prior knowledge, and specific noise models, making it difficult to handle mixed noise in real-world scenarios, and they lack adaptability to complex noise environments. Additionally, the filtering process may remove weak fault features along with the noise, discarding detailed information from the image [62]. Therefore, although denoising methods can effectively improve image quality, they still face issues in practical applications, such as significant changes in noise environments and insufficient denoising accuracy.

2.1.2 Image enhancement

In infrared monitoring of power equipment, fault areas are often accompanied by temperature anomalies. However, due to low contrast and minimal temperature distribution differences, fault regions can be confused with complex backgrounds. To effectively identify these fault areas, image enhancement becomes a crucial processing step. Image enhancement techniques, such as improving image contrast or brightness, or applying pseudo-color mapping, make temperature anomaly areas more prominent, helping detection systems more easily identify potential faults.

Zhang et al. [56] enhanced drone-captured infrared video by first applying plateau histogram equalization to expand the gray-level dynamic range and then using Otsu's method to segment foreground and background, thereby accentuating fault regions. Jin et al. [57] similarly applied Otsu's thresholding to the I-component image, without prior knowledge, to isolate and extract insulator discs, enhancing their clarity and prominence. Liu et al. [58] decomposed the image via wavelet transform, processed high- and low-frequency subbands with homomorphic and low-pass filters respectively, reconstructed via inverse transform, and applied fuzzy contrast sharpening for final enhancement. Yuan et al. [59] proposed a double-threshold adaptive algorithm: pixels below or above two data-driven thresholds undergo linear compression, while intermediate values receive linear stretching, resulting in balanced contrast and clearer feature delineation.
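The double-threshold idea can be illustrated with a piecewise-linear gray-level mapping. The sketch below is a simplified reading of that scheme and not the algorithm of [59]: the thresholds are passed in directly rather than derived from the data, and the output breakpoints `out_low` and `out_high` are hypothetical parameters chosen so that the outer bands are compressed and the middle band is stretched.

```python
def piecewise_stretch(pixel, t_low, t_high, out_low=32, out_high=224, max_level=255):
    """Map one gray level through a three-segment linear transform.

    [0, t_low)         -> [0, out_low)          (compressed)
    [t_low, t_high]    -> [out_low, out_high]   (stretched)
    (t_high, max_level]-> (out_high, max_level] (compressed)
    All parameter names and defaults are illustrative assumptions.
    """
    if not 0 < t_low < t_high < max_level:
        raise ValueError("thresholds must satisfy 0 < t_low < t_high < max_level")
    if pixel < t_low:
        return pixel * out_low / t_low
    if pixel > t_high:
        return out_high + (pixel - t_high) * (max_level - out_high) / (max_level - t_high)
    return out_low + (pixel - t_low) * (out_high - out_low) / (t_high - t_low)


def enhance(img, t_low, t_high):
    """Apply the mapping to every pixel of a 2-D list of integer gray levels."""
    return [[round(piecewise_stretch(p, t_low, t_high)) for p in row] for row in img]
```

For example, with thresholds 100 and 150 the 50 gray levels in the middle band are spread over 192 output levels, while the 100 darkest levels are squeezed into 32.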

Li et al. [61] applied histogram equalization to infrared video of porcelain-column circuit breakers, expanding gray-value dynamics and boosting contrast. Zhang et al. [64] used Top-Hat transforms with size-tuned structural elements to correct uneven brightness in substation equipment images, extracting and emphasizing regional features for improved segmentation. Gao et al. [67] introduced maximum probability extremes into an enhanced histogram equalization scheme, greatly widening gray-level range and contrast, then combined inter-frame differencing for background modeling to suppress ghosting and sharpen the distinction between background and targets.
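Histogram equalization and its plateau variant can be sketched in a few lines. In the version below, which is a generic illustration rather than the scheme of any cited study, the `plateau` argument is an assumed parameter that clips each histogram bin before the cumulative distribution is built, so that a dominant background level does not consume most of the output range. Setting `plateau=None` gives ordinary histogram equalization.

```python
def plateau_equalize(img, plateau=None, levels=256):
    """Histogram equalization with optional bin clipping (plateau).

    img is a 2-D list of integer gray levels in [0, levels).
    Returns a new 2-D list with remapped gray levels.
    """
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    if plateau is not None:
        # Limit the influence of heavily populated (background) bins.
        hist = [min(h, plateau) for h in hist]
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Only one gray level present: nothing to equalize.
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[p] for p in row] for row in img]
```

On an image where one background level holds most of the pixels, clipping the histogram leaves more output gray levels for the sparsely populated levels, which is where small temperature differences between targets tend to sit.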

In cases where the fault region is hard to distinguish from the complex background, image enhancement techniques such as contrast and brightness improvement and pseudo-color mapping help make temperature anomaly areas more prominent, enabling the detection system to more effectively identify potential faults. The most common methods for image enhancement include the Otsu method, fuzzy contrast algorithm, double-threshold adaptive enhancement algorithm, and histogram equalization, with structural element methods and improved histogram equalization applied in specific scenarios. Image enhancement optimizes the quality of infrared images, improving the recognizability of target regions and enhancing detection algorithms' ability to diagnose power equipment faults. However, many existing methods suffer from over-enhancement. The combination of Otsu thresholding and histogram equalization may introduce artifacts in enhanced samples [56], affecting detection accuracy. Some methods may also lose temperature resolution during image enhancement, making micro temperature anomalies difficult to detect and thus reducing the accuracy and reliability of fault detection [67]. Additionally, the HSI space conversion method may cause color distortion during image enhancement, especially in high-humidity scenarios (above 30%), leading to chroma shifts. In specific scenarios, image enhancement methods may fail to adapt to dynamic environmental changes, which poses challenges for their application to power equipment with complex backgrounds.

2.1.3 Statistical feature construction

In infrared images, potential fault areas often exhibit specific characteristics, such as temperature anomalies, texture changes, and equipment cracks. Feature extraction algorithms aim to effectively capture these key pieces of information, identifying the essential attributes of fault regions from preprocessed infrared images. Traditional feature extraction methods typically rely on manually designed features, such as texture and edge features, to identify the fault regions.

Zhang et al. [56] combined image stitching and frame differencing on drone-acquired infrared video by matching SURF features across adjacent frames, aligning overlapping areas via a neural network matcher, and subtracting backgrounds to isolate transmission line details. Jin et al. [57] applied the Fisher criterion for feature selection to retain only classification-relevant attributes, then used KPCA's nonlinear mapping to extract discriminative principal components and reduce vector dimensionality for faster classification. Liang et al. [62] exploited RGB channel transformations and piecewise-linear grayscale functions alongside adaptive thresholds to delineate foreground conductors from background sky, thereby highlighting fault regions. Wu et al. [63] enhanced the Bat Algorithm (modulating search velocity, pulse frequency, and position updates by linking loudness to frequency) to balance exploration and exploitation during key feature extraction. Liu et al. [65] employed the random frog leaping algorithm to select thirty informative hyperspectral wavelengths, then derived four thermal descriptors (max, min, mean, variance) from insulator images, which correlated reliably with contamination levels.
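Two of the ideas above lend themselves to a short illustration: the four thermal descriptors (max, min, mean, variance) and the Fisher criterion as a per-feature ranking score. The sketch below uses the common two-class form of the Fisher score, squared difference of class means divided by the sum of class variances; the function names and the healthy/faulty grouping are assumptions for illustration and are not taken from [57] or [65].

```python
from statistics import fmean, pvariance


def thermal_descriptors(region):
    """Max, min, mean and population variance of a 2-D temperature region."""
    vals = [t for row in region for t in row]
    return {"max": max(vals), "min": min(vals),
            "mean": fmean(vals), "variance": pvariance(vals)}


def fisher_score(class_a, class_b):
    """Two-class Fisher criterion for one scalar feature."""
    mu_a, mu_b = fmean(class_a), fmean(class_b)
    within = pvariance(class_a) + pvariance(class_b)
    if within == 0:
        # Zero within-class scatter: perfectly separable unless the means coincide.
        return float("inf") if mu_a != mu_b else 0.0
    return (mu_a - mu_b) ** 2 / within


def rank_features(healthy, faulty):
    """Sort descriptor names by Fisher score, most discriminative first.

    healthy and faulty are lists of descriptor dicts (hypothetical grouping).
    """
    names = list(healthy[0])
    scores = {n: fisher_score([d[n] for d in healthy], [d[n] for d in faulty])
              for n in names}
    return sorted(names, key=scores.get, reverse=True)
```

A feature whose class means are far apart relative to its within-class spread scores high and is kept; one whose distributions overlap scores near zero and can be dropped before a dimensionality-reduction step such as KPCA.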

Feature extraction from infrared images of power equipment allows for the identification of key information representing fault areas. Traditional infrared detection methods for power equipment have successfully identified and located fault regions in infrared images through manually designed features such as texture and edge features. Furthermore, techniques like image stitching, frame differencing, Kernel Principal Component Analysis (KPCA), and the Bat Algorithm have enabled the extraction of more refined image features. Methods such as RGB color channel transformation and grayscale linear transformation have also provided new solutions for fault region segmentation and feature extraction. However, these methods often rely on manually selected and designed features, and they show lower robustness when dealing with complex fault types and environmental changes. The extracted fault features are therefore limited and incomplete, which creates a bottleneck in detection accuracy [62]. Most methods suffer from insufficient dynamic adaptability, and changes in lighting conditions can reduce feature stability [61]. Moreover, traditional feature extraction methods struggle with cross-device generalization, as the accuracy of fault detection drops significantly when the same feature set is applied across different device types.

2.1.4 Fault detection, classification, and segmentation

Traditional image recognition and segmentation aim to localize targets in the enhanced infrared images and assign them semantic classes. They form the final and most critical step of the power equipment fault detection task: after feature extraction from the preprocessed infrared image data, object localization and segmentation are performed. Infrared images of power equipment often suffer from complex backgrounds and irregular fault target shapes. After preprocessing and feature extraction, the fault points become easier to recognize and locate than in the original images. However, infrared detection algorithms still need to distinguish between power equipment and complex backgrounds.

Zhang et al. [56] applied a two-dimensional Fourier transform to the binarized image after frame difference. By calculating the power spectrum and transforming it into polar coordinates, they determined the main direction of the transmission line. Based on the characteristics of the transmission line, a parallelogram window was set and translated. Using the foreground information, they located the transmission line area, and then employed the Otsu method for binarization and segmentation of the frame-differenced image to complete the localization and segmentation of the transmission line area. Jin et al. [57] analyzed the H, S, and I components of the HSI color space and found that the I component provides the clearest outline of the insulator disk. Therefore, they applied the Otsu method to segment the infrared image and extract the insulator disk area. Zhang et al. [60] also used the Otsu method to maximize the inter-class distance between the image object and background, adaptively selecting the optimal threshold based on the global grayscale histogram of the image, achieving segmentation of hot spots in the infrared image, and subsequently extracting defective areas. Liu et al. [65] employed the Otsu-based threshold segmentation method, dynamically selecting the statistically optimal threshold to separate the insulator region from the background, thereby completing the segmentation of the infrared image.
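Because several of the works above rely on the Otsu method, a minimal NumPy sketch of global Otsu thresholding may help make the idea concrete. It assumes an 8-bit single-channel image; the function names `otsu_threshold` and `segment_hot_regions` are illustrative and are not taken from any of the cited papers.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    Assumes `gray` is an unsigned 8-bit image (values 0..255).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))      # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                  # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

def segment_hot_regions(gray: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels brighter than the Otsu threshold."""
    return gray > otsu_threshold(gray)
```

In a thermal image where brightness tracks temperature, `segment_hot_regions` would return the candidate hot-spot mask; the cited methods add further steps (frame difference, HSI conversion, optimization of the threshold) that are not shown here.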

Liu et al. [58] segmented infrared images by defining color intervals in RGB space: pixels where both red and green exceeded blue within specified bounds were retained, while others were set to black, enabling the localization of target regions. Yuan et al. [59] applied Sobel edge detection to the binary mask, then used least squares ellipse fitting on six randomly sampled edge points to extract insulator disks by validating major/minor axes and orientation. Li et al. [61] detected leakage areas via an enhanced Canny-based Surendra algorithm, achieving rapid contour extraction and precise localization. Liang et al. [62] modeled edge trends with conjugate gradient descriptors and enhanced line-background contrast with histogram techniques; fault point coordinates were identified by analyzing gradient changes. Wu et al. [63] built a two-dimensional histogram and applied maximum entropy and entropy discrimination functions to determine optimal segmentation thresholds. Finally, Zhang et al. [64] mapped local variance (computed from neighborhood mean and variance) into a two-dimensional histogram, then used maximum entropy in combination with a genetic algorithm to partition target and background regions. Collectively, these approaches leverage color rules, edge fitting, gradient analysis, entropy maximization, and evolutionary search to automate region segmentation and fault localization in infrared imagery.
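The RGB color-interval rule described for Liu et al. [58] can be sketched as follows. This is only an illustration of the general idea: the margin bounds `min_margin` and `max_margin` and the function name `rgb_rule_mask` are hypothetical, since the bounds actually used in [58] are not given here.

```python
import numpy as np

def rgb_rule_mask(img: np.ndarray, min_margin: int = 20, max_margin: int = 255):
    """Keep pixels whose R and G channels both exceed B by a bounded margin.

    img: H x W x 3 uint8 array in RGB order.
    min_margin, max_margin: assumed bounds on (R - B) and (G - B).
    Returns the image with rejected pixels set to black, and the boolean mask.
    """
    rgb = img.astype(np.int16)                 # avoid uint8 wrap-around on subtraction
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    keep = (
        (r - b >= min_margin) & (r - b <= max_margin)
        & (g - b >= min_margin) & (g - b <= max_margin)
    )
    out = img.copy()
    out[~keep] = 0                             # all other pixels become black
    return out, keep
```

A rule of this kind is cheap and interpretable, but its thresholds depend on the pseudo-color palette of the thermal camera, which is one reason such hand-tuned rules generalize poorly across devices.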

Infrared image recognition and segmentation enable the identification and localization of fault areas in infrared images of power equipment after denoising, enhancement, and infrared fault feature extraction. Table 1 summarizes traditional methods in power equipment infrared detection. Traditional image segmentation methods rely on algorithms such as the Otsu method, Sobel operator, and Canny edge detection. Most infrared segmentation and localization methods for power equipment improve fault point localization and area segmentation accuracy by optimizing these algorithms and combining different feature extraction and localization techniques. By adopting advanced methods such as dynamic threshold segmentation, the maximum entropy method, and the conjugate gradient method, the fault areas in infrared images of power equipment can be accurately identified and localized, providing crucial support for subsequent fault diagnosis. However, traditional image recognition and segmentation methods suffer from increased false detection rates under complex background interference, leading to significant performance degradation [59]. When the object shapes are irregular, the distance errors in traditional detection methods are large, causing detection deviations. Moreover, traditional recognition and segmentation methods have difficulty meeting the stringent real-time requirements of online monitoring. Therefore, more flexible and efficient technologies are needed to address these issues.

In summary, image denoising methods primarily enhance the clarity of infrared images by suppressing environmental and sensor noise. These methods are advantageous in their ability to remove noise while preserving image details and edge information. However, they rely on predefined parameters and specific noise models, making them less adaptable to complex mixed noise conditions. Additionally, they may fail to retain weak fault features. Such methods are more suitable for scenarios with relatively simple noise types and high demands on detail preservation, such as the preprocessing of infrared images of insulators and transmission lines.

Image enhancement methods aim to improve image contrast, brightness, or highlight regions with abnormal temperatures, thereby making fault areas more discernible. These methods effectively increase the contrast between target areas and the background, enhancing the saliency of fault features. Nonetheless, some techniques may introduce artifacts or degrade temperature resolution due to over-enhancement. They are better suited for scenarios where fault regions exhibit low contrast with the background and subtle temperature differences, such as in the infrared image processing of substation equipment and circuit breakers.

Statistical feature construction methods focus on extracting key features that characterize faults from preprocessed images. Manually designed features can capture specific attributes of faults and reduce the complexity of subsequent classification tasks to some extent. However, these methods heavily rely on manual design and exhibit low robustness to complex faults and environmental variations, often resulting in incomplete feature representation. They are applicable in relatively simple scenarios with clearly defined fault characteristics, such as feature extraction for thermal defects in transmission lines and contamination on insulators.

Traditional methods for fault detection, classification, and segmentation can leverage the results of prior processing steps to produce final fault judgments. Some approaches incorporate optimization algorithms to improve localization accuracy. However, these methods tend to have high false detection rates under complex background interference and exhibit large deviations when detecting irregularly shaped targets. Their real-time performance is also limited. As such, they are more appropriate for scenarios with relatively simple backgrounds and regularly shaped fault regions, such as hotspot detection in transformers and zero-voltage faults in insulators.

Traditional image processing methods have achieved certain results in infrared fault detection for power equipment, particularly in image denoising, enhancement, feature extraction, and recognition and segmentation. However, as power equipment fault types become more complex and environmental conditions change, the limitations of traditional methods are becoming more evident, especially in handling complex backgrounds, noise interference, and diverse fault shapes. To address these issues, the introduction of deep learning technology in recent years has provided new opportunities for infrared image detection. Section 2.2 covers deep learning-based infrared detection methods such as YOLO, CenterNet, SSD, and CNN, which, through more flexible and powerful feature learning capabilities, overcome some of the bottlenecks in traditional methods and provide more accurate technical support for power equipment fault diagnosis.

Table 1 Typical applications of traditional methods in infrared detection of power equipment.


Reference | Detection Target | Image Denoising | Image Enhancement | Feature Extraction | Recognition/Segmentation
[56] Bilateral Filter Image Stitching, Frame Difference + SURF [57] Insulator / Otsu’s Method Fisher Criterion + Kernel Principal Component Analysis Transmission Line Otsu’s Method + Platform Histogram Equalization Fourier Transform + Otsu’s Method HSI Color Space Analysis + Otsu’s Method Contact Network [58] Wavelet Transform / Adaptive Median Filter Fuzzy Contrast Algorithm RGB Component Judgement Method [59] Insulator / Sobel Edge Detection + Least Squares Method Dual-threshold Adaptive Enhancement [60] / Hough Transform Scale Invariant Feature Transform (SIFT) Transmission Line + Insulator Otsu’s Method [61] / Surendra Algorithm + Canny Edge Detection Porcelain SF6 Breaker Leak Area Adaptive Wiener Filter Histogram Equalization Transmission Line [62] State Transition Matrix + Noise Transition Matrix Histogram Enhancement RGB Color Channel Conversion + Piecewise Linear Grayscale Transformation Conjugate Gradient Method [63] / Optimized Bat Algorithm 2D Maximum Entropy Method Distribution Network Equipment Bat-Inspired Algorithm [64] / Top-Hat Transform / Maximum Entropy Method + Genetic Algorithm [65] Insulator Bilateral Filter / Random Frog Jump Algorithm Otsu’s Method [66] XLPE Cable Savitzky-Golay Filter / / Adaptive Extended Gaussian Peak Derivative Re-weighted Least Squares Method GIS Equipment Substation Equipment [67] / / ViBe Algorithm Histogram Equalization

2.2 Infrared detection methods based on deep learning

In recent years, the rapid development of deep learning technology has brought new opportunities to infrared detection. Deep learning models possess powerful learning capabilities, enabling them to uncover latent features and patterns of equipment from large amounts of infrared image data. Various deep learning models have been applied to infrared detection of electrical equipment, significantly improving the accuracy and efficiency of detection. This section introduces in detail the application of deep learning-based infrared detection methods in electrical equipment inspection.

2.2.1 YOLO-based infrared detection method

The YOLO (You Only Look Once) series of algorithms [91] is a family of single-stage object detectors based on convolutional neural networks. It transforms the object detection problem into a regression problem, where the neural network directly regresses the object’s class and location information at the output layer. The YOLO algorithm divides the image into multiple grid cells, predicting the location and class information of the object within each cell. By learning from a large amount of labeled data, the model can automatically learn the features of different objects at various positions and scales, enabling accurate object detection. The YOLO algorithm offers advantages such as fast detection speed, strong generalization ability, and good performance in detecting small objects, making it suitable for real-time scenarios. Fig. 3 shows a schematic diagram of the YOLOv8 network structure.


Fig. 3. Schematic diagram of the YOLOv8 network structure [92].

Assuming the input image is $I$, the YOLO algorithm divides the image into an $S \times S$ grid. Each grid cell predicts $B$ bounding boxes, and each bounding box contains $5 + C$ parameters, where the 5 parameters include the center coordinates of the target box $(x, y)$, the width $w$, the height $h$, and the confidence score $p$, and $C$ is the number of categories. For each grid cell, the YOLO output for a bounding box is shown in Eq. (1):

\[ \text{Output} = (x,\ y,\ w,\ h,\ p,\ p_1,\ p_2,\ \ldots,\ p_C) \tag{1} \]

where $p$ is the confidence score, representing the probability that the predicted bounding box contains an object; $x, y, w, h$ are the location parameters of the target bounding box; and $p_1, p_2, \ldots, p_C$ are the probability distributions for each category. Ultimately, YOLO determines the target objects in the image by maximizing the confidence score of the predicted bounding box and the probabilities of each category.
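To make the per-cell output layout concrete, the following sketch decodes a prediction tensor of shape (S, S, B, 5 + C) into a list of detections. It is a simplified illustration, not the decoding used by any specific YOLO version: it assumes that x and y are offsets inside the cell in [0, 1], that w and h are already normalized to the image size, and that the final score is the product of the confidence and the best class probability. The function name `decode_yolo_grid` and the threshold `conf_thresh` are hypothetical.

```python
import numpy as np

def decode_yolo_grid(pred: np.ndarray, conf_thresh: float = 0.5):
    """Decode a (S, S, B, 5 + C) tensor laid out as (x, y, w, h, p, p_1..p_C).

    Returns a list of (cx, cy, w, h, class_index, score) in normalized image units.
    """
    S, _, B, _ = pred.shape
    detections = []
    for i in range(S):              # grid row
        for j in range(S):          # grid column
            for b in range(B):
                x, y, w, h, p = pred[i, j, b, :5]
                class_probs = pred[i, j, b, 5:]
                cls = int(np.argmax(class_probs))
                score = float(p * class_probs[cls])
                if score >= conf_thresh:
                    cx = float((j + x) / S)   # cell offset -> image coordinate
                    cy = float((i + y) / S)
                    detections.append((cx, cy, float(w), float(h), cls, score))
    return detections
```

Real detectors follow this step with non-maximum suppression to remove duplicate boxes, which is omitted here for brevity.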

However, the YOLO algorithm was primarily designed for visible-light natural images, and infrared images of electrical equipment differ from these in several ways. The backgrounds of infrared images of electrical equipment are often complex, with frequent occlusion of and interference with targets. Additionally, infrared thermal images may have regions where temperature is highly concentrated, resulting in low image contrast. The external contours of the equipment are also relatively similar, making it harder to distinguish between them compared to objects in natural images. Moreover, the equipment tends to be of medium or large size, which may lead to issues such as missed detections, false positives, or repeated detections in the YOLO model.

To improve the applicability and accuracy of YOLO for infrared detection of electrical equipment, Zheng et al. [68] modified the structure of YOLOv3 by introducing the cross-stage partial (CSP) module into the backbone network DarkNet53 and integrating the path aggregation network into the feature pyramid structure. Additionally, the CIoU (complete intersection over union) loss function was introduced, enhancing the model’s ability to detect infrared images of electrical equipment. Although the number of model parameters increased, the model achieved a detection accuracy of over 92% on a dataset of four types of electrical equipment with similar ripple structures. Zhu et al. [69] addressed issues such as complex backgrounds and low contrast in substation equipment infrared image recognition by improving the YOLOv3 model. They employed a multi-scale Retinex algorithm with chromaticity preservation to enhance image details and suppress background interference, and optimized the prior box size using the K-means++ clustering algorithm to adapt to equipment shape features. A transfer learning strategy was also used to alleviate the challenge of training with small sample data. The improved model still achieved a detection accuracy of over 95% with a small number of samples.
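For readers unfamiliar with the CIoU loss mentioned above, the sketch below follows its commonly used definition: one minus IoU, plus a normalized center-distance term, plus an aspect-ratio consistency term. It is a plain-Python illustration for a single pair of boxes in (x1, y1, x2, y2) format, not the implementation used in [68]; the function name `ciou_loss` and the small constant `eps` are assumptions added for numerical safety.

```python
import math

def ciou_loss(box, gt, eps: float = 1e-9) -> float:
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v for two axis-aligned boxes."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    iou = inter / (w * h + gw * gh - inter + eps)

    # Squared distance between box centers
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike plain IoU, the loss stays informative when the boxes do not overlap, because the center-distance term still decreases as the prediction moves toward the ground truth.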

To enhance multi-target recognition and efficiency in infrared fault detection, several studies have refined YOLO-based architectures with tailored preprocessing and network modifications. Chen et al. [70] standardized input sizes to 416 × 416 pixels and employed the Yolomark tool for precise labeling of four equipment classes and heating regions; by reducing downsampling from 8× to 2×, integrating additional residual modules, slimming DBL blocks, and reweighting loss terms, they accelerated convergence and achieved a 0.83% accuracy gain alongside improved overlap estimation. Addressing diverse pseudo-coloring and complex backgrounds, Wu et al. [71] augmented insulator images via random rotations and color-space perturbations, used k-means to define anchor boxes, and retrained a YOLOv3 framework on the enriched dataset. For photovoltaic module hotspots, Wang et al. [72] applied gamma correction for data augmentation, embedded an attention module to suppress irrelevant features, and upgraded the YOLOv4-tiny feature pyramid with a path aggregation network and lightweight 1 × 1 convolutions, yielding an AP50 of 98.42% at high speed. Li et al. [73] enhanced YOLOv4 by incorporating SE channel attention and substituting EIoU and Focal Loss to address class imbalance, boosting average precision by 5.61% and improving accuracy on challenging samples by 8.57%. Finally, Zhang et al. [74] tailored YOLOv7 with SimAM attention, an efficient decoupled head, and skew correction, extracting central-region temperature features to diagnose transformer bushing defects with 95.50% accuracy, 97.14% recall, and 98.30% AP.
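Anchor-box clustering of the kind used in [69] and [71] is typically run on the (width, height) pairs of the labeled boxes, with IoU rather than Euclidean distance as the similarity measure. The sketch below is a simplified version under stated assumptions: it uses a deterministic initialization spread over box areas (rather than K-means++ seeding) and updates each anchor with the cluster mean. The names `wh_iou` and `kmeans_anchors` are illustrative.

```python
import numpy as np

def wh_iou(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between (w, h) pairs, treating all boxes as sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k: int, iters: int = 100) -> np.ndarray:
    """Cluster labeled box sizes into k anchors, returned sorted by area."""
    boxes = np.asarray(boxes, dtype=np.float64)
    # Assumed initialization: pick boxes evenly spaced in area rank
    order = np.argsort(boxes[:, 0] * boxes[:, 1], kind="stable")
    anchors = boxes[order[np.linspace(0, len(boxes) - 1, k).astype(int)]].copy()
    for _ in range(iters):
        assign = np.argmax(wh_iou(boxes, anchors), axis=1)   # highest IoU wins
        new_anchors = anchors.copy()
        for c in range(k):
            members = boxes[assign == c]
            if len(members):
                new_anchors[c] = members.mean(axis=0)
        if np.allclose(new_anchors, anchors):
            break
        anchors = new_anchors
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

Fitting anchors to the dataset in this way matters for infrared imagery because power equipment such as insulator strings and bushings has elongated shapes that differ from the default anchors tuned on natural images.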

Yang et al. [75] leveraged terahertz time-domain spectroscopy imaging and an improved YOLOv8 augmented with a triple attention mechanism to suppress background noise, achieving 99.8% accuracy and 99.5% AP50 in detecting internal defects of cross-linked polyethylene cable joints. Zeng et al. [76] adapted YOLOX for transformer bushing inspection by integrating Mish activation, an SPPF module, and ECA channel attention, enabling voltage-level localization, weighted mean grayscale processing, and region segmentation; this yielded 98.41% recognition accuracy and faster defect diagnosis. Zhou et al. [77] constructed a lightweight D-Mobilenet backbone for YOLOv7, incorporating dilated depthwise separable convolutions to expand the receptive field, and applied SJS and Mosaic augmentations, resulting in a 94.1% mAP and superior robustness, generalization, and efficiency compared with YOLOv4 through YOLOv7.

In summary, YOLO series algorithms, with their single-stage architecture, demonstrate advantages in infrared detection of electrical equipment, including fast detection speed, strong generalization, and good performance in small target detection. To address challenges like complex backgrounds and low contrast in infrared images, improvements in model structure, loss functions, and data preprocessing have effectively enhanced detection accuracy, providing technical support for equipment safety.

However, YOLO has limitations in adapting to low-contrast regions caused by uneven temperature distribution, is susceptible to background thermal noise, and may be confused by similar equipment contours. Its relatively complex network structure also leads to higher computational complexity.

2.2.2 Infrared detection method based on CenterNet

CenterNet [93] is a center-point-based object detection algorithm, which primarily focuses on regressing the central point of a target to complete the object detection task. Unlike traditional bounding-box-based detection methods, CenterNet locates the central point of the target through keypoint regression and uses this information to predict the size and other attributes of the target. CenterNet demonstrates strong robustness, particularly excelling in dense object detection and small object recognition. Fig. 4 shows a schematic diagram of the CenterNet architecture.

Given an input image $I$, CenterNet first performs convolution operations on the image to obtain a feature map $F$ with dimensions $H \times W \times C$. Then, the algorithm predicts, through regression, whether each pixel is the center of a target and computes the size of that target. For each position $(i, j)$ in the feature map, CenterNet predicts three key pieces of information: the confidence of the target center point $p_{ij}$, the target’s width and height $w_{ij}, h_{ij}$, and the class probability $c_{ij}$.
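The center-point decoding step described above can be illustrated with a small NumPy sketch: a location is kept as an object center if it is a local maximum in its 3 × 3 neighborhood and exceeds a score threshold, and the size is then read from the size map at that location. This is a simplified illustration under assumed tensor layouts (a per-class heatmap of shape H × W × C and a size map of shape H × W × 2); the function name `decode_centers` and the threshold value are hypothetical.

```python
import numpy as np

def decode_centers(heatmap: np.ndarray, size_map: np.ndarray, score_thresh: float = 0.3):
    """Extract center points from a CenterNet-style heatmap.

    heatmap:  (H, W, C) float array of center confidences per class.
    size_map: (H, W, 2) float array holding predicted (w, h) at each location.
    Returns a list of (row, col, class_index, score, w, h).
    """
    H, W, _ = heatmap.shape
    padded = np.pad(heatmap, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    # Maximum over each 3x3 spatial neighborhood (equivalent to 3x3 max pooling)
    neigh_max = np.max(
        [padded[di:di + H, dj:dj + W] for di in range(3) for dj in range(3)], axis=0
    )
    is_peak = (heatmap >= neigh_max) & (heatmap >= score_thresh)
    results = []
    for i, j, c in zip(*np.nonzero(is_peak)):
        w, h = size_map[i, j]
        results.append((int(i), int(j), int(c), float(heatmap[i, j, c]), float(w), float(h)))
    return results
```

Because peaks are found directly on the heatmap, no anchor boxes or non-maximum suppression over boxes are needed, which is part of what makes this family of detectors attractive for dense scenes.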


Fig. 4. Schematic diagram of the CenterNet network architecture [93].

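The loss function of CenterNet consists of multiple components, including center point regression loss, size regression loss, and classification loss, as shown in (2):

\[ L = \mathrm{MSE}\left(p_{ij}, p_{ij}^{*}\right) + \mathrm{MSE}\left((w_{ij}, h_{ij}), (w_{ij}^{*}, h_{ij}^{*})\right) + \mathrm{CE}\left(c_{ij}, c_{ij}^{*}\right) \tag{2} \]

Here, MSE represents the Mean Squared Error loss, CE represents the Cross-Entropy loss, and starred quantities denote the corresponding ground-truth values.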

Due to challenges such as similar shapes of power equipment, complex backgrounds, and occlusion from abnormally hot areas in infrared images, CenterNet’s center-point detection method provides a novel approach. Through an improved feature extraction network and training strategy, CenterNet has achieved promising results in some infrared fault detection tasks for power equipment.

Huang et al. [78] combined CenterNet with a structured localization knowledge base to build a defect monitoring model: after augmenting infrared samples via random cropping and flipping, they trained CenterNet, then used the knowledge base to pinpoint small components and fused temperature data with standards, effectively suppressing interference and boosting monitoring intelligence and accuracy. Liu et al. [79] tackled visible-light inspection distortions by developing Rot-CenterNet, which adds a rotation angle detection head and an IoU-L1 loss to reduce misses; by supporting HRNet, EfficientNet, or ResNet backbones and incorporating DCN-ASPP and D-SKN modules for adaptive receptive fields, they achieved a 5.95% average accuracy improvement over vanilla CenterNet.

CenterNet localizes targets via center-point regression, which suits small or densely packed power equipment in infrared images. The improvements above raise accuracy, but the method remains sensitive to center-point shifts, and its reliance on manually designed rules limits generalization and hurts real-time performance.

2.2.3 Infrared detection method based on SSD

SSD [94] (Single Shot MultiBox Detector) is an object detection method based on convolutional neural networks (CNN). Although it belongs to the CNN family, it is discussed separately here to highlight its unique detection structure and facilitate comparison with other one-stage detectors such as YOLO and CenterNet. It extracts multi-scale feature maps through multiple convolutional layers and detects targets of different sizes on each layer. Smaller objects are detected using lower-layer feature maps, while larger objects are detected using higher-layer feature maps. SSD generates a set of candidate boxes in each convolutional layer and predicts the object’s category and location based on the IoU (Intersection over Union) value between each candidate box and the ground truth. Fig. 5 shows a schematic diagram of the SSD network structure.

For an input image $I$, the SSD algorithm extracts multi-scale feature maps through a convolutional neural network. On each feature map, SSD predicts a set of bounding box coordinates and category probabilities at each position. For each bounding box, SSD outputs four coordinates $(x, y, h, w)$ and the probability distribution $p_1, p_2, \ldots, p_C$ for each category.
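The IoU-based assignment between candidate (default) boxes and ground truth can be sketched as below. The strategy shown, in which every default box with IoU above a threshold becomes a positive and each ground-truth box additionally keeps its single best default box, follows the matching commonly described for SSD; the threshold of 0.5 and the function names `iou_matrix` and `match_default_boxes` are assumptions for illustration.

```python
import numpy as np

def iou_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2) format."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def match_default_boxes(defaults: np.ndarray, gt_boxes: np.ndarray, iou_thresh: float = 0.5):
    """Return, for each default box, the matched ground-truth index or -1 (background)."""
    iou = iou_matrix(defaults, gt_boxes)
    best_gt = iou.argmax(axis=1)
    matches = np.where(iou.max(axis=1) >= iou_thresh, best_gt, -1)
    # Each ground-truth box keeps its best default box even below the threshold
    matches[iou.argmax(axis=0)] = np.arange(gt_boxes.shape[0])
    return matches
```

Default boxes marked -1 contribute only to the classification loss as background, while matched boxes contribute to both the classification and localization terms.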

The loss function consists of classification loss and localization loss, defined as shown in (3):

\[ L = \sum_{i} L_{cls}\left(p_i, p_i^{*}\right) + \sum_{i} L_{loc}\left(l_i, l_i^{*}\right) \tag{3} \]

where $p_i$ is the predicted class for box $i$, $p_i^{*}$ is the ground truth class, $l_i$ is the predicted bounding box coordinates, and $l_i^{*}$ is the ground truth bounding box.

Bie et al. [80] addressed the dependence of transformer bushing oil level detection on temperature information by modifying the SSD loss function to include a center loss and utilizing the SLIC algorithm for oil level detection without temperature dependence. The improved SSD algorithm achieved an accuracy 13% higher than YOLOv3 and a 9.5% improvement over the original SSD algorithm. The relative error in oil level recognition based on infrared images was 0.08%.

Wang et al. [81] replaced the SSD backbone network VGG with the lighter MobileNets for deployment on intelligent inspection devices. By modifying the prediction localization and classification, default box generation and matching, and the model training loss function, they achieved an accuracy of 71.54%.

SSD detects multi-scale power equipment via multilayer feature maps, and the improved variants offer better accuracy and flexibility and are suitable for smart devices. However, mismatched anchor boxes, low-resolution feature maps, and the trade-off between speed and semantic richness remain issues.

2.2.4 Infrared detection method based on CNN


Fig. 5. Schematic diagram of the SSD network structure framework [94].

Convolutional Neural Networks (CNN) are one of the most widely used models in deep learning, especially excelling in image processing and feature extraction tasks. Most modern deep learning methods are constructed with the convolutional kernel as one of the basic building blocks, and most target detection and infrared detection models are also based on CNN. In addition to target detection-specific networks like YOLO, CenterNet, and SSD, CNN-based models have also led to the development of target detection networks such as RCNN [95], Fast-RCNN [96], and Mask-RCNN [97], which are applied in infrared detection tasks for power equipment. Furthermore, some methods that combine CNN with other network structures and deep learning algorithms have also been used in infrared detection tasks for power equipment. Fig. 6 and Fig. 7 show the structural diagrams of Fast-RCNN and Mask-RCNN, respectively.

For the input image $I$, the CNN first extracts the local features of the image through convolution operations, as shown in (4):

\[ F = W * I + b \tag{4} \]

Here, $W$ is the convolution kernel, $*$ represents the convolution operation, and $b$ is the bias. Through multiple convolution and pooling layers, the CNN gradually extracts high-level features. Finally, the category and location information are output through the fully connected layer.
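As a concrete instance of the convolution step, the sketch below applies one kernel and a bias to a single-channel image with no padding and stride 1. Like most deep learning frameworks, it computes cross-correlation (the kernel is not flipped); the function name `conv2d_valid` is illustrative.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Slide `kernel` over `image` (stride 1, no padding) and add `bias`."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the window under the kernel, plus the bias term
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out
```

For example, a 3 × 3 kernel of ones applied to a 4 × 4 image yields a 2 × 2 map of neighborhood sums; a learned kernel instead responds to patterns such as edges or localized hot spots.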

Faster R-CNN is a classic and widely used object detection algorithm. It generates candidate bounding boxes through the Region Proposal Network (RPN) and uses Fast R-CNN for object classification and location regression. The loss function of Faster R-CNN consists of two parts, classification loss and regression loss, as shown in (5):

\[ L = \sum_{i} L_{cls}\left(p_i, p_i^{*}\right) + \sum_{i} L_{reg}\left(t_i, t_i^{*}\right) \tag{5} \]

Here, $L_{cls}$ is the classification loss, representing the prediction loss of the object category; $L_{reg}$ is the bounding box regression loss, representing the error between the predicted box and the ground-truth box; $p_i$ and $t_i$ are the predicted category probability and bounding box parameters, respectively; and $p_i^{*}$ and $t_i^{*}$ are the ground-truth category label and the ground-truth bounding box.

Fig. 6. Schematic diagram of the network structure framework of Fast-RCNN [96].

Fig. 7. Schematic diagram of the network structure framework of Mask-RCNN [97].

Mask R-CNN is an extension of Faster R-CNN. In addition to object detection, it can also generate pixel-level segmentation masks for each object. It is a typical instance segmentation algorithm. The loss function of Mask R-CNN includes classification loss, bounding box regression loss, and the mask loss $L_{mask}$, for which binary cross-entropy is usually used, as shown in (6):

\[ L_{mask} = -\sum_{i}\sum_{j}\left[ m_{ij}\log \hat{m}_{ij} + \left(1 - m_{ij}\right)\log\left(1 - \hat{m}_{ij}\right) \right] \tag{6} \]

Here, $\hat{m}_{ij}$ represents the predicted mask value of the $j$-th pixel of the $i$-th object, and $m_{ij}$ is the ground-truth mask.
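A minimal NumPy sketch of a binary cross-entropy mask loss is given below. It averages over all pixels, which is an assumption made here for scale-independence (a plain sum differs only by a constant factor), and it clips predictions away from 0 and 1 to keep the logarithm finite. The function name `mask_bce_loss` and the constant `eps` are illustrative.

```python
import numpy as np

def mask_bce_loss(pred, target, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted mask probabilities and a binary mask.

    pred:   array of predicted per-pixel probabilities in [0, 1].
    target: array of the same shape with ground-truth values 0 or 1.
    """
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1 - eps)
    target = np.asarray(target, dtype=np.float64)
    per_pixel = target * np.log(pred) + (1 - target) * np.log(1 - pred)
    return float(-np.mean(per_pixel))
```

An uninformative prediction of 0.5 everywhere gives a loss of ln 2, while a mask that matches the ground truth drives the loss toward zero.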

Liu et al. [82] created a manually annotated infrared dataset of various heating faults and applied Faster R-CNN, alternately training its RPN and Fast R-CNN modules with shared convolutional layers. At a confidence threshold of 0.4, the model effectively suppressed interference and accurately located abnormal heating points, advancing intelligent fault localization in power equipment. Li et al. [83] enhanced the R3Det model for rotated object detection of porcelain insulators and used Faster R-CNN to detect regions such as bushings and current transformers. By grouping similar equipment and comparing temperature differences against a threshold, they achieved a 90.65% mAP, 81.39% defect type accuracy, and a 9.62% false alarm rate.

Chen et al. [84] developed a MobileNet-based classifier for power equipment infrared images, leveraging ImageNet pretraining and extensive data augmentation to mitigate overfitting. They calibrated pixel gray levels to actual temperatures via color bar fitting, enabling automatic hotspot extraction and fault diagnosis, and achieved 95.7% classification accuracy with a 0.29% temperature estimation error. Xu et al. [85] introduced an improved R-FCN for high-voltage lead joint faults, enhancing residual blocks for multi-scale feature fusion and employing OpenCV for result refinement; this raised average precision to 80.76%, an 8.43% gain over the baseline. Wang et al. [86] replaced CNN fully connected layers with global average pooling and swapped Softmax for a nonlinear SVM, reducing parameters and boosting photovoltaic module fault recognition accuracy to 82.85%, a 4.3% improvement. Guo et al. [89] combined a multi-scale convolutional network with a channel-wise attention mechanism, yielding 95.9% prediction accuracy, 15% above traditional methods and 5% over other deep learning approaches.
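The idea of calibrating gray levels to temperature from the color bar, as reported for Chen et al. [84], can be illustrated with a simple fit. The sketch below assumes a linear relation between gray level and temperature and uses hypothetical color-bar samples; the actual fitting procedure in [84] may differ. The function names `fit_gray_to_temperature` and `hotspot` are illustrative.

```python
import numpy as np

def fit_gray_to_temperature(bar_gray, bar_temp):
    """Least-squares line mapping color-bar gray levels to temperatures (assumed linear)."""
    slope, intercept = np.polyfit(bar_gray, bar_temp, deg=1)
    return float(slope), float(intercept)

def hotspot(gray_image: np.ndarray, slope: float, intercept: float):
    """Convert a gray image to a temperature map and locate its hottest pixel."""
    temp = slope * gray_image.astype(np.float64) + intercept
    idx = np.unravel_index(np.argmax(temp), temp.shape)
    return temp, (int(idx[0]), int(idx[1])), float(temp[idx])
```

With color-bar end points of, say, gray 0 at 20 °C and gray 255 at 71 °C (hypothetical values), every pixel can be mapped to an estimated temperature, and the maximum gives a candidate hot spot for subsequent diagnosis.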

Wang et al. [90] accelerated Mask R-CNN by integrating MobileNetv3 as the backbone and addressed class imbalance through data augmentation and GHM Loss, achieving an 89.72% fault diagnosis rate, a 6.78% false alarm rate, and 216 ms per frame. Yu et al. [87] combined infrared image feature extraction (temperature, texture, shape) with SMOTE-augmented sampling and a graph-based semi-supervised network to reach 84.5% accuracy while enhancing both classification and generalization. Li et al. [88] designed an R-UNet-GANR-UNet hybrid for precise insulator string segmentation, refined positioning via Hough transforms, extracted HSV values, and employed a sequence similarity distance measure to reliably diagnose low- and zero-value insulators with sensitivity over 1.2.

CNN-based infrared detection extracts fault features from complex images, with optimized networks and techniques like attention mechanisms achieving good results and easing class imbalance. However, two-stage methods are less efficient, and instance segmentation models need precise annotations, while power equipment fault samples are scarce.

In summary, YOLO-based infrared detectors use a single-stage grid regression framework to achieve high speed and strong generalization, particularly for small objects and real-time monitoring, but struggle with complex backgrounds, low contrast, and similar contours; they are best suited to rapid localization in substations, photovoltaic arrays, and cable joints. CenterNet-based methods regress keypoints at object centers to deliver robust performance in dense and small-object settings; however, they are sensitive to heat diffusion or occlusion errors and depend on manually designed knowledge bases, suiting applications such as fine-scale defect inspection in densely packed electrical components. SSD-based approaches exploit multi-scale feature maps to address variable object sizes and, when enhanced and paired with lightweight backbones, offer flexible, accurate detection for coexisting equipment scales; nonetheless, their predefined anchors and low-resolution base maps can limit detail, making them most appropriate for mixed-scale scenarios like transformer bushings and general power system elements.

As summarized in Table 2, deep learning has substantially advanced infrared power-equipment inspection: end-to-end models like YOLO and CenterNet automatically extract thermal features, SSD with feature pyramids handles scale variation, and lightweight backbones such as MobileNet enable embedded deployment. Yet challenges remain: limited labeled data, class imbalance, domain shifts, and background noise still cause misses and false alarms, and lightweight designs often trade accuracy for speed. Future work should explore self-supervised and few-shot methods, multi-modal fusion of acoustic, visible, infrared, and partial discharge data, dynamic online adaptation, and edge intelligence optimization to balance real-time performance with robust detection.

Table 2 Typical applications of deep learning algorithms in infrared detection of power equipment.


| Reference | Detection Target | Detection Method | Fault Type | Dataset/Data Collection | Resolution |
|---|---|---|---|---|---|
| [68] | Power Equipment | YOLOv3 | Overheating Fault of Power Equipment | Data collected by a power grid company from energized equipment in substations, augmented with 4323 images, including 635 arresters, 2470 current transformers, 6084 insulators, and 1291 circuit breakers | 608 × 608 |
| [69] | Substation Equipment | Retinex + YOLOv3 | Overheating Fault of Substation Equipment | Infrared images of arrester, breaker, and CT devices from multiple substations in Guangzhou, Guangdong, from 2012 to 2017, totaling 509 images | 640 × 640 |
| [70] | Power Equipment | YOLOv3 | Overheating Fault of Power Equipment | 2936 infrared images of typical electrical equipment (including suspended insulators, isolator contacts, bushings, and fittings) collected by maintenance departments | 416 × 416 |
| [71] | Insulator | YOLOv3 | Insulator Fault | 2073 infrared images collected on-site | 256 × 256 |
| [72] | Photovoltaic Modules | YOLOv4-tiny | Hotspot in Photovoltaic Modules | Infrared images of real and simulated hotspots from a large photovoltaic power plant in western China, with a total of 750 images | 416 × 416 |
| [73] | High Voltage Bushing | YOLOv4 + Channel Attention | Overheating Fault of High Voltage Bushing | 1200 infrared images collected by infrared thermography in substations | 608 × 608 |
| [74] | Transformer Bushing | YOLOv7 + SimAM Attention Mechanism | Overheating Defect in Transformer Bushing | Public dataset with 720 infrared inspection images of transformer bushings | 640 × 640 |
| [75] | Cable Composite Insulation Structure | YOLOv8 | Internal Defects of Cable Composite Insulation Structure | 3-layer composite insulation samples with defects premanufactured in a laboratory, imaged with terahertz imaging | / |
| [76] | Transformer Bushing | YOLOX + Efficient Channel Attention | Overheating Defect in Transformer Bushing | 600 infrared images of transformer bushings collected on-site at a substation and from the internet | 640 × 640 |
| [77] | Ceramic Insulator Pillar | YOLOv7 + Transfer Learning | Overheating Fault of Ceramic Insulator Pillar | 450 infrared images of ceramic insulators from a substation | 640 × 640 |
| [78] | Power Equipment | CenterNet + Structured Localization | Overheating Fault of Power Equipment | 1525 infrared images of busbars, circuit breakers, and other power equipment collected by an inspection robot paired with an infrared thermography camera | 640 × 480 |
| [79] | Transmission Line Equipment | Rot-CenterNet + DCN-ASSP + DSKN | Transmission Line Faults | TransLine-2020 transmission line dataset, with 12,000 augmented images; DOTA public dataset cropped into 10,530 images | 512 × 512 |
| [80] | Transformer Bushing | Improved SSD | Oil Leakage in Transformer Bushing | 200 infrared images of oil-paper insulated bushings collected from a substation, augmented to 1000 images | 640 × 480 |
| [81] | Power Equipment | Improved SSD | Overheating Fault of Power Equipment | 867 infrared images of typical faults, including transformers, current transformers, arresters, etc. from State Grid operation and inspection departments | 480 × 480 |
| [82] | Power Transmission Equipment | Faster RCNN | Overheating Fault of Power Transmission Equipment | 1270 infrared images from the power grid company’s infrared fault image library | / |
| [83] | Substation Equipment | R3Det + Faster RCNN | Voltage-induced Heating Defect in Substation Equipment | 5260 infrared images of three-phase devices collected by a provincial network company | 360 × 480 |
| [84] | Substation Equipment | MobileNet + Transfer Learning | Overheating Fault of Substation Equipment | 500 infrared images of main transformers, bushings, arresters, high-voltage switchgear, and GIS from Shenzhen Electric Power Supply Bureau | 320 × 240 |
| [85] | High Voltage Lead Connector | ResNet + Improved RFCN | Overheating Defect in High Voltage Lead Connector | 10,000 infrared images from a substation’s high-voltage lead connector, self-built dataset | / |
| [86] | Photovoltaic Modules | CNN-SVM | Multi-class Faults in Photovoltaic Modules | InfraredSolarModules dataset with 20,000 infrared images of 12 fault types, collected by Raptor Maps using drones with infrared cameras | 40 × 24 |

Table 2 (continued)


Reference | Detection Method | Fault Type | Detection Target | Dataset/Data Collection | Resolution
[87] | SMOTE + Semi-supervised Learning | Overheating Fault of Substation Equipment | Substation Equipment | 154 labeled and 217 unlabeled infrared images of substation insulators collected by a power company | /
[88] | GANR-UNet | Low/Zero Fault in Ceramic Insulator | Ceramic Insulator | Self-built dataset, 335 infrared images obtained by data augmentation of insulator images | /
[89] | CNN + Attention Mechanism | Partial Discharge Defect in EPR Cable Termination | EPR Cable Termination | Self-built dataset, 8000 PRPD spectrogram images for four defect types (circumferential scratches, metal particles, semiconductive layer spikes, and air gaps) | 256 × 256
[90] | Improved Mask R-CNN | Overheating Fault in Transformer Insulation Bushing | Transformer Insulation Bushing | 1256 normal and 86 faulty infrared images of transformer insulation bushings provided by a provincial electric power company | /

Table 3 Typical applications of feature extraction methods and deep learning algorithms in voiceprint detection of power equipment.

Reference | Voiceprint Feature Extraction | Core Model and Algorithm | Research Object | Fault Type | Data Collection
[99] | MFCC | Introduced attention mechanism to improve algorithm recognition performance; feature recognition based on deep neural networks. | Gas Insulated Switchgear (GIS) | Energy storage motor voltage anomaly; drive motor voltage anomaly; transmission mechanism jam. | 48 kHz sampling, each fault state repeated 3 times.
[101] | MFCC + Local Linear Embedding Algorithm | Using 3D convolutional neural networks to recognize faults. | 10 kV Transformer | Loose core 40%, 80%, 100% | 50 kHz sampling.
[102] | MFCC | Deep Belief Network for feature extraction; SVDD for voiceprint feature and defect type association analysis. | Power Transformer | Core looseness; winding looseness; winding deformation; core grounding; partial discharge. | Training set: 6400 groups; test set: 1600.
[107] | GFCC + Information Entropy | Construct random forest for classification; whale optimization algorithm for optimizing the size and feature subset of decision tree base classifiers. | 10 kV Dry-type Transformer | Core looseness; winding looseness. | Sound signals collected under 0.8PN-1.3PN load; each signal segment is 15 s long; training set: 660 groups, test set: 60 groups.
[112] | Independent Component Analysis + Wavelet Packet Energy Distribution | Fault type recognition based on BP neural network; improved wavelet packet decomposition algorithm to enhance low-frequency signal recognition accuracy. | Power Transformer | Metal parts rubbing with coil; flat electrode, corona, surface discharge. | 24 kHz sampling; 7 transformers at different voltage levels, 450 samples for each fault type.
[122] | MFCC | Integrated Gaussian Mixture Model into the recognition algorithm; GE2E loss function to improve recognition stability and accuracy. | Power Transformer | Different sound source positions | Public dataset TIMIT.
[124] | VMD Denoising + MFCC | Optimized VMD and ITCN parameters using improved Hunter-Prey Optimization Algorithm; introduced Mish activation function. | 800 kV Converter Station | Core looseness, winding looseness, DC magnetization. | 180 samples per class in training set; 20 samples per class in test set.
[127] | Improved Gram Angle Field | Integrated channel attention mechanism and Center-Softmax loss function. | EPR Cable Termination | Semi-conductive residue, circumferential scratch, air gap defect, metal debris. | 1 kHz sampling, 4 defect types, 4 cable samples for each defect, 3 signal collections per cable, 15 segments of 800 points per signal.
[130] | High/Low Frequency Domain Split | Optimized penalty factor and kernel width of SVM using hybrid frog-leaping algorithm. | Three-phase Oil-immersed Transformer | Core looseness, winding looseness. | 120 samples per class in training set; 60 samples per class in test set.

3 Fault detection for power equipment based on voiceprint

The voiceprint signals generated during the operation of power equipment contain rich state information, and the fact that they can be collected without contact makes them an important technical path for equipment health monitoring. This chapter systematically reviews intelligent detection methods for power equipment based on voiceprints, focusing on voiceprint feature extraction and intelligent diagnosis models; representative methods are listed in Table 3. First, it analyzes core feature extraction techniques such as the Mel-frequency cepstral coefficient and the Gammatone frequency cepstral coefficient, and their feature representation capabilities under fault modes such as mechanical vibration and partial discharge. Second, it reviews the application of deep learning algorithms such as the Long Short-Term Memory network and the Convolutional Neural Network to time-series feature modeling and pattern recognition, and discusses the adaptability of different models under complex working conditions using typical cases. To support technical comparison and engineering practice, the chapter collects the key parameters and experimental data of current mainstream voiceprint detection methods and compares them in terms of feature extraction method, core model architecture, research object, and data collection specification, with the aim of providing a theoretical basis and technical reference for the optimization and engineering application of voiceprint detection technology for power equipment.

3.1 Voiceprint feature extraction method

3.1.1 Mel-frequency cepstral coefficient

The Mel Frequency Cepstral Coefficient (MFCC) [98] is a feature parameter widely used in frequency analysis and speech processing, and it is of great significance for analyzing and understanding the characteristics of audio signals. The calculation of MFCC is based on the nonlinear way in which the human auditory system perceives sound frequency, namely the Mel frequency scale. Human perception of frequency is not linear: resolution is high in the low-frequency band and low in the high-frequency band. MFCC takes advantage of this characteristic by converting the audio signal from the linear frequency scale to the Mel frequency scale, which is more in line with the laws of human auditory perception. The overall calculation process is shown in Fig. 8.
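The processing chain summarized in Fig. 8 can be illustrated with a minimal, standard-library sketch for a single frame: windowing, power spectrum, Mel filter bank, logarithm, and discrete cosine transform. The Mel conversion constants (2595 and 700), the Hamming window, the number of filters, and all function names are common textbook choices assumed here for illustration; they are not taken from [98].

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the Mel scale (one widely used formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # naive DFT, adequate for a short frame
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with centers equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    """Return the first num_coeffs cepstral coefficients of one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # A small constant avoids log(0) for empty bands.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-12)
             for row in bank]
    # DCT-II decorrelates the log filter-bank energies.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_e))
            for c in range(num_coeffs)]


# Hypothetical input: a 256-sample frame of a 1 kHz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(256)]
coeffs = mfcc(frame, 8000)
```

In practice the signal is split into overlapping frames and this computation is repeated per frame; an FFT would replace the naive DFT for efficiency.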


Fig. 8. Flow chart of MFCC calculation [98].

The sound signals generated by power equipment differ in frequency, amplitude, duration, and other properties between normal operation and fault states. MFCC can effectively extract this difference information and use it as the characteristic parameter of the equipment's operating state. For example, when a mechanical fault occurs in the equipment, abnormal high-frequency noise or low-frequency vibration may appear in the sound signal, and these changes can be reflected by MFCC.

Chen et al. [99] selected gas-insulated switchgear as the research object. After preprocessing the sound signals with steps such as framing and windowing, they used MFCC for audio feature extraction. Considering the sound characteristics of power equipment, they adopted short-time MFCC coefficients as the feature extraction method. An attention mechanism was also introduced to improve the recognition performance of the algorithm, and the F1 scores under different fault conditions generally exceeded 90%. Xiao et al. [100] preprocessed the collected voiceprint data with a Hamming window, combined the static and dynamic features extracted by MFCC into a new feature vector, and input it into a Naive Bayes classifier for training. To accurately extract the acoustic features of the transformer core loosening fault, Cui et al. [101] used the Local Linear Embedding algorithm to improve the existing MFCC feature vectors through dimensionality reduction, and used a three-dimensional convolutional neural network (3D-CNN) to identify the fault. The improved MFCC feature extraction algorithm and the 3D-CNN model achieved good recognition results, with an accuracy as high as 98.33% and an average iteration time shortened to 8.51126 s. To address the insufficiently comprehensive monitoring of power transformers in existing methods, Xiang et al. [102] constructed MFCC features from the acoustic signals of power transformers. They used a Deep Belief Network [103] to learn from the MFCC features and extract deep acoustic features, and then used a Support Vector Machine to analyze the correlation between the acoustic features and the defect types, achieving accurate identification of the defect types.

MFCC can effectively extract the differences in frequency, amplitude, duration, and other properties of the sound signals of power equipment under normal operation and fault states, and it has been widely applied in related fields. However, MFCC mainly reflects the frequency characteristics of voiceprint signals. For some complex power equipment faults, it may not capture all the fault characteristics, and it is relatively sensitive to environmental noise. Therefore, the main challenges faced by MFCC are how to reduce data dependence and how to improve the algorithm's adaptability to different environments.

In power equipment fault detection, MFCC is a key voiceprint feature: by simulating the nonlinearity of human hearing, it strengthens the extraction of weak fault features and is suitable for non-stationary signals. However, it depends on preprocessing parameters, lacks noise robustness, and struggles with multi-source signals. Future work may integrate attention mechanisms and transfer learning.

3.1.2 Gammatone frequency cepstral coefficients

Gammatone Frequency Cepstral Coefficients (GFCC) [104] mimic the nonlinear frequency response of the human cochlea by filtering the signal through a bank of gammatone filters. Each filter emphasizes a specific frequency band, capturing energy variations that correspond to auditory perception. After the filter-bank energies are computed, logarithmic compression enhances subtle spectral differences, and the discrete cosine transform (DCT) projects the result into the cepstral domain. The resulting GFCCs encode detailed frequency-dependent characteristics of the voiceprint, offering better low-frequency resolution and robustness for progressive fault diagnosis than traditional MFCC.
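The steps described above (gammatone filtering, band energies, logarithmic compression, DCT) can be sketched as follows using only the standard library. The fourth-order impulse response, the Glasberg-Moore ERB approximation with the 1.019 bandwidth factor, the 25 ms response length, and the function names are assumptions drawn from common practice rather than from [104]; the sketch also computes one energy per band over the whole signal instead of per frame.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, sample_rate, duration_s=0.025, order=4):
    """Sampled impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    b = 1.019 * erb(fc_hz)
    ir = []
    for i in range(int(duration_s * sample_rate)):
        t = i / sample_rate
        ir.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                  * math.cos(2 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir) or 1.0
    return [v / peak for v in ir]  # normalize the peak amplitude to 1


def band_energy(signal, ir):
    """Energy of the signal after FIR filtering with one gammatone response."""
    energy = 0.0
    for n in range(len(signal)):
        y = sum(ir[k] * signal[n - k] for k in range(min(len(ir), n + 1)))
        energy += y * y
    return energy


def gfcc(signal, sample_rate, centers_hz, num_coeffs):
    """Log band energies followed by a DCT-II, one coefficient vector per call."""
    log_e = [math.log(band_energy(signal, gammatone_ir(fc, sample_rate)) + 1e-12)
             for fc in centers_hz]
    n = len(log_e)
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n)
                for m, e in enumerate(log_e))
            for c in range(num_coeffs)]


# Hypothetical input: 50 ms of a 200 Hz hum sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(400)]
features = gfcc(tone, 8000, [100.0, 200.0, 400.0, 800.0, 1600.0], 4)
```

Because the bandwidth grows with center frequency, low-frequency channels are narrow, which is the source of the finer low-frequency resolution mentioned above.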

GFCC has unique advantages in extracting the voiceprint features of power equipment. It provides an effective technical means for intelligent fault detection and helps improve the accuracy and reliability of detection. Zhou et al. [105] proposed an extraction algorithm for the GFCC feature parameter. Taking the partial discharge ultrasonic signals of particle discharge and floating discharge as examples, they extracted the GFCC of these signals. This method enables online, intelligent, non-destructive detection of the insulation state of high-voltage equipment, overcoming the limitations of traditional detection and diagnosis methods that rely on a fixed threshold or on recognition by the human ear. Dang et al. [106] used a gammatone filter bank to match the real features of transformer acoustic signals, calculated the GFCC feature parameters, and performed recognition with a high-precision VGG16 convolutional neural network. They used deep learning methods to measure and analyze the acoustic signals of 10 kV dry-type transformers under normal and typical fault conditions in order to better identify the state of power transformers. Geng et al. [107] introduced information entropy to extract the main sound feature information from GFCC. They used the whale algorithm to optimize the scale of the decision tree base classifiers and the feature subsets in the random forest, and constructed a classification model for typical mechanical faults of transformers based on the optimized random forest. Geng et al. [108] first used a gammatone filter bank to decompose the acoustic signals of power transformers by frequency, obtaining a rich set of GFCC time-frequency map samples. They then used the AlexNet [109] convolutional neural network to extract the intrinsic features of these time-frequency map samples and used them as the input to the classifier for recognition.

In power equipment voiceprint extraction, GFCC simulates the nonlinearity of human hearing and captures subtle noise changes through its filter bank, which helps detect faults in transformers and high-voltage equipment. However, it has high computational complexity, places strict demands on data quality, and has limited generality. Future work is expected to focus on lightweight designs and multi-modal fusion.

3.1.3 Variational mode decomposition

Variational Mode Decomposition (VMD) [110] is an adaptive signal processing method that shows unique advantages in intelligent fault detection of power equipment. The core principle of VMD is to construct and solve a variational problem. Each Intrinsic Mode Function (IMF) is first regarded as an amplitude-modulated and frequency-modulated signal. The analytic signal is obtained through the Hilbert transform to determine the instantaneous frequency of each IMF. Then, by minimizing the sum of the estimated bandwidths of the IMF components, the optimal decomposition is found, subject to the constraint that the original signal can be accurately reconstructed from these components. The variational problem is solved iteratively with the alternating direction method of multipliers, which repeatedly updates each IMF component and its center frequency until the convergence condition is met, as shown in (11):

$$ \min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{11} $$

where $u_k(t)$ is the $k$-th IMF, $\omega_k$ is its center frequency, $K$ is the number of modes, $f(t)$ is the original signal, $\delta(t)$ is the Dirac delta function, and $*$ denotes convolution.

The objective of the variational problem is to minimize the sum of the estimated bandwidths of the IMF components, while ensuring that the sum of all IMF components is equal to the original signal.
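For reference, the iterative updates used by the alternating direction method of multipliers are usually written in the frequency domain. The following is a sketch in the notation commonly used for VMD, which is assumed here: $\hat{u}_k(\omega)$ and $\omega_k$ denote the spectrum and center frequency of the $k$-th IMF, $\hat{f}(\omega)$ the spectrum of the original signal, $\hat{\lambda}(\omega)$ the Lagrange multiplier, $\alpha$ the bandwidth penalty factor, and $\tau$ the dual ascent step.

```latex
\hat{u}_k^{\,n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{\,n}(\omega)/2}
       {1 + 2\alpha\,(\omega - \omega_k^{\,n})^2},
\qquad
\omega_k^{\,n+1} =
  \frac{\int_0^{\infty} \omega\, |\hat{u}_k^{\,n+1}(\omega)|^2 \, d\omega}
       {\int_0^{\infty} |\hat{u}_k^{\,n+1}(\omega)|^2 \, d\omega},
\qquad
\hat{\lambda}^{\,n+1}(\omega) =
  \hat{\lambda}^{\,n}(\omega)
  + \tau \Big( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{\,n+1}(\omega) \Big)
```

The first update acts as a Wiener-like filter centered at $\omega_k$, and the second places each center frequency at the center of gravity of the corresponding mode's power spectrum. The number of modes $K$ and the penalty $\alpha$ must be chosen in advance.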

On the one hand, VMD is a completely non-recursive algorithm, which avoids the mode mixing problem common in Empirical Mode Decomposition (EMD). It decomposes the signal into different intrinsic modes more accurately, making the decomposition result more stable and reliable. On the other hand, VMD is more robust to noise. In the complex operating environment of power equipment, it can effectively process noisy signals and extract cleaner fault characteristic information.

Chen et al. [111] analyzed several traditional harmonic detection methods for power grid harmonic detection and proposed, for the first time, a detection method based on VMD. VMD was used to decompose the power grid signal containing harmonics into a series of IMFs. The Hilbert-Huang Transform was then applied to the decomposed IMF components to obtain the instantaneous frequency and instantaneous amplitude of each component. Yu et al. [112] studied the sound generation mechanisms of various power transformer faults, the collection and separation of mixed sounds, the feature extraction of sound signals, and the identification of fault types. By combining the independent component analysis algorithm, the wavelet packet energy distribution vector, the Mel logarithmic spectrum, and the BP neural network algorithm, they monitored the transformer without affecting its normal operation. Gao et al. [113] proposed an intelligent monitoring system for substation switch fault detection based on voiceprint recognition. They used VMD to process the voiceprint signals of substation switches to achieve accurate detection of switch faults. Shan et al. [114] used VMD to remove the high-frequency components of motor noise and extracted the voiceprint features of the Mel spectrum. They then used a convolutional neural network to further extract features from the Mel voiceprint features, obtaining high-dimensional abstract features that represent bearing faults.

In power equipment fault detection, VMD adaptively decomposes non-stationary voiceprint signals into IMFs, avoiding mode mixing and preserving features in the presence of noise, which aids the extraction of weak fault features. However, it has high computational complexity, relies on manual selection of the number of modes, and lacks robustness to strong impulse noise. Future work is expected to focus on parameter optimization and fusion with other methods.

3.1.4 Linear predictive cepstral coefficient

The Linear Predictive Cepstral Coefficient (LPCC) [115] is an important characteristic parameter in speech signal processing and audio analysis, and it plays a key role in many acoustic applications. The calculation of LPCC is based on the theory of Linear Predictive Coding (LPC). LPC uses past samples to predict the current sample, and its core assumption is that the samples of the speech signal are linearly correlated. For a segment of speech signal, a linear prediction model is established in which the value of the current sampling point is approximated by the weighted sum of several past sampling points, as shown in (12):

$$ \hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n-k) \tag{12} $$

where $\hat{s}(n)$ is the predicted current sample value, $s(n-k)$ is the past sample value, $a_k$ is the prediction coefficient, and $p$ is the prediction order. The prediction coefficients $a_k$ are usually determined by minimizing the mean square value of the prediction error, that is, by finding the set of $a_k$ that minimizes the MSE.
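A minimal sketch of this procedure is given below, assuming the autocorrelation method with the Levinson-Durbin recursion to minimize the prediction error, followed by the standard recursion that converts prediction coefficients into cepstral coefficients. The choice of solver, the function names, and the test signal are illustrative assumptions, not the specific implementation of [115].

```python
def autocorr(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a_1..a_p and the final prediction error."""
    a = [0.0] * (order + 1)  # a[0] is unused so that a[k] matches a_k
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err  # reflection coefficient of stage m
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= (1.0 - k_m * k_m)
    return a[1:], err


def lpc_to_cepstrum(a, num_ceps):
    """Convert predictor coefficients a_1..a_p into LPCC c_1..c_num_ceps."""
    p = len(a)
    c = [0.0] * (num_ceps + 1)
    for n in range(1, num_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Hypothetical signal that decays by half each sample, so s(n) = 0.5 * s(n-1).
signal = [0.5 ** n for n in range(50)]
coeffs, error = levinson_durbin(autocorr(signal, 2), 2)
cepstrum = lpc_to_cepstrum(coeffs, 8)
```

For this test signal the recursion recovers a first coefficient close to 0.5 and a second close to 0, as expected for a signal that depends only on its immediately preceding sample.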

Gong et al. [116] preprocessed the voiceprint data in the database using the wavelet transform and LPCC to obtain the main fault features. Jain et al. [117] compared MFCC and LPCC in terms of time, feature extraction effect, and other aspects, and assessed their advantages and disadvantages in different environments.

In power equipment voiceprint extraction, LPCC uses linear prediction to capture short-term signal correlations. It is suitable for high-energy, low-frequency faults, has low complexity, and remains stable at low signal-to-noise ratios. However, its prediction order is chosen empirically, it makes little use of phase information, and it adapts poorly across devices. Future work includes parameter tuning and extended time-frequency analysis.

In summary, MFCC aligns with human auditory perception and effectively captures the frequency and amplitude variations associated with equipment faults, which makes it suitable for transformer fault detection; however, it is sensitive to environmental noise and may fail to capture the full complexity of fault signatures. GFCC enhances resolution in the low-frequency band and is well suited to the precise diagnosis of progressive faults, yet its high computational complexity, limited generalizability, and dependence on high-quality data restrict wider applicability. VMD avoids mode mixing and can process noisy, non-stationary signals to extract clean fault features, but it incurs significant computational overhead and requires manual specification of the number of modes. LPCC is computationally simple and remains stable under low signal-to-noise ratio conditions, making it appropriate for real-time monitoring of low-frequency vibration faults; however, it relies on empirically chosen parameters and exhibits poor cross-equipment adaptability.


Fig. 9. Architecture diagram of the LSTM [114].

3.2 Voiceprint detection methods based on machine learning and deep learning

3.2.1 Long short-term memory network and gated recurrent unit

The Long Short-Term Memory (LSTM) network [118] is a special type of recurrent neural network proposed to address the problems of gradient vanishing and gradient explosion. It introduces a set of structures called "gates". By controlling these gates, as shown in Fig. 9, the LSTM can selectively forget and store information, thereby capturing the dependencies in long sequences more effectively.
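The gating described above can be made concrete with a scalar sketch of a single LSTM step in the standard formulation. The parameter dictionary layout, the gate names, and the example values are hypothetical and chosen only for illustration; real models use weight matrices and vector-valued states.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to store
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate content for the cell state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters: a strongly positive forget bias keeps old memory.
params = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=0.3, h_prev=0.0, c_prev=2.0, params=params)
```

With these values the forget gate is close to 1 and the input gate close to 0, so the cell state stays near its previous value of 2.0, which illustrates how information can be carried across many time steps.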

LSTM networks excel at modeling gradual and discrete time-series changes in voiceprint data for fault detection. For instance, they can track slow shifts caused by transformer winding or core grounding faults as the signal evolves from a normal to a faulty state. Similarly, by analyzing the temporal patterns during circuit breaker operations, LSTMs can detect mechanical issues such as refusal to operate or misoperation.

Yu et al. [119] used a bidirectional LSTM to analyze the dependencies between voiceprint features at earlier and later moments in both the forward and backward directions, and captured the transition from normal operation to overload or abnormal discharge. Qi et al. [120] used a bidirectional LSTM to compensate for the weakness of the CNN model in long-term sequential modeling and achieved high prediction accuracy in an environment with a high signal-to-noise ratio.

Because the LSTM has a complex structure with a large number of parameters and a heavy computational load, the Gated Recurrent Unit (GRU) [121] was proposed. Its design motivation is to simplify the model structure and reduce the number of parameters while retaining the ability to capture the dependencies in long sequences, thereby reducing the computational cost and improving training efficiency.

The GRU simplifies the structure of the LSTM and mainly consists of a reset gate and an update gate. The reset gate determines how the new input is combined with the previous hidden state: the closer its value is to 0, the less information from the previous hidden state is retained; the closer it is to 1, the more is retained. The update gate determines how much of the past hidden state is kept and how much of the new candidate hidden state is added to the current hidden state.
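A scalar sketch of one GRU step follows. It uses one common convention in which the new hidden state is (1 - z) times the previous state plus z times the candidate; some formulations swap the roles of z and 1 - z. The parameter layout and names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, params):
    """One scalar GRU step; params maps a gate name to (w_x, w_h, bias)."""
    def pre(name, h):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h + b

    r = sigmoid(pre("reset", h_prev))   # near 0: ignore the previous state
    z = sigmoid(pre("update", h_prev))  # near 1: adopt the candidate state
    h_cand = math.tanh(pre("candidate", r * h_prev))
    return (1.0 - z) * h_prev + z * h_cand


# Hypothetical parameters: a very negative update bias keeps the old state.
params = {
    "reset": (0.0, 0.0, 0.0),
    "update": (0.0, 0.0, -20.0),
    "candidate": (1.0, 1.0, 0.0),
}
h = gru_step(x=0.5, h_prev=0.8, params=params)
```

The GRU has three parameter groups instead of the four used by an LSTM and keeps no separate cell state, which is where the reduction in parameters and computation comes from.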

Compared with the LSTM, the GRU has a more concise structure and converges faster during training, so it is widely used in voiceprint signal processing. Cui et al. [122] combined the traditional GMM algorithm with a deep neural network in the GE2E model. They first optimized the bidirectional GRU with the SGD algorithm to improve the model's ability to represent voiceprint features, and used the GE2E loss to improve the training stability of the model. Abulizi et al. [123] used MFCC to extract the voiceprint features of analog signals and a GRU to extract the sequential features.

In power systems, LSTM and GRU use "gate" structures to capture long-term dependencies in complex, sequential voiceprint signals, which aids the extraction of sequential features in progressive faults; the GRU is better suited to real-time scenarios. However, both have high complexity, are unstable under noise, and struggle with multi-source signals. Future work includes dynamic gating optimization and cross-modal modeling.

3.2.2 Convolutional neural network

The one-dimensional convolutional neural network (1D-CNN) is a deep learning model for processing one-dimensional sequential data. Its core operation is convolution, which automatically extracts local features from the input data. The convolution kernel of a 1D-CNN is a one-dimensional vector with a specific length and weight parameters. These parameters are adjusted continuously during training to learn effective features of the data.

In the convolution operation, the convolution kernel slides from left to right over the one-dimensional input data at a fixed step size. At each position, the weights of the convolution kernel are multiplied by the corresponding elements of the input data and then summed to obtain a scalar value, which is the feature value extracted at this position. Once the convolution kernel has traversed the entire input, the extraction of local features from the input data is complete, as shown in (15):

$$ y_j(n) = \sum_{i=0}^{M-1} w_j(i) \, x(n+i) \tag{15} $$

where $x$ is the input one-dimensional sequential data, $n$ is the sequence index, $w_j$ is the $j$-th convolution kernel, $i$ is the internal index, and $M$ is the length of the convolution kernel.
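The sliding multiply-and-sum described above can be sketched in a few lines of plain Python. The function name and the stride argument are illustrative; no padding is applied, so the output is shorter than the input.

```python
def conv1d(x, kernel, stride=1):
    """Slide the kernel over x and return one weighted sum per position."""
    m = len(kernel)
    return [sum(kernel[i] * x[n + i] for i in range(m))
            for n in range(0, len(x) - m + 1, stride)]


# A difference kernel responds only where the signal changes level.
step_signal = [0, 0, 1, 1]
edges = conv1d(step_signal, [-1, 1])  # -> [0, 1, 0]
```

In a trained 1D-CNN the kernel weights are learned rather than fixed, and many kernels are applied in parallel to produce multiple feature channels.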

The two-dimensional convolutional neural network (2D-CNN) is mainly used to process two-dimensional data. The input data is either a two-dimensional matrix or a three-dimensional tensor. Its convolution kernel is a two-dimensional matrix, which also has a specific size (such as 3 × 3 or 5 × 5) and weight parameters that are learned during training. During the convolution operation, the two-dimensional kernel slides over the two-dimensional input at a fixed step size in both the horizontal and vertical directions. At each position, the weights of the kernel are multiplied element-wise with the corresponding elements of the input, and all the products are summed to obtain a scalar value, which is the feature value extracted at that position. After the kernel has traversed the entire two-dimensional input, a two-dimensional feature map is generated, as shown in (16).
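Under assumed notation (input $X$, kernel $W$ of size $K_h \times K_w$, feature map $Y$; these symbols are illustrative and may differ from those used in (16)), the sliding operation described above can be written as $Y(r, c) = \sum_{i=0}^{K_h-1} \sum_{j=0}^{K_w-1} W(i, j)\, X(r+i, c+j)$. A minimal pure-Python sketch of the same computation, without padding:

```python
def conv2d(image, kernel, stride=1):
    """Slide a 2D kernel over a 2D input and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


# Hypothetical 3 x 3 input (for example, a tiny spectrogram patch).
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
feature_map = conv2d(patch, [[1, 0], [0, 1]])  # -> [[6, 8], [12, 14]]
```

For voiceprint data the two-dimensional input is typically a time-frequency representation such as a Mel spectrogram, so one axis of the kernel spans time and the other spans frequency.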

In principle, voiceprint signals are essentially one-dimensional time-series signals. CNNs have a powerful ability to extract local features and can effectively capture the local patterns contained in voiceprint signals. Key features in voiceprints, such as pitch and formants, have local correlations in the time domain. The convolution kernels of a CNN slide over the voiceprint signal to identify these local features automatically, providing an important basis for subsequent voiceprint recognition and verification.

In terms of data adaptability, the feature distribution of voiceprint data is complex and exhibits a degree of translational invariance. That is, the voiceprint features of the same device may be shifted along the time axis while the essential features remain unchanged. The convolution and pooling operations of a CNN are well suited to this characteristic. Convolution extracts different types of features through different kernels, and pooling reduces the data dimension while retaining important features, which enhances the robustness and generalization ability of the model and enables it to work stably in different environments and conditions.

In terms of computational efficiency, CNNs greatly reduce the number of model parameters through the weight-sharing mechanism. During voiceprint feature extraction, weight sharing allows the same convolution kernels to be reused across the entire voiceprint signal, avoiding the large number of parameters required by fully connected networks. This reduces computational complexity and speeds up training and inference, helping to meet the real-time requirements of practical applications.
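The effect of weight sharing can be illustrated with a simple parameter count. The numbers below (a one-second signal at an assumed 16 kHz sampling rate, 16 kernels of length 9) are hypothetical and serve only as a worked example.

```python
def conv1d_params(kernel_len, in_channels, out_channels):
    """Weights plus one bias per output channel; independent of signal length."""
    return kernel_len * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Fully connected layer: one weight per input-output pair plus biases."""
    return in_features * out_features + out_features


signal_len = 16000  # hypothetical: 1 s of audio at 16 kHz
num_kernels = 16

conv_count = conv1d_params(kernel_len=9, in_channels=1, out_channels=num_kernels)
# A dense layer producing the same number of outputs (16 values per sample).
dense_count = dense_params(signal_len, signal_len * num_kernels)
```

Here the convolutional layer needs 160 parameters, while a fully connected layer with the same output size would need 4,096,256,000, because every output would have its own weight for every input sample.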

Li et al. [124] input voiceprint signals into a one-dimensional time-series convolutional network and used the Mish activation function and the hunter-prey algorithm to jointly optimize the model parameters. To make full use of data with a high signal-to-noise ratio, Shan et al. [114] removed high-frequency noise from voiceprint signals using VMD and mapped the extracted Mel spectrograms into high-dimensional features through a CNN, effectively improving the model's representation ability. Lu et al. [125] extracted the Mel spectrograms of voiceprint signals as two-dimensional features and input them into a CNN. They compared five common CNN architectures, including ResNet, GoogLeNet, DenseNet, and MobileNet, using performance indicators such as accuracy, recall, and training time to measure classification performance, and found MobileNet to be superior.


Fig. 10. Architecture diagram of the self-attention [126].

In power equipment voiceprint processing, CNNs enable hierarchical feature extraction through local receptive fields and weight sharing: 1D-CNNs handle time-series signals, 2D-CNNs enhance low-frequency vibration features, and both learn multi-scale features of non-stationary signals automatically. However, CNNs lack long-range time-series modeling, have high complexity, and rely on large amounts of labeled data. Future work includes lightweight designs, attention mechanisms, and multi-modal fusion.

3.2.3 Self-Attention mechanism

The self-attention mechanism, first popularized by the Transformer model [126], enables a network to weigh the importance of each element in an input sequence by comparing every position with every other, as shown in Fig. 10. This allows the model to focus dynamically on the most relevant parts (words in a sentence or time steps in a signal) while capturing long-range dependencies in a single pass. Because self-attention operates in parallel and adapts its focus based on context, it offers both efficiency and flexibility. In fault detection, self-attention helps models highlight critical features in voiceprint or infrared sequences, improving sensitivity to subtle anomalies.
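A minimal sketch of scaled dot-product self-attention, the core operation of the Transformer, is shown below in plain Python. As a simplifying assumption, the queries, keys, and values are taken directly from the input sequence; a real model would first apply learned linear projections. The function names and the toy sequence are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Return attended outputs and the attention weights for each query."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Compare this position with every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w_j * v[c] for w_j, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Hypothetical sequence of three 2-dimensional feature frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = self_attention(frames, frames, frames)
```

Each row of the weight matrix sums to 1 and has one entry per sequence position, so the cost of computing it grows quadratically with sequence length.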

To extract the operating status of power equipment efficiently, Chen et al. [99] used the attention mechanism to extract fault features from the mid-term MFCC so as to better model long-distance dependencies. Jiao et al. [127] used a feature transformation to convert the one-dimensional time-series signal of partial discharge into a two-dimensional topological feature image. They combined the residual network with inter-channel attention and fused the Center and Softmax loss functions for training, recognition, and classification, further improving accuracy.

The self-attention mechanism flexibly captures long-range dependencies in voiceprint sequences, adapting to long-distance relationships in the data and highlighting the importance of key positions in fault detection. However, it has high computational complexity (quadratic in the sequence length), weak local feature capture, and poor robustness at low signal-to-noise ratios. Future research will focus on optimizations such as low-rank approximation, hybrid architectures, and cross-modal fusion to meet the needs of power equipment monitoring.

3.2.4 Support vector machine

Support Vector Machine (SVM) [128] is a powerful supervised learning algorithm widely used for classification and regression. It works by mapping input data into a high-dimensional feature space via kernel functions and finding the hyperplane that maximally separates classes. The samples nearest this boundary, called support vectors, define the model and ensure strong generalization. In power system fault voiceprint detection, SVM effectively handles the complex, subtle differences between normal and fault-related audio signals. By transforming nonlinearly separable voiceprint features into a space where they become linearly separable, SVM can accurately distinguish multiple fault types from normal operating conditions.
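
The idea of making nonlinearly separable features linearly separable can be illustrated with a toy example. The sketch below is not an SVM solver: it assumes a hypothetical one-dimensional feature, an explicit quadratic feature map standing in for a kernel, and a hand-picked separating hyperplane rather than a learned one.

```python
def feature_map(x):
    """Explicit quadratic feature map phi(x) = (x, x^2), standing in for a kernel."""
    return (x, x * x)

def linear_decision(point, w, b):
    """Sign of w . point + b, the kind of linear rule an SVM uses in feature space."""
    score = sum(wi * pi for wi, pi in zip(w, point)) + b
    return 1 if score >= 0 else -1

def separable_by_threshold(xs, ys):
    """True if a single threshold on the raw 1-D feature classifies every sample."""
    for t in sorted(xs):
        for sign in (1, -1):
            if all((sign if x >= t else -sign) == y for x, y in zip(xs, ys)):
                return True
    return False

# Hypothetical 1-D voiceprint feature: +1 = normal (small magnitude), -1 = fault.
samples = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [-1, -1, 1, 1, 1, -1, -1]

# No threshold separates the classes on the raw feature.
raw_separable = separable_by_threshold(samples, labels)

# In feature space, the hand-picked hyperplane x^2 = 2.125 separates them.
w, b = (0.0, -1.0), 2.125
predictions = [linear_decision(feature_map(x), w, b) for x in samples]
```

In practice a kernel such as the RBF kernel performs this mapping implicitly, and the hyperplane is obtained by solving the SVM optimization problem instead of being chosen by hand.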

Because power grid faults occur relatively rarely and fault types are diverse, labeled fault samples are scarce. Based on the principle of structural risk minimization, SVM can still learn effectively from small-sample data while avoiding overfitting. A model trained on limited samples can therefore generalize well and accurately detect unknown power grid fault voiceprints. To reduce the need for professional knowledge in transformer fault diagnosis, Wang et al. [129] performed denoising based on empirical mode decomposition and used SVM to classify the dimensionality-reduced voiceprint features, making full use of the feature distribution rules. To reduce the cost and improve the accuracy of transformer fault diagnosis, Liu et al. [130] utilized the significant differences in voiceprint features under different transformer states and adopted the hybrid shuffled frog-leaping algorithm to update the parameters of SVM, achieving a significant improvement in diagnostic accuracy.

SVM, based on structural risk minimization, works well with small samples in power equipment voiceprint recognition, mapping nonlinear features via kernels for efficient classification and showing robustness to noise. However, it depends heavily on kernel selection and parameter tuning, has high computational complexity, and struggles with multi-class tasks. Future research will focus on adaptive kernel optimization and hybrid models.

In summary, LSTM and GRU excel at modeling temporal dependencies in voiceprint signals and mitigating vanishing gradients, but incur high computational costs and underutilize non-sequential features; they are best for analyzing time-varying faults. CNNs automatically learn local, translation-invariant features from time-frequency maps but capture temporal context poorly and require large annotated datasets; they suit pattern recognition in spectrograms. Self-attention mechanisms flexibly model long-range dependencies and highlight salient features, yet suffer from high complexity and overfitting in small-sample settings; they improve accuracy in complex detection scenarios. SVMs build optimal separating hyperplanes with low computational cost and strong generalization on small datasets, but handle high-dimensional data poorly and rely on empirical kernel selection; they are appropriate for low-dimensional fault classification tasks.

4 Power system fault detection datasets

The development and evaluation of intelligent fault detection algorithms heavily rely on high-quality datasets that reflect real-world power equipment operating conditions. With the powerful feature extraction and pattern recognition capabilities of deep learning, models can learn the differences in features between normal and faulty equipment from a vast amount of infrared image data, enabling accurate fault diagnosis and classification. Existing infrared power equipment fault detection datasets primarily include equipment such as transformers and high-voltage cables, and are used for detecting faults like overheating and partial discharge. This paper summarizes some of the common infrared image datasets for power equipment, as shown in Table 4.

In this section, we summarize the current publicly available datasets for power equipment fault detection. Although current infrared image datasets for power equipment cover various types of equipment and faults, they have significant shortcomings. On the one hand, the scale of these datasets is generally limited, making it difficult to meet the data requirements of deep learning. On the other hand, due to insufficient data, it is often necessary to fine-tune models pre-trained on natural image datasets. However, the large differences between the two types of data can easily lead to model overfitting, severely affecting the accuracy of fault detection.

Table 4 Power equipment infrared image dataset.

Dataset | Equipment Failure Type | Scale
IMPR2 | Surge Arresters, Circuit Breakers, Bushings, Oil Tanks, Current Transformers, Isolators | 672
CVP | Overheating of Transmission Line Clamps | 282
DCVP | Surge Arresters, Circuit Breakers, Bushings, Current Transformers, Isolators | 1640
IFECV | Surge Arresters, Circuit Breakers, Bushings, Line Clamps, Oil Tanks, Current Transformers, Isolators, Heat Exchangers, Insulators, Transformers | 655
IRmonitor | Localized Overheating | 461
BDZ | Substations | 809
HVDD | Overheating of High-Voltage Power Equipment | 278
OH | Overheating of Cables | 2393
Insulator | Overheating of Insulators | 440

5 Conclusion and outlook

5.1 Research summary

As the scale and complexity of power systems continue to expand, issues such as power outages and equipment damage caused by faults have become increasingly prominent. This paper first classifies and summarizes the problems that occur in power equipment, and for each type of problem, it provides an overview of two modal detection methods: infrared imaging detection methods and sound wave detection methods. The paper categorizes infrared imaging detection as an image prediction task and sound wave detection as a sequence prediction task, summarizing multiple solution methods for each problem. The paper also reviews commonly used datasets, covering both image and sound wave modal detection problems, providing a comprehensive foundation for researchers to further study power equipment fault detection.

5.2 Future research directions

To address the issues of scarce fault samples and high labeling costs in power equipment, future research can focus on techniques for small-sample training. On the one hand, transfer learning is expected to be a key breakthrough. By using models pre-trained on large-scale general datasets, common features can be transferred to power equipment fault detection tasks. Only a small amount of specific fault data is needed for fine-tuning, allowing the model to quickly adapt to new tasks. For example, a model trained on a general image classification dataset could transfer its basic understanding of image features and be fine-tuned to the unique characteristics of infrared images of power equipment. On the other hand, meta-learning and other small-sample learning algorithms are also worth exploring. These methods enable the model to learn how to learn quickly and find better learning strategies during the process of learning from a small number of samples, allowing the model to rapidly capture features and achieve accurate detection when faced with new fault samples, significantly reducing reliance on large-scale labeled data.
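
As one concrete illustration of small-sample learning, the sketch below classifies a new sample by its distance to per-class mean vectors (prototypes), which is the decision rule used by prototypical networks, one family of few-shot methods. The choice of this method, the class names, and the two-dimensional feature values are all assumptions made for illustration; the features are presumed to come from some pre-trained embedding model that is not shown.

```python
import math

def build_prototypes(support):
    """Compute the mean feature vector of each class from a few labeled samples."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors)
                         for i in range(dim)]
    return protos

def classify(query, protos):
    """Assign the query to the class whose prototype is nearest (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Hypothetical support set: two labeled embeddings per class.
support = {
    "normal": [[0.1, 0.2], [0.2, 0.1]],
    "overheating": [[0.9, 0.8], [0.8, 0.9]],
}
protos = build_prototypes(support)
prediction = classify([0.85, 0.7], protos)
```

Adding a new fault class in this scheme only requires a few labeled embeddings to form its prototype, with no retraining of the classifier itself.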

Given the current underdevelopment of multi-source data alignment and feature fusion mechanisms, and the fact that multi-modal power equipment fault detection models are still in the exploratory stage, future research could focus on building efficient multi-modal base models. In particular, integrating heterogeneous sensing modalities such as infrared imaging and voiceprint signals, which respectively reflect external thermal anomalies and internal acoustic patterns, could lead to more comprehensive and robust fault representations. Effective fusion of these complementary modalities requires novel architectures that can align spatial, temporal, and semantic features across domains, which remains an open challenge in practical deployment. First, precise data alignment algorithms should be developed to ensure the consistency of different modal data (e.g., infrared images and sound wave data) in both time and space dimensions, providing a solid foundation for later fusion. Next, innovative feature fusion strategies should be explored, such as attention-based fusion methods, allowing the model to automatically learn the importance of features from different modal data and achieve more effective fusion. Additionally, advanced multi-modal model architectures from natural language processing and computer vision can be adapted and improved for power equipment fault detection to fully exploit the complementary information from multi-modal data, enhancing the accuracy and reliability of fault detection.
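
A minimal sketch of attention-based fusion is given below, assuming that each modality has already been encoded into a feature vector of the same length. The scoring vectors would normally be learned during training; here they are fixed, hypothetical values, as are the feature vectors themselves.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, scoring_vectors):
    """Fuse per-modality feature vectors with softmax attention weights.

    features: dict mapping modality name to a feature vector (equal lengths).
    scoring_vectors: dict mapping modality name to a vector used to score it
                     (learned in a real model, fixed here for illustration).
    Returns (fused_vector, attention_weights_by_modality).
    """
    names = list(features)
    scores = [sum(w * f for w, f in zip(scoring_vectors[n], features[n]))
              for n in names]
    alphas = softmax(scores)
    dim = len(features[names[0]])
    # The fused vector is the attention-weighted sum of the modality vectors.
    fused = [sum(a * features[n][i] for a, n in zip(alphas, names))
             for i in range(dim)]
    return fused, dict(zip(names, alphas))

# Hypothetical encoded features for one time-aligned sample.
features = {
    "infrared": [0.9, 0.1, 0.4],
    "voiceprint": [0.2, 0.8, 0.5],
}
scoring_vectors = {
    "infrared": [1.0, 0.0, 0.0],
    "voiceprint": [0.0, 1.0, 0.0],
}
fused, alphas = attention_fuse(features, scoring_vectors)
```

The attention weights sum to one, so the fused vector is a convex combination of the two modalities, and a modality with a weak or noisy response contributes less to the result.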

To meet the requirements of real-time detection and flexible detection equipment, research on the lightweighting of deep learning-based power equipment fault detection models is urgently needed. In terms of model architecture design, lightweight neural network structures such as MobileNet, ShuffleNet, and similar architectures designed for mobile and embedded devices can be adopted. These structures reduce model parameters and computational requirements, ensuring efficient operation while maintaining a certain level of detection accuracy. Additionally, model pruning techniques can be employed to remove connections and neurons that contribute little to the detection results, further compressing the model size. Furthermore, quantization techniques are also an important research direction, converting model parameters and computations from high-precision data types to low-precision ones. This process accelerates computation, reduces memory usage, and enables the model to run quickly on resource-constrained detection devices, achieving real-time detection.
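
Pruning and quantization can be sketched on a plain list of weights. The example below assumes magnitude-based pruning and symmetric per-tensor 8-bit quantization, which are common choices but only two of many variants; the weight values and the 50% sparsity level are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric quantization to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point weights."""
    return [q * scale for q in quantized]

# Hypothetical weights of one small layer.
weights = [0.82, -0.03, 0.45, 0.01, -0.67, 0.09, -0.28, 0.002]
pruned = prune_by_magnitude(weights, sparsity=0.5)
quantized, scale = quantize_int8(pruned)
restored = dequantize(quantized, scale)
```

Each restored weight differs from its pruned value by at most half a quantization step, which is the precision traded for storing one byte per weight instead of a 32-bit float.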

CRediT authorship contribution statement

Xizhou Du: Funding acquisition. Xing Lei: Supervision. Ting Ye: Resources. Yingzhou Sun: Writing - original draft. Zewen Shang: Writing - review & editing, Writing - original draft. Zhiqiang Liu: Supervision, Conceptualization. Tianyi Xu: Supervision, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Science and Technology Project of State Grid Corporation of China (52094024003D).

References

[1]
National Electric Power Industry Statistics for 2023. National Development and Reform Commission, National Energy Administration, 2024.
[2]
Action Plan of the 14th Five-Year Plan for Electric Power Production Safety. National Development and Reform Commission (NDRC), National Energy Administration (NEA), 2021.
[3]
Guiding Opinions on Promoting the Development of Smart Grid. National Development and Reform Commission, National Energy, 2015.
[4]
Guiding Opinions on Promoting the Integration of Electricity Source, Grid, Load and Storage and the Development of Multienergy Complementation. National Development and Reform Commission, National Energy Administration, 2021.
[5]
F. Zhang, Research on intelligent methods for condition monitoring and fault diagnosis of distribution electrical equipment, Electr. Times 06 (2024) 95-97.
[6]
Y. Bai, Condition monitoring and fault diagnosis of electrical equipment, Electr. Technol. Soft. Eng. 18 (2021) 227-228.
[7]
M. Jia, G. Liu, S. Xu, et al., Heterogeneous image fusion algorithm and its application in power facility detection, Power Generat. Technol. 45 (03) (2024) 558-565.
[8]
Q. Wang, T. Jin, M.A. Mohamed, et al., A minimum hitting set algorithm with prejudging mechanism for model-based fault diagnosis in distribution networks, IEEE Trans. Instrum. Meas. 69 (7) (2019) 4702-4711.
[9]
Q. Wang, Y. Xiao, U. Dampage, et al., An effective fault section location method based three-line defense scheme considering distribution systems resilience, Energy Rep. 8 (2022) 10937-10949.
[10]
X. Deng, G. Wu, C. Wei, et al., Heterogeneous image fusion algorithm based on multi-source data. (2023) Alexnet neural network fault diagnosis for large power grids based on multisource data fusion, Modern Power 40 (02) (2023) 161-169.
[11]
G. Wang, D. Fu, F. Du, et al., Repetitive pattern based fault diagnosis in large power grids. (2023) Transformer fault acoustic pattern recognition based on repeated pattern extraction and Gaussian mixture model, Guangdong Electric Power 36 (01) (2023) 126-134.
[12]
D. Yu, W. Zhang, H. Wang, Research on abnormal acoustic pattern diagnosis method of oil-immersed transformer based on LSTM neural network, Intelligent Power 51 (02) (2023) 45-52.
[13]
K. Chen, Y. Chen, H. Xu, Research progress of transmission line inspection method based on multimodal data, J. Shanghai Electric Power University 40 (06) (2024) 527-532.
[14]
L. Liu, B. Wang, F. Ma, et al., A concurrent fault diagnosis method of transformer based on graph convolutional network and knowledge graph, Front. Energy Res. 10 (2024) 837553.
[15]
X. Meng, W. Chen, J. Li, Study on detection and localization of single-phase disconnection faults in medium voltage distribution networks, Electronic Measurement Technol. 43 (06) (2020) 32-37.
[16]
J.S. Kim, K.N. Choi, S.W. Kang, Infrared thermal image-based sustainable fault detection for electrical facilities, Sustainability 13 (2) (2021) 557.
[17]
J. Ou, J. Wang, J. Xue, et al., Infrared image target detection of substation electrical equipment using an improved faster R-CNN, IEEE Trans. Power Delivery 38 (1) (2022) 387-396.
[18]
K. Zhang, K.J. Yang, W.L. Huang, et al., Transformer operating condition detection method and verification system based on acoustic pattern recognition, Comput. Technol. Automat. 41 (01) (2022) 1-6.
[19]
T.H. Dwiputranto, N.A. Setiawan, T.B. Adji, DGA-based early transformer fault detection using GA-optimized ANN, in: Proceedings of Technology and Policy in Energy and Electric Power International Conference (ICT-PEP) Asia, Sep 29-30, 2021 in Jakarta Indonesia, 2021, pp. 342-347.
[20]
X. Yang, Z. Wen, Research and application of deep learning in transmission line insulator fault detection, China New Commun. 20 (10) (2018) 208-210.
[21]
B. Jalil, G.R. Leone, M. Martinelli, et al., Fault detection in power equipment via an unmanned aerial system using multi modal data, Sensors 19 (13) (2019) 3014.
[22]
G. Wu, Research on transformer fault acoustic detection and diagnosis method, North China Electric Power University (Beijing), 2021.
[23]
H. Zhou, Research on transformer fault detection method based on acoustic pattern recognition, Zhejiang University of Technology, 2023.
[24]
J. Wu, X. Li, Y. Zhou, An infrared image detection of power equipment based on super-resolution reconstruction and YOLOv4, J. Eng. 10 (2022) 1006-1016.
[25]
C. Yu, Research on transformer fault diagnosis algorithm based on acoustic pattern and its application, North China Electric Power University (Beijing), 2023.
[26]
J. Yang, M. Xin, Q. Feng, et al., Construction of power equipment condition monitoring system based on neural network image recognition technology, in: Proceedings of Intelligent Computing, Communication, and Devices International Conference (ICCD) Asia, Mar 3-5, 2023 in Hongkong China, 2023, pp. 445-451.
[27]
L. Zhang, Research on noise reduction method for infrared images of overhead power lines, Sci. Technol. Innovat. 01 (2022) 85-88.
[28]
G. Jiang, Z. Wan, K. Wang, et al., Anomaly detection by using multimodal deep learning, in: Proceedings of Algorithms, High Performance Computing, and Artificial Intelligence International Conference (AHPCAI) Asia, Aug 18-19, 2023 in Yinchuan China, 2023, pp. 1243-1249.
[29]
B. Liu, R. Shi, H. Deng, et al., OPS-YOLO: a lightweight foreign objectors detection model for overhead power system, in: Proceedings of the 43rd Chinese Control Conference (CCC) Asia, July 27-31, 2024 in Kunming China, 2024, pp. 7899-7904.
[30]
X. Zhang, L. Dong, F. Lin, et al., Research on cable fault types and fault segmentation location of self-powered power supply, High Voltage Electrical Appliances 60 (11) (2024) 116-122.
[31]
D.D. Shipp, T.J. Dionise, V. Lorch, et al., Transformer failure due to circuit-breaker-induced switching transients, IEEE Trans. Ind. Appl. 47 (2) (2010) 707-718.
[32]
Z. Bo, Y. Li, Radial instability of large transformer windings under multiple impact conditions, Transact. China Electrotech. Soc. 32 (S2) (2017) 71-76.
[33]
T. Jin, F. Zhou, M.A. Mohamed, A novel approach based on CEEMDAN to select the faulty feeder in neutral resonant grounded distribution systems, IEEE Trans. Instrum. Meas. 69 (7) (2019) 4712-4721.
[34]
Q. Ou, L. Luo, X. Li, et al., Some understanding and research on the national standard of power transformer short-circuit withstand capacity, Transformer 58 (02) (2021) 11-18.
[35]
J. Zhang, L. Liu, D. Liu, et al., Method for simulating partial short-circuit fault current of transformer coil using field-circuit coupling, Transact. China Electrotech. Soc. 30 (20) (2015) 65-70.
[36]
Z. Wang, S. Zhang, Z. Xu, et al., Test and evaluation of bending resistance of transformer self-bonded transposed conductor under multi-factor conditions, High Voltage Eng. 48 (09) (2022) 3660-3669.
[37]
D. Geißler, T. Leibfried, Short-circuit strength of power transformer windings-verification of tests by a finite element analysis-based model, IEEE Trans. Power Delivery 32 (4) (2016) 1705-1712.
[38]
H. Du, H. Liu, L. Lei, et al., Research on power transformer fault detection based on multiple eigenvalues of vibration signals, Transact. China Electrotech. Society 38 (01) (2023) 83-94.
[39]
H. Liu, C. Hou, Y. Gao, Research on partial discharge diagnosis algorithm of cable accessories based on multi-sensor information fusion, Electr. Appl. Energy Effic. Manag. Technol. 10 (2024) 36-41.
[40]
G. Wang, D. Fu, F. Du, et al., Transformer fault voiceprint recognition based on repetitive pattern extraction and Gaussian mixture model, Guangdong Electric Power 36 (01) (2023) 126-134.
[41]
H. Sun, Z. Li, R. Lin, et al., Two-level electrical fault voiceprint recognition algorithm based on novelty detection, Power Syst. Technol. 45 (07) (2021) 2888-2895.
[42]
Y. Lu, C. Liao, Q. Li, et al., Transformer defect diagnosis method based on voiceprint features and ensemble learning, Electric Power Eng. Technol. 42 (05) (2023) 46-55.
[43]
F. Wang, S. Wang, S. Chen, et al., Transformer voiceprint recognition model based on improved MFCC and VQ, Proc. CSEE 37 (05) (2017) 1535-1543.
[44]
Y. Zhu, S. Ji, F. Zhang, et al., Research on the vibration generation mechanism and influencing factors of power transformers, J. Xi’an Jiaotong Univ. 49 (06) (2015) 115-125.
[45]
P. Zhang, L. Li, F. Ji, et al., Experimental study on vibration and noise reduction of damping elastic body of HVDC anode saturated reactor, Power Syst. Technol. 41 (12) (2017) 3839-3845.
[46]
M. Tao, S. Yao, Y. Dong, et al., Vibration research and optimization scheme of saturable reactor for UHV converter valve, High Voltage Electr. Equipment 55 (12) (2019) 200-204.
[47]
X. Wu, N. Zhou, J. Peng, et al., Analysis of noise characteristics and related factors of power transformer, J. Electric Power Sci. Technol. 33 (03) (2018) 81-85.
[48]
P. Gao, F. Wang, L. Su, et al., Vibration characteristics of power transformer under DC bias, Power Syst. Technol. 38 (06) (2014) 1536-1541.
[49]
D. Zhou, F. Wang, X. Dang, et al., Soundprint recognition of dry-type transformer based on compressed observation and discriminant dictionary learning, Proc. CSEE 40 (19) (2020) 6380-6390.
[50]
S. Kanwal, S. Jiriwibhakorn, Artificial intelligence based faults identification, classification, and localization techniques in transmission lines-a review, IEEE Lat. Am. Trans. 21 (12) (2023) 1291-1305.
[51]
L. Yang, J. Fan, Y. Liu, et al., A review on state-of-the-art power line inspection techniques, IEEE Trans. Instrum. Meas. 69 (12) (2020) 9350-9365.
[52]
Y. He, B. Deng, H. Wang, et al., Infrared machine vision and infrared thermography with deep learning: a review, Infrared Phys. Technol. 116 (2021) 103754.
[53]
C. Xia, M. Ren, B. Wang, et al., Infrared thermography-based diagnostics on power equipment: state-of-the-art, High Voltage 6 (3) (2021) 387-407.
[54]
C. Zhang, W. Li, X. Yang, et al., Power equipment condition detection method based on voiceprint compression and recognition, in: Proceedings of the 4th Power System and Green Energy Conference (PSGEC) Asia, Aug 22-24, 2024 in Shanghai China, 2024, pp. 60-64.
[55]
X. Xu, Q. Xiong, W. Liu, Application of substation secondary intelligent technology in power system fault detection and diagnosis, Electron Technol. 53 (07) (2024) 292-293.
[56]
W. Zhang, X. Peng, R. Chen, et al., Intelligent diagnosis technology of heating defects in power transmission lines based on UAV infrared video, Power Syst. Technol. 38 (05) (2014) 1334-1338.
[57]
L. Jin, D. Zhang, S. Duan, et al., Identification of insulator contamination status based on infrared and ultraviolet image information fusion, Transact. China Electrotech. Soc. 29 (08) (2014) 309-318.
[58]
J. Liu, J. Luo, W. Zhang, et al., Image processing for detection of current safety status of contact network based on infrared camera, Power Syst. Clean Energy 32 (11) (2016) 55-61.
[59]
L. Yuan, R. Zhao, X. Tan, et al., Zero-value insulator detection based on infrared imaging technology, High Voltage Electr. Appl. 54 (02) (2018) 97-102.
[60]
F. Zhang, J. Zhang, T. Li, et al., Power defect detection method based on infrared hotspots, Power Grid and Clean Energy 34 (03) (2018) 46-50.
[61]
J. Li, Y. Zhang, L. Zhao, et al., Research on leakage area detection of porcelain column SF6 circuit breaker based on infrared video image processing, High Voltage Electr. Appl. 54 (12) (2018) 50-55.
[62]
X. Liang, J. Yan, L. Yin, Transmission line fault identification based on infrared images, Electrical Measurement Instrumentation 56 (24) (2019) 99-103.
[63]
X. Wu, B. Sun, Z. Ma, Research on adaptive multi-threshold segmentation of thermal images perceived by distribution network equipment, Electr. Power Informat. Commun. Technol. 19 (09) (2021) 70-76.
[64]
X. Zhang, W. Cai, Y. Wu, et al., Thermal status diagnosis of substation equipment based on infrared detection, Smart Power 49 (09) (2021) 109-116.
[65]
Y. Liu, L. Yang, Y. Wang, et al., Insulator contamination degree detection method based on the fusion of hyperspectral and infrared technology, New Technol. Electr. Eng. Energy 41 (03) (2022) 55-62.
[66]
X. Li, X. Tang, H. Chen, et al., XLPE cable buffer layer ablation defect characteristic gas detection technology and application based on FTIR, High Volt. Technol. 1 (2024) 1-9.
[67]
K. Gao, Y. Yang, X. Yan, et al., Research on quantitative detection technology of sulfur hexafluoride gas leakage based on dual-band infrared imaging, Power Syst. Technol. 49 (01) (2025) 167-176.
[68]
H. Zheng, J. Li, Y. Liu, et al., Infrared target detection model for power equipment based on improved YOLOv3, Trans. China Electrotech. Soc. 36 (07) (2021) 1389-1398.
[69]
H. Zhu, Z. Niu, K. Huang, et al., Target recognition and positioning of infrared images of substation equipment based on single-stage target detection algorithm, Electric Power Automat. Equipment 41 (08) (2021) 217-224.
[70]
T. Chen, Y. Liu, S. Pei, Infrared diagnosis method for power equipment based on improved YOLOv3, Guangdong Electric Power 34 (06) (2021) 21-29.
[71]
J. Wu, L. Liang, X. Ji, et al., Insulator infrared image fault detection method based on YOLOv3 algorithm, Guangdong Electric Power 33 (09) (2020) 77-84.
[72]
D. Wang, Y. Yao, S. Zhang, et al., Photovoltaic module hot spot deep learning detection method based on infrared thermal image, Proc. CSEE 43 (24) (2023) 9608-9616.
[73]
H. Li, C. Zhang, Z. Shi, et al., High-voltage bushing fault identification method based on infrared image detection and improved YOLOv4, High Voltage Electr. Appl. 59 (11) (2023) 24-31.
[74]
R. Zhang, Z. Qiu, Z. Tong, et al., Transformer bushing heating defect detection method based on infrared image target detection and skew correction, China Southern Power Grid Technol. 18 (09) (2024) 59-68.
[75]
D. Yang, W. Duo, S. Li, et al., Terahertz imaging recognition method for internal defects of cable composite insulation structure based on improved YOLOv8, High Voltage Technol. 50 (09) (2024) 4142-4151.
[76]
H. Zeng, B. Yang, T. Xu, et al., Transformer bushing heating defect recognition based on infrared image target detection and temperature feature extraction, Guangdong Electric Power 36 (03) (2023) 99-106.
[77]
Y. Zhou, J. Hu, H. Xu, et al., An infrared image target detection algorithm for porcelain post insulators, Zhejiang Electric Power 42 (11) (2023) 78-85.
[78]
R. Huang, M. Dai, Y. Zheng, et al., Infrared image defect detection of power equipment, China Electric Power 54 (02) (2021) 147-155.
[79]
H. Liu, W. Mo, Y. Tan, et al., Transmission line equipment detection method based on direction adaptive detector, Power Syst. Technol. 45 (12) (2021) 4888-4895.
[80]
Y. Bie, B. Li, J. Jiang, et al., Intelligent recognition method of oil level in infrared image of transformer bushing based on improved SSD, Electric Power Eng. Technol. 40 (05) (2021) 158-163.
[81]
X. Wang, H. Li, S. Fan, et al., Automatic detection method of infrared image anomalies of power equipment based on improved SSD, Trans. China Electrotech. Soc. 35 (S1) (2020) 302-310.
[82]
Y. Liu, S. Pei, J. Wu, et al., Infrared image target detection method of abnormal heating points of power transmission and transformation equipment based on deep learning, China Southern Power Grid Technol. 13 (02) (2019) 27-33.
[83]
W. Li, Y. Mao, X. Liao, et al., Intelligent diagnosis method of voltage-induced thermal defects of substation equipment infrared images based on rotating target detection, High Voltage Technology 47 (09) (2021) 3246-3253.
[84]
D. Chen, W. Tang, Z. Niu, Infrared image fault diagnosis method for power equipment based on deep learning, Guangdong Electric Power 34 (01) (2021) 97-105.
[85]
Q. Xu, H. Huang, X. Zhang, et al., Online fault diagnosis method for infrared image feature analysis of high-voltage lead connector based on improved regional full convolutional network, Transactions of China Electrotechnical Society 36 (07) (2021) 1380-1388.
[86]
Y. Wang, Z. Shen, H. Zhao, et al., Photovoltaic module infrared image fault diagnosis method based on improved CNN-SVM, Journal of North China Electric Power University (Natural Science Edition) 51 (03) (2024) 110-117.
[87]
X. Yu, H. Sun, Fault diagnosis of substation equipment based on image processing and semi-supervised learning, Power System and Clean Energy 38 (08) (2022) 60-68.
[88]
B. Li, H. Liu, S. Wang, et al., Research on infrared image segmentation and low zero value fault diagnosis of insulator based on GANR-UNet, Journal of North China Electric Power University (Natural Science Edition) 1 (2024) 1-11.
[89]
L. Guo, W. Cao, L. Bai, et al., EPR cable terminal fault diagnosis method integrating attention mechanism and multi-scale network, High Volt. Technol. 47 (11) (2021) 3872-3880.
[90]
J. Wang, L. Sun, C. Liu, et al., Intelligent diagnosis of transformer insulation bushing fault based on improved Mask R-CNN, Zhejiang Electric Power 41 (08) (2022) 87-94.
[91]
J. Redmon, S. Divvala, R. Girshick, et al., You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Computer Vision and Pattern Recognition Conference (CVPR) North America, Jun 27-30, 2016 in Las Vegas USA, 2016, pp. 779-788.
[92]
G. Jocher, A. Chaurasia, J. Qiu, Ultralytics YOLOv8, Ultralytics, 2023. https://github.com/ultralytics/ultralytics (Accessed: 2023-10-01).
[93]
K. Duan, S. Bai, L. Xie, et al., Centernet: Keypoint triplets for object detection, in: Proceedings of the IEEE Computer Vision International Conference (ICCV) Europe, Oct 27-Nov 2, 2019 in Seoul Korea (South), 2019, pp. 6569-6578.
[94]
W. Liu, D. Anguelov, D. Erhan, et al., SSD: Single shot multibox detector, in: Proceedings of the 14th European Computer Vision Conference (ECCV) Europe, Oct 11-14, 2016 in Amsterdam Netherlands, 2016, pp. 21-37.
[95]
R. Girshick, J. Donahue, T. Darrell, et al., Region-based convolutional networks for accurate object detection and segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 38 (1) (2015) 142-158.
[96]
R. Girshick, Fast R-CNN, in: Proceedings of the IEEE Computer Vision International Conference (ICCV) Europe, Dec 7-13, 2015 in Santiago Chile, 2015, pp. 1440-1448.
[97]
K. He, G. Gkioxari, P. Dollár, et al., Mask R-CNN, in: Proceedings of the IEEE Computer Vision International Conference (ICCV) Europe, Oct 22-29, 2017 in Venice Italy, 2017, pp. 2961-2969.
[98]
S. Davis, P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech Signal Process. 28 (4) (1980) 357-366.
[99]
L. Chen, R. Wang, F. Hu, et al., Research on Voice Print Recognition of Electrical Faults based on Attention-MFCC Algorithm, in: Proceedings of the 2021 Power System and Green Energy Conference (PSGEC) Asia, Aug 20-22, 2021 in Shanghai China, 2021, pp. 748-751.
[100]
Y. Xiao, X. Fu, S. Yang, et al., Application of voiceprint recognition in condition monitoring of power plant equipment, Internet Things Technol. 12 (10) (2022) 4-7, +11.
[101]
J. Cui, H. Ma, Voiceprint recognition model of transformer core looseness fault based on improved MFCC and 3D-CNN, Electric Machines and Control 26 (12) (2022) 150-160.
[102]
Z. Xiang, H. Wei, Research on power transformer operation and maintenance detection technology based on voiceprint feature recognition, Electr. Design Eng. 31 (20) (2023) 114-118.
[103]
G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504-507.
[104]
B. Ayoub, K. Jamal, Z. Arsalane, Gammatone frequency cepstral coefficients for speaker identification over VoIP networks, in: Proceedings of the 2016 Information Technology for Organizations Development International Conference (IT4OD) Africa, Mar 30-Apr 1, 2016, in Fez Morocco, 2016, pp. 1-5.
[105]
M. Zhou, Z. Tang, Z. Wang, et al., Study on Ultrasonic Signal Recognition of Partial Discharge based on Voiceprint Recognition System, High Volt. Apparatus 58 (09) (2022) 127-133.
[106]
X.J. Dang, F.H. Wang, W.J. Ma, Fault diagnosis of power transformer by acoustic signals with deep learning, in: Proceedings of the 2020 IEEE High Voltage Engineering and Application International Conference (ICHVE) Asia, Sep 6-10, 2020, in Beijing China, 2024, pp. 1-4.
[107]
Q. Geng, F. Wang, X. Jin, Mechanical fault sound diagnosis based on GFCC and random forest optimized by whale algorithm for dry type transformer, Electr. Power Automat. Equipment 40 (08) (2020) 191-196+224+197-199.
[108]
Q. Geng, F. Wang, D. Zhou, Mechanical fault diagnosis of power transformer by GFCC time-frequency map of acoustic signal and convolutional neural network, in: Proceedings of the 2019 IEEE Sustainable Power and Energy Conference (iSPEC) Asia, Nov 21-23, 2019, in Beijing China, 2019, pp. 2106-2110.
[109]
A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, p. 25.
[110]
K. Dragomiretskiy, D. Zosso, Variational mode decomposition, IEEE Trans. Signal Process. 62 (3) (2013) 531-544.
[111]
Q.Chen, Harmonic detection method based on VMD, Electr.Measur.Instrument.55 (2) (2018) 59-65. [百度学术]
[112]
J.Yu, Research and application of intelligent voiceprint monitoring and diagnosis system for abnormal faults of power transformers, Technol.Innovat.Appl.14 (08) (2024) 149-152. [百度学术]
[113]
Z.Gao, D.Wang, D.Lin, Intelligent monitoring technology for substation switch fault based on voiceprint recognition, Inform.Comput.36 (06) (2024) 143-147.
[114]
S.Shan, J.Liu, S.Wu, et al., A motor bearing fault voiceprint recognition method based on Mel-CNN model, Measurement 207 (2023) 112408.
[115]
S.Ali, S.Tanweer, S.S.Khalid, et al., Mel frequency cepstral coefficient: a review, ICIDSSD (2020).
[116]
C.A.Gong, C.S.Su, Y.E.Liu, et al., Deep learning with LPC and wavelet algorithms for driving fault diagnosis, Sensors 22 (18) (2022) 7072.
[117]
S.Jain, B.Kishore, Comparative study of voice print based acoustic features: MFCC and LPCC, Int.J.Adv.Eng., Manag.Sci.3 (4) (2017) 313-315.
[118]
A.Graves, Long short-term memory, Super.Sequen.Labell.Recurr.Neural Networks (2012) 37-45.
[119]
D.Yu, W.Zhang, H.Wang, Research on transformer voiceprint anomaly detection based on data-driven, Energies 16 (5) (2023) 2151.
[120]
X.Qi, L.Shi, X.Li, et al., Transformer voiceprint feature extraction and fault recognition based on MFCC and deep learning, in: Proceedings of 2023 IEEE 5th Power, Intelligent Computing and Systems International Conference (ICPICS) Asia, July 14-16, 2023 in Shenyang, China, 2023, pp.820-825.
[121]
K.Cho, B.Merrienboer, C.Gulcehre, et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: Proceedings of Empirical Methods in Natural Language Processing Conference (EMNLP) Asia, Oct 25-29, 2014 in Doha Qatar, 2014, pp.1724-1734.
[122]
C.Yao, H.Xin, Z.Xin, Deep neural network based acoustic pattern recognition system for fault localization application, Appl.Math.Nonl.Sci.9 (1) (2023).
[123]
J.Abulizi, Z.Chen, P.Liu, et al., Research on voiceprint recognition of power transformer anomalies using gated recurrent unit, in: Proceedings of the 2021 Power System and Green Energy Conference (PSGEC) Asia, Aug 20-22, 2021 in Shanghai China, 2021, pp.743-747.
[124]
H.Li, Q.Yao, X.Li, Voiceprint fault diagnosis of converter transformer under load influence based on multi-strategy improved mel-frequency spectrum coefficient and temporal convolutional network, Sensors 24 (3) (2024).
[125]
H.Lu, K.Zhang, S.Han, A comparison of CNN-based transformer fault diagnosis methods based on voiceprint signal, in: Proceedings of the 2024 IEEE 13th Data Driven Control and Learning Systems Conference (DDCLS) Asia, May 17-19, 2024 in Kaifeng, China, 2024, pp.819-824.
[126]
A.Vaswani, N.Shazeer, N.Parmar, et al., Attention is all you need, Adv.Neural Inf.Proces.Syst.30 (2017).
[127]
J.Jiao, J.Li, Cable terminal defect diagnosis method based on improved residual network, Electric Drive 53 (11) (2023) 31-36.
[128]
M.A.Hearst, S.T.Dumais, E.Osuna, et al., Support vector machines, IEEE Intell.Syst.Their Appl.13 (4) (1998) 18-28.
[129]
J.Wang, Z.Zhao, J.Zhu, et al., Improved support vector machine for voiceprint diagnosis of typical faults in power transformers, Machines 11 (5) (2023) 539.
[130]
G.Liu, L.Gao, L.Yu, et al., Research on Transformer Fault Diagnosis based on Voiceprint Signal, in: Proceedings of the Journal of Physics: Conference Series.North America, May 18-22, 2025 in Montreal Canada, 2024, pp.101-109.

Author

Xizhou Du

Xizhou Du received a bachelor's degree from Tianjin University in 1999. He works at State Grid Shanghai Municipal Electric Power Company. His research interests include power system automation and equipment maintenance.

Publish Info

Published: 2025-10-26

Reference: Xizhou Du, Xing Lei, Ting Ye, et al. (2025) A review of research on intelligent fault detection of power equipment based on infrared and voiceprint: methods, applications and challenges. Global Energy Interconnection, 8(5): 821-846.
