\(A^{2}\)DM: Enhancing EEG Artifact Removal by Fusing Artifact Representation into the Time-Frequency Domain

Validation Metrics

To quantitatively evaluate the performance of the models, three metrics are used on the test dataset: RRMSE in the temporal domain (\(RRMSE_t\)), RRMSE in the spectral domain (\(RRMSE_f\)), and the correlation coefficient (CC).

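$$\begin{aligned} RRMSE_t=\frac{\textit{RMS}(\hat{y}-y)}{\textit{RMS}(y)}, \end{aligned}$$

(8)

$$\begin{aligned} RRMSE_f=\frac{\textit{RMS}(\textit{PSD}(\hat{y})-\textit{PSD}(y))}{\textit{RMS}(\textit{PSD}(y))}, \end{aligned}$$

(9)

$$\begin{aligned} \textrm{CC}=\frac{\textit{Cov}(\hat{y}, y)}{\sqrt{\textit{Var}(\hat{y}) \textit{Var}(y)}}. \end{aligned}$$

(10)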

where RMS is defined as above in Section “Dataset,” PSD denotes the power spectral density of the input data; Var and Cov denote the variance and covariance, respectively.
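The three metrics can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' evaluation code. It assumes the usual root-mean-square definition for RMS (the paper defines it in the "Dataset" section, which is not reproduced here) and uses a plain periodogram as the PSD estimator, which is an assumption because the text does not say which estimator is used. All function names are hypothetical.

```python
import numpy as np


def rms(x):
    """Root mean square of a 1-D signal (assumed standard definition)."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))


def psd(x):
    """Periodogram PSD estimate (assumed estimator, not stated in the paper)."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.rfft(x)) ** 2 / x.size


def rrmse_t(y_hat, y):
    """Eq. (8): relative RMS error in the temporal domain."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return rms(y_hat - y) / rms(y)


def rrmse_f(y_hat, y):
    """Eq. (9): relative RMS error between the two PSDs."""
    return rms(psd(y_hat) - psd(y)) / rms(psd(y))


def cc(y_hat, y):
    """Eq. (10): correlation coefficient between output and ground truth."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((y_hat - y_hat.mean()) * (y - y.mean()))
    return float(cov / np.sqrt(y_hat.var() * y.var()))


# Toy check: a 10 Hz sine sampled at 256 Hz stands in for the ground truth.
time = np.arange(512) / 256.0
clean = np.sin(2 * np.pi * 10 * time)
scaled = 2.0 * clean  # deliberately wrong "denoised" output with the right shape

print(rrmse_t(scaled, clean), rrmse_f(scaled, clean), cc(scaled, clean))
```

The toy output `scaled` has the correct waveform but the wrong amplitude, so CC stays at 1 while both RRMSE values are large. This is the same kind of disagreement between CC and RRMSE that the results below report for the simple CNN.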

Table 2 Comparison of experiment results of the models

Fig. 4

Denoising performance of different methods on different artifacts from the test set; the first and second rows show the EOG and EMG denoising results, respectively. The noisy EEG (input), the denoised EEG (output), and the pure EEG (ground truth) are shown with the light blue line, green line, and orange line, respectively

Performance Evaluation on Semi-Synthetic EEG

We also evaluated classical artifact removal methods, specifically the adaptive filter and the Hilbert-Huang transform (HHT), on the same dataset. Our results indicate that deep learning-based techniques significantly outperform these traditional methods in artifact removal, which we attribute to the robust data-fitting capabilities of deep learning models.

To validate the performance of \(A^2\)DM, we compared it with five denoising approaches: DeepSeparator [31], novel CNN [16], FCNN [1], simple CNN [1], and complex CNN [1]. The experiment results are shown in Table 2. \(A^2\)DM achieves the best performance, with \(RRMSE_t\) and \(RRMSE_f\) of 0.6869 and 0.5314, respectively. This shows that our model can effectively remove multiple types of artifacts.

Simple CNN achieves the highest CC scores but performs poorly in RRMSE metrics. Our analysis suggests that this outcome may be attributed to the fact that shallow neural networks exhibit greater sensitivity to the trend information in temporal data, allowing them to learn structural information more effectively. Consequently, the denoised signals produced by shallow networks closely resemble the ground truth. In Section “The Insight of Shallow CNN Model for Denoise Task,” we conducted a further analysis of this phenomenon.

Fig. 5

The time-frequency analysis for different denoise models. We plot the spectrogram of the input EEG (a), the pure EEG as ground truth (b), and the output EEG denoised by FCNN (c), complex CNN (d), novel CNN (e), our \(A^2\)DM (f), respectively. The first and second rows show the EOG and EMG spectrogram analysis results, respectively

Table 3 Ablation study on effects of modules for \(A^2\) DM

We find that \(A^2\)DM achieves 15% higher CC scores than novel CNN. This may be due to the inconsistent length of the model input signals. For a fair comparison, we removed the blocks with 2048 channels from the novel CNN, obtaining a reduced novel CNN with the same number of denoise blocks as \(A^2\)DM. Our model also shows a significant improvement over this reduced novel CNN.

To further validate the effectiveness of \(A^2\)DM, we conduct a visual analysis of the denoising results of the various models. Specifically, two distinct EEG segments containing EOG and EMG artifacts are randomly selected from the test set to showcase the practical outcomes of the denoising process. The resulting denoised EEG is shown in Fig. 4. \(A^2\)DM can effectively remove both EOG and EMG artifacts. Furthermore, we observe that the denoised EEG (green line) generated by \(A^2\)DM follows the ground truth (orange line) in both amplitude and trend, setting it apart from the alternative models. We attribute this consistency between the ground truth and the denoised results to the superior ability of \(A^2\)DM to preserve the global properties of the EEG data.

In addition, we conduct a time-frequency analysis to evaluate denoising performance. Spectrograms are generated from the input EEG segment, the pure EEG, and the EEG denoised by the different models, as presented in Fig. 5. The denoised EEG segment produced by \(A^2\)DM exhibits the highest degree of similarity to the pure EEG. This observation suggests that \(A^2\)DM is effective in capturing the crucial modes of EEG data, enabling accurate reconstruction from the noisy data. We find that \(A^2\)DM tends to remove information in the 20–30 Hz frequency range. This could be attributed to our FEM, which leads the model to filter out noise more aggressively while retaining important frequency-domain information. Our model nevertheless outperforms the other methods in artifact removal. Moreover, it preserves the semantic information of the original signal, especially in the low-frequency region, for downstream tasks. This is a key advantage of \(A^2\)DM: the FEM retains task-relevant information while selectively removing artifacts.

Fig. 6

The temporal RRMSE (\(RRMSE_t\)), spectral RRMSE (\(RRMSE_f\)), and correlation coefficient (CC) results for EEG denoising at multiple SNR levels. The SNR ranges from -7 dB to 2 dB

Ablation Studies

To evaluate the efficacy of each module in the proposed \(A^2\)DM, we conducted ablation studies. Owing to the inherent flexibility in the design of FEM and TCM, we could remove them from \(A^2\)DM without altering the model’s overall architecture. As Table 3 shows, performance decreases by around 3% when FEM is removed (model 2) or when TCM is removed (model 3). This result implies that incorporating FEM and TCM is beneficial for \(A^2\)DM.

Next, we analyze FEM and its impact on denoising results using \(RRMSE_f\) as the evaluation metric. Compared to model 2, model 3 achieves a better \(RRMSE_f\) after incorporating the FEM module. This improvement is due to FEM’s ability to remove global information. Additionally, the inclusion of TCM enhances the denoising performance of \(A^2\)DM, highlighting the complementary roles of FEM and TCM.

Study of Artifact Representation

To demonstrate the efficacy of the artifact representation generated by the AAM, we randomly selected 20 samples from the test set for each class of segments and extracted the corresponding artifact representations. We visualized the resulting artifact representations using t-SNE [35] and UMAP [36], as depicted in Fig. 7. The clustering of artifacts of the same type indicates that the AAM successfully captures the representation of artifacts in the EEG data.

Furthermore, we introduced a variant (model 4) without the artifact representation by removing the FEM and TCM. The performance of model 4 decreased by 4.9% in CC compared to \(A^2\)DM, which highlights the significance of artifact representations in enabling \(A^2\)DM to handle multiple artifacts effectively.

Fig. 7

t-SNE (a) and UMAP (b) visualization of the artifact representations. Each dot is an artifact representation projected into 2D space. The ten types of artifact representation are shown in different colors

Figure 6 presents the denoising results of the baseline models and \(A^2\)DM at multiple SNR levels. We observe that as the SNR level increases, the denoising performance of \(A^2\)DM improves (\(RRMSE_t\): mean=0.6994, std=0.045), while the performance of novel CNN remains largely unchanged (\(RRMSE_t\): mean=0.7608, std=0.003). This suggests that \(A^2\)DM is adaptable to multiple artifact spaces, whereas the novel CNN models share a single artifact space during the denoising process. Moreover, we observed that simple CNN and complex CNN exhibit higher sensitivity to SNR levels. In Section “The Insight of Shallow CNN Model for Denoise Task,” we discuss the mechanisms of shallow models for denoising tasks.
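For readers who want to reproduce a sweep over SNR levels like the one in Fig. 6, the sketch below mixes a clean segment with an artifact segment at a target SNR. It assumes the definition \(SNR = 10\log_{10}(\textit{RMS}(x)/\textit{RMS}(\lambda n))\), which is common for semi-synthetic EEG benchmarks but is an assumption here because the "Dataset" section is not reproduced. The function name `mix_at_snr` and the random stand-in signals are hypothetical.

```python
import numpy as np


def rms(x):
    """Root mean square of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))


def mix_at_snr(clean, artifact, snr_db):
    """Return clean + lam * artifact, with lam chosen so that
    10 * log10(rms(clean) / rms(lam * artifact)) equals snr_db
    (assumed SNR definition, see the lead-in)."""
    clean = np.asarray(clean, dtype=float)
    artifact = np.asarray(artifact, dtype=float)
    lam = rms(clean) / (rms(artifact) * 10.0 ** (snr_db / 10.0))
    return clean + lam * artifact, lam


# Random stand-ins for a clean EEG segment and an artifact segment.
rng = np.random.default_rng(0)
clean = rng.standard_normal(512)
artifact = rng.standard_normal(512)

snr_levels = list(range(-7, 3))  # -7 dB to 2 dB, the range shown in Fig. 6
achieved = []
for snr_db in snr_levels:
    noisy, lam = mix_at_snr(clean, artifact, snr_db)
    achieved.append(10.0 * np.log10(rms(clean) / rms(lam * artifact)))
```

Each entry of `achieved` should match the requested level up to floating-point error, which is a quick sanity check before feeding `noisy` to any denoising model.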

Analysis of Frequency Enhancement Module

In this section, we analyze the artifact removal process performed by the FEM on EOG and EMG signals separately, using the test dataset for evaluation. Specifically, we compute and visualize the probability density distribution of the selector matrix S, generated by the FEM component within the Denoise Block 1 of the \(A^2\)DM, as shown in Fig. 8a. The horizontal axis represents the mode of information retention, where 0 corresponds to complete removal and 1 corresponds to full preservation. A greater tendency towards 0 indicates that more mode information is removed during denoising, while a tendency towards 1 reflects the retention of more modes.

As illustrated in Fig. 8a, the EMG test dataset exhibits a prominent peak around a mean value of 0.45, significantly higher than that of the EOG dataset. This indicates that the FEM removes more mode information when processing EMG artifacts, likely due to the broader frequency domain distribution of EMG artifacts compared to EOG artifacts.

Conversely, the FEM retains more mode information when addressing EOG artifacts, as indicated by a peak around a mean value of 1. This can be attributed to the narrower frequency range of disturbances associated with EOG artifacts relative to EMG artifacts. These findings highlight the FEM’s capability to adaptively discriminate between different types of artifacts based on their frequency distribution characteristics.

To further evaluate the FEM’s sensitivity to the removal of EOG and EMG artifacts in the low-frequency domain, we analyzed the distribution of removed modes. Figure 8b depicts the histogram of removed modes for EMG artifact processing, while Fig. 8c shows the corresponding histogram for EOG artifact processing. Here, the horizontal axis represents various modes, and the vertical axis denotes the average mask values across these modes.

Our analysis reveals that within the 5–12 Hz range, the FEM removes more low-frequency information for EOG artifacts than for EMG artifacts. This indicates that the FEM effectively identifies and removes the distribution range of EOG artifacts. Furthermore, this targeted removal approach, based on hard attention mechanisms, can be effectively complemented by the TCM, enabling a more refined denoising process.

Fig. 8

Analysis of the frequency enhancement module. a The probability density distribution of the selector matrix S generated by the FEM on the test set. b and c show histograms of the distribution of the selector matrix S in the low-frequency domain, corresponding to the EMG and EOG test sets, respectively

The Effectiveness of Hard Attention in FEM

We validate the advantages of the hard attention mechanism in FEM for artifact removal tasks by replacing it with a soft attention mechanism that rescales the frequency-domain modes within the FEM. Specifically, the artifact representation is processed through the MLP layer and mapped via a sigmoid function to learn a set of modulation factors. This means that, instead of the discrete values of the selector matrix S, the factors take continuous values. These factors are used to scale the frequency-domain modes while the rest of the network structure remains unchanged, resulting in the soft attention-\(A^2\)DM model.

As shown in Table 4, the soft attention-\(A^2\)DM model exhibits a decline in denoising performance across all three metrics compared to the \(A^2\) DM model. This phenomenon can be explained from a frequency domain perspective. Different artifacts and original data are distributed across distinct frequency modes. The FEM with a hard attention mechanism can adaptively remove the modes containing artifact features. However, the soft attention mechanism dynamically adjusts the distribution of data across different modes without directly eliminating artifact features, resulting in less effective artifact removal compared to hard attention.
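The difference between the two mechanisms can be illustrated on a toy signal. The sketch below is not the FEM itself: it assumes that per-mode scores come from a sigmoid, that hard attention keeps a mode when its score exceeds a threshold t, and that soft attention multiplies each mode by its score. The logits are hand-set for illustration, whereas in the model they would come from the MLP over the artifact representation. The name `attend_modes` and every parameter are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attend_modes(x, logits, t=0.4, hard=True):
    """Weight the rFFT modes of x by sigmoid(logits).

    hard=True  -> binary selector (score > t), modes are kept or dropped.
    hard=False -> continuous modulation factors in (0, 1).
    """
    x = np.asarray(x, dtype=float)
    weights = sigmoid(np.asarray(logits, dtype=float))
    if hard:
        weights = (weights > t).astype(float)
    modes = np.fft.rfft(x)
    return np.fft.irfft(modes * weights, n=x.size), weights


# One second at 256 Hz, so rFFT bin k corresponds to k Hz.
fs = 256
time = np.arange(fs) / fs
eeg = np.sin(2 * np.pi * 10 * time)             # toy "clean" component
artifact = 0.8 * np.sin(2 * np.pi * 60 * time)  # toy narrow-band artifact
noisy = eeg + artifact

# Hand-set logits: strongly reject the 60 Hz mode, accept the rest.
logits = np.full(fs // 2 + 1, 4.0)
logits[60] = -4.0

hard_out, hard_w = attend_modes(noisy, logits, hard=True)
soft_out, soft_w = attend_modes(noisy, logits, hard=False)

hard_err = float(np.sqrt(np.mean((hard_out - eeg) ** 2)))
soft_err = float(np.sqrt(np.mean((soft_out - eeg) ** 2)))
```

In this constructed case the hard selector zeroes the artifact mode exactly, while the soft factors leave a small residue of it and slightly attenuate the retained modes. The example is built to make that mechanism visible and says nothing about the size of the effect reported in Table 4.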

Table 4 Comparison of experiment results of the variant models

Fig. 9

Comparisons between the two versions of \(A^2\)DM. a depicts the framework of our \(A^2\)DM, b is a dual-stream network consisting of FEM and TCM, and c is an inverted denoise block used in the variants of \(A^2\)DM

Fig. 10

Schema of the real-data experiment

The Effectiveness of Time-Domain Compensation Module

In this subsection, we analyze the impact of TCM on the denoising model. Figure 9a shows the architecture of the \(A^2\)DM reported in the paper, where the FEM is executed before the TCM and the overall network structure is serial. We investigate how the positioning of FEM and TCM affects the denoising model’s performance. The inverted denoising block is shown in Fig. 9b, where the TCM is executed before the FEM. As shown in Table 4, this structure performs worse than \(A^2\)DM. Placing FEM after TCM results in information loss due to the hard attention mechanism, thereby reducing denoising performance. This implies that, in the original ordering, the TCM supplements the EEG information lost in the FEM, further enhancing the model’s performance. The time-domain and frequency-domain representations of the EEG signal are thus complementary in \(A^2\)DM.

Moreover, we analyze the relationship between the frequency and the time domain representation within the FEM and TCM.

Figure 9c depicts a dual-stream network architecture where FEM and TCM operate in parallel, with the output features averaged across the two branches. In the experiment, we observe that the FEM branch converges faster than the TCM branch. In addition, the FEM extracts features in the frequency domain, while the TCM extracts features in the time domain. The data distribution gap between the two branches further reduces the performance of the network. As Table 4 shows, the model performance is reduced by 4.6% in CC compared to \(A^2\)DM. We therefore adopt the serial setting in \(A^2\)DM.

Performance Evaluation on Real EEG

To evaluate our method on real data, we conducted the following experiments. We selected the MMCNN [34] model, which contains five modules, for the classification task. To accelerate training, we only used the EIN-a module.

As seen in Fig. 10, we added noise to the BCI-2a data to obtain the noisy EEG data. We divided the noisy BCI-2a data into training and test sets. The training set is used to retrain the denoising model, which is then applied to the test set to produce denoised BCI-2a data. The denoised BCI-2a data are used to complete the motor imagery classification task in the EIN-a module. The accuracy of the model is 59.37%.

To evaluate the effectiveness of the \(A^2\) DM, we first trained the EIN-a module using the noisy BCI-2a data as the baseline model. Additionally, we generated denoised data using different denoising models for the motor imagery classification task. For comparison, we selected the complex CNN [1] and novel CNN [16] models. We employed cross-validation to further reduce subject bias.

As can be seen from Table 5, the model trained on data denoised by \(A^2\)DM improved the classification accuracy by 3.36% over the baseline model and by 2.47% over the novel CNN. These experiments demonstrate the effectiveness of the denoising model.

The Generalization of Hyperparameter t

To select the hyperparameter and evaluate the generalization of t, we split the dataset into training, validation, and test sets with a ratio of 8:1:1. Based on validation set performance, we set t=0.4 and evaluated it on the test dataset, as shown in Fig. 11. This setting yielded the best denoising performance compared to other thresholds.
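The selection protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions: the shuffling, the seed, the candidate grid, and the function names are all hypothetical, and `toy_validation_error` is a placeholder curve, not a measured result. In practice it would train or evaluate the denoising model with threshold t and return a validation metric such as \(RRMSE_t\).

```python
import numpy as np


def split_indices(n_samples, ratios=(8, 1, 1), seed=0):
    """Shuffle sample indices and split them by the given ratios."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test


def select_threshold(candidates, validation_error):
    """Return the candidate threshold with the lowest validation error."""
    return min(candidates, key=validation_error)


train_idx, val_idx, test_idx = split_indices(1000)


def toy_validation_error(t):
    # Placeholder only: stands in for evaluating the model on val_idx.
    return (t - 0.5) ** 2


candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best_t = select_threshold(candidates, toy_validation_error)
```

Keeping the test indices out of `select_threshold` is the point of the design: the threshold is fixed using validation data only, and the test set is touched once, for the final report.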

The threshold parameter t determines the number of modes to be enhanced in the EEG frequency domain. Our experiments show that comparable results are achieved when t is set to 0.4 or 0.7. This suggests that the model may have an inherent robustness to noise or redundant information, allowing it to perform well under different filtering conditions.

It is worth noting that a larger t value may lead to the discarding of important features, resulting in a loss of feature information. Conversely, a smaller t value may introduce noise from the high-frequency domain, thereby reducing denoising performance. In light of the above considerations and the need to optimize the generalization ability of the model, we choose t=0.4. It is worth mentioning that the choice of t can be reassessed based on the performance of the downstream task.

Table 5 Performance comparison of the MMCNN model trained on denoised data generated by different denoising models

Fig. 11

Results of the evaluation metrics using different selection thresholds on the test set
