Minority Oversampling in Kernel Adaptive Subspaces for Class Imbalanced Datasets

Abstract:

The class imbalance problem in machine learning occurs when certain classes are underrepresented relative to the others, leading to a learning bias toward the majority classes. To cope with the skewed class distribution, many learning methods featuring minority oversampling have been proposed and have proved effective. To reduce information loss during feature space projection, this study proposes a novel oversampling algorithm, named minority oversampling in kernel adaptive subspaces (MOKAS), which exploits the invariant feature extraction capability of a kernel version of the adaptive subspace self-organizing map. Synthetic instances are generated from well-trained subspaces, and their pre-images are then reconstructed in the input space. These instances characterize nonlinear structures present in the minority class data distribution and help learning algorithms counterbalance the skewed class distribution. Experimental results on both real and synthetic data show that the proposed MOKAS is capable of modeling complex data distributions and outperforms a set of state-of-the-art oversampling algorithms.

Existing System:

With the growing attention devoted to the imbalanced learning problem, several strategies have been proposed, which can be roughly divided into two categories: algorithm-level methods and data-level methods. Some of the algorithm-level methods use cost-sensitive learning, in which the imbalance present in the dataset is counterbalanced by assigning a higher cost to misclassification of minority class instances and a lower cost to misclassification of majority class instances.
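The cost assignment described above can be illustrated with a minimal sketch. It assumes one common heuristic that is not taken from this study: each class receives a cost inversely proportional to its frequency. The function names `balanced_class_costs` and `weighted_error` are hypothetical and used only for illustration.

```python
from collections import Counter


def balanced_class_costs(labels):
    """Assumed heuristic: cost of a class = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_error(y_true, y_pred, costs):
    """Cost-weighted error rate: each mistake is charged the cost of its true class."""
    total = sum(costs[t] for t in y_true)
    wrong = sum(costs[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Toy data: 90 majority instances and 10 minority instances.
labels = ["maj"] * 90 + ["min"] * 10
costs = balanced_class_costs(labels)

# A classifier that always predicts the majority class.
always_majority = ["maj"] * len(labels)
plain_error = sum(t != p for t, p in zip(labels, always_majority)) / len(labels)
cost_error = weighted_error(labels, always_majority, costs)
```

On this toy data, a classifier that always predicts the majority class has a plain error rate of 0.1 but a cost-weighted error of 0.5, so ignoring the minority class is no longer cheap.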

On the other hand, data-level methods establish class balance through data resampling techniques such as undersampling of the majority class, oversampling of the minority class, or a combination of both. This study emphasizes oversampling techniques because they do not discard informative and important instances, which undersampling algorithms may do when rejecting majority class instances.

Proposed System:

In SMOTE, the minority class is oversampled by taking each minority class instance and generating synthetic instances along the line segments joining it to any or all of its nearest minority class neighbors.
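The interpolation step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not a reference implementation: the function name `smote`, its parameters, and the round-robin choice of base instance are assumptions made here for brevity.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points on segments between minority points and their neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for t in range(n_synthetic):
        # Visit minority instances in turn (assumed round-robin order).
        i = t % len(minority)
        x = minority[i]
        # Indices of the k nearest minority neighbors of x, excluding x itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        z = minority[rng.choice(neighbors)]
        # Pick a random position on the segment from x to z.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, z)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=8, k=2)
```

Because every synthetic point lies between two existing minority instances, the new points stay inside the region already spanned by the minority class.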

The synthetic instances can be generated in a less application-specific manner by operating in the feature space rather than the data space. In ADASYN and MWMOTE, advanced mechanisms for identifying hard-to-learn minority class instances are proposed to improve classifier learning efficiency.
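As a rough illustration of how hard-to-learn minority instances might be identified, the sketch below follows the ADASYN idea as commonly described: a minority instance whose nearest neighbors are mostly majority instances receives a larger share of the synthetic samples. The function name `adasyn_allocation` and its parameters are hypothetical, and MWMOTE's weighting scheme is not covered.

```python
import math


def adasyn_allocation(minority, majority, n_synthetic, k=5):
    """Split n_synthetic across minority points by the majority share of their k neighbors."""
    # Pool of (point, is_majority) pairs; minority points come first so indices line up.
    pool = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    k = min(k, len(pool) - 1)
    ratios = []
    for idx, x in enumerate(minority):
        others = [pool[j] for j in range(len(pool)) if j != idx]
        others.sort(key=lambda item: math.dist(x, item[0]))
        # Fraction of the k nearest neighbors that belong to the majority class.
        ratios.append(sum(is_maj for _, is_maj in others[:k]) / k)
    total = sum(ratios)
    if total == 0:
        # No minority point has majority neighbors; nothing is flagged as hard to learn.
        return [0] * len(minority)
    # Rounding means the counts may not sum exactly to n_synthetic.
    return [round(n_synthetic * r / total) for r in ratios]


minority = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
majority = [[5.0, 5.1], [5.1, 5.0], [4.9, 5.0], [9.0, 9.0]]
allocation = adasyn_allocation(minority, majority, n_synthetic=8, k=2)
```

In this toy example the third minority instance sits inside a cluster of majority instances, so it is allocated more synthetic samples than the two instances that have a minority neighbor nearby.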
