A Correlation based Feature Weighting Filter for Naive Bayes in Java

A Correlation based Feature Weighting Filter for Naive Bayes in Java

Abstract:

Due to its simplicity, efficiency, and efficacy, naive Bayes (NB) has continued to be one of the top 10 algorithms in the data mining and machine learning community. Of numerous approaches to alleviating its conditional independence assumption, feature weighting has placed more emphasis on highly predictive features than those that are less predictive. In this paper, we argue that for NB highly predictive features should be highly correlated with the class (maximum mutual relevance), yet uncorrelated with other features (minimum mutual redundancy). Based on this premise, we propose a correlation-based feature weighting (CFW) filter for NB. In CFW, the weight for a feature is a sigmoid transformation of the difference between the feature-class correlation (mutual relevance) and the average feature-feature intercorrelation (average mutual redundancy). Experimental results show that NB with CFW significantly outperforms NB and all the other existing state-of-the-art feature weighting filters used to compare. Compared to feature weighting wrappers for improving NB, the main advantages of CFW are its low computational complexity (no search involved) and the fact that it maintains the simplicity of the final model. Besides, we apply CFW to text classification and have achieved remarkable improvements.