Boosting SpLSA for Text Classification

Text classification is a challenge in document labeling tasks such as spam filtering and sentiment analysis. Because generative approaches such as probabilistic Latent Semantic Analysis (pLSA) are descriptively rich, documents are often modeled with this kind of strategy. Recently, a supervised extension of pLSA (spLSA [10]) was proposed for human action recognition in the context of computer vision. In this paper, we propose extending spLSA to text classification by introducing two extensions: (a) regularized spLSA and (b) label uncertainty in spLSA. We evaluate the proposal on spam filtering and sentiment analysis classification tasks. Experimental results show that spLSA outperforms pLSA in both tasks. In addition, our extensions favor fast convergence, suggesting that spLSA may reduce training time while achieving the same accuracy as more expensive methods such as sLDA or SVM.