Boosting SpLSA for Text Classification

Hurtado, Julio; Mendoza, Marcelo; Nanculef, Ricardo

Date

2017

Authors

Hurtado, Julio

Mendoza, Marcelo

Nanculef, Ricardo

Abstract

Text classification is a challenge in document labeling tasks such as spam filtering and sentiment analysis. Owing to the descriptive richness of generative approaches such as probabilistic Latent Semantic Analysis (pLSA), documents are often modeled with this kind of strategy. Recently, a supervised extension of pLSA (spLSA [10]) was proposed for human action recognition in computer vision. In this paper we propose extending spLSA to text classification. We do this by introducing two extensions to spLSA: (a) regularized spLSA, and (b) label uncertainty in spLSA. We evaluate the proposal on spam filtering and sentiment analysis classification tasks. Experimental results show that spLSA outperforms pLSA in both tasks. In addition, our extensions favor fast convergence, suggesting that spLSA may reduce training time while achieving the same accuracy as more expensive methods such as sLDA or SVM.

URI

https://repositorio.uc.cl/handle/11534/85937

Collections

Conference papers
