Distributed Text Classification With an Ensemble Kernel-Based Learning Approach

Silva, Catarina; Lotric, Uros; Ribeiro, Bernardete; Dobnikar, Andrej

http://hdl.handle.net/10400.8/14614

Name: Distributed text classification with an ensemble kernel-based learning approach.pdf
Size: 1.56 MB
Format: Adobe PDF

Abstract

Constructing a single text classifier that excels in any given application is a rather unfeasible goal. As a result, ensemble systems are becoming an important resource, since they permit the use of simpler classifiers and the integration of different knowledge into the learning process. However, many text-classification ensemble approaches carry an extremely high computational burden, which limits their use in real environments. Moreover, state-of-the-art kernel-based classifiers, such as support vector machines and relevance vector machines, demand large resources when applied to large databases. We therefore propose a new systematic distributed ensemble framework to tackle these challenges, based on a generic deployment strategy for a cluster distributed environment. We combine task and data decomposition of the text-classification system, using partitioning, communication, agglomeration, and mapping to define and optimize a graph of dependent tasks. The framework also includes an ensemble system that exploits diverse patterns of errors and gains from the synergies between the ensemble classifiers. The ensemble data-partitioning strategy is shown to improve the performance of baseline state-of-the-art kernel-based machines. Experimental results show that the proposed framework outperforms standard methods in both speed and classification performance.
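The abstract describes training kernel-based base classifiers on partitions of the data and combining them in an ensemble. The Python sketch below illustrates only that general idea, under assumptions that are not taken from the paper: a simple RBF kernel perceptron stands in for the SVM/RVM base learners, a thread pool stands in for cluster nodes, and predictions are combined by plain majority vote. All function names (`rbf_kernel`, `train_kernel_perceptron`, `partition`, `train_ensemble`, `predict_ensemble`) are hypothetical. This is not the authors' framework, which additionally decomposes the system into a graph of dependent tasks mapped onto a cluster.

```python
"""Hypothetical sketch: a data-partitioned ensemble of kernel classifiers.

Assumptions (not from the paper): RBF kernel perceptrons stand in for the
SVM/RVM base learners, a thread pool stands in for cluster nodes, and
predictions are combined by simple majority vote.
"""
import math
import random
from concurrent.futures import ThreadPoolExecutor


def rbf_kernel(x, z, gamma=0.5):
    # Gaussian (RBF) kernel between two dense feature vectors.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_kernel_perceptron(samples, epochs=10):
    # samples: list of (features, label) pairs with label in {-1, +1}.
    # Returns (support_vector, label, alpha) triples for samples with alpha > 0.
    alphas = [0] * len(samples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x_i, y_i) in enumerate(samples):
            score = sum(a * y_j * rbf_kernel(x_j, x_i)
                        for a, (x_j, y_j) in zip(alphas, samples) if a)
            if y_i * score <= 0:  # misclassified (or on the boundary)
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # partition separated, stop early
            break
    return [(x, y, a) for a, (x, y) in zip(alphas, samples) if a]


def predict_one(model, x):
    # Sign of the kernel expansion over one base learner's support vectors.
    score = sum(a * y * rbf_kernel(sv, x) for sv, y, a in model)
    return 1 if score >= 0 else -1


def partition(samples, k, seed=0):
    # Shuffle a copy, then deal samples round-robin into k disjoint partitions.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def train_ensemble(samples, k=3):
    # Each partition is an independent task, so the base learners can be
    # trained concurrently; here one thread per partition.
    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(train_kernel_perceptron, partition(samples, k)))


def predict_ensemble(models, x):
    # Majority vote over the base learners (a tie, possible only with an
    # even number of learners, goes to +1).
    votes = sum(predict_one(m, x) for m in models)
    return 1 if votes >= 0 else -1


if __name__ == "__main__":
    # Toy 2-D data: a positive cluster near (2, 2) and its mirror image.
    pos = [(2, 2), (2.5, 1.5), (1.5, 2.5), (3, 2), (2, 3), (2.2, 2.2)]
    data = [(p, 1) for p in pos] + [((-a, -b), -1) for a, b in pos]
    models = train_ensemble(data, k=3)
    print(predict_ensemble(models, (2.5, 2.0)))    # prints 1
    print(predict_ensemble(models, (-2.0, -2.5)))  # prints -1
```

Threads keep the example self-contained, but CPython's global interpreter lock limits speedup for CPU-bound training; a process pool or separate cluster nodes would be needed for real parallelism. Each base learner sees only a fraction of the data, so training cost per learner drops, while the vote lets learners with different error patterns correct one another.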

Description

Source: https://www.researchgate.net/publication/224108209_Distributed_Text_Classification_With_an_Ensemble_Kernel-Based_Learning_Approach

Keywords

Distributed learning; ensembles; kernel-based machines; text classification

Citation

Silva, Catarina & Lotric, Uros & Ribeiro, Bernardete & Dobnikar, Andrej. (2010). Distributed Text Classification With an Ensemble Kernel-Based Learning Approach. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40, 287-297. DOI: https://doi.org/10.1109/TSMCC.2009.2038280.