Browsing by Author "Ribeiro, Bernardete"
- Adaptive learning for dynamic environments: A comparative approach
  Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
  Nowadays, most learning problems demand adaptive solutions. Current challenges include temporal data streams, drift and non-stationary scenarios, often involving text data, whether in social networks or in business systems. Various efforts have been pursued in machine learning to learn in such environments, especially because of their non-trivial nature: the distribution of the data used to define the model may differ from that of the current environment. In this work we present the Drift Adaptive Retain Knowledge (DARK) framework, which tackles adaptive learning in dynamic environments based on both recent and retained knowledge. DARK manages an ensemble of Support Vector Machine (SVM) models that are dynamically weighted and have distinct training window sizes. A comparative study with benchmark solutions in the field, namely the Learn++.NSE algorithm, is also presented. Experimental results show that DARK outperforms Learn++.NSE with two different base classifiers, an SVM and a Classification and Regression Tree (CART).
- Choice of Best Samples for Building Ensembles in Dynamic Environments
  Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
  Machine learning approaches often focus on optimizing the algorithm rather than ensuring that the source data is as rich as possible. However, when the input examples used to construct models can be enhanced, this option should be considered carefully. In this work, we propose a technique to define the best set of training examples using dynamic ensembles in text classification scenarios. In dynamic environments, where new data constantly appears, old data is usually discarded, but some of the discarded examples may carry substantial information. We propose a method that determines the most relevant examples by analysing their behaviour when defining separating planes or thresholds between classes. Those examples, deemed better than the others, are kept for a longer time window than the rest. Results on a Twitter scenario show that keeping those examples improves the final classification performance.
- Improving Text Classification Performance with Incremental Background Knowledge
  Publication. Silva, Catarina; Ribeiro, Bernardete
  Text classification is generally the process of extracting interesting and non-trivial information and knowledge from text. One of the main problems with text classification systems is the lack of labeled data, as well as the cost of labeling unlabeled data. Thus, there is growing interest in using unlabeled data to improve text classification performance. The ready availability of this kind of data in most applications makes it an appealing source of information. In this work we propose an Incremental Background Knowledge (IBK) technique that introduces unlabeled data into the training set, expanding it by using initial classifiers to deliver oracle decisions. The resulting incremental, SVM margin-based method was tested on the Reuters-21578 benchmark with promising results.
- Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification
  Publication. Silva, Catarina; Ribeiro, Bernardete
  Text classification has received increasing interest over the past decades for its wide range of applications, driven by the ubiquity of textual information. The high dimensionality of these applications has led to the pervasive use of dimensionality reduction methods, often black-box, non-linear feature extraction techniques. We show how Non-Negative Matrix Factorization (NMF), an algorithm able to learn a parts-based representation of data by imposing non-negativity constraints, can be used to represent and extract knowledge from a text classification problem. The resulting reduced feature set is tested with kernel-based machines on the Reuters-21578 benchmark, showing competitive performance.
- Learning the hash code with generalised regression neural networks for handwritten signature biometric data retrieval
  Publication. Ribeiro, Bernardete; Lopes, Noel; Silva, Catarina
  Handwritten signature recognition is an important component of biometric authentication. It is a central process in a broad range of areas requiring personal identification, such as security, legal contracts and bank transactions. Extensive research effort has gone into the verification of handwritten signatures, which contain biometric information. Although many successful methods exist, they often disregard the size of the databases, which can be very large and pose scalability problems for real-world applications. To overcome this problem, in this paper we use binary embeddings of high-dimensional data, an efficient tool for indexing large datasets of biometric images. The rationale is to find a good hash function such that data points that are similar in Euclidean space preserve their similarity in the resulting Hamming space, enabling fast data retrieval and state-of-the-art classification performance. In the setting of a handwritten signature retrieval system, a hashing-based indexing scheme is presented. We propose to learn a k-bit hash code with a generalised regression neural network (GRNN), which yielded competitive results on the GPDS database.
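The DARK entry describes an ensemble whose members use training windows of distinct sizes and are weighted dynamically. The sketch below illustrates only that general idea and is not the DARK framework itself. Several assumptions apply: a nearest-centroid learner stands in for the SVM base models, each member's weight is its accuracy on the newest batch measured before that batch is used for training, and the names `WindowedEnsemble`, `train_centroids` and `predict_centroid` are hypothetical.

```python
from collections import deque


def train_centroids(examples):
    """Stand-in base learner (not an SVM): one mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in examples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


class WindowedEnsemble:
    """Hypothetical sketch: one member per window size, votes weighted by recent accuracy."""

    def __init__(self, window_sizes=(1, 3, 5)):
        self.window_sizes = list(window_sizes)
        self.history = deque(maxlen=max(self.window_sizes))  # keeps only recent batches
        self.members = []
        self.weights = [1.0] * len(self.window_sizes)

    def update(self, batch):
        """batch: list of (features, label) pairs arriving at one time step."""
        # Score existing members on the new batch before training on it, so each
        # weight reflects how well that window size tracked the newest data.
        if self.members:
            self.weights = [
                sum(predict_centroid(m, x) == y for x, y in batch) / len(batch)
                for m in self.members
            ]
        self.history.append(batch)
        batches = list(self.history)
        # Member i is retrained on the last window_sizes[i] batches.
        self.members = [
            train_centroids([ex for b in batches[-w:] for ex in b])
            for w in self.window_sizes
        ]

    def predict(self, x):
        """Weighted vote over the members' predicted labels."""
        votes = {}
        for model, weight in zip(self.members, self.weights):
            label = predict_centroid(model, x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)
```

Feeding this ensemble three batches of one concept followed by two batches with the labels swapped leaves only the shortest-window member with a non-zero weight, so the ensemble follows the new concept while the longer windows still retain older knowledge.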
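The "Choice of Best Samples" entry keeps examples that matter for the separating plane longer than the rest. A minimal sketch of such a retention policy follows, assuming that "relevant" means lying on or inside the margin of the current decision function (where SVM support vectors lie). The function name `retain`, the window lengths and the margin value are hypothetical illustrations, not the paper's settings.

```python
def retain(pool, now, decision, short_window=2, long_window=6, margin=1.0):
    """Prune a pool of (features, label, arrival_batch) triples at batch index `now`.

    Hypothetical policy: examples with |decision(x)| <= margin (near the boundary)
    survive for long_window batches; all other examples expire after short_window.
    """
    kept = []
    for x, y, arrived in pool:
        age = now - arrived
        near_boundary = abs(decision(x)) <= margin
        limit = long_window if near_boundary else short_window
        if age < limit:
            kept.append((x, y, arrived))
    return kept


# Example with a fixed linear decision function f(x) = x0 - x1.
def f(x):
    return x[0] - x[1]


pool = [((0.2, 0.0), 1, 0), ((5.0, 0.0), 1, 0), ((4.0, 0.0), 1, 5)]
print(retain(pool, now=5, decision=f))  # [((0.2, 0.0), 1, 0), ((4.0, 0.0), 1, 5)]
```

The old example near the boundary survives, the equally old but confidently classified one is dropped, and the freshly arrived example is kept regardless of its position.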
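The IBK entry expands the training set with unlabeled examples, using an initial classifier's decisions as an oracle and an SVM margin criterion. The sketch below shows a generic margin-based self-training loop in that spirit, assuming that an unlabeled point is added, with its predicted label, when its decision value satisfies |f(x)| >= 1 (outside the margin). A small batch-subgradient linear SVM replaces a full SVM library. The names `train_linear_svm`, `decision` and `ibk_expand` and all hyperparameters are hypothetical; this is not the authors' exact procedure.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss.

    ys must be -1 or +1; the bias b is not regularised.
    """
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:  # margin violation
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(model, x):
    """Signed decision value f(x) = w.x + b."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def ibk_expand(xs, ys, unlabeled, threshold=1.0, rounds=3):
    """Repeatedly retrain, then move confidently classified unlabeled points
    (|f(x)| >= threshold) into the training set with their predicted labels."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(rounds):
        model = train_linear_svm(xs, ys)
        confident = [x for x in pool if abs(decision(model, x)) >= threshold]
        if not confident:
            break
        for x in confident:
            xs.append(x)
            ys.append(1 if decision(model, x) > 0 else -1)
        pool = [x for x in pool if x not in confident]
    return xs, ys, pool
```

Points far from the boundary are absorbed with the classifier's label, while ambiguous points near the separating plane stay in the unlabeled pool. This avoids reinforcing the most error-prone oracle decisions.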
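The NMF entry reduces a text representation by factoring a non-negative term-document matrix V into non-negative factors W (terms x rank) and H (rank x documents). The following minimal pure-Python sketch uses the standard Lee and Seung multiplicative updates for the squared Frobenius loss. The rank, the iteration count, the random initialisation and the toy matrix are assumptions for illustration, because the abstract does not state which NMF variant or preprocessing was used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factor non-negative V into W, H >= 0 with Lee and Seung multiplicative updates.

    Every factor in each update is non-negative, so W and H stay non-negative;
    eps only guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h
```

Each column of H is then a rank-dimensional, non-negative representation of one document, which could be passed to a kernel-based classifier in place of the original term vector.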
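The signature retrieval entry learns a k-bit hash code with a GRNN and retrieves by Hamming distance. The sketch below uses Specht's GRNN, a Gaussian-kernel weighted average of the training targets, to regress k-bit codes for a query and then thresholds the output into bits. It assumes that target codes for the training signatures are already given (here hand-assigned per signer, purely for illustration), and that bits are obtained by thresholding at 0.5. The names `grnn_predict`, `to_bits`, `hamming` and `retrieve`, and the smoothing parameter `sigma`, are hypothetical and not taken from the paper.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=1.0):
    """GRNN output: Gaussian-kernel weighted average of the training target vectors."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    d_min = min(d2)
    # Shifting by the smallest distance scales every weight by the same factor
    # (cancelled by the normalisation) and keeps exp() from underflowing to 0.
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    total = sum(weights)
    k = len(train_y[0])
    return [sum(wt * y[j] for wt, y in zip(weights, train_y)) / total for j in range(k)]


def to_bits(values, threshold=0.5):
    """Binarise the regressed code into a k-bit hash."""
    return tuple(1 if v >= threshold else 0 for v in values)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code, database):
    """Rank database keys by Hamming distance between their codes and the query code."""
    return sorted(database, key=lambda key: hamming(query_code, database[key]))
```

For example, with two signers whose training features cluster around (0, 0) and (5, 5) and whose codes are (1, 1, 0, 0) and (0, 0, 1, 1), a query at (0.1, 0.0) maps to (1, 1, 0, 0), and `retrieve` lists the stored items with the closest codes first. Comparing short binary codes in this way is what makes indexing a large signature database cheap.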