Search Results

Now showing 1 - 10 of 14
  • Adaptive learning for dynamic environments: A comparative approach
    Publication . Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
    Nowadays most learning problems demand adaptive solutions. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. Various efforts have been pursued in machine learning settings to learn in such environments, especially because of their non-trivial nature, since changes occur between the data distribution used to define the model and the current environment. In this work we present the Drift Adaptive Retain Knowledge (DARK) framework to tackle adaptive learning in dynamic environments based on recent and retained knowledge. DARK handles an ensemble of multiple Support Vector Machine (SVM) models that are dynamically weighted and have distinct training window sizes. A comparative study with benchmark solutions in the field, namely the Learn++.NSE algorithm, is also presented. Experimental results revealed that DARK outperforms Learn++.NSE with two different base classifiers, an SVM and a Classification and Regression Tree (CART).
  • Improving Text Classification Performance with Incremental Background Knowledge
    Publication . Silva, Catarina; Ribeiro, Bernardete
    Text classification is generally the process of extracting interesting and non-trivial information and knowledge from text. One of the main problems with text classification systems is the lack of labeled data, as well as the cost of labeling unlabeled data. Thus, there is a growing interest in exploring the use of unlabeled data as a way to improve classification performance in text classification. The ready availability of this kind of data in most applications makes it an appealing source of information. In this work we propose an Incremental Background Knowledge (IBK) technique to introduce unlabeled data into the training set by expanding it using initial classifiers to deliver oracle decisions. The defined incremental SVM margin-based method was tested on the Reuters-21578 benchmark, showing promising results.
  • Decision support system using mobile app statistics
    Publication . Constante, Fabian; Guevara, Juan; Silva, Catarina; Gonçalves, Dulce; Marcelino, Luis
    Nowadays, to make the right decision about where and how to request or buy a service, a user is often supported by a mobile device that offers more than simple descriptive data. Nevertheless, not all information on services is fully accessible. The difficulty in keeping track of changes in services’ costs causes delays and can result in a waste of time and money. In fact, the decision of which service to use usually involves some level of uncertainty and risk. Hence, the user should have access to some form of decision support system that could be easily available through mobile applications. The power of these devices makes it possible to apply already established areas of knowledge, presenting statistics with dynamic and interactive graphics, thus allowing for a more systematic control of services and corresponding expenses. In this work we analyze the existing related work on mobile decision support systems and propose an architecture of a decision support system using Mobile App Statistics. Tests were carried out with a car fuel app to support the decision of choosing the gas station at each point. Results show that, using the additional statistical information provided, users can make better decisions when requesting a service.
  • Choice of Best Samples for Building Ensembles in Dynamic Environments
    Publication . Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
    Machine learning approaches often focus on optimizing the algorithm rather than assuring that the source data is as rich as possible. However, when it is possible to enhance the input examples to construct models, one should consider it thoroughly. In this work, we propose a technique to define the best set of training examples using dynamic ensembles in text classification scenarios. In dynamic environments, where new data is constantly appearing, old data is usually disregarded, but sometimes some of those disregarded examples may carry substantial information. We propose a method that determines the most relevant examples by analysing their behaviour when defining separating planes or thresholds between classes. Those examples, deemed better than others, are kept for a longer time-window than the rest. Results on a Twitter scenario show that keeping those examples enhances the final classification performance.
  • Improving Visualization, Scalability and Performance of Multiclass Problems with SVM Manifold Learning
    Publication . Silva, Catarina; Ribeiro, Bernardete
    We propose a learning framework to address multiclass challenges, namely visualization, scalability and performance. We focus on supervised problems by presenting an approach that uses prior information about training labels, manifold learning and support vector machines (SVMs). We employ manifold learning as a feature reduction step, nonlinearly embedding data in a low dimensional space using Isomap (Isometric Mapping), enhancing geometric characteristics and preserving the geodesic distance within the manifold. Structured SVMs are used in a multiclass setting with benefits for final multiclass classification in this reduced space. Results on a text classification toy example and on ISOLET, an isolated letter speech recognition problem, demonstrate the remarkable visualization capabilities of the method for multiclass problems in the severely reduced space, whilst improving on the baseline performance of SVMs.
  • On privacy in user tracking mobile applications
    Publication . Gasparovic, Marko; Nicolau, Pedro; Marques, Ana; Silva, Catarina; Marcelino, Luis
    In mobile applications, user tracking with the Global Positioning System (GPS) can be very beneficial, making life easier for the user by, e.g., finding points of interest nearby, such as gas stations, supermarkets and restaurants. Nevertheless, the location of the user can be misused and hence privacy issues can become a relevant problem in mobile application development. Technically, location is determined either internally by the device or externally by interacting systems and networks. The resultant location information may be stored and used under various conditions, and applications can track the position of the user without his/her consent and eventually misuse it, for instance with the intent of sending redirected publicity or even getting logs of the user's location. However, the user's location may not always be obtained using the most precise location function available. In this work we discuss and propose different options for how accurate geolocation in an application can be, and uphold that it is up to the developer to decide which method is appropriate, or that the user should have the freedom to define his/her privacy thresholds. These thresholds can be extremely variable both between users and scenarios, and we present a survey to approach this issue. Results show that users are concerned with privacy issues, but they are not necessarily acting accordingly to keep their privacy at a high level of protection. Finally, we point out that developers shouldn't misuse possibilities of tracking and users should be more cautious with application permissions, as will be shown in a real case study.
  • Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification
    Publication . Silva, Catarina; Ribeiro, Bernardete
    Text classification has received increasing interest over the past decades for its wide range of applications driven by the ubiquity of textual information. The high dimensionality of those applications led to pervasive use of dimensionality reduction methods, often black-box, non-linear feature extraction techniques. We show how Non-Negative Matrix Factorization (NMF), an algorithm able to learn a parts-based representation of data by imposing non-negativity constraints, can be used to represent and extract knowledge from a text classification problem. The resulting reduced set of features is tested with kernel-based machines on the Reuters-21578 benchmark, showing that the method performs competitively.
  • A Telemedicine Application Using WebRTC
    Publication . Antunes, Mário; Silva, Catarina; Barranca, Joaquim
    ICT in healthcare businesses has been growing in Portugal in the past few decades. The implementation of large scale information systems in hospitals and the deployment of electronic prescription and electronic patient records applications are just a few examples. Telemedicine is another emergent and widely used ICT solution to smooth the communication between patients and healthcare professionals, by allowing video and voice transfer over the Internet. Although there are several implementations of telemedicine solutions, they usually have some drawbacks, namely: i) they are too specific for a purpose; ii) they are based on proprietary applications; iii) they require additional software installation; and iv) they usually have associated costs. In this paper we propose a telemedicine solution based on the WebRTC Application Programming Interface (API) to transmit video and voice in real time over the Internet, through a web browser. Besides microphone and webcam control, we have also included two additional functionalities that may be useful to both patients and healthcare professionals during the communication, namely: i) bidirectional file transfer and ii) a shared whiteboard which allows free drawing. The proposed solution uses exclusively open source software components and requires solely a WebRTC compatible web browser, like Google Chrome or Firefox. We have made two types of tests in a healthcare environment: i) a bidirectional patient-doctor communication; and ii) connecting at one end an external USB medical device with an integrated webcam. The results were promising, since they revealed the potential of using the WebRTC API to control microphone and webcam in a telemedicine application, as well as the appropriateness and acceptance of the features included.
  • Experience of an International Collaborative Project with First Year Programming Students
    Publication . Paterson, James H.; Karhu, Markku; Cazzola, Walter; Illina, Irina; Law, Robert; Malchiodi, Dario; Maximiano, Marisa; Silva, Catarina
    This paper describes an Erasmus Intensive Programme that used international collaboration as a novel pedagogical approach to teaching programming skills to first-year students in a blended learning context using a mixture of virtual environment and intensive teaching. The experience and outcomes of the Programme are evaluated from the viewpoints of the students and instructors and conclusions are drawn on the value and conduct of international student collaborations.
  • Learning the hash code with generalised regression neural networks for handwritten signature biometric data retrieval
    Publication . Ribeiro, Bernardete; Lopes, Noel; Silva, Catarina
    Handwritten signature recognition is one important component of biometric authentication. This is a central process in a broad range of areas requiring personal identification, such as security, legal contracts and bank transactions. Extensive efforts have been put into research on the verification of handwritten signatures, which contain biometric information. Although many successful methods have been used, they often disregard the size of databases, which can be very large, posing scalability problems to their application in real-world scenarios. To overcome this problem, in this paper we use binary embeddings of high-dimensional data, which are an efficient tool for indexing big datasets of biometric images. The rationale is to find a good hash function such that similar data points in Euclidean space preserve their similarities in the resulting Hamming space for fast data retrieval and state-of-the-art classification performance. In the setting of a handwritten signature retrieval system, an indexing hashing-based scheme is presented. We propose to learn k-bit hash codes with a generalised regression neural network (GRNN), which yielded competitive results in the GPDS database.
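
As an illustrative aside on the last entry above: the idea of mapping points to binary codes and retrieving by Hamming distance can be sketched as follows. This is only a minimal sketch under stated assumptions. It uses random-hyperplane hashing as a stand-in, not the GRNN-learned hash function the abstract describes, and every function name here (`make_hyperplanes`, `hash_code`, `hamming`, `nearest`) is hypothetical.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (an assumption: random, not learned)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, hyperplanes):
    """Pack one bit per hyperplane: 1 if x lies on its non-negative side."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, x))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")


def nearest(query, database, hyperplanes):
    """Index of the database vector whose code is closest in Hamming space."""
    q = hash_code(query, hyperplanes)
    codes = [hash_code(x, hyperplanes) for x in database]
    return min(range(len(database)), key=lambda i: hamming(q, codes[i]))


if __name__ == "__main__":
    planes = make_hyperplanes(n_bits=16, dim=3)
    database = [[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]]
    print(nearest([2.0, 4.0, 6.0], database, planes))
```

Comparing integer-packed codes with XOR and a bit count is what makes retrieval in Hamming space cheap; a learned hash function would replace `make_hyperplanes` and `hash_code` while leaving the retrieval step unchanged.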