Search results

Showing 1 - 10 of 10

A hierarchical broad-class classification to enhance phoneme recognition
Publication . Lopes, Carla; Perdigão, Fernando
In this paper a hierarchical classification of different levels of phonetic information is proposed in order to improve phone recognition. In this paradigm several intermediate classifiers give posterior probability predictions for broad phonetic classes, achieving phone detail in the last layer. Class membership probabilities are weighted and combined in order to get a more robust phoneme prediction. A method for finding the best set of weights is also proposed based on discriminative training in a hybrid MLP/HMM system. Experiments show that the use of broad-class information enhances phone recognition. Relative improvements of 8% in Correctness and 5% in Accuracy were achieved in phoneme recognition on the TIMIT database compared to a baseline system.
2009-08 · Conference paper · Open access
Detection of Mispronunciations and Disfluencies in Children Reading Aloud
Publication . Proença, Jorge; Lopes, Carla; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
To automatically evaluate the performance of children reading aloud or to follow a child’s reading in reading tutor applications, different types of reading disfluencies and mispronunciations must be accounted for. In this work, we aim to detect most of these disfluencies in sentence and pseudoword reading. Detecting incorrectly pronounced words, and quantifying the quality of word pronunciations, is arguably the hardest task. We approach the challenge as a two-step process. First, a segmentation using task-specific lattices is performed, while detecting repetitions and false starts and providing candidate segments for words. Then, candidates are classified as mispronounced or not, using multiple features derived from likelihood ratios based on phone decoding and forced alignment, as well as additional meta-information about the word. Several classifiers were explored (linear fit, neural networks, support vector machines) and trained after a feature selection stage to avoid overfitting. Improved results are obtained using feature combination compared to using only the log likelihood ratio of the reference word (22% versus 27% miss rate at constant 5% false alarm rate).
2017-08-20 · Conference paper · Open access
HESITA(te) in Portuguese
Publication . Candeias, Sara; Celorico, Dirce; Proença, Jorge; Veiga, Arlindo; Lopes, Carla; Perdigão, Fernando
Hesitations, so-called disfluencies, are a characteristic of spontaneous speech, playing a primary role in its structure, reflecting aspects of the language production and the management of inter-communication. In this paper we intend to present a database of hesitations in European Portuguese speech - HESITA - as a relevant base of work to study a variety of speech phenomena. Patterns of hesitations, hesitation distribution according to speaking style, and phonetic properties of the fillers are some of the characteristics we extrapolated from the HESITA database. This database also represents an important resource for improvement in synthetic speech naturalness as well as in robust acoustic modelling for automatic speech recognition. The HESITA database is the output of a project in the speech-processing field for European Portuguese held by an interdisciplinary group in intimate articulation between engineering tools and experience and the linguistic approach.
2014-05 · Conference paper · Open access
Design and Analysis of a Database to Evaluate Children’s Reading Aloud Performance
Publication . Proença, Jorge; Celorico, Dirce; Lopes, Carla; Dias, Miguel Sales; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
To evaluate the reading performance of children, human assessment is usually involved, where a teacher or tutor has to take time to individually estimate the performance in terms of fluency (speed, accuracy and expression). Automatic estimation of reading ability can be an important alternative or complement to the usual methods, and can improve other applications such as e-learning. Techniques must be developed to analyse audio recordings of utterances read by children and detect the deviations from the intended correct reading, i.e., disfluencies. For that goal, a database of 284 European Portuguese children from 6 to 10 years old (1st-4th grades) reading aloud, amounting to 20 hours, was collected in private and public Portuguese schools. This paper describes the design of the reading tasks as well as the data collection procedure. The presence of different types of disfluencies is analysed, as well as reading performance compared to known curricular goals.
2016 · Conference paper · Restricted access
Automatic evaluation of reading aloud performance in children
Publication . Proença, Jorge; Lopes, Carla; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
Evaluating children’s reading aloud proficiency is typically a task done by teachers on an individual basis, where reading time and wrong words are marked manually. A computational tool that assists with recording reading tasks, automatically analyzing them and outputting performance-related metrics could be a significant help to teachers. Working towards that goal, this work presents an approach to automatically predict the overall reading aloud ability of primary school children by employing automatic speech processing methods. Reading tasks were designed focused on sentences and pseudowords, so as to obtain complementary information from the two distinct assignments. A dataset was collected with recordings of 284 children aged 6–10 years reading in native European Portuguese. The most common disfluencies identified include intra-word pauses, phonetic extensions, false starts, repetitions, and mispronunciations. To automatically detect reading disfluencies, we first target extra events by employing task-specific lattices for decoding that allow syllable-based false starts as well as repetitions of words and sequences of words. Then, mispronunciations are detected based on the log likelihood ratio between the recognized and target words. The opinions of primary school teachers were gathered as ground truth of overall reading aloud performance; the teachers provided 0–5 scores closely related to the expected performance at the end of each grade. To predict these scores, various features were extracted by automatic annotation and regression models were trained. Gaussian process regression proved to be the most successful approach. Feature selection from both sentence and pseudoword tasks gives the closest predictions, with a correlation of 0.944 compared to the teachers’ grading. Compared to the use of manual annotation, where the best models obtained give a correlation of 0.949, there was a relative decrease of only 0.5% for using automatic annotations to extract features. The error rate of predicted scores relative to ground truth also proved to be smaller than the deviation of evaluators’ opinion per child.
2017-11 · Journal article · Restricted access
The LetsRead Corpus of Portuguese children reading aloud for performance evaluation
Publication . Proença, Jorge; Celorico, Dirce; Candeias, Sara; Lopes, Carla; Perdigão, Fernando
This paper introduces the LetsRead Corpus of European Portuguese read speech from children aged 6 to 10. The motivation for the creation of this corpus stems from the lack of databases with recordings of reading tasks of Portuguese children with different performance levels and including all the common reading aloud disfluencies. It is also essential to develop techniques to fulfill the main objective of the LetsRead project: to automatically evaluate the reading performance of children through the analysis of reading tasks. The collected data amounts to 20 hours of speech from 284 children from private and public Portuguese schools, with each child carrying out two tasks: reading sentences and reading a list of pseudowords, both with varying levels of difficulty throughout the school grades. In this paper, the design of the reading tasks presented to children is described, as well as the collection procedure. Manually annotated data is analyzed according to disfluencies and reading performance. The considered word difficulty parameter is also confirmed to be suitable for the pseudoword reading tasks.
2016 · Conference paper · Restricted access
Prospetiva 2035 - Três Cenários para o Futuro de Leiria e Oeste
Publication . Lopes, Carla; Almeida, Isabel; Carriço, Silvia; Mouga, Teresa; Fernandes, Isabel; Siopa, Jorge; Gala, Pedro; Antunes, Mário; Silva, Agostinho
EM@IPLeiria is a think tank created in 2023 to foster sustainable, innovative and competitive development in the Leiria and Oeste region. More than a research centre, it is a factory of ideas and solutions, dedicated to analysing the territory's structural challenges, identifying new opportunities and testing concrete responses to real problems. As a space for co-creation and experimentation, EM@IPLeiria brings together a range of regional actors, including municipalities, companies, educational institutions and civil society, promoting a collaborative and participatory working model for building strategies for the future. Its approach combines design thinking and strategic foresight, making it possible to anticipate trends, devise scenarios and test innovative solutions before they are applied at scale.
2025-03 · Book · Open access
Children's reading aloud performance: a database and automatic detection of disfluencies
Publication . Proença, Jorge; Celorico, Dirce; Candeias, Sara; Lopes, Carla; Perdigão, Fernando
The automatic evaluation of children's reading performance by detecting and analyzing errors and disfluencies in speech is an important tool to build automatic reading tutors and to complement the current method of manual evaluations of overall reading ability in schools. A large amount of speech from children reading aloud, rich in errors and disfluencies, is needed to train acoustic, disfluency and pronunciation models for an automatic reading assessment system. This paper describes the acquisition and analysis of a read-aloud speech database of European Portuguese from children aged 6-10 from the first to fourth school grades. Towards the goal of detecting all reading errors and disfluencies, we apply a decoding process to the utterances using flexible word-level lattices that allow syllable-based false starts and repetitions of two or more word sequences. The proposed method proved promising in detecting corrections and repetitions in sentences, and provides an improved alignment of the data, helpful for future annotation tasks. The analysis of the database also shows agreement with government-defined curricular goals for reading.
2015-09-06 · Conference paper · Open access
A European Portuguese Children Speech Database for Computer Aided Speech Therapy
Publication . Lopes, Carla; Veiga, Arlindo; Perdigão, Fernando
This paper introduces a European Portuguese speech database containing spoken material recorded from children. The need for such a database arose from the need to train phone models for the development of a computer-aided speech therapy system. Articulatory disorders affect a significant number of children of pre-school age. We propose a system intended to assist and reinforce conventional speech therapy programs. Through the systematic use of games, it learns which phones the child has the most difficulty pronouncing. The child is then led to practise the production of those phones by playing games. Another reason a children's speech database is of interest is that accurate phone recognition for children is only possible using training data that reflects the population of users. It is a difficult task due to the high pitch of children's speech.
2012 · Conference paper · Restricted access
Acoustic Similarity Scores for Keyword Spotting
Publication . Veiga, Arlindo; Lopes, Carla; Sá, Luís; Perdigão, Fernando
This paper presents a study on keyword spotting systems based on acoustic similarity between a filler model and a keyword model. The ratio between the keyword model likelihood and the generic (filler) model likelihood is used by the classifier to detect relevant peak values that indicate keyword occurrences. We have changed the standard scheme of keyword spotting systems to allow keyword detection in a single forward step. We propose a new log-likelihood ratio normalization to minimize the effect of word length on the classifier performance. Tests show the effectiveness of our normalization method against two other methods. Experiments were performed on continuous speech utterances of the Portuguese TECNOVOZ database (read sentences) with keywords of several lengths.
2014 · Conference paper · Open access
