Browsing by Author "Stolcke, Andreas"
- Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords
  Publication. Proença, Jorge; Lopes, Carla; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
  Reading aloud performance in children is typically assessed by teachers on an individual basis, manually marking reading time and incorrectly read words. A computational tool that assists with recording reading tasks, automatically analyzing them and providing performance metrics could be a significant help. Towards that goal, this work presents an approach to automatically predicting the overall reading aloud ability of primary school children (6-10 years old), based on the reading of sentences and pseudowords. The opinions of primary school teachers were gathered as ground truth of performance, who provided 0-5 scores closely related to the expectations at the end of each grade. To predict these scores automatically, features based on reading speed and number of disfluencies were extracted, after an automatic disfluency detection. Various regression models were trained, with Gaussian process regression giving best results for automatic features. Feature selection from both sentence and pseudoword reading tasks gave the closest predictions, with a correlation of 0.944. Compared to the use of manual annotation with the best correlation being 0.952, automatic annotation was only 0.8% worse. Furthermore, the error rate of predicted scores relative to ground truth was found to be smaller than the deviation of evaluators’ opinion per child.
- Automatic evaluation of reading aloud performance in children
  Publication. Proença, Jorge; Lopes, Carla, Alexandra Calado Lopes; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
  Evaluating children’s reading aloud proficiency is typically a task done by teachers on an individual basis, where reading time and wrong words are marked manually. A computational tool that assists with recording reading tasks, automatically analyzing them and outputting performance related metrics could be a significant help to teachers. Working towards that goal, this work presents an approach to automatically predict the overall reading aloud ability of primary school children by employing automatic speech processing methods. Reading tasks were designed focused on sentences and pseudowords, so as to obtain complementary information from the two distinct assignments. A dataset was collected with recordings of 284 children aged 6–10 years reading in native European Portuguese. The most common disfluencies identified include intra-word pauses, phonetic extensions, false starts, repetitions, and mispronunciations. To automatically detect reading disfluencies, we first target extra events by employing task-specific lattices for decoding that allow syllable-based false starts as well as repetitions of words and sequences of words. Then, mispronunciations are detected based on the log likelihood ratio between the recognized and target words. The opinions of primary school teachers were gathered as ground truth of overall reading aloud performance, who provided 0–5 scores closely related to the expected performance at the end of each grade. To predict these scores, various features were extracted by automatic annotation and regression models were trained. Gaussian process regression proved to be the most successful approach. Feature selection from both sentence and pseudoword tasks give the closest predictions, with a correlation of 0.944 compared to the teachers’ grading. Compared to the use of manual annotation, where the best models obtained give a correlation of 0.949, there was a relative decrease of only 0.5% for using automatic annotations to extract features. The error rate of predicted scores relative to ground truth also proved to be smaller than the deviation of evaluators’ opinion per child.
- Design and Analysis of a Database to Evaluate Children’s Reading Aloud Performance
  Publication. Proença, Jorge; Celorico, Dirce; Lopes, Carla, Alexandra Calado Lopes; Dias, Miguel Sales; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
  To evaluate the reading performance of children, human assessment is usually involved, where a teacher or tutor has to take time to individually estimate the performance in terms of fluency (speed, accuracy and expression). Automatic estimation of reading ability can be an important alternative or complement to the usual methods, and can improve other applications such as elearning. Techniques must be developed to analyse audio recordings of read utterances by children and detect the deviations from the intended correct reading, i.e. disfluencies. For that goal, a database of 284 European Portuguese children from 6 to 10 years old (1st-4th grades) reading aloud amounting to 20 hours was collected in private and public Portuguese schools. This paper describes the design of the reading tasks as well as the data collection procedure. The presence of different types of disfluencies is analysed as well as reading performance compared to known curricular goals.
- Detection of Mispronunciations and Disfluencies in Children Reading Aloud
  Publication. Proença, Jorge; Lopes, Carla; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
  To automatically evaluate the performance of children reading aloud or to follow a child’s reading in reading tutor applications, different types of reading disfluencies and mispronunciations must be accounted for. In this work, we aim to detect most of these disfluencies in sentence and pseudoword reading. Detecting incorrectly pronounced words, and quantifying the quality of word pronunciations, is arguably the hardest task. We approach the challenge as a two-step process. First, a segmentation using task-specific lattices is performed, while detecting repetitions and false starts and providing candidate segments for words. Then, candidates are classified as mispronounced or not, using multiple features derived from likelihood ratios based on phone decoding and forced alignment, as well as additional meta-information about the word. Several classifiers were explored (linear fit, neural networks, support vector machines) and trained after a feature selection stage to avoid overfitting. Improved results are obtained using feature combination compared to using only the log likelihood ratio of the reference word (22% versus 27% miss rate at constant 5% false alarm rate).
- Mispronunciation Detection in Children's Reading of Sentences
  Publication. Proença, Jorge; Lopes, Carla Alexandra; Tjalve, Michael; Stolcke, Andreas; Candeias, Sara; Perdigão, Fernando
  This work proposes an approach to automatically parse children’s reading of sentences by detecting word pronunciations and extra content, and to classify words as correctly or incorrectly pronounced. This approach can be directly helpful for automatic assessment of reading level or for automatic reading tutors, where a correct reading must be identified. We propose a first segmentation stage to locate candidate word pronunciations based on allowing repetitions and false starts of a word’s syllables. A decoding grammar based solely on syllables allows silence to appear during a word pronunciation. At a second stage, word candidates are classified as mispronounced or not. The feature that best classifies mispronunciations is found to be the log-likelihood ratio between a free phone loop and a word spotting model in the very close vicinity of the candidate segmentation. Additional features are combined in multi-feature models to further improve classification, including: normalizations of the log-likelihood ratio, derivations from phone likelihoods, and Levenshtein distances between the correct pronunciation and recognized phonemes through two phoneme recognition approaches. Results show that most extra events were detected (close to 2% word error rate achieved) and that using automatic segmentation for mispronunciation classification approaches the performance of manual segmentation. Although the log-likelihood ratio from a spotting approach is already a good metric to classify word pronunciations, the combination of additional features provides a relative reduction of the miss rate of 18% (from 34.03% to 27.79% using manual segmentation and from 35.58% to 29.35% using automatic segmentation, at constant 5% false alarm rate).
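
Several of the abstracts above quote relative differences (an 18% relative reduction in miss rate, and 0.8% and 0.5% relative decreases in correlation). As a minimal sketch, the arithmetic behind those figures can be reproduced as follows. The helper name `relative_change` is hypothetical and is not taken from the papers; only the numeric inputs come from the abstracts.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the decrease from `baseline` to `new` as a fraction of `baseline`.

    Hypothetical helper for illustration; not part of the cited work.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - new) / baseline


# (baseline, new) pairs quoted in the abstracts listed above.
cases = {
    "miss rate, manual segmentation": (34.03, 27.79),
    "miss rate, automatic segmentation": (35.58, 29.35),
    "correlation, manual 0.952 vs automatic 0.944": (0.952, 0.944),
    "correlation, manual 0.949 vs automatic 0.944": (0.949, 0.944),
}

for label, (baseline, new) in cases.items():
    print(f"{label}: {100 * relative_change(baseline, new):.1f}% relative decrease")
```

The two miss-rate cases come out slightly above and slightly below the rounded 18% quoted in the abstract, and the two correlation cases match the quoted 0.8% and 0.5%.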
