Abstract
This project presents an efficient approach to chorus recognition in English song lyrics that
achieves state-of-the-art performance with significantly fewer resources than existing
methods. We developed a Bidirectional Long Short-Term Memory (BiLSTM) model with
localized attention mechanisms, trained on only 780 songs compared to the 25,000+ songs
typically used in Music Information Retrieval research.
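
To make the architecture concrete, below is a minimal sketch of a BiLSTM tagger with a localized attention window over neighboring lines. The layer sizes, window width, feature dimension, and the simplified per-line score function are illustrative assumptions, not the configuration used in this project.

```python
import torch
import torch.nn as nn

class BiLSTMLocalAttention(nn.Module):
    # Hypothetical sizes: feat_dim, hidden, and window are assumptions.
    def __init__(self, feat_dim=64, hidden=128, window=3, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.window = window                # attend only to +/-window lines
        self.score = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, n_lines, feat_dim), one feature vector per lyric line
        h, _ = self.lstm(x)                 # (batch, n_lines, 2*hidden)
        n = h.size(1)
        scores = self.score(h).squeeze(-1)  # (batch, n_lines)
        idx = torch.arange(n, device=h.device)
        local = (idx[None, :] - idx[:, None]).abs() <= self.window  # (n, n)
        w = scores.unsqueeze(1).expand(-1, n, -1)
        w = w.masked_fill(~local, float("-inf")).softmax(dim=-1)
        ctx = w @ h                         # locally attended context per line
        return self.out(ctx)                # per-line chorus/non-chorus logits

model = BiLSTMLocalAttention()
logits = model(torch.randn(2, 40, 64))     # 2 songs, 40 lines each
print(logits.shape)                        # torch.Size([2, 40, 2])
```

Restricting the softmax to a window is one simple way to realize "localized" attention: each line's representation is mixed only with its immediate neighbors rather than the whole song.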
Our approach addresses class imbalance through comprehensive stabilization techniques and
leverages nine feature views capturing structural, semantic, and rhythmic patterns via self-similarity
matrices. Through systematic experimentation, we demonstrate that chorus
detection relies primarily on local contextual patterns rather than global structural awareness,
with head self-similarity features (line beginnings) proving most critical for segmentation.
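
For concreteness, here is a minimal sketch of one such view: a head self-similarity matrix built from the first tokens of each line. The two-token head length and the Jaccard similarity function are illustrative assumptions, not the exact features used in the thesis.

```python
import numpy as np

def head_ssm(lines, head_len=2):
    # Compare the first `head_len` tokens of every pair of lines.
    heads = [set(line.lower().split()[:head_len]) for line in lines]
    n = len(heads)
    ssm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            union = heads[i] | heads[j]
            ssm[i, j] = len(heads[i] & heads[j]) / len(union) if union else 0.0
    return ssm  # repeated sections appear as bright off-diagonal blocks

lyrics = [
    "I walk alone tonight",
    "Under the city lights",
    "Oh oh we sing it loud",
    "Oh oh we sing it proud",
    "I walk alone again",
]
print(head_ssm(lyrics).round(2))
```

Because choruses repeat nearly verbatim, their rows form visible blocks of high similarity, which is what makes line-beginning features so informative for segmentation.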
The BiLSTM + Attention model achieves 78.2% Macro F1 at the line level, matching
Watanabe & Goto's (2020) performance with 100,000+ songs and significantly exceeding
Fell et al.'s (2018) 67.4% F1 with 25,000 songs. For boundary detection, the model achieves
59.6% F1 for exact boundaries and 74.7% F1 under a ±2-line tolerance.
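
A minimal sketch of how such tolerance-based boundary F1 might be computed is given below; the greedy one-to-one matching between predicted and gold boundaries is an illustrative assumption, not necessarily the thesis's scoring procedure.

```python
def boundary_f1(pred, gold, tol=0):
    # Greedily match each predicted boundary to at most one gold boundary
    # within `tol` lines (greedy matching is an assumed scheme).
    unmatched = sorted(gold)
    tp = 0
    for p in sorted(pred):
        hit = next((g for g in unmatched if abs(p - g) <= tol), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

print(boundary_f1([4, 12, 21], [4, 13, 24], tol=0))  # 0.33: only 4 matches
print(boundary_f1([4, 12, 21], [4, 13, 24], tol=2))  # 0.67: 12 also matches 13
```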
The research demonstrates that strategic data curation, comprehensive feature engineering,
and targeted optimization can compete effectively with resource-intensive approaches,
showing that local pattern recognition outperforms complex global modeling strategies in
specialized domains like lyric analysis.
Keywords
Lyric segmentation; Chorus detection; Attention mechanisms; Self-similarity matrices; Local pattern recognition
