Abstract
This project presents an efficient approach to chorus recognition in English song lyrics that
achieves state-of-the-art performance with significantly fewer resources than existing
methods. We developed a Bidirectional Long Short-Term Memory (BiLSTM) model with
localized attention mechanisms, trained on only 780 songs compared to the 25,000+ songs
typically used in Music Information Retrieval research.
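
To make the architecture concrete, below is a minimal sketch of a BiLSTM tagger with a localized attention window over neighboring lines. The layer sizes, window width, feature dimension, and the simplified per-line score function are illustrative assumptions, not the configuration used in this project.

```python
import torch
import torch.nn as nn

class BiLSTMLocalAttention(nn.Module):
    # Hypothetical sizes: feat_dim, hidden, and window are assumptions.
    def __init__(self, feat_dim=64, hidden=128, window=3, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.window = window                # attend only to +/-window lines
        self.score = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, n_lines, feat_dim), one feature vector per lyric line
        h, _ = self.lstm(x)                 # (batch, n_lines, 2*hidden)
        n = h.size(1)
        scores = self.score(h).squeeze(-1)  # (batch, n_lines)
        idx = torch.arange(n, device=h.device)
        local = (idx[None, :] - idx[:, None]).abs() <= self.window  # (n, n)
        w = scores.unsqueeze(1).expand(-1, n, -1)
        w = w.masked_fill(~local, float("-inf")).softmax(dim=-1)
        ctx = w @ h                         # locally attended context per line
        return self.out(ctx)                # per-line chorus/non-chorus logits

model = BiLSTMLocalAttention()
logits = model(torch.randn(2, 40, 64))     # 2 songs, 40 lines each
print(logits.shape)                        # torch.Size([2, 40, 2])
```

Restricting the softmax to a window is one simple way to realize "localized" attention: each line's representation is mixed only with its immediate neighbors rather than the whole song.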
Our approach addresses class imbalance through comprehensive stabilization techniques and
leverages nine feature views capturing structural, semantic, and rhythmic patterns via self-similarity
matrices. Through systematic experimentation, we demonstrate that chorus
detection relies primarily on local contextual patterns rather than global structural awareness,
with head self-similarity features (line beginnings) proving most critical for segmentation.
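
For concreteness, here is a minimal sketch of one such view: a head self-similarity matrix built from the first tokens of each line. The two-token head length and the Jaccard similarity function are illustrative assumptions, not the exact features used in the thesis.

```python
import numpy as np

def head_ssm(lines, head_len=2):
    # Compare the first `head_len` tokens of every pair of lines.
    heads = [set(line.lower().split()[:head_len]) for line in lines]
    n = len(heads)
    ssm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            union = heads[i] | heads[j]
            ssm[i, j] = len(heads[i] & heads[j]) / len(union) if union else 0.0
    return ssm  # repeated sections appear as bright off-diagonal blocks

lyrics = [
    "I walk alone tonight",
    "Under the city lights",
    "Oh oh we sing it loud",
    "Oh oh we sing it proud",
    "I walk alone again",
]
print(head_ssm(lyrics).round(2))
```

Because choruses repeat nearly verbatim, their rows form visible blocks of high similarity, which is what makes line-beginning features so informative for segmentation.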
The BiLSTM + Attention model achieves 78.2% Macro F1 at the line level, matching
Watanabe & Goto's (2020) performance with 100,000+ songs and significantly exceeding
Fell et al.'s (2018) 67.4% F1 with 25,000 songs. For boundary detection, the model achieves
59.6% F1 for exact boundaries and 74.7% F1 under a ±2-line tolerance.
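
A minimal sketch of how such tolerance-based boundary F1 might be computed is given below; the greedy one-to-one matching between predicted and gold boundaries is an illustrative assumption, not necessarily the thesis's scoring procedure.

```python
def boundary_f1(pred, gold, tol=0):
    # Greedily match each predicted boundary to at most one gold boundary
    # within `tol` lines (greedy matching is an assumed scheme).
    unmatched = sorted(gold)
    tp = 0
    for p in sorted(pred):
        hit = next((g for g in unmatched if abs(p - g) <= tol), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

print(boundary_f1([4, 12, 21], [4, 13, 24], tol=0))  # 0.33: only 4 matches
print(boundary_f1([4, 12, 21], [4, 13, 24], tol=2))  # 0.67: 12 also matches 13
```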
The research demonstrates that strategic data curation, comprehensive feature engineering,
and targeted optimization can compete effectively with resource-intensive approaches,
showing that local pattern recognition outperforms complex global modeling strategies in
specialized domains like lyric analysis.
Keywords
Lyric segmentation; Chorus detection; Attention mechanisms; Self-similarity matrices; Local pattern recognition
