MUSIB: musical score inpainting benchmark

Abstract
Music inpainting is a sub-task of automated music generation that aims to infill incomplete musical pieces in order to assist musicians in the composition process. Many methods have been developed for this task. However, each method tends to be evaluated with different datasets and metrics in the paper where it is presented. This lack of standardization hinders an adequate comparison of these approaches. To address this problem, we present MUSIB, a new benchmark for musical score inpainting with standardized conditions for evaluation and reproducibility. MUSIB evaluates four models: Variable Length Piano Infilling (VLI), Music InpaintNet, Music SketchNet, and AnticipationRNN, on two commonly used datasets: JSB Chorales and IrishFolkSong. We also compile, extend, and propose metrics: Note Metrics, which quantify note attributes such as pitch and rhythm, and Divergence Metrics, which capture higher-level musical properties by measuring the distance between distributions of musical features. Our evaluation shows that VLI, a model based on the Transformer architecture, is the best performer on the larger dataset, while VAE-based models surpass this Transformer-based model on the relatively small dataset. With MUSIB, we aim to inspire the community towards better reproducibility in music generation research, setting an example for well-founded comparisons among state-of-the-art methods.
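As an illustration of the idea behind a divergence-style metric, the following minimal Python sketch compares the pitch-class distribution of an original passage with that of an infilled one. The choice of pitch classes as the musical feature and of Jensen-Shannon divergence as the distance are assumptions made for this example; the abstract does not state which features or which divergence MUSIB uses. The function names are hypothetical, and pitches are assumed to be MIDI note numbers.

```python
import math
from collections import Counter


def pitch_class_distribution(pitches):
    """Return a normalized 12-bin histogram of pitch classes.

    `pitches` is assumed to be a non-empty list of MIDI note numbers.
    """
    if not pitches:
        raise ValueError("pitches must be non-empty")
    counts = Counter(p % 12 for p in pitches)
    total = sum(counts.values())
    return [counts.get(pc, 0) / total for pc in range(12)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    # Mixture distribution used by the Jensen-Shannon definition.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Illustrative data only: a C major arpeggio versus an infilled variant.
original = [60, 64, 67, 72]
infilled = [60, 62, 67, 72]
score = js_divergence(
    pitch_class_distribution(original),
    pitch_class_distribution(infilled),
)
```

Under these assumptions, a lower `score` means the infilled passage has a pitch-class profile closer to the original. The same pattern would apply to other features, such as note durations, by swapping the histogram function.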
Keywords
Music generation, Music inpainting, Music infilling, Benchmark, Evaluation, Reproducibility
Citation
Araneda-Hernandez, M., Bravo-Marquez, F., Parra, D. et al. MUSIB: musical score inpainting benchmark. J AUDIO SPEECH MUSIC PROC. 2023, 19 (2023). https://doi.org/10.1186/s13636-023-00279-6