TY - GEN
T1 - Separation of monophonic music signal based on user-guided onset information
AU - Park, Jeongsoo
AU - Lee, Kyogu
PY - 2014
Y1 - 2014
N2 - In this paper, we present a novel informed source separation (ISS) algorithm for monophonic sound mixtures based on user-guided onset information. Conventional user-guided ISS methods have explored various ways of providing information about the target sound. We consider the case where the information is given via an audio signal supplied by the user. Conventional algorithms require both spectral and temporal information about the target source to be separated, often provided by singing or humming. However, it can be difficult for an unskilled user to provide exact pitch information for the target source, which can severely degrade the performance of spectrogram decomposition algorithms. In contrast, it is relatively easy for novice users to provide onset information, for example by finger- or foot-tapping. We therefore propose an algorithm in which the user supplies only temporal information, in the form of note onsets of the target source. To this end, we use a two-step non-negative matrix factorization (NMF). In the first step, we estimate a spectral basis of the target source using a sparsity constraint. In the second step, we estimate the corresponding temporal basis of the target source. Finally, we reconstruct the estimated target sound from the results of the two-step NMF. Experiments show that the proposed algorithm can successfully separate the target source using only the user's onset information when there is no significant overlap in onsets between the target and the other sources.
UR - http://www.scopus.com/inward/record.url?scp=84922605618&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84922605618
T3 - 21st International Congress on Sound and Vibration 2014, ICSV 2014
SP - 2483
EP - 2490
BT - 21st International Congress on Sound and Vibration 2014, ICSV 2014
PB - International Institute of Acoustics and Vibrations
T2 - 21st International Congress on Sound and Vibration 2014, ICSV 2014
Y2 - 13 July 2014 through 17 July 2014
ER -