Analysing Attack Times
Python. I used the spectrogram to detect attack times in a single line melody by shaping signal’s power graph. It works better with only slow passages in which the notes are separable from each other. I wrote its algorithms by taking reference the signal’s total power. Basically, it reshapes the total power by a smoothing algorithm, and finds the attack times for each note based on a threshold value. One of the bottlenecks is that reshaping algorithm might fail for fast passages and/or deciding threshold value could be unclear. I will improve the reshaping strategy later on.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 | import os import numpy as np import scipy.io.wavfile as wavfile import matplotlib.pyplot as plt from scipy.signal import spectrogram def analyseAttackTimes(fileName): # fs: sampling rate of the wav file, x: integer point array if (os.path.isfile(fileName) == False): raise ValueError("Input file is not valid") fs, x = wavfile.read(fileName) # We only accept mono and 44.1khz audio files for this implementation if (len(x.shape) != 1): raise ValueError("Audio file is not mono") if (fs != 44100): raise ValueError("Sampling rate of input sound is not 44100") # Get the total power, normalize it in the range [0, 1], and reshape f, T, P = spectrogram(x, fs, nperseg=512) totalPower = np.sum(P, axis=0) normTotalPower = totalPower / np.max(totalPower) reshaped = reshapeSignal(normTotalPower.copy()) # Calculate the attack times timeIndexes = findAttackTimeIndexes(reshaped) sigDuration = len(x)/fs attackTimes = [] for m in timeIndexes: attackTimes.append((sigDuration/len(T)) * m) # Create subplots and display them fig, axs = plt.subplots(2, 1, figsize=(10, 8), dpi=300, sharex=True) time_axis = np.linspace(0, len(x) / fs, len(totalPower)) plt.subplot(2, 1, 1) plt.plot(time_axis, normTotalPower, color='blue') plt.text(0.95, 0.95, 'Total Power', ha='right', va='top', fontsize=15, transform=plt.gca().transAxes) plt.subplot(2, 1, 2) plt.plot(time_axis, reshaped, color='red') plt.xlabel('Time [sec]') plt.text(0.95, 0.95, 'Reshaped Total Power', ha='right', va='top', fontsize=15, transform=plt.gca().transAxes) plt.tight_layout() plt.show() return attackTimes def reshapeSignal(signal, cycle = 2): for i in range(0, cycle): for j in range(2, len(signal)-1): signal[j] = (signal[j-1] + signal[j+1]) / 2 return signal def findAttackTimeIndexes(signal, thresholdValue = 0.3): threshold = thresholdValue * np.max(signal) timeIndexes = [] isAttackTime = False for i in range(0, len(signal)): if((signal[i] >= threshold) and not isAttackTime): timeIndexes.append(i) isAttackTime = True elif((signal[i] < threshold) and isAttackTime): isAttackTime = False return timeIndexes if __name__ == "__main__": result = analyseAttackTimes('C:/Users/Hakan/Desktop/Sounds/piano.wav') print(result) |
For demonstration, I used a simple piano segment. It is a 5 notes single melody line. Its STFT parameter and threshold can be changed. It gives the output the related attack times. For this simple melody line;
- 0.04 sec
- 0.85 sec
- 1.04 sec
- 1.55 sec
- 2.08 sec
References:
[1] Mathematics of the Discrete Fourier Transform, Julius O. Smith III