Discrete Fourier Transform

Python. This is my DFT implementation to test a wav file. It only accepts mono wav files with 44100 Hz sample rate. I used 32 floating point (with max 23 mantissa) in the range -1 and +1 for normalisation.

I chose 2048 for N points. DFT’s computational complexity is O(N^2) while FFT’s computational complexity is O(NlogN), so it takes very long time when N point is chosen too big. It is acceptable till 8192. After that, it takes time. There is also time starting variable, so it can be start anywhere from 0. I did not use windowing with zero padding at this implementation because it was a direct application of DFT:

(1) $\begin{equation*} X[k]=\sum_{n=0}^{N-1}{x[n]e^{{-j\frac{2\pi}{N}}nk}} \end{equation*}$

where k = 0, 1, 2,…, N – 1

The implementation gives magnitude spectra (one in normal one in dB) and phase. It is easier to see to the peaks (fundamental frequency and harmonics) in the normal spectrum, that’s why I added the normal magnitude spectrum to the figure as well. Since magnitude spectrum is symmetric, it would be better to take the first halves of the absolute value of spectra. Finally, to show the results in frequency range, frequency resolution (sample rate/N) was applied to the plots.

import os
import numpy as np
import matplotlib.pyplot as plt
from scipy.io.wavfile import read

def DFT(filename):

    #Reading a wav file and converting it to a normalized floating point array
    #fs: sampling rate of the wav file, x: integer point array

    if (os.path.isfile(filename) == False):					
            raise ValueError("Input file is not valid")
            
    fs, x = read(filename)

    #We only accept mono and 44.1khz audio files for this implementation
    
    if (len(x.shape) != 1):                                 
            raise ValueError("Audio file is not mono")
    if (fs != 44100):                                       
            raise ValueError("Sampling rate of input sound is not 44100")
            
    #Converting the wav file into floating point 32 in the range of -1 to 1
    #Computational complexity of DFT is O(N^2), N should not be high

    INT16_FAC = (2**15)-1
    norm_fact = {'int16':INT16_FAC,'float32':1.0}
    
    x = np.float32(x)/norm_fact[x.dtype.name]   
    
    time = 0.5                                  #starting time frame of the signal
    N = 2048                                    #frame size for DFT, only even numbers
    x = x[int(time*fs):int(time*fs)+N]          #input sequence for DFT without windowing                
    Xf = np.array([])                           #frequency domain array

    for k in range(N):
        s = np.exp(-1j * 2 * np.pi * k * np.arange(N)/ N)
        Xf = np.append(Xf, sum(x*s))
    
    mX = abs(Xf)                                #amplitude
    mXdb = 20*np.log(mX)                        #amplitude in dB
    pX = np.unwrap(np.angle(Xf))                #phase
                         
    #Frequency resolution would be applied to x axis for amplitude and phase
    #First halves will be used because the amplitude is symmetric
                                      
    freqRes = float(fs/N)                       #frequency resolution
    hFrame = int(N/2)                           #half of DFT frame size
    xInput = freqRes * np.arange(hFrame)    
    
    plt.subplot(311)
    plt.plot(xInput, mX[:hFrame], 'b')
    plt.axis([0, fs/2, 0, max(mX)+5])
    plt.title('magnitude spectra and phase')
    plt.ylabel('amplitude')
    
    plt.subplot(312)
    plt.plot(xInput, mXdb[:hFrame], 'b')
    plt.axis([0, fs/2, min(mXdb[:hFrame])-10, max(mXdb[:hFrame])+15])
    plt.ylabel('amplitude (dB)')
    
    plt.subplot(313)
    plt.plot(xInput, pX[:hFrame], 'b')
    plt.axis([0, fs/2, min(pX[:hFrame])-10, max(pX[:hFrame])+10])
    plt.xlabel('frequency (Hz)')
    plt.ylabel('phase (radians)')

    plt.show()
    
    return

import os

import numpy as np

import matplotlib.pyplot as plt

from scipy.io.wavfile import read

def DFT(filename):

#Reading a wav file and converting it to a normalized floating point array

#fs: sampling rate of the wav file, x: integer point array

if (os.path.isfile(filename) == False):

raise ValueError("Input file is not valid")

fs, x = read(filename)

#We only accept mono and 44.1khz audio files for this implementation

if (len(x.shape) != 1):

raise ValueError("Audio file is not mono")

if (fs != 44100):

raise ValueError("Sampling rate of input sound is not 44100")

#Converting the wav file into floating point 32 in the range of -1 to 1

#Computational complexity of DFT is O(N^2), N should not be high

INT16_FAC = (2**15)-1

norm_fact = {'int16':INT16_FAC,'float32':1.0}

x = np.float32(x)/norm_fact[x.dtype.name]

time = 0.5 #starting time frame of the signal

N = 2048 #frame size for DFT, only even numbers

x = x[int(time*fs):int(time*fs)+N] #input sequence for DFT without windowing

Xf = np.array([]) #frequency domain array

for k in range(N):

s = np.exp(-1j * 2 * np.pi * k * np.arange(N)/ N)

Xf = np.append(Xf, sum(x*s))

mX = abs(Xf) #amplitude

mXdb = 20*np.log(mX) #amplitude in dB

pX = np.unwrap(np.angle(Xf)) #phase

#Frequency resolution would be applied to x axis for amplitude and phase

#First halves will be used because the amplitude is symmetric

freqRes = float(fs/N) #frequency resolution

hFrame = int(N/2) #half of DFT frame size

xInput = freqRes * np.arange(hFrame)

plt.subplot(311)

plt.plot(xInput, mX[:hFrame], 'b')

plt.axis([0, fs/2, 0, max(mX)+5])

plt.title('magnitude spectra and phase')

plt.ylabel('amplitude')

plt.subplot(312)

plt.plot(xInput, mXdb[:hFrame], 'b')

plt.axis([0, fs/2, min(mXdb[:hFrame])-10, max(mXdb[:hFrame])+15])

plt.ylabel('amplitude (dB)')

plt.subplot(313)

plt.plot(xInput, pX[:hFrame], 'b')

plt.axis([0, fs/2, min(pX[:hFrame])-10, max(pX[:hFrame])+10])

plt.xlabel('frequency (Hz)')

plt.ylabel('phase (radians)')

plt.show()

return

I used a short frame of soprano singing at E4 to test it. It is clearly to see peaks in both spectrum since it was a single note singing. The first peak is fundamental frequency. It is approximately 336 Hz. However, she was singing in vibrato style, plus it depends on choosing starting time and N values. Therefore, it is a good idea to check its harmonics and measure their average values. For example, first harmonic/2, second harmonic/3, and third harmonic/4. Then, measuring the final average value for all of them can approximately give the correct fundamental frequency she was singing.

Second harmonic is 650 Hz, third harmonic is 987 Hz, and fourth harmonic is 1323 Hz. Average values are approximately 325, 329, and 331 Hz in order. Final average value for 4 frequencies is 330.25 Hz. E4 frequency is exactly 329.63 Hz. As a result, it is a pretty close approximation. The second figure is the zoomed version of magnitude spectras.

References:

[1] Mathematics of the Discrete Fourier Transform, Julius O. Smith III

[2] The Scientist and Engineer’s Guide to
Digital Signal Processing, Steven W. Smith

[3] The Frequency Domain, www.alwayslearn.com