Audio Signals in Python

Up to now I’ve mostly analysed metadata about music, and when I have looked at the track content I’ve focused on the lyrics. Now I want to analyse the sound itself. In this post I will demonstrate how to extract some useful information from an audio file using Python.

Starting with a basic question: how do I convert music to data? For analogue sound this is impractical; digital music, however, is effectively already data. Sound is just pressure waves, and those waves can be represented as a sequence of numbers over time. A .WAV file stores the audio wave directly as those numbers, and an MP3 file is a compressed version of the .WAV.
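To see that "sound is just numbers over time" concretely, here is a toy sketch (not from the post) that builds a pure 440 Hz tone as a NumPy array and writes it out as a 16-bit WAV file; the frequency and one-second duration are arbitrary choices:

```python
import numpy as np
import scipy.io.wavfile

rate = 44100                         # samples per second
t = np.arange(rate) / rate           # one second of time points
tone = np.sin(2 * np.pi * 440 * t)   # a 440 Hz sine wave in [-1, 1]

# scale to the int16 range used by 16-bit WAV files
samples = (tone * 32767).astype(np.int16)
scipy.io.wavfile.write("tone.wav", rate, samples)
```

Playing tone.wav back gives a one-second A note, which is nothing more than 44100 int16 values.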

I began with a sample of the track Inspiration Information by Shuggie Otis, provided by Spotify. I downloaded this MP3 file, converted it to a WAV, then read the WAV file in as a data array.


#required libraries
import urllib.request
import scipy.io.wavfile
import pydub

#a temp folder for downloads
temp_folder="/Users/home/Desktop/"

#spotify mp3 sample file
web_file="http://p.scdn.co/mp3-preview/35b4ce45af06203992a86fa729d17b1c1f93cac5"

#download file
urllib.request.urlretrieve(web_file, temp_folder+"file.mp3")
#read mp3 file (pydub needs ffmpeg installed to decode MP3s)
mp3 = pydub.AudioSegment.from_mp3(temp_folder+"file.mp3")
#convert to wav
mp3.export(temp_folder+"file.wav", format="wav")
#read wav file
rate, audData = scipy.io.wavfile.read(temp_folder+"file.wav")

print(rate)
print(audData)

The output from scipy.io.wavfile.read is the sampling rate of the track and the audio wave data. The sampling rate is the number of data points sampled per second of audio. In this case 44100 values per second make up the audio wave, which is a very common rate. The higher the rate, the better the audio quality.

If the number of data points in the audio wave is divided by the rate, we get the length of the track in seconds. In this case, 30 seconds.

#wav length
audData.shape[0] / rate

Looking at the shape of the audio data, it has two arrays of equal length. It is a stereo recording, so there is separate data for the left and right channels.


#wav number of channels mono/stereo
audData.shape[1]
#if stereo grab both channels
channel1=audData[:,0] #left
channel2=audData[:,1] #right

The data is stored as int16, i.e. each data point takes 16 bits. Common bit depths are 8, 16, and 32 bits. Again, the higher this is, the better the audio quality.


audData.dtype
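As a quick aside (my own addition, not from the post), the value range each of those common bit depths can represent is easy to check with NumPy:

```python
import numpy as np

# value ranges of the common signed WAV sample types
for dtype in (np.int8, np.int16, np.int32):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)
# int16 samples run from -32768 to 32767
```

So a 16-bit file can distinguish 65536 amplitude levels, versus only 256 for 8-bit audio.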

In the same way you can read a file, you can also save the data back to a WAV file. This means it is possible to manipulate the sound data then save it.


#save wav file
scipy.io.wavfile.write(temp_folder+"file2.wav", rate, audData)
#save copies at half and double speed (the rate must be an integer)
scipy.io.wavfile.write(temp_folder+"file2_slow.wav", rate//2, audData)
scipy.io.wavfile.write(temp_folder+"file2_fast.wav", rate*2, audData)
#save a single channel
scipy.io.wavfile.write(temp_folder+"file2_left.wav", rate, channel1)

I also tried creating a mono version by averaging the data in the left and right channels. This works to a point, but does seem to damage the audio. Note the averaged result is a float array, so it needs casting back to int16 before saving.


import numpy as np
#averaging the channels damages the music
mono = np.mean(audData.astype(float), axis=1).astype(np.int16)  #cast back to int16 for a 16-bit WAV
scipy.io.wavfile.write(temp_folder+"file2_mono.wav", rate, mono)

The values in the data represent the amplitude of the wave (i.e. the loudness of the audio). The energy of the audio can be described by the sum of the squared amplitudes.


#Energy of music
np.sum(channel1.astype(float)**2)

This will depend on the length of the audio as well as its volume. A better metric is the power, which is the energy per second.


#power - energy per second of audio (energy divided by duration)
np.sum(channel1.astype(float)**2) / (channel1.size / rate)

Next I wanted to plot my track, so I plotted the amplitude over time for each channel.


import matplotlib.pyplot as plt

#create a time variable in seconds
time = np.arange(audData.shape[0]) / rate

#plot amplitude (or loudness) over time
plt.figure(1)
plt.subplot(211)
plt.plot(time, channel1, linewidth=0.01, alpha=0.7, color='#ff7f00')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.subplot(212)
plt.plot(time, channel2, linewidth=0.01, alpha=0.7, color='#ff7f00')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()

The next thing to look at is the frequency content of the audio. To do this you need to decompose the single audio wave into waves at different frequencies, which can be done using a Fourier transform. However, the last time I thought about Fourier transforms was at university, so I thought I had better brush up. I went through the first few weeks of a free signal processing course on Coursera, and it was a great help.

The Fourier transform effectively iterates through as many frequencies as there are samples (N) in the dataset, and determines the amplitude of each. The frequency of bin k (fk) can be calculated from the sampling rate (fs) as fk = k × fs / N.
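That relationship between bin index and frequency can be sanity-checked with a toy example (mine, not from the post): a pure sine at a known frequency should produce a peak at bin k = f × N / fs.

```python
import numpy as np

fs = 44100   # sampling rate (Hz)
N = 44100    # number of samples (one second of audio)
f = 1000     # tone frequency (Hz)

t = np.arange(N) / fs
signal = np.sin(2 * np.pi * f * t)

# magnitude spectrum; only the first half is non-redundant for real input
spectrum = np.abs(np.fft.fft(signal))
k = np.argmax(spectrum[:N // 2])

# bin k corresponds to frequency k * fs / N
print(k, k * fs / N)  # → 1000 1000.0
```

With N equal to fs, each bin is exactly 1 Hz wide, so the 1000 Hz tone lands squarely in bin 1000.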

The following code performs the Fourier transform on the left channel and plots the magnitude of the result. For a real-valued input the maths produces a symmetrical result, so the first half of the output contains all the information.


from numpy import fft as fft

fourier = fft.fft(channel1)

#the output is complex, so plot its magnitude
plt.plot(np.abs(fourier), color='#ff7f00')
plt.xlabel('k')
plt.ylabel('Amplitude')

We only need the first half of the symmetric result, so we can grab it, then calculate the frequency at each point and plot the frequency against a scaled amplitude.


n = len(channel1)
fourier = fourier[0:n//2]

# scale by the number of points so that the magnitude does not depend on the length
fourier = fourier / float(n)

#calculate the frequency at each point in Hz
freqArray = np.arange(0, n//2, 1.0) * (rate*1.0/n)

#plot the power (squared magnitude) in decibels
plt.plot(freqArray/1000, 10*np.log10(np.abs(fourier)**2), color='#ff7f00', linewidth=0.02)
plt.xlabel('Frequency (kHz)')
plt.ylabel('Power (dB)')

Another common way to analyse audio is to create a spectrogram. Audio spectrograms are heat maps that show the frequencies of the sound in Hertz (Hz) against time, with the intensity of each frequency in decibels (dB) shown as colour.

In order to calculate a Fourier transform over time, the specgram function used below applies a windowed Fast Fourier transform. This simplifies the calculation involved, and makes it possible to do in seconds. It calculates many Fourier transforms over blocks of data 'NFFT' samples long. Each transform over a block gives the frequencies represented in that block, and to what magnitude, so along the time axis the result has roughly NFFT times fewer points than the original data. The range of frequencies covered always runs up to half the sample rate; the block size (NFFT) determines how many frequency bins divide that range. So a bigger block gives finer frequency resolution, but reduces the resolution with respect to time.
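That trade-off can be made concrete with a little arithmetic (my own illustration): at a 44100 Hz sample rate, each FFT block of NFFT samples covers NFFT/rate seconds, and its frequency bins are rate/NFFT apart.

```python
rate = 44100
for NFFT in (256, 1024, 4096):
    time_res = NFFT / rate   # seconds of audio covered per block
    freq_res = rate / NFFT   # Hz between adjacent frequency bins
    print(f"NFFT={NFFT}: {time_res*1000:.1f} ms per block, {freq_res:.1f} Hz per bin")
```

At NFFT=1024 each block spans about 23 ms with bins roughly 43 Hz apart; quadrupling NFFT quarters the bin spacing but smears each estimate over four times as much audio.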


plt.figure(2, figsize=(8,6))
plt.subplot(211)
Pxx, freqs, bins, im = plt.specgram(channel1, Fs=rate, NFFT=1024, cmap=plt.get_cmap('autumn_r'))
cbar=plt.colorbar(im)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
cbar.set_label('Intensity dB')
plt.subplot(212)
Pxx, freqs, bins, im = plt.specgram(channel2, Fs=rate, NFFT=1024, cmap=plt.get_cmap('autumn_r'))
cbar=plt.colorbar(im)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
cbar.set_label('Intensity (dB)')
plt.show()

The result allows us to pick out a certain frequency bin and examine its intensity over time.


#find the index of the bin closest to 10 kHz (bin 233 at this rate and NFFT)
np.where(freqs==10034.47265625)
MHZ10=Pxx[233,:]
plt.plot(bins, MHZ10, color='#ff7f00')

So that’s the basics of audio processing. I’m now looking forward to analysing my favourite music. I’m sure there will be posts on that to come.

As always, all of the above code can be found together in the following gist.

3 thoughts on “Audio Signals in Python”

  1. I received the following error (even using a hardcoded path in the from_mp3() method):

    Do you know what I’m doing wrong?

    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    in ()
          2 #urllib.urlretrieve(web_file,temp_folder+"file.mp3")
          3 #read mp3 file
    ----> 4 mp3 = pydub.AudioSegment.from_mp3("/Users/myname/Downloads/audio_analysis/file.mp3")
          5 #convert to wav
          6 mp3.export(temp_folder+"file.wav", format="wav")

    /usr/local/lib/python2.7/site-packages/pydub/audio_segment.pyc in from_mp3(cls, file)
        512 @classmethod
        513 def from_mp3(cls, file):
    --> 514     return cls.from_file(file, 'mp3')
        515
        516 @classmethod

    /usr/local/lib/python2.7/site-packages/pydub/audio_segment.pyc in from_file(cls, file, format, **kwargs)
        495     log_conversion(conversion_command)
        496
    --> 497     p = subprocess.Popen(conversion_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        498     p_out, p_err = p.communicate()
        499

    /usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
        388     p2cread, p2cwrite,
        389     c2pread, c2pwrite,
    --> 390     errread, errwrite)
        391 except Exception:
        392     # Preserve original exception in case os.close raises.

    /usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
       1022     raise
       1023     child_exception = pickle.loads(data)
    -> 1024     raise child_exception
       1025
       1026

    OSError: [Errno 2] No such file or directory

    1. The error you are getting:
      OSError: [Errno 2] No such file or directory
      suggests it’s not finding the file. Is there definitely a file called file.mp3 in the directory?
      If not, it sounds like the line before it isn’t working properly:
      urllib.urlretrieve(web_file, temp_folder+"file.mp3")
