Audio Signals in Python

Up to now I’ve mostly analysed metadata about music, and when I have looked at track content I’ve focused on the lyrics. Now I want to analyse the sound itself. In this post I will demonstrate how to extract some useful information from an audio file using Python.

Starting with a basic question: how do I convert music to data? For analogue sound this is impractical; digital music, however, is effectively already data. Sound is just pressure waves, and those waves can be represented as a sequence of numbers over time. A .WAV file stores the audio wave directly as those numbers, and an MP3 file is a compressed version of the WAV.
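To make that concrete, here is a small sketch (not from the original post; the file name and tone parameters are made up) that builds a two-second 440 Hz tone as a plain numpy array and writes it out as a WAV file:

```python
import numpy as np
import scipy.io.wavfile

rate = 44100                       # samples per second
duration = 2.0                     # seconds
t = np.arange(int(rate * duration)) / float(rate)

# a 440 Hz sine wave, scaled into the int16 range a WAV file uses
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# the WAV file is just these numbers plus a small header
scipy.io.wavfile.write("tone.wav", rate, tone)
```

Played back, tone.wav is simply those 88,200 numbers interpreted at 44,100 samples per second.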

I began with a sample of the track Inspiration Information by Shuggie Otis, provided by Spotify. I downloaded this MP3 file, uncompressed it to a WAV, then read the WAV file in as a data array.

#required libraries
import urllib
import scipy.io.wavfile
import pydub

#a temp folder for downloads (any writable path will do)
temp_folder = "/tmp/"

#spotify mp3 sample file (preview URL omitted here)
web_file = "..."

#download file
urllib.urlretrieve(web_file, temp_folder+"file.mp3")
#read mp3 file
mp3 = pydub.AudioSegment.from_mp3(temp_folder+"file.mp3")
#convert to wav
mp3.export(temp_folder+"file.wav", format="wav")
#read wav file
rate, audData = scipy.io.wavfile.read(temp_folder+"file.wav")

The output from the read call is the sampling rate of the track and the audio wave data. The sampling rate is the number of data points sampled per second in the audio file. In this case 44,100 pieces of information per second make up the audio wave, which is a very common rate. The higher the rate, the better the audio quality.

If the number of data points in the audio wave is divided by the sampling rate, we get the length of the track in seconds: in this case, 30 s.

#wav length
audData.shape[0] / rate

Looking at the shape of the audio data, it contains two arrays of equal length: it is a stereo recording, so there is separate data for the left and right channels.

#wav number of channels mono/stereo
audData.shape[1]
#if stereo grab both channels
channel1 = audData[:,0] #left
channel2 = audData[:,1] #right

The data is stored as int16: each data point takes up 16 bits. Common storage sizes are 8, 16 and 32 bits. Again, the higher this is, the better the audio quality.
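As a quick check on what 16-bit storage implies, the numpy snippet below (an illustration, not part of the original post) shows the value range and the theoretical dynamic range of int16 samples:

```python
import numpy as np

# the range of values a 16-bit sample can hold
info = np.iinfo(np.int16)                   # min -32768, max 32767

# theoretical dynamic range of 16-bit audio in decibels
dynamic_range_db = 20 * np.log10(2 ** 16)   # roughly 96 dB
```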


In the same way you can read a file, you can also save the data back to a WAV file. This means it is possible to manipulate the sound data then save it.

#save wav file
scipy.io.wavfile.write(temp_folder+"file2.wav", rate, audData)
#save a file at half and double speed
scipy.io.wavfile.write(temp_folder+"file2.wav", rate//2, audData)
scipy.io.wavfile.write(temp_folder+"file2.wav", rate*2, audData)
#save a single channel
scipy.io.wavfile.write(temp_folder+"file2.wav", rate, channel1)

I also tried creating a mono version by averaging the data in the left and right channels. This works up to a point, but it does seem to damage the audio.

import numpy as np
#averaging the channels damages the music
mono = np.sum(audData.astype(float), axis=1) / 2
scipy.io.wavfile.write(temp_folder+"file2.wav", rate, mono)
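One plausible cause of the damage is that the division leaves the array as float64, which scipy then writes as a 64-bit float WAV that many players handle badly. A sketch of a fix (using made-up stand-in data for audData) is to cast the average back to int16 before saving:

```python
import numpy as np
import scipy.io.wavfile

# hypothetical stereo data standing in for audData
audData = np.array([[1000, 3000], [-2000, 4000]], dtype=np.int16)

# average in float to avoid int16 overflow, then cast back to 16-bit
mono = (np.sum(audData.astype(float), axis=1) / 2).astype(np.int16)

scipy.io.wavfile.write("mono.wav", 44100, mono)
```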

The values in the data represent the amplitude of the wave (or the loudness of the audio). The energy of the audio can be described by the sum of the squared amplitudes.

#Energy of music
np.sum(channel1.astype(float)**2)

This value depends on the length of the audio, the sample rate and the volume. A better metric is the power, which is energy per second.

#power - energy per unit of time
np.sum(channel1.astype(float)**2) / (channel1.shape[0] / float(rate))

Next I wanted to plot my track, so I plotted the amplitude over time for each channel.

import matplotlib.pyplot as plt

#create a time variable in seconds
time = np.arange(0, float(audData.shape[0]), 1) / rate

#plot amplitude (or loudness) over time, one channel per panel
plt.figure(1)
plt.subplot(211)
plt.plot(time, channel1, linewidth=0.01, alpha=0.7, color='#ff7f00')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.subplot(212)
plt.plot(time, channel2, linewidth=0.01, alpha=0.7, color='#ff7f00')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()

The next thing to look at is the frequency content of the audio. To do this you need to decompose the single audio wave into waves at different frequencies, which can be done with a Fourier transform. However, the last time I thought about Fourier transforms was at university, so I thought I had better brush up. I went through the first few weeks of a free signal processing course on Coursera, and it was a great help.

The Fourier transform effectively iterates through as many frequencies as there are samples (N) in the dataset, and determines the amplitude at each one. The frequency of bin k (fk) can be calculated from the sampling rate (fs) as fk = k × fs / N.
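As a sanity check on fk = k × fs / N (with made-up values for N), numpy's fft.fftfreq helper returns exactly these bin frequencies:

```python
import numpy as np

fs = 44100.0    # sampling rate
N = 1024        # number of samples in this example

# fk = k * fs / N for the positive-frequency half of the transform
k = np.arange(N // 2)
manual = k * fs / N

# numpy's helper computes the same bin frequencies
auto = np.fft.fftfreq(N, d=1.0 / fs)[:N // 2]
```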

The following code performs the Fourier transform on the left channel and plots it. The maths produces a symmetrical result: the second half of the output mirrors the first (the coefficients are complex conjugates of each other), so only the first half carries unique information.

from numpy import fft as fft

#fourier transform of the left channel
fourier = fft.fft(channel1)

plt.plot(np.abs(fourier), color='#ff7f00')
plt.xlabel('k')
plt.ylabel('Amplitude')

We only need the first half of the result, so we can grab it, then calculate the frequency of each point and plot frequency against a scaled amplitude.

n = len(channel1)
fourier = fourier[0:n//2]

# scale by the number of points so that the magnitude does not depend on the length
fourier = fourier / float(n)

#calculate the frequency at each point in Hz
freqArray = np.arange(0, n//2, 1.0) * (rate*1.0/n)

plt.plot(freqArray/1000, 10*np.log10(np.abs(fourier)), color='#ff7f00', linewidth=0.02)
plt.xlabel('Frequency (kHz)')
plt.ylabel('Power (dB)')

Another common way to analyse audio is to create a spectrogram. Audio spectrograms are heat maps that show the frequencies of the sound in Hertz (Hz) and the volume of the sound in decibels (dB) against time.

To calculate a Fourier transform over time, the specgram function used below applies a windowed Fast Fourier transform. This simplifies the calculation and makes it possible in seconds. It computes many Fourier transforms over blocks of data NFFT samples long; each transform yields the frequencies present in that block and their magnitudes, so the resulting array has a time axis NFFT times shorter than the original data. The range of frequencies covered is fixed at half the sample rate (the Nyquist frequency); the number of samples in the block (NFFT) determines how many frequencies within that range are resolved. So a bigger block gives finer frequency resolution, but less information with respect to time.
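To make the trade-off concrete, here is a small sketch of the arithmetic (assuming the 44,100 Hz rate and the NFFT of 1024 used in this post):

```python
fs = 44100       # sampling rate in Hz
NFFT = 1024      # samples per block

freq_resolution = fs / float(NFFT)   # spacing between frequency bins (~43 Hz)
max_freq = fs / 2.0                  # highest representable frequency (Nyquist)
time_per_block = NFFT / float(fs)    # seconds of audio each column covers (~23 ms)

# doubling NFFT halves the bin spacing but doubles the time per block
```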

plt.figure(2, figsize=(8,6))
Pxx, freqs, bins, im = plt.specgram(channel1, Fs=rate, NFFT=1024, cmap=plt.get_cmap('autumn_r'))
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
cbar = plt.colorbar(im)
cbar.set_label('Intensity (dB)')

plt.figure(3, figsize=(8,6))
Pxx, freqs, bins, im = plt.specgram(channel2, Fs=rate, NFFT=1024, cmap=plt.get_cmap('autumn_r'))
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
cbar = plt.colorbar(im)
cbar.set_label('Intensity (dB)')

The result allows us to pick out a certain frequency and examine it over time.

#row of Pxx closest to 10 kHz at this rate and NFFT
MHZ10 = Pxx[233,:]
plt.plot(bins, MHZ10, color='#ff7f00')
plt.xlabel('Time (s)')
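Rather than hard-coding a row index, the bin nearest a target frequency can be found from the bin centres. The sketch below rebuilds the one-sided bin frequencies by hand (assuming Fs=44100 and NFFT=1024) instead of reusing the freqs array from the plot:

```python
import numpy as np

fs = 44100
NFFT = 1024

# bin centre frequencies as returned by a one-sided specgram
freqs = np.arange(NFFT // 2 + 1) * fs / float(NFFT)

# index of the bin closest to 10 kHz
idx = int(np.argmin(np.abs(freqs - 10000)))
```

The matching row of Pxx, i.e. Pxx[idx,:], is then the power at that frequency over time.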

So that’s the basics of audio processing. I’m now looking forward to analysing my favourite music, and I’m sure there will be posts on that to come.

As always, all of the above code can be found together in the following gist.

10 thoughts on “Audio Signals in Python”

  1. I received the following error (even when using a hardcoded path in the from_mp3() method):

    Do you know what I’m doing wrong?

    OSError Traceback (most recent call last)
    in ()
    2 #urllib.urlretrieve(web_file,temp_folder+"file.mp3")
    3 #read mp3 file
    ----> 4 mp3 = pydub.AudioSegment.from_mp3("/Users/myname/Downloads/audio_analysis/file.mp3")
    5 #convert to wav
    6 mp3.export(temp_folder+"file.wav", format="wav")

    /usr/local/lib/python2.7/site-packages/pydub/audio_segment.pyc in from_mp3(cls, file)
    512 @classmethod
    513 def from_mp3(cls, file):
    --> 514 return cls.from_file(file, 'mp3')
    516 @classmethod

    /usr/local/lib/python2.7/site-packages/pydub/audio_segment.pyc in from_file(cls, file, format, **kwargs)
    495 log_conversion(conversion_command)
    --> 497 p = subprocess.Popen(conversion_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    498 p_out, p_err = p.communicate()

    /usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
    388 p2cread, p2cwrite,
    389 c2pread, c2pwrite,
    --> 390 errread, errwrite)
    391 except Exception:
    392 # Preserve original exception in case os.close raises.

    /usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
    1022 raise
    1023 child_exception = pickle.loads(data)
    -> 1024 raise child_exception

    OSError: [Errno 2] No such file or directory

    1. The error you are getting:
      OSError: [Errno 2] No such file or directory
      suggests it’s not finding the file. Is there definitely a file called file.mp3 in the directory?
      If not, it sounds like the line before it isn’t working properly.

  2. Can you elaborate a bit on the last 3 lines:

    plt.plot(bins, MHZ10, color='#ff7f00')

    What exactly is this doing?

  3. Hi, I have a few questions:
    1. Can you cite the equation you used and how it relates to the Fourier Transform?
    2. What does the value for ‘k’ mean in the equation/second graph you used? And what unit is it at as well?
    3. How does amplitude relate to the equation you used, and how did you solve for amplitude without having it within the equation?
    Thank you!

    1. k refers to the period or time in the audio, so fk is the frequency at a given time. To answer your other questions would take a lot of explaining; instead, I really recommend you go through the online signal processing course I suggested in the blog, as they explain it more clearly than I could. It’s also totally free.

  4. Suggestion:
    By averaging we get a damaged wav file, as you’ve suggested, but it can be fixed by another method completely unrelated to averaging.

    Use pydub for extracting mono to prevent damage to the audio:

    mono = pydub.AudioSegment.from_wav('Music/file.wav')
    mono.export('Music/pydubfile.wav', format="wav")

  5. This is very helpful for a beginner getting into audio processing in Python. There are other libraries like librosa which would do the job, but it is good to understand what is going on behind the scenes, and it is very well explained here.
