1. Read the Audio Data:
  • Use a library like `libsndfile` to read the audio data from a file. WAV is the simplest format to start with; MP3 is only supported by newer libsndfile releases (1.1.0 and later), so older builds need a separate decoder.
  • The goal is to obtain the raw PCM samples, which are the digital representation of the audio signal.
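
As a rough illustration of this step, the sketch below reads PCM samples with Python's standard `wave` module instead of `libsndfile` (a deliberate substitution to keep the example dependency-free). The function name `read_wav_mono` is hypothetical, and the sketch assumes 16-bit PCM WAV input.

```python
import struct
import wave


def read_wav_mono(path):
    """Return (sample_rate, samples), with samples as floats in [-1.0, 1.0).

    Assumption: the file is 16-bit PCM. Multi-channel audio is averaged
    down to a single mono channel.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    # Samples are little-endian signed 16-bit integers, interleaved by channel.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)

    samples = []
    for i in range(0, len(ints) - channels + 1, channels):
        frame = ints[i:i + channels]
        # Average the channels and scale to the float range.
        samples.append(sum(frame) / (channels * 32768.0))
    return sample_rate, samples
```

The returned list of floats is the "PCM samples" input that the preprocessing step expects.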


2. Preprocessing:

Input: PCM samples

  • Normalization
    • This adjusts the overall volume of the audio.
    • The typical method (peak normalization) is to find the loudest sample in the audio and scale the whole signal so that this peak reaches a standard level, often the maximum possible without clipping.
  • Windowing (Hamming or Hann)
    • The signal is split into short “windows” (also called frames or segments), and each one is analyzed separately. Cutting a window out of the signal creates abrupt starts and stops at its edges, which show up in the FFT as distortion (spectral leakage).
    • To avoid this, each window is made to fade in at the beginning and fade out at the end. The Hamming and Hann (often called “Hanning”) functions are the mathematical curves used to apply this fade.
  • Preprocessing is needed because normalization keeps volume levels consistent, and windowing smoothly tapers each chunk, which makes the FFT analysis more accurate and reliable.
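
The two preprocessing operations can be sketched as follows. The function names, the frame size of 1024 and the hop size of 512 are illustrative assumptions, not values taken from the steps above; the Hamming coefficients 0.54 and 0.46 are the standard ones.

```python
import math


def peak_normalize(samples, target_peak=1.0):
    """Scale the signal so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def hamming_window(n):
    """Return the n-point Hamming window (the fade-in / fade-out curve)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


def windowed_frames(samples, frame_size=1024, hop_size=512):
    """Split the signal into overlapping frames and taper each one.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming_window(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        chunk = samples[start:start + frame_size]
        frames.append([c * w for c, w in zip(chunk, window)])
    return frames
```

A hop size smaller than the frame size makes consecutive frames overlap, which is a common choice because the taper otherwise discards the audio near each frame edge.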


3. Apply the FFT:
  • The FFT (Fast Fourier Transform) takes a window of the audio signal and represents it in terms of the frequencies that make it up.
  • The output of the FFT is a sequence of complex numbers, one per frequency bin. The magnitude of each number corresponds to the strength (amplitude) of a specific frequency in that window of the audio. The position of the number in the sequence indicates which frequency it corresponds to: for a window of N samples at sample rate fs, bin k (counting from 0) corresponds to k · fs / N Hz. For real-valued audio, only the first N/2 + 1 bins are unique; the rest mirror them.
    • So if the third number in the FFT output (bin 2, at 2 · fs / N Hz) has a high magnitude, the third frequency in our analysis range has a strong presence in that window.
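
To make the bin-to-frequency relationship concrete, here is a minimal recursive radix-2 FFT. It is a teaching sketch that assumes the window length is a power of two; a real project would more likely call an optimized FFT library. The names `fft` and `bin_frequency` are illustrative.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]

    even = fft(x[0::2])  # FFT of even-indexed samples
    odd = fft(x[1::2])   # FFT of odd-indexed samples

    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor rotates the odd half before combining.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bin_frequency(k, sample_rate, n):
    """Frequency in Hz represented by bin k of an n-point FFT."""
    return k * sample_rate / n
```

For example, with a 64-sample window at 8000 Hz, a pure 1000 Hz tone lands in bin 8, because 8 · 8000 / 64 = 1000.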

The next few steps focus on producing the visual representation.

4. Calculate Magnitude:
  • For each FFT output, calculate the magnitude of each frequency bin. This is just the magnitude of the complex number, sqrt(re² + im²).
  • This magnitude represents the intensity of that particular frequency in the corresponding time chunk.
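
A minimal sketch of the magnitude step is shown below. It keeps only the first N/2 + 1 bins, since the remaining bins of a real-valued signal mirror them. The decibel conversion is an optional extra that the steps above do not require; it is included because a logarithmic scale often makes quiet frequencies easier to see. The function names and the -100 dB floor are assumptions.

```python
import math


def magnitudes(spectrum):
    """Magnitude of each unique frequency bin of a real signal's FFT."""
    half = len(spectrum) // 2 + 1
    # abs() of a complex number is sqrt(re**2 + im**2).
    return [abs(c) for c in spectrum[:half]]


def to_decibels(mags, floor_db=-100.0):
    """Optional: convert magnitudes to dB relative to the largest one."""
    ref = max(mags, default=0.0)
    if ref == 0.0:
        return [floor_db] * len(mags)
    out = []
    for m in mags:
        if m <= 0.0:
            out.append(floor_db)  # log of zero is undefined; clamp instead
        else:
            out.append(max(floor_db, 20.0 * math.log10(m / ref)))
    return out
```

Applying `magnitudes` to the FFT of every frame gives one column of intensities per time chunk.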


5. Visual Representation:
  • The last step is mapping the magnitudes to pixels, with time on one axis and frequency on the other. Often, the magnitude is mapped to a color scale where higher intensities correspond to warmer colors.
  • A graphics library like `SDL` or `OpenGL` can render the spectrogram in real time or to an image file.
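
The color mapping can be sketched without a graphics library by writing a binary PPM image, which is used here in place of `SDL` or `OpenGL` purely to keep the example self-contained. The blue-to-red ramp and the names `heat_color` and `write_spectrogram_ppm` are illustrative assumptions.

```python
def heat_color(level):
    """Map a level in [0, 1] to (r, g, b): blue is quiet, red is loud."""
    level = min(1.0, max(0.0, level))
    r = int(255 * level)
    g = int(255 * (1.0 - abs(2.0 * level - 1.0)))  # peaks mid-range
    b = int(255 * (1.0 - level))
    return (r, g, b)


def write_spectrogram_ppm(path, columns):
    """Write a spectrogram image.

    columns: one list of magnitudes per time frame, lowest bin first.
    Time runs left to right; low frequencies are drawn at the bottom.
    """
    width = len(columns)
    height = len(columns[0])
    peak = max(max(col) for col in columns) or 1.0  # avoid dividing by zero

    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (width, height))
        for row in range(height):
            bin_index = height - 1 - row  # top image row = highest bin
            for col in columns:
                f.write(bytes(heat_color(col[bin_index] / peak)))
```

Most image viewers open PPM files directly, so this is a quick way to check the pipeline before wiring up real-time rendering.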

