- Read the Audio Data:
- Use a library like `libsndfile` to read the audio data from files. WAV is the simplest format to start with; MP3 is compressed and has to be decoded first (only newer libsndfile releases can read it). This gives you access to the raw PCM samples.
- The goal is to end up with PCM samples, which are digital representations of the audio signal.
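Whatever library does the decoding, the later steps are easier if the samples are mono floats in a fixed range. The sketch below assumes the decoder hands back interleaved 16-bit signed PCM; the function name `pcm16_to_mono` and the choice to average the channels are illustrative assumptions, not something these notes prescribe. (libsndfile can also return floats directly through `sf_readf_float`, in which case only the channel averaging is needed.)

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved 16-bit PCM to mono floats in [-1, 1).
// `channels` is the number of interleaved channels (1 = mono, 2 = stereo).
// Hypothetical helper: the name and the channel-averaging choice are assumptions.
std::vector<float> pcm16_to_mono(const std::vector<std::int16_t>& pcm, int channels) {
    std::vector<float> mono;
    if (channels <= 0) return mono;
    const std::size_t ch = static_cast<std::size_t>(channels);
    const std::size_t frames = pcm.size() / ch;  // one frame = one sample per channel
    mono.reserve(frames);
    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < ch; ++c) {
            // Dividing by 32768 maps the int16 range onto [-1, 1).
            sum += static_cast<float>(pcm[f * ch + c]) / 32768.0f;
        }
        mono.push_back(sum / static_cast<float>(ch));  // average the channels
    }
    return mono;
}
```

For a stereo file, `pcm16_to_mono(samples, 2)` returns one float per stereo frame.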
- Preprocessing:
Input: PCM samples
- Normalization
- Essentially adjusting the volume of the audio by scaling every sample with the same gain factor.
- The typical method (peak normalization) is to find the loudest point in the audio and then scale the whole signal so that this point reaches a standard level, often the maximum possible without clipping.
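- Windowing (Hamming or Hann; the Hann window is often called "Hanning")
- Divide the samples into short "windows" or "segments", usually overlapping, and analyze each one separately. A segment cut straight out of the signal starts and stops abruptly, and that abrupt edge distorts the spectrum (spectral leakage).
- To avoid this, we want each segment to fade in at the beginning and fade out at the end. The Hamming and Hann windows are mathematical functions that apply this fade: each sample in the segment is multiplied by the window value at the same position. Hann tapers all the way to zero at the edges, while Hamming stops at about 0.08.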
- Why preprocessing is needed: normalization keeps volume levels consistent from one recording to the next, and windowing tapers each chunk smoothly, which makes the FFT analysis more accurate and reliable.
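The two preprocessing steps can be sketched in a few lines of standard C++. This is a minimal illustration, assuming the samples are already floats; the function names (`peak_normalize`, `cosine_window`, `windowed_frame`) and the zero-padding of the last frame are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Peak normalization: scale the signal so its largest absolute sample equals `target`.
void peak_normalize(std::vector<float>& x, float target = 1.0f) {
    float peak = 0.0f;
    for (float s : x) peak = std::max(peak, std::fabs(s));
    if (peak == 0.0f) return;  // silent input: nothing to scale
    const float gain = target / peak;
    for (float& s : x) s *= gain;
}

// Generalized cosine window: w[i] = a0 - (1 - a0) * cos(2*pi*i / (n - 1)).
// a0 = 0.5 gives the Hann window, a0 = 0.54 gives the Hamming window.
std::vector<float> cosine_window(std::size_t n, float a0) {
    std::vector<float> w(n, 1.0f);
    if (n < 2) return w;
    const double pi = std::acos(-1.0);
    for (std::size_t i = 0; i < n; ++i) {
        const double phase = 2.0 * pi * static_cast<double>(i) / static_cast<double>(n - 1);
        w[i] = static_cast<float>(a0 - (1.0 - a0) * std::cos(phase));
    }
    return w;
}

// Cut one frame of w.size() samples starting at `start` and apply the window.
// Samples past the end of the signal are treated as zero (an assumption of this sketch).
std::vector<float> windowed_frame(const std::vector<float>& x, std::size_t start,
                                  const std::vector<float>& w) {
    std::vector<float> out(w.size(), 0.0f);
    for (std::size_t i = 0; i < w.size() && start + i < x.size(); ++i) {
        out[i] = x[start + i] * w[i];
    }
    return out;
}
```

A typical loop would normalize once, build the window once, and then call `windowed_frame` with `start` advancing by a hop size smaller than the window length so that consecutive frames overlap.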
- Apply the FFT:
- The FFT takes one window of the audio signal and represents it in terms of the frequencies that make it up.
- The output of the FFT is a series of complex numbers, one per frequency bin. The magnitude of each number corresponds to the strength (or amplitude) of a specific frequency in that window of the audio. The position of that number in the sequence indicates which frequency it corresponds to: for a window of N samples at sample rate fs, bin k (counting from 0) sits at k * fs / N Hz.
- So if the third number in the FFT output (bin 2) has a high magnitude, the frequency 2 * fs / N has a strong presence in that window. For real-valued audio only the first N/2 + 1 bins carry unique information; the remaining bins mirror them.
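To make the bin-to-frequency relationship concrete, here is a small radix-2 FFT sketch. It is an illustrative implementation only, and it assumes the window length is a power of two; a real project would more likely call an existing FFT library. The names `fft` and `bin_frequency` are chosen for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// In-place iterative radix-2 Cooley-Tukey FFT.
// Assumption: a.size() is a power of two; other sizes give wrong results.
void fft(std::vector<cplx>& a) {
    const std::size_t n = a.size();
    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    // Combine sub-transforms of length len/2 into transforms of length len.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = -2.0 * pi / static_cast<double>(len);
        const cplx wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cplx w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cplx u = a[i + k];
                const cplx v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Centre frequency in Hz of bin k for an n-point FFT at the given sample rate.
double bin_frequency(std::size_t k, std::size_t n, double sample_rate) {
    return static_cast<double>(k) * sample_rate / static_cast<double>(n);
}
```

Feeding it a window that contains exactly one cycle of a cosine puts the energy in bin 1 (and its mirror, bin N - 1), while the other bins stay near zero.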
The next few steps focus on getting the visual representation.
- Calculate Magnitude:
- For each FFT output, calculate the magnitude of each frequency bin. This is just the magnitude of the complex number in that bin: for a value a + bi it is sqrt(a^2 + b^2).
- This magnitude represents the intensity of that particular frequency for the corresponding time chunk.
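A sketch of the magnitude step, assuming the FFT output is stored as `std::complex<double>` values. Keeping only the first N/2 + 1 bins relies on the input being real-valued audio. The conversion to decibels is an extra that these notes do not mention; it is included as an assumption because raw magnitudes span a very wide range and are commonly log-scaled before display. Function names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude of each non-redundant bin (the first n/2 + 1 bins for real input).
std::vector<double> bin_magnitudes(const std::vector<std::complex<double>>& spectrum) {
    std::vector<double> mags;
    if (spectrum.empty()) return mags;
    const std::size_t bins = spectrum.size() / 2 + 1;
    mags.reserve(bins);
    for (std::size_t i = 0; i < bins; ++i) {
        mags.push_back(std::abs(spectrum[i]));  // sqrt(re^2 + im^2)
    }
    return mags;
}

// Optional log scaling (an assumption of this sketch): 20 * log10(magnitude),
// clamped to `floor_db` so that silent bins do not become minus infinity.
double to_decibels(double magnitude, double floor_db = -120.0) {
    if (magnitude <= 0.0) return floor_db;
    return std::max(floor_db, 20.0 * std::log10(magnitude));
}
```

Running `bin_magnitudes` on every frame's FFT output gives one column of the spectrogram per frame.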
- Visual Representation:
- The last step is mapping the magnitudes to pixels: time runs along one axis, frequency along the other, and each magnitude sets the color of one pixel. Often, the magnitude is mapped to a color scale where higher intensities correspond to warmer colors.
- You can use a graphics library like `SDL` or `OpenGL` to render the spectrogram in real time, or write it to an image file.