Week 8: Predicting Sam and Khali in Vocal Music
April 25, 2025
After weeks of working with isolated tabla recordings, this week marked an important shift: applying what the system has learned to actual music. With the vocal dataset and annotations from Week 7 in place, I focused on training models that could recognize sam and khali points directly from vocal audio—a step that brings the project closer to real-world musical interaction.
Classifying Vocal Beats
The first part of the week was spent preparing the vocal annotations for modeling. Each labeled sam and khali was paired with a segment of audio and its corresponding extracted features, including MFCCs, spectral descriptors, pitch contour, and energy.
Using this data, I trained two baseline models—Random Forest and SVM—to classify beat type (sam, khali, or other) from these features. Even without percussion, the models were able to pick up on subtle shifts in phrasing and energy around sam, and accuracy was promising, particularly for the Random Forest.
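The baseline setup looks roughly like this. Note the features here are synthetic stand-ins (class-offset Gaussians) rather than the real extracted vectors, and the hyperparameters are illustrative defaults, not the tuned values:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in features: 16-dim vectors per beat, one cluster per class
# (0 = sam, 1 = khali, 2 = other), mimicking class-dependent cues.
n_per_class, n_features = 100, 16
X = np.vstack([rng.normal(loc=c * 2.0, scale=0.5, size=(n_per_class, n_features))
               for c in range(3)])
y = np.repeat([0, 1, 2], n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)

print(f"RF accuracy:  {rf.score(X_te, y_te):.2f}")
print(f"SVM accuracy: {svm.score(X_te, y_te):.2f}")
```

Swapping the synthetic `X` for the real per-beat feature matrix is the only change needed to reproduce the actual experiment shape.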
Introducing Sequence Modeling
Static classification only goes so far when you’re working with music. To capture the flow of a taal, I began building a simple LSTM model that could process sequences of vocal features. The goal was to train the system to understand the broader rhythmic structure—not just isolated beats.
I structured the data into time-aligned sequences based on taal cycles and trained the LSTM to predict the beat type at each step. Early results showed that the model could pick up on repeating patterns and even anticipate sam returns based on melodic buildup.
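The sequence model can be sketched as below. This is an assumed architecture written in PyTorch: the class name `BeatSequenceLSTM`, the 16-beat cycle length (teentaal), the hidden size, and the per-step classification head are illustrative choices, not the exact configuration:

```python
import torch
import torch.nn as nn

N_FEATURES = 16   # per-beat feature vector (assumed size)
NUM_CLASSES = 3   # sam, khali, other
CYCLE_LEN = 16    # beats per taal cycle, e.g. teentaal (assumption)

class BeatSequenceLSTM(nn.Module):
    """Predict a beat-type label at every step of a taal-cycle sequence."""
    def __init__(self, n_features=N_FEATURES, hidden=64, n_classes=NUM_CLASSES):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, cycle_len, n_features)
        out, _ = self.lstm(x)              # (batch, cycle_len, hidden)
        return self.head(out)              # per-step logits over beat types

model = BeatSequenceLSTM()
dummy = torch.randn(4, CYCLE_LEN, N_FEATURES)   # 4 fake cycles of features
logits = model(dummy)
# Standard per-step training loss: flatten steps, compare against labels.
loss = nn.CrossEntropyLoss()(logits.reshape(-1, NUM_CLASSES),
                             torch.randint(0, NUM_CLASSES, (4 * CYCLE_LEN,)))
```

Because the LSTM sees the whole cycle, its hidden state can carry the melodic buildup that precedes sam, which is what lets it anticipate the return rather than classify each beat in isolation.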
Visualization and Evaluation
To understand how well the models were performing, I built new visualizations that aligned model predictions with the waveform of the original vocal track. Seeing the predicted sam and khali points plotted on the audio made it easier to assess how closely the system was tracking the actual rhythm of the composition.
This approach also highlighted edge cases—places where phrasing blurred the rhythmic boundaries—and helped identify areas where the model could be improved with additional features or better temporal resolution.
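An overlay of predictions on the waveform can be sketched with matplotlib. The function name `plot_beat_predictions` and the color scheme are illustrative assumptions, not the actual visualization code:

```python
import matplotlib
matplotlib.use("Agg")          # headless rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np

def plot_beat_predictions(y, sr, beat_times, beat_labels, path="overlay.png"):
    """Overlay predicted sam/khali points on the vocal waveform.
    beat_times are in seconds; beat_labels are 'sam', 'khali', or 'other'."""
    t = np.arange(len(y)) / sr
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(t, y, linewidth=0.5, color="gray")
    colors = {"sam": "red", "khali": "blue"}
    for bt, lab in zip(beat_times, beat_labels):
        if lab in colors:                       # skip 'other' beats
            ax.axvline(bt, color=colors[lab], linestyle="--", alpha=0.8)
    ax.set(xlabel="time (s)", ylabel="amplitude")
    fig.savefig(path, dpi=100)
    plt.close(fig)
    return path
```

Plotting predicted and ground-truth markers in the same figure makes timing drift and label confusions visible at a glance.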
Looking Ahead
With Week 9 approaching, the focus now shifts to integration. The final phase of the project will involve:
- Polishing the sequence model
- Completing the end-to-end demo: input vocal → predict beat types → display aligned visualization
- Preparing documentation and final presentation materials
This week was all about letting the system listen like a musician. Next week, it needs to perform like one.
