Week 4: Solidifying MediaPipe Framework Concepts

March 24, 2024

Welcome back to my weekly blog where I track my journey towards creating a virtual personal trainer app! This week, I took time to gain a deeper understanding of the MediaPipe Framework, laying a robust foundation for normalizing pose landmarks from video frames. Let’s break down the core concepts of MediaPipe with some functions and code examples.

Packets
Packets are the fundamental units of data within MediaPipe, carrying a pointer to an immutable payload tagged with a timestamp. Here’s a simple example of creating a packet with a string payload:

import mediapipe as mp
packet = mp.packet_creator.create_string('Hello, MediaPipe!')

Graphs
Graphs define the pathways data takes through the app, specifying how packets are processed and passed between nodes. Here’s a basic graph configuration in protobuf format:

# Graph-level input and output streams
input_stream: "input_video"
output_stream: "output_video"

node {
  calculator: "PassThroughCalculator"
  input_stream: "input_video"
  output_stream: "output_video"
}

This configuration sets up a simple graph where video packets pass through unchanged from input to output.

Calculators
Calculators are the nodes of a graph where data is processed and transformed. Each node can have multiple input and output streams.

Streams and Side Packets
Streams carry sequences of packets between nodes, while side packets are used for data that remains constant. Below is an example of how you can add a packet to a stream in Python:

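# config_text holds the graph config as text; rgb_img is an RGB image as a NumPy array
graph = mp.CalculatorGraph(graph_config=config_text)
graph.start_run()

# Wrap the image in a packet, stamp it with a timestamp, and send it into the graph
graph.add_packet_to_input_stream(
    'input_video',
    mp.packet_creator.create_image_frame(
        image_format=mp.ImageFormat.SRGB,
        data=rgb_img
    ).at(mp.Timestamp(1))
)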
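
To make the difference between streams and side packets more concrete, here is a toy model in plain Python. It is not MediaPipe code: `ToyPacket`, `ToyStream`, and the `model_complexity` side packet are hypothetical names made up for illustration. The one real rule it mirrors is that packets on a stream must arrive with increasing timestamps.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)  # frozen: a packet cannot be modified after creation
class ToyPacket:
    payload: Any
    timestamp: int


class ToyStream:
    """Hypothetical stand-in for a stream: an ordered sequence of packets."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.packets: List[ToyPacket] = []

    def add(self, packet: ToyPacket) -> None:
        # Reject any packet whose timestamp does not strictly increase.
        if self.packets and packet.timestamp <= self.packets[-1].timestamp:
            raise ValueError(
                f"timestamp {packet.timestamp} is not greater than "
                f"{self.packets[-1].timestamp} on stream '{self.name}'"
            )
        self.packets.append(packet)


# A side packet is a single value fixed for the whole run (assumed example name).
side_packets = {"model_complexity": 1}

# A stream carries one packet per frame, each with a later timestamp.
stream = ToyStream("input_video")
stream.add(ToyPacket("frame 0", timestamp=0))
stream.add(ToyPacket("frame 1", timestamp=1))
```

In this sketch, adding a second packet with timestamp 1 would raise a `ValueError`, which is roughly why every frame sent into a graph needs its own, later timestamp.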

Preparing for Program Development

Our next step is developing a detailed plan for grading the poses. This process should enable us to accurately compare various user poses against ideal forms and give feedback based on the performance. Below is an outline for the program:

Initialize MediaPipe Pose Detection
- Load the MediaPipe Pose detection model
- Configure the pose detection pipeline with the necessary settings (e.g., static image mode, model complexity)
Capture Video Frame
- For each frame in the video input, convert the frame into the format required by MediaPipe (e.g., RGB)
Detect Poses in Frame
- Pass the converted frame to the MediaPipe Pose model
- Retrieve pose landmarks from the output
Define Reference Model
- Load or define a reference skeletal model that represents the ideal pose for the exercise
  - The reference model should include normalized positions of key landmarks (e.g., shoulders, knees, wrists, hips).
Normalize Captured Skeletal Data to the Reference Model
- For each detected pose in the frame:
  - Calculate the scale factor by comparing the distance between specific landmarks (e.g., shoulder width, height) in the captured pose to the same distance in the reference model, then scale the captured pose by that factor
  - Align the base landmark of the captured pose with that of the reference model, translating and rotating as necessary
Calculate Deviation
- Compare the normalized captured pose to the reference model.
- Calculate the deviation for each landmark by measuring the distance between the corresponding landmarks in the normalized pose and the reference model.
Generate Feedback
- Based on the calculated deviations, generate specific feedback for the user, weighting each deviation by how much that landmark affects safety and hypertrophy
Repeat for Each Frame
- Repeat the steps from Capture Video Frame through Generate Feedback for each frame in the video input to continuously analyze and provide feedback on the user’s pose throughout their exercise
Conclude Analysis
- Once all frames have been processed, compile and present the overall feedback to the user, highlighting areas of improvement and consistency
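
The normalization, deviation, and weighting steps in the outline could be sketched roughly like this. This is only a minimal 2D sketch under several assumptions: landmarks are plain `(x, y)` tuples keyed by hypothetical names, shoulder width is the scale reference, the left hip is the base landmark, rotation is left out, and the weights are placeholder values rather than anything tuned.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def normalize(pose: Pose, reference: Pose, base: str = "left_hip") -> Pose:
    """Scale and translate `pose` so it can be compared with `reference`."""
    # Scale factor: reference shoulder width over captured shoulder width.
    ref_width = distance(reference["left_shoulder"], reference["right_shoulder"])
    cap_width = distance(pose["left_shoulder"], pose["right_shoulder"])
    if cap_width == 0:
        raise ValueError("captured shoulder width is zero; cannot scale")
    scale = ref_width / cap_width

    # Scale about the captured base landmark, then move that landmark
    # onto the reference base landmark (rotation is not handled here).
    bx, by = pose[base]
    rx, ry = reference[base]
    return {
        name: (rx + (x - bx) * scale, ry + (y - by) * scale)
        for name, (x, y) in pose.items()
    }


def deviations(normalized: Pose, reference: Pose) -> Dict[str, float]:
    """Per-landmark distance between the normalized pose and the reference."""
    return {name: distance(normalized[name], reference[name]) for name in reference}


def weighted_score(devs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of deviations, each multiplied by its landmark weight (default 1.0)."""
    return sum(weights.get(name, 1.0) * d for name, d in devs.items())


# Made-up reference pose and a captured pose that is twice as large,
# shifted by (3, 4), with the left knee pushed out of position.
reference = {
    "left_shoulder": (0.0, 1.0),
    "right_shoulder": (1.0, 1.0),
    "left_hip": (0.0, 0.0),
    "left_knee": (0.0, -1.0),
}
captured = {
    "left_shoulder": (3.0, 6.0),
    "right_shoulder": (5.0, 6.0),
    "left_hip": (3.0, 4.0),
    "left_knee": (4.0, 2.0),
}

normalized = normalize(captured, reference)
devs = deviations(normalized, reference)
score = weighted_score(devs, {"left_knee": 2.0})  # placeholder weight
```

With this made-up data, only the left knee ends up away from its reference position after normalization, so it is the only landmark that contributes to the score.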

Next Up:

With a clearer understanding of MediaPipe and preliminary plans for our algorithm, we are now ready to start coding! Thank you for being a part of this journey!

View more of Ajay A.'s posts.
