Week 3: Data Processing and Pose Normalization with MediaPipe
March 18, 2024
Welcome back to my senior project blog! This week, I’ll share an in-depth look at my plan for normalizing poses in our virtual trainer app using MediaPipe, a crucial step that enables comparison between skeletal models of different individuals. This is necessary for providing accurate fitness advice to users of any body type. Let’s dive into how we’re harnessing MediaPipe to achieve this.
Role of Normalization
At the core of our app’s functionality is the ability to provide feedback on the user’s form, made possible by comparing a user’s exercise form against a reference model. This comparison will let us identify areas of improvement through discrepancies in form. However, for the app to offer tailored advice that aligns with each user’s unique physiology, we must find a way to “normalize” the input pose, translating it and scaling each body segment by a certain weight so it can be compared against a selected reference.

Before any normalization can happen, the input data must be processed. I will first develop a working program in Python on macOS before diving into app development. Our input data will be packets, each consisting of an image and a string. Below is a walkthrough of basic MediaPipe usage for processing data. This Python program processes a single packet and will need to be extended to handle continuous video input.
Processing Data
Begin by importing MediaPipe, along with OpenCV, which we’ll use later to load an image:

import cv2
import mediapipe as mp
Next, define a calculator graph configuration. This minimal graph contains a single PassThroughCalculator node, which simply forwards each input packet to the output stream:
config_text = """
input_stream: 'in_stream'
output_stream: 'out_stream'
node {
  calculator: 'PassThroughCalculator'
  input_stream: 'in_stream'
  output_stream: 'out_stream'
}
"""
Next, initialize a CalculatorGraph and set up a listener on the graph’s output stream:
graph = mp.CalculatorGraph(graph_config=config_text)
output_packets = []
graph.observe_output_stream(
    'out_stream',
    # packet_getter.get_str() only works for string packets; since we also
    # send an image frame below, collect the raw packets and unpack them
    # later according to their type.
    lambda stream_name, packet: output_packets.append(packet))
Next, start the graph run and add the packets to the input stream:
graph.start_run()

# Send a string packet at timestamp 0.
graph.add_packet_to_input_stream(
    'in_stream', mp.packet_creator.create_string('abc').at(0))

# Load an image, convert it from OpenCV's BGR order to RGB, and send it as
# an image-frame packet at timestamp 1.
rgb_img = cv2.cvtColor(cv2.imread('/path/to/your/image.png'), cv2.COLOR_BGR2RGB)
graph.add_packet_to_input_stream(
    'in_stream',
    mp.packet_creator.create_image_frame(image_format=mp.ImageFormat.SRGB,
                                         data=rgb_img).at(1))
Lastly, close your graph:
graph.close()
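Once close() returns, the run has finished, and output_packets should hold the two packets that passed through the graph: the string and the image frame. A quick sanity check (assuming the raw-packet observer above):

print(len(output_packets))  # expect 2: one string packet, one image packet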
Looking Ahead
To extend this into our pose-detection program, we must enable continuous video processing so that we can capture the user’s pose across the full duration of a set. After extracting a skeletal model from each frame, the pose must be translated to match our reference pose, which should showcase exemplary form. To compare the two accurately, weights will be assigned to various parts of the body, scaling our input to our reference and enabling comparison. This will be our normalization process.
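To make this concrete, below is a minimal sketch of what such a pipeline might look like. Note the assumptions: it uses MediaPipe’s high-level Pose solution rather than a hand-written calculator graph, the video path is a placeholder, and a single torso-length scale factor stands in for the per-segment weights described above, which are still to be designed.

import cv2
import mediapipe as mp
import numpy as np

# Landmark indices from MediaPipe's 33-point pose model.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_HIP, RIGHT_HIP = 23, 24

def normalize_pose(points):
    # Translate the pose so the mid-hip sits at the origin, then scale it
    # so the torso (mid-hip to mid-shoulder) has unit length. This makes
    # poses from bodies of different sizes roughly comparable; per-segment
    # weights would replace this single scale factor later.
    mid_hip = (points[LEFT_HIP] + points[RIGHT_HIP]) / 2.0
    mid_shoulder = (points[LEFT_SHOULDER] + points[RIGHT_SHOULDER]) / 2.0
    torso_length = np.linalg.norm(mid_shoulder - mid_hip)
    return (points - mid_hip) / torso_length

capture = cv2.VideoCapture('/path/to/exercise/video.mp4')  # placeholder path
normalized_frames = []
with mp.solutions.pose.Pose() as pose:
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV reads frames as BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            points = np.array([(lm.x, lm.y, lm.z)
                               for lm in results.pose_landmarks.landmark])
            normalized_frames.append(normalize_pose(points))
capture.release()

A sequence of normalized frames like this is what we would then compare, frame by frame, against a normalized reference sequence.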
Stay tuned for next week’s update, where I’ll share my progress on data processing, normalization, and selecting reference poses. Thank you for joining me on this exciting journey toward creating a transformative fitness tool!
Reference:
https://developers.google.com/mediapipe/framework/getting_started/python_framework