Week 3: Extracting Pharmacokinetic Data
March 17, 2026
Hi everyone! Welcome back to my third blog post. This week, I mainly focused on extracting the pharmacokinetic data necessary for my project. Allow me to break down for you guys what exactly that means and also what I’ve discovered along the way.
The primary tool I used for this, as mentioned before, is WebPlotDigitizer, a free web-based program that helps me extract data from a graph. You upload the image, calibrate the axes by clicking on known coordinate points, and then select specific data points, which the program translates into real values. For graphs that showed only mean ± SD data, I ran three separate extractions per curve: one on the mean, one on the upper error bar, and one on the lower error bar. From those I calculated the standard deviation at each time point.
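To make the three-extraction idea concrete, here is a minimal sketch of how the standard deviations can be recovered from the digitized traces. The numbers and variable names are my own illustration, not values from any of the actual studies; the only assumption is that the error bars in the figures represent mean ± SD, so the SD at each time point is just the distance from the mean curve to the error bar.

```python
import numpy as np

# Hypothetical digitized traces from WebPlotDigitizer (times in h,
# concentrations in ng/mL): illustrative numbers, not real study data.
times = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
mean_curve = np.array([12.0, 18.5, 15.2, 9.8, 4.1])
upper_bars = np.array([14.5, 21.9, 18.0, 11.9, 5.3])
lower_bars = np.array([9.6, 15.2, 12.5, 7.8, 3.0])

# SD at each time point = distance from the mean to the error bar tip.
# Averaging the upper and lower distances (which should be equal in a
# perfect figure) smooths out small digitization misclicks.
sd_upper = upper_bars - mean_curve
sd_lower = mean_curve - lower_bars
sd = (sd_upper + sd_lower) / 2
```

Averaging the two half-distances instead of trusting either one alone is a small hedge against the pixel-level click error described below.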
The main limitation I’ve noticed during this process is that the resulting data may not be perfectly accurate. I am manually selecting where I believe the center of each data point is, which creates an inherent possibility for human error: a misclick of a few pixels is enough to change the extracted value. This is unavoidable since I’m working with image figures rather than the raw numerical data. Overall, though, I’m not terribly concerned about this affecting my model. Because these digitization errors are random rather than biased in one direction, they should largely average out across the data representing 189 participants instead of pushing my results in any clear direction; this is exactly why such a large sample size is so necessary for this project.
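The averaging-out argument can be sanity-checked with a quick simulation. Everything here is assumed for illustration: I treat each click as adding zero-mean noise of roughly 2% of the true value (a made-up error rate, not a measured one) and use 189 points to match the dataset size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend true concentrations, plus zero-mean "misclick" noise of
# ~2% of each value (an assumed error rate, purely for illustration).
true_values = rng.uniform(5.0, 25.0, size=189)
noise = rng.normal(0.0, 0.02 * true_values)
digitized = true_values + noise

# Any single point can be off by several percent...
worst = np.max(np.abs(noise / true_values))
# ...but the net bias across all 189 points is far smaller,
# because positive and negative misclicks cancel out.
bias = np.mean(digitized - true_values) / np.mean(true_values)
```

Under these assumptions the worst single-point error is an order of magnitude larger than the overall bias, which is the intuition behind trusting the large sample.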
One methodological change I’ve had to make concerns how my dataset is structured and what information it contains. Most of the papers I’m taking data from report only group means rather than individualized results, while a few others do provide time-course data for each individual participant. I’ve decided to keep these two kinds of data separate: the mean data forms my training set for fitting the pharmacokinetic model, while the individualized data is held out as a validation set. In other words, the model will never see the individualized data during training, and I’ll use it afterwards to test how accurately the fitted curve predicts individual-level observations. One thing I must address, though, is that this means I do not actually have data for 189 individual participants; I am instead working largely with group means. The more accurate way to frame it is that my model will be fit to data representing 189 participants across 9 independent studies, with doses ranging from 3 mg to 30 mg.
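The training/validation split described above can be sketched in a few lines. The column names and values here are my own invention for illustration; the one real design choice being shown is that the split happens on how a study reported its data (group means vs. individual time courses), not on a random shuffle.

```python
import pandas as pd

# Hypothetical master dataset: each row is one digitized concentration
# observation. 'source' marks whether the study reported group means
# or individual time courses (column names are illustrative).
df = pd.DataFrame({
    "study":  ["A", "A", "B", "B", "C", "C"],
    "source": ["mean", "mean", "mean", "mean", "individual", "individual"],
    "time_h": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    "conc":   [18.5, 15.2, 20.1, 16.4, 17.3, 14.8],
})

# Training set: group-mean curves used to fit the PK model.
train = df[df["source"] == "mean"]

# Held-out validation set: individual-level data the model never
# sees during fitting, used afterwards to check its predictions.
validation = df[df["source"] == "individual"]
```

Splitting by reporting type keeps the validation data truly independent of the fit, since entire studies end up on one side of the split.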
The data for concentrations is now extracted and organized. Next week I will begin the exact same process but for the pharmacodynamic data, meaning I will be extracting the data on the “any drug effect” graphs with VAS scores to form the other half of my master dataset. See you all then!
Comments

Hi Eileen! Great post this week!
I was wondering: how did you settle on exactly 189 participants, and could there be any value (like a higher accuracy rate) if you decided to increase this number?
Hello Eileen! I really enjoy the project!
However, I do have a question about the dosages. What are the differences in effect produced from 3 mg to 30 mg? For example, 3 mg versus 30 mg of water makes practically no difference for a human. I was also wondering whether that difference is linear or follows some other relationship. If so, why did you specifically choose these bounds?