Week 5: Organizing the Dataset
April 9, 2026
Hey everyone! Back again. For week 5, my main focus was on organizing my raw extracted data into a format the computer can actually process: something that can be read in and have curves fit to it.
My work was mostly on reconciling the data points to specific time points. My concentration data and subjective effects data were digitized from different graphs within each paper, and those graphs did not always share the same time resolution. For example, a concentration measurement might exist at t=0.123 hours while the nearest effect measurement was at t=0.132 hours. These are close enough in time to represent the same moment biologically, but a computer would treat them as different time points and won't pair them automatically. My solution was to round all the time points to a consistent resolution, ending in a 5 or 0, based on what seemed to be a reasonable pattern in the data collection. I did this manually, which was time consuming, but the alternative was interpolating between time points mathematically. Since the present data already has enough noise, rounding was more straightforward than adding another layer of estimation.
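I did the rounding by hand, but the core idea can be sketched in code. This is just an illustration, not what I actually ran; the function name and the 0.005-hour grid resolution are my own assumptions here, since in practice I chose the rounding case by case.

```python
# Illustrative sketch only: snap digitized time points onto a common grid
# so measurements pulled from different graphs can share a time stamp.
# The 0.005 h resolution is an assumed example, not the actual rule I used.

def snap_time(t_hours, resolution=0.005):
    """Round a time point to the nearest multiple of `resolution` hours."""
    return round(round(t_hours / resolution) * resolution, 6)

# The two near-identical points from the example above:
print(snap_time(0.123))  # 0.125
print(snap_time(0.132))  # 0.13
```

In practice a fixed grid like this sometimes still leaves near-matches on adjacent grid points, which is exactly why I ended up doing it manually with some judgment instead of mechanically.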
Once time points were reconciled, I classified each row in the master dataset along two dimensions. The first was the data type: either a mean of a group or a measurement from an individual. If it came from a mean, I also recorded the group size so that when I fit the curves I can weight the data accordingly. The individual data is, again, reserved for my validation set. The second classification was whether a row had paired data, meaning it had both concentration and subjective effect data for one time point. Rows with only concentration data are still useful for pharmacokinetic modeling, so they were kept.
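The two classifications above can be sketched as a small annotation step. The field names (`conc`, `effect`, `n_subjects`, etc.) are mine for illustration; my actual spreadsheet columns differ.

```python
# Hypothetical sketch of the two row-level classifications: data type
# (group mean vs. individual) and whether the row is "paired", i.e. has
# both concentration and effect at the same time point. Field names are
# illustrative assumptions, not my real column names.

def classify_row(row):
    """Annotate a row dict with data_type and is_paired flags in place."""
    row["data_type"] = "mean" if row.get("n_subjects") else "individual"
    # Paired means both measurements exist for this time point.
    row["is_paired"] = row.get("conc") is not None and row.get("effect") is not None
    return row

rows = [
    {"time_h": 0.5, "conc": 12.3, "effect": 4.1, "n_subjects": 8},     # paired group mean
    {"time_h": 1.0, "conc": 9.8, "effect": None, "n_subjects": 8},     # conc-only, kept for PK
    {"time_h": 0.5, "conc": 11.0, "effect": 3.9, "n_subjects": None},  # individual (validation)
]
rows = [classify_row(r) for r in rows]
```

Concentration-only rows keep `is_paired = False` but stay in the dataset, matching the decision above to retain them for the pharmacokinetic side of the model.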
The dataset also has a standard deviation column alongside every mean value. These SD values won't be used for fitting directly; instead, they will serve as an indicator of measurement uncertainty. I will re-run the curve fits against the mean+SD and mean-SD bounds to see how sensitive my EC50 and Emax estimates from the mean curve are to that uncertainty.
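To make the plan concrete, here is a rough sketch of that sensitivity check on synthetic data. This is not my actual fitting code; a real fit would use something like `scipy.optimize.curve_fit`, and the grid search, data values, and parameter ranges here are all illustrative assumptions.

```python
# Illustrative sketch: refit an Emax model to the mean, mean+SD, and
# mean-SD curves to bracket the uncertainty in the Emax/EC50 estimates.
# All numbers and the coarse grid search are assumptions for the demo.

def emax_model(c, emax, ec50):
    """Standard Emax concentration-effect relationship."""
    return emax * c / (ec50 + c)

def fit_emax(concs, effects):
    """Coarse grid search for (Emax, EC50) minimizing squared error.
    A real fit would use scipy.optimize.curve_fit; this keeps the
    sketch dependency-free."""
    best = None
    for emax in [e / 10 for e in range(1, 201)]:      # 0.1 .. 20.0
        for ec50 in [k / 10 for k in range(1, 201)]:  # 0.1 .. 20.0
            sse = sum((emax_model(c, emax, ec50) - eff) ** 2
                      for c, eff in zip(concs, effects))
            if best is None or sse < best[0]:
                best = (sse, emax, ec50)
    return best[1], best[2]

# Synthetic mean data from a "true" Emax=10, EC50=5 curve, with a flat SD.
concs = [0.5, 1, 2, 5, 10, 20]
means = [emax_model(c, 10.0, 5.0) for c in concs]
sds = [0.5] * len(concs)

fits = {label: fit_emax(concs, [m + k * s for m, s in zip(means, sds)])
        for label, k in [("mean", 0), ("mean+SD", 1), ("mean-SD", -1)]}
```

The spread between the three fitted (Emax, EC50) pairs gives a rough envelope for how far the mean-curve estimates could plausibly be off, which is all I need the SD column to tell me.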
This week was mostly a lot of manual labor, but setting up this dataset correctly is vital to my project. Next week, I will start building and testing the pharmacokinetic model. See you all then!
