Week 2: Structuring the Pipeline
March 6, 2026
Hi everyone, welcome back! This week involved a mix of reading and reconsidering how the technical workflow should actually be structured.
One of the main sources I worked through was Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation by Jez Humble and David Farley. Even though this book is about software engineering, many of its concepts apply surprisingly well to machine learning systems. One concept the book emphasizes is the value of structured pipelines: systems where processes move through clearly defined stages that can be repeated consistently. For this project, that means that instead of running tasks manually each time, the goal is to build automated processes that make development more consistent and easier to manage and maintain.
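To make the idea concrete, here is a minimal sketch of what a staged pipeline could look like. The stage names and functions are hypothetical stand-ins, not part of my actual system; the point is only that the stages are explicit and always run in the same order, which is what makes the process repeatable.

```python
def load_data(raw):
    # Stand-in for reading raw records from a file or database.
    return [r.strip() for r in raw]

def preprocess(records):
    # Stand-in for cleaning and normalizing the records.
    return [r.lower() for r in records]

def evaluate(records):
    # Stand-in for computing a summary of the results.
    return {"count": len(records)}

# The pipeline is just an ordered list of stages.
PIPELINE = [load_data, preprocess, evaluate]

def run_pipeline(data, stages=PIPELINE):
    """Run every stage in sequence, feeding each stage's output
    into the next. The same call always runs the same stages in
    the same order."""
    for stage in stages:
        data = stage(data)
    return data

result = run_pipeline(["  Hello ", "World  "])
# result == {"count": 2}
```

Even in a toy version like this, the benefit is visible: adding, removing, or reordering a stage is a one-line change to the list, rather than a change to how I run things by hand.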
This book also made me reconsider how I originally organized my project timeline. Initially, I had separated the work into two distinct stages: I would prepare the data first, and then I would construct the modeling infrastructure. However, as I thought more about how real machine-learning systems and MLOps pipelines operate, I realized that these steps are much more tightly connected than I initially thought. In practice, data preparation, preprocessing, model training, and evaluation all exist within the same pipeline, where each stage feeds directly into the next. Due to this, decisions made during data preparation don’t just affect the model itself, but also how the entire system processes, tracks, and reproduces experiments.
Therefore, I updated my project syllabus slightly. Rather than treating data preparation and model development as completely separate stages, the early part of the project now focuses more on understanding and designing the machine-learning pipeline that will connect these stages. This includes planning how data preprocessing, model training, and evaluation will interact within the overall system before fully implementing each component. This shift better reflects how real MLOps systems integrate these stages into a single pipeline. The detailed cleaning of the transcripts and survey data will now happen alongside building the model infrastructure that will ultimately use that data.
On a technical level, I am still focusing heavily on the data itself, but I also started thinking more about the modeling process and how it will be integrated into the overall architecture. Specifically, I started designing preprocessing pipelines that will convert the raw deliberation data into structured inputs that the models can use later. Although these pipelines have not been implemented yet, designing them early ensures that the data will flow smoothly into the rest of the system.
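As a rough illustration of the kind of preprocessing step I have in mind, the sketch below turns raw transcript lines into structured records. The "SPEAKER: text" line format and the field names are assumptions for the example only; the real deliberation data may be shaped differently.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    """A structured record for one line of a transcript."""
    speaker: str
    text: str

def parse_line(line: str) -> Utterance:
    # Assumes lines look like "SPEAKER: what they said";
    # the actual transcript format may differ.
    speaker, _, text = line.partition(":")
    return Utterance(speaker=speaker.strip(), text=text.strip())

def preprocess(raw_lines):
    """Convert raw transcript lines into structured records,
    dropping blank lines along the way."""
    return [parse_line(line) for line in raw_lines if line.strip()]

records = preprocess(["ALICE: I agree.", "", "BOB: Why?"])
# records[0].speaker == "ALICE"
```

Writing even a throwaway version like this forces early decisions about what a "structured input" actually contains, which is exactly the kind of question I want answered before the model training code depends on it.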
This week was more about planning and architecture than coding, which helped clarify how the various elements of the project will fit together. For week 3, I will proceed to clean the data as I develop the machine-learning pipeline infrastructure. See you next week!
