Week 5: Building Real Attack Demonstrations + ML
April 16, 2026
This week at MITRE, we moved beyond the simulated proof of concept and started working with actual running systems. The Python scripts from earlier weeks were helpful for understanding the concept, but they weren't real LLM tests; they ran on fake data we created ourselves. To make the demonstration actually convincing, we needed to show a real attack on a real system.
We set up an actual language model running locally so we’d have a real LLM to test against, not a simulation. We then built a simple web-based chat interface from scratch so someone could have a real conversation with it. This required figuring out how to connect the frontend (the webpage where you type messages) to the backend API (the interface that sends prompts and gets responses back). Getting the real-time message display working took some debugging, but now you can interact with it like a normal chat application.
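The core of that frontend-to-backend wiring is just an HTTP endpoint that accepts a message and returns the model's reply. This is a minimal sketch of the pattern (not our actual code): the route name, port, and the `generate_reply` stub standing in for the real local-LLM call are all illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_reply(prompt):
    # Stand-in for the real model call -- the actual backend forwards
    # the prompt to the locally running LLM and returns its response.
    return f"echo: {prompt}"

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The frontend POSTs {"message": "..."}; we respond with JSON
        # the page can render in the chat window.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = generate_reply(body.get("message", ""))
        payload = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # The chat page's JavaScript would POST to this address.
    HTTPServer(("127.0.0.1", 8000), ChatHandler).serve_forever()
```

The real-time display on the page is then just a matter of the frontend sending each typed message to this endpoint and appending the returned reply to the conversation view.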
Essentially, what I made was a Python script that demonstrates how an attacker could read from a running process's memory. On Linux, every program's memory is accessible if you have the right permissions; this is standard operating system functionality, nothing exotic. The script finds the target process, scans through its RAM regions, and extracts readable text (specifically, I'm looking for conversation content and user messages).
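In outline, the scanner works like this. This is a simplified sketch of the technique, not the demo script itself: it uses the standard Linux `/proc/<pid>/maps` and `/proc/<pid>/mem` interfaces, and for a self-contained example it scans its own process rather than the chat backend.

```python
import os
import re

def scan_process_memory(pid, needle):
    """Scan a process's readable memory regions for a byte string.

    /proc/<pid>/maps lists the mapped regions; /proc/<pid>/mem lets a
    sufficiently privileged reader pull the bytes out of each one.
    """
    hits = []
    with open(f"/proc/{pid}/maps") as maps, \
         open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        for line in maps:
            # Each maps line: start-end perms offset dev inode [path]
            m = re.match(r"([0-9a-f]+)-([0-9a-f]+) r", line)
            if not m:
                continue  # skip regions without read permission
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            try:
                mem.seek(start)
                data = mem.read(end - start)
            except (OSError, ValueError, OverflowError):
                continue  # some special regions can't be read
            if data and needle in data:
                hits.append(hex(start + data.index(needle)))
    return hits

# Demo against this very process: plant a "user message" in memory,
# then recover it by scanning RAM -- the same idea as pulling chat
# text out of the backend's process.
secret = b"USER: here is my confidential question"
print(scan_process_memory(os.getpid(), secret))
```

Against the real chat backend, the only differences are the target PID and the patterns searched for; the mechanism is identical.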
This is the actual vulnerability we were trying to show, but now it's demonstrated live against a chat with an LLM. When you type something into the chat, the scanner can pull that text directly out of the process's memory and display it. It's an actual attack on actual memory from an actual running system.
The next step is comparing this against a system with proper memory protection to show the difference in what an attacker can access.
The web development experience I got this week also carries over to my research project, since I'm planning to build a similar interface where users input their business information and get AI readiness predictions back. Understanding how to structure a web application with a backend that does the processing and a frontend that displays results is directly applicable.
On the research side, I started building the Random Forest models this week. Random Forest is a machine learning algorithm that creates multiple decision trees. Each tree makes predictions by asking a series of yes/no questions about the data (like "Does this business use cloud computing? Yes. Do they have innovation programs? No."), and then the forest combines all the trees' predictions. Unlike logistic regression, which assumes variables affect outcomes in simple linear ways, Random Forest can capture interaction patterns like "cloud adoption only matters if the business also has skilled workers."
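That interaction-capturing behavior is easy to see on a toy example. Below is a minimal sketch with scikit-learn on synthetic data I made up for illustration (the feature names and the "AI-ready" rule are not from the real survey): readiness here requires cloud adoption *and* skilled workers together, a pattern the forest picks up.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy data: two yes/no features, [uses_cloud, has_skilled_workers].
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 2))
# "AI-ready" only when BOTH are 1 -- an interaction, not a linear effect.
y = X[:, 0] & X[:, 1]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Cloud plus skilled workers -> ready; cloud alone -> not ready.
print(forest.predict([[1, 1], [1, 0]]))
```

A single linear model given only these two features can't represent that rule exactly, which is the motivation for trying Random Forest alongside the logistic regression from earlier weeks.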
I'm training separate Random Forest models for micro-businesses (1-9 employees) and larger small businesses (10-49 employees). The process involves testing different parameter settings, like how many trees to build, how deep each tree can grow, and how many variables each tree considers when splitting. Finding the right combination takes trial and error: if the model is too simple it misses patterns, and if it's too complex it just memorizes the training data instead of learning generalizable rules.
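scikit-learn can automate that trial and error with a cross-validated grid search over exactly those three knobs. This is a sketch of the approach on synthetic stand-in data (the grid values are illustrative, not the ones I'm actually sweeping):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the business-survey data.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# The three knobs described above: number of trees, tree depth, and
# how many variables each split considers.
param_grid = {
    "n_estimators": [50, 200],
    "max_depth": [3, 10, None],
    "max_features": ["sqrt", 0.5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because the search scores each combination by cross-validation rather than training accuracy, it already guards somewhat against picking the combination that merely memorizes.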
I'm also working on preventing overfitting, which happens when the model performs great on training data but poorly on new data it hasn't seen before. The way I'm checking for this is by evaluating the model on the test set I saved earlier (the 20% of data the model never trained on). If training accuracy is 90% but test accuracy is only 60%, that means the model memorized the training data rather than learning it.
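The check itself is just two `score` calls on the two splits. Sketched on the same kind of synthetic stand-in data (the real split was made earlier in the project; `random_state` values here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Hold out 20% of the data that the model never trains on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train: {train_acc:.2f}  test: {test_acc:.2f}")
# A large gap between these two numbers is the overfitting warning sign.
```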
