Vincent Y. | BASIS Independent Schools

Vincent Y. 2025 | BASIS Independent Fremont

Project Title: Evaluating Large Language Models through AI Red Teaming
BASIS Independent Advisor: Dr. Sharma
Internship Location: Santa Clara University (Virtual)
Onsite Mentor: Haibing Lu

From ChatGPT to Google Gemini, the adoption of AI in the form of large language models (LLMs) has become ubiquitous over the past few years and is continuing to spread, bringing potential for innovation alongside significant risks. Limitations and vulnerabilities of AI concerning privacy, hallucinations, toxicity, bias, and more can harm users, especially in critical sectors like healthcare. In the next few years, AI governance and regulation, such as the EU AI Act of 2024, will become increasingly relevant. For such regulations to work properly, reliable third-party testing of LLMs is imperative. This project seeks to address this need. I will conduct a literature review to familiarize myself with conventions in the field and use prompt engineering and AI red-teaming techniques to systematically evaluate LLMs against established metrics and identify their vulnerabilities. My goal is to write a research paper summarizing my results, ultimately contributing to the responsible development and usage of AI applications.

My Posts

Week 10 - Gilchrest Savoy

May 10, 2025

Greetings and Salutations. It is I, Gilchrest Savoy. That’s a lie. My first name isn’t Gilchrest and my last name isn’t Savoy. The project nears its end. Week ten was spent simply consolidating everything I had done in the previous nine and producing from it a paper that details how red-teaming plays a role in […]

Week 9 - Testing

May 3, 2025

Hi. Progress this week was a bit slow. I once again read more papers to prepare for writing my research paper. I talked about that last week, and there wasn’t much new, so I won’t bore you with the details. More importantly, I was finally able to do some practical testing on AI models myself […]

Week 8 - Preparation

April 25, 2025

Hello. Hello. Hello. Hello. 喂？ This week, I focused on reading a few academic papers that explored different aspects of large language model (LLM) safety, security, privacy, and deployment at the edge in preparation for the creation of my final product. Each paper contributed to my understanding of both the technical vulnerabilities of LLMs and […]

Week 7 - Crimes

April 19, 2025

Hello. పునఃస్వాగతం. This week I dedicated myself to learning about a variety of jailbreaking techniques, mainly from The Jailbreak Cookbook from General Analysis. Hopefully I will have an opportunity to employ some before the project is over. Jailbreaking can be categorized using three dichotomies: white-box vs. black-box, semantic vs. nonsensical, and systematic vs. manual. […]
