Week 8: More AI Work
April 29, 2023
Last week I started work on a neural network to translate BrainF into Piet. After talking more with my advisors, I feel it's not within the scope of the project to build that model from scratch. Creating a neural network from scratch is one of the most complex tasks in ML, and it is too much to learn in this timeframe. Instead, I'm going to try two simpler approaches.

The first is using an off-the-shelf diffusion model. These models power most of today's AI image-generation tools, like Midjourney, DALL·E 2, and Stable Diffusion. They work by starting with an image full of random noise and "de-noising" it, step by step, until there's an image the model thinks matches the prompt and training it was given. While diffusion is a powerful technique on its own, a general-purpose image model can't encompass the full complexity that a purpose-built network could. So with this approach, the output may be images that look similar to the training Piet, but it probably won't capture the entire complexity of the language.
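To make the de-noising idea concrete, here's a toy sketch in NumPy. Everything here (the tiny 8×8 "image", the noise schedule, the function names) is my own illustration, not code from any real diffusion library: the forward process mixes a clean image with Gaussian noise, and a de-noising step runs that mixing in reverse. In a real model, a trained network predicts the noise; here I pass in the true noise as a stand-in, so the reconstruction is exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 8x8 "image" standing in for a Piet program's pixels.
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))

# Toy noise schedule: alpha_bar[t] shrinks from near 1 toward 0,
# so higher t means a noisier image.
T = 50
betas = np.linspace(1e-4, 0.2, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_image(x0, t, eps):
    """Forward process: mix the clean image with Gaussian noise eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def denoise_step(x_t, t, eps_pred):
    """Invert the mixing, given a prediction of the noise.
    A trained network would supply eps_pred; here the caller does."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

eps = rng.standard_normal(x0.shape)
x_noisy = noise_image(x0, T - 1, eps)   # almost pure noise at t = T-1
x_rec = denoise_step(x_noisy, T - 1, eps)

# With the exact noise, recovery is (numerically) perfect; a real
# model only estimates eps, which is why generated images vary.
print(np.max(np.abs(x_rec - x0)))
```

Real samplers like DDPM take many small reverse steps instead of one big one, but the arithmetic per step is this same mix-and-unmix.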
Here's a demonstration of how diffusion generates an image. I gave DiscoDiffusion the simple prompt "A photograph of a cat," and it went through these steps, starting from pure random noise, to generate the final image.
While this definitely doesn't look completely like a cat, the textures in the image are the primary indicator that it's supposed to be one. Diffusion models focus on small details while sometimes ignoring where the actual features of the image should be (the cat's missing eye is pretty glaring here). Where other models might focus more on the overall structure of the image, a diffusion model tries to make the image seem more cat-like without radically changing what's already there.
With the diffusion approach, I still plan to train on the randomly generated BrainF programs and their associated Piet images. While the model might be different, all of that training data should still be just as relevant.
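The sort of random BrainF I have in mind can be sketched like this. This is a minimal generator of my own for illustration (not the project's actual script): it emits random commands and keeps brackets balanced, so every generated program is at least syntactically valid (though a random loop may well never halt when run).

```python
import random

COMMANDS = "+-<>.,"  # the six non-loop Brainfuck commands

def random_brainf(length=30, seed=None):
    """Generate a random, syntactically valid BrainF program.
    '[' is only emitted with some probability, ']' only when a loop
    is open, and any loops still open at the end are closed."""
    rng = random.Random(seed)
    out = []
    open_loops = 0
    for _ in range(length):
        roll = rng.random()
        if roll < 0.1:
            out.append("[")
            open_loops += 1
        elif roll < 0.2 and open_loops > 0:
            out.append("]")
            open_loops -= 1
        else:
            out.append(rng.choice(COMMANDS))
    out.extend("]" * open_loops)  # close any loops still open
    return "".join(out)

program = random_brainf(length=40, seed=1)
print(program)
```

Each program like this would then be run through a BrainF-to-Piet compiler to produce the paired training image.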
The other method I want to try is OpenAI's GPT-4. In my understanding, this model is the current gold standard for natural language processing. And while image generation isn't part of NLP, a sufficiently powerful model should, for example, be able to produce Python code that draws a simple image, or generate the prompt needed to create something more complex in an off-the-shelf diffusion model. Given GPT-4's training, it may already recognize Piet and BrainF syntax (and if not, it wouldn't be infeasible to teach it), and could possibly translate from either language to the other. In that case, I wouldn't need any of the random training data, because the model would already understand the syntax and form of each language.
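Here's a rough sketch of how I might prompt GPT-4 for this through the `openai` Python package. The prompt wording and the `build_translation_prompt` helper are my own guesses at something workable, not anything I've validated yet; the actual API call only runs if a key is set.

```python
import os

def build_translation_prompt(brainf_code):
    """Build a chat prompt asking GPT-4 to translate BrainF into a
    Piet image description (hypothetical prompt wording)."""
    system = (
        "You are a translator between two esoteric programming "
        "languages: Brainfuck (BrainF) and Piet."
    )
    user = (
        "Translate the following BrainF program into Piet. "
        "Describe the Piet image as a grid of named codel colors.\n\n"
        + brainf_code
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_translation_prompt("++[>+<-]")

# The actual API call (requires an OpenAI key and the openai package):
if os.environ.get("OPENAI_API_KEY"):
    import openai
    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    print(reply.choices[0].message.content)
```

Whether GPT-4 can actually hold the 2D structure of a Piet program in a text description is exactly the open question I want to test.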
There's not much time left in the project, so I'll try to get through these steps as quickly as I can while still looking for interesting results.