Week 2 – Video Colorization
March 17, 2023
Overview
This week, I built on last week's work on the colorization model to colorize full videos. I split sample black-and-white videos into individual frames, colorized each frame, then merged the results into a new video. Let's look at each step of this process.
Colorization
My colorization model still needs work, but it colors images reasonably well (few outrageous colors in the output). It is a GAN, which pairs a generator network with a discriminator network. The generator is trained to transform black-and-white pictures into color ones, and its output is sent to the discriminator. The discriminator is trained to tell the generator's colorized pictures apart from real color pictures. Through this back-and-forth, the generator learns both what color images should look like and how to produce them. The structure of a GAN is shown below.
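To make the generator/discriminator pairing more concrete, here is a minimal PyTorch sketch. The layer sizes and activations are illustrative assumptions rather than my actual architecture, but the division of labor is the same: the generator maps a grayscale input to a color image, and the discriminator scores color images as real or generated.

```python
# Minimal GAN sketch for colorization; layer sizes are illustrative, not my exact model.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a 1-channel grayscale image to a 3-channel color image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),              # 256 -> 128
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),            # 128 -> 64
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),   # 64 -> 128
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),                 # 128 -> 256
        )

    def forward(self, gray):
        return self.net(gray)

class Discriminator(nn.Module):
    """Scores a 3-channel image as real (from the dataset) or fake (from the generator)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 1),  # raw logit; pair with BCEWithLogitsLoss during training
        )

    def forward(self, img):
        return self.net(img)

if __name__ == "__main__":
    gray = torch.randn(1, 1, 256, 256)        # one fake grayscale frame
    fake_color = Generator()(gray)            # (1, 3, 256, 256)
    score = Discriminator()(fake_color)       # (1, 1) real/fake logit
    print(fake_color.shape, score.shape)
```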
Furthermore, each image is resized to 256 × 256 pixels regardless of its original size, meaning every picture and every video is squished into a square format. This keeps the input shapes consistent so the neural networks are simpler to run. The models could in principle handle other dimensions, but that would require much more testing, which I can revisit later on.
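As a rough illustration of that preprocessing step, the snippet below resizes a frame to 256 × 256 and converts it to a single-channel tensor. The file name and the exact transform chain are assumptions for the example, not my precise pipeline.

```python
# Illustrative preprocessing: every frame is squeezed to 256 x 256 before the
# networks see it, regardless of its original aspect ratio.
from PIL import Image
from torchvision import transforms

to_model_input = transforms.Compose([
    transforms.Resize((256, 256)),                 # squashes non-square images
    transforms.Grayscale(num_output_channels=1),   # model input is grayscale
    transforms.ToTensor(),                         # float tensor in [0, 1]
])

frame = Image.open("frame_0001.png")               # placeholder file name
x = to_model_input(frame).unsqueeze(0)             # add batch dim: (1, 1, 256, 256)
```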
With each epoch of training, the colorization gradually improves. However, since run times become exceedingly long as the epoch count increases, I have yet to experiment with models trained for 100 or more epochs; the most I have done so far is 20. Sample images colorized after 3 epochs are shown above.
Video Editing
The next step of the colorization process is handling videos. The pipeline takes a sample black-and-white video and separates it into individual frames at a rate of about 23 frames per second. I then feed each frame to my colorization model. Here, I must concede that the original dimensions of the video are lost, since my model resizes everything to 256 × 256 pixels. After each frame is colorized, I merge everything back into a video that can be saved.
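A rough sketch of this split-colorize-merge loop, using OpenCV, is below. The file names are placeholders, and colorize_frame is a hypothetical stand-in for a call into my generator.

```python
# Sketch of the video pipeline: read frames, colorize each, write a new video.
import cv2

def colorize_frame(bgr_frame):
    """Hypothetical stand-in for running a frame through the GAN generator.
    The real pipeline returns a 256 x 256 colorized BGR uint8 frame."""
    return bgr_frame

cap = cv2.VideoCapture("input_bw.mp4")             # placeholder input path
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("output_color.mp4", fourcc, 23.0, (256, 256))  # ~23 fps

while True:
    ok, frame = cap.read()
    if not ok:                                     # end of the video
        break
    frame = cv2.resize(frame, (256, 256))          # dimensions the model expects
    out.write(colorize_frame(frame))               # colorize and append to output

cap.release()
out.release()
```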
Below, I have embedded both the sample input and output videos. Notice the reduced dimensions and the differing frame rates, both of which I intend to correct in the coming week.
Next Steps
In the coming week, I hope to fix some key issues I noticed with the video. As you can see, the people in the colorized video appear to move more slowly than in the input video; I believe this comes from a frames-per-second conversion error. The colorization itself could also use some work, so I will try training the model for more epochs to see if it improves, or I may switch to a different colorizing model. Finally, I will experiment with the dimensions the model can run on, aiming for an approach that preserves the original dimensions.
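For the frame-rate issue in particular, one likely fix, sketched here under the assumption that the mismatch comes from a hard-coded output rate, is to read the rate from the source clip and reuse it when writing the output:

```python
# Possible fix for the slow-motion effect: reuse the source clip's frame rate
# for the output writer instead of assuming a fixed value.
import cv2

cap = cv2.VideoCapture("input_bw.mp4")             # placeholder input path
src_fps = cap.get(cv2.CAP_PROP_FPS)                # exact rate of the original clip
cap.release()

out = cv2.VideoWriter("output_color.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"),
                      src_fps, (256, 256))         # output is still 256 x 256 for now
```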