Week 11: Depths of Pixels and Math
May 10, 2024
After expanding the equations in PatchMatchUpdatePixel(), I have arrived at a formula that I can use to express the position of the pixel being solved for in terms of BSI.
In bundle adjustment, the camera poses of all of the images were solved for and stored as rotation and translation matrices, which are the extrinsic parameters of the camera. Using those matrices, compute_depthmaps now calculates the 3D positions of the pixels in the images by randomizing the depth and computing the other coordinates from formulas based on the camera extrinsic parameters.
Equations
Since all of the camera extrinsic parameters were calculated in a coordinate system centered on the first image, the positions of the pixels are calculated with respect to the first image as well. As such, the homography matrix, a 3×3 matrix representing the transformation from one image to another, needs to be calculated between each image and the first image.
H = 3×3 homography matrix
K = 3×3 camera intrinsic matrix, which describes how the camera projects onto the image plane based on its focal length
R = 3×3 rotation matrix, combining the rotations of the first image and the other image
t = 3×1 translation vector
Using this homography matrix, the 3D points of each pixel in the image can be calculated.
Similarly, based on the homography matrix, a pixel in the first image can be projected to a position in the other image.
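As a minimal sketch of the projection step, assuming the pixel is treated as the homogeneous point (x, y, 1) and H is a plain 3×3 nested list, the homogeneous coordinates u, v, and w and the resulting position in the other image could be computed as follows. The function names, and the choice of which of i and j plays the role of x, are assumptions for illustration and are not taken from PatchMatchUpdatePixel().

```python
def project_homogeneous(H, x, y):
    """Apply a 3x3 homography H (nested lists) to the point (x, y, 1)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u, v, w


def to_pixel(u, v, w):
    """Convert homogeneous coordinates to a position in the other image."""
    # This is the step that needs division, which is why it is harder in BSI.
    return u / w, v / w
```

Only project_homogeneous uses multiplication and addition alone; to_pixel is where the division by w appears.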
Since these equations rely purely on H, i, and j, we can store the values of H, i, and j for each iteration and rewrite the equations in terms of vectors, with one vector per element of H plus one each for i and j. For example,
where multiplication between vectors is defined as
Now that the equations have been rewritten in terms of vectors, they can easily be rewritten in terms of BSI by converting those vectors into BSI.
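The vector form can be sketched as below, assuming the multiplication between vectors is element-wise and that element (r, c) of every H is gathered into its own list. The helper names and the H_elems layout are hypothetical; plain Python lists stand in for whatever the BSI representation ends up being.

```python
def vec_mul(a, b):
    """Element-wise product of two equally long vectors (assumed definition)."""
    return [p * q for p, q in zip(a, b)]


def vec_add(a, b):
    """Element-wise sum of two equally long vectors."""
    return [p + q for p, q in zip(a, b)]


def batched_uvw(H_elems, xs, ys):
    """Compute u, v, w for many pixels at once.

    H_elems[r][c] is a list holding element (r, c) of every homography;
    xs and ys hold the pixel coordinates paired with each homography.
    """
    rows = []
    for r in range(3):
        acc = vec_add(vec_mul(H_elems[r][0], xs), vec_mul(H_elems[r][1], ys))
        rows.append(vec_add(acc, H_elems[r][2]))
    return rows  # [u, v, w], each a vector with one entry per pixel
```

Because every step is an element-wise multiply or add, each list could in principle be swapped for a BSI vector without changing the structure of the computation.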
Implementation
For now, I will only implement the equations for u, v, and w in terms of H because I need to start off somewhere simple. To solve for H in terms of the extrinsic parameters would require 36 additional vectors for each element in each of the matrices. I also can’t implement the Hx0 or dfdx variables yet because I haven’t implemented the framework for division with BSI.
Currently, I have created a vector for all of the H matrices, i, and j values for each iteration of PatchMatchForwardPass and PatchMatchBackwardPass. There are about 4 million H matrices per iteration, and each element of H is a float ranging from 0 to 2 with up to 5 decimal places. Since BSI uses quantization to store floats, I will test with different precisions of H.
For now, my program keeps crashing after some number of iterations, so I need to figure out a way to split the values of H that I store into even smaller chunks.
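To illustrate what testing different precisions could look like, here is a rough sketch of quantizing floats to integers at a chosen number of decimal places and then splitting them into bit slices. It assumes that BSI keeps one bit vector per bit position of the quantized integers; the function names are hypothetical and this is not the project's actual BSI code.

```python
def quantize(values, decimals):
    """Scale floats by 10**decimals and round to the nearest integer."""
    scale = 10 ** decimals
    return [round(v * scale) for v in values], scale


def bit_slices(ints):
    """Return one list per bit position; slice k holds bit k of every value."""
    num_bits = max(max(ints).bit_length(), 1)
    return [[(n >> k) & 1 for n in ints] for k in range(num_bits)]


def from_slices(slices):
    """Rebuild the integers from their bit slices."""
    count = len(slices[0])
    return [sum(slices[k][idx] << k for k in range(len(slices)))
            for idx in range(count)]
```

For example, a value of 2 kept to 5 decimal places becomes the integer 200000, which needs 18 bits, so each element of H would take 18 slices at that precision. Lowering the number of decimal places reduces the slice count at the cost of accuracy.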
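One possible way to bound memory use, sketched under the assumption that the stored values can be consumed as a stream, is to process them in fixed-size blocks rather than holding a whole iteration at once. The chunks helper and the block size are illustrative only.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block
```

Each block can then be converted and processed before the next one is read, so only one block of values needs to be in memory at a time.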
BSI Division
As I mentioned, I haven’t yet implemented the framework for division, but I can explain the algorithm that I plan to use. Division by a number can be represented as multiplication by the reciprocal of that number.
A fraction can be rewritten in the form 1/D = Y / 2^k, where Y is some constant and 2^k is a power of 2.
Since X is stored in binary format, it is easy to divide X by powers of 2 by shifting X. As such, it would be easy to divide X by D if we could instead divide X by a power of 2, then multiply by some constant Y.
In other words, the division can be rewritten as X / D = (X / 2^k) × Y.
Depending on how much precision the final answer needs, the BSI may be multiplied by a positive power of 2 earlier on. This can be offset later by storing the power of 2 that the BSI was multiplied by.
For the upcoming week, I will be fixing the cause of my program crashing, which is likely that it exceeds the memory limit. Once I have fixed that, I will be able to show some numbers for runtime and accuracy in the final presentation!
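The shift-and-multiply idea can be sketched on plain non-negative integers as below. This is an illustration under my own assumptions rather than the planned BSI implementation: the constant Y is taken as the ceiling of 2^k / D, k defaults to 32, and the multiplication is done before the shift so that low bits are not discarded early.

```python
def reciprocal_constant(d, k):
    """Return Y = ceil(2**k / d) using integer arithmetic only."""
    return -(-(1 << k) // d)


def divide_by_constant(x, d, k=32):
    """Approximate x // d as (x * Y) >> k for non-negative integers."""
    y = reciprocal_constant(d, k)
    return (x * y) >> k
```

With k = 32, divide_by_constant(100, 7) evaluates to 14, matching 100 // 7. A larger k keeps the result exact over a wider range of X, at the cost of more bits in the intermediate product.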