Week 2: Reconstructing Understanding
March 8, 2024
This week I’ve been focusing on two major tasks: setting up OpenSFM on my Mac and reading through the OpenSFM library and documentation.
For context, though, we'll start with some previous research done on this topic.
Related Works
To capitalize on photogrammetry's photorealism, an important barrier must be overcome: bad atmospheric conditions. Since the detail in photogrammetry relies on contrast, bad weather can make landscapes appear too uniform to correctly identify matching features across images, and dark lighting causes errors in camera calibration that can distort the resulting geometry. As found in [1], increasing the number of uniformly spread ground control points (GCPs), points with exactly known coordinates, improved the accuracy of the model near the GCPs.
The equipment used to capture images for photogrammetry is much cheaper than LiDAR equipment. To evaluate the accuracy of 3D models produced from images taken by a low-cost UAV and digital camera, [2] conducted an experiment over the surface mine Jastrabá in Slovakia. Out of 237 test points, only 3 failed the minimum requirements for orthophoto maps in national legislation. The results of this experiment demonstrate the utility of photogrammetry not only for mining but also for other land operations.
Optimizations can take place while the photos are being taken or during pre-processing of the data. An example of the latter is the pipeline for color enhancement, image denoising, color-to-gray conversion, and image content enrichment proposed in [3]: CBM3D-new and BID, the methods proposed there for image denoising and grayscale conversion, respectively, produced denser and more complete point clouds. [4] covers trajectory planning to reduce redundant camera angles. [5] shows that a high camera inclination of 20-40 degrees enhances photogrammetry, improving the spatial structure of errors by more than the accuracy lost to the higher flying height it requires.
Within the calculations of the structure-from-motion process, several mathematical models have been proposed to increase the accuracy of feature matching. The scale-invariant feature transform (SIFT) from [6] has been fundamental to many point-based methods, such as PCA-SIFT [7].
Algorithm Explained
OpenSFM employs incremental structure from motion: starting from a pair of images and iteratively adding the rest of the images to reconstruct the 3D scene.
Step 1: Choosing a pair of images
Good pairs need enough parallax, i.e., the camera must have moved between the two shots by a distance that is large relative to its distance from the scene.
To decide whether the parallax is “good enough,” OpenSFM tries to fit a rotation-only camera model to the matched features of the two images. Using the RANSAC algorithm, an iterative method for estimating the parameters of a mathematical model by random sampling, it counts the outliers that the rotation-only model cannot explain. If the number of outliers exceeds 30%, rotation alone does not account for the camera motion, meaning the camera must have translated, so the pair is accepted.
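Here's a minimal sketch of that test, assuming matched features are already given as unit bearing vectors (`bearings1`/`bearings2` are my own placeholder names, not OpenSFM's actual API). It fits a rotation with the Kabsch/orthogonal-Procrustes method inside a bare-bones RANSAC loop:

```python
import numpy as np

def fit_rotation(b1, b2):
    # Kabsch / orthogonal Procrustes: best rotation R with R @ b1[i] ~ b2[i]
    U, _, Vt = np.linalg.svd(b2.T @ b1)
    if np.linalg.det(U @ Vt) < 0:   # force a proper rotation (det = +1)
        U[:, -1] *= -1
    return U @ Vt

def has_enough_parallax(b1, b2, iters=100, thresh=0.01, max_outlier_ratio=0.3):
    n = len(b1)
    rng = np.random.default_rng(0)
    best_inliers = 0
    for _ in range(iters):
        sample = rng.choice(n, size=3, replace=False)  # minimal sample
        R = fit_rotation(b1[sample], b2[sample])
        errors = np.linalg.norm(b1 @ R.T - b2, axis=1)
        best_inliers = max(best_inliers, int((errors < thresh).sum()))
    # If rotation alone leaves many matches unexplained, the camera
    # must have translated, so the pair has usable parallax.
    return 1 - best_inliers / n > max_outlier_ratio
```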
Step 2: Initializing a reconstruction
To find the relative camera motion between the two images, at least 5 corresponding points are needed. For each candidate pose configuration, a set of inliers is found, and the algorithm triangulates, or in other words calculates, the 3D positions of those points. The Rodrigues transformation, an efficient conversion between a rotation matrix and a compact axis-angle rotation vector, is used when checking the camera poses against the images. Using the configuration with the best inliers and the least error, the reconstruction is initialized with the corresponding poses, triangulated together with matches from GCP and GPS data, and bundle adjusted to fit the positions.
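As an illustration, here's roughly what this step looks like using OpenCV's five-point solver (this is a stand-in, not OpenSFM's internal code; `pts1`/`pts2` are assumed Nx2 pixel coordinates of matches and `K` is the camera intrinsics matrix):

```python
import cv2

def initialize_pair(pts1, pts2, K):
    # Five-point algorithm inside RANSAC: needs at least 5 correspondences.
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Disambiguate the possible (R, t) decompositions by keeping the one
    # that places triangulated inliers in front of both cameras.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Rodrigues converts the 3x3 rotation matrix into a 3-vector
    # (rotation axis scaled by angle), the form bundle adjusters optimize.
    rvec, _ = cv2.Rodrigues(R)
    return rvec, t, mask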
Step 3: Adding more images
For each new image added, the process is repeated. The algorithm finds the camera position that makes the already-reconstructed 3D points project to the correct positions in the new image; this is called resectioning. If the resectioning works, the image is added, the features of the new image that are seen in the reconstructed images are triangulated, the reconstruction is bundle adjusted, and all the features are re-triangulated. Finally, the reconstruction is rigidly moved, meaning a single rigid transformation is applied to the whole reconstruction rather than to its parts, to align it with GPS or GCP data.
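A minimal resectioning sketch using OpenCV's RANSAC PnP solver (the `points_3d`/`points_2d` inputs and the inlier cutoff are my own placeholders, not OpenSFM's actual interface):

```python
import cv2
import numpy as np

def resection(points_3d, points_2d, K):
    # Solve the perspective-n-point problem: find the camera pose that
    # projects the known 3D points onto their observed 2D positions.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64), points_2d.astype(np.float64),
        K, None, reprojectionError=4.0)  # inlier threshold in pixels
    if not ok or inliers is None or len(inliers) < 10:
        return None  # resectioning failed; skip this image for now
    return rvec, tvec  # pose as axis-angle rotation + translation
```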
OpenSFM actually has a built-in optimization for large datasets. If the GPS positions of the images are known, the dataset can be split into subsets, each reconstructed as its own submodel. The algorithm can be run on all of the submodels in parallel. After all the submodels have been reconstructed, the GPS data is used to find the submodels' positions relative to each other and merge them into the final model.
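As a toy sketch of the splitting idea (OpenSFM's real submodel clustering is more involved; `gps` here is an assumed Nx2 array of per-image coordinates), a plain k-means partition would look like:

```python
import numpy as np

def split_into_submodels(gps, k=4, iters=20, seed=0):
    # Plain k-means over image GPS positions; each cluster becomes
    # one submodel that can be reconstructed independently.
    rng = np.random.default_rng(seed)
    centers = gps[rng.choice(len(gps), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(gps[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([gps[labels == c].mean(axis=0)
                            if (labels == c).any() else centers[c]
                            for c in range(k)])
    return [np.flatnonzero(labels == c) for c in range(k)]
```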
Matrix Operations
After going through the files, I've compiled almost all the matrix operations I will have to override. I say almost all because I only focused on the NumPy operations, while the library also defines several Matrix classes of its own composed of NumPy arrays, so I'll have to see how those translate. I won't be missing any major matrix operations, just some implementation differences. So here's the non-comprehensive list, roughly in the order the operations first appear when running the algorithm (yes, I traced almost the whole library just to get this):
- Dot product
- Transpose
- Addition and subtraction
- Normalize
- Rodrigues transform (in cv2 library)
- Determinant
- Inverse
- Multiplication and division by scalars
- Matrix multiplication
- Nonzero
- Drop column
- Divide by column
- Find homography (in cv2 library)
- Argmax
- Mean
- Eigenvalues and eigenvectors
- Sort
- Less than, greater than
- Sum
- Add column
- Vstack
- Is empty
- Identity matrix
- Extract diagonal
- Standard deviation
- Concatenate
- Reshape
- Flatten
- Outer product
- Pseudo inverse
- Angle between vectors
- Rotate matrix
- Roll elements
Game Plan
As frustrating as it is, there's no magic trick to get OpenSFM working (still having trouble building it; high hopes for the future, since setting up is usually the hardest part), so I'll grind at that for another indefinite period of time. In the meantime, I can get started on an auxiliary class for BSI that fits into the functions I identified above; a rough skeleton is sketched below. Most of the algorithms have implementations, but I need to check how much quantization they do, if any, since there will be a lot of floats when dealing with pi and conversions between degrees and radians. Not sure if anyone will believe me, but reading through the OpenSFM library was actually pretty fun. While it is tedious, there are some veeeery brief introductions to the algorithms and optimizations used, and with a little extra Googling, I found out about some pretty cool stuff, like the Rodrigues transformation. If you're into linear algebra, I highly suggest taking a look at it.
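Here's roughly what I have in mind, assuming BSI values can be converted to and from floats (the `BSIMatrix` name and the float stand-in storage are placeholders, not the project's actual types):

```python
import numpy as np

class BSIMatrix:
    """Thin wrapper exposing the NumPy operations from the list above."""

    def __init__(self, data):
        # Stand-in storage; the real class would hold BSI values instead.
        self.data = np.asarray(data, dtype=float)

    # A few of the listed operations, to show the shape of the interface:
    def dot(self, other):
        return BSIMatrix(self.data @ other.data)

    @property
    def T(self):
        return BSIMatrix(self.data.T)

    def normalize(self):
        return BSIMatrix(self.data / np.linalg.norm(self.data))

    def __add__(self, other):
        return BSIMatrix(self.data + other.data)

    def __sub__(self, other):
        return BSIMatrix(self.data - other.data)
```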
Sources:
[1] P. Burdziakowski and K. Bobkowska, "UAV Photogrammetry under Poor Lighting Conditions-Accuracy Considerations," MDPI, 19 May 2021, www.mdpi.com/1424-8220/21/10/3531.
[2] B. Kršák, P. Blišťan, A. Pauliková, P. Puškárová, L. Kovanič, J. Palková, and V. Zelizňaková, "Use of Low-Cost UAV Photogrammetry to Analyze the Accuracy of a Digital Elevation Model in a Case Study," Measurement, Elsevier, 12 May 2016, www.sciencedirect.com/science/article/abs/pii/S0263224116301749.
[3] M. Gaiani, F. Remondino, F. I. Apollonio, and A. Ballabeni, "An Advanced Pre-Processing Pipeline to Improve Automated Photogrammetric Reconstructions of Architectural Scenes," MDPI, 25 Feb. 2016, www.mdpi.com/2072-4292/8/3/178.
[4] Q. Li, W. Yu, and S. Jiang, "Optimized Views Photogrammetry: Precision Analysis and a Large-Scale Case Study in Qingdao," arXiv, 24 June 2022, arxiv.org/abs/2206.12216.
[5] W. Dai, R. Qiu, B. Wang, W. Lu, G. Zheng, S. O. Y. Amankwah, and G. Wang, "Enhancing UAV-SfM Photogrammetry for Terrain Modeling from the Perspective of Spatial Structure of Errors," MDPI, 31 Aug. 2023, www.mdpi.com/2072-4292/15/17/4305.
[6] D. G. Lowe, "Object Recognition from Local Scale-Invariant Features," in Proc. of the International Conference on Computer Vision (ICCV), Corfu, 1999.
[7] Y. Ke and R. Sukthankar, "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors," in Proc. of the 2004 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 506-513, June 2004.