Introduction

Inverse problems are an integral part of numerous scientific disciplines, with applications in earth sciences such as geophysics and oceanography, but also in areas such as signal processing and computer vision [1]. Their solution is traditionally based on the analytical evaluation of physical models that (minimally) describe the processes under examination using specific functions and parameters. In addition to such explicitly formalized models, data-driven models that provide probabilistic solutions directly from the analysis of given measurement data are increasingly being used. These include machine learning methods, which have proven to be powerful and versatile tools for solving inverse problems and, due to their high precision, are state of the art in many areas [8].

In the field of computer vision, such techniques have greatly expanded the limits of what was previously possible in areas such as object recognition, visual tracking, or semantic segmentation. Their capabilities are not firmly anchored in structures defined by experts, but are learned from scratch by training on example data. However, it is a fallacy to assume that the incorporation of formalized knowledge should therefore generally be avoided when training neural networks.

An example of this is the 3D reconstruction of an object or scene from given 2D images. In computer graphics, rendering refers to the process of generating images of a scene from a 3D model comprising surface descriptions, lights, and camera properties. Methods in this area build on decades of research and are highly developed and technically mature. Physically based models such as ray tracing, which realistically simulate the radiative exchange of light, can achieve photorealism and have almost closed the visible gap between synthetic and real images.

Making this formalized knowledge usable for 3D reconstruction from images motivates the research area of inverse rendering. Standard renderers cannot easily be coupled with gradient-based methods such as neural networks, since they contain non-differentiable discretization steps such as rasterization. To close this gap, differentiable renderers that eliminate these obstacles by replacing the unsuitable program parts have been developed in recent years.

In this paper, we discuss the synergetic effects of a fusion of statistical and physical models. To this end, we present Gemini Connector, an initiative to create a platform for differentiable rendering with an easily extensible modular concept. In this way, optical phenomena become physically describable or learnable via neural networks, thus expanding the possibilities of image analysis through inverse rendering (see Fig. 1).

Fig. 1

Central idea: The fusion of neural networks and differentiable physically based models enables any recombination of these approaches as a bidirectional interface between the world and its parametric description. Instances can be synthesized from the latter, but conversely the model parameters can also be optimized. This hybrid approach aims to achieve a synergetic effect between formalized and data-driven models

As a concrete application example, we investigate how such a combination can be used for improved image analysis to address the visual challenges of underwater recordings, using a tightly coupled network of differentiable renderers with physical and statistical models of underwater optics. This enables the measurement of visual underwater properties from ocean imagery, e.g., color absorption patterns or turbidity. It has been shown that these optical parameters can be used to detect and analyze further ocean health parameters [3]. We will therefore discuss possible applications in other projects of Cross-Domain Fusion (CDF), especially but not limited to ocean science.

Related work

Neural networks are now being used very successfully in virtually every conceivable area. This success is based on the backpropagation algorithm in conjunction with graphics processing unit (GPU)-based automatic differentiation (autodiff) frameworks such as Caffe, PyTorch, and TensorFlow. Recently, there have also been significant developments in the area of differentiable physical models, which build on the same technical foundations and therefore benefit from interoperability.
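
The shared mechanism can be illustrated in a few lines. The following minimal PyTorch sketch shows reverse-mode automatic differentiation, on which both neural networks and differentiable physical models rely:

    import torch

    # Operations on tensors with requires_grad=True are recorded in a graph,
    # so gradients can later be computed by backpropagation.
    x = torch.tensor([2.0, 3.0], requires_grad=True)
    y = (x ** 2).sum()   # forward pass: y = x1^2 + x2^2
    y.backward()         # reverse-mode automatic differentiation
    print(x.grad)        # analytic gradient dy/dx = 2x -> tensor([4., 6.])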

Differentiable rendering

Differentiable rendering is a young but rapidly developing field of image analysis; the first open-source framework, OpenDR, was released in 2014. Within just a few years, a large number of application areas emerged in 3D, material, and light reconstruction as well as body and pose estimation. Kato et al. provide an overview of current technologies [6]. The rapidly growing visibility is driven by new platforms from major vendors, such as NVIDIA’s Kaolin (2019), Meta’s PyTorch3D (2020), or Alphabet’s TensorFlow Graphics (2021), and their direct connection to established machine learning libraries such as PyTorch and TensorFlow. Despite their visually appealing results, conventional computer graphics models are simplified and only realistic to a limited extent. Therefore, renderers like Mitsuba 2 (2019) introduce differentiable ray tracing, in which the light transport is grounded in physical principles. Although this method is very computationally intensive and is sometimes not considered performant enough for deep learning approaches [7], it enables inverse rendering with even more realistic feedback.

Image analysis in the medium of water

Scattering media reduce the quality of vision and cast a visual haze over distant objects. Depending on their composition, the range of vision can be barely or severely restricted, image content can be blurred, color components can be desaturated, or scattered light can veil the field of view. Although these effects hide areas behind the medium, they also provide information about the medium itself. In the case of water, particularly the open bodies of water on our planet, these visual properties of underwater images can provide information about the condition of the water body. For example, suspended particulate materials such as plankton and detritus, nutrients, or microplastics can be detected optically [3]. Which lighting model applies depends first on whether the modeled water layer is illuminated by sunlight or by artificial light sources. While in the former case the lighting model originally intended for foggy weather by Nayar [10] has also established itself for uniformly illuminated underwater scenarios, a macroscopic underwater model was presented by McGlamery and later extended by Jaffe [5] (Fig. 2). In addition to effects such as wavelength-dependent light attenuation, forward- and backward-scattered light components were identified and taken into account. Measurements by Petzold et al. show that scattering distributions differ greatly between calm and turbid water [11].
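
In its simplest, widely used form, which omits the forward-scatter component of the Jaffe-McGlamery model, underwater image formation combines attenuated direct light with backscattered veiling light per color channel. The following NumPy sketch is illustrative; the function name and parameter shapes are our own choices, not a fixed API:

    import numpy as np

    def underwater_image(J, depth, beta, B):
        # Simplified per-channel image formation in a scattering medium:
        # direct attenuation plus backscatter; forward scatter omitted.
        # J     : (H, W, 3) unattenuated scene radiance
        # depth : (H, W)    camera-to-scene distance in meters
        # beta  : (3,)      wavelength-dependent attenuation coefficients [1/m]
        # B     : (3,)      veiling-light (backscatter) color
        t = np.exp(-depth[..., None] * beta)   # per-channel transmission
        return J * t + B * (1.0 - t)           # direct + backscatter term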

Fig. 2

Three-dimensional scene in wireframe representation (a) and with the underwater lighting model according to Jaffe-McGlamery (b) with its light components (c). Sunlight is not taken into account, but can be integrated via other models

Approach

The reconstruction of image content sometimes requires complex inverse rendering pipelines; finding a suitable combination of methods is one of the main challenges. We strive for a simple platform for the fusion of such methods and motivate this idea with a modular extension of differentiable renderers with physical models of underwater optics. This work builds directly on the results of Nakath et al. [9], who successfully estimated light and water parameters using differentiable ray tracing.

The goal of this effort is to recover scene parameters. Rendered images are compared with a reference image using an error function, and the parameters to be optimized are iteratively adjusted via gradient descent. How many and which parameters are estimated or specified is freely configurable and has a major impact on accuracy. In the following, we therefore examine several research questions with different configurations of the rendering pipeline.
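
A minimal sketch of this loop, assuming for illustration that the simplified attenuation model above stands in for a full differentiable renderer (all names and values are hypothetical):

    import torch

    # Known scene radiance and distances; the water parameters are unknown.
    J = torch.rand(64, 64, 3)
    d = torch.full((64, 64, 1), 2.0)

    def render(beta, B):
        t = torch.exp(-d * beta)          # differentiable forward model
        return J * t + B * (1.0 - t)

    # The reference image is produced here by ground-truth parameters ...
    reference = render(torch.tensor([0.40, 0.15, 0.10]),
                       torch.tensor([0.10, 0.30, 0.40]))

    # ... which gradient-based optimization recovers from a rough start.
    beta = torch.tensor([0.2, 0.2, 0.2], requires_grad=True)
    B = torch.tensor([0.2, 0.2, 0.2], requires_grad=True)
    optimizer = torch.optim.Adam([beta, B], lr=0.02)

    for step in range(300):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(render(beta, B), reference)
        loss.backward()                   # gradients of the error function
        optimizer.step()                  # iterative parameter adjustment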

Optimizing light and water parameters

Fig. 3 shows the core application: the differentiable renderer, augmented with underwater lighting (Jaffe-McGlamery) and light scattering data (Petzold’s scattering measurements). Depending on the application, parameters can be specified or learned; however, the more parameters are learned simultaneously, the more complex the optimization problem becomes, which can make it difficult to solve or lead to imprecise results.
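
In a framework like PyTorch, the choice between specified and learned parameters amounts to toggling gradient tracking; a hypothetical configuration might look as follows:

    import torch

    # Fixed (measured or known) parameters are simply excluded from updates.
    water_params = {
        "beta":      torch.tensor([0.40, 0.15, 0.10], requires_grad=True),  # learned
        "turbidity": torch.tensor(0.5),                                     # specified
    }
    trainable = [p for p in water_params.values() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=1e-2)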

Fig. 3

Structure of the extended differentiable renderer built around the Jaffe-McGlamery model with Petzold’s scattering measurements, capable of detecting per-channel color fading and water turbidity

Light scattering

From real input images with known geometry, texture, and pose, synthetic copies can be generated that ideally differ only in the light scattering. To measure a specific light behavior, all standard parameters described in Fig. 3 should therefore be known. Using the optimization approach described above, these values can be traced back to light distributions, provided a suitable model has been selected for them. Fig. 4 shows the recovery of discrete measured values, which can be interpolated to a Gaussian-like light distribution. An implicit functional modeling would also be possible, as described in the section on adaptive Monte Carlo sampling below. If instead a fixed scattering model is used, as shown in Fig. 3, the water turbidity of the input images can be determined from it.
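
Both variants can be sketched in a few lines; the angle and intensity values below are illustrative placeholders, not Petzold’s actual measurements:

    import numpy as np
    from scipy.interpolate import interp1d
    from scipy.optimize import curve_fit

    # Illustrative placeholder values for a forward-scatter measurement.
    angles = np.array([0.0, 5.0, 10.0, 20.0, 45.0, 90.0])    # degrees
    values = np.array([1.00, 0.62, 0.35, 0.12, 0.03, 0.01])  # relative intensity

    # Variant 1: interpolate directly between the measured support points.
    vsf = interp1d(angles, values, kind="cubic")

    # Variant 2: fit a Gaussian-like analytic model to the measurements.
    def gauss(a, amp, sigma):
        return amp * np.exp(-0.5 * (a / sigma) ** 2)

    (amp, sigma), _ = curve_fit(gauss, angles, values, p0=(1.0, 15.0))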

Fig. 4

The scattered light does not follow a normal distribution and can be estimated as a function of the angle. a Input images. b Forward-scatter measurement results overlaid with an estimated isotropic light distribution [12]

Color attenuation

The color attenuation caused by the medium can be measured using a similar setup, in which all standard parameters are likewise known. In addition, a Macbeth chart (see Fig. 5) is well suited for determining the gradients of the different color channels.
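
Under the simplifying assumption of pure exponential (Beer-Lambert) attenuation without backscatter, the per-channel coefficients can even be estimated in closed form from the known patch colors; in the full pipeline they are instead recovered by the optimization described above. A hypothetical sketch:

    import numpy as np

    def estimate_attenuation(observed, reference, distance):
        # observed  : (N, 3) Macbeth patch colors measured under water
        # reference : (N, 3) undistorted patch colors of the chart
        # distance  : path length through the water in meters
        ratio = np.clip(observed / reference, 1e-6, None)
        beta = -np.log(ratio) / distance   # invert I = I0 * exp(-beta * d)
        return beta.mean(axis=0)           # average over all patches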

Fig. 5

a Use of an ArUco board with a Macbeth chart to calibrate the water parameters (experimental setup adapted from [9]). For comparison: b Macbeth chart with undistorted colors and without reflections. MacBeth ColorChecker Chart by Acser123 (https://en.wikipedia.org/wiki/File:Color_Checker.pdf). License: https://creativecommons.org/licenses/by-sa/4.0/deed.en

Adaptive Monte Carlo sampling

Physically realistic rendering is usually based on path tracing, i.e., Monte Carlo-based ray tracing. Here, light paths are traced back to the emitting light sources using probabilistic methods based on the bidirectional reflectance distribution function, and their contribution to the image sensor is simulated. The greater the number of samples, the closer the result converges to realistic global illumination.
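
This convergence behavior can be illustrated with a toy one-dimensional integral standing in for the rendering equation; the estimator’s error shrinks with the square root of the sample count:

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.exp(-4.0 * x)          # stand-in for incoming radiance
    exact = (1.0 - np.exp(-4.0)) / 4.0      # analytic value of the integral

    for n in (16, 256, 4096):
        x = rng.random(n)                   # uniform samples in [0, 1)
        estimate = f(x).mean()              # Monte Carlo estimator
        print(n, abs(estimate - exact))     # error ~ O(1 / sqrt(n))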

Despite technical progress, the computational effort remains a limiting factor. In addition to noise-suppression methods, adaptive methods have been developed that fit the probabilistic distribution of the Monte Carlo simulation to the light field. Huo gives an overview of this area and presents a reinforcement learning approach that achieves realistic results with a smaller number of samples [4]. He also suggests representing light distributions as implicit functions, for example via neural radiance fields, whose superior interpolation properties appear very promising for the reconstruction of light distributions. Likewise, an extension to the complex lighting conditions of underwater scenarios, as shown in Fig. 6, would be very interesting, since these are not fully captured physically but are only specified by a discrete number of measured values.
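
The principle underlying such adaptive methods is importance sampling: drawing samples from a distribution that follows the integrand reduces the variance of the estimator. Extending the toy example above (in the ideal case shown here, where the sampling density exactly matches the integrand’s shape, the variance vanishes entirely):

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.exp(-4.0 * x)       # same toy integrand as above
    n, lam = 256, 4.0

    # Draw samples from an exponential density truncated to [0, 1]
    # via inverse-CDF sampling, then weight each sample by 1 / pdf.
    u = rng.random(n)
    x = -np.log(1.0 - u * (1.0 - np.exp(-lam))) / lam
    pdf = lam * np.exp(-lam * x) / (1.0 - np.exp(-lam))
    estimate = (f(x) / pdf).mean()       # zero variance in this ideal case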

Fig. 6

Replacing the scattering model (see Fig. 3) with a neural network that learns and interpolates the turbidity-dependent scattering distribution of light in water

Conclusion

In this paper we presented the idea of Gemini Connector, a project within the CDF initiative. We strive for a fusion of data-driven and model-based approaches, achieved through end-to-end differentiation. As an example, we presented an application that measures optical underwater parameters from which further ocean parameters can be deduced. We seek to combine these observations with other measurements and models to synergistically create a more holistic view of Earth-scale systems. The examination of digital twinning and CDF are central aspects of this research. While we seek to develop specific methods for the underwater projects planned in this initiative [2], the announced platform is not limited to these applications. Using examples from the underwater optical domain, we have shown the potential of this approach, which offers new possibilities for the estimation of environmental parameters, but also for the physically based instantiation of samples of these models. By recombining the modules, the interface can be extended as desired and thus adapted to the respective requirements. We want to employ this potential within the CDF initiative in order to promote the postulated paradigm of CDF through a technical approach.