
Edited by: Philippe Renard, Université de Neuchâtel, Switzerland

Reviewed by: Maruti Kumar Mudunuru, Los Alamos National Laboratory (DOE), United States; Niklas Linde, Université de Lausanne, Switzerland

This article was submitted to Freshwater Science, a section of the journal Frontiers in Environmental Science

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

This paper focuses on reducing the computational cost of the Monte Carlo method for uncertainty propagation. Recently, Multi-Fidelity Monte Carlo (MFMC) method (Ng,

Effective propagation of uncertainties through nonlinear dynamical systems has become an essential task for model based engineering applications (e.g., water resources management, petroleum reservoir management) (Elsheikh et al.,

One viable option to handle such situations is the Monte Carlo method (MC) where repeated evaluations of the high-fidelity flow models using different instantiations of the random input are performed. The output of these simulations is post-processed for estimates of the desired statistics such as the mean and the variance of the QoI. Generally, the estimators of the MC method are unbiased. However, since the accuracy of the MC method is measured in terms of the estimator variance (Giles,

In this work, in order to make use of the aforementioned advantages of the MC method and to alleviate the slow convergence rate, we employ a variant of control variate method (Ng,

Similar to MLMC method, Multi-Fidelity Monte Carlo method (Ng,

We now present a brief literature review of MLMC method as applied to uncertainty quantification (UQ) tasks. It appears Heinrich (

In the context of fluid flow in porous media, Müller et al. (

Historically, MLMC method constructs a hierarchy of coarse spatial and/or time discretization models as low-fidelity models. However, it is also possible to formulate a sequence of low-fidelity models utilizing projection based reduced order models (Wang et al.,

In this manuscript, we propose a Multi-Fidelity-Multi-Level Monte Carlo (MFML-MC) method to address some of the limitations of standard MLMC method with Galerkin projection based ROMs (Antoulas et al.,

The proposed MFML-MC method combines several ideas, detailed as follows. The first idea is to obtain a sequence of POD based approximations of the QoI and to use this sequence as the low-fidelity models in the MLMC framework. More precisely, we compute the optimal POD bases from the singular value decomposition of the snapshot matrix built directly from the training samples of the QoI. We then employ the computed POD bases in the least-squares reconstruction method to obtain a sequence of POD based approximations of the QoI (see section 4 for more details). Since the dimension of the QoI is much smaller than the dimension of the state variable, the dimension of the basis vectors used to approximate the QoI is much smaller than that of the basis vectors used to build a standard POD ROM. Therefore, building a QoI POD instead of a full state variable POD enables the efficient extraction of high-level PODs at a limited computational cost. The second idea is to employ the MFMC method at each level of the MLMC method, so that the high-fidelity model provides an unbiased estimator while the low computational cost of the low-fidelity models is exploited to run a very large number of realizations and obtain a low variance estimator. The third idea is to represent the difference between every two consecutive level models of the MLMC framework in a reduced dimension. We use principal component analysis (PCA) to perform this dimensionality reduction. The main reason for using PCA is that it is a linear transformation, which lets us exploit the linearity of the expected value operator. The fourth idea is to use a data-driven approach to construct a non-intrusive ROM (Wang et al.,

The remainder of this manuscript is organized as follows. In section 2, the multi-phase porous media flow problem is formulated. In section 3, the MC, MFMC, and MLMC methods are briefly explained. In section 4, the MFML-MC method is introduced. In section 5, numerical results for two subsurface multi-phase porous media flow problems showing the performance of the MFML-MC method are reported. We note that building reduced order models for these porous media flow problems is quite challenging: standard POD-Galerkin reduced order models produce inaccurate and unstable results even when a large number of POD basis vectors is used (He et al.,

We consider an immiscible two-phase (oil and water) flow in an incompressible porous media domain. The flow behavior of oil and water in a porous media domain can be described by conservation of mass and Darcy's law for each phase (Bastian,

where the subscript α = _{α} is the phase velocity, _{rα} is the relative permeability of phase _{α} is the viscosity of phase α (Bastian, _{rα} models the interactions between the two phases and usually, _{rα} is described as a function of phase saturation (volume of phase α in a given pore space of the porous media domain) (Aarnes et al.,

The total conservation of mass can be expressed in terms of incompressibility condition that takes the form

_{o}+_{w} is the total velocity vector and

where λ = λ_{w} + λ_{o} is the total mobility, λ_{α} = _{rα}/_{α} is the phase mobility, _{w} = λ_{w}/(λ_{w}+λ_{o}) is termed as the fractional flow function for the water phase and with the constraint _{w} + _{o} = 1. In the rest of the manuscript, we use _{w} to denote water saturation.
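The mobility and fractional flow definitions above can be illustrated with a short numerical sketch. The names `k_rw`, `k_ro`, `mu_w`, and `mu_o` are illustrative labels for the relative permeabilities and viscosities, and the quadratic Corey-type curves are an assumption made here for illustration only. The saturation end points (0.2) and the water-to-oil viscosity ratio (0.2) are the values quoted in the numerical results section.

```python
def corey_relperm(s_w, s_wc=0.2, s_or=0.2, exponent=2.0):
    """Corey-type relative permeabilities (k_rw, k_ro) at water saturation s_w.

    The exponent of 2 is an assumption made for illustration.
    """
    # Normalised (effective) saturation, clipped to [0, 1].
    s_e = (s_w - s_wc) / (1.0 - s_wc - s_or)
    s_e = min(max(s_e, 0.0), 1.0)
    return s_e ** exponent, (1.0 - s_e) ** exponent


def fractional_flow_water(s_w, mu_w=0.2, mu_o=1.0):
    """f_w = lambda_w / (lambda_w + lambda_o), with phase mobility lambda = k_r / mu."""
    k_rw, k_ro = corey_relperm(s_w)
    lam_w = k_rw / mu_w  # water mobility
    lam_o = k_ro / mu_o  # oil mobility
    return lam_w / (lam_w + lam_o)


if __name__ == "__main__":
    for s_w in (0.2, 0.35, 0.5, 0.65, 0.8):
        print(f"S_w = {s_w:.2f}  f_w = {fractional_flow_water(s_w):.4f}")
```

With these assumed curves, the fractional flow rises monotonically from 0 at the irreducible water saturation to 1 at the residual oil saturation.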

In this problem, we consider Equation (3) as the high-fidelity model and we solve Equation (3) for pressure and saturation using sequential formulation where we solve for pressure first and then solve for the water saturation. We use finite volume method to discretize the spatial derivatives of Equation (3) in a spatial domain of _{s} is the water saturation value at the

The QoI is defined as ^{m}, where _{i} = _{s}(_{i}, _{i}), _{i}, _{i})

Let

where

where 𝕍ar(

Control variate is a variance reduction technique which uses alternative estimator for 𝔼[

where

where ρ is the correlation between ^{2} lies between 0 and 1, _{HF} and _{HF} is the number of high-fidelity model samples. Furthermore, it was proved in Ng (

The potential limitation in the aforementioned multi-fidelity estimator (Ng,

where ^{1} … ^{I} ∈ ℝ are auxiliary random variables obtained from ^{i}] using _{i} samples of low-fidelity model ^{1} … β^{I} ∈ ℝ are the coefficients. The low-fidelity model _{1} … _{Mi} realizations of the input random vector _{i−1} realizations of _{HF} is the cost of evaluating a high-fidelity model, and _{i} is the cost of evaluating a low-fidelity model ^{1*} … β^{I*}} such that the mean square error of the MFMC estimator is lower than the Monte Carlo estimator for a fixed computational budget.
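As a minimal sketch of the multi-fidelity idea described above, the following toy example uses a single low-fidelity model. Everything specific in it is an assumption made for illustration: the input is uniform on [0, 1], the "high-fidelity" model is `exp(z)`, the "low-fidelity" model is its second-order Taylor polynomial, and the coefficient β is estimated from the same high-fidelity samples as sample covariance over sample variance (which equals ρ times the ratio of standard deviations). The low-fidelity model is evaluated on a large nested sample set whose first `m_high` realizations are shared with the high-fidelity model.

```python
import math
import random


def mfmc_mean(high, low, m_high, m_low, rng):
    """Two-model multi-fidelity estimate of E[high(Z)] with Z ~ U(0, 1)."""
    assert m_low > m_high > 1
    z = [rng.random() for _ in range(m_low)]
    f = [high(zi) for zi in z[:m_high]]  # few expensive evaluations
    g = [low(zi) for zi in z]            # many cheap evaluations (nested samples)

    mean_f = sum(f) / m_high
    mean_g_small = sum(g[:m_high]) / m_high
    mean_g_large = sum(g) / m_low

    # Control variate coefficient beta = Cov(f, g) / Var(g) = rho * sigma_f / sigma_g,
    # estimated here from the shared samples (an illustrative choice).
    cov_fg = sum((a - mean_f) * (b - mean_g_small)
                 for a, b in zip(f, g[:m_high])) / (m_high - 1)
    var_g = sum((b - mean_g_small) ** 2 for b in g[:m_high]) / (m_high - 1)
    beta = cov_fg / var_g

    # High-fidelity mean corrected by the difference of low-fidelity means.
    return mean_f + beta * (mean_g_large - mean_g_small)


def toy_low_fidelity(z):
    """Second-order Taylor polynomial of exp(z); a hypothetical cheap surrogate."""
    return 1.0 + z + 0.5 * z * z


if __name__ == "__main__":
    estimate = mfmc_mean(math.exp, toy_low_fidelity, 50, 5000, random.Random(0))
    print(f"MFMC estimate: {estimate:.4f}  exact: {math.e - 1.0:.4f}")
```

Because the two toy models are highly correlated, the correction term removes most of the sampling noise of the 50 high-fidelity evaluations.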

The multi-level idea is another extension of the control variate approach in which a sequence of low-fidelity models at different levels (_{i} with respect to the true solution ^{m}. This means, as _{i} is refined to approximate _{i} with

where _{i} = _{i+1} − _{i} with _{I} = _{I}, and we set _{0} = 0. Exploiting the linearity of the expected value operator 𝔼, the expected value 𝔼[

The MLMC estimator for the expected value of

The mean square error (mse) of MLMC estimator

It is evident from Equation (12) that the mse (ϵ^{ml}) of MLMC estimator is sum of several smaller contributions

The MLMC method is mainly based on the fact that _{i}) as low level samples are computed at low computational cost. At high levels, the level variances 𝕍ar(_{i}) are expected to be typically small, thus _{i} can be small and hence MLMC method incurs few expensive high-fidelity model simulations. In summary, MLMC method relies on the following variance hierarchy:

and also expects _{0} < _{1} < _{2} < … < _{I}, where _{i} is the computational cost to compute one sample of _{i}. In MLMC method, the optimal values for the number of samples ^{ml} to a specific value (say

Although MLMC in general refers to a control variate method with a sequence of _{1} … _{I}.

A practical implementation of the MLMC algorithm is the following (Müller et al.,

1. Fix a sequence of levels based on grid resolutions or POD basis
2. Fix a number of offline samples _{o} and fix a threshold for the estimated standard error.
3. Perform _{o} samples of high fidelity simulations.
4. If POD basis models, Derive
5. Compute _{o} samples of _{i} on every level.
6. Solve the optimization (Müller et al., _{i} samples of _{i} with
7. Update the estimates for 𝔼[_{i}], 𝕍ar(_{i}), and _{i} on every level.
8. Compute and update the required number of samples _{i} on each level.
9. On every level, if the updated _{i} is more than the number of samples already computed, then add an additional sample of _{i} and continue with step 6. If no level requires an additional sample, then quit.
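The loop structure of the steps above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the implementation used in this work: the level models are Taylor polynomials of `exp(z)` with `z` uniform on [0, 1], the cost per sample is assumed to double with each level, there is no offline high-fidelity or POD stage, missing samples are added in batches rather than one at a time, and the required number of samples per level uses the allocation formula common in the MLMC literature, N_i = ceil(eps^-2 sqrt(V_i / C_i) sum_k sqrt(V_k C_k)), which may differ from the optimization cited above.

```python
import math
import random


def taylor_exp(z, order):
    """Toy level model: Taylor polynomial of exp(z) of the given order."""
    return sum(z ** k / math.factorial(k) for k in range(order + 1))


def level_difference(z, level):
    """Difference between consecutive level models; the model below level 0 is 0."""
    fine = taylor_exp(z, level + 1)
    coarse = taylor_exp(z, level) if level > 0 else 0.0
    return fine - coarse


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def mlmc_mean(n_levels, n_initial, std_err_tol, rng, max_rounds=50):
    """Adaptive MLMC estimate of the mean of the finest toy model."""
    costs = [2.0 ** level for level in range(n_levels)]  # assumed cost model
    # Initial samples on every level, drawn independently per level.
    samples = [[level_difference(rng.random(), level) for _ in range(n_initial)]
               for level in range(n_levels)]
    for _ in range(max_rounds):
        # Update the level variance estimates.
        variances = [sample_variance(s) for s in samples]
        total = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
        # Required samples so that sum_i V_i / N_i <= std_err_tol ** 2.
        required = [max(n_initial,
                        math.ceil(math.sqrt(v / c) * total / std_err_tol ** 2))
                    for v, c in zip(variances, costs)]
        missing = [max(0, n - len(s)) for n, s in zip(required, samples)]
        if not any(missing):
            break  # no level requires additional samples
        for level, extra in enumerate(missing):
            samples[level].extend(
                level_difference(rng.random(), level) for _ in range(extra))
    # Telescoping sum of the level means.
    estimate = sum(sum(s) / len(s) for s in samples)
    return estimate, [len(s) for s in samples]


if __name__ == "__main__":
    estimate, counts = mlmc_mean(4, 100, 0.005, random.Random(1))
    print("estimate:", round(estimate, 4), "samples per level:", counts)
```

In this toy setting most samples are spent on the cheapest level and only a few on the most expensive one, which is the variance hierarchy the MLMC method relies on.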

In this section, we present a novel variance reduction method, the Multi-Fidelity-Multi-Level Monte Carlo (MFML-MC) method, which addresses the limitations observed in the standard MLMC method with Galerkin projection based ROMs (see section 1 for more details). In the MFML-MC method, we formulate an MLMC framework with

Outline of the (MFML-MC) method described in section 4. The low-fidelity_{i} (yellow color) denotes low-fidelity model

The first step of MFML-MC method is to formulate a sequence of POD approximations of the QoI _{1}, … , _{I}] in MLMC framework. More precisely, in this approach, _{i} is

where _{i} orthonormal basis vectors in its columns. The optimal orthonormal basis vectors are computed from the singular value decomposition (SVD) of the snapshot matrix _{u} is expressed as (Kani and Elsheikh,

where _{1} > σ_{2} > σ_{3} > … σ_{m} ≥ 0) are the singular values of the snapshot matrix _{u}. The associated error termed as least–squares errors in approximating _{i} using only _{i} basis vectors is given by (Berkooz et al.,

Please note that the dimension _{u} is much smaller than _{i} in this MLMC framework in comparison to the standard MLMC method. Moreover, _{I} can be obtained by using fewer basis vectors (_{I} ≈ _{I} ≈ 0) in comparison to the standard MLMC method with a sequence of Galerkin projection based ROMs.
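The first step can be sketched with NumPy on synthetic data. The snapshot matrix below is randomly generated and purely hypothetical (36 rows, matching a 6 × 6 grid of QoI values, and 200 training samples); the function names are illustrative. The sketch computes the POD basis from the SVD of the snapshot matrix, forms least-squares (orthogonal projection) reconstructions for several ranks, and checks the standard property that the squared reconstruction error over the snapshots equals the sum of the squared discarded singular values.

```python
import numpy as np


def pod_basis(snapshots):
    """Left singular vectors and singular values of an (m x n_samples) matrix."""
    basis, sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
    return basis, sigma


def pod_approximation(basis, rank, q):
    """Least-squares reconstruction of q in the span of the first `rank` vectors."""
    phi = basis[:, :rank]
    return phi @ (phi.T @ q)


def make_snapshots(rng, m=36, n_samples=200, latent_dim=8):
    """Synthetic QoI samples with a decaying spectrum (hypothetical data)."""
    modes = rng.standard_normal((m, latent_dim))
    scales = 0.5 ** np.arange(latent_dim)
    weights = rng.standard_normal((latent_dim, n_samples)) * scales[:, None]
    return modes @ weights + 1e-3 * rng.standard_normal((m, n_samples))


rng = np.random.default_rng(0)
snapshots = make_snapshots(rng)
basis, sigma = pod_basis(snapshots)

for rank in (1, 2, 4, 8):
    residual = snapshots - pod_approximation(basis, rank, snapshots)
    squared_error = np.sum(residual ** 2)
    discarded = np.sum(sigma[rank:] ** 2)
    print(f"rank {rank}: error {squared_error:.3e}, discarded energy {discarded:.3e}")
```

Each rank in the loop plays the role of one level: increasing the rank gives a more accurate, and in a real setting more expensive, approximation of the QoI.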

The second step of MFML-MC method is to compute the reduced representation of _{i} over all levels in MLMC framework (see Equation 9). The reduced representation of _{i} is expressed as

where _{i}, _{i} orthonormal basis vectors in its columns. The optimal orthonormal basis matrix _{ij} = _{i+1j} − _{ij} (_{Ij} = _{j} − _{Ij} for all _{i} is computed from the difference between two consecutive levels of POD based approximations _{i+1} and _{i}, i.e., _{i} = _{i+1} − _{i}, the least–squares error in approximating _{i} by

Now the MLMC estimator (see Equation 11) for the expected value of

The third step of MFML-MC method is to set _{i} for all _{i} = _{i} = σ_{i} and therefore, we expect _{i} to be attracted to a certain low dimensional subspace of dimension _{i} = Δ_{i} = (_{i+1} − _{i}) = 1 over all the levels.
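A small numerical check illustrates why a one-dimensional reduced representation can suffice when consecutive levels differ by a single POD basis vector. The data are synthetic and the names are illustrative; the sketch assumes that level `i` uses the first `i` POD basis vectors. Under that assumption, the difference between two consecutive POD approximations lies along the one added basis vector, so the matrix of level differences has rank one, and by linearity its mean can be computed from the mean of a single scalar coordinate.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_train, n_test = 36, 200, 500

# POD basis from synthetic training snapshots (hypothetical data).
train = rng.standard_normal((m, n_train))
basis, _, _ = np.linalg.svd(train, full_matrices=False)


def pod_approximation(q, rank):
    """Projection of q onto the first `rank` POD basis vectors (rank 0 gives 0)."""
    phi = basis[:, :rank]
    return phi @ (phi.T @ q)


def level_difference(q, level):
    """Difference between the rank-(level + 1) and rank-level approximations."""
    return pod_approximation(q, level + 1) - pod_approximation(q, level)


samples = rng.standard_normal((m, n_test))
level = 3
y = level_difference(samples, level)

# All level differences lie along the single newly added basis vector.
rank_y = np.linalg.matrix_rank(y, tol=1e-8)

# Linearity of the expectation: average the scalar coordinate, then lift it back.
phi_new = basis[:, level]
coeff = phi_new @ samples
mean_direct = y.mean(axis=1)
mean_reduced = phi_new * coeff.mean()

print("rank of level differences:", rank_y)
print("max mismatch between the two means:", np.abs(mean_direct - mean_reduced).max())
```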

In the fourth step, we extend the Multi-Level Multi-Fidelity method introduced in Geraci et al. (

where _{i} number of level specific low-fidelity models of _{i}, and _{i} = 1 for all

Now, the MFML-MC estimator for the expected value of

In the fifth step, we utilize a data-driven approach to derive a level specific low-fidelity model

where

In this section, we present numerical results to evaluate the performance of the MFML-MC method. The numerical results are based on two UQ tasks involving two-phase flow in a heterogeneous porous media domain. The two test cases are the quarter five-spot problem and the uniform flow problem, both with uncertainties in the permeability field (Kani and Elsheikh,

We consider two-phase flow of oil and water in a two dimensional porous media domain [0, 1] × [0, 1] where water is injected to displace the residual oil. We consider Equation (3) as the high-fidelity model to describe the flow behavior of oil and water. We define the relative permeability based on Corey's model _{wc} is the irreducible water saturation and _{or} is the residual oil saturation (Aarnes et al., _{wc} = 0.2, _{or} = 0.2, and initial water saturation to _{wc}(0.2). We set the porosity field in the porous media domain to a constant value of 0.2. We set the viscosity ratio of water to oil to 0.2. The permeability field is uncertain and is modeled by a log-normal distribution with zero mean and an exponential covariance that takes the form

where _{k} is the correlation length. We set _{k} to 0.1. Sample realizations of log-permeability values are displayed in

Sample plots of log-permeability field. Uncertain permeability field is modeled from a log-normal distribution function with zero mean and exponential covariance.
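Realizations of such a field can be drawn with a short NumPy sketch. The covariance form used here, C(x, y) = variance · exp(−|x − y| / L), with Euclidean distance and unit variance, is an assumption for illustration and may differ from the covariance used in this work; the 16 × 16 grid and the function name are likewise hypothetical. The correlation length of 0.1 is the value stated above.

```python
import numpy as np


def sample_log_permeability(n_cells, corr_length, n_samples, rng, variance=1.0):
    """Zero-mean Gaussian log-permeability samples on the unit square.

    Returns an array of shape (n_samples, n_cells, n_cells).
    """
    # Cell-centre coordinates of a uniform n_cells x n_cells grid.
    x = (np.arange(n_cells) + 0.5) / n_cells
    xx, yy = np.meshgrid(x, x, indexing="ij")
    points = np.column_stack([xx.ravel(), yy.ravel()])

    # Assumed isotropic exponential covariance between all pairs of cells.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    cov = variance * np.exp(-dist / corr_length)

    # Correlate independent standard normals with the Cholesky factor;
    # the small diagonal shift guards against round-off.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(points)))
    log_k = chol @ rng.standard_normal((len(points), n_samples))
    return log_k.T.reshape(n_samples, n_cells, n_cells)


rng = np.random.default_rng(0)
log_fields = sample_log_permeability(16, 0.1, 200, rng)
permeability = np.exp(log_fields)  # log-normal, hence strictly positive
print(log_fields.shape, float(log_fields.mean()), float(permeability.min()))
```

For fine grids a dense Cholesky factorization becomes expensive, and a truncated Karhunen-Loève expansion is a common alternative.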

As mentioned in section 2, we use sequential formulation to solve Equation (3) for pressure and water saturation (Aarnes et al.,

As defined in section 2, QoI is ^{m}, where _{i} = _{s}(_{i}, _{i}), _{i}, _{i})

We first compute the optimal POD bases matrices _{u} and _{Yi} for all _{u}, _{Yi},

Following that, the obtained matrix _{u} is utilized to build a sequence of POD approximations of _{i}) for all _{i} = _{i} = 1 as already mentioned in section 4.

Next, we build a level specific GBTR on every level (_{i} and utilize the estimated _{i} in Equation (22) to compute

We first compute optimal POD basis vectors for the pressure and saturation solution vectors from the SVD of the corresponding snapshot matrices. We build the snapshot matrices from the solution vectors (pressure and saturation) collected from the solutions of the high-fidelity model for 45 random realizations of the permeability field. We then build low-fidelity ROMs of different dimensions via Galerkin projection of the discretized system of the high-fidelity model, Equation (3), onto the space spanned by the POD basis vectors.

Following that, one can obtain an MLMC framework using Galerkin projection POD ROMs as low-fidelity models. However, in the two numerical test cases, namely the quarter five-spot problem (5.5) and the uniform flow problem (5.6), we obtained accurate and stable POD results only when the dimension of the POD-Galerkin ROMs was of nearly the same order of magnitude as the dimension of the high-fidelity state variable (Xiao et al.,

We evaluate the performance of MFML-MC method using two time specific error metrics defined by

where _{e} is the number of runs utilized to estimate the errors, _{t}] obtained from Monte Carlo estimate _{t}] that can be obtained from various estimators including Monte Carlo estimate that uses only high-fidelity model, Monte Carlo estimate that uses only low-fidelity model, and the MFML-MC estimator. We note that, _{e}.

Additionally, we utilize two global error metrics defined as

where all the time snapshots of _{e} = 15 to evaluate the two time specific error metrics and the two global error metrics.

Test case 1 is a two dimensional quarter five-spot problem where water is injected at the lower left corner (0, 0) of the porous media domain to produce oil and water at the top right corner (1, 1) (Kani and Elsheikh, _{u}.

Test case 1. _{u}.

_{t}] (first moment of _{t}] obtained from MC-LF deviates significantly from the reference result. This clearly shows that using only the low-fidelity model in the MC framework results in a biased estimate with respect to the reference result. Furthermore, _{t}] obtained from the MFML-MC estimator is almost indistinguishable from the reference result. This result confirms that combining a larger number of low-fidelity model realizations with the high-fidelity model in the MFML-MC framework can improve the estimator of the first moment of the saturation field.

Test case 1: Comparison of estimation of 𝔼[_{t}] (mean water saturation field at 6 × 6 spatial grid) for a fixed computational budget _{t}] at time _{t}] at time

^{2}, where

Test case 1: Plot of _{t}] (water saturation field at 6 × 6 spatial grid) obtained from various estimators. ^{2}, where

Test case 1: Plot of ^{2}, where

Performance chart of MFML-MC estimator for test case 1.

10^{−4} | 5 × 10^{2} | 125 | 15 | 8.3
10^{−5} | 9 × 10^{3} | 2,250 | 210 | 10.5
10^{−6} | 25 × 10^{3} | 6,250 | 490 | 13.4

Test case 2 is a two dimensional uniform flow problem where water is injected from the left side of the porous media domain to produce oil and water from the right side. We set no-flow boundary conditions on the remaining two sides (top and bottom) of the domain. We set the inflow rate to 0.08 and, because of the incompressibility constraint, the outflow rate to 0.08 as well (Kani and Elsheikh, _{u}.

Test case 2: _{u}.

Test case 2: Comparison of estimation of 𝔼[_{t}] (mean water saturation field at 6 × 6 spatial grid) for a fixed computational budget _{t}] at time _{t}] at time

Test case 2: Plot of _{t}] (water saturation field at 6 × 6 spatial grid) obtained from various estimators. ^{2}, where

Test case 2: Plot of ^{2}, where

Performance chart of MFML-MC estimator for test case 2.

10^{−4} | 5 × 10^{2} | 148 | 14 | 10.8
10^{−5} | 9 × 10^{3} | 2,850 | 197 | 14.5
10^{−6} | 25 × 10^{3} | 7,950 | 410 | 19.4

In this paper, we proposed an MFML-MC method that combines the features of the MFMC method and the MLMC method. In the MFML-MC method, we formulated an MLMC framework with a sequence of POD approximations of the high-fidelity model outputs. Furthermore, we formulated an MFMC setup on every level of the MLMC framework in order to compute an unbiased statistical estimate. Finally, we used GBTR in the MFMC setup to formulate a level specific low-fidelity model.

We applied the MFML-MC method to two uncertainty quantification problems involving two-phase flow in random heterogeneous porous media, where the standard MLMC method with POD-Galerkin ROMs is ineffective. The uncertain permeability field is modeled by a log-normal distribution with an exponential covariance function. Estimates of the first statistical moment of the water saturation at uniformly selected spatial grid points at a specific instant in time are calculated by the MFML-MC, MC-HF, and MC-LF methods. Comparisons between MFML-MC and MC-LF suggested that MC-LF is a biased estimator and that MFML-MC is an unbiased estimator of the expectation. Comparisons between the MFML-MC and MC-HF computing times showed speedups of MFML-MC with respect to MC-HF ranging from 8 to 19 at equivalent accuracy.

Future work should consider extending the MFML-MC method by using two or more level specific low-fidelity models in the MFMC setup. In addition, it will also be of interest to use the MFML-MC method for history matching (Elsheikh et al.,

NJ developed the algorithm, coded the algorithm in Python, obtained the results, and wrote the manuscript. AE is the Ph.D. supervisor of NJ. NJ carried out this work under the guidance of AE.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.