As geoscientists, we are predisposed to associative thinking. Trained for pattern recognition by our education and experience, we learn to recognize familiar elements in a new dataset and integrate those pieces of information into a subsurface geological model. However, this learned recognition is usually biased, and most of the time we are unaware of it. As machine learning is increasingly used in our workflows to bolster human interpretation, we must become more aware of our biases so that we can minimize them when we train the algorithms. Here we present a case study and a discussion of bias from the Ceará Basin in Brazil, where deep convolutional neural networks are used to aid the petrophysical analysis and volumetric assessment of a potential reservoir.
The Ceará Basin in the Equatorial South Atlantic Ocean (figure 1) formed during the Early Cretaceous and evolved from an extensional tectonic regime to a predominantly strike-slip one. Sedimentation was strongly controlled by tectonism, resulting in three major tectonostratigraphic sequences: synrift (Barremian-Early Aptian), transitional (Late Aptian-Early Albian) and drift (Early Albian-Holocene). The Ceará Basin has been producing since the 1980s, predominantly onshore and in shallow (transitional) waters, while the deepwater Mundaú sub-basin remains a new frontier.
To date, exploration activity in the Mundaú sub-basin of the Ceará Basin consists of five deepwater wells (water depths greater than one kilometer) drilled between 2012 and 2014, a dense grid of 2-D seismic lines, and 3-D surveys acquired in 2008 and 2018. The Pecém well, drilled in 2012, reported production from an interbedded sequence of sandstones, shales, siltstones and marls. However, the permeability, porosity and actual extent of this productive zone made the Pecém discovery uneconomic to develop.
To better characterize the Albian-Turonian interval in the Mundaú sub-basin, we apply machine learning to model and predict velocity and density logs, and use those results to build an elastic rock-property model. Using that model, we discuss interpretational bias and investigate how the well-log modeling influences seismic amplitude interpretation in this area. Additionally, we establish a relationship between lithotypes (facies) and seismic amplitude through porosity. These characterization objectives are the bread and butter of seismic exploration, but as machine learning applications become more common, we must not forget, as with any new technology, both the capabilities and the limitations of a new method.
Using the available wireline logs, mineralogy, porosity and fluid volume are initially estimated (figure 2a). Ideally, this estimation would be calibrated with core data or drill-cuttings analysis. However, the scale of such samples (centimeters to meters) and their point-sample nature (i.e., the lack of continuous calibration points) require the use of a rock-physics template to evaluate the petrophysics and the elastic model. Because these templates are theoretical models, interpretational bias enters even at this first stage of the analysis.
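As a minimal illustration of what a first-pass log estimate can look like, the sketch below computes a linear gamma-ray shale index and density porosity. These are standard textbook relations, not necessarily the petrophysical method used in this study, and the matrix density (2.65 g/cc, quartz), fluid density (1.0 g/cc), gamma-ray end points and log samples are all hypothetical assumptions. Each of those parameter choices is itself an interpreter decision of the kind discussed above.

```python
import numpy as np

def vshale_linear(gr, gr_clean, gr_shale):
    """Linear gamma-ray index as a first-pass shale volume, clipped to [0, 1]."""
    igr = (np.asarray(gr, dtype=float) - gr_clean) / (gr_shale - gr_clean)
    return np.clip(igr, 0.0, 1.0)

def density_porosity(rhob, rho_matrix=2.65, rho_fluid=1.0):
    """Total porosity from bulk density (g/cc), clipped to [0, 1].

    rho_matrix and rho_fluid are assumed values, not calibrated to core."""
    phi = (rho_matrix - np.asarray(rhob, dtype=float)) / (rho_matrix - rho_fluid)
    return np.clip(phi, 0.0, 1.0)

gr = np.array([30.0, 75.0, 120.0])      # API units, hypothetical samples
rhob = np.array([2.32, 2.45, 2.55])     # g/cc, hypothetical samples
vsh = vshale_linear(gr, gr_clean=30.0, gr_shale=120.0)
phit = density_porosity(rhob)
print(vsh, phit)
```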
With the plethora of digital seismic records and 3-D seismic surveys now available, machine learning represents a watershed in exploration seismology. The field is too fragmented to list every technique, but we can group them into four main types: classical learning, reinforcement learning, ensemble methods and neural networks. The machine learning application employed here is a deep convolutional neural network (DCNN), which is used to model and predict bulk density and the P- and S-wave velocities (see the August 2021 Geophysical Corner).
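To make the DCNN less of a black box, here is a minimal NumPy sketch of the operation a single 1-D convolutional layer applies to log curves sampled in depth. It is not the network used in this study: the article does not give its architecture, so the kernel length, channel counts and random (untrained) weights below are hypothetical, and training is omitted entirely.

```python
import numpy as np

def conv1d_same(x, w, b, relu=True):
    """Apply one 1-D convolutional layer along depth.

    x: (n_samples, n_in) input log curves, one column per curve
    w: (kernel, n_in, n_out) filter weights
    b: (n_out,) bias per output channel
    Zero 'same' padding keeps the output length equal to n_samples.
    As in deep-learning frameworks, this computes a cross-correlation.
    """
    k = w.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for i in range(x.shape[0]):
        window = xp[i:i + k]                        # (k, n_in) depth window around sample i
        out[i] = np.einsum("ki,kio->o", window, w) + b
    return np.maximum(out, 0.0) if relu else out

# Hypothetical example: 3 standardized input curves, untrained random weights
rng = np.random.default_rng(0)
logs = rng.standard_normal((200, 3))
hidden = conv1d_same(logs, 0.1 * rng.standard_normal((7, 3, 16)), np.zeros(16))
pred = conv1d_same(hidden, 0.1 * rng.standard_normal((7, 16, 3)), np.zeros(3), relu=False)
print(pred.shape)  # (200, 3): once trained, one column each for Vp, Vs and bulk density
```

Stacking such layers lets each output sample draw on a depth window of the inputs, which is one reason a convolutional network can exploit vertical context in logs rather than treating each sample in isolation.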
The S-wave log is critical for petrophysical analysis, because the shear-wave curve strongly affects the facies calculation from seismic amplitude. Despite this importance, shear-wave logs are often not acquired because of their cost, so estimating shear-wave velocity is a routine task for geophysicists. Dvorkin, in his paper “Yet another Vs equation,” summarized not only the relevance of shear-wave estimation but also the importance of a rock-physics model for calculating reservoir properties. In this work, the DCNN predicting shear-wave velocity, compressional velocity and bulk density used two inputs: the petrophysical output describing the rock matrix and the original elastic logs (figure 2a).
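For comparison, a classic single-equation Vs estimate is the mudrock line of Castagna et al. (1985), Vp = 1.16 Vs + 1.36 (km/s). The sketch below uses it to fill gaps in a Vs log. This is a swapped-in empirical baseline, not the DCNN approach of this study; it is intended for brine-saturated clastics, and the log samples are hypothetical. A learned prediction such as the DCNN can be judged against a baseline like this one.

```python
import numpy as np

def mudrock_vs(vp_kms):
    """Castagna et al. (1985) mudrock line Vp = 1.16 Vs + 1.36 (km/s), solved for Vs."""
    return (np.asarray(vp_kms, dtype=float) - 1.36) / 1.16

def fill_vs_gaps(vp_kms, vs_kms):
    """Keep measured Vs where present and fill NaN gaps with the mudrock estimate."""
    vp = np.asarray(vp_kms, dtype=float)
    vs = np.array(vs_kms, dtype=float)          # copy, so the input log is not modified
    gaps = np.isnan(vs)
    vs[gaps] = mudrock_vs(vp[gaps])
    return vs

vp = np.array([2.50, 3.00, 3.52])               # km/s, hypothetical samples
vs = np.array([1.10, np.nan, np.nan])           # km/s, Vs missing in two samples
vs_filled = fill_vs_gaps(vp, vs)
print(vs_filled)
```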
As a first step, synthetic seismic gathers were calculated for the Albian-Turonian interval (figure 3b) to investigate the relationship between seismic amplitude, lithotypes and fluid content. Note that the oil shows and the test from the boreholes are in a sandstone interval thinner than five meters and hence below seismic resolution. To mitigate this, a synthetic fluid model is calculated. We selected a thick brine-saturated sandstone interval for the synthetic gathers to capture the brine response, and built a fluid substitution framework using Gassmann's equations. Although often overlooked as a source of bias, the synthetic gather generation process and its chosen parameters introduce interpretational bias into this analysis. Likewise, the conversion to angle gathers is biased by the velocity model chosen to compute the reflection angles, even though the offsets are selected to mimic the acquisition parameters.
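A minimal sketch of Gassmann fluid substitution from a brine-saturated sand to a brine-oil mixture is shown below. It uses the standard forward Gassmann equation, its usual rearrangement for the dry-frame modulus, and a Reuss (Wood) average for the fluid mix. All moduli, densities and saturations are hypothetical assumptions, not values from this study. Units are km/s, g/cc and GPa, which are consistent because 1 g/cc times 1 (km/s)² equals 1 GPa.

```python
import numpy as np

def gassmann_ksat(k_dry, k_min, k_fl, phi):
    """Saturated bulk modulus from Gassmann's equation (GPa)."""
    num = (1.0 - k_dry / k_min) ** 2
    den = phi / k_fl + (1.0 - phi) / k_min - k_dry / k_min ** 2
    return k_dry + num / den

def gassmann_kdry(k_sat, k_min, k_fl, phi):
    """Invert Gassmann's equation for the dry-frame bulk modulus (GPa)."""
    a = phi * k_min / k_fl
    return (k_sat * (a + 1.0 - phi) - k_min) / (a + k_sat / k_min - 1.0 - phi)

def wood_kfl(sw, k_brine, k_hc):
    """Reuss (Wood) average of brine and hydrocarbon bulk moduli."""
    return 1.0 / (sw / k_brine + (1.0 - sw) / k_hc)

def substitute(vp, vs, rho, phi, k_min, k_fl1, rho_fl1, k_fl2, rho_fl2):
    """Replace pore fluid 1 by fluid 2; returns (vp2, vs2, rho2)."""
    mu = rho * vs ** 2                          # shear modulus is unaffected by the fluid
    k_sat1 = rho * (vp ** 2 - 4.0 / 3.0 * vs ** 2)
    k_dry = gassmann_kdry(k_sat1, k_min, k_fl1, phi)
    k_sat2 = gassmann_ksat(k_dry, k_min, k_fl2, phi)
    rho2 = rho + phi * (rho_fl2 - rho_fl1)      # only the pore fluid density changes
    vp2 = np.sqrt((k_sat2 + 4.0 / 3.0 * mu) / rho2)
    vs2 = np.sqrt(mu / rho2)
    return vp2, vs2, rho2

# Hypothetical brine sand substituted to 20 % brine / 80 % oil
k_mix = wood_kfl(0.2, k_brine=2.8, k_hc=1.0)
rho_mix = 0.2 * 1.05 + 0.8 * 0.80
vp2, vs2, rho2 = substitute(vp=3.0, vs=1.6, rho=2.25, phi=0.25, k_min=36.0,
                            k_fl1=2.8, rho_fl1=1.05, k_fl2=k_mix, rho_fl2=rho_mix)
print(vp2, vs2, rho2)
```

Vp drops and Vs rises slightly (lower density, unchanged shear modulus), which is the expected Gassmann behavior for a softer pore fluid.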
An essential aspect of the fluid substitution framework is the assumed pore system and its related porosity. To consider only the fluid change in the pore space, we use effective porosity and set it to zero wherever clay content exceeds 60 percent. This, too, is an assumption, based on the interpreter's knowledge of the system and of analogue systems. Almost all assumptions made by geoscientists are rooted in bias, whether that bias is experiential or stems from the availability of data and knowledge. The results of the fluid substitution are presented as amplitude-versus-angle curves and as an intercept-gradient (IxG) crossplot in figure 3c. This “what-if” analysis simulates the possible cases from the borehole data and shows the cluster distribution for each fluid and for the background. Notably, figure 3d shows that the separation between the brine cluster and the hydrocarbon clusters, called the “fluid vector,” is almost null, revealing the inherent difficulty of discerning fluid from lithology using amplitude information. While not the result a geoscientist hopes for, this analysis tells us that the answer in this case is that there is no answer.
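The sketch below shows, under stated assumptions, how the 60 percent clay cutoff and the intercept-gradient values can be computed. It uses the two-term Shuey form of the Aki-Richards approximation, R(θ) ≈ A + B sin²θ, for a single interface; the shale, brine-sand and oil-sand properties are hypothetical. With real data the fluid vector would be measured between cluster centroids rather than between single points.

```python
import numpy as np

def apply_clay_cutoff(phi_eff, vclay, cutoff=0.60):
    """Set porosity to zero wherever clay volume exceeds the cutoff."""
    return np.where(np.asarray(vclay) > cutoff, 0.0, np.asarray(phi_eff, dtype=float))

def intercept_gradient(upper, lower):
    """Intercept A and gradient B from the Aki-Richards approximation.

    upper, lower: (vp, vs, rho) tuples for the layers above and below the interface."""
    (vp1, vs1, rho1), (vp2, vs2, rho2) = upper, lower
    vp, vs, rho = (vp1 + vp2) / 2, (vs1 + vs2) / 2, (rho1 + rho2) / 2
    dvp, dvs, drho = vp2 - vp1, vs2 - vs1, rho2 - rho1
    a = 0.5 * (dvp / vp + drho / rho)
    b = 0.5 * dvp / vp - 2.0 * (vs / vp) ** 2 * (drho / rho + 2.0 * dvs / vs)
    return a, b

def reflectivity_vs_angle(a, b, theta_deg):
    """Two-term R(theta) = A + B sin^2(theta), commonly used up to about 30 degrees."""
    s = np.sin(np.radians(theta_deg))
    return a + b * s ** 2

shale = (2.90, 1.40, 2.40)                       # km/s, km/s, g/cc (hypothetical)
brine_sand = (3.00, 1.60, 2.25)
oil_sand = (2.75, 1.62, 2.19)
a_br, b_br = intercept_gradient(shale, brine_sand)
a_oil, b_oil = intercept_gradient(shale, oil_sand)
fluid_vector = np.hypot(a_oil - a_br, b_oil - b_br)   # distance between the two IxG points
print(a_br, b_br, a_oil, b_oil, fluid_vector)
```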
Next, we used well-established amplitude-based inversion methods to generate a P-impedance cube, which the rock-physics model then transformed into a porosity cube. The P-impedance is generated with both log datasets, unmodeled and modeled, to highlight the difference in results between them. This comparison between the recorded and modeled logs clearly demonstrates how interpreter assumptions can bias the geologic input and propagate further through our workflows and analyses.
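The article does not state which rock-physics equation maps P-impedance to porosity, so the sketch below stands in a simple linear trend fitted to co-located well samples and applies it voxel by voxel. The linear form, coefficients and data are hypothetical; a physically based model (for example, a soft- or stiff-sand model) would normally replace it, and that choice is exactly the kind of educated guess discussed in this article.

```python
import numpy as np

def fit_impedance_porosity(ip_well, phi_well):
    """Least-squares line phi = c0 + c1 * Ip from co-located well samples."""
    c1, c0 = np.polyfit(ip_well, phi_well, deg=1)   # polyfit returns highest degree first
    return c0, c1

def impedance_to_porosity(ip_cube, c0, c1):
    """Apply the fitted trend to every voxel and clip to physical porosity."""
    return np.clip(c0 + c1 * np.asarray(ip_cube, dtype=float), 0.0, 1.0)

rng = np.random.default_rng(1)
ip_well = rng.uniform(5000.0, 9000.0, 300)          # (m/s)*(g/cc), hypothetical
phi_well = 0.55 - 5e-5 * ip_well + rng.normal(0.0, 0.01, 300)
c0, c1 = fit_impedance_porosity(ip_well, phi_well)
ip_cube = rng.uniform(5000.0, 9000.0, (10, 20, 30)) # stand-in for an inverted P-impedance cube
phi_cube = impedance_to_porosity(ip_cube, c0, c1)
print(c0, c1, phi_cube.mean())
```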
Seismic inversion involves several distinct but crucial steps: the well tie, wavelet extraction and low-frequency model (LFM) building (see the May, June and July 2015 installments of Geophysical Corner). These steps in particular are profoundly affected by the borehole logs.
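Both the well tie and wavelet extraction start from a synthetic seismogram. A minimal sketch, assuming an impedance log already converted to two-way time and a zero-phase Ricker wavelet (in practice the wavelet is extracted from the data, and that extraction is where the two log datasets diverge), is:

```python
import numpy as np

def ricker(f_peak, dt, n_half=32):
    """Zero-phase Ricker wavelet with 2*n_half + 1 samples, peak amplitude 1 at the center."""
    t = np.arange(-n_half, n_half + 1) * dt
    arg = (np.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def reflectivity(ip):
    """Normal-incidence reflection coefficients between successive impedance samples."""
    ip = np.asarray(ip, dtype=float)
    return (ip[1:] - ip[:-1]) / (ip[1:] + ip[:-1])

def synthetic_trace(ip, wavelet):
    """Convolve reflectivity with the wavelet. mode='same' returns max(len) samples,
    so the reflectivity series should be longer than the wavelet."""
    return np.convolve(reflectivity(ip), wavelet, mode="same")

dt = 0.002                                                 # 2 ms sampling, hypothetical
ip = np.r_[np.full(100, 6000.0), np.full(100, 8000.0)]    # blocky log in two-way time
trace = synthetic_trace(ip, ricker(25.0, dt))
print(np.argmax(trace), trace.max())
```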
In the first step of the well tie, the two log datasets yield slightly different time positions against the same seismic data (less than 8 milliseconds apart), even though the correlation with the seismic is similar for both (about 0.45). However, the wavelet extracted from the modeled-log well tie shows a more stable phase and amplitude spectrum. These two scenarios, which differ only in the well-tie choice, show the advantage of being aware of the nuances of our thinking and of how we might introduce bias into this process. Another implicit bias is the methodology selected for the well-tie process. For every well tie, we should challenge ourselves with questions such as: “Which well tie better represents the position of the borehole? Where does the wavelet difference come from: lithology or seismic processing?”
Deeper questioning of the assumptions and choices made can affect the results positively. In this case the impact is positive, since it improves the wavelet extraction and yields a well tie with less bias.
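The time shift and correlation quoted for a well tie come from comparing the synthetic with the seismic trace over a range of lags. A minimal sketch, assuming equal-length traces sampled at 2 ms (hypothetical) and Pearson correlation as the tie metric, is below. The test data are idealized (a perfect shifted copy); real ties, like the ones here at about 0.45, are far noisier.

```python
import numpy as np

def best_lag(synthetic, seismic, max_lag):
    """Return (lag_samples, correlation) that maximizes the Pearson correlation.

    A positive lag means the seismic event arrives lag samples after the synthetic,
    i.e. seismic[t + lag] is compared with synthetic[t]."""
    n = len(seismic)
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = synthetic[:n - lag], seismic[lag:]
        else:
            a, b = synthetic[-lag:], seismic[:n + lag]
        r = np.corrcoef(a, b)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

dt_ms = 2.0                                   # sample interval in ms, hypothetical
rng = np.random.default_rng(0)
synthetic = rng.standard_normal(300)
seismic = np.roll(synthetic, 4)               # seismic lags the synthetic by 4 samples
lag, r = best_lag(synthetic, seismic, max_lag=10)
print(lag * dt_ms, "ms shift, r =", round(r, 2))
```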
The low-frequency model is built using both datasets (modeled and unmodeled). Similarly, two cases are created for the extracted wavelet: one wavelet from the unmodeled-log well tie and one from the modeled-log well tie. The two P-impedance cubes are calculated using the corresponding wavelet and low-frequency model. The primary difference between these results is the wavelet, as can be observed in figure 4a, where the inverted P-impedance (full band) is plotted against the P-impedance from the borehole to quality-control the seismic inversion. A convenient benefit of transforming seismic amplitude into P-impedance is that we can link it to porosity using established rock-physics equations. Yet the rock-physics model and the equation we select to transform P-impedance into porosity remain an educated guess. This educated guess carries its own implicit bias: the interpreter's choice of which equations fit best.
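The article does not describe how its low-frequency model was built. As a hedged sketch of the single-well part of that step, the code below low-pass filters a time-domain impedance log with a zero-phase, Butterworth-shaped frequency response. The 8 Hz cutoff and filter order are hypothetical, and a production low-frequency model would also interpolate several wells along interpreted horizons.

```python
import numpy as np

def low_frequency_model(ip, dt, f_cut=8.0, order=4):
    """Zero-phase low-pass of an impedance log sampled in two-way time (dt in s).

    A linear trend is removed before filtering and restored afterwards to limit
    wrap-around artifacts caused by the FFT's implicit periodicity."""
    ip = np.asarray(ip, dtype=float)
    t = np.arange(ip.size) * dt
    slope, intercept = np.polyfit(t, ip, 1)
    trend = intercept + slope * t
    f = np.fft.rfftfreq(ip.size, d=dt)
    h = 1.0 / np.sqrt(1.0 + (f / f_cut) ** (2 * order))  # real-valued, so no phase shift
    return trend + np.fft.irfft(np.fft.rfft(ip - trend) * h, n=ip.size)

dt = 0.002
t = np.arange(500) * dt
trend = 6000.0 + 2000.0 * t                              # hypothetical compaction trend
ip_log = trend + 500.0 * np.sin(2.0 * np.pi * 60.0 * t)  # plus high-frequency detail
lfm = low_frequency_model(ip_log, dt)
print(np.max(np.abs(lfm - trend)))
```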
As the last step in this workflow, the porosity cube is visualized. Here we use a voxel classification technique that highlights specific porosity intervals linked to lithotypes. This voxel visualization has several benefits, including revealing the porosity rock types that can be considered reservoir rock and estimating the volume of porous rock, which feeds into hydrocarbon volumetric estimates.
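A minimal sketch of voxel classification and the resulting volumes is below. The porosity class boundaries, the 12.5 m by 12.5 m by 4 m voxel size (which assumes a depth-converted cube) and the random stand-in cube are all hypothetical; `np.digitize` assigns class 0 below the first boundary and class i to values between boundaries i-1 and i.

```python
import numpy as np

def classify_porosity(phi_cube, edges):
    """Class index per voxel: 0 below edges[0], i where edges[i-1] <= phi < edges[i]."""
    return np.digitize(phi_cube, edges)

def class_volumes(labels, n_classes, voxel_volume_m3):
    """Bulk rock volume (m^3) in each porosity class."""
    return np.bincount(labels.ravel(), minlength=n_classes) * voxel_volume_m3

def pore_volume(phi_cube, labels, min_class, voxel_volume_m3):
    """Pore volume (m^3) of voxels at or above a chosen reservoir porosity class."""
    return float(np.sum(phi_cube[labels >= min_class]) * voxel_volume_m3)

edges = [0.05, 0.12, 0.20]                       # hypothetical class boundaries
voxel = 12.5 * 12.5 * 4.0                        # m^3 per voxel, hypothetical
rng = np.random.default_rng(2)
phi_cube = rng.uniform(0.0, 0.30, (50, 60, 40))  # stand-in for the porosity cube
labels = classify_porosity(phi_cube, edges)
print(class_volumes(labels, len(edges) + 1, voxel))
print(pore_volume(phi_cube, labels, min_class=2, voxel_volume_m3=voxel))
```

Running the same classification on two porosity cubes (for example, from raw and from modeled logs) and comparing the class volumes is one simple way to quantify how far their volumetric scenarios differ.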
The DCNN-modeled logs offer two advantages: they estimate shear-wave velocity where the shear-wave log is absent, and where the recorded log is heavily affected by acquisition problems. Moreover, the DCNN generated the modeled compressional-velocity and bulk-density logs in parallel, so the modeling covers all the logs that make up the elastic model (P and S velocities and bulk density). An unintended benefit is that the crossplot shows a better correlation with the reported lithology, as well as where the modeled logs deviate from the original logs.
The modeled logs were used for the rock-physics analysis and the synthetic gather generation. These two partial results were analyzed and integrated into the fluid substitution framework to determine whether the amplitude variability responds to the fluid content. It does not respond clearly; hence, direct hydrocarbon indicator tools are not reliable for this interval in the area. We stress these results because seismic amplitude interpretation often misleads us. One interpretation of this result is that it tells us what we do not know and what we cannot infer with confidence. The AVO simulation and the intercept-gradient (IxG) crossplot confirm that there is little to no separation between the brine and hydrocarbon clusters (figure 4a). Furthermore, figure 4b shows the need for two elastic parameters to resolve the reservoir.
The P-impedance calculation improves because of the greater wavelet stability obtained with the modeled logs and the corresponding low-frequency model. A benefit of transforming seismic amplitude into P-impedance is that we can correlate it with physical properties. In exploration geophysics we are not looking for the best acoustic impedance per se, but rather using amplitude as a proxy for insight into the porosity distribution in the intervals of interest. In this region, exploration of the Albian-Turonian interval should focus on traps with a stratigraphic component.
Finally, the volumetric results reveal the differences between the two porosity cubes, whose voxel distributions contrast. The inversion based on the raw logs indicates a larger volume (figure 5c), whereas the voxel volume from the modeled logs (figure 5d) is smaller, suggesting a more constrained scenario that may be useful to consider in the decision-making process.
In general, self-organizing maps, neural networks and other machine learning techniques have their summers and winters, just like many other techniques that were popular until they reached a technological limit and became obsolete. We should keep in mind that these techniques produce different interpretation scenarios, such as different volumetric estimates or evaluations of borehole information. There is no silver bullet: no single algorithm can successfully solve every possible problem. We should therefore be cautious when using these algorithms; as with human interpreters, machine learning techniques are not bias-free, they simply come with different parameters and different descriptive statistics to measure their results. Across all of these parameters and estimations, we must always be aware of where our interpretational biases and assumptions influence the final model.
Acknowledgements: We thank ANP for providing the dataset used in this study.