TPE.MS.ID.000541

Equivalent Circulation Density Prediction Using Random Forest Model

Abstract

Abnormal equivalent circulation density (ECD) often causes a series of drilling problems, such as wellbore instability, blowout, and well collapse, especially in narrow pressure windows. Accurate ECD is crucial for improving the safety and efficiency of drilling operations. Existing ECD calculation methods do not fully consider parameters such as mud pit gain and total hydrocarbon content. When gas enters the annulus from the formation, the density of the drilling fluid in the annulus decreases, which lowers the bottom-hole pressure. In this paper, we present a machine learning method to predict ECD. Specifically, a dataset of 5421 drilling data points from surface sensors was collected to predict ECD with a Random Forest (RF) model. Eleven parameters, such as well depth, rate of penetration (ROP), mud density, pump rate, and mud pit gain, are taken as inputs, and the measured ECD is taken as the output. To evaluate the model, its predictions were compared with the real ECD measured by pressure-while-drilling (PWD) tools. The results showed that the RF model predicted the ECD with an R² of 0.9939 and a root-mean-square error (RMSE) of 0.001 on the training dataset, while R² and RMSE were 0.9859 and 0.0017 on the testing dataset. The model can help keep the bottom-hole pressure within a safe range, avoid dangerous accidents, and minimize non-productive time (NPT).

Keywords: Data-Driven, Random Forest, Equivalent circulation density, Drilling surface parameters

Introduction

Equivalent circulating density (ECD) is a crucial parameter in drilling and well control. It is the effective density of the circulating mud in the wellbore and comprises two parts: the hydrostatic pressure of the mud column and the pressure loss in the annulus. Inaccurate ECD may result in bottom-hole pressure imbalance and complex well control problems, such as stuck pipe, collapse, lost circulation, gas kick, and blowout.¹⁻² In particular, if the ECD is higher than the formation fracture pressure, mud will enter the formation and lost circulation will occur; if the ECD is lower than the formation pore pressure, a gas kick will occur. Therefore, understanding and accurately predicting the bottom-hole ECD is key to managing the formation pressure and achieving optimal drilling.

Several factors affect the ECD during drilling operations, including mud pump rate, mud density, drill pipe rotation, standpipe pressure, cuttings concentration, and downhole temperature and pressure.³⁻⁵

Two methods are commonly used in the literature to obtain ECD. The first is to use PWD and measurement-while-drilling (MWD) tools for real-time monitoring of bottom-hole pressure during drilling.⁶ The equipment has a high-precision quartz gauge, which can accurately measure ECD, annular pressure, and internal pressure. It is the most accurate method and is particularly useful in complex well conditions and high-risk areas. However, the cost is high, and it cannot predict ECD for the undrilled section.

The second method calculates ECD from a mathematical model. With the units listed below, it is calculated as:

$$\mathrm{ECD} = \frac{10^{3}\,\left(P_m + \Delta P_a + P_{wbh}\right)}{g\,D_{TVD}}$$

Where:

ECD is the equivalent circulating density, g/cm³

Pm is the hydrostatic pressure of the fluid column, MPa

ΔPa is the annular friction pressure loss, MPa

Pwbh is the wellhead back pressure, MPa

DTVD is the true vertical depth, m

g is the acceleration due to gravity, m/s²

It can be seen that only the annular friction pressure loss is unknown; the other parameters are known. Therefore, accurate calculation of the annular pressure loss becomes very important. The annular pressure loss depends on pump flow rate, drilling fluid properties, annular size, wellbore trajectory, gas intrusion, and other parameters. To calculate it, certain assumptions must be made, for example that the borehole is regular and the annulus is concentric with a circular cross-section. Based on this, hydraulic software was developed to calculate the bottom-hole ECD and make advance predictions for undrilled well sections. Several rheological models, such as Bingham plastic, power law, and Herschel-Bulkley, have been used to predict ECD and standpipe pressure.⁷⁻⁸ However, each rheological model needs different input parameters, and discrepancies remain between the predicted ECD and the ECD measured by PWD.
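As a rough illustration of the pressure-based calculation, the sketch below assumes that ECD is the sum of the hydrostatic pressure, annular friction loss, and wellhead back pressure divided by g·DTVD, with a factor converting MPa and m into g/cm³. The function names, the value of g, and the sample numbers are hypothetical and are not taken from the paper.

```python
G = 9.81  # m/s^2, assumed value for the acceleration due to gravity


def hydrostatic_mpa(mud_density_g_cm3, tvd_m, g=G):
    """Hydrostatic pressure (MPa) of a mud column of the given density and vertical depth."""
    return mud_density_g_cm3 * 1000.0 * g * tvd_m / 1e6


def ecd_g_per_cm3(p_hydro_mpa, dp_annulus_mpa, p_backpressure_mpa, tvd_m, g=G):
    """ECD (g/cm^3) from pressure components in MPa and true vertical depth in m."""
    if tvd_m <= 0:
        raise ValueError("tvd_m must be positive")
    total_pa = (p_hydro_mpa + dp_annulus_mpa + p_backpressure_mpa) * 1e6
    density_kg_m3 = total_pa / (g * tvd_m)  # Pa / (m/s^2 * m) = kg/m^3
    return density_kg_m3 / 1000.0           # kg/m^3 -> g/cm^3


# Hypothetical example: 1.68 g/cm^3 mud at 3350 m TVD with 4 MPa annular friction loss.
p_m = hydrostatic_mpa(1.68, 3350.0)
ecd_static = ecd_g_per_cm3(p_m, 0.0, 0.0, 3350.0)
ecd_circulating = ecd_g_per_cm3(p_m, 4.0, 0.0, 3350.0)
print(round(ecd_static, 4), round(ecd_circulating, 4))
```

With zero friction loss and zero back pressure the function returns the static mud density, which is a quick sanity check on the unit conversion.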

An alternative approach for predicting the bottom-hole ECD with high accuracy is to use artificial intelligence (AI) and machine learning methods. These methods can process and analyze large amounts of complex drilling data, identify patterns that are difficult to discover with traditional methods, and use them for prediction and decision-making, thereby improving prediction accuracy.⁹

Ma studied the effects of temperature, drill pipe eccentricity, drill pipe rotation, and cuttings beds on borehole pressure and ECD in horizontal well drilling, and found that drilling fluid density, viscosity, and pump rate were the main influencing parameters.¹⁰ Gamal predicted ECD using artificial neural networks (ANNs) and adaptive network-based fuzzy inference systems. Datasets of 3570 and 1130 drilling data points were used to build, test, and validate the models. The results showed that both models had strong prediction capability for ECD.¹¹ Abdelaal¹² also developed an approach to predict ECD based on data acquired while drilling. However, although initial mud weight, standpipe pressure, and pumping rate were considered, mud pit gain and total hydrocarbon content were ignored in that work. Al-Rubaii¹³ developed a method named ECDeffc.m to predict ECD using three artificial intelligence techniques: ANN, support vector machine (SVM), and decision tree (DT). The ECD dataset contained 4371 data points, and mud pump flow rate (GPM), mud weight (MW), plastic viscosity (PV), low shear yield point (LSYP), yield point (YP), standpipe pressure (SPP), and rate of penetration (ROP) were selected as inputs. The model achieved a correlation coefficient of 0.9947 and an average absolute percentage error of 0.23%. Table 1 lists several studies that used artificial intelligence to predict ECD.

It is clear from the table that most machine learning methods achieve high accuracy; however, the models differ in their input parameters and in the data used for ECD prediction. In addition, annular pressure and temperature are used as inputs in the literature, but obtaining these parameters requires downhole sensors, which increases operational costs. We also found little research on ECD prediction when gas enters the annulus, which is indicated by an increase in total hydrocarbon content and changes in mud pit gain. Ignoring these parameters would increase the ECD prediction error and could cause well control problems.

The novelty of this study is that the machine learning method depends only on real drilling parameters: well depth, vertical depth, rate of penetration, weight on bit, top_drive_torque, pump pressure, mud pumping rate, mud density, mud pit gain, and total hydrocarbons. To this end, real data points collected from surface sensors and the downhole PWD tool were analyzed, filtered, and processed to build models for predicting the ECD. The results of the models were compared against each other, as well as against the real ECD measured by PWD, to verify their accuracy.

Data and Methods

Data Description

The data for the current study were collected from an 8-1/2 in. horizontal section of a well with a depth of 3700.00 m. A total of 5206 data points of surface drilling parameters and ECD measured by PWD were obtained. The following drilling parameters monitored by surface sensors were used as the input variables for the modelling: depth (D), vertical_depth (VD), bit position (BP), rate of penetration (ROP), weight on bit (WOB), top_drive_torque (TDT), pump pressure (PP), mud pumping rate (MPR), mud density (MD), mud overflow (MO), and total hydrocarbons (TH). The bottom-hole ECD measured by the PWD is the target value for fitting the machine learning model. Table 2 shows the statistical parameters of the whole dataset. D ranges from 3568.00 to 3684.77 m; VD from 3304.26 to 3397.96 m; BP from 3504.65 to 3684.77 m; ROP from 29.68 to 75.42 m/h; WOB from 10.11 to 108.34 kN; TDT from 9.17 to 17.21 kN·m; PP from 15.44 to 24.30 MPa; MPR from 26.77 to 34.91 L/min; MD from 1.65 to 1.70 g/cm³; MO from -0.01 to 2.29 m³; TH from 0.07% to 58.34%; and ECD from 1.79 to 1.84 g/cm³.

Figure 1 shows the correlation heat map of the data. High values are shown in red and low values in yellow. Flow rate, rate of penetration, and mud overflow are positively correlated with the bottom-hole ECD, with correlation coefficients of 0.51, 0.22, and 0.08, respectively.
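The coefficients in a heat map of this kind are commonly Pearson correlation coefficients; the paper does not state which coefficient was used, so that is an assumption here. The sketch below computes the correlation of each input with the target for a toy dataset. The feature names and numbers are hypothetical and are not values from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def correlation_with_target(features, target):
    """Map each feature name to its correlation with the target column."""
    return {name: pearson(values, target) for name, values in features.items()}


# Hypothetical toy data (not from the paper).
toy_features = {
    "MPR": [27.0, 29.5, 31.0, 33.0, 34.5],
    "ROP": [30.0, 62.0, 41.0, 75.0, 50.0],
}
toy_ecd = [1.79, 1.80, 1.81, 1.83, 1.84]
for name, r in correlation_with_target(toy_features, toy_ecd).items():
    print(f"{name}: {r:+.2f}")
```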

Data Splitting

A machine learning model must make precise predictions on measured data it has not seen before. The training data are used to build the model, that is, to fit the relationship between input and output data and to select and tune the model’s hyperparameters (such as the number of trees, tree depth, or regularization strength). The testing data are used to check whether the model is overfitting and to evaluate the generalization ability of the trained model with evaluation indicators such as the mean square error and the coefficient of determination.

In this analysis, eleven surface drilling parameters were used as input variables (D, VD, BP, ROP, WOB, TDT, PP, MPR, MD, MO, and TH), while ECD was the dependent (output) variable. The dataset was randomly split into training and testing sets with a ratio of 75:25. Specifically, the training set has 3905 data points and the testing set contains 1301 data points. Tables 3 and 4 present the statistical parameters of the training and testing datasets, respectively.
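A random 75:25 split of this kind can be sketched as follows. The helper name, the seed, and the choice to round the training count up are assumptions made here so that 5206 points give the reported 3905 and 1301; the paper does not describe how the split was implemented.

```python
import math
import random


def train_test_split_indices(n_samples, train_fraction=0.75, seed=42):
    """Shuffle row indices and split them into training and testing index lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumed seed)
    n_train = math.ceil(train_fraction * n_samples)  # assumption: round the training size up
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(5206)
print(len(train_idx), len(test_idx))  # 3905 1301
```

Splitting indices rather than rows keeps the inputs and the ECD target aligned when both are selected with the same index lists.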

To ensure that the dataset (inputs and target variable) reflects the real situation more accurately and to improve the performance and reliability of the model, the data were cleaned by filtering to handle missing values, outliers, and duplicate values. Median filtering, average filtering, and Kalman filtering are common filtering methods. The median filter replaces the current data point with the median value within the window; it is good at removing sharp impulse noise, but its effect on stationary noise is only moderate and its computational cost is relatively high. The average filter is a simple and common signal smoothing method, suitable for stable datasets; for abrupt signal changes, however, it tends to blur the signal edges. The Kalman filter is a recursive algorithm based on linear minimum mean square error estimation, which is suitable for dynamic systems and time series data and can be adjusted dynamically.

The data in this paper come from the well depth interval of 3568.00 to 3697.44 m, and the signal is relatively stable, so the average filtering method is used for signal processing. It is given as follows:

$$y_i = \frac{1}{N}\sum_{j=i-k}^{i+k} x_j, \qquad N = 2k + 1$$

Where xi is the original value of the parameter, yi is the filtered value, N is the window size, and k is half of the window.
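A centered moving-average filter of this form can be sketched as below. It assumes an odd window size N = 2k + 1 and shrinks the window near the ends of the series; the paper does not say how edges were handled or which window size was used, so both are assumptions.

```python
def moving_average(values, window=5):
    """Centered moving average; the window is truncated at the ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    k = window // 2  # half-window
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - k)
        stop = min(len(values), i + k + 1)
        segment = values[start:stop]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical noisy reading with one spike in the middle.
print(moving_average([1.0, 1.0, 4.0, 1.0, 1.0], window=3))  # [1.0, 2.0, 2.0, 2.0, 1.0]
```

The example shows the edge-blurring behavior noted above: the single spike is spread over its neighbors rather than removed.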

Data Standardization

Data standardization is an important step in machine learning; it rescales data to conform to a common standard or distribution. The purpose is to ensure that every feature is on the same scale so that features with large values do not dominate the model. The input variables were standardized using the following formula:

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Where X is the original input value to be normalized, Xmin is the minimum input value, and Xmax is the maximum input value.
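Min-max scaling with these definitions can be sketched as follows. Taking the minimum and maximum from the training column and reusing them for the test column is a common practice assumed here, not something the paper specifies; the function names and sample values are hypothetical.

```python
def fit_min_max(column):
    """Return (Xmin, Xmax) of a column of values."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("column is constant; min-max scaling is undefined")
    return lo, hi


def min_max_scale(column, lo, hi):
    """Rescale values so that lo maps to 0 and hi maps to 1."""
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical pump pressure values in MPa.
train_pp = [10.0, 15.0, 20.0]
test_pp = [12.5, 17.5]
lo, hi = fit_min_max(train_pp)
print(min_max_scale(train_pp, lo, hi))  # [0.0, 0.5, 1.0]
print(min_max_scale(test_pp, lo, hi))   # [0.25, 0.75]
```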

Random Forest Model

Random forest is a general-purpose machine learning method, proposed by Leo Breiman and Adele Cutler in 2001, that trains multiple decision trees and obtains the final prediction by voting (classification) or averaging (regression) over the outputs of the individual trees.²² It has high accuracy and robustness and can effectively reduce overfitting. The Python library scikit-learn was used to build the random forest model. The steps of the random forest algorithm are as follows:

A. Data preparation

Prepare a dataset for training and testing in advance. The dataset usually includes a feature matrix X and a target variable Y.

B. Create multiple sub-samples

During training, multiple sub-samples are drawn from the training dataset with replacement. Each sub-sample is used to train one decision tree.

C. Construct a decision tree

A decision tree is constructed for each sub-sample. The process includes random selection of features, selection of the best split point, and construction of the tree.

D. Prediction of aggregated decision trees

The input test data are passed to each decision tree to obtain the prediction of each tree. For regression, the average of all tree predictions is taken as the final prediction. Figure 2 shows the flow chart of the Random Forest model.
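Steps A to D can be illustrated with scikit-learn's RandomForestRegressor. The sketch below uses synthetic stand-in data with eleven input columns, since the paper's dataset is not available here; the data-generating formula, tree count, and seeds are hypothetical. It also checks that the forest prediction is the average of the individual tree predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# A. Data preparation: synthetic stand-in for the 11 surface parameters and ECD.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 11))
y = 1.79 + 0.03 * X[:, 7] + 0.02 * X[:, 9] + 0.002 * rng.normal(size=300)
X_train, X_test = X[:225], X[225:]
y_train, y_test = y[:225], y[225:]

# B and C. Each tree is fitted on a bootstrap sample (bootstrap=True) and
# considers a random subset of features at every split (max_features="sqrt").
forest = RandomForestRegressor(
    n_estimators=50, bootstrap=True, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)

# D. The forest prediction for regression is the mean of the tree predictions.
forest_pred = forest.predict(X_test)
tree_preds = np.stack([tree.predict(X_test) for tree in forest.estimators_])
mean_of_trees = tree_preds.mean(axis=0)
print(np.allclose(forest_pred, mean_of_trees))
```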

E. Model Evaluation

Appropriate evaluation indicators are needed to assess the performance of the random forest model. The root mean square error (RMSE) and the coefficient of determination (R²) were used to evaluate the accuracy of the model in this work.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{actual,i} - y_{predicted,i}\right)^{2}}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_{actual,i} - y_{predicted,i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{actual,i} - y_{meanactual}\right)^{2}}$$

Where N is the number of data points tested, yactual is the actual ECD measured by PWD, ypredicted is the corresponding predicted ECD, and ymeanactual is the average of the measured ECD.
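The two indicators can be computed directly from their standard definitions, as in the sketch below. It assumes the error metric is the root mean square error, and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the measured values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted ECD values in g/cm^3.
measured = [1.79, 1.80, 1.82, 1.84]
predicted = [1.79, 1.81, 1.82, 1.83]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

A model that always predicts the mean of the measured values gets R² = 0, which is a useful baseline when reading scores close to 1.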

The following hyperparameters were tuned to improve the performance of the random forest model:

1. The number of trees in the forest, “n_estimators”. More trees generally improve model performance, but the computational cost also increases.
2. The maximum depth of each tree, “max_depth”, which limits the depth of the tree and helps prevent overfitting.
3. The minimum number of samples required to split an internal node, “min_samples_split”. Larger values prevent the creation of very small subtrees and thus reduce the risk of overfitting.
4. The minimum number of samples required at a leaf node, “min_samples_leaf”.
5. The number of features to consider when looking for the best split, “max_features”.
6. Whether to use bootstrap sampling, “bootstrap”.

Values of n_estimators from 100 to 400, four values of max_depth (None, 10, 20, 30), three values of min_samples_split (2, 5, 10), three values of min_samples_leaf (1, 2, 4), three values of max_features (auto, sqrt, log2), and two values of bootstrap (True, False) were searched using GridSearchCV. Python code was used to build the RF model.
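A grid search over these hyperparameters can be sketched with scikit-learn's GridSearchCV. The example below is deliberately reduced so that it runs in seconds: it uses synthetic stand-in data, a much smaller grid than the one described above, and 3-fold cross-validation, all of which are assumptions rather than the authors' settings. Note that recent scikit-learn releases no longer accept max_features="auto"; for regressors it meant using all features, which None expresses here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with 11 input columns (not the paper's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 11))
y = 1.79 + 0.03 * X[:, 7] + 0.02 * X[:, 9] + 0.002 * rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Reduced, hypothetical grid; the study searched wider ranges.
param_grid = {
    "n_estimators": [20, 40],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 4],
    "max_features": ["sqrt", None],
    "bootstrap": [True, False],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,          # assumed number of folds
    scoring="r2",  # assumed selection metric
)
search.fit(X_train, y_train)

print(search.best_params_)
test_r2 = search.best_estimator_.score(X_test, y_test)  # R^2 on held-out data
print(round(test_r2, 4))
```

The best combination is selected by cross-validation on the training set only, so the held-out score remains an independent check on generalization.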

Results and Discussion

This section discusses the bottom-hole ECD results predicted by the RF model.
Finding the best hyperparameter combination is necessary to improve the prediction accuracy of the random forest model. The hyperparameters n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features, and bootstrap were tuned using the GridSearchCV method.

Figure 3 shows the cross-plot of the actual and predicted ECD for the training and testing datasets for the Random Forest model. The RF model predicted the ECD with an R² of 0.9939 and a root-mean-square error (RMSE) of 0.001 on the training dataset, while the R² and RMSE were 0.9859 and 0.0017, respectively, on the testing dataset. The optimum hyperparameters of the RF model are shown in Table 5: n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features, and bootstrap are 400, 20, 2, 1, sqrt, and False, respectively.

Conclusion

A random forest model was developed using Python to predict the ECD based on 5206 data points obtained from surface sensors and PWD during drilling operations. The ECD predicted by the random forest model was compared with the ECD measured by PWD. Based on the results, the following can be concluded:

A total of 5206 data points, including 11 input variables such as depth, vertical_depth, bit position, and rate of penetration, were collected to predict the bottom-hole ECD.
The optimal hyperparameters obtained by grid search were n_estimators of 400, max_depth of 20, min_samples_split of 2, min_samples_leaf of 1, max_features of sqrt, and bootstrap of False.
The random forest model predicted the ECD with an R² of 0.9939 and an RMSE of 0.001 on the training dataset, while R² and RMSE were 0.9859 and 0.0017, respectively, on the testing dataset.
The model can currently only calculate the bottom-hole ECD under fixed parameters. For practical applications, it needs to be extended to real-time, accurate estimation of the ECD in future work.

Acknowledgments

The authors would like to acknowledge the SINOPEC Research Institute of Petroleum Engineering, Co.,Ltd. This research is partially supported by the National Key R&D Program of China (grant number 2023YFC3009204) and Sinopec’s scientific research and technological breakthrough projects (grant number P24223).

Funding

This Research Article received no external funding.

Conflicts of Interest

Regarding the publication of this article, the authors declare that they have no conflict of interest.

Article Type

Research Article

Publication history

Received date: 05 November, 2024
Published date: 13 November, 2024

Address for correspondence

Xiaodong Gao, SINOPEC Research Institute of Petroleum Engineering, Co.,Ltd., Beijing, 102206, China

How to cite this article

Xiaodong Gao, Hongkang Fan. Equivalent Circulation Density Prediction Using Random Forest Model: Research Article. Trends Petro Eng. 2024;4(2):1–8. DOI: 10.53902/TPE.2024.04.000541

Author Info

Xiaodong Gao¹,²*, Hongkang Fan¹,²

¹Sinopec Key Laboratory of Ultra-Deep Well Drilling Engineering Technology, China

²SINOPEC Research Institute of Petroleum Engineering, Co.,Ltd., China

Trends in Petroleum Engineering
