Carfagna, E., Macedoni, P. (2026). Data Driven Estimation of Treatment Plants Fuel Gas Consumption in the Oil and Gas Upstream Sector. APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, 42(3 (May/June)), 1-15 [10.1002/asmb.70099].
Data Driven Estimation of Treatment Plants Fuel Gas Consumption in the Oil and Gas Upstream Sector
Carfagna, Elisabetta; Macedoni, Pietro
2026
Abstract
As the oil and gas industry faces increasing scrutiny over its climate impact, effective strategies to monitor and reduce greenhouse gas (GHG) emissions have become essential. During the extraction phase of hydrocarbons, burning gas to generate energy is the primary contributor to emissions. In this paper, we propose a data-driven methodology for estimating fuel gas consumption in relation to production at the level of the treatment plant, where the extraction process takes place. The approach is pragmatic: it takes into account both the industrial setting and the constraints imposed by the available empirical data. Because the analysis relies on administrative data, extensive preprocessing was required before the phenomenon could be analyzed and modeled. To identify key variables, various clustering techniques were used to group treatment plants exhibiting similar behavior patterns. Despite this comprehensive preliminary analysis, some challenges persisted, notably highly correlated numerical variables, which produced results that were misaligned with real-world phenomena. To address these issues, Principal Component Analysis (PCA) was adopted to mitigate the effects of confounding variables. This approach, combined with an unsupervised random forest algorithm, grouped the observations into four distinct clusters. These clustered observations were then used for a split-panel regression analysis.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Appl Stoch Models Bus Ind - 2026 - Carfagna - Data Driven Estimation of Treatment Plants Fuel Gas Consumption in the Oil.pdf | Open access | Publisher's version (PDF) / Version of Record | Open access license, Creative Commons Attribution (CC BY) | 6.63 MB | Adobe PDF |
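
The abstract notes that highly correlated numerical variables distorted the results and that PCA was used to mitigate this. As a minimal illustration of that one step only, the sketch below applies PCA to two strongly correlated variables and shows that the resulting component scores are uncorrelated. It is not the authors' implementation: the variable names `gas` and `oil`, the synthetic data, and the helper functions `pearson` and `pca_2d` are all assumptions made for this example, and a real analysis with many variables would normally rely on a numerical library rather than the closed-form two-variable case used here.

```python
import math
import random
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pca_2d(x, y):
    """PCA of two variables via the closed-form eigendecomposition of
    their 2x2 covariance matrix.

    Returns (pc1_scores, pc2_scores, share_of_variance_on_pc1).
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    xc = [a - mx for a in x]  # centered copies of the inputs
    yc = [b - my for b in y]
    sxx = sum(a * a for a in xc) / (n - 1)
    syy = sum(b * b for b in yc) / (n - 1)
    sxy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)
    # Rotation angle that diagonalizes [[sxx, sxy], [sxy, syy]];
    # the first axis is the direction of largest variance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(xc, yc)]
    pc2 = [-s * a + c * b for a, b in zip(xc, yc)]
    var1 = sum(v * v for v in pc1) / (n - 1)
    var2 = sum(v * v for v in pc2) / (n - 1)
    return pc1, pc2, var1 / (var1 + var2)


# Synthetic, hypothetical data: two plant-level quantities that move together.
rng = random.Random(0)
gas = [rng.gauss(100.0, 20.0) for _ in range(200)]
oil = [0.8 * g + rng.gauss(0.0, 5.0) for g in gas]

pc1, pc2, share_pc1 = pca_2d(gas, oil)
print(f"correlation of raw variables: {pearson(gas, oil):.3f}")
print(f"correlation of PC scores:     {pearson(pc1, pc2):.3f}")
print(f"variance carried by PC1:      {share_pc1:.3f}")
```

Because the component scores are uncorrelated by construction, using them in place of the raw variables removes the collinearity, at the cost of components that are harder to interpret than the original measurements.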



