Oil Prices and Stock Market Analysis using K-Means Clustering

July 02, 2025
Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at leading firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Data Collection:

Yahoo! Finance, a media property of the Yahoo! network, is the source of the share price data for this analysis. Besides financial news, it offers an API for financial and stock market data, including stock quotes, press releases, financial reports, expert commentary, and original content. We used the Yahoo API to pull daily share prices for the following companies: Shell (RDSB.L), BP (BP.L), Cairn Energy (CNE.L), Premier Oil (PMO.L), Statoil (STL.OL), TOTAL (FP.PA), ENGIE (ENGI.PA), Schlumberger (SLB.PA) and REPSOL (REP.MC). The oil price dataset comes from the U.S. Energy Information Administration.

Fig 1: Packages and Datasets import in Python

The code sample above imports the Yahoo Finance package, yfinance, and the datetime package. The goal is to collect data over a specified period of years, requesting 10,000 records (days) for analysis. The company names were stored in a variable named shares. The initial dataset from Yahoo Finance contains the columns Date, Open, High, Low, Close, Adj Close, and Volume. The oil price dataset contains the columns Date and oil_price.
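Since the original code survives only as a figure, here is a minimal sketch of what such a download step might look like. The helper names `date_window` and `fetch_share_prices` are hypothetical, and the ticker symbols are copied from this article; several of them may since have been renamed or delisted on Yahoo Finance, so the download itself is not guaranteed to return data.

```python
from datetime import datetime, timedelta

# Ticker symbols as listed in this article; treat the mapping as illustrative,
# since some symbols may no longer resolve on Yahoo Finance.
TICKERS = {
    "Shell": "RDSB.L",
    "BP": "BP.L",
    "Cairn Energy": "CNE.L",
    "Premier Oil": "PMO.L",
    "Statoil": "STL.OL",
    "TOTAL": "FP.PA",
    "ENGIE": "ENGI.PA",
    "Schlumberger": "SLB.PA",
    "REPSOL": "REP.MC",
}


def date_window(end, days=10_000):
    """Return (start, end) date strings covering `days` calendar days up to `end`."""
    start = end - timedelta(days=days)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


def fetch_share_prices(tickers, start, end):
    """Download daily prices per company (hypothetical helper, needs network)."""
    import pandas as pd
    import yfinance as yf  # imported lazily so the helpers above work offline

    frames = []
    for name, symbol in tickers.items():
        history = yf.Ticker(symbol).history(start=start, end=end)
        if history.empty:  # symbol not found, or no data in the window
            continue
        history = history.reset_index()  # turn the Date index into a column
        history["name"] = name
        frames.append(history)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


start, end = date_window(datetime(2018, 1, 1))
print(start, end)
# shares = fetch_share_prices(TICKERS, start, end)  # uncomment to download
```

Note that a window of 10,000 calendar days yields fewer than 10,000 trading days, because weekends and market holidays have no quotes.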

Fig 2: Yahoo finance dataset with share market details related to Oil Company

Fig 3: Dataset with oil price details related to Oil Company

Data Preprocessing:

Each company's share price data was labelled with the company name and matched by date with the oil price dataset. Combining the two produced a new dataset, all_data, containing share price, oil price, and company details. The share price was then scaled and added to the same dataset as share_price_scaled.
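The preprocessing described above can be sketched roughly as follows. This is an illustration on made-up numbers, not the original code: the column names `date`, `name` and `share_price`, the helper `build_all_data`, the use of a date-based merge, and min-max scaling applied per company are all assumptions.

```python
import pandas as pd


def build_all_data(shares, oil):
    """Join share prices with the oil price by date and add a scaled price."""
    merged = shares.merge(oil, on="date", how="inner")
    grouped = merged.groupby("name")["share_price"]
    low = grouped.transform("min")
    high = grouped.transform("max")
    # Min-max scaling per company: 0 at its lowest price, 1 at its highest.
    # A company whose price never changes would produce NaN here.
    merged["share_price_scaled"] = (merged["share_price"] - low) / (high - low)
    return merged


# Tiny synthetic inputs, for illustration only.
shares = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-04"] * 2),
    "name": ["BP"] * 3 + ["TOTAL"] * 3,
    "share_price": [480.0, 500.0, 520.0, 45.0, 47.0, 46.0],
})
oil = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-04"]),
    "oil_price": [55.0, 56.0, 54.0],
})

all_data = build_all_data(shares, oil)
print(all_data)
```

An inner join keeps only the dates present in both datasets, which matters when the stock exchanges and the oil price series observe different holidays.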

Fig 4: all_data dataset with added column share_price_scaled.

Fig 5: Data concatenation and transformation to create a new dataset (all_data)

Exploratory Data Analysis:

After data preparation we have the final dataset, which will later feed the clustering step. Exploratory data analysis (EDA) sheds light on the dataset's hidden patterns and the relationships between its variables, and several data visualisation techniques are useful here. I chose the following: a simple line plot of the oil price, a pairplot of BP's share price from 2000 to 2017, a pairplot of BP's share price over the last five years, a violin plot of the oil price, a violin plot of the share prices of several oil & gas firms, joint plots comparing Premier Oil and Statoil, and plots of the share prices of several firms against the oil price using various templates.

Fig 6 shows how the oil price has fluctuated over time. Around 1988 the price was about $20 per barrel, and the forecast through 2026 shows about $120. The highest prices occurred in 2008, when a barrel cost more than $140. Oil prices are influenced by many factors, including the decisions of independent oil-producing countries such as Russia and of private oil companies such as ExxonMobil. According to the Organization of the Petroleum Exporting Countries (OPEC), oil-exporting countries play a large role in price fluctuations. Supply and demand, natural disasters that disrupt production, and political conflicts in oil-producing countries also influence pricing.

Fig 6: Simple Line Plot between date and oil_price (on left) and date and share price (on right)

Fig 7: Pairplot on BP share price from years 2000 to 2017 using a color gradient for different years

Figure 7's pairplot compares the share price with the oil price along a timeline spanning 2000 to 2017. A pairplot shows the pairwise relationships between the features of the dataset along with each variable's univariate distribution, which let us examine how these variables relate to one another.

Fig 8: Pairplot on BP share price for last 5 years using a color gradient

This analysis was performed for every oil company in the dataset, looking at the oil price and share price over the full 18 years and over the last 5 years. Over the last five years, the distribution shows the oil price mostly sitting in two bands, 30 to 60 USD/bbl and 100 to 120 USD/bbl, with few observations between 60 and 100 USD/bbl. For 2016/17 the share price and oil price show a positive correlation with high confidence. For this company, 2014 was the year in which everything changed: the correlation pattern shifted and the data became highly variable. We will see later whether the same holds for other companies. The plot reveals two distinct worlds, that is, two market behaviours, separated by a change that is independent of company-specific events.
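One way to check a year-by-year statement like the one above is to compute the correlation between oil price and share price separately for each year. The sketch below does this on synthetic numbers that are made up for illustration and are not BP data. The helper `yearly_correlation` and the column names are assumptions.

```python
import pandas as pd


def yearly_correlation(frame):
    """Pearson correlation between oil_price and share_price for each year."""
    result = {
        year: group["oil_price"].corr(group["share_price"])
        for year, group in frame.groupby(frame["date"].dt.year)
    }
    return pd.Series(result, name="corr")


# Synthetic example: prices move in opposite directions in 2014
# and move together in 2016.
frame = pd.DataFrame({
    "date": pd.to_datetime([
        "2014-03-01", "2014-06-01", "2014-09-01", "2014-12-01",
        "2016-03-01", "2016-06-01", "2016-09-01", "2016-12-01",
    ]),
    "oil_price": [100.0, 90.0, 70.0, 60.0, 30.0, 35.0, 40.0, 45.0],
    "share_price": [400.0, 410.0, 420.0, 430.0, 300.0, 340.0, 390.0, 450.0],
})

corr_by_year = yearly_correlation(frame)
print(corr_by_year)
```

A correlation near +1 for a year indicates that the share price rose and fell with the oil price in that year, while a value near 0 or below indicates that it did not.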

Fig 9: Violin plot of the share price of several oil & gas companies

The study above showed how each company's stock is affected by the price of oil. We now turn to violin plots; a violin plot resembles a box plot with a probability density estimate added. The range and distribution of share prices differ markedly between firms, and the oil price covered a notably wide range in 2014. Several of the firms show typical variation, and only a few are particularly sensitive. Note that share prices are scaled between 0 and 1 using their minimum and maximum values over roughly the past 20 years, which can lead to misleading interpretations.

Unsupervised Learning - Cluster analysis on oil companies:

Unsupervised learning works with input data only, with no corresponding output variables; the algorithm models the structure or distribution of the data. Clustering is the most common approach to unsupervised problems and is used to discover groupings or patterns in the data. One possible application here is to assess the relative value of a share with respect to the oil price, so the analysis could signal whether a share is overpriced or undervalued.

The following oil firms were chosen for analysis: REPSOL (REP.MC), BP (BP.L), Cairn Energy (CNE.L), Premier Oil (PMO.L), Statoil (STL.OL), TOTAL (FP.PA), ENGIE (ENGI.PA), and Schlumberger (SLB.PA). The main aim of the investigation was to establish a relationship between the oil price and share prices. To determine the number of clusters, we created a scree plot. Based on our needs, we chose 6 clusters and proceeded.
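The scree (elbow) procedure and the final 6-cluster fit can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic (oil price, share price) pairs, not the original code: the data, the min-max scaling of both features before clustering, and the range of k values tried are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for (oil_price, share_price) pairs of one company.
centers = np.array([[30.0, 300.0], [50.0, 450.0], [110.0, 420.0]])
points = np.vstack(
    [c + rng.normal(scale=[3.0, 15.0], size=(60, 2)) for c in centers]
)

# Put both features on a 0-1 scale so neither dominates the distance.
features = MinMaxScaler().fit_transform(points)

# Scree / elbow data: within-cluster sum of squares for k = 1..10.
inertias = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias.append(model.inertia_)

# Final model with the 6 clusters chosen in the article.
final = KMeans(n_clusters=6, n_init=10, random_state=0).fit(features)
labels = final.labels_

print([round(value, 3) for value in inertias])
print(np.bincount(labels))  # number of points in each cluster
```

Plotting `inertias` against k gives the scree plot; the "elbow" where the curve flattens suggests a reasonable number of clusters, although the final choice can also reflect how many groups are useful to interpret.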

Fig 10: K-Means clustering for all companies, comparing oil price and share price.

The clusters were compared by oil price and share price. Cluster 0, in light blue, contains data points starting from very low prices, where the share price is not much influenced by the oil price. Cluster 1, in yellow, has high oil prices but a moderate share price.

Fig 11: Scree plot for finding number of clusters and Scatter plot for representing the clusters of BP Company, comparing oil price and share price.

Cluster 3 also mostly corresponds to higher oil prices, although these have little effect on the share price. Cluster 4, where share prices were quite high, is perhaps the best period to sell shares; oil prices had no noticeable effect on this company's share price during that period. Cluster 5 is in a neutral position, where neither oil prices nor share prices changed significantly. The analysis done above for BP can be repeated for all the other firms.