Stock price prediction means studying the factors in the stock market that can influence a stock's price and, based on those factors, building a model that forecasts the price. Such a forecast can help individuals and institutions anticipate the price trend and decide whether to buy or short the stock. Machine learning and time-series methods let us estimate the future value of a stock or of other financial assets traded on an exchange. The aim of the analysis and prediction is to support profitable trading decisions.
Focus areas for Analysis:
- The change in closing price of the stock over time.
- Visualization of Candlestick Monthly data.
- The % daily return of the stock.
- The 50-day and 200-day moving averages of the closing price.
Prediction:
- We predict future stock behaviour by forecasting the stock's closing price with an LSTM network.
#Import Libraries
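# datetime is part of the standard library and needs no install; mplfinance, scikit-learn and tensorflow (Keras) are used later
!pip install numpy pandas yfinance seaborn matplotlib mplfinance scikit-learn tensorflow
import datetime
import numpy as np
import pandas as pd
import yfinance as yf
import seaborn as sns
import matplotlib.pyplot as plt
import mplfinance as mpf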
Exploratory Data Analysis (EDA) using ELT (Extract, Load, Transform)
Dataset
I have taken the daily stock price data of United Breweries Limited (ticker UBL.NS on the NSE) from Yahoo Finance, covering 15 January 2020 to 31 December 2023, the period set in the code below.
Time Period of Data: Define the timeframe for which you want to fetch data.
start_date = datetime.datetime(2020, 1, 15)
end_date = datetime.datetime(2023, 12, 31)
Loading Data from Yahoo Finance
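df = yf.download('UBL.NS', start=start_date, end=end_date)
# Newer yfinance releases may return two-level (Price, Ticker) columns even for a single ticker; flatten them
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)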
View Dataframe
df
Check index
print(df.index)
Reset Index
df1 = df.reset_index()
df1['Date'] = pd.to_datetime(df1['Date'])
df1
Converting from Daily to Monthly Frequency data
monthly_data = df.resample('M').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})
monthly_data.head()
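To see what each aggregation rule does, here is a minimal sketch on a few hypothetical daily bars (synthetic numbers, not UBL data). It uses the alias 'MS' (month start), which labels each row with the first day of the month, whereas the 'M' used above labels it with the last day. The aggregated values are the same either way.

```python
import pandas as pd

# Hypothetical daily OHLCV bars spanning two calendar months (synthetic values)
idx = pd.to_datetime(["2023-01-30", "2023-01-31", "2023-02-01", "2023-02-02"])
daily = pd.DataFrame(
    {
        "Open":   [100.0, 102.0, 104.0, 103.0],
        "High":   [103.0, 105.0, 106.0, 104.0],
        "Low":    [99.0, 101.0, 102.0, 100.0],
        "Close":  [102.0, 104.0, 103.0, 101.0],
        "Volume": [1000, 1500, 1200, 800],
    },
    index=idx,
)

# One row per month: first open, highest high, lowest low, last close, total volume
monthly = daily.resample("MS").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(monthly)
```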
Plot - Line and Frequency - Daily Closing Price
# Plotting
plt.figure(figsize=(10, 6))
# The dates are stored in the index, so plot against df.index
#plt.plot(df.index, df['Open'], label='Open')
#plt.plot(df.index, df['High'], label='High')
#plt.plot(df.index, df['Low'], label='Low')
plt.plot(df.index, df['Close'], label='Close')
plt.title('UBL Stock Prices Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.legend()
plt.grid(True)
plt.show()
Plot - Candlestick and Frequency - Monthly OHLC Volume Data
#Plotting monthly candlestick chart with a separate volume plot with MA(20)
#mpf.plot(monthly_data, type='candle', style='charles', volume=True, mav=(20), show_nontrading=True, addplot=mpf.make_addplot(monthly_data['Volume'], panel=1, ylabel='Volume'),tight_layout=True, figratio=(16, 9), scale_width_adjustment=dict(volume=0.7, candle=1))
#Plotting monthly candlestick chart with a separate volume plot
mpf.plot(monthly_data, type='candle', style='charles', volume=True,
         show_nontrading=True, tight_layout=True, figratio=(16, 9),
         scale_width_adjustment=dict(volume=0.7, candle=1))
Total Rows & Columns
df.shape
Data Information
df.info()
Data Quality Check
Duplicate Values
len(df[df.duplicated()])
Missing Values/Null Values
print(df.isnull().sum())
Variable Information
# Columns
df.columns
# Describe
df.describe()
# Check unique values for each variable
for i in df.columns.tolist():
    print(f"No. of unique values in {i} is {df[i].nunique()}.")
Plotting the 50-day and 200-day moving averages. A moving average is a simple technical-analysis indicator that smooths out day-to-day price fluctuations.
ma_day = [50, 200]
plt.figure(figsize=(10, 6))
# Plot the close price
plt.plot(df.index, df['Close'], label='Close')
# Plot the moving averages (the first ma-1 values of each are NaN)
for ma in ma_day:
    column_name = f"MA for {ma} days"
    df[column_name] = df['Close'].rolling(ma).mean()
    plt.plot(df.index, df[column_name], label=column_name)
plt.title('UBL Daily Close Prices and Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.show()
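A rolling mean with window n averages the current close and the n-1 closes before it, so the first n-1 entries are NaN. The sketch below uses hypothetical toy prices and a window of 3 instead of 50 or 200 to keep the numbers readable, and checks `rolling().mean()` against a hand-rolled cumulative-sum version.

```python
import numpy as np
import pandas as pd

# Toy closing prices (hypothetical); window of 3 instead of 50/200 for readability
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

ma3 = close.rolling(3).mean()
print(ma3.tolist())  # the first two entries are NaN: not enough history yet

# Same averages computed by hand: difference of cumulative sums over each window
csum = np.cumsum(np.insert(close.to_numpy(), 0, 0.0))
manual = (csum[3:] - csum[:-3]) / 3
print(manual)
```

The longer the window, the smoother the line and the later it reacts to price changes, which is why the 200-day average lags the 50-day one in the chart.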
Daily Returns
# Calculate the daily return in percent: (Close_t / Close_{t-1} - 1) * 100
df['Daily Return'] = df['Close'].pct_change() * 100
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Daily Return'], linestyle='--', marker='o', label='Daily Return')
plt.title('Daily Return Percentage')
plt.xlabel('Date')
plt.ylabel('Daily Return (%)')
plt.legend()
plt.grid(True)
plt.show()
plt.figure(figsize=(12, 9))
df['Daily Return'].hist(bins=50, alpha=0.5, label='UBL')
plt.xlabel('Daily Return (%)')
plt.ylabel('Counts')
plt.title('Histogram of UBL Daily Returns')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
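The daily return is r_t = (P_t - P_{t-1}) / P_{t-1}. `pct_change()` returns it as a fraction, so multiplying by 100 gives percent. A small worked example on hypothetical prices:

```python
import pandas as pd

# Hypothetical closing prices over four trading days
close = pd.Series([100.0, 110.0, 99.0, 99.0])

# pct_change() computes (P_t - P_{t-1}) / P_{t-1} as a fraction; the first value is NaN
returns = close.pct_change()
returns_pct = returns * 100  # the same values expressed in percent

print(returns_pct.round(2).tolist())
```

Note that a 10% rise followed by a 10% fall does not return to the starting price (100 to 110 to 99), which is one reason return histograms are read alongside the price chart.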
2. Prediction using LSTM. Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) architecture well suited to sequence-prediction problems, because its gated memory cell can carry information across many time steps.
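For reference, the standard LSTM cell (without peephole connections) updates its state at each time step t as shown below. Here x_t is the input at step t (in this notebook, one scaled closing price), h_{t-1} and c_{t-1} are the previous hidden and cell states, σ is the logistic sigmoid, ⊙ is element-wise multiplication, and f_t, i_t, o_t are the forget, input and output gates. This is the common textbook formulation; Keras's `LSTM` layer follows the same structure with sigmoid gates and tanh activations by default.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of c_t is what lets gradients flow over long sequences better than in a plain RNN.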
2.0 Before Prediction
plt.figure(figsize=(16,6))
plt.title('Close Price History')
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price', fontsize=18)
plt.show()
2.1 Data Preparation
# Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])
# Convert the dataframe to a NumPy array, the input format the ML/DL libraries below expect
dataset = data.values
# Use the first 80% of the rows for training
training_data_len = int(np.ceil(len(dataset) * .80))
training_data_len
2.2 Data Scaling
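# Scale the data to [0, 1]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
# Fit on the training rows only so the test period's min/max do not leak into training
scaler.fit(dataset[:training_data_len])
scaled_data = scaler.transform(dataset)
scaled_data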
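With `feature_range=(0, 1)`, min-max scaling maps each value to x' = (x - min) / (max - min), and the inverse is x = x' (max - min) + min, which is what `inverse_transform` applies to the predictions later. The NumPy sketch below mirrors that arithmetic on hypothetical prices. It fits min and max on the first four rows only, so an unseen later price can land outside [0, 1].

```python
import numpy as np

# Hypothetical closing prices; the first 4 rows play the role of training data
prices = np.array([[100.0], [120.0], [110.0], [140.0], [150.0]])
train = prices[:4]

# Min-max scaling to [0, 1], fitted on the training rows only
lo, hi = train.min(axis=0), train.max(axis=0)
scaled = (prices - lo) / (hi - lo)

# Inverse transform recovers the original prices
restored = scaled * (hi - lo) + lo

print(scaled.ravel())
```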
2.3 Creating Training Data
# Create the scaled training data set
train_data = scaled_data[0:int(training_data_len), :]
# Split the data into x_train and y_train data sets
x_train = []
y_train = []
# Each sample is the previous 60 scaled closes; the target is the next close
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i <= 61:
        print(x_train)
        print(y_train)
        print()
# Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
# Reshape the data to (samples, time steps, features) as the LSTM layer expects
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
# x_train.shape
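The loop above is a sliding window: with a look-back of L, a series of N values yields N - L samples, each of shape (L, 1). The sketch below wraps the same logic in a hypothetical helper, `make_windows`, and runs it on a toy series with L = 3 so the shapes are easy to check.

```python
import numpy as np

def make_windows(series, lookback):
    """Hypothetical helper: turn a 1-D series into (samples, lookback, 1) inputs
    and next-step targets, the same layout the training loop builds."""
    x, y = [], []
    for i in range(lookback, len(series)):
        x.append(series[i - lookback:i])  # the previous `lookback` values
        y.append(series[i])               # the value that follows them
    x = np.array(x).reshape(-1, lookback, 1)  # (samples, time steps, features)
    return x, np.array(y)

series = np.arange(10, dtype=float)  # toy stand-in for the scaled training prices
x, y = make_windows(series, lookback=3)
print(x.shape, y.shape)
print(x[0].ravel(), y[0])
```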
2.4 Model Building
from keras.models import Sequential
from keras.layers import Dense, LSTM
# Build the LSTM model
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
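To get a feel for the model's size, the trainable parameters can be counted by hand. An LSTM layer holds four sets of weights (input, forget and output gates plus the candidate cell), each with an input kernel, a recurrent kernel and a bias; a Dense layer holds a weight matrix plus one bias per unit. Assuming Keras's default `use_bias=True`, these figures should agree with what `model.summary()` reports for the architecture above.

```python
def lstm_params(units, input_dim):
    # Four weight sets, each: input kernel + recurrent kernel + bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # Weight matrix plus one bias per unit
    return input_dim * units + units

layers = {
    "LSTM(128)": lstm_params(128, 1),   # one feature per time step (the closing price)
    "LSTM(64)": lstm_params(64, 128),   # fed by the 128-unit layer
    "Dense(25)": dense_params(25, 64),
    "Dense(1)": dense_params(1, 25),
}
total = sum(layers.values())
for name, count in layers.items():
    print(f"{name:10s} {count:>7,d}")
print(f"{'Total':10s} {total:>7,d}")
```

Most of the weights sit in the two recurrent layers, so shrinking their unit counts is the main lever if training with `batch_size=1` is too slow.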
2.5 Model Training
# Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
2.6 Creating Testing Data
# Create the testing data set
# Start 60 rows before training_data_len so the first test target has a full 60-day window
test_data = scaled_data[training_data_len - 60:, :]
# Create the data sets x_test and y_test (y_test stays in the original price units)
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
# Convert the data to a numpy array
x_test = np.array(x_test)
# Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
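Starting the test slice 60 rows before `training_data_len` is what makes the number of test windows equal the number of rows in `y_test`. The sketch below checks this on a toy array of consecutive integers standing in for the prices (an assumption for readability), and also confirms that each window ends on the day just before its target.

```python
import numpy as np

lookback = 60
dataset = np.arange(300, dtype=float).reshape(-1, 1)  # toy stand-in for the close prices
training_data_len = int(np.ceil(len(dataset) * 0.80))

# Start 60 rows early so the first test target has a full look-back window
test_data = dataset[training_data_len - lookback:, :]
x_test = np.array([test_data[i - lookback:i, 0] for i in range(lookback, len(test_data))])
y_test = dataset[training_data_len:, :]

print(x_test.shape, y_test.shape)
# The last value in each window is the price immediately before its target
print(np.all(x_test[:, -1] + 1 == y_test[:, 0]))
```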
2.7 Making Predictions
# Get the model's predicted price values and undo the scaling
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
2.8 Model Evaluation
#Get the root mean squared error (RMSE)
rmse = np.sqrt(np.mean(((predictions - y_test) ** 2)))
print("Root Mean Squared Error (RMSE):", rmse)
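# Express the RMSE relative to the mean actual test price (1 - RMSE / mean price);
# this is a normalized error score, not a classification accuracy
accuracy_percentage = (1 - (rmse / y_test.mean())) * 100
print("Accuracy Percentage:", accuracy_percentage)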
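RMSE is in price units, so it is easiest to judge against a simple reference. The sketch below computes the RMSE and the normalized score on hypothetical numbers, and adds a persistence baseline (not part of the original notebook) that predicts each day's price as the previous actual price. A model that cannot beat this "tomorrow equals today" forecast is adding little. The last training price of 98.0 is an assumed value.

```python
import numpy as np

# Hypothetical actual prices and model predictions, shaped (n, 1) like y_test and predictions
y_true = np.array([[100.0], [102.0], [101.0], [105.0]])
y_pred = np.array([[99.0], [103.0], [101.0], [103.0]])

# Root mean squared error, in price units
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
# Normalized score: 1 - RMSE / mean actual price, in percent
score = (1 - rmse / y_true.mean()) * 100

# Persistence baseline: the first forecast uses a hypothetical last training price of 98.0
last_train_price = np.array([[98.0]])
naive_pred = np.vstack([last_train_price, y_true[:-1]])
naive_rmse = np.sqrt(np.mean((naive_pred - y_true) ** 2))

print(f"model RMSE={rmse:.3f}  score={score:.2f}%  naive RMSE={naive_rmse:.3f}")
```

Because prices barely move from day to day, the normalized score tends to be high for almost any model, which is why the baseline comparison is the more informative check.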
2.9 Visualization
# Plot the data
train = data[:training_data_len]
valid = data[training_data_len:].copy()  # copy to avoid SettingWithCopyWarning
valid['Predictions'] = predictions
# Visualize the data
plt.figure(figsize=(16, 6))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price (INR)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()