Basic Stock Data Analysis Using Jupyter Notebook (2024)

Evafachria

Sep 10, 2023

Stock data analysis commonly involves two primary approaches: fundamental analysis and technical analysis. Fundamental analysis entails assessing a stock’s intrinsic value by examining fundamental company information, such as financial reports, financial ratios, company news, valuation, and macroeconomic factors. On the other hand, technical analysis revolves around analyzing historical stock movement data and identifying specific patterns within the trends of these stock movements. Both fundamental and technical approaches come with their respective strengths and weaknesses. Technical analysis is often used for short-term stock price analysis, including activities like short selling, whereas fundamental analysis is more frequently applied to long-term investment analysis.
The tools for analyzing stock data have continuously evolved to meet the demands of these two approaches. Python, a popular programming language, can be utilized to simplify stock data analysis. When combined with Jupyter Notebook, Python provides a versatile and interactive environment for conducting stock data analysis. In the following sections, we will explore how Python and Jupyter Notebook can be used for stock data analysis in greater detail.

To access stock price data in Jupyter Notebook, the first step is to import several supporting libraries such as:

import numpy as np
import pandas as pd
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import yfinance as yf
import talib as ta

These libraries must first be installed with pip install, for example:

pip install pandas-datareader
pip install yfinance
pip install TA-Lib
pip install mplfinance

After installing and importing the necessary libraries, the next step is to retrieve the stock data itself by:

yf.pdr_override()
LQ45 = pdr.get_data_yahoo("^JKLQ45", start="2013-01-01", end="2023-04-30")
LQ45.head(3)

The code above fetches price data for Indonesia's LQ45 index over roughly ten years, from January 1, 2013, to April 30, 2023. LQ45 is an index of 45 of the most liquid, widely traded blue-chip stocks on the Indonesia Stock Exchange (BEI). Based on July 2023 data from the exchange, its members are as follows:

ACES, BBRI, EXCL, ITMG, SMGR, ADRO, BBTN, GOTO, JPFA, SRTG, AKRA, BMRI, HRUM, KLBF, TBIG, AMRT, BRIS, ICBP, MDKA, TINS, ANTM, BRPT, INCO, MEDC, TLKM, ARTO, BUKA, INDF, PGAS, TOWR, ASII, CPIN, INDY, PTBA, TPIA, BBCA, INKP, SCMA, UNTR, BBNI, ESSA, INTP, SIDO, UNVR.

The resulting data contains daily values for that period: Open, High, Low, Close, and Adjusted Close prices, plus Volume. Here is the output:

[Figure: first three rows of the LQ45 data, as returned by LQ45.head(3)]

Still using the same Indonesian LQ45 data, we visualize the price movements between the start and end dates. The price used here is the adjusted close, and to simplify the code we renamed the “Adj Close” column to “Price”.
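The renaming step itself is not shown in the walkthrough. A minimal sketch, assuming the downloaded frame has an 'Adj Close' column as in the output above (the sample values are made up purely for illustration). Recent yfinance versions may adjust prices by default and omit this column, in which case 'Close' would be renamed instead.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded LQ45 frame; values are illustrative only.
LQ45 = pd.DataFrame(
    {"Close": [750.0, 755.0, 760.0], "Adj Close": [742.0, 747.0, 752.0]},
    index=pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04"]),
)

# Keep only the adjusted close and rename it to 'Price' for the later steps.
LQ45 = LQ45[["Adj Close"]].rename(columns={"Adj Close": "Price"})
print(LQ45.columns.tolist())
```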

Next, to generate visualizations of stock price movements, the code used is:

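import matplotlib.pyplot as plt
# 'seaborn' was renamed 'seaborn-v0_8' in Matplotlib 3.6; use the old name on earlier versions
plt.style.use('seaborn-v0_8')
plt.figure(figsize=(16,10))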
plt.plot(LQ45['Price'],'-')
plt.gcf().autofmt_xdate()
plt.title('LQ45 Stock Price 2013-2023', fontsize=16)
plt.xlabel('Date',fontsize=16)
plt.ylabel('Price in Rupiah', fontsize=16)
plt.show()

and the output is:

[Figure: line chart of the LQ45 price, 2013-2023]

The line chart above shows that LQ45 prices have trended upward over the past 10 years, despite a sharp decline in 2020 due to COVID-19.

To calculate the return generated by LQ45 stocks, we can use the logarithmic return between the prices at the start and the end of the period.

From the output below, over roughly 10 years the LQ45 returned 25.83%, or about 2.58% per year on average. In other words, an investment of 100 million Indonesian Rupiah in these stocks would have earned an average of about Rp 2,583,000 per year.

price_2013=LQ45['Price'].iloc[0]   # first price in the period
price_2023=LQ45['Price'].iloc[-1]  # last price in the period
roi=np.log(price_2023/price_2013)*100
print(f'Price on {LQ45.index[0].strftime("%d-%m-%Y")}: RP{round(price_2013,2)}')
print(f'Price on {LQ45.index[-1].strftime("%d-%m-%Y")}: RP{round(price_2023,2)}')
print(f'Return on Investment: {round(roi,2)}%')

Output:
Price on 02-01-2013: RP742.79
Price on 28-04-2023: RP961.75
Return on Investment: 25.83%
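
Because 25.83% is a logarithmic return, it differs from the plain percentage change between the two prices. As a rough check, the sketch below recomputes both figures from the two prices printed above, using the text's rounded 10-year period (the variable names are only illustrative):

```python
import math

price_start = 742.79   # price on 02-01-2013, from the output above
price_end = 961.75     # price on 28-04-2023, from the output above
years = 10             # holding period, rounded as in the text

log_return = math.log(price_end / price_start) * 100   # continuously compounded, in %
simple_return = (price_end / price_start - 1) * 100    # plain percentage change, in %
log_return_per_year = log_return / years               # average log return per year

print(f"Log return:    {log_return:.2f}%")
print(f"Simple return: {simple_return:.2f}%")
print(f"Per year:      {log_return_per_year:.2f}%")
```

The simple return comes out higher than the log return, so it is worth stating which of the two a quoted percentage refers to.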

To track how returns change over time, we can create a line chart by first calculating the daily log returns, which are later summed into monthly returns:

LQ45['Price d+1']=LQ45['Price'].shift(-1)
LQ45['ROI']=np.log(LQ45['Price d+1']/LQ45['Price'])*100
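
Log returns are convenient here because they add up: the sum of the daily log returns over a period equals the log return between its first and last price. A small sketch with made-up prices (not LQ45 data) that checks this, grouping by calendar month instead of calling resample:

```python
import numpy as np
import pandas as pd

# Made-up daily prices over two months, for illustration only.
dates = pd.bdate_range("2023-01-02", "2023-02-28")
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))), index=dates)

# Daily log return in %, looking one day ahead as in the code above.
daily_roi = np.log(prices.shift(-1) / prices) * 100

# Sum the daily log returns within each calendar month.
monthly_roi = daily_roi.groupby(daily_roi.index.to_period("M")).sum()

# The total of all daily log returns equals the log of last price over first price.
total_from_sum = daily_roi.sum()
total_direct = np.log(prices.iloc[-1] / prices.iloc[0]) * 100
print(np.isclose(total_from_sum, total_direct))
```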

Below is a function to create a template for a line chart that can be reused.

def plot_time_series_with_summary(data, title, x_label, y_label):
    plt.figure(figsize=(16,10))
    plt.plot(data,'-')
    plt.gcf().autofmt_xdate()
    # Mean as a horizontal line, one standard deviation either side as a shaded band
    plt.axhline(y=data.mean(), label='Mean', color='r')
    plt.fill_between(data.index, (data.mean()-data.std()),(data.mean()+data.std()), color='b',alpha=.1, label='Volatility' )
    plt.title(title, fontsize=16)
    plt.xlabel(x_label,fontsize=16)
    plt.ylabel(y_label, fontsize=16)
    plt.legend()
    plt.show()

# Sum the daily log returns per month ('1M' = month end; pandas 2.2+ prefers 'ME')
LQ45=LQ45[['ROI']].resample('1M').sum()
plot_time_series_with_summary(LQ45['ROI'], 'Volatility of LQ45', 'Date', 'ROI in %' )

Here are the results of the monthly stock price change visualization.

[Figure: monthly LQ45 ROI with mean line and volatility band]

From the line chart above, the movement of LQ45 prices looks quite volatile, but the chart alone cannot tell us how risky that movement is. Statistically, the volatility of the LQ45 can be assessed with LQ45.describe() by looking at the standard deviation: the higher the standard deviation, the higher the associated risk.

[Figure: output of LQ45.describe()]

We can analyze the level of volatility based on the fluctuation rate or statistically through the standard deviation of the LQ45 ROI. However, a standard deviation of 4.79 is difficult to interpret as a risk level without a benchmark. To get one, we run the same volatility analysis for each individual stock within the LQ45 index.

First, we define the tickers of the LQ45 members in the code below.

idx_tickers=[f'{ticker}.JK' for ticker in [
 'ACES',
 'EXCL',
 'ITMG',
 'SMGR',
 'ADRO',
 'BBTN',
 'GOTO',
 'JPFA',
 'SRTG',
 'AKRA',
 'BMRI',
 'HRUM',
 'KLBF',
 'TBIG',
 'AMRT',
 'BRIS',
 'ICBP',
 'MDKA',
 'TINS',
 'ANTM',
 'BRPT',
 'INCO',
 'MEDC',
 'TLKM',
 'ARTO',
 'AALI',
 'INDF',
 'PGAS',
 'TOWR',
 'ASII',
 'CPIN',
 'INDY',
 'PTBA',
 'TPIA',
 'BBCA',
 'INKP',
 'SCMA',
 'UNTR',
 'BBNI',
 'ESSA',
 'INTP',
 'SIDO',
 'UNVR',
 'BBRI']]
tickers=['AAPL','TSLA', 'META', 'AMZN', 'MSFT']+idx_tickers  # extra tickers added on purpose, for comparison with stocks listed outside the BEI

Next, we can retrieve stock price data for each individual member of the LQ45 stocks in Jupyter Notebook using code like the following:

tickers_data = []
tickers_prices = {}
tickers_df = {}
for ticker in tickers:
    try:
        raw_data = pdr.get_data_yahoo(ticker, start="2013-01-01", end="2023-04-30")[['Adj Close']].rename(columns={'Adj Close': 'Price'}).copy()
        # Daily log return in %, looking one trading day ahead
        raw_data['d+1'] = raw_data['Price'].shift(-1)
        raw_data['ROI'] = np.log(raw_data['d+1'] / raw_data['Price']) * 100
        # Note: .last() keeps each month's final non-null daily values;
        # summing ROI per month, as done for the LQ45 above, would give a monthly return instead
        monthly_price = raw_data.resample('M').last()
        monthly_price['lower'] = monthly_price['ROI'] - monthly_price['ROI'].std()
        monthly_price['upper'] = monthly_price['ROI'] + monthly_price['ROI'].std()
        tickers_prices[ticker] = {
            'Expected Return': monthly_price['ROI'].mean(),
            'Risk': monthly_price['ROI'].std(),
            'Data_length': len(monthly_price['ROI']),
            'Data': {
                'Changes': monthly_price['ROI'].values.tolist(),
                'Margin of Error': monthly_price[['lower', 'upper']].values.tolist()
            }
        }
        tickers_data.append([ticker, monthly_price['ROI'].mean(), monthly_price['ROI'].std(), len(monthly_price['ROI'])])
        tickers_df[ticker] = monthly_price
    except Exception as e:
        # Report the failing ticker and carry on with the rest
        print(e)
        print(ticker)
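
The scatter plot code further down reads from a DataFrame called df_tickers_data, which the walkthrough never builds. One plausible way to create it from the tickers_data list is sketched here. The 'Ticker', 'Expected Return' and 'Risk' column names come from the scatter plot code; the fourth column name and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Rows shaped like the entries appended to tickers_data in the loop:
# [ticker, mean ROI, std of ROI, number of observations].
# The tickers and numbers are placeholders, not real results.
tickers_data = [
    ["AAAA.JK", 0.10, 1.50, 124],
    ["BBBB.JK", -0.05, 2.75, 124],
]

df_tickers_data = pd.DataFrame(
    tickers_data,
    columns=["Ticker", "Expected Return", "Risk", "Data Length"],  # last name is an assumption
)
print(df_tickers_data)
```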

Next, we can visualize a sample of stocks to observe their volatility using the following code:

sample_tickers = ['BBCA.JK', 'UNVR.JK', 'ASII.JK', 'TLKM.JK']
counter = 0
figure, axes = plt.subplots(2, 2, figsize=(15, 15))
figure.suptitle('Sample of Stock Volatility/Risk', fontsize=20)
for i in range(2):
    for j in range(2):
        data = tickers_df[sample_tickers[counter]]['ROI']
        axes[i, j].plot(data, '-')
        axes[i, j].axhline(y=data.mean(), label='Mean', color='r')
        axes[i, j].fill_between(data.index, (data.mean() - data.std()), (data.mean() + data.std()), color='b', alpha=0.1, label='Volatility')
        axes[i, j].set_title(f'{sample_tickers[counter]}')
        counter += 1
plt.legend()
plt.show()

And the visualization result looks like this:

[Figure: ROI charts for the sample stocks, each with mean line and volatility band]

For the visualization above, we took only a few sample stocks for comparison. ESSA shows a very high level of volatility, with a standard deviation exceeding 15, so compared with the other sample stocks it can be considered quite volatile, or risky.

However, this visualization has limitations. Comparing every stock in the LQ45 index this way would mean reading dozens of separate line charts. To make it easier to see which stocks are risky and how their returns compare, we can use a scatter plot instead.

Note that here “risk” means volatility, measured as the standard deviation of ROI (Return on Investment), and “expected return” means the mean ROI. We use these terms because they match the terminology commonly used in investing.

To create a scatter plot, we can use the following code:

plt.figure(figsize=(20,10))
plt.plot(df_tickers_data['Expected Return'], df_tickers_data['Risk'], '.', markersize=20)
plt.title('Expected Return VS Risk', fontsize=20)
for i in range(len(df_tickers_data)):
    txt=df_tickers_data['Ticker'][i]
    x=df_tickers_data['Expected Return'][i]
    y=df_tickers_data['Risk'][i]
    plt.annotate(txt,(x-0.01, y +0.01), fontsize=12)
plt.axhline(y=df_tickers_data['Risk'].mean(),label= 'Risk Mean', color='r')
plt.vlines(x=df_tickers_data['Expected Return'].mean(),label='Expected Return Mean', color='y', ymin=(min(df_tickers_data['Risk'])), ymax=(max(df_tickers_data['Risk'])))
plt.xlabel('Expected Return', fontsize=16)
plt.ylabel('Risk', fontsize=16)
plt.legend()
plt.show()

And the result is:

[Figure: scatter plot of expected return versus risk for all tickers]

The scatter plot above lets us compare the risk and return of many stocks at once, and makes it easier to identify which stocks have above-average returns or below-average risk. The vertical yellow line marks the average expected return: stocks to its right have an above-average return. The red horizontal line marks the average risk: stocks above it carry above-average risk.

To extract more information from the scatter plot, we first need to understand the concepts of risk aversion and risk-taking. Risk-averse individuals have a low tolerance for risk, so they invest in stocks with the lowest risk but reasonable returns. Risk-takers, on the other hand, choose stocks with high risk and high returns, because stock investing fundamentally follows the concept of “high risk, high return.” With expected return on the horizontal axis and risk on the vertical axis, the two average lines split the plot into four quadrants. For risk-averse individuals we can recommend stocks in quadrant 4, the lower right, with above-average return and below-average risk (e.g., INDF, UNTR, PTBA, BBCA, TLKM, etc.), while risk-takers will choose stocks in quadrant 1, the upper right, with above-average return and above-average risk (e.g., ADRO, ANTM, INCO, etc.).
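
The quadrant reading can also be done in code rather than by eye. The sketch below is one possible way to label each stock by comparing it with the two averages. The tickers and figures are placeholders, not results from the analysis, and the quadrant helper is a hypothetical name.

```python
import pandas as pd

# Placeholder figures for hypothetical tickers; not real results.
df = pd.DataFrame({
    "Ticker": ["AAAA", "BBBB", "CCCC", "DDDD"],
    "Expected Return": [0.30, -0.10, -0.20, 0.25],
    "Risk": [3.0, 2.8, 1.0, 1.2],
})

mean_return = df["Expected Return"].mean()
mean_risk = df["Risk"].mean()

def quadrant(row):
    """Quadrant number with return on the x-axis and risk on the y-axis."""
    high_return = row["Expected Return"] > mean_return
    high_risk = row["Risk"] > mean_risk
    if high_return and high_risk:
        return 1  # upper right: high return, high risk
    if not high_return and high_risk:
        return 2  # upper left: low return, high risk
    if not high_return and not high_risk:
        return 3  # lower left: low return, low risk
    return 4      # lower right: high return, low risk

df["Quadrant"] = df.apply(quadrant, axis=1)
print(df[["Ticker", "Quadrant"]])
```

Filtering on df["Quadrant"] == 4 would then list the candidates for a risk-averse investor, and df["Quadrant"] == 1 those for a risk-taker.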

After analyzing stock data for long-term investment purposes, we can then use candlestick analysis for short-term technical stock analysis. This analysis is typically used by traders who aim to capture signals of stock price fluctuations with the intention of short-term buying and selling or short selling.

In Python, to visualize these candlestick patterns, we can use the following code:

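import talib as ta
# Jupyter/IPython only: the leading ? shows the function's help text
?ta.CDLENGULFING
# ".JK" suffix added to match the Yahoo Finance tickers used for the other BEI stocks
data=pdr.get_data_yahoo("BBCA.JK", start="2023-06-01", end="2023-09-01")
data['ENGULFING']=ta.CDLENGULFING(data['Open'],data['High'],data['Low'],data['Close'])
data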

To perform technical analysis with candlesticks, we need a supporting library, TA-Lib. We then retrieve the stock data with pdr.get_data_yahoo; here BBCA is used as the example stock. Since technical analysis is typically used for short-term analysis, the data period for BBCA is narrowed down to 3 months of history. The ENGULFING column holds 100 where a bullish engulfing pattern is detected, -100 for a bearish one, and 0 otherwise.

import mplfinance as mpf
fig,axs=plt.subplots(2,1, gridspec_kw={'height_ratios':[3,1]}, figsize=(20,10))
colors=mpf.make_marketcolors(up='#00ff00', down='#ff0000')
mpl_style=mpf.make_mpf_style(base_mpf_style='yahoo', marketcolors=colors)
# Candlesticks on the top panel, engulfing signal on the bottom panel
mpf.plot(data, type='candle', ax=axs[0], style=mpl_style)
axs[1].plot(data['ENGULFING'], color='blue')

The code above creates the candlestick visualization, for which we need to import mplfinance. The resulting candlestick chart looks like this:

[Figure: BBCA candlestick chart with the ENGULFING signal below it]

To read the above candlestick, we need to understand the concepts of bearish and bullish in trading.

[Figure: illustration of bearish and bullish candlesticks]

Bearish (Bear Market):

When the market or a stock is considered bearish, it means there is negative sentiment among investors, and stock prices tend to decline.
Bearish investors believe that stock prices will decrease in the future, and they may be inclined to sell or short sell those stocks.
Candlestick chart patterns representing bearish sentiment may include patterns such as “bearish engulfing”, “dark cloud cover”, or “shooting star”.

Bullish (Bull Market):

When the market or a stock is considered bullish, it means there is positive sentiment among investors, and stock prices tend to rise.
Bullish investors believe that stock prices will increase in the future, and they may be inclined to buy or hold onto those stocks.
Candlestick chart patterns representing bullish sentiment may include patterns such as “bullish engulfing”, “hammer”, or “morning star”.
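
To make the “bullish engulfing” and “bearish engulfing” patterns named above more concrete, here is a simplified sketch of how they could be flagged with plain pandas. It assumes a common textbook definition (the current candle's body covers the previous candle's body and points the opposite way); TA-Lib's own rules may differ in detail, and the candles are made up.

```python
import pandas as pd

# Made-up candles for illustration only.
candles = pd.DataFrame({
    "Open":  [10.0, 9.4, 10.6, 10.9],
    "Close": [9.5, 10.2, 10.8, 10.5],
})

prev_open = candles["Open"].shift(1)
prev_close = candles["Close"].shift(1)

bullish = (
    (prev_close < prev_open)                 # previous candle closed down
    & (candles["Close"] > candles["Open"])   # current candle closes up
    & (candles["Open"] <= prev_close)        # opens at or below the previous close
    & (candles["Close"] >= prev_open)        # closes at or above the previous open
)
bearish = (
    (prev_close > prev_open)                 # previous candle closed up
    & (candles["Close"] < candles["Open"])   # current candle closes down
    & (candles["Open"] >= prev_close)        # opens at or above the previous close
    & (candles["Close"] <= prev_open)        # closes at or below the previous open
)

# Same sign convention as the ENGULFING column: 100 bullish, -100 bearish, 0 none.
candles["ENGULFING"] = bullish.astype(int) * 100 - bearish.astype(int) * 100
print(candles)
```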

In essence, the tools available for stock data analysis are quite diverse, but Python greatly simplifies the process of managing and visualizing the data. With just a few imported libraries, it can process many kinds of stock data in large volumes. Loops in Python also make it easy to produce results and visualizations for many stocks within a single block of code.

Disclaimer

Credit to kitangoding1739: I used the code provided by kitangoding1739, with some modifications, different stocks, and a different time frame.

Source

https://www.youtube.com/watch?v=N9NqTp_D_bw
https://www.youtube.com/@kitangoding1739
https://finance.yahoo.com/
https://www.youtube.com/watch?v=30BaSfz0FGE (how to install Ta-Lib)