BOBOBK

Fetching Stock Data Using Python's yfinance Package


Historical stock prices are an important kind of time series data and play a significant role in data science. Let's start by learning how to handle time series data, in preparation for later stock prediction and analysis. Of course, data analysis is impossible without data, so the first question is: how do we obtain stock data for machine learning analysis?

Thanks to the market data Yahoo Finance makes publicly available, getting stock data has become simple and efficient. The author has found a very convenient Python package for this called yfinance. For convenience, this article uses Microsoft (ticker MSFT) as the running example. The steps are as follows:

  • Install yfinance package
  • Basic usage tutorial of yfinance

1. Installing the yfinance package

You can install the package directly with pip. yfinance depends on packages widely used in data analysis, such as pandas and numpy, as well as requests, which it uses to fetch data from Yahoo. pip installs these dependencies automatically along with yfinance, but you can also install all of them explicitly as follows:

#!/usr/bin/env bash
pip install numpy pandas requests yfinance

If you are using a conda environment, you can also use the following command to install:

#!/usr/bin/env bash
conda install -c ranaroussi yfinance

After installation, import the package to confirm that it works:

import yfinance

If no errors appear, installation is successful.
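A failed import only tells you that something is missing. As a small optional check, the sketch below uses Python's standard importlib.metadata module (Python 3.8+) to report the installed versions of yfinance and the packages it relies on, without importing them. The helper name installed_versions is hypothetical, not part of yfinance:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("yfinance", "pandas", "numpy", "requests")):
    """Map each package name to its installed version string, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

Because it reads package metadata instead of importing the packages, this check also works when an import would fail for some other reason.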

2. Basic usage tutorial of yfinance

For the full details of yfinance's data structures, see the official yfinance GitHub repository. To make the explanation easier to follow, this article splits the code into the following annotated parts:

  • Ticker() module
  • Downloading single company’s stock data
  • Downloading multiple companies’ stock data

2.1 Ticker() module

The Ticker() object provides metadata about a single stock, such as company information, price history, dividends and splits, and shareholders. The comments in the code below explain each attribute.

import yfinance as yf

# Create a Ticker object for Microsoft
msft = yf.Ticker("MSFT")
# Get basic stock info (a dict), including address, phone, industry, etc.
msft.info
# Full price history at a daily interval
hist = msft.history(interval="1d", period="max")
hist
"""
        Open    High    Low     Close   Volume  Dividends   Stock Splits
Date                            
1986-03-13  0.06    0.07    0.06    0.06    1031788800  0.0         0.0
1986-03-14  0.06    0.07    0.06    0.06    308160000   0.0         0.0
1986-03-17  0.06    0.07    0.06    0.07    133171200   0.0         0.0
1986-03-18  0.07    0.07    0.06    0.06    67766400    0.0         0.0
1986-03-19  0.06    0.06    0.06    0.06    47894400    0.0         0.0
...         ...     ...     ...     ...     ...        ...         ...
2020-05-04  174.49  179.00  173.80  178.84  30372900   0.0         0.0
2020-05-05  180.62  183.65  179.90  180.76  36839200   0.0         0.0
2020-05-06  182.08  184.20  181.63  182.54  32139300   0.0         0.0
2020-05-07  184.17  184.55  182.58  183.60  28316000   0.0         0.0
2020-05-08  184.98  185.00  183.36  184.68  30877800   0.0         0.0
"""
# The last rows are the most recent trading days available when the data was fetched
# Dividends and stock splits
msft.actions
"""
        Dividends   Stock Splits
Date        
1987-09-21  0.00    2.0
1990-04-16  0.00    2.0
1991-06-27  0.00    1.5
1992-06-15  0.00    1.5
1994-05-23  0.00    2.0
...         ...     ...
2019-02-20  0.46    0.0
2019-05-15  0.46    0.0
2019-08-14  0.46    0.0
2019-11-20  0.51    0.0
2020-02-19  0.51    0.0
"""
# Only dividends info
msft.dividends
"""
Date
2003-02-19    0.08
2003-10-15    0.16
2004-08-23    0.08
2004-11-15    3.08
2005-02-15    0.08
              ... 
2019-02-20    0.46
2019-05-15    0.46
2019-08-14    0.46
2019-11-20    0.51
2020-02-19    0.51
"""
# Major shareholders percentage
msft.major_holders
"""

0    1
0    1.42%    % of Shares Held by All Insider
1    74.09%   % of Shares Held by Institutions
2    75.16%   % of Float Held by Institutions
3    4630     Number of Institutions Holding Shares
"""
# Institutional holders
msft.institutional_holders
"""
   Holder                             Shares     Date Reported  % Out   Value
0  Vanguard Group, Inc. (The)         623667281  2019-12-30     0.0822  98352330213
1  Blackrock Inc.                     517578906  2020-03-30     0.0683  81627369265
2  State Street Corporation           315672520  2019-12-30     0.0416  49781556404
3  FMR, LLC                           239124143  2019-12-30     0.0315  37709877351
4  Capital World Investors            180557630  2019-12-30     0.0238  28473938251
5  Price (T.Rowe) Associates Inc      175036277  2019-12-30     0.0231  27603220882
6  Geode Capital Management, LLC      113401519  2019-12-30     0.0150  17883419546
7  Capital International Investors    99996798   2019-12-30     0.0132  15769495044
8  Northern Trust Corporation         93192050   2019-12-30     0.0123  14696386285
9  Capital Research Global Investors  92776236   2019-12-30     0.0122  14630812417
"""
# These are the most important attributes;
# the others are less relevant for price analysis
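The DataFrame returned by history() and the Series returned by dividends are ordinary pandas objects, so standard pandas operations apply directly. As a minimal sketch, the following computes day-over-day returns from the Close column and total dividends per calendar year. The helper names daily_returns and annual_dividends are hypothetical, not part of yfinance, and to keep the example runnable offline it uses a few rows copied from the outputs above instead of calling Yahoo:

```python
import pandas as pd


def daily_returns(hist: pd.DataFrame) -> pd.Series:
    """Fractional change in the Close price from one trading day to the next."""
    return hist["Close"].pct_change().dropna()


def annual_dividends(dividends: pd.Series) -> pd.Series:
    """Total dividend paid per calendar year (the Series needs a DatetimeIndex)."""
    return dividends.groupby(dividends.index.year).sum()


if __name__ == "__main__":
    # A few rows copied from the msft.history() output above
    hist = pd.DataFrame(
        {"Close": [178.84, 180.76, 182.54, 183.60, 184.68]},
        index=pd.to_datetime(
            ["2020-05-04", "2020-05-05", "2020-05-06", "2020-05-07", "2020-05-08"]
        ),
    )
    # A few rows copied from the msft.dividends output above
    dividends = pd.Series(
        [0.46, 0.46, 0.46, 0.51, 0.51],
        index=pd.to_datetime(
            ["2019-02-20", "2019-05-15", "2019-08-14", "2019-11-20", "2020-02-19"]
        ),
    )
    print(daily_returns(hist))
    print(annual_dividends(dividends))
```

With live data, you would pass msft.history(period="max") and msft.dividends to the same helpers.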

2.2 Downloading single company’s stock data

With the structure of the basic stock information understood, the next step is to download the actual price data as a pandas DataFrame for analysis.

#!/usr/bin/env python
# Download Microsoft's daily stock data starting from 2000-01-01

import yfinance as yf
msft = yf.download("MSFT", start="2000-01-01")
"""
[*********************100%***********************]  1 of 1 completed
msft
        Open      High      Low       Close     Adj Close  Volume
Date                        
1999-12-31  58.750000  58.875000  58.125000  58.375000  37.453701  12517600
2000-01-03  58.687500  59.312500  56.000000  58.281250  37.393559  53228400
2000-01-04  56.781250  58.562500  56.125000  56.312500  36.130390  54119000
2000-01-05  55.562500  58.187500  54.687500  56.906250  36.511333  64059600
2000-01-06  56.093750  56.937500  54.187500  55.000000  35.288280  54976600
...         ...       ...       ...       ...       ...       ...
2020-05-04  174.490005 179.000000 173.800003 178.839996 178.839996 30372900
2020-05-05  180.619995 183.649994 179.899994 180.759995 180.759995 36839200
2020-05-06  182.080002 184.199997 181.630005 182.539993 182.539993 32139300
2020-05-07  184.169998 184.550003 182.580002 183.600006 183.600006 28316000
2020-05-08  184.979996 185.000000 183.360001 184.679993 184.679993 30877800
"""

2.3 Downloading multiple companies’ stock data

yf.download() can also fetch several stocks in one call. The example below downloads Microsoft and Apple data; group_by='ticker' groups the columns by ticker rather than by price field:

data = yf.download(tickers="MSFT AAPL", start="2019-01-01", group_by="ticker")
data
"""
        MSFT                                            AAPL
Open    High    Low     Close   Adj Close Volume   Open    High    Low     Close   Adj Close Volume
Date                                                                                            
2018-12-31  101.290001  102.400002  100.440002  101.570000  99.817421  33173800  158.529999  159.360001  156.479996  157.740005  154.618546  35003500
2019-01-02  99.550003   101.750000  98.940002   101.120003  99.375191  35329300  154.889999  158.850006  154.229996  157.919998  154.794983  37039700
2019-01-03  100.099998  100.190002  97.199997   97.400002   95.719376  42579100  143.979996  145.720001  142.000000  142.190002  139.376251  91312200
2019-01-04  99.720001   102.510002  98.930000   101.930000  100.171211 44060600  144.529999  148.550003  143.800003  148.259995  145.326126  58607100
2019-01-07  101.639999  103.269997  100.980003  102.059998  100.298965 35656100  148.699997  148.830002  145.899994  147.929993  145.002686  54777800
...         ...        ...        ...        ...        ...       ...       ...        ...        ...        ...        ...       ...
2020-05-04  174.490005  179.000000  173.800003  178.839996  178.839996 30372900  289.170013  293.690002  286.320007  293.160004  292.368561  33392000
2020-05-05  180.619995  183.649994  179.899994  180.759995  180.759995 36839200  295.059998  301.000000  294.459991  297.559998  296.756683  36937800
2020-05-06  182.080002  184.199997  181.630005  182.539993  182.539993 32139300  300.459991  303.239990  298.869995  300.630005  299.818390  35583400
2020-05-07  184.169998  184.550003  182.580002  183.600006  183.600006 28316000  303.220001  305.170013  301.970001  303.739990  302.919983  28803800
2020-05-08  184.979996  185.000000  183.360001  184.679993  184.679993 30877800  305.640015  310.350006  304.290009  310.130005  310.130005  33459600
"""

Summary

This article introduced how to use the yfinance package to fetch stock data in preparation for later valuation and prediction work. Because yfinance fetches its data from Yahoo's servers, access may be slow from within China; running the code on an overseas server or through a proxy is recommended.
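Slow connections also make occasional failed downloads more likely. As a minimal sketch, and not a yfinance feature, a small retry helper with exponential backoff can wrap the download. The names fetch_with_retry, attempts, base_delay, and is_valid are hypothetical. Since a failed yf.download() call can, depending on the version, return an empty DataFrame instead of raising an exception, the helper also accepts a check on the result:

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=2.0,
                     is_valid=lambda result: True, sleep=time.sleep):
    """Call fetch() up to `attempts` times, waiting base_delay, 2*base_delay, ...
    seconds between tries. A result failing is_valid() counts as a failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            result = fetch()
            if is_valid(result):
                return result
            last_error = RuntimeError("fetch returned an invalid result")
        except Exception as exc:  # network errors vary by library version
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise last_error
```

For example: msft = fetch_with_retry(lambda: yf.download("MSFT", start="2000-01-01"), is_valid=lambda df: not df.empty). The sleep parameter exists only so the waiting can be replaced in tests.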
