Historical stock data is an important kind of time series data in data science. This article begins with how to handle such data, as preparation for later stock prediction and analysis. Analysis cannot happen without data, so the first question is how to obtain stock data for machine learning.
Yahoo Finance makes stock data available through its API, which makes getting stock data simple and efficient. The author found a convenient Python package, yfinance, for fetching it. For ease of explanation, this article uses Microsoft stock (ticker MSFT) as the example. The steps are as follows:
- Installing the yfinance package
- Basic usage tutorial of yfinance
1. Installing the yfinance package
You can install the package directly with pip. yfinance relies on packages widely used in data analysis, such as pandas and numpy, and on requests to fetch data from Yahoo’s API. Run the following commands to install all the required packages:
#!/usr/bin/env bash
pip install numpy
pip install pandas
pip install requests
pip install yfinance
If you use a conda environment, you can install the package with the following command instead:
#!/usr/bin/env bash
conda install -c ranaroussi yfinance
After installation, import to confirm success:
import yfinance
If no errors appear, the installation succeeded.
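As a slightly more explicit check than a bare import, the sketch below reports which of the packages named above can be found in the current environment. It uses only the standard library; the helper name `check_packages` is hypothetical and not part of yfinance.

```python
import importlib.util

# Packages named in the installation step above.
REQUIRED = ["numpy", "pandas", "requests", "yfinance"]


def check_packages(names):
    """Return {package name: True if the package can be located for import}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```

Any package reported as MISSING can be installed with the pip or conda commands shown earlier.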
2. Basic usage tutorial of yfinance
To understand the data structures yfinance returns, see the official yfinance GitHub repository. To make it easier to follow, this article simplifies and annotates the code, split into several parts:
- Ticker() module
- Downloading a single company’s stock data
- Downloading multiple companies’ stock data
2.1 Ticker() module
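This part covers metadata about the stock, including basic company information, shareholders, and so on. The comments in the code below explain each attribute.
import yfinance as yf
# Create a Ticker object for Microsoft
msft = yf.Ticker("MSFT")
# Get basic stock info, including address, phone, industry, etc.
msft.info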
# Historical prices, using a daily interval as an example
hist = msft.history(interval="1d", period="max")
hist
"""
              Open    High     Low   Close      Volume  Dividends  Stock Splits
Date
1986-03-13    0.06    0.07    0.06    0.06  1031788800        0.0           0.0
1986-03-14    0.06    0.07    0.06    0.06   308160000        0.0           0.0
1986-03-17    0.06    0.07    0.06    0.07   133171200        0.0           0.0
1986-03-18    0.07    0.07    0.06    0.06    67766400        0.0           0.0
1986-03-19    0.06    0.06    0.06    0.06    47894400        0.0           0.0
...            ...     ...     ...     ...         ...        ...           ...
2020-05-04  174.49  179.00  173.80  178.84    30372900        0.0           0.0
2020-05-05  180.62  183.65  179.90  180.76    36839200        0.0           0.0
2020-05-06  182.08  184.20  181.63  182.54    32139300        0.0           0.0
2020-05-07  184.17  184.55  182.58  183.60    28316000        0.0           0.0
2020-05-08  184.98  185.00  183.36  184.68    30877800        0.0           0.0
"""
# The history runs through the most recent trading day
# Dividend and stock split history
msft.actions
"""
            Dividends  Stock Splits
Date
1987-09-21       0.00           2.0
1990-04-16       0.00           2.0
1991-06-27       0.00           1.5
1992-06-15       0.00           1.5
1994-05-23       0.00           2.0
...               ...           ...
2019-02-20       0.46           0.0
2019-05-15       0.46           0.0
2019-08-14       0.46           0.0
2019-11-20       0.51           0.0
2020-02-19       0.51           0.0
"""
# Dividends only
msft.dividends
"""
Date
2003-02-19 0.08
2003-10-15 0.16
2004-08-23 0.08
2004-11-15 3.08
2005-02-15 0.08
...
2019-02-20 0.46
2019-05-15 0.46
2019-08-14 0.46
2019-11-20 0.51
2020-02-19 0.51
"""
# Ownership breakdown among major holders
msft.major_holders
"""
        0                                      1
0   1.42%        % of Shares Held by All Insider
1  74.09%       % of Shares Held by Institutions
2  75.16%        % of Float Held by Institutions
3    4630  Number of Institutions Holding Shares
"""
# Institutional holders
msft.institutional_holders
"""
   Holder                                Shares  Date Reported   % Out        Value
0  Vanguard Group, Inc. (The)         623667281     2019-12-30  0.0822  98352330213
1  Blackrock Inc.                     517578906     2020-03-30  0.0683  81627369265
2  State Street Corporation           315672520     2019-12-30  0.0416  49781556404
3  FMR, LLC                           239124143     2019-12-30  0.0315  37709877351
4  Capital World Investors            180557630     2019-12-30  0.0238  28473938251
5  Price (T.Rowe) Associates Inc      175036277     2019-12-30  0.0231  27603220882
6  Geode Capital Management, LLC      113401519     2019-12-30  0.0150  17883419546
7  Capital International Investors     99996798     2019-12-30  0.0132  15769495044
8  Northern Trust Corporation          93192050     2019-12-30  0.0123  14696386285
9  Capital Research Global Investors   92776236     2019-12-30  0.0122  14630812417
"""
# These are the most important attributes;
# the others are less relevant here
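As one example of working with these attributes, the sketch below totals dividend payments per calendar year. To stay runnable offline, it rebuilds a small Series from the last five rows of the `msft.dividends` output shown above; with network access, `msft.dividends` itself could presumably be passed in. The helper name `dividends_per_year` is hypothetical.

```python
import pandas as pd

# Sample values copied from the msft.dividends output above (last five rows).
dividends = pd.Series(
    [0.46, 0.46, 0.46, 0.51, 0.51],
    index=pd.to_datetime(
        ["2019-02-20", "2019-05-15", "2019-08-14", "2019-11-20", "2020-02-19"]
    ),
    name="Dividends",
)


def dividends_per_year(series):
    """Sum dividend payments by the calendar year of their date index."""
    return series.groupby(series.index.year).sum()


annual = dividends_per_year(dividends)
print(annual)
```

Note that the 2020 total covers only the single payment present in the sample, not a full year.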
2.2 Downloading a single company’s stock data
With the structure of the basic stock information understood, the next step is to download the actual price data as a DataFrame for analysis.
#!/usr/bin/env python
# Download only the price data: Microsoft stock prices since 2000
import yfinance as yf
msft = yf.download("MSFT", start="2000-01-01")
# [*********************100%***********************] 1 of 1 completed
msft
"""
                  Open        High         Low       Close   Adj Close    Volume
Date
1999-12-31   58.750000   58.875000   58.125000   58.375000   37.453701  12517600
2000-01-03   58.687500   59.312500   56.000000   58.281250   37.393559  53228400
2000-01-04   56.781250   58.562500   56.125000   56.312500   36.130390  54119000
2000-01-05   55.562500   58.187500   54.687500   56.906250   36.511333  64059600
2000-01-06   56.093750   56.937500   54.187500   55.000000   35.288280  54976600
...                ...         ...         ...         ...         ...       ...
2020-05-04  174.490005  179.000000  173.800003  178.839996  178.839996  30372900
2020-05-05  180.619995  183.649994  179.899994  180.759995  180.759995  36839200
2020-05-06  182.080002  184.199997  181.630005  182.539993  182.539993  32139300
2020-05-07  184.169998  184.550003  182.580002  183.600006  183.600006  28316000
2020-05-08  184.979996  185.000000  183.360001  184.679993  184.679993  30877800
"""
2.3 Downloading multiple companies’ stock data
The package can also download data for several stocks at once. Here Apple and Microsoft serve as the example; with group_by='ticker', the columns are grouped under each ticker:
data = yf.download(tickers="MSFT AAPL", start="2019-01-01", group_by='ticker')
data
"""
                  MSFT                                                                  AAPL
                  Open        High         Low       Close   Adj Close    Volume        Open        High         Low       Close   Adj Close    Volume
Date
2018-12-31  101.290001  102.400002  100.440002  101.570000   99.817421  33173800  158.529999  159.360001  156.479996  157.740005  154.618546  35003500
2019-01-02   99.550003  101.750000   98.940002  101.120003   99.375191  35329300  154.889999  158.850006  154.229996  157.919998  154.794983  37039700
2019-01-03  100.099998  100.190002   97.199997   97.400002   95.719376  42579100  143.979996  145.720001  142.000000  142.190002  139.376251  91312200
2019-01-04   99.720001  102.510002   98.930000  101.930000  100.171211  44060600  144.529999  148.550003  143.800003  148.259995  145.326126  58607100
2019-01-07  101.639999  103.269997  100.980003  102.059998  100.298965  35656100  148.699997  148.830002  145.899994  147.929993  145.002686  54777800
...                ...         ...         ...         ...         ...       ...         ...         ...         ...         ...         ...       ...
2020-05-04  174.490005  179.000000  173.800003  178.839996  178.839996  30372900  289.170013  293.690002  286.320007  293.160004  292.368561  33392000
2020-05-05  180.619995  183.649994  179.899994  180.759995  180.759995  36839200  295.059998  301.000000  294.459991  297.559998  296.756683  36937800
2020-05-06  182.080002  184.199997  181.630005  182.539993  182.539993  32139300  300.459991  303.239990  298.869995  300.630005  299.818390  35583400
2020-05-07  184.169998  184.550003  182.580002  183.600006  183.600006  28316000  303.220001  305.170013  301.970001  303.739990  302.919983  28803800
2020-05-08  184.979996  185.000000  183.360001  184.679993  184.679993  30877800  305.640015  310.350006  304.290009  310.130005  310.130005  33459600
"""
Summary
This article introduced how to use the yfinance package to fetch stock data, in preparation for later valuation and prediction. Because the package fetches data through Yahoo’s API, access may be slow from China; running it on an overseas server or through a proxy is recommended.
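When access is slow, one possible complement to a proxy is to cache each download on disk so the data is fetched only once. The sketch below is an assumption of this rewrite rather than a yfinance feature: `load_or_fetch` is a hypothetical helper, and it assumes the frame has a date index and flat, single-level columns. A stand-in fetch function keeps the demonstration offline.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_fetch(path, fetch):
    """Read prices from a CSV cache if it exists; otherwise call fetch(),
    save the result to the cache, and return it."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path, index_col=0, parse_dates=True)
    frame = fetch()
    frame.to_csv(path)
    return frame


# Offline demonstration: a stand-in fetch function that records each call.
calls = []


def fake_fetch():
    calls.append(1)
    return pd.DataFrame(
        {"Close": [183.60, 184.68]},
        index=pd.to_datetime(["2020-05-07", "2020-05-08"]),
    )


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "msft.csv"
    first = load_or_fetch(cache, fake_fetch)   # calls fake_fetch, writes the CSV
    second = load_or_fetch(cache, fake_fetch)  # reads the CSV instead

print(f"fetch calls: {len(calls)}")
```

With network access, the fetch argument could be something like `lambda: yf.download("MSFT", start="2000-01-01")`, provided the returned frame meets the flat-column assumption above.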