Compare commits

...

8 Commits

Author SHA1 Message Date
Ran Aroussi a6554c637a Delete CNAME 2023-05-10 16:07:32 +01:00
Ran Aroussi 5f3ab6b893 Create CNAME 2023-05-10 14:42:28 +01:00
silvavn ccd5d95566 Documented util.py 2021-10-13 12:04:17 -03:00
silvavn a53349f886 added advanced usage 2021-10-13 09:36:28 -03:00
silvavn 37c6ca1086 Merge branch 'main' into documentation 2021-10-13 09:27:46 -03:00
silvavn 852ef93fa3 fixing documentation 2021-10-13 09:26:37 -03:00
Ran Aroussi 932b3a1731 Delete _config.yml 2021-06-27 12:03:04 +01:00
Ran Aroussi 2339aade13 Set theme jekyll-theme-minimal 2021-06-27 11:59:06 +01:00
6 changed files with 235 additions and 151 deletions

.gitignore

@@ -6,6 +6,4 @@ yfinance.egg-info
.coverage
.vscode/
build/
*.html
*.css
*.png
site/

advancedUsage.md

@@ -0,0 +1,132 @@
Advanced Usage
==============
Using Proxies
-------------
If you want to use a proxy server for downloading data, use:
``` python
import yfinance as yf
msft = yf.Ticker("MSFT")
msft.history(..., proxy="PROXY_SERVER")
msft.get_actions(proxy="PROXY_SERVER")
msft.get_dividends(proxy="PROXY_SERVER")
msft.get_splits(proxy="PROXY_SERVER")
msft.get_balance_sheet(proxy="PROXY_SERVER")
msft.get_cashflow(proxy="PROXY_SERVER")
msft.option_chain(..., proxy="PROXY_SERVER")
...
```
To use a custom `requests` session (for example to cache calls to the
API or customize the `User-agent` header), pass a `session=` argument to
the Ticker constructor.
``` python
import requests_cache
import yfinance as yf
session = requests_cache.CachedSession('yfinance.cache')
session.headers['User-agent'] = 'my-program/1.0'
ticker = yf.Ticker('msft', session=session)
# The scraped response will be stored in the cache
ticker.actions
```
To initialize multiple `Ticker` objects, use
``` python
import yfinance as yf
tickers = yf.Tickers('msft aapl goog')
# ^ returns a named tuple of Ticker objects
# access each ticker using (example)
tickers.tickers.MSFT.info
tickers.tickers.AAPL.history(period="1mo")
tickers.tickers.GOOG.actions
```
Fetching data for multiple tickers
----------------------------------
``` python
import yfinance as yf
data = yf.download("SPY AAPL", start="2017-01-01", end="2017-04-30")
```
I've also added some options to make life easier :)
``` python
import yfinance as yf
data = yf.download( # or pdr.get_data_yahoo(...
# tickers list or string as well
tickers="SPY AAPL MSFT",
# use "period" instead of start/end
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# (optional, default is '1mo')
period="ytd",
# fetch data by interval (including intraday if period < 60 days)
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# (optional, default is '1d')
interval="1m",
# group by ticker (to access via data['SPY'])
# (optional, default is 'column')
group_by="ticker",
# adjust all OHLC automatically
# (optional, default is False)
auto_adjust=True,
# download pre/post regular market hours data
# (optional, default is False)
prepost=True,
# use threads for mass downloading? (True/False/Integer)
# (optional, default is True)
threads=True,
    # proxy URL scheme to use when downloading
# (optional, default is None)
proxy=None
)
```
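With `group_by="ticker"` set as above, each symbol's columns can then be pulled out by name. A brief illustration (assuming the call above returned data for these symbols):
``` python
# with group_by="ticker" the first column level is the ticker symbol
spy_close = data["SPY"]["Close"]

# with the default group_by="column" the levels are swapped: data["Close"]["SPY"]
print(spy_close.tail())
```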
Managing Multi-Level Columns
----------------------------
The following Stack Overflow answer addresses [How to deal with
multi-level column names downloaded with
yfinance?](https://stackoverflow.com/questions/63107801):
- `yfinance` returns a `pandas.DataFrame` with multi-level column
names, with a level for the ticker and a level for the stock price
data
- The answer discusses:
    - How to correctly read the multi-level columns after
      saving the dataframe to a csv with `pandas.DataFrame.to_csv`
    - How to download single or multiple tickers into a single
      dataframe with single level column names and a ticker column
      (a brief sketch of both ideas follows this list)
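A rough sketch of both ideas, assuming `group_by="ticker"` and an illustrative `prices.csv` filename (the linked answer's exact code may differ):
``` python
import pandas as pd
import yfinance as yf

# level 0 of the columns is the ticker, level 1 the price field
data = yf.download("SPY AAPL", start="2017-01-01", end="2017-04-30", group_by="ticker")
data.to_csv("prices.csv")

# read the CSV back: both header rows together form the column MultiIndex
restored = pd.read_csv("prices.csv", header=[0, 1], index_col=0, parse_dates=True)

# flatten: move the ticker level into the index, then expose it as a column
flat = (
    data.stack(level=0)
        .rename_axis(["Date", "Ticker"])
        .reset_index(level="Ticker")
)
```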
`pandas_datareader` override
----------------------------
If your code uses `pandas_datareader` and you want to download data
faster, you can "hijack" the `pandas_datareader.data.get_data_yahoo()`
method to use **yfinance** while making sure the returned data is in the
same format as **pandas\_datareader**'s `get_data_yahoo()`.
``` python
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)
# download dataframe
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")
```

installation.md

@@ -3,13 +3,13 @@ Installation
Install `yfinance` using `pip`:
``` {.sourceCode .bash}
``` bash
$ pip install yfinance --upgrade --no-cache-dir
```
Install `yfinance` using `conda`:
``` {.sourceCode .bash}
``` bash
$ conda install -c ranaroussi yfinance
```

quickstart.md

@@ -9,7 +9,7 @@ Pythonic way:
Note: yahoo finance datetimes are received as UTC.
``` {.sourceCode .python}
``` python
import yfinance as yf
msft = yf.Ticker("MSFT")
@@ -70,129 +70,4 @@ msft.options
# get option chain for specific expiration
opt = msft.option_chain('YYYY-MM-DD')
# data available via: opt.calls, opt.puts
```
If you want to use a proxy server for downloading data, use:
``` {.sourceCode .python}
import yfinance as yf
msft = yf.Ticker("MSFT")
msft.history(..., proxy="PROXY_SERVER")
msft.get_actions(proxy="PROXY_SERVER")
msft.get_dividends(proxy="PROXY_SERVER")
msft.get_splits(proxy="PROXY_SERVER")
msft.get_balance_sheet(proxy="PROXY_SERVER")
msft.get_cashflow(proxy="PROXY_SERVER")
msft.option_chain(..., proxy="PROXY_SERVER")
...
```
To use a custom `requests` session (for example to cache calls to the
API or customize the `User-agent` header), pass a `session=` argument to
the Ticker constructor.
``` {.sourceCode .python}
import requests_cache
session = requests_cache.CachedSession('yfinance.cache')
session.headers['User-agent'] = 'my-program/1.0'
ticker = yf.Ticker('msft', session=session)
# The scraped response will be stored in the cache
ticker.actions
```
To initialize multiple `Ticker` objects, use
``` {.sourceCode .python}
import yfinance as yf
tickers = yf.Tickers('msft aapl goog')
# ^ returns a named tuple of Ticker objects
# access each ticker using (example)
tickers.tickers.MSFT.info
tickers.tickers.AAPL.history(period="1mo")
tickers.tickers.GOOG.actions
```
Fetching data for multiple tickers
----------------------------------
``` {.sourceCode .python}
import yfinance as yf
data = yf.download("SPY AAPL", start="2017-01-01", end="2017-04-30")
```
I've also added some options to make life easier :)
``` {.sourceCode .python}
data = yf.download( # or pdr.get_data_yahoo(...
# tickers list or string as well
tickers = "SPY AAPL MSFT",
# use "period" instead of start/end
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# (optional, default is '1mo')
period = "ytd",
# fetch data by interval (including intraday if period < 60 days)
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# (optional, default is '1d')
interval = "1m",
# group by ticker (to access via data['SPY'])
# (optional, default is 'column')
group_by = 'ticker',
# adjust all OHLC automatically
# (optional, default is False)
auto_adjust = True,
# download pre/post regular market hours data
# (optional, default is False)
prepost = True,
# use threads for mass downloading? (True/False/Integer)
# (optional, default is True)
threads = True,
    # proxy URL scheme to use when downloading
# (optional, default is None)
proxy = None
)
```
Managing Multi-Level Columns
----------------------------
The following Stack Overflow answer addresses [How to deal with
multi-level column names downloaded with
yfinance?](https://stackoverflow.com/questions/63107801):
- `yfinance` returns a `pandas.DataFrame` with multi-level column
names, with a level for the ticker and a level for the stock price
data
- The answer discusses:
- How to correctly read the multi-level columns after
saving the dataframe to a csv with `pandas.DataFrame.to_csv`
- How to download single or multiple tickers into a single
dataframe with single level column names and a ticker column
`pandas_datareader` override
----------------------------
If your code uses `pandas_datareader` and you want to download data
faster, you can "hijack" the `pandas_datareader.data.get_data_yahoo()`
method to use **yfinance** while making sure the returned data is in the
same format as **pandas\_datareader**'s `get_data_yahoo()`.
``` {.sourceCode .python}
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)
# download dataframe
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")
```

mkdocs.yml

@@ -1,19 +1,26 @@
# site_name: My Docs
site_name: My Docs
# # mkdocs.yml
# theme:
# name: "material"
# mkdocs.yml
theme:
  name: "material"
# plugins:
# - search
# - mkdocstrings
plugins:
- search
- mkdocstrings
# nav:
# - Introduction: 'index.md'
# - Installation: 'installation.md'
# - Quick Start: 'quickstart.md'
# # - Ticker: 'Ticker.md'
# - TickerBase: 'TickerBase.md'
# # - Tickers: 'Tickers.md'
# - utils: 'utils.md'
# - multi: 'multi.md'
markdown_extensions:
- pymdownx.highlight
- pymdownx.inlinehilite
- pymdownx.superfences
- pymdownx.snippets
nav:
- Introduction: 'index.md'
- Installation: 'installation.md'
- Quick Start: 'quickstart.md'
- Advanced Usage: 'advancedUsage.md'
# - Ticker: 'Ticker.md'
- TickerBase: 'TickerBase.md'
# - Tickers: 'Tickers.md'
- utils: 'utils.md'
- multi: 'multi.md'

utils.py

@@ -26,7 +26,6 @@ import re as _re
import pandas as _pd
import numpy as _np
import sys as _sys
import re as _re
try:
import ujson as _json
@@ -34,9 +33,15 @@ except ImportError:
import json as _json
user_agent_headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
user_agent_headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
def empty_df(index=[]):
'''
The "empty_df" function creates a pandas dataframe with the index being the dates, and columns including open, high, low, close, adj close and volume.
It is used to create an empty dataframe that will be filled later on.
'''
empty = _pd.DataFrame(index=index, data={
'Open': _np.nan, 'High': _np.nan, 'Low': _np.nan,
'Close': _np.nan, 'Adj Close': _np.nan, 'Volume': _np.nan})
@@ -45,12 +50,25 @@ def empty_df(index=[]):
def get_html(url, proxy=None, session=None):
'''
url: the website you want to visit.
proxy: a dictionary of your proxies, like {'http': 'http://127.0.0.1:1080', 'https': 'https://127.0.0.1:1080'}
session: if you have already opened a session with your proxies, then pass it in here.
'''
session = session or _requests
html = session.get(url=url, proxies=proxy, headers=user_agent_headers).text
return html
def get_json(url, proxy=None, session=None):
def get_json(url: str, proxy: dict = None, session=None):
'''
url: the website we want to get JSON from.
proxy: an optional proxy mapping passed through to requests,
e.g., {"http": "http://10.10.1.10:3128"}
session: an existing requests-style session (for example a cached or proxied one) to reuse for the request.
The function returns a dictionary of data on success, otherwise an empty dictionary.
'''
session = session or _requests
html = session.get(url=url, proxies=proxy, headers=user_agent_headers).text
@@ -78,6 +96,10 @@ def camel2title(o):
def auto_adjust(data):
'''
The "auto_adjust" function is used to adjust the dataframe according to the adjusted close price.
It takes in a dataframe as an argument and returns a new dataframe with adjusted prices for all columns except volume.
'''
df = data.copy()
ratio = df["Close"] / df["Adj Close"]
df["Adj Open"] = df["Open"] / ratio
@@ -98,7 +120,11 @@ def auto_adjust(data):
def back_adjust(data):
""" back-adjusted data to mimic true historical prices """
'''
The function takes in a dataframe and returns the same dataframe with back-adjusted "Open", "High", "Low" and "Close" columns.
A single per-row ratio is calculated by dividing the Adj Close price by the Close price.
Each price column is then multiplied by this ratio to get its back-adjusted value.
'''
df = data.copy()
ratio = df["Adj Close"] / df["Close"]
@@ -119,6 +145,12 @@ def back_adjust(data):
def parse_quotes(data, tz=None):
'''
The function takes in the data from the "get_data" function and parses it into a pandas DataFrame.
It uses the timestamps to index each row of data, with the OHLC, volume, and adjusted close all contained within one DataFrame.
If no timezone is specified for this function then it will default to UTC.
'''
timestamps = data["timestamp"]
ohlc = data["indicators"]["quote"][0]
volumes = ohlc["volume"]
@@ -148,6 +180,11 @@ def parse_quotes(data, tz=None):
def parse_actions(data, tz=None):
'''
The function takes in the data from "get_data" and then checks if there are any events (dividends or splits)
If so, it creates a pandas DataFrame for each type of event with the date as an index and uses the values as columns
It also converts all of the dates to datetime objects and sets them as timezone aware if a timezone was specified
'''
dividends = _pd.DataFrame(columns=["Dividends"])
splits = _pd.DataFrame(columns=["Stock Splits"])
@@ -180,6 +217,10 @@ def parse_actions(data, tz=None):
class ProgressBar:
def __init__(self, iterations, text='completed'):
'''
The "__init__" function is the constructor for the class.
It takes in the parameters that were passed into the class and stores them in variables.
'''
self.text = text
self.iterations = iterations
self.prog_bar = '[]'
@@ -189,6 +230,10 @@ class ProgressBar:
self.elapsed = 1
def completed(self):
"""
The "completed" function is a function that is called when the program is completed.
It prints out the progress bar and then ends the program.
"""
if self.elapsed > self.iterations:
self.elapsed = self.iterations
self.update_iteration(1)
@@ -197,6 +242,12 @@ class ProgressBar:
print()
def animate(self, iteration=None):
'''
The "animate" function is a function that is called to update the progress bar.
It takes in an optional parameter, "iteration".
If "iteration" is not passed in, then the function will increment the "elapsed" variable by 1.
If "iteration" is passed in, then the function will increment the "elapsed" variable by the value of "iteration".
'''
if iteration is None:
self.elapsed += 1
iteration = self.elapsed
@@ -208,12 +259,29 @@ class ProgressBar:
self.update_iteration()
def update_iteration(self, val=None):
'''
The "update_iteration" function is a function that is called to update the progress bar.
It takes in an optional parameter, "val".
If "val" is not passed in, it defaults to the fraction elapsed / iterations.
The bar is then redrawn at val * 100 percent, followed by "elapsed of iterations" and the label text.
'''
val = val if val is not None else self.elapsed / float(self.iterations)
self.__update_amount(val * 100.0)
self.prog_bar += ' %s of %s %s' % (
self.elapsed, self.iterations, self.text)
def __update_amount(self, new_amount):
'''
The "__update_amount" function is a function that is called to update the progress bar.
It takes in one parameter, "new_amount".
It calculates the percentage that the program has completed and stores it in a variable called "percent_done".
It calculates how many hashes are needed and stores it in a variable called "all_full".
It calculates how many hashes need to be displayed and stores it in a variable called "num_hashes".
It creates a string called "pct_string" that displays the percentage that the program has completed.
It creates a string called "prog_bar" that displays the progress bar.
It creates a string called "pct_place" that stores where the percentage string should be displayed.
It returns the value of "prog_bar".
'''
percent_done = int(round((new_amount / 100.0) * 100.0))
all_full = self.width - 2
num_hashes = int(round((percent_done / 100.0) * all_full))
@@ -225,4 +293,8 @@ class ProgressBar:
(pct_string + self.prog_bar[pct_place + len(pct_string):])
def __str__(self):
'''
The "__str__" function is a function that is called to return the value of the progress bar.
It returns the value of "prog_bar".
'''
return str(self.prog_bar)
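For context, `ProgressBar` is the helper used internally by yfinance's bulk download to report progress; a minimal usage sketch (the symbol list and loop body below are hypothetical) looks roughly like this:
``` python
from yfinance import utils

symbols = ["MSFT", "AAPL", "GOOG"]
bar = utils.ProgressBar(len(symbols), 'completed')
for symbol in symbols:
    # ... fetch or process one symbol here ...
    bar.animate()   # advance the bar by one iteration and redraw it
bar.completed()     # finish and print the final state of the bar
```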