Compare commits

...

No commits in common. "main" and "gh-pages" have entirely different histories.

816 changed files with 390 additions and 1,213,701 deletions

BIN
.DS_Store vendored 100644

Binary file not shown.

132
.gitignore vendored
@@ -1,132 +0,0 @@
data/m5/
# _build/
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/

@@ -1,34 +0,0 @@
VAR References
Main References
Lütkepohl, H. (2005). Introduction. New Introduction to Multiple Time Series Analysis, 1-7. doi:10.1007/978-3-540-27752-1_1
Kilian, L., & Lütkepohl, H. (2018). Structural vector autoregressive analysis. Cambridge: Cambridge University Press.
Supplementary References
https://sccn.ucsd.edu/wiki/Chapter_3.5._Model_order_selection
https://www.fil.ion.ucl.ac.uk/~wpenny/course/array.pdf
https://towardsdatascience.com/simple-multivariate-time-series-forecasting-7fa0e05579b2
https://arxiv.org/pdf/1302.6613.pdf
https://towardsdatascience.com/vector-autoregressions-vector-error-correction-multivariate-model-a69daf6ab618
https://www.analyticsvidhya.com/blog/2018/09/multivariate-time-series-guide-forecasting-modeling-python-codes/#:~:text=A%20Multivariate%20time%20series%20has,used%20for%20forecasting%20future%20values.&text=In%20this%20case%2C%20there%20are,considered%20to%20optimally%20predict%20temperature.
https://rstudio-pubs-static.s3.amazonaws.com/270271_9fbb9b0f8f0c41e6b7e06b0dc2b13b62.html
http://www.phdeconomics.sssup.it/documents/Lesson17.pdf
https://www.sas.upenn.edu/~fdiebold/Teaching104/Ch14_slides.pdf
https://towardsdatascience.com/vector-autoregressive-for-forecasting-time-series-a60e6f168c70
http://www.ams.sunysb.edu/~zhu/ams586/VAR_Lecture2.pdf
https://otexts.com/fpp2/VAR.html
https://online.stat.psu.edu/stat510/lesson/11/11.2
http://conference.scipy.org/scipy2011/slides/mckinney_time_series.pdf
https://www.reed.edu/economics/parker/311/VAR-readings.pdf
https://stats.stackexchange.com/questions/342898/interpretation-of-impulse-response-and-variance-decomposition-graphs
https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/
https://faculty.washington.edu/ezivot/econ582/multivariatetimeseriesslides.pdf
https://www.statsmodels.org/dev/vector_ar.html#statistical-tests
https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html
https://www.statsmodels.org/dev/vector_ar.html?highlight=impulse#impulse-response-analysis
http://web.pdx.edu/~crkl/ceR/Python/example14_3.py
https://medium.com/@seemakurthi.teja.1999/vector-auto-regression-time-series-model-d7ed5cb943f2
https://www.reed.edu/economics/parker/s10/312/notes/Notes12.pdf
https://www.machinelearningplus.com/time-series/vector-autoregression-examples-python/
http://statmath.wu.ac.at/~hauser/LVs/FinEtricsQF/FEtrics_Chp4.pdf

@@ -1,150 +0,0 @@
import numpy as np
import pandas as pd
import itertools
import statsmodels.tsa as tsa
from statsmodels.tsa.vector_ar.var_model import VAR
from statsmodels.tsa.arima_model import ARIMA
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.stattools import acf, adfuller, ccf
import matplotlib.pyplot as plt
import plotly.graph_objects as go  # needed by plot_forecasts_interactive below
from sklearn.metrics import mean_squared_error, mean_absolute_error


def fit_arima(train,
              p_list=[1, 2, 3, 4],
              d_list=[1],
              q_list=[1, 2, 3, 4]):
    """Grid-search ARIMA orders and report AIC/BIC/HQIC for each successful fit."""
    aic, bic, hqic = [], [], []
    pdqs = list(itertools.product(p_list, d_list, q_list))
    index = []
    for pdq in pdqs:
        try:
            model = ARIMA(train, order=pdq)
            result = model.fit()
            aic.append(result.aic)
            bic.append(result.bic)
            hqic.append(result.hqic)
            index.append(pdq)
        except ValueError:
            continue
    order_metrics_df = pd.DataFrame({'AIC': aic,
                                     'BIC': bic,
                                     'HQIC': hqic},
                                    index=index)
    return order_metrics_df


def forecast_arima(train, test, order):
    """Fit ARIMA(order) on train and forecast len(test) steps ahead."""
    history = list(train)
    model = ARIMA(history, order=order)
    model_fit = model.fit(disp=0)
    output = model_fit.forecast(steps=len(test))
    predictions = output[0]
    return np.array(predictions).flatten()


# def forecast_naive(train, test):
#     lr = LinearRegression()
#     x_train = train[:-1]
#     y_train = train[1:]
#     lr.fit(x_train.reshape(-1, 1), y_train)
#     forecast_lr = [lr.predict(np.array([train[-1]]).reshape(-1, 1))]
#     for n in range(len(test)-1):
#         forecast_lr.append(lr.predict(np.array([forecast_lr[-1]]).reshape(-1, 1)))
#     forecast_lr = np.hstack(forecast_lr)
#     return forecast_lr


def cross_corr_mat(df, yi_col, yj_col, lag=0):
    """Auto-/cross-correlation matrix of two series in df at a given lag."""
    yi_yi = acf(df[yi_col].values, unbiased=False, nlags=len(df) - 2)
    yj_yj = acf(df[yj_col].values, unbiased=False, nlags=len(df) - 2)
    yi_yj = ccf(df[yi_col].values, df[yj_col].values, unbiased=False)
    yj_yi = ccf(df[yj_col].values, df[yi_col].values, unbiased=False)
    ccm = pd.DataFrame({yi_col: [yi_yi[lag], yj_yi[lag]],
                        yj_col: [yi_yj[lag], yj_yj[lag]]},
                       index=[yi_col, yj_col])
    return ccm


def invert_transformation(df_train, df_forecast, second_diff=False):
    """Undo first (and optionally second) differencing on forecasted values."""
    df_fc = df_forecast.copy()
    columns = df_train.columns
    for col in columns:
        # Roll back 2nd Diff
        if second_diff:
            df_fc[str(col)+'-d1'] = (df_train[col].iloc[-1] - df_train[col].iloc[-2]) + df_fc[str(col)+'-d2'].cumsum()
        # Roll back 1st Diff
        df_fc[str(col)+'-forecast'] = df_train[col].iloc[-1] + df_fc[str(col)+'-d1'].cumsum()
    return df_fc


def plot_forecasts_static(train_df,
                          test_df,
                          forecast_df,
                          column_name,
                          min_train_date=None,
                          title='',
                          suffix=['-forecast']):
    """Matplotlib plot of train, test, and forecast series for one column."""
    train_df = pd.concat([train_df, test_df.iloc[:1]])
    if min_train_date is not None:
        train_df = train_df.loc[train_df.index >= min_train_date]
    fig, ax = plt.subplots(figsize=(16, 2.5), sharex=True)
    train_df[column_name].plot(ax=ax)
    test_df[column_name].plot(ax=ax)
    for s in suffix:
        forecast_df[column_name+s].plot(ax=ax)
    plt.legend(['train', 'test'] + [s.split('-')[-1] for s in suffix], loc=2)
    plt.title(title)
    plt.tight_layout()
    return fig, ax


def plot_forecasts_interactive(train_df,
                               test_df,
                               forecast_df,
                               column_name,
                               suffix='-forecast'):
    """Plotly version of plot_forecasts_static."""
    fig = go.Figure()
    train_df = pd.concat([train_df, test_df.iloc[:1]])
    fig.add_trace(
        go.Scatter(name="train",
                   x=list(train_df.index),
                   y=list(train_df[column_name])))
    fig.add_trace(
        go.Scatter(name="test",
                   x=list(test_df.index),
                   y=list(test_df[column_name])))
    fig.add_trace(
        go.Scatter(name='VAR forecast',
                   x=list(forecast_df.index),
                   y=list(forecast_df[column_name+suffix])))
    fig.update_layout(
        autosize=False,
        width=1000,
        height=250,
        margin=dict(
            l=60,
            r=60,
            b=30,
            t=30,
        )
    )
    return fig


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; assumes y_true contains no zeros."""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


def test_performance_metrics(test_df, forecast_df, suffix='-VAR'):
    """MAE, MSE, and MAPE of forecasts against the test set, per column."""
    mae = []
    mse = []
    mape = []
    cols = test_df.columns
    for c in cols:
        mae.append(mean_absolute_error(test_df[c], forecast_df[c+suffix]))
        mse.append(mean_squared_error(test_df[c], forecast_df[c+suffix]))
        mape.append(mean_absolute_percentage_error(test_df[c], forecast_df[c+suffix]))
    metrics_df = pd.DataFrame({'MAE': mae,
                               'MSE': mse,
                               'MAPE': mape}, index=[c+suffix for c in cols])
    return metrics_df.T

Binary file not shown.

@@ -1,6 +0,0 @@
numpy = 1.18.5
matplotlib = 3.2.2
pandas = 1.1.3
scipy = 1.5.0
statsmodels = 0.11.1
pywt = 1.1.1

@@ -1,6 +0,0 @@
python 3.8.6
numpy 1.18.5
pandas 1.1.4
lightgbm 3.3.1
matplotlib
sklearn

21
LICENSE
@@ -1,21 +0,0 @@
MIT License
Copyright (c) 2021 Prince
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Binary file not shown.

Binary file not shown.

Binary file not shown.

@@ -1 +0,0 @@
{}

18 binary image files not shown (sizes 23 KiB to 142 KiB).

File diff suppressed because one or more lines are too long

@@ -1,5 +0,0 @@
# Advanced Time Series Analysis
This notebook introduces the concept of time series, forecasting, and the fundamentals that we will use across the different chapters of our discussion. Specifically, this notebook will discuss:
1. Time Series
2. Forecasting
3. Stochastic Processes

@@ -1,18 +0,0 @@
# Chapter 1: AutoRegressive Integrated Moving Average (ARIMA)
In this notebook, we introduce our first approach to time-series forecasting: **ARIMA**, or AutoRegressive Integrated Moving Average. ARIMA is a family of models that explains a time series using its own lagged values (**A**uto**R**egressive) and lagged forecast errors (**M**oving **A**verage), with stationarity enforced by differencing (the inverse of **I**ntegration). In other words, ARIMA assumes that the time series is described by autocorrelations in the data rather than by trends and seasonality. (A minimal fitting sketch follows the dataset list below.)
This notebook will discuss:
1. Definition and Formulation of ARIMA models
2. Model Parameters (p, d, and q) and Special Cases of ARIMA models
3. Model Statistics and How to Interpret
4. Implementation and Forecasting using ARIMA
#### Datasets used:
- *Synthetic Data* (Filename: [`../data/wwwusage.csv`](https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv))
- *Climate Data* (Filename: `../data/jena_climate_2009_2016.csv`)
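
To make this concrete, here is a minimal, hedged sketch on synthetic data. It assumes the statsmodels 0.11-era `statsmodels.tsa.arima_model.ARIMA` API pinned in this repository; newer releases moved the class to `statsmodels.tsa.arima.model.ARIMA` and dropped the `disp` argument.

```python
import numpy as np
from statsmodels.tsa.arima_model import ARIMA  # statsmodels 0.11-era API

# Synthetic series: a random walk whose steps carry an MA(1)-like correlation
rng = np.random.default_rng(0)
steps = rng.normal(size=300)
y = np.cumsum(steps + 0.6 * np.roll(steps, 1))[1:]  # drop the wrapped first step

# order=(p, d, q): 1 AR lag, 1 difference, 1 MA lag
model = ARIMA(y, order=(1, 1, 1))
result = model.fit(disp=0)
print(result.summary())  # coefficients, AIC/BIC/HQIC, residual statistics
forecast, stderr, conf_int = result.forecast(steps=10)
```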

@@ -1,7 +0,0 @@
# Chapter 2: Linear, Trend, and Momentum Forecasting
In this chapter we introduce basic forecasting tools that rely on simple algebraic formulas. In the previous chapter we discussed ARIMA, which forecasts the future values of a time series from its past or lagged values and can only be applied after removing the trend and seasonality of the data. For some forecasting tools, however, the trend is relevant and is part of the prediction formula itself. Here, forecasting is demonstrated by making direct use of the relationships and trends in the data.
In the first half of this notebook, we demonstrate forecasting by fitting the time series with linear regression. In the second half, we show that trend summaries such as moving averages can be used to predict the likely future direction of the series via momentum forecasting.
Lastly, note that the moving average (MA) component of ARIMA is not the same as the moving average discussed in this chapter, which is the classical rolling mean.
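
As an illustrative sketch under toy assumptions (not the notebook's exact code), a linear-trend forecast and a moving-average momentum forecast can be written as:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy series: linear trend plus noise
rng = np.random.default_rng(0)
y = pd.Series(0.5 * np.arange(200) + rng.normal(scale=5.0, size=200))

# Linear-trend forecast: regress values on the time index, then extrapolate
t = np.arange(len(y)).reshape(-1, 1)
trend = LinearRegression().fit(t, y)
future_t = np.arange(len(y), len(y) + 20).reshape(-1, 1)
trend_forecast = trend.predict(future_t)

# Momentum forecast: extrapolate the latest slope of the classical moving average
ma = y.rolling(window=10).mean()
momentum = ma.diff().iloc[-1]                     # latest one-step change of the MA
momentum_forecast = ma.iloc[-1] + momentum * np.arange(1, 21)
```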

File diff suppressed because one or more lines are too long

@@ -1,25 +0,0 @@
# Chapter 3: Vector Autoregressive Methods
Previously, we introduced classical approaches to forecasting single/univariate time series, such as the autoregressive integrated moving average (ARIMA) model and the simple linear regression model. We learned that stationarity is a necessary condition when using ARIMA, while it need not be imposed when using the linear regression model.
In this notebook, we extend the forecasting problem to a more general framework where we deal with **multivariate time series**, i.e., time series with more than one time-dependent variable. More specifically, we introduce **vector autoregressive (VAR)** models and show how they can be used to forecast multivariate time series (a minimal sketch in code follows the outline below).
The [notebook](03_VectorAutoregressiveMethods.ipynb) is outlined as follows:
* Multivariate Time Series model
* Motivation
* Univariate VS Multivariate Time Series
* Examples
* Foundations
* Vector Autoregressive (VAR) Models
* VAR(1) model
* VAR(*p*) model
* Choosing the order *p*
* Building a VAR model
* Structural Analysis
* Impulse Response Function
* Forecast Error Variance Decomposition
* Takeaways
* References
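
A minimal sketch of building, checking, and analyzing a VAR with statsmodels; the simulated series and the column names `y1`/`y2` are illustrative, not one of the handbook datasets.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.var_model import VAR

# Two coupled toy series standing in for a real multivariate dataset
rng = np.random.default_rng(1)
e = rng.normal(size=(300, 2))
y = np.zeros((300, 2))
for t in range(1, 300):
    y[t, 0] = 0.5 * y[t-1, 0] + 0.3 * y[t-1, 1] + e[t, 0]
    y[t, 1] = 0.2 * y[t-1, 0] + 0.4 * y[t-1, 1] + e[t, 1]
df = pd.DataFrame(y, columns=['y1', 'y2'])

model = VAR(df)
print(model.select_order(maxlags=8).summary())  # AIC/BIC/HQIC for each order p
result = model.fit(1)                           # VAR(1)
forecast = result.forecast(df.values[-result.k_ar:], steps=10)
irf = result.irf(10)                            # impulse response functions
fevd = result.fevd(10)                          # forecast error variance decomposition
```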

File diff suppressed because one or more lines are too long

@@ -1,19 +0,0 @@
# Chapter 4: Granger Causality Test
In the first three chapters, we discussed the classical methods for both univariate and multivariate time series forecasting. We now introduce the notion of causality and its implications for time series analysis in general. We also describe a causality test for the linear VAR model discussed in the previous chapter (a minimal sketch follows the outline below).
The [notebook](04_GrangerCausality.ipynb) is outlined as follows:
* Notations
* Definitions
* Assumptions
* Testing for Granger Causality
* Ipo Dam Dataset
* Causality between Rainfall and Ipo Dam Water Level
* Causality between NIA Release Flow and Ipo Dam Water Level
* La Mesa Dam Dataset
* Causality between Rainfall and La Mesa Dam Water Level
* Causality between NIA Release Flow and La Mesa Dam Water Level
* Jena Climate Data
* Causality between Pressure and Temperature
* Summary
* References
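
As an illustrative sketch (the dam and climate data above are not reproduced here), statsmodels' `grangercausalitytests` tests whether the second column of a two-column array Granger-causes the first:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Toy pair: x leads y by one step, so x should Granger-cause y
rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.8 * np.roll(x, 1) + rng.normal(scale=0.1, size=500)

# Column order matters: this tests whether column 2 Granger-causes column 1
data = np.column_stack([y, x])[1:]  # drop the wrapped first sample
results = grangercausalitytests(data, maxlag=4)
# Each lag reports the SSR-based F test, chi-squared tests, and an LR test;
# small p-values reject "x does not Granger-cause y"
```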

@@ -1,16 +0,0 @@
# Chapter 5: Empirical Dynamic Modeling (Simplex and SMap Projections)
In the previous sections, we looked at different methods to characterize a time series and statistical operations we can use to make predictions. Many of these methods involve finding the models that best fit the time series and extracting the optimal parameters that describe the data with the least possible error. However, many real-world processes exhibit nonlinear, complex, dynamic behavior, which calls for methods that can accommodate such characteristics.
In this section, we introduce and discuss methods that use empirical models instead of complex, parametrized, hypothesized equations. Using raw time series data, we will try to reconstruct the underlying mechanisms that might be too complex, noisy, or dynamic to be captured by equations. This offers a more flexible approach to modeling and predicting dynamic systems. (A minimal delay-embedding sketch follows the outline below.)
## This notebook will discuss the following:
- Introduction to Empirical Dynamic Modeling
- Visualization of EDM Prediction with Chaotic Time Series
- Lorenz Attractor
- Takens' Theorem / State-Space Reconstruction (SSR)
- Simplex Projection
- Determination of Optimal Embedding Values
- Differentiating Noisy Signals from Chaotic Signals
- S-Map Projection (Sequentially Locally Weighted Global Linear Map)
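
As a minimal sketch of the core ingredient, here is state-space reconstruction by time-delay embedding; `delay_embed` is a hand-rolled helper for illustration, not a library call.

```python
import numpy as np

def delay_embed(x, E, tau):
    # Rows are lagged coordinate vectors [x_t, x_{t-tau}, ..., x_{t-(E-1)tau}]
    n = len(x) - (E - 1) * tau
    cols = [x[(E - 1 - i) * tau:(E - 1 - i) * tau + n] for i in range(E)]
    return np.column_stack(cols)

# Chaotic logistic map as a stand-in for a real signal
x = np.empty(500)
x[0] = 0.4
for t in range(499):
    x[t + 1] = 3.9 * x[t] * (1 - x[t])

M = delay_embed(x, E=3, tau=1)  # reconstructed "shadow" manifold
# Simplex projection then forecasts a point by averaging where its E+1
# nearest neighbors on M map to one step ahead
```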

@@ -1,7 +0,0 @@
## Introduction
In the previous chapters we talked about Simplex Projection, a forecasting technique that looks for similar trends in the past to forecast the future by finding nearest neighbors in an embedding. In this chapter, we discuss Convergent Cross Mapping (CCM), also formulated by [Sugihara et al., 2012](https://science.sciencemag.org/content/338/6106/496), a methodology that uses ideas from Simplex Projection to identify causality between variables in a complex dynamical system (e.g., an ecosystem) using just time series data.
We will go through the key ideas of CCM, how it addresses the limitations of Granger causality, and the algorithm behind it. We will then test the CCM framework on simulated data where we deliberately adjust the influence of one variable over another. Finally, we will apply CCM to some real-world data to infer the relationships between variables in a system.
### causal-ccm Package
`ccm_sugihara.ipynb` explains the CCM methodology in detail. If you wish to apply it in your own projects, install the framework using `pip install causal-ccm`. See the `using_causal_ccm_package.ipynb` notebook for details on how to use it.
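
A hedged usage sketch on simulated coupled logistic maps; the `ccm(X, Y, tau, E, L)` constructor and the `(correlation, p-value)` return of `causality()` follow the package's documentation and are assumptions here, not verified behavior.

```python
import numpy as np
from causal_ccm.causal_ccm import ccm  # pip install causal-ccm

# Coupled logistic maps: X weakly forces Y
n = 500
X, Y = np.empty(n), np.empty(n)
X[0], Y[0] = 0.4, 0.2
for t in range(n - 1):
    X[t + 1] = X[t] * (3.8 - 3.8 * X[t])
    Y[t + 1] = Y[t] * (3.5 - 3.5 * Y[t] - 0.1 * X[t])

# Does X cause Y? Check how well Y's shadow manifold cross-maps X
ccm_XY = ccm(X, Y, tau=1, E=2, L=n)   # assumed constructor signature
corr, p_value = ccm_XY.causality()    # assumed return: (Pearson r, p-value)
```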

File diff suppressed because one or more lines are too long

@@ -1,19 +0,0 @@
# Chapter 7: Cross-Correlations, Fourier Transform, and Wavelet Transform
Having gained a deeper understanding of causality, we now look at time series analysis through another lens.
In this chapter, we take a different approach to analyzing time series, one that is complementary to forecasting. The explanatory methods covered previously, such as Granger causality, S-mapping, and cross-mapping, focused on the time domain: the values of the time series measured over time or in its phase space. While these are useful for many tasks, it is often helpful to transform the time-domain measurements to unearth patterns that are otherwise difficult to tease out. Specifically, we look at the frequency domain, both to analyze the dynamics and to apply pre-processing techniques to real-world datasets.
We will be analyzing the dynamics of time series not exactly to make forecasts, but to understand them in terms of their frequencies, complementing the causality and explainability methods presented earlier.
We introduce three techniques:
1. Cross-correlations
2. Fourier Transform
3. Wavelet Transform
and test their use on the Jena Climate Dataset (2009-2016) along with a handful of other datasets.
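
For instance, here is a minimal sketch of moving a signal into the frequency domain with NumPy's real FFT, using a toy two-tone signal rather than one of the chapter datasets:

```python
import numpy as np

# 10 Hz and 25 Hz sinusoids sampled at 200 Hz
fs = 200
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 25 * t)

# One-sided amplitude spectrum
spectrum = 2 * np.abs(np.fft.rfft(x)) / len(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peaks = freqs[spectrum > 0.25]  # recovers ~10 Hz and ~25 Hz
```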

@@ -1,32 +0,0 @@
# Chapter 8: Winningest Methods in Time Series Forecasting
In previous sections, we examined several models used in time series forecasting such as ARIMA, VAR, and Exponential Smoothing methods. While the main advantage of traditional statistical methods is their ability to perform more sophisticated inference tasks directly (e.g. hypothesis testing on parameters, causality testing), they usually lack predictive power because of their rigid assumptions. That is not to say that they are necessarily inferior when it comes to forecasting, but rather they are typically used as performance benchmarks.
In this section, we demonstrate several of the fundamental ideas and approaches used in the recently concluded M5 Competition, where challengers from all over the world competed in building time series forecasting models for both accuracy and uncertainty prediction tasks. Specifically, we explore the machine learning model that the majority of the competition's winners utilized: LightGBM, a tree-based gradient-boosting framework designed for speed and efficiency. (A minimal lag-feature sketch follows the outline below.)
### How to use these notebooks
Please access the notebooks in this sequence:
- lightgbm_m5_forecasting.ipynb
- lightgbm_m5_tuning.ipynb
- lightgbm_jena_forecasting.ipynb
### M5 Dataset
The M5 dataset consists of Walmart sales data: specifically, the daily unit sales of 3,049 products, classified into 3 product categories (Hobbies, Foods, and Household) and 7 product departments into which these categories are disaggregated. The products are sold across 10 stores located in 3 states (California, Texas, and Wisconsin).
You may download the dataset at the following link: https://www.kaggle.com/c/m5-forecasting-accuracy
### Chapter Outline
1. M5 Dataset
2. Pre-processing
3. One-Step Prediction
4. Multi-Step Prediction
5. Feature Importance
6. Summary
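
As a hedged sketch of the general recipe (not the competition notebooks' exact features), tree-based forecasting with LightGBM reduces to supervised learning on lagged features:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

# Toy daily series standing in for M5 unit sales
rng = np.random.default_rng(3)
sales = pd.Series(np.maximum(0, 10 + 3 * np.sin(np.arange(1000) / 7)
                             + rng.normal(size=1000)))

# Lag features: yesterday, last week, and a shifted rolling weekly mean
df = pd.DataFrame({'lag_1': sales.shift(1),
                   'lag_7': sales.shift(7),
                   'roll_7': sales.shift(1).rolling(7).mean(),
                   'y': sales}).dropna()

train, test = df.iloc[:-28], df.iloc[-28:]
model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
model.fit(train.drop(columns='y'), train['y'])
one_step_pred = model.predict(test.drop(columns='y'))  # one-step-ahead forecasts
```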

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -1,51 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Preface: Introduction to Time Series Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This handbook extensively covers time series analysis and forecasting, delving from the most fundamental methods to the state-of-the-art. The handbook was made in Python and is designed such that readers can both learn the theory and apply them to real-world problems. Although chapters were made to be stand alone, it is recommended that readers go through the first few chapters to be able to fully appreciate the latter chapters. Moreover, the \n",
"__[Jena climate dataset](https://www.kaggle.com/stytch16/jena-climate-2009-2016)__ is used across several chapters, with a summary of the performance of the models used at the end.\n",
"\n",
"The handbook is structured as follows: in the first part, classical forecasting methods are discussed in detail. The middle part is then dedicated to dynamical forecasting methods and as well as causality and correlations, topics that are particularly essential in understanding the intricacies of time series forecasting. Finally, the last part shows a glimpse into the current trends and open problems in time series forecasting and modeling.\n",
"\n",
"The aim of this handbook is to serve as a practitioners guide to forecasting, enabling them to better understand relationships in signals. It is made for an audience with a solid background in Statistics and Mathematics, as well as a basic knowledge of Python. Familiarity with Machine Learning methods is a plus, especially for the later chapters. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@@ -1,74 +0,0 @@
# Time Series Handbook
This handbook extensively covers time series analysis and forecasting, ranging from the most fundamental methods to the state of the art. The handbook was made in Python and is designed such that readers can both learn the theory and apply it to real-world problems. Although the chapters were made to be stand-alone, it is recommended that readers go through the first few chapters to fully appreciate the later ones. Moreover, the
[Jena climate dataset](https://www.kaggle.com/stytch16/jena-climate-2009-2016) is used across several chapters, with a summary of the performance of the models used at the end.
The handbook is structured as follows: in the first part, classical forecasting methods are discussed in detail. The middle part is dedicated to dynamical forecasting methods as well as causality and correlations, topics that are particularly essential in understanding the intricacies of time series forecasting. Finally, the last part gives a glimpse of the current trends and open problems in time series forecasting and modeling.
The aim of this handbook is to serve as a practitioner's guide to forecasting, enabling readers to better understand relationships in signals. It is made for an audience with a solid background in Statistics and Mathematics, as well as a basic knowledge of Python. Familiarity with Machine Learning methods is a plus, especially for the later chapters.
## Outline
This handbook contains a variety of techniques that you can use for time series analysis -- from simple statistical models to some of the state-of-the-art algorithms as of this writing. Here are the topics covered in this material:
- Chapter 0: [Introduction to Time Series Analysis](00_Introduction)
- Chapter 1: [Autoregressive integrated moving average](01_AutoRegressiveIntegratedMovingAverage)
- Chapter 2: [Linear Trend and Momentum Forecasting](02_LinearForecastingTrendandMomentumForecasting)
- Chapter 3: [Vector Autoregressive Methods](03_VectorAutoregressiveModels)
- Chapter 4: [Granger Causality](04_GrangerCausality)
- Chapter 5: [Simplex and S-map Projections](05_SimplexandSmapProjections)
- Chapter 6: [Convergent Cross Mapping and Sugihara Causality](06_ConvergentCrossMappingandSugiharaCausality)
- Chapter 7: [Cross-Correlations, Fourier Transform and Wavelet Transform](07_CrosscorrelationsFourierTransformandWaveletTransform)
- Chapter 8: [Winningest Methods](08_WinningestMethods)
# How to use this reference
Each of the chapters mentioned above includes one or more Jupyter notebooks containing the discussion of each topic (background, limitations, applications). Most of the datasets used in the handbook are included in this repository, and the details of each are described in the [data folder](data).
## Setting up your virtual environment
To be able to run the contents of this repository, it is advised that you set up a virtual environment. You can create one via Anaconda or via Python's native `venv` module.
##### Anaconda
To set up a virtual environment called `atsa`, run the following in your terminal:
```bash
# this will create an anaconda environment
# called atsa in 'path/to/anaconda3/envs/'
conda create -n atsa
```
To activate and enter the environment, run `conda activate atsa`. To deactivate it, either run `conda deactivate` or exit the terminal. For more information on setting up your virtual environment using Anaconda, please visit [this page](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
##### Python venv
To set up a virtual environment called `atsa`, run the following in your terminal:
```bash
# this will create a virtual environment
# called atsa in your home directory
python3 -m venv ~/atsa
```
To activate and enter the environment, run `source ~/atsa/bin/activate`. To deactivate it, either run `deactivate` or exit the terminal. Note that every time you want to work with the notebooks, you should rerun `source ~/atsa/bin/activate`.
## Rendering the notebooks
To view the individual notebooks outside of GitHub without setting up a repository or installing any software, you may use [The Jupyter Notebook Viewer](https://nbviewer.jupyter.org/).
- Open `https://nbviewer.jupyter.org/`
- Paste the link to the notebook.
When a notebook rendered in nbviewer appears different from the one rendered on GitHub, just append `?flush_cache=true` to the end of the nbviewer URL to force it to rerender.
## Jupyterbook
To view all the chapters of this handbook, please visit this link: [Time Series Analysis Handbook](https://phdinds-aim.github.io/time_series_handbook/)
![Screenshot](jupyterbook_handbook.png)
# Contributors
- Benjur Emmanuel Borja
- Gilbert Michael G. Chua
- Francis James Corpuz
- Carlo Vincienzo Dajac
- Sebastian C. Ibañez
- Prince Joseph Erneszer Javier
- Marissa P. Liponhay
- Maria Eloisa M. Ventura

Some files were not shown because too many files have changed in this diff.