Exchange Rate Prediction: Time Series Forecasting with ARIMA (2024)

Building on the previous work with regression, this final part forecasts future exchange rates using ARIMA, a time series model that “predicts the future from the past”.


Here comes the final part of my end-to-end Data Science project. If you have read the previous two parts, you might already know what to expect in this article. But if you haven’t, do check them out: 1st part and 2nd part.

Here’s a summary of the entire structure of this project:

  • Part 1: Exploratory Data Analysis (EDA) & Data Visualisation (Bonus: Hypothesis Testing)
  • Part 2: Machine Learning with 4 Regression Models
  • Part 3: Machine Learning (cont.) with ARIMA

With the first part laying the foundation through data cleaning and visualization, and the second using regression models to fit all the data points, this final part builds on both to predict the future (in this case, AUD/USD exchange rates in 2020). To get there, we will go through three steps:

  1. Stationarity Testing (ADF & KPSS)
  2. Autocorrelation (ACF) and Partial Autocorrelation (PACF)
  3. ARIMA Forecast

Let’s get started!

Before jumping in, you might wonder why we need to follow this structure before using ARIMA. If so, let’s briefly look at what ARIMA is and what it requires!

What does ARIMA stand for?

When it comes to forecasting a time series, we can choose either:

  • Univariate: if we believe the previous values of the time series are able to predict its future values
  • Multivariate: if we use predictors OTHER THAN the series itself to predict the future values

ARIMA, which stands for ‘AutoRegressive Integrated Moving Average’, belongs to the first group: it originates from the belief that the past values of a time series alone can be used to predict its future values.

To use ARIMA, we need to determine the values of its three parameters: p, d and q

  • p: the order of the AR term (AR: Auto-Regressive)
  • d: the degree of differencing (the number of times the series is differenced)
  • q: the order of the MA term (MA: Moving Average)
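For reference, here is one standard way to write the full model. With the lag operator L (so that L y_t = y_{t-1}), AR coefficients φ_i, MA coefficients θ_j, an optional constant c and white-noise errors ε_t, an ARIMA(p, d, q) model reads:

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t
```

With d = 1, the factor (1 − L) turns y_t into y_t − y_{t−1}, i.e. the first difference. The plus sign on the θ_j terms follows the convention used by statsmodels; Box and Jenkins write the MA part with minus signs instead.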

That’s pretty much all information needed for you to get acquainted with what’s coming below.

What is Stationarity Testing and why do we need it?

ARIMA requires the series (after the differencing set by d) to be “stationary” for the model to produce reliable predictions. In a nutshell, a stationary series is one whose statistical properties do not change over time: its mean, variance and autocovariance remain constant.

To make this more visual, let’s look at the charts below to see the difference between stationary and non-stationary data:

[Figure: stationary series (chart 1) vs. non-stationary series (chart 2)]

As you can see, stationarity does not mean the data is “frozen” at any particular point in time. The series still moves, but its mean and variance stay the same as time passes (chart 1). In contrast, when peaks and dips appear without a stable pattern, the series and its statistical properties change over time (chart 2). This is commonly caused by trend or seasonality in the data.
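A minimal synthetic illustration (simulated data, not the AUD/USD series): white noise keeps the same mean throughout, while adding a linear trend to the same noise makes the mean drift upward, which is exactly the kind of change that breaks stationarity.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(1000)
noise = rng.normal(0.0, 1.0, size=t.size)

stationary = noise              # white noise: constant mean (0) and variance (1)
trending = 0.05 * t + noise     # same noise plus a linear trend: the mean drifts

def half_means(x):
    """Mean of the first and second halves of a series."""
    first, second = np.array_split(x, 2)
    return first.mean(), second.mean()

for name, series in [("stationary", stationary), ("trending", trending)]:
    m1, m2 = half_means(series)
    print(f"{name}: first-half mean {m1:.2f}, second-half mean {m2:.2f}")
```

Comparing summary statistics across windows like this is only an informal check; the ADF and KPSS tests below make it formal.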

If you want to dive into the mathematical difference between the two, check out this article on Towards Data Science:

Stationarity in time series analysis: A review of the concept and types of

Stationarity matters because a series with stable properties is easier for the model to learn and for us to analyze and forecast; it is a common assumption in most time series analysis.

So how do we know whether the dataset is stationary? First, we run statistical tests to check; if the data turns out not to be stationary, we transform it into a stationary series.

ADF (Augmented Dickey-Fuller) and KPSS (Kwiatkowski-Phillips-Schmidt-Shin) are two common statistical tests for stationarity. Both address the same question from different angles: ADF tests whether the series has a unit root (the kind of non-stationarity that differencing removes), while KPSS tests whether the series is stationary around a constant level or a deterministic trend (the kind of non-stationarity that detrending removes). Because they can disagree, it’s well advised to use both tests.

One important thing to note is that the null hypothesis (H0) and alternative hypothesis (H1) of the two tests are the opposite of each other:

  • ADF: H0 states that the dataset is NOT stationary while H1 says it is.
  • KPSS: H0 states that the dataset is stationary while H1 says it is NOT.

from statsmodels.tsa.stattools import adfuller

# ADF: H0 = non-stationary (unit root); a small p-value rejects it
adf_test = adfuller(y_pred)
print('stat=%.3f, p=%.3f' % adf_test[0:2])
if adf_test[1] > 0.05:
    print('Probably not Stationary')
else:
    print('Probably Stationary')

The result of the ADF test: stat=-0.434, p=0.904, ‘Probably not Stationary’

from statsmodels.tsa.stattools import kpss
# KPSS: H0 = stationary; a small p-value rejects it
kpss_test = kpss(y_pred, nlags='auto')
print('stat=%.3f, p=%.3f' % kpss_test[0:2])
if kpss_test[1] > 0.05:
    print('Probably Stationary')
else:
    print('Probably not Stationary')

The result of the KPSS test: stat=0.692, p=0.014, ‘Probably not Stationary’
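Because the two tests have opposite null hypotheses, their verdicts can be read together. Below is a small sketch of a hypothetical helper (not part of statsmodels) that follows the commonly used four-case reading of ADF and KPSS results, using the same 0.05 threshold as the code above:

```python
def combine_adf_kpss(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values into one verdict.

    ADF H0: unit root (non-stationary). KPSS H0: stationary.
    """
    adf_stationary = adf_p <= alpha     # ADF rejects its non-stationary null
    kpss_stationary = kpss_p > alpha    # KPSS fails to reject its stationary null
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if kpss_stationary:
        # KPSS says stationary, ADF finds a unit root
        return "trend-stationary: remove the trend"
    # ADF says stationary, KPSS rejects stationarity
    return "difference-stationary: apply differencing"

print(combine_adf_kpss(0.904, 0.014))  # the p-values reported above
```

With the p-values above, both tests point the same way, so the helper returns "non-stationary".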

It turns out that the dataset is NOT stationary at all! Let’s see it visually:

from statsmodels.tsa.seasonal import seasonal_decompose
y_pred_list = y_pred.tolist()
result = seasonal_decompose(y_pred_list, model='additive', period=1)
result.plot()  # Observed, Trend, Seasonal and Residual panels
[Figure: seasonal_decompose output with Observed, Trend, Seasonal and Residual panels]

seasonal_decompose splits the series into trend, seasonal and residual components. As you can see, the Observed and Trend panels look almost identical, implying that trend is what makes the data non-stationary. (Strictly speaking, with period=1 the trend component equals the observed series by construction, so the ADF and KPSS results above remain the main evidence.)

Let’s make it stationary, again!!

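import numpy as np
import pandas as pd
date = list(range(1, 37))  # month indices 1..36 instead of calendar dates
date_fx = pd.DataFrame(zip(date, y_pred_list), columns=['Date', 'FX'])
date_fx_log = np.log(date_fx)  # log transform to stabilize the variance
date_fx_log_diff = date_fx_log - date_fx_log.shift(1)  # first-order differencing
y_stationary = date_fx_log_diff.iloc[:, 1].dropna()  # FX column, minus the NaN from shift(1)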

First, for the sake of simplicity, I replaced the actual dates with a range of numbers from 1 to 36 (representing the 36 months in our data). Then, to stabilize the variance, I applied a log transformation to the original data with np.log (other options include a power transform or a square-root transform). Finally, I made the data stationary by differencing with .shift(1): subtracting the shifted series takes each value minus the one that precedes it. The first value has no predecessor and becomes NaN.
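The differencing step can be checked on a toy series (hypothetical values, not the project data). Subtracting the shifted log series is the same as pandas’ .diff() on the log series, both equal the log of the month-on-month ratio, and for small moves that is close to the plain percentage change:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly AUD/USD values, for illustration only
fx = pd.Series([0.72, 0.74, 0.73, 0.76, 0.75])

log_fx = np.log(fx)
via_shift = log_fx - log_fx.shift(1)   # the .shift(1) approach used above
via_diff = log_fx.diff()               # pandas' built-in first difference
log_ratio = np.log(fx / fx.shift(1))   # log of the month-on-month ratio
pct = fx / fx.shift(1) - 1             # simple percentage change, for comparison

print(pd.DataFrame({"shift": via_shift, "diff": via_diff,
                    "log_ratio": log_ratio, "pct_change": pct}).round(4))
```

This is why log-differenced exchange rates are often read as approximate monthly returns.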

There we go, we have transformed the non-stationary predicted exchange rates (y_pred_list, as in the previous part) to the stationary one (y_stationary)!

Let’s double check with ADF and KPSS again!

# a. ADF test
adf_test_2 = adfuller(y_stationary)
print('stat=%.3f, p=%.3f' % adf_test_2[0:2])
if adf_test_2[1] > 0.05:
    print('Probably not Stationary')
else:
    print('Probably Stationary')

# b. KPSS test
kpss_test_2 = kpss(y_stationary, nlags='auto')
print('stat=%.3f, p=%.3f' % kpss_test_2[0:2])
if kpss_test_2[1] > 0.05:
    print('Probably Stationary')
else:
    print('Probably not Stationary')

The results are:

  • ADF: stat=-5.316, p=0.000, Probably Stationary
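  • KPSS: stat=0.192, p=0.100, Probably Stationary (statsmodels interpolates KPSS p-values from a table that stops at 0.1, so this means the p-value is at least 0.1)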


What are ACF and PACF, and how do they relate to ARIMA?

After making the dataset stationary, we move on to the 2nd step of the process: determining p, d and q for the ARIMA model. d follows from the differencing we just did, while p and q can be read from the ACF and PACF plots!

According to Machine Learning Mastery,

These are plots that graphically summarize the strength of a relationship with an observation in a time series with observations at prior time steps.

This relates to a step we did earlier: shifting the series with .shift(1). The autocorrelation at lag k is the correlation between the series and a copy of itself shifted by k steps, while the partial autocorrelation at lag k measures that same relationship after removing the effects of the shorter lags in between. So ACF and PACF look at many lags at once, rather than the single shift we used for differencing.
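To make the link to shifting concrete, here is a minimal NumPy sketch of the standard sample ACF estimator, where each lag k pairs every value with the one k steps earlier. It is applied to a toy series that flips sign every step, not to the project data:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample ACF: r_k = sum((x_t - m)(x_{t-k} - m)) / sum((x_t - m)^2)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    # Lag k multiplies each deviation by the one k steps earlier, like .shift(k)
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom
                     for k in range(nlags + 1)])

alternating = np.array([1.0, -1.0] * 10)   # toy series that flips sign every step
print(np.round(sample_acf(alternating, 3), 3))
```

The lag-0 value is always 1, odd lags come out strongly negative and even lags strongly positive, which is how a regular back-and-forth pattern shows up in an ACF plot.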

Let’s plot them out:

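from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fx_diff = date_fx_log_diff['FX'].dropna()  # differenced log FX only (not the Date column)
plot_acf(fx_diff, lags=15)
plot_pacf(fx_diff, lags=15)  # PACF lags must stay below half the sample size (35 points)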
[Figure: ACF and PACF plots]

Based on the shape and the pattern of ACF and PACF, we can determine the p (AR term) and q (MA term) for the ARIMA model. If you want to dive into it, check out these articles here and here. Essentially, the number of significant lags in the charts “delivers the message”, in this case the values of p & q.

As you can see in the ACF plot, there is a large spike at lag 1, followed by 2 or 3 more lags, all of which are deemed significant because they extend beyond the “significance threshold” (the shaded blue area). Likewise, in the PACF plot, there is a large spike at lag 1, followed by another at lag 3. So what does it all imply?
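As a rough rule of thumb, the shaded band in a PACF plot is about ±1.96/√n for n observations (statsmodels’ ACF band uses Bartlett’s formula instead, so it widens with the lag). Below is a small sketch of a hypothetical helper that counts the lags falling outside that approximate band; the PACF values are made up for illustration:

```python
import math

def significant_lags(values, n_obs, z=1.96):
    """Return lags (from 1) whose value lies outside +/- z / sqrt(n_obs).

    values[0] is lag 0 (always 1 for ACF/PACF) and is skipped.
    """
    bound = z / math.sqrt(n_obs)
    return [lag for lag, v in enumerate(values) if lag > 0 and abs(v) > bound]

# Hypothetical PACF values for a 35-point series, for illustration only
pacf_vals = [1.0, 0.62, 0.10, -0.38, 0.05]
print(round(1.96 / math.sqrt(35), 3), significant_lags(pacf_vals, 35))
```

With only 35 differenced points, the band is wide (about ±0.33), so only fairly strong correlations count as significant.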

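To me, this finding is the key to testing which combination of (p, d, q) gives the best result for our ARIMA model. The usual rule of thumb reads p from the PACF and q from the ACF; here I loosely take p from the ACF and q from the PACF, and let the comparison between models decide. So how do we compare them? With AIC.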

The Akaike Information Criterion (AIC) is a widely used measure of a statistical model. It basically quantifies 1) the goodness of fit, and 2) the simplicity/parsimony, of the model into a single statistic. (Abbas Keshvani, Coolstatsblog)
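In formula form, the standard definition is the following, where k is the number of estimated parameters and \hat{L} is the maximized value of the model’s likelihood:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

The 2k term is the parsimony penalty: an extra AR or MA term only lowers the AIC if it raises the log-likelihood by more than 1.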

Generally, the lower the AIC, the better. Let’s move on to step 3 to find out which combination gives the lowest AIC!

Let’s plug in the values of p, d, q to our ARIMA model!

As I explained earlier, the number of significant lags in the ACF and PACF plots can be translated into the corresponding p & q. Let’s see how ARIMA looks with the following values:

  • p = 3, as there are 3 significant lags in the ACF
  • d = 1, as we differenced once with .shift(1)
  • q = 2, as there are 2 significant lags in the PACF
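from statsmodels.tsa.arima_model import ARIMA  # removed in statsmodels 0.13+; newer: statsmodels.tsa.arima.model
y = date_fx_log.iloc[:, 1]  # log FX series; d=1 makes ARIMA difference it internally
model_arima = ARIMA(y, order=(3, 1, 2))
model_arima_fit = model_arima.fit()
print(model_arima_fit.summary())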

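First, we imported ARIMA from the statsmodels library and extracted y (the log-transformed FX column) from the transformed dataset. Note that y is not differenced: d = 1 tells ARIMA to difference it internally. Then we fit the ARIMA model with the above values of (p, d, q) and called .summary():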

[Figure: ARIMA(3,1,2) summary output]

As complicated as it looks, we only need to focus on two metrics in the table: AIC and the p-value (P>|z|).

  • AIC = -183.227, which we will test against another combination of p, d, q to see if it’s the lowest value or not.
  • With p = 3, the model returns a separate p-value for each AR term on its own line. While the first two p-values are below 0.05 (which is what we want), AR.L3’s is above 0.05. This suggests that there might actually be only 2 significant lags in the ACF plot.
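The P>|z| column comes from a z-test on each coefficient: z is the coefficient divided by its standard error, and the p-value is the probability of a standard normal value at least that extreme in either direction. A minimal sketch with made-up coefficients (not the values from the summary tables):

```python
import math

def two_sided_p(coef: float, std_err: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using a normal (z) test."""
    z = coef / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Made-up coefficients and standard errors, for illustration only
print(round(two_sided_p(0.50, 0.20), 4))   # z = 2.5: significant at 0.05
print(round(two_sided_p(0.10, 0.20), 4))   # z = 0.5: not significant
```

A coefficient whose p-value is above 0.05 is not clearly different from zero, which is why dropping that lag is worth trying.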

Based on the finding, let’s test out another combination of (p, d, q) = (2,1,2):

[Figure: ARIMA(2,1,2) summary output]

In this case, the AIC (-183.096) does not improve compared to the first model’s AIC (-183.227). Furthermore, the p-values of AR.L1 (the 1st p) and MA.L1 (the 1st q) are >0.05, which indicates that we might only need p=1 and q=1. Let’s test this out with (p, d, q) = (1, 1, 1):

[Figure: ARIMA(1,1,1) summary output]

Bingo! The AIC has improved quite a lot as it’s the best value so far after three trials (lowest AIC and all p-values <0.05).

Okay, this process has been a bit too manual and time-consuming. How about we use iteration with a for loop and see which combination (p, d, q) brings out the lowest AIC?

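import itertools

p = range(1, 4)
d = range(1, 2)
q = range(1, 3)
pdq = list(itertools.product(p, d, q))  # every (p, d, q) combination

aics = []
params = []
for param in pdq:
    model = ARIMA(y, order=param)
    model_fit = model.fit()
    aics.append(model_fit.aic)  # record each model's AIC
    params.append(param)

combo = list(zip(aics, params))
combo_array = np.array(combo, dtype=object)  # dtype=object: each row mixes a float and a tuple
best_aic, best_param = min(combo)  # lowest AIC wins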

As seen in the above, I created different ranges for p, d and q corresponding with the ACF and PACF plots:

  • p = range(1, 4): as 3 lags are seen in ACF
  • d = range(1, 2): i.e. d = 1 only, as one round of differencing was enough to make the data stationary
  • q = range(1, 3): as 2 lags are seen in PACF

Then, I used itertools.product() to generate different combinations from the given values. Finally, I applied a for loop and fit the ARIMA model with each combination generated.
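To see exactly which orders the loop tries, here is the product on its own (pure Python, no model fitting), followed by picking the lowest AIC among the three values reported for the manual trials in this article:

```python
import itertools

# Same ranges as the grid search above: p in 1..3, d = 1, q in 1..2
pdq = list(itertools.product(range(1, 4), range(1, 2), range(1, 3)))
print(len(pdq), pdq)

# AIC values reported in this article for three of these orders
reported_aic = {(3, 1, 2): -183.227, (2, 1, 2): -183.096, (1, 1, 1): -185.892}
best_order = min(reported_aic, key=reported_aic.get)  # lower AIC is better
print(best_order)
```

The grid has only six candidates, so fitting them all is cheap; with wider ranges the number of fits grows multiplicatively.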

[Figure: AIC for each (p, d, q) combination]

So there you go, the lowest AIC is -185.892 from the (p, d, q) of (1, 1, 1), which is the same as the latest trial above!


Finally, the very last step of all, forecasting:

pred = model_arima_fit.forecast(12, alpha=0.05)[0]  # [0] = point forecasts ([1] std errors, [2] 95% intervals)
model_arima_fit.plot_predict(1, 42)

Just a quick note: .forecast() takes the number of values to forecast (in this case, the 12 months of 2020 following the dataset period), and alpha=0.05 requests a 95% confidence interval alongside the point forecasts (index [0] keeps only the point forecasts). plot_predict(1, 42) sets the (start, end) of the plotted range, from 1 (the 1st month, 01/2017) to 42, which reaches into the 2020 forecast period.

[Figure: plot_predict output, y-values (orange) and forecast (blue)]

As for how well our y-values (orange) match the forecast (blue), it doesn’t look bad! In fact, if you recall the chart with Polynomial Regression, the forecast (blue) closely follows the line we found there!

Tadah! We are nearly at the end of the project: we have forecast the exchange rates for 2020. If you print the predicted values, though, they might look odd:

[Figure: printed forecast values on the log scale]

That’s because the model was fit on log-transformed data, so its forecasts are on the log scale too! Simply revert them with the exponential function, the inverse of log:

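pred = np.exp(pred)  # back from the log scale to exchange-rate units
forecast = pred.tolist()
# month_year_future: the 12 month labels for 2020 (assumed defined earlier in the project)
fx_2020 = np.array(list(zip(month_year_future, forecast)))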
[Figure: fx_2020, the 2020 monthly forecasts]

Perfect! Let’s visualize it as a whole, from 2017 to 2020!

import matplotlib.pyplot as plt

# Short yellow segment joining the last fitted month to the first forecast month
x_merge = ['2019-12', '2020-01']
y_merge = [y_pred[-1], forecast[0]]
plt.scatter(month_year, y_fx, alpha=0.4)
plt.plot(month_year, y_fx_predict, color='b', label='1var')
plt.plot(month_year, y_fx_predict_2, color='r', label='2var')
plt.plot(month_year, y_fx_predict_3, color='g', label='3var')
plt.plot(month_year, y_pred, color='black', label='3var with Poly')
plt.plot(x_merge, y_merge, color='y')
plt.plot(month_year_future, forecast, color='y', label='Forecast')
plt.legend()  # only labelled lines appear, so the scatter and joining segment are skipped
plt.title("Linear Regression: AUD/USD Exchange Rate (3 var: Interest Rate, GDP & UER)")
plt.ylabel("Exchange Rate")
plt.show()
[Figure: AUD/USD exchange rate, 2017 to 2020, with regression fits and ARIMA forecast]


That’s the end of this project, Exchange Rate Prediction! Despite a lot of information to absorb, I do hope you find the entire series (part 1, part 2, and this final part) helpful and informative!

Do look out for my upcoming projects in Data Science and Machine Learning in the near future! In the meantime, give me a clap if you find this helpful and feel free to check out my Github here for the complete repository:





Is ARIMA good for time series forecasting?

ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting, and provide complementary approaches to the problem.

What is the ARIMA model in forex?

ARIMA (AutoRegressive Integrated Moving Average) forecasting is a time series forecasting method that combines autoregressive (AR), differencing (I), and moving average (MA) components to model and predict future values of a time series.

How do you forecast exchange rates?

Purchasing power parity (PPP) is one of the most widely used methods of predicting currency fluctuations. This is based on the “law of one price,” which suggests that any given product should have the same price in any country's economy.

When should you not use ARIMA?

ARIMA modeling is generally inadequate for long-term forecasts, such as more than six months ahead, because it relies only on past data and on parameter choices that involve subjective judgment. For this reason, it is best used alongside other technical analysis tools to get a clearer picture of an asset's performance.

Is ARIMA better than LSTM for stock market prediction?

The LSTM model provides better results when the data set is large and has few NaN values. The ARIMA model, despite providing better accuracy than LSTM in some comparisons, requires more processing time and works well when all the values in the data set are valid.

What are the four common techniques used to forecast exchange rates?

One possible way to minimise risks lies in foreign exchange rate forecasting. There are many ways to go about this, including fundamental and technical analysis, relative economic strength, econometric models, and purchasing power parity.

What is the formula for predicted exchange rate?

If you don't know the exchange rate, you can use this formula: starting amount (base currency) / ending amount (foreign currency) = exchange rate. You can then use currency conversion formulas to calculate how much you'd get for your currency if you were trading in the forex market.

What is the best way to exchange currency?

Local banks and credit unions usually offer the best rates. Major banks, such as Chase or Bank of America, often offer the added benefit of having ATMs overseas. Online peer-to-peer foreign currency exchanges. Online bureaus or currency converters, such as Travelex, provide convenient foreign exchange services.

What is the formula for ARIMA prediction?

Multi-step prediction intervals for ARIMA(0,0,q) models are relatively easy to calculate. We can write the model as $y_t = \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$.

What are the disadvantages of ARIMA model?

However, it's worth noting that ARIMA does have some limitations to be aware of. For example, its linear assumptions can lead to errors during events like market crashes. The process of selecting parameters can also introduce subjectivity and uncertainty into the analysis.

Is LSTM better than ARIMA?

The ARIMA model achieved the best performance overall, with a mean absolute percentage error (MAPE) of 2.76% and root mean squared error (RMSE) of $302.53. The LSTM model had higher errors, with MAPE of 3.97% and RMSE of $381.34. The gated recurrent unit (GRU) variant performed slightly better than the standard LSTM.

Which method is best for time series forecasting?


AutoRegressive Integrated Moving Average (ARIMA) models are among the most widely used time series forecasting techniques: In an Autoregressive model, the forecasts correspond to a linear combination of past values of the variable.

Which models are best for time series prediction?

Traditional approaches include moving average, exponential smoothing, and ARIMA, though models as various as RNNs, Transformers, or XGBoost can also be applied. The most popular benchmark is the ETTh1 dataset.

Article information

Author: Stevie Stamm
