Neural Networks Research

Research paper on using neural networks to predict the foreign exchange market.

Tags: python, neural-networks, forex, neat, pandas, numpy, matplotlib, pickle

Overview

This page is a copy of the Jupyter notebook (.ipynb) hosted on GitHub. The full research paper can be found here.

Predicting the forex market is hard for a neural network because exchange rates are noisy and their statistical properties shift over time. To get an accurate prediction, a neural network needs a very high number of parameters, a lot of data, and thousands of iterations of training.

This paper explores whether a neural network can accurately predict the foreign exchange market given a limited set of parameters.

The program uses a neuroevolution algorithm known as NEAT, which stands for "NeuroEvolution of Augmenting Topologies". NEAT is modeled on natural evolution: every genome (neural network) in a population is scored, the best-scoring individuals reproduce, and their mutated offspring form the next generation, so each generation builds on the knowledge of the previous one. As the name suggests, NEAT also evolves the structure of the networks, adding nodes and connections over time.

A proper explanation of how the algorithm works can be found here.
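The generational idea can be illustrated without NEAT. The sketch below is a deliberately simplified evolutionary loop, not NEAT itself: it evolves a fixed-length weight vector using only mutation and elitism, with no speciation, crossover or topology growth. The fitness function, its target vector, and the names evolve and fitness are hypothetical and exist only for illustration.

```python
import random


def fitness(weights: list[float]) -> float:
    # Hypothetical fitness: higher (closer to 0) when the weights approach a fixed target
    target = [0.5, -1.0, 2.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve(generations: int = 50, pop_size: int = 20, elite: int = 4,
           seed: int = 0) -> tuple[list[float], list[float]]:
    rng = random.Random(seed)
    population = [[rng.uniform(-3, 3) for _ in range(3)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank the current generation by fitness, best first
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        parents = population[:elite]  # The best individuals survive unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            parent = rng.choice(parents)
            # A child is a mutated copy of one of the surviving parents
            children.append([w + rng.gauss(0, 0.3) for w in parent])
        population = parents + children
    population.sort(key=fitness, reverse=True)
    return population[0], best_per_generation


best, history = evolve()
print(best, history[0], history[-1])
```

Because the elite individuals are copied unchanged into the next generation, the best fitness in this sketch can never decrease from one generation to the next. In neat-python, comparable settings such as elitism and survival_threshold are set in the config file's [DefaultReproduction] section.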

Retrieving / Downloading Forex Charts

To start, we retrieve the data using the public yfinance module, which downloads data from Yahoo Finance as a pandas DataFrame. The code then saves that DataFrame as a .csv file.

```python
# Imports
import yfinance as yf

# Variables
ticker_name: str = "JPY=X"  # USD/JPY exchange rate on Yahoo Finance

# Download ticker data
ticker_data = yf.download(
    tickers=ticker_name,  # Ticker to download
    period="max",  # Get as much data as possible
    auto_adjust=False,  # Keep the "Adj Close" column used later (newer yfinance versions may drop it by default)
)
ticker_data.to_csv(f"./../assets/{ticker_name}.csv")  # Save dataframe as a csv file
ticker_data  # Display ticker data
```

Retrieving the dataset used

To retrieve the data, we use pandas to read the file from the GitHub repository. Pandas makes the web request itself when given a URL.

```python
import pandas as pd

# Location of the stored data files
assets_path: str = "https://github.com/royce-mathew/data/raw/master"

# Pandas makes the HTTP request itself when given a URL
ticker_data: pd.DataFrame = pd.read_csv(f"{assets_path}/JPY=X.csv", index_col=0)
```

Plotting Initial Data

To plot the initial data, pandas' DataFrame.plot is used, which uses matplotlib internally.

This plot displays the adjusted close price for each 24 hour period.

```python
# Plot the data
ticker_data.index = pd.to_datetime(ticker_data.index)  # Convert the index to datetime values
ticker_data.plot.line(y="Adj Close", use_index=True)
```

Data Analysis

To use NEAT, we need to turn the raw ticker data into a compact set of features to pass to it.

To help the algorithm learn patterns, we pass it several indicators that describe recent trends in the data:

  • RSI: Relative Strength Index over 15 days
  • EMA: Exponential moving averages over 20, 100 and 150 days
  • Close Ratios: The current close divided by the mean close over the last 2, 5, 60, 250 and 1000 days
  • Trend Horizons: The number of up days (close above open) within each of those horizons
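To make these four feature families concrete, here is a pure-pandas sketch of their standard formulas. It approximates what the notebook gets from pandas_ta rather than reproducing pandas_ta's exact code: the EMA uses pandas' ewm(span=n, adjust=False) and the RSI uses Wilder's smoothing (alpha = 1/n), so values in the warm-up rows may differ slightly. The function names and the toy prices are hypothetical.

```python
import pandas as pd


def ema(close: pd.Series, length: int) -> pd.Series:
    # Exponential moving average with smoothing factor 2 / (length + 1)
    return close.ewm(span=length, adjust=False).mean()


def rsi(close: pd.Series, length: int = 14) -> pd.Series:
    # Relative strength index using Wilder's smoothing (alpha = 1 / length)
    delta = close.diff()
    gains = delta.clip(lower=0)
    losses = (-delta).clip(lower=0)
    avg_gain = gains.ewm(alpha=1 / length, adjust=False).mean()
    avg_loss = losses.ewm(alpha=1 / length, adjust=False).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)


def close_ratio(close: pd.Series, horizon: int) -> pd.Series:
    # Today's close divided by the mean close over the last `horizon` days (including today)
    return close / close.rolling(horizon).mean()


def trend(close: pd.Series, open_: pd.Series, horizon: int) -> pd.Series:
    # Number of days in the last `horizon` days (including today) that closed above their open
    return (close > open_).astype(int).rolling(horizon).sum()


# Made-up prices: a steadily rising close, with every other day closing above its open
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
open_ = pd.Series([1.5, 1.5, 3.5, 3.5, 5.5, 5.5])
print(rsi(close, 3).iloc[-1], close_ratio(close, 2).iloc[-1], trend(close, open_, 3).iloc[-1])
```

With a close that only ever rises, the average loss is zero and the RSI saturates at 100, which is the "overbought" end of the scale.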

After adding all these values to the dataset, the program min-max scales every feature to the range 0 to 1, which is generally considered good practice for machine-learning inputs.
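Min-max scaling maps each column to x' = (x - min) / (max - min). The sketch below reproduces scikit-learn's MinMaxScaler with NumPy (treating a constant column's range as 1, as scikit-learn does) and checks that the two agree on a small made-up array; min_max_scale is a hypothetical helper name. One caveat: fitting the scaler on the whole history means each row's scaled value also depends on later minima and maxima, so fitting on training rows only and reusing the fitted scaler's transform avoids that look-ahead.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def min_max_scale(x: np.ndarray) -> np.ndarray:
    # Scale each column to [0, 1]: (x - column min) / (column max - column min)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    col_range[col_range == 0] = 1  # Constant columns map to 0, matching scikit-learn
    return (x - col_min) / col_range


# Made-up data: three rows, the last column constant
data = np.array([[1.0, 10.0, 3.0], [2.0, 30.0, 3.0], [4.0, 20.0, 3.0]])
manual = min_max_scale(data)
library = MinMaxScaler(feature_range=(0, 1)).fit_transform(data)
print(manual)
print(np.allclose(manual, library))
```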

```python
import numpy as np
import pandas as pd
import pandas_ta as ta  # We use pandas_ta to compute technical indicators from the ticker data
from sklearn.preprocessing import MinMaxScaler  # Used to scale values between 0 and 1

# RSI tells us about overbought / oversold conditions in the market
# EMAF, EMAM and EMAS are fast (20), medium (100) and slow (150) exponential moving averages
# The rolling averages tell us whether the market has gone up or down
horizons: list[int] = [2, 5, 60, 250, 1000]  # Rolling windows in days: 2 days, 5 days, ...


def convert_dataframe(dataframe: pd.DataFrame) -> np.ndarray:
    # NOTE: modifies `dataframe` in place; later cells read the added columns from it
    dataframe.reset_index(inplace=True)  # Replace the Date index with an integer index
    dataframe.drop(["Volume", "Close", "Date", "High", "Low"], axis=1, inplace=True)  # Drop unneeded columns

    # Training target: tomorrow's (close - open), i.e. whether the next day went up or down
    dataframe["Target"] = dataframe["Adj Close"] - dataframe["Open"]
    dataframe["Target"] = dataframe["Target"].shift(-1)  # Shift by -1 so each row holds tomorrow's move
    dataframe["TargetClass"] = (dataframe["Target"] > 0).astype(int)  # 1 = up (buy), 0 = down (sell)

    dataframe["RSI"] = ta.rsi(dataframe["Adj Close"], length=15)
    dataframe["EMAM"] = ta.ema(dataframe["Adj Close"], length=100)
    dataframe["EMAF"] = ta.ema(dataframe["Adj Close"], length=20)
    dataframe["EMAS"] = ta.ema(dataframe["Adj Close"], length=150)

    for horizon in horizons:  # Add a close ratio and a trend column for every horizon
        rolling_average = dataframe["Adj Close"].rolling(horizon).mean()
        dataframe[f"Close Ratio {horizon}"] = dataframe["Adj Close"] / rolling_average
        # Number of days in the last `horizon` days (up to today) that closed above their open
        dataframe[f"Trend {horizon}"] = dataframe["TargetClass"].shift(1).rolling(horizon).sum()

    # Drop rows with missing values (indicator warm-up rows and the last row, whose Target is unknown)
    dataframe.dropna(inplace=True)
    dataframe.reset_index(drop=True, inplace=True)  # Re-number rows after dropping

    # Only give the model values it would know in the real world
    features: pd.DataFrame = dataframe.drop(["Open", "Target", "TargetClass"], axis=1)
    # Scale every feature to between 0 and 1 (fit_transform returns a NumPy array, not a DataFrame)
    return MinMaxScaler(feature_range=(0, 1)).fit_transform(features)


scaled_data: np.ndarray = convert_dataframe(ticker_data)  # Convert dataframe with important values
```
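The shift(-1) step is what turns each row into a training example about the next day. A toy frame with made-up prices (not data from the notebook) shows the alignment: row t's Target is day t+1's close minus day t+1's open, and the last row has no next day, so its Target is NaN.

```python
import pandas as pd

# Made-up prices for four days
frame = pd.DataFrame({
    "Open": [100.0, 101.0, 103.0, 102.0],
    "Adj Close": [101.0, 103.0, 102.0, 104.0],
})

# Same two steps as convert_dataframe: tomorrow's (close - open), then its up/down class
frame["Target"] = (frame["Adj Close"] - frame["Open"]).shift(-1)
frame["TargetClass"] = (frame["Target"] > 0).astype(int)  # NaN > 0 is False, so the last row gets 0
print(frame)
print(frame.dropna())
```

Note that the last row's TargetClass is 0 rather than missing, because NaN > 0 evaluates to False; it is the NaN in Target that makes dropna remove that row.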

Plotting Input Data

Plot the important inputs that are passed to the neural network. The following function plots:

  • Adj Close
  • EMA: Exponential Moving Average
  • RSI: Relative Strength Index

The input data that are passed to the neural network should help it approximate whether the next time period's closing value will be higher / lower compared to the current close.

```python
import matplotlib.pyplot as plt
import numpy as np


def input_plot() -> None:
    # ticker_data holds the indicator columns added in place by convert_dataframe
    index = range(len(scaled_data))

    # Plot the price and moving averages
    plt.plot(index, ticker_data["Adj Close"], 'r', label="Adj Close")
    plt.plot(index, ticker_data["EMAM"], 'g', label="EMAM")
    plt.plot(index, ticker_data["EMAF"], 'b', label="EMAF")
    plt.title("Input Data")
    plt.xlabel("Index")
    plt.ylabel("Parameters")
    plt.grid()
    plt.legend(loc="best")
    plt.show()

    # Plot the RSI on its own figure, since it uses a 0-100 scale
    plt.figure()
    plt.plot(index, ticker_data["RSI"], 'r', label="RSI", alpha=0.7)
    # Add labels
    plt.title("Input Data")
    plt.xlabel("Index")
    plt.ylabel("Parameters")
    plt.grid()
    plt.legend(loc="best")
    plt.show()


input_plot()  # Plot the inputs
```

Plotting the Statistics of Neural Network

To plot the statistics of the neural network, matplotlib is used while the neat module gives us most of the info needed to create the charts.

This plot displays the fitness of each generation. A genome earns fitness by making profitable trades based on its predictions of future closes.

```python
%matplotlib inline

# Plotting the statistics
def plot_statistics(stats) -> None:
    generation = range(len(stats.most_fit_genomes))
    best_fitness = [c.fitness for c in stats.most_fit_genomes]  # Best fitness in each generation
    avg_fitness = np.array(stats.get_fitness_mean())  # Average fitness of each generation
    stdev_fitness = np.array(stats.get_fitness_stdev())  # Standard deviation of the fitness

    # Plot values
    plt.plot(generation, avg_fitness, 'b-', label="average")
    plt.plot(generation, avg_fitness - stdev_fitness, 'g-.', label="-1 std", alpha=0.5)
    plt.plot(generation, avg_fitness + stdev_fitness, 'g-.', label="+1 std", alpha=0.5)
    plt.plot(generation, best_fitness, 'r-', label="best")

    # Add labels
    plt.title("Profits vs Generation")
    plt.xlabel("Generations")
    plt.ylabel("Profits")
    plt.grid()
    plt.legend(loc="best")
    plt.show()
```
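The green bands in this chart are the per-generation mean fitness plus and minus one standard deviation, and the red line is the best genome's fitness. The sketch below computes the same three series from hypothetical per-generation fitness lists. It uses the population standard deviation; our understanding is that neat-python's StatisticsReporter also uses the population rather than the sample form, but treat that as an assumption. The function name generation_summary is hypothetical.

```python
import statistics


def generation_summary(fitness_by_generation: list[list[float]]) -> list[tuple[float, float, float]]:
    # For each generation return (best, mean, population standard deviation)
    summary = []
    for fitnesses in fitness_by_generation:
        summary.append((max(fitnesses), statistics.fmean(fitnesses), statistics.pstdev(fitnesses)))
    return summary


# Hypothetical fitness values for two generations of four genomes each
fitness_history = [[0.0, 0.5, 1.0, 2.5], [0.2, 0.2, 0.2, 0.2]]
for best, mean, stdev in generation_summary(fitness_history):
    print(f"best={best:.2f} mean={mean:.2f} band=({mean - stdev:.2f}, {mean + stdev:.2f})")
```

A generation in which every genome scores the same collapses the band onto the average line, which is one way to read flat stretches in the chart.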

Training the Model

The model is trained using the initial data that was loaded into a dataframe.

Training works by giving each genome one day of data at a time, starting from a random day, and rewarding it with fitness based on the decision it makes.

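For each day, the genome is given that day's row of scaled_data plus a flag indicating whether it currently holds an open trade. It gains a very small amount of fitness (0.0001) for holding a trade while that trade is in profit, and the same amount for opening a new trade; trying to buy while a trade is already open costs 0.0002. Selling adds the trade's realized profit (or loss) to its fitness.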

The network has 3 outputs, and the genome's decision is the index of the largest one:

  • Index 0: represents a hold state
  • Index 1: represents a buy state
  • Index 2: represents a sell state

A genome is not allowed to go negative: as soon as its fitness drops below zero, it is removed from the rest of that generation's evaluation and keeps its negative fitness.
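The reward rules above can be exercised without NEAT. In the sketch below, a hypothetical list of decisions stands in for the network's output (decide mirrors the training code's choice of the largest output), and made-up closing prices stand in for the dataset. The reward amounts are the ones used in the training code, and the run stops as soon as fitness turns negative. The names decide and simulate are hypothetical.

```python
def decide(outputs: list[float]) -> int:
    # Index of the strongest output: 0 = hold, 1 = buy, 2 = sell
    return outputs.index(max(outputs))


def simulate(closes: list[float], decisions: list[int]) -> tuple[float, int]:
    # Replay one genome's decisions; return (fitness, number of days it survived)
    fitness = 0.0
    bought_for = 0.0  # Price of the open trade, 0.0 when no trade is open
    for day, (close, decision) in enumerate(zip(closes, decisions)):
        if decision == 0:  # Hold: small reward while the open trade is in profit
            if bought_for and close - bought_for > 0:
                fitness += 0.0001
        elif decision == 1:  # Buy: small reward for a new trade, penalty if one is already open
            if bought_for == 0:
                bought_for = close
                fitness += 0.0001
            else:
                fitness -= 0.0002
        elif decision == 2:  # Sell: add the realized profit or loss (nothing if no trade is open)
            if bought_for:
                fitness += close - bought_for
                bought_for = 0.0
        if fitness < 0:  # The genome drops out for the rest of the generation
            return fitness, day + 1
    return fitness, len(closes)


# Made-up closing prices; each decision would come from decide(network output) on that day
print(decide([0.1, 0.7, 0.2]))
print(simulate([100.0, 101.0, 102.0, 101.5], [1, 0, 2, 0]))
print(simulate([100.0, 99.0, 98.0], [1, 1, 2]))
```

The second run shows that a single repeated buy is enough to push a fresh genome below zero and end its run on the second day.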

```python
%matplotlib inline
import neat
import pickle
import random
import time

config_path: str = "../assets/neat-config.txt"  # Path of the NEAT config


class Forex:  # Tracks one genome's open trade (buy price) and how long it has held
    def __init__(self) -> None:
        self.bought_for: float = 0
        self.held_for: float = 0
        self.bought: int = 0  # 1 while a trade is open, else 0

    def buy(self, bought_for) -> None:
        if self.bought != 1:
            self.bought_for = bought_for
            self.bought = 1

    def check_value(self, current_value) -> float:
        # Unrealized profit of the open trade (0 if nothing is bought)
        return (current_value - self.bought_for) if self.bought == 1 else 0

    def hold(self) -> None:
        self.held_for += 1

    def sell(self, sold_for) -> float:
        if self.bought_for == 0:  # Don't return anything if we haven't bought yet
            return 0
        profit: float = sold_for - self.bought_for
        self.bought_for = 0  # Reset trade
        self.bought = 0
        return profit


def evaluate_genomes(genomes, config) -> None:
    networks: list = []
    genome_list: list = []
    money_list: list = []
    index: int = random.randint(0, len(ticker_data) - 1000)  # Start from a random day each generation

    for genome_id, genome in genomes:
        genome.fitness = 0  # Start every genome with a fitness of 0
        networks.append(neat.nn.FeedForwardNetwork.create(genome, config))
        genome_list.append(genome)
        money_list.append(Forex())

    # Loop until the end of the dataframe
    while index < len(ticker_data):
        process_data = scaled_data[index]  # Min-max scaled features for this day
        current_close = ticker_data["Adj Close"].iloc[index]

        # Loop backwards so removing a genome does not skip the next one
        for x in range(len(genome_list) - 1, -1, -1):
            model = genome_list[x]
            money: Forex = money_list[x]
            # Predict from today's features plus whether a trade is open
            prediction = networks[x].activate(np.append(process_data, money.bought))
            decision = prediction.index(max(prediction))

            match decision:
                case 0:  # Hold
                    money.hold()
                    if money.check_value(current_close) > 0:  # The open trade is in profit
                        model.fitness += 0.0001
                case 1:  # Buy
                    if money.bought_for == 0:  # Only buy if no trade is open yet
                        money.buy(current_close)
                        model.fitness += 0.0001
                    else:
                        model.fitness -= 0.0002
                case 2:  # Sell
                    model.fitness += money.sell(current_close)  # Add realized profit to fitness

            # Remove the genome from the rest of this generation once its fitness is negative
            if model.fitness < 0:
                networks.pop(x)
                genome_list.pop(x)
                money_list.pop(x)

        # Go to the next day, or stop if no genomes are left
        if len(genome_list) > 0:
            index += 1
        else:
            break


def train_model(checkpoint=None, generations: int = 1000) -> None:
    # Load the NEAT config
    config: neat.Config = neat.Config(
        neat.DefaultGenome,
        neat.DefaultReproduction,
        neat.DefaultSpeciesSet,
        neat.DefaultStagnation,
        config_path,
    )

    population: neat.Population
    # Resume from a checkpoint if one was passed
    if checkpoint is not None:
        try:
            population = neat.Checkpointer.restore_checkpoint(f"neat-checkpoint-{checkpoint}")
        except FileNotFoundError:
            print(f"File neat-checkpoint-{checkpoint} not found, starting from generation 0")
            population = neat.Population(config)
    else:
        population = neat.Population(config)

    # Add reporters
    network_statistics = neat.StatisticsReporter()
    population.add_reporter(neat.Checkpointer(2000))  # Checkpoint every 2000 generations (default time interval also applies)
    # population.add_reporter(neat.StdOutReporter(True))
    population.add_reporter(network_statistics)

    # Run the population and keep the best genome found
    best_genome = population.run(evaluate_genomes, generations)
    plot_statistics(network_statistics)  # Plot the statistics once all generations have run

    # Save the best genome in a pickle file
    with open("../assets/best.pickle", "wb") as file:
        pickle.dump(best_genome, file)


train_model(generations=200)
```
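The network's input and output sizes are fixed in neat-config.txt, which is not reproduced here. Counting the columns convert_dataframe keeps (Adj Close, RSI, EMAM, EMAF, EMAS, plus one Close Ratio and one Trend column per horizon) gives 15 features, and the training loop appends the open-trade flag, so num_inputs needs to be 16 and num_outputs 3 for the hold/buy/sell outputs. This assumes the CSV contains exactly the Date, Open, High, Low, Close, Adj Close and Volume columns. Below is a sketch of a hypothetical helper that checks a config before a long run using the standard-library configparser; the excerpt shows only the two relevant keys, not a complete config.

```python
import configparser


def expected_sizes(horizons: list[int]) -> tuple[int, int]:
    # 5 base features + 2 per horizon, plus 1 open-trade flag; 3 outputs (hold, buy, sell)
    num_features = 5 + 2 * len(horizons)
    return num_features + 1, 3


def check_config(config_text: str, horizons: list[int]) -> bool:
    # Compare num_inputs / num_outputs in the [DefaultGenome] section with the expected sizes
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    genome = parser["DefaultGenome"]
    return (genome.getint("num_inputs"), genome.getint("num_outputs")) == expected_sizes(horizons)


# Excerpt only: a real neat-config.txt needs many more keys than these two
excerpt = """
[DefaultGenome]
num_inputs = 16
num_outputs = 3
"""
print(expected_sizes([2, 5, 60, 250, 1000]), check_config(excerpt, [2, 5, 60, 250, 1000]))
```

In the notebook, the check could be run as check_config(open(config_path).read(), horizons) before calling train_model.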

Profit vs Generation

Testing

The best genome is tested on a different currency pair (INR=X) by recording the decision it makes each day and the profit it realizes from its trades.

```python
index: int = 222  # Intended start index for testing (not used: the loop below starts at row 0)
test_data_len: int = 500  # Number of days to test
total_profit: float = 0  # Realized profit the genome has made
test_data: pd.DataFrame = pd.read_csv(f"{assets_path}/INR=X.csv", index_col=0)  # A different currency pair
test_scaled: np.ndarray = convert_dataframe(test_data)  # Convert dataframe so we can send it to the model
genome_money: Forex = Forex()  # Tracks the open trade of the tested genome
decision_counter: list[int] = [0, 0, 0]
decisions: list[str] = ["Hold", "Buy", "Sell"]


def load_network(genome=None) -> neat.nn.FeedForwardNetwork:
    # If no genome is passed, load the best genome saved by train_model
    if genome is None:
        try:
            with open("../assets/best.pickle", "rb") as f:
                genome = pickle.load(f)
        except FileNotFoundError:
            print("(best.pickle) File not found, please load the file and try again")
            raise
    config = neat.Config(
        neat.DefaultGenome,
        neat.DefaultReproduction,
        neat.DefaultSpeciesSet,
        neat.DefaultStagnation,
        config_path,
    )
    return neat.nn.FeedForwardNetwork.create(genome, config)


def test_genome(network, process_data) -> int:
    # Return the index of the strongest output: 0 = hold, 1 = buy, 2 = sell
    prediction: list[float] = network.activate(process_data)
    return prediction.index(max(prediction))


network = load_network()  # Build the network once instead of on every day
test_start_time = time.time()
for i in range(test_data_len):  # Loop through the test data
    # Send today's features plus the open-trade flag
    genome_decision = test_genome(network, np.append(test_scaled[i], genome_money.bought))
    current_close: float = test_data["Adj Close"].iloc[i]
    decision_counter[genome_decision] += 1  # Increase the decision counter by 1

    match genome_decision:
        case 0:  # Hold
            genome_money.hold()
        case 1:  # Buy
            if genome_money.bought_for == 0:  # Only buy if no trade is open yet
                genome_money.buy(current_close)
        case 2:  # Sell
            total_profit += genome_money.sell(current_close)

bar_figure = plt.figure()  # Create a new figure
bar_axis = bar_figure.add_axes([0, 0, 1, 1])  # Add axes to the figure
bar_axis.bar(decisions, decision_counter)  # Add the values
plt.xlabel("Decisions")
plt.ylabel("Counter")
plt.title("Decisions Made by the Best Genome")
plt.show()

print(f"Total profit made by best genome: {total_profit}")
print(f"Time it took to test genome: {time.time() - test_start_time}")
```
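The total_profit figure only counts trades closed with a sell, so a position that is bought and never sold adds nothing to it, whatever the price does afterwards. A minimal sketch of marking that open position to the last close, so the reported figure includes unrealized profit or loss; the function name, its arguments, and the prices in the example are hypothetical.

```python
def total_with_open_position(realized: float, bought_for: float, is_open: bool, last_close: float) -> float:
    # Realized profit plus the unrealized profit or loss of a trade still open at the end
    unrealized = (last_close - bought_for) if is_open else 0.0
    return realized + unrealized


# Made-up numbers: no closed trades, one position bought at 100.0 and still open at 98.5
print(total_with_open_position(realized=0.0, bought_for=100.0, is_open=True, last_close=98.5))
```

In the test code, the arguments would be total_profit, genome_money.bought_for, genome_money.bought == 1, and the last close the loop processed.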

Final Remarks

Discussion

As the data shows, the genome's profits fluctuate erratically from one generation to the next during training and do not increase steadily. To create a more effective algorithm, the data passed to NEAT would need to be more strongly correlated with future price movements so that the algorithm can start to see patterns. More computing power and more generations are needed to see progress in profits.

During testing, the best genome from all 200 generations is not able to make a profit at all on a new, different dataset. As the bar graph above shows, the neural network mostly chooses to hold, because it has learned that holding lets it survive the longest. The network buys approximately 2 times but never closes the trade, which leaves it with a negative profit. One possible reason it always holds is that it gains some fitness while the value of the currency goes up, and it sees selling as too risky.

All in all, the current NEAT setup, although efficient, is not able to predict forex charts from trends in the data alone with only 200 generations of training.

References

  • Matplotlib for data visualization
  • NEAT-Python (neat) for the neuroevolution algorithm
  • NumPy for array manipulation
  • Pandas for data manipulation
  • Pandas_ta for technical analysis functions
  • Pickle for saving genome data
  • Random for choosing random starting indexes during training
  • Scikit-learn (sklearn) for the MinMaxScaler
  • yfinance (Yahoo Finance) to download ticker data