Neural Networks for Market Forecasting

Raymond Raud

August, 1997 





A neural network is a statistical modeling method. A program that implements the method becomes a tool.
The experiments discussed on this page apply to two types of neural networks: back propagation and self-organizing.

A back propagation neural network behaves like a statistical curve-fitting method, whereas a self-organizing network identifies similar patterns, allowing classification of price movement sequences. General principles of neural networks are explained thoroughly elsewhere. Here I discuss only the specific application of these two types of neural networks.

Back propagation network

A back propagation network takes a fixed number of past price movement samples as its input. The sampling period is irrelevant as long as the data structure is the same. I found an interesting correlation between volume and open interest values and the price movement. Volume and open interest are available only as daily figures, which determined the sampling period for most of my experiments.

A distinction should be made between the absolute value of the price and the price movement between samples. The absolute value of the price is usually much larger than the price movement, so a forecast of the movement is more precise than a forecast of the absolute value. That requires all price data to be converted into movements from one period to the next. For a sampling period of one trading day the data elements are:
do, dh, dl, dc, dv, di
where
do = open - last day open
dh = high - last day high
dl = low - last day low
dc = close - last day close
dv = volume - last day volume
di = open interest - last day open interest
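
The conversion from daily values to movements can be sketched in a few lines of Python. This is only an illustration: the Bar record and the daily_movements function are hypothetical names, and the sketch assumes each element is compared with the same field of the previous day.

```python
from typing import NamedTuple

class Bar(NamedTuple):
    """One trading day of raw data (hypothetical record layout)."""
    open: float
    high: float
    low: float
    close: float
    volume: float
    open_interest: float

def daily_movements(bars):
    """Return [do, dh, dl, dc, dv, di] for every day after the first.

    Assumption: each element is today's value minus the same field
    of the previous day.
    """
    moves = []
    for prev, cur in zip(bars, bars[1:]):
        moves.append([c - p for p, c in zip(prev, cur)])
    return moves

bars = [
    Bar(10.0, 12.0, 9.0, 11.0, 1000.0, 500.0),
    Bar(11.0, 13.0, 10.0, 12.0, 1200.0, 480.0),
]
print(daily_movements(bars))
```

The first day of the file produces no record because it has no previous day to compare with.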

The underlying assumption is that a set of consecutive sampling periods (trading days) defines a pattern that predicts the price movement from the current period (today) to the next (tomorrow). This set is the input vector of the network. Clearly, the longer the set of sampling periods, the more numbers to crunch and the slower the neural network becomes, while too few sampling periods are insufficient to identify the pattern. So, what is a reasonable size? There are at least two ways to find out: measure the correlation between sampling periods (using a spreadsheet's standard correlation functions) or study the neural network's weight matrix after training. In both cases the correlation comes in waves, with significant peaks (8 to 9 %) up to roughly 50 sampling periods (days) into the past, gradually leveling off from there.
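
The first method, correlation between sampling periods, might look like the following Python sketch. The names pearson and lag_correlations are illustrative, and correlating one movement series with lagged copies of itself is an assumption about how the spreadsheet calculation was set up, not a description of it.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / math.sqrt(vx * vy)

def lag_correlations(series, max_lag):
    """Correlate series[t] with series[t - lag] for lag = 1..max_lag.

    The series must be longer than max_lag + 1 samples.
    """
    return {lag: pearson(series[lag:], series[:-lag])
            for lag in range(1, max_lag + 1)}

# Toy series that alternates every period, for illustration only.
moves = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(lag_correlations(moves, 2))
```

Plotting the result against the lag would show where the peaks level off, which suggests a cut-off for the input vector length.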
 
A back propagation network "learns" from the past price movement data. Each input vector is associated with the actual price movement for the next day (the true value). The network's algorithm propagates the error (the difference between the forecasted value and the true value) back through the weights matrix, adjusting the weights to reduce the error. The true value is what you wish to forecast. In my experiments I mostly forecasted the next sampling period's average (between the daily high and low), but also the next period's high or low value. The input vector together with the true value is a record that the network processes on each cycle. The sequence of records is the training file.
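
Assembling records into a training file can be sketched as below. The function name build_records, the window parameter and the flat layout of the input vector are assumptions made for illustration; the actual file format is not described here.

```python
def build_records(moves, true_values, window):
    """Pair each run of `window` consecutive days with the next day's true value.

    moves[i]       -- list of movement values for day i
    true_values[i] -- the value to forecast for day i
    Returns a list of (input_vector, true_value) records in date order.
    """
    records = []
    for t in range(window, len(moves)):
        # Flatten days t-window .. t-1 into one input vector.
        inputs = [v for day in moves[t - window:t] for v in day]
        records.append((inputs, true_values[t]))
    return records

# Toy data: one movement value per day, for illustration only.
moves = [[1.0], [2.0], [3.0], [4.0]]
true_values = [10.0, 20.0, 30.0, 40.0]
print(build_records(moves, true_values, window=2))
```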
 
The network's learning phase (training) starts with a randomly generated weights matrix. On each cycle the network first forecasts and then adjusts the weights to reduce the forecasting error. Depending on the size of the history file and the acceptable error, the network needs to go through the file many times before it is trained. Once trained, the network can be used for forecasting. The more you train the network, the smaller the forecasting error on the training file becomes. The network gradually "learns" the training patterns better and better. Yet the goal is to forecast values for input vectors that the network was not trained on, and this is where the conflict lies. Once the network has learned the specific patterns of the training file well, it cannot recognize the slightly different patterns of input vectors it was not trained on. Over training is a known problem in neural network applications for financial market forecasting. I don't have the "silver bullet" either, but some practical methods to avoid it are discussed below.
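
The forecast-then-adjust cycle can be illustrated with a very small network in plain Python. This is a generic textbook-style sketch, not the program used in the experiments: the class name TinyBackprop, the single tanh hidden layer, the learning rate and the toy records are all assumptions.

```python
import math
import random

class TinyBackprop:
    """One tanh hidden layer and one linear output, trained record by record."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Randomly generated starting weights; the last entry of each row is a bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _hidden(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
                for row in self.w1]

    def forecast(self, x):
        h = self._hidden(x)
        return sum(w * v for w, v in zip(self.w2, h)) + self.w2[-1]

    def train_record(self, x, true_value, rate=0.05):
        """Forecast first, then push the error back through both weight layers."""
        h = self._hidden(x)
        err = sum(w * v for w, v in zip(self.w2, h)) + self.w2[-1] - true_value
        for j, hj in enumerate(h):
            # Error reaching hidden unit j, computed before w2[j] is changed.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= rate * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= rate * grad_h * xi
            self.w1[j][-1] -= rate * grad_h
        self.w2[-1] -= rate * err
        return err * err

def train_cycle(net, records, rate=0.05):
    """One full run through the training file; returns the total squared error."""
    return sum(net.train_record(x, y, rate) for x, y in records)

# Toy training file, for illustration only.
records = [([0.0, 0.0], 0.0), ([1.0, 0.0], 1.0),
           ([0.0, 1.0], 1.0), ([1.0, 1.0], 2.0)]
net = TinyBackprop(n_in=2, n_hidden=3, seed=1)
first_error = train_cycle(net, records)
for _ in range(300):
    last_error = train_cycle(net, records)
print(first_error, last_error)
```

The error on the training file keeps shrinking with more cycles, which is exactly why it says nothing about performance on data the network has not seen.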

Self-organizing network

A self-organizing network identifies similar patterns in the flow of input data without additional input from the operator. The algorithm itself determines what belongs to a pattern, thus eliminating possible bias. The search for a pattern occurs in a sliding window to allow early detection, which is critical for the application. The forecasting application must assume that either
 - the pattern can be detected from the input before the whole pattern is visible and that the tail of the input will match the rest of the pattern or
 - longer patterns can be decomposed into shorter patterns and the first short patterns will uniquely identify the following shorter patterns.
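
The sliding-window search can be illustrated with a much simplified Python sketch. It uses winner-take-all prototype updates and omits the neighborhood update of a full self-organizing map, so it is a stand-in for the idea rather than the network used in the experiments. The names sliding_windows, nearest and train_prototypes are hypothetical.

```python
import random

def sliding_windows(series, width):
    """All consecutive sub-sequences of the given width."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def nearest(prototypes, window):
    """Index of the prototype closest to the window (squared distance)."""
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, window))
    return min(range(len(prototypes)), key=lambda k: dist2(prototypes[k]))

def train_prototypes(windows, n_prototypes, cycles=50, rate=0.2, seed=0):
    """Move the winning prototype a little toward each window it wins."""
    rng = random.Random(seed)
    prototypes = [list(w) for w in rng.sample(windows, n_prototypes)]
    for _ in range(cycles):
        for w in windows:
            k = nearest(prototypes, w)
            prototypes[k] = [p + rate * (x - p)
                             for p, x in zip(prototypes[k], w)]
    return prototypes

# Toy movement series, for illustration only.
series = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
windows = sliding_windows(series, width=3)
prototypes = train_prototypes(windows, n_prototypes=2)
print([nearest(prototypes, w) for w in windows])
```

Each window is labeled with the index of its closest prototype, so a sequence of labels stands for a sequence of short patterns.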

In my experiments the network identified a number of unique patterns, such as trending with retracements, "head and shoulders", "rising wedge" and others known from classical technical analysis. Unfortunately, the assumptions required for the forecasting application do not hold. By the time a pattern is uniquely identified, it is already complete. Longer patterns (like "head and shoulders") can be decomposed into shorter patterns, but these shorter patterns also occur in other longer patterns. Their presence does not identify a specific longer pattern with usable probability.

These observations confirm the generally acknowledged ambiguity of graph interpretation in forecasting. Once the whole pattern is known (that is, it lies in the past), identifying it becomes obvious but useless for forecasting.
 

Observations, common problems and practical remedies

Some common topics and issues are covered in this section. If you don't find an answer to your question, please drop me a line; I may have one. If your observations differ, tell me about them and let's discuss.

The impact of Data Preprocessing.

Many authors stress the importance of preprocessing the data before learning in order to reach a practically usable network. Multiple papers on the topic can be found in the archives of "Technical Analysis of Stocks and Commodities". Without going into the specific details of each recommendation, my experience does not confirm this suggestion.

Significantly faster learning can be achieved by preprocessing the data so that only elements with high correlation to the true values are chosen for the input. For example, instead of the base data values (the movement of open, high, low, close, volume and open interest from the previous period), derived data elements are used, such as the difference between close and open, the difference between previous periods' close values, etc. The experiments show that a network trained on derived values is less accurate than one trained on the base data values. If the derived values are used in parallel with the base data values, the network reaches the same accuracy in fewer cycles, but each cycle takes longer, so nothing is gained.

The observation that data preprocessing increases the speed of learning but decreases accuracy appears logical, because preprocessing eliminates some intricate relationships between the data elements. These relationships have a low correlation with the true value, but it is not zero. Preprocessing may expose the strongly correlated relationships better, thus achieving faster learning. But it does not add knowledge to the input data; it removes some, so the result is less accurate.

Continuous Contract Issue.

Commodity contracts commonly do not last long enough to supply sufficient data for network training. Also, only the oldest contract is actively traded by a wide spectrum of speculators, and it reveals different patterns than the younger contracts. A "continuous contract" that includes only the active contract's data becomes a necessity. Simple concatenation of daily data sets results in a file with significant jumps in values on the days of a contract switch, because contracts with different expiration dates usually trade at different price levels. Many complex methods have been suggested for compiling a smooth data set.

Here's a simple approach that has worked for me. First note that the jumps occur in the absolute values of prices. Since we are interested in forecasting price movements (not absolute values), we can use price movements as input data instead. This way, instead of the open price level, the input will be the difference between today's open and the previous day's open, and so on. To avoid jumps in the input data at a contract switch, the first set of values for the new contract (after the switch) is calculated from the same contract's previous day's values rather than from the previous contract's corresponding values. This also guarantees that the absolute values of the input are always smaller than the daily movement limit adopted by the exchange.
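
The switch-day rule can be sketched in Python as follows. The data layout (for each day, the active contract plus the closing prices of every contract traded that day) and the name continuous_movements are assumptions made for illustration, and only the close is shown. The contract labels and prices are made up.

```python
def continuous_movements(days):
    """Day-to-day close movements for a continuous contract.

    days -- list of (active_contract, {contract: close}) in date order.
    The difference is always taken within the contract that is active
    today, so the first day after a switch uses the new contract's own
    close from the previous day.
    """
    moves = []
    for (_, prev_closes), (active, closes) in zip(days, days[1:]):
        moves.append(closes[active] - prev_closes[active])
    return moves

# Made-up prices; "H" is the expiring contract, "K" the next one.
days = [
    ("H", {"H": 100.0, "K": 110.0}),
    ("H", {"H": 101.0, "K": 111.5}),
    ("K", {"H": 102.0, "K": 112.0}),  # switch day
    ("K", {"K": 113.0}),
]
print(continuous_movements(days))
```

Simple concatenation of the active closes would give 112.0 - 101.0 = 11.0 on the switch day; taking the difference within contract "K" gives 0.5 instead.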

Over training

A back propagation network is essentially a statistical curve-fitting algorithm that, with extensive training, is capable of "learning" all the specific patterns of the training set. The goal is for the network to "learn" only the common patterns. The common patterns repeat more often and are "learned" faster, but there is no general indication of when to stop the training. An over trained network performs very well on the training file but fails on real data, because of the specific patterns that existed only in the training file.

Having tried many empirical indicators to determine the training cut-off point, I have not found a generally applicable formula. An indicator calculated after each training cycle (a full run through the training file)

appears to work fine for agricultural commodities, but is out of synch for others.

A simple practical approach is to use a test file in parallel with the training file. After each training cycle the network should forecast on the test file and calculate the total forecasting error. Once the total forecasting error starts to increase consistently, it is the right time to stop training and use the weights file with the best result on the test file. Note that these indicators fluctuate, so a consistent increase or decrease is the only true indication.
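
The test-file approach can be sketched as a small Python loop. The function name train_with_test_file and its parameters are hypothetical, and treating "consistently increasing" as a fixed number of cycles without a new best result (the patience parameter) is an assumption; other definitions of a consistent increase are possible.

```python
import copy

def train_with_test_file(net, run_training_cycle, test_error,
                         max_cycles=1000, patience=10):
    """Train until the test-file error stops improving.

    run_training_cycle(net) -- one full run through the training file
    test_error(net)         -- total forecasting error on the test file
    Returns the saved copy of the network with the best test result
    and that error.
    """
    best_error = float("inf")
    best_net = copy.deepcopy(net)
    cycles_since_best = 0
    for _ in range(max_cycles):
        run_training_cycle(net)
        err = test_error(net)
        if err < best_error:
            # Keep the weights with the best result on the test file.
            best_error, best_net = err, copy.deepcopy(net)
            cycles_since_best = 0
        else:
            cycles_since_best += 1
            if cycles_since_best >= patience:
                break  # no new best for `patience` cycles in a row
    return best_net, best_error
```

A larger patience tolerates more fluctuation in the test error before stopping, at the cost of extra training cycles.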


Raymond Raud
 



©1997, Raymond Raud. All Rights Reserved.
Last Modified: August 30, 1997