Raymond Raud
August, 1997
A neural network is a statistical modeling method; a program
that implements the method becomes a tool.
The experiments discussed on this page apply to two types of
neural networks: back propagation and self-organizing.
A back propagation neural network behaves like a statistical curve-fitting method, whereas a self-organizing network identifies similar patterns, allowing classification of price movement sequences. General principles of neural networks are explained thoroughly elsewhere; here I discuss only the specific application of these two types of network.
A back propagation network takes a fixed number of past price movement samples as its input. The sampling period is irrelevant as long as the data structure is the same. I found an interesting correlation between the volume and open interest values and the price movement. Volume and open interest are available only on a daily basis, and that determined the sampling period for most of my experiments.
A distinction should be made between the absolute value of the price
and the price movement between samples. The absolute value of the
price is usually much larger than the price movement, so a forecast
of the movement will be more precise than a forecast of the absolute
value. That requires all price data to be converted into movements
from one period to the next. For a sampling period of one trading day
the data elements are:
do, dh, dl, dc, dv, di,
where
do = open - last day open
dh = high - last day high
dl = low - last day low
dc = close - last day close
dv = volume - last day volume
di = open interest - last day open interest
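The conversion can be sketched in a few lines of Python. This is only an illustration, assuming each element is the difference from the same field on the previous trading day; the function name `to_movements` and the dictionary keys for the raw daily values are hypothetical and not taken from the original experiments.

```python
from typing import Dict, List

# Movement name -> raw daily field it is derived from (field names are assumed).
MOVEMENT_FIELDS = {
    "do": "open",
    "dh": "high",
    "dl": "low",
    "dc": "close",
    "dv": "volume",
    "di": "open_interest",
}


def to_movements(days: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Turn consecutive daily records into day-over-day movements.

    The first day has no predecessor, so the result is one element shorter
    than the input.
    """
    movements = []
    for previous, current in zip(days, days[1:]):
        movements.append(
            {name: current[field] - previous[field]
             for name, field in MOVEMENT_FIELDS.items()}
        )
    return movements
```

Two days of raw data therefore yield a single movement record.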
The underlying assumption is that a set of consecutive sampling
periods (trading days) defines a pattern that predicts the price
movement from the current period (today) to the next (tomorrow). This
set is the input vector of the network. Clearly, the longer the set of
sampling periods, the more numbers there are to crunch and the slower
the neural network becomes. Too few sampling periods are insufficient
to identify the pattern. So, what is a reasonable size? There are at
least two ways to find out: compute the correlation between sampling
periods (use the standard spreadsheet correlation functions) or study
the neural network weight matrix after training. In both cases the
correlation comes in waves, with significant peaks (8 -- 9 %) up to
roughly 50 sampling periods (days) into the past, gradually leveling
off from there.
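The correlation check between sampling periods can also be done outside a spreadsheet. The sketch below computes the Pearson correlation between a movement series and the same series shifted by 1 to `max_lag` days. It is one possible reading of "correlation between sampling periods"; the function names are illustrative.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def lag_correlations(moves: Sequence[float], max_lag: int) -> Dict[int, float]:
    """Correlation of the series with itself shifted by each lag in days."""
    return {
        lag: pearson(moves[:-lag], moves[lag:])
        for lag in range(1, max_lag + 1)
    }
```

Plotting the returned values against the lag would show where the peaks level off, which suggests how many past days are worth feeding to the network.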
A back propagation network
"learns" from the past price movement data. Each input
vector is associated with the actual price movement for the next day
(the true value). The network's algorithm propagates the error
(the difference between the forecasted value and the true value) back
through the weight matrix, adjusting the weights to reduce the error.
The true value is what you wish to forecast. In my experiments I
mostly forecasted the sampling period's average
(between the daily high and low), but also the next period's high or
low value. The input vector together with the true value is a record
that the network processes on each cycle. The sequence of records is
the training file.
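As an illustration of how such records could be assembled, the sketch below flattens a window of consecutive days into one input vector and pairs it with the true value for the following day. The name `build_records`, the `window` parameter and the in-memory list layout are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Record = Tuple[List[float], float]


def build_records(
    days: Sequence[Sequence[float]],
    true_values: Sequence[float],
    window: int,
) -> List[Record]:
    """Build (input vector, true value) records.

    days[t] holds the movement elements of day t (e.g. do, dh, dl, dc, dv, di).
    true_values[t] is the value to forecast for day t.
    Each record uses the `window` days before day t as its input.
    """
    records = []
    for t in range(window, len(days)):
        inputs = [value for day in days[t - window:t] for value in day]
        records.append((inputs, true_values[t]))
    return records
```

With six elements per day and a 50-day window, each input vector would hold 300 numbers.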
The network's learning phase (training)
starts with a randomly generated weight matrix. On each cycle the
network first forecasts and then adjusts the weights to reduce the
forecasting error. Depending on the size of the history file and the
acceptable error, the network needs to go through the file many times
before it is trained. Once trained, the network can be used for
forecasting. The more you train the network, the smaller the
forecasting error on the training file becomes: the network gradually
"learns" the training patterns better and better. Yet the goal is to
forecast values for input vectors that the network was not trained
on, and this is where the conflict lies. Once the network has learned
the specific patterns of the training file well, it cannot recognize
the slightly different patterns of input vectors it was not trained
on. Overtraining is a known problem in neural network applications
for financial market forecasting. I don't have the "silver
bullet" either, but some practical methods to avoid it are
discussed below.
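The forecast-then-adjust cycle can be made concrete with a very small network. The sketch below is a generic textbook arrangement (one tanh hidden layer, a linear output, weights updated after every record), not the program used in the experiments; the layer sizes, learning rate and function names are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Network = Tuple[List[List[float]], List[float]]


def init_network(n_in: int, n_hidden: int, seed: int = 0) -> Network:
    """Random starting weights; the last weight of each row is a bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w1, w2


def forward(net: Network, x: Sequence[float]) -> Tuple[List[float], float]:
    """Return hidden activations and the forecast for one input vector."""
    w1, w2 = net
    xb = list(x) + [1.0]
    hidden = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1]
    output = sum(w * h for w, h in zip(w2, hidden + [1.0]))
    return hidden, output


def train_cycle(net: Network, records, rate: float = 0.05) -> float:
    """One pass through the training file; returns the summed absolute error."""
    w1, w2 = net
    total = 0.0
    for x, true_value in records:
        hidden, output = forward(net, x)       # forecast first
        error = output - true_value
        total += abs(error)
        # Hidden-layer deltas use the output weights before they are changed.
        deltas = [error * w2[j] * (1.0 - h * h) for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden + [1.0]):  # then adjust the weights
            w2[j] -= rate * error * h
        xb = list(x) + [1.0]
        for j, row in enumerate(w1):
            for i, v in enumerate(xb):
                row[i] -= rate * deltas[j] * v
    return total
```

Calling `train_cycle` repeatedly on the same records drives the training error down, which is exactly the behaviour that leads to overtraining if it is left unchecked.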
A self-organizing network identifies similar patterns in the flow
of input data without additional input from the operator. The
algorithm itself determines what belongs to the pattern, thus
eliminating possible bias. The search for the pattern occurs in a
sliding window to allow early detection, which is critical for the
application. The forecasting application must assume that
- the pattern can be detected from the input before the whole pattern
is visible, and that the tail of the input will match the rest of the
pattern, or
- longer patterns can be decomposed into shorter patterns, and the
first short patterns will uniquely identify the shorter patterns that
follow.
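The sliding-window search can be illustrated with plain winner-take-all competitive learning, a simplified relative of a self-organizing map with no neighbourhood function. It is a stand-in for illustration only, not the network used in the experiments; the window width, unit count and learning rate are assumptions.

```python
import random
from typing import List, Sequence


def sliding_windows(series: Sequence[float], width: int) -> List[List[float]]:
    """All consecutive windows of the given width over the series."""
    return [list(series[i:i + width]) for i in range(len(series) - width + 1)]


def nearest(units: Sequence[Sequence[float]], window: Sequence[float]) -> int:
    """Index of the unit (prototype pattern) closest to the window."""
    distances = [
        sum((u - v) ** 2 for u, v in zip(unit, window)) for unit in units
    ]
    return distances.index(min(distances))


def train_units(
    windows: Sequence[Sequence[float]],
    n_units: int,
    epochs: int = 20,
    rate: float = 0.3,
    seed: int = 0,
) -> List[List[float]]:
    """Move the winning unit a step towards each window it wins."""
    rng = random.Random(seed)
    units = [list(w) for w in rng.sample(list(windows), n_units)]
    for _ in range(epochs):
        for window in windows:
            k = nearest(units, window)
            units[k] = [u + rate * (v - u) for u, v in zip(units[k], window)]
    return units
```

After training, `nearest(units, window)` labels each new window with the index of the pattern it most resembles, with no labels supplied by the operator.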
In my experiments the network identified a number of distinct patterns, such as trending with retracements, "head and shoulders", "rising wedge" and others known from classical technical analysis. Unfortunately, the assumptions required for the forecasting application do not hold. The patterns, once uniquely identified, are already complete. Longer patterns (like "head and shoulders") can be decomposed into shorter patterns, but these shorter patterns also occur in other longer patterns. Their presence does not identify a specific longer pattern with usable probability.
These observations confirm the generally acknowledged ambiguity of
graph interpretation in forecasting. Once the whole pattern is known
(that is, it lies in the past), identifying it becomes obvious, but
useless for forecasting.
Some common topics and issues are covered in this section. If you don't find an answer to your question, please drop me a line; I may have an answer. If your observations differ, tell me about it and let's discuss.
Many authors stress the importance of preprocessing the data before training in order to reach a practically usable network. Multiple papers on the topic can be found in the archives of "Technical Analysis of Stocks and Commodities". Without going into the details of each recommendation, my experience does not confirm this suggestion.
Significantly faster learning can be achieved by preprocessing the data so that only elements with a high correlation to the true values are chosen for the input. For example, instead of the base data values (the movements of open, high, low, close, volume and open interest from the previous period), derived data elements such as the difference between close and open, the difference between previous periods' close values, etc. are used. The experiments show that a network trained on derived values is less accurate than one trained on the base data values. If the derived values are used in parallel with the base data values, the network reaches the same accuracy in fewer cycles, but each cycle takes longer, so there is no net gain.
The observation that data preprocessing increases the speed of learning but decreases accuracy appears logical, because preprocessing eliminates some intricate relationships among the data elements. These relationships have a low correlation with the true value, but it is not zero. Preprocessing may expose the strongly correlated relationships better, thus achieving faster learning. But it does not add knowledge to the input data; rather, it eliminates some, so the result is less accurate.
Commodity contracts commonly do not last long enough to supply sufficient data for network training. Also, only the oldest contract is actively traded by a wide spectrum of speculators, so it reveals different patterns than the younger contracts. A "continuous contract" that includes only the active contract's data becomes a necessity. Simple concatenation of daily data sets results in a file with significant jumps in values on the days of a contract switch, because contracts with different expiration dates usually trade at different price levels. Many complex methods have been suggested for compiling a smooth data set.
Here's a simple approach that has worked for me. First note that the jumps occur in the absolute values of prices. Since we are interested in forecasting price movements (not absolute values), we can use price movements as input data instead. This way, instead of the open price level, the input is the difference between today's open and the previous day's open, and so on. To avoid jumps in the input data at a contract switch, the first set of values for the new contract (after the switch) is calculated from the same contract's previous-day values rather than from the previous contract's corresponding values. This also guarantees that the absolute values of the input are always smaller than the daily movement limit adopted by the exchange.
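A minimal sketch of this switch rule follows, for a single price field. It assumes the data is held as one price series per contract indexed by day number, plus a list naming the active contract for each day; these structures, the contract labels and the function name are hypothetical.

```python
from typing import Dict, List, Sequence


def continuous_movements(
    prices: Dict[str, Dict[int, float]],
    active: Sequence[str],
) -> List[float]:
    """Day-over-day movements for a continuous contract.

    prices[contract][day] is that contract's price on a given day index.
    active[day] names the contract treated as active on that day.
    Each movement is taken within the contract active today, so the first
    day after a switch is compared with the new contract's own previous day.
    """
    movements = []
    for day in range(1, len(active)):
        series = prices[active[day]]
        movements.append(series[day] - series[day - 1])
    return movements
```

For example, if the March contract closes at 102 on the day before the switch and the May contract trades at 110 and then 111, the movement recorded on the switch day is 1 rather than the 9-point jump that naive concatenation would produce.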
A back propagation network is essentially a statistical curve-fitting algorithm that, with extensive training, is capable of "learning" all the specific patterns of the training set. The goal is for the network to "learn" only the common patterns. The common patterns repeat more often and are "learned" faster, but there is no general indication of when to stop training. An overtrained network performs very well on the training file, but fails on real data because of the specific patterns that existed only in the training file.
Having tried many empirical indicators to determine the training cut-off point, I have not found a generally applicable formula. An indicator calculated after each training cycle (a full run through the training file),
i = Sum(abs(trade))*Pd - Sum(abs(error))*Stdev(trade),
where
trade -- is the forecasted price movement
error -- is the difference between the forecasted and actual price movement
Stdev(trade) -- is the standard deviation of forecasted price movements over the cycle
Pd -- is the probability of a correctly forecasted price movement direction,
appears to work fine for agricultural commodities, but is out of sync for others.
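The indicator can be computed as in the sketch below, which reads the formula as Sum(abs(trade))*Pd - Sum(abs(error))*Stdev(trade). Two details are assumptions, since the text does not specify them: Pd is estimated as the fraction of forecasts whose sign matches the actual movement, and the population standard deviation is used.

```python
from statistics import pstdev
from typing import Sequence


def cutoff_indicator(forecasts: Sequence[float], actuals: Sequence[float]) -> float:
    """Training cut-off indicator for one cycle.

    forecasts: forecasted price movements ("trade") over the cycle.
    actuals:   the corresponding actual price movements.
    """
    errors = [f - a for f, a in zip(forecasts, actuals)]
    # Assumed estimate of Pd: share of forecasts with the correct direction.
    hits = sum(1 for f, a in zip(forecasts, actuals) if f * a > 0)
    p_direction = hits / len(forecasts)
    return (
        sum(abs(f) for f in forecasts) * p_direction
        - sum(abs(e) for e in errors) * pstdev(forecasts)
    )
```

Because Stdev(trade) is a single number per cycle, it makes no difference whether it is multiplied inside or outside the error sum.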
A simple practical approach is to use a test file in parallel with the training file. After each training cycle the network should forecast on the test file and calculate the total forecasting error. Once the total forecasting error starts increasing consistently, it is the right time to stop training and use the weights file with the best result on the test file. Note that these indicators fluctuate, so a consistent increase or decrease is the only true indication.
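One way to turn "consistently increasing" into a rule is to stop after a fixed number of cycles without a new best test error. The sketch below applies that rule to a list of per-cycle test errors; the `patience` threshold is an assumption, and a suitable value depends on how much the error fluctuates.

```python
from typing import Sequence


def best_cycle(test_errors: Sequence[float], patience: int = 3) -> int:
    """Index of the cycle whose weights should be kept.

    Scans the total test-file error recorded after each training cycle and
    stops once `patience` cycles have passed without a new best value.
    """
    best = 0
    for cycle, error in enumerate(test_errors):
        if error < test_errors[best]:
            best = cycle
        elif cycle - best >= patience:
            break  # the error has not improved for `patience` cycles
    return best
```

In practice the weights would be saved whenever a new best cycle is found, so that the saved file already holds the best result when training stops.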
©1997, Raymond Raud. All Rights Reserved.
Last Modified: August 30, 1997