A distinction should be made between the absolute value of the price and
the price movement between samples. The absolute value of the price is
usually much larger than the price movement, so a forecast of the movement
will be more precise than a forecast of the absolute value. That
requires all price data to be converted into movements from one period
to the next. For a sampling period of one trading day the data elements
are:
do, dh, dl, dc, dv, di
where
do = open - last day open
dh = high - last day high
dl = low - last day low
dc = close - last day close
dv = volume - last day volume
di = open interest - last day open interest
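The conversion above can be sketched in a few lines of Python. This is only an illustration: it assumes each element is differenced against the same element of the previous trading day, and the field names and sample numbers are made up for the example.

```python
from typing import Dict, List

# Field names are hypothetical; any consistent naming would do.
FIELDS = ("open", "high", "low", "close", "volume", "open_interest")


def daily_movements(days: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return, for each day after the first, the change of every field
    relative to the previous day (do, dh, dl, dc, dv, di)."""
    movements = []
    for prev, cur in zip(days, days[1:]):
        movements.append({"d_" + f: cur[f] - prev[f] for f in FIELDS})
    return movements


# Made-up sample data, for illustration only.
sample = [
    {"open": 100.0, "high": 102.0, "low": 99.0, "close": 101.0,
     "volume": 5000, "open_interest": 12000},
    {"open": 101.5, "high": 103.0, "low": 100.5, "close": 102.5,
     "volume": 5400, "open_interest": 12100},
    {"open": 102.0, "high": 102.5, "low": 100.0, "close": 100.5,
     "volume": 6100, "open_interest": 11900},
]
moves = daily_movements(sample)
```

Three days of raw data give two movement records, since the first day has no predecessor.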
The underlying assumption is that a set of consecutive sampling periods
(trading days) defines a pattern that predicts the price movement from the
current period (today) to the next (tomorrow). This set is the input vector
of the network. Clearly, the longer the set of sampling periods, the more
numbers there are to crunch and the slower the neural network becomes. Too
few sampling periods are insufficient to identify the pattern. So, what is a
reasonable size? There are at least two ways to find out -- compute the
correlation between sampling periods (use the standard spreadsheet
correlation functions) or study the neural network weight matrix after
training. In both cases the correlation comes in waves, with significant
peaks (8 -- 9 %) up to roughly 50 sampling periods (days) into the past,
gradually leveling off from there.
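For readers without a spreadsheet at hand, the correlation check could be sketched as below. It assumes the plain Pearson correlation between a movement series and the same series shifted by a number of days; the function names are illustrative, and the result says nothing about what any particular market will show.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def lag_correlations(series: List[float], max_lag: int) -> Dict[int, float]:
    """Correlate the series with itself shifted by 1..max_lag periods."""
    return {
        lag: pearson(series[lag:], series[:-lag])
        for lag in range(1, max_lag + 1)
    }
```

Running `lag_correlations` on a series of daily close movements with `max_lag` around 60 gives one number per lag; the lag at which the peaks level off is a candidate for the input window size.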
A back propagation network "learns" from the past price movement data.
Each input vector is associated with the actual price movement for the next
day (the true value). The network's algorithm propagates the error (the
difference between the forecasted value and the true value) back through the
weight matrix, adjusting the weights to reduce the error. The true value is
what you wish to forecast. In my experiments I mostly forecasted the next
sampling period's average (between the daily high and low), but also the next
period's high or low value. The input vector together with the true value is
a record that the network processes on each cycle. The sequence of records is
the training file.
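One possible way to assemble such records is a sliding window over the daily movement data. The sketch below assumes that `features[t]` holds the movement values of day t and `targets[t]` the true value for day t; both names and the flat list layout are choices made for this example.

```python
from typing import List, Tuple

Record = Tuple[List[float], float]


def build_records(features: List[List[float]],
                  targets: List[float],
                  window: int) -> List[Record]:
    """Pair each run of `window` consecutive days with the true value
    of the day that follows the run."""
    records = []
    for t in range(window, len(features)):
        # Flatten the last `window` days into a single input vector.
        inputs = [value for day in features[t - window:t] for value in day]
        records.append((inputs, targets[t]))
    return records
```

With six data elements per day and a 50-day window, each input vector would hold 300 numbers.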
The network's learning phase (training) starts with a randomly generated
weight matrix. On each cycle the network first forecasts and then adjusts
the weights to reduce the forecasting error. Depending on the size of the
history file and the acceptable error, the network needs to go through
the file many times before it is trained. Once trained, the network can
be used for forecasting. The more you train the network, the smaller the
forecasting error on the training file becomes. The network gradually
"learns" the training pattern better and better. Yet the goal is to forecast
values for input vectors that the network was not trained on, and this
is where the conflict lies. Once the network has learned the specific
patterns of the training file well, it cannot recognize the slightly
different patterns of input vectors it was not trained on. Overtraining is a
known problem in neural network applications for financial market
forecasting. I don't have the "silver bullet" either, but some practical
methods to avoid it are discussed below.
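The training cycle described above (random initial weights, forecast, then weight adjustment) might look like the following toy network. This is a sketch under assumptions, not the network used in the experiments: one hidden layer with tanh units, a single linear output, squared error, and a fixed learning rate are all choices made for the example.

```python
import math
import random
from typing import List, Tuple


class TinyNet:
    """One hidden tanh layer, one linear output, trained record by record."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Randomly generated starting weights; the last entry is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: List[float]) -> Tuple[float, List[float]]:
        xb = list(x) + [1.0]
        hidden = [math.tanh(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w1]
        out = sum(w * v for w, v in zip(self.w2, hidden + [1.0]))
        return out, hidden

    def train_record(self, x: List[float], target: float,
                     rate: float = 0.05) -> float:
        """Forecast first, then push the error back through the weights."""
        out, hidden = self.forward(x)
        err = out - target
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Hidden-layer error terms, computed before w2 is changed.
        deltas = [err * self.w2[j] * (1.0 - hidden[j] ** 2)
                  for j in range(len(hidden))]
        for j in range(len(self.w2)):
            self.w2[j] -= rate * err * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= rate * deltas[j] * xb[i]
        return err * err


def train_cycle(net: TinyNet, records, rate: float = 0.05) -> float:
    """One full run through the training file; returns the total squared error."""
    return sum(net.train_record(x, t, rate) for x, t in records)
```

Calling `train_cycle` repeatedly on the same records drives the returned error down, which is exactly the behaviour that leads to overtraining if it is continued for too long.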
In my experiments the network identified a number of unique patterns such as trending with retracements, "head and shoulders", "rising wedge" and others known from classical technical analysis. Unfortunately, the assumptions required for the forecasting application do not hold. A pattern can be uniquely identified only once it is complete. Longer patterns (like "head and shoulders") can be decomposed into shorter patterns, but these shorter patterns also occur in other longer patterns. Their presence does not identify a specific longer pattern with usable probability.
These observations confirm the generally acknowledged ambiguity of graph
interpretation in forecasting. Once the whole pattern is known (that is, it
lies in the past), identifying it becomes obvious, but useless for
forecasting.
Significantly faster learning can be achieved by preprocessing the data so that only elements with high correlation to the true values are chosen for the input. For example, instead of the base data values (the movement of open, high, low, close, volume and open interest from the previous period), derived data elements such as the difference between close and open, the difference between consecutive close values, etc. are used. The experiments show that a network trained on derived values is less accurate than one trained on the base data values. If the derived values are used in parallel with the base data values, the network reaches the same accuracy in fewer cycles, but the training time for each cycle is longer, so nothing is gained.
The observation that data preprocessing increases the speed of learning but decreases accuracy appears logical, because preprocessing eliminates some intricate relationships among the data elements. These relationships have a low correlation with the true value, but it is not zero. Preprocessing may expose the strongly correlated relationships better and thus achieve faster learning. But it does not add knowledge to the input data; it removes some, so the result is less accurate.
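As an illustration of what derived data elements might look like, the sketch below computes the two examples named above: the difference between close and open, and the difference between consecutive close values. The dictionary keys are hypothetical names chosen for this example.

```python
from typing import Dict, List


def derived_features(days: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Compute two derived elements for every day after the first."""
    features = []
    for prev, cur in zip(days, days[1:]):
        features.append({
            # Movement within the day.
            "close_minus_open": cur["close"] - cur["open"],
            # Movement of the close from the previous period.
            "close_change": cur["close"] - prev["close"],
        })
    return features
```

Each day is reduced to two numbers here instead of six, which is where both the faster learning and the loss of information come from.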
Here's a simple approach that has worked for me. First note that the jumps in the data values occur in the absolute values of prices. Since we are interested in forecasting price movements (not absolute values), we can use price movements as input data instead of absolute values. This way, instead of the open price level we will have the difference between the open (today) and the open (day before), etc. as input. To avoid jumps in the input data when the contract switches, the first set of values for the new contract (after the switch) is calculated from the same contract's previous-day values rather than from the previous contract's corresponding values. This also guarantees that the absolute values of the input are always smaller than the daily movement limit adopted by the exchange.
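The contract-switch rule can be sketched as follows. The data layout is an assumption made for the example: `active[t]` names the contract used on day t, and `prices[contract][t]` is that contract's price on day t, which must also be available for the day before the switch.

```python
from typing import Dict, List


def rollover_movements(active: List[str],
                       prices: Dict[str, Dict[int, float]]) -> List[float]:
    """Daily movements that never mix prices of two different contracts."""
    moves = []
    for t in range(1, len(active)):
        contract = active[t]
        # Always difference against the SAME contract's previous day,
        # even on the first day after a switch.
        moves.append(prices[contract][t] - prices[contract][t - 1])
    return moves


# Made-up example: switch from a March to a May contract on day 2.
active = ["MAR", "MAR", "MAY", "MAY"]
prices = {
    "MAR": {0: 100.0, 1: 101.0},
    "MAY": {1: 110.0, 2: 111.5, 3: 111.0},
}
moves = rollover_movements(active, prices)
```

Differencing day 2 against the March close would have produced a jump of 10.5; using the May contract's own previous day gives 1.5 instead.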
Having tried many empirical indicators to determine the training cut-off point, I have not found a generally applicable formula. An indicator calculated after each training cycle (a full run through the training file) appears to work fine for agricultural commodities, but is out of sync for others.
A simple practical approach is to use a test file in parallel with the training file. After each training cycle the network should forecast on the test file and calculate the total forecasting error. Once the total forecasting error starts increasing consistently, it is the right time to stop training and use the weights file that gave the best result on the test file. Note that these indicators fluctuate, so a consistent increase or decrease is the only true indication.
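A minimal sketch of this stopping rule is given below. It makes one assumption beyond the text: "consistently increasing" is interpreted as `patience` training cycles in a row without a new best test error. The three callables are placeholders for whatever the network implementation provides.

```python
import copy
from typing import Any, Callable, Tuple


def train_with_test_file(train_one_cycle: Callable[[], None],
                         test_error: Callable[[], float],
                         get_weights: Callable[[], Any],
                         max_cycles: int,
                         patience: int) -> Tuple[Any, float]:
    """Train until the test-file error has failed to improve for
    `patience` cycles in a row; return the best weights and their error."""
    best_error = float("inf")
    best_weights = None
    cycles_without_gain = 0
    for _ in range(max_cycles):
        train_one_cycle()          # full run through the training file
        error = test_error()       # total forecasting error on the test file
        if error < best_error:
            best_error = error
            best_weights = copy.deepcopy(get_weights())  # keep a snapshot
            cycles_without_gain = 0
        else:
            cycles_without_gain += 1
            if cycles_without_gain >= patience:
                break              # the increase looks consistent, so stop
    return best_weights, best_error
```

A larger `patience` tolerates more fluctuation in the test error before stopping, at the cost of extra training cycles.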