Generalization and robustness
The Liquid State Machine (LSM) (Maass et al., 2002b) has had substantial success in recent years. A somewhat different paradigm of computation assumes that information is stored, not in “attractors” as usually assumed in recurrent neural networks (Hemmen et al., 2002; Hopfield, 1982), but in the continuing activity pattern of all the neurons that feed back in a sufficiently recurrent and interconnected network. In this way, the information is stored in a natural temporal fashion and is not transformed into spatial information. It can then be recognized by any sufficiently strong classifier such as an Adaline (Widrow and Hoff, 1960), Back-Propagation (Riedmiller and Braun, 1993), Support Vector Machine (SVM) (Cortes and Vapnik, 1995) or Tempotron (Avesani et al., 2011). Moreover, the “persistence of the trace” (or as Maass puts it, the “fading memory” (Lukosevicius and Jaeger, 2009; Maass et al., 2005)) allows one to recognize at a temporal distance the signal sent to the liquid and the sequence and timing effects of inputs.
Fading memory in liquid state or echo state machines is the ability to retrieve memory stored in the activity patterns of the neurons for a limited period of time. As long as there is activity in the liquid or firing activity in the echo state one can retrieve the information using the detector. As time passes, however, the information or activity in the liquid or in the neurons fades or dies out, so that the activity converges to a general state hardly distinguishable from other activity patterns. At this point the memory is degraded and cannot be retrieved.
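The fading of the trace can be illustrated with a minimal rate-based reservoir sketch. All sizes, constants, and the use of tanh units here are illustrative stand-ins, not the networks studied in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                             # illustrative reservoir size
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # scale spectral radius below 1

def run(pulse, steps=50):
    """Drive the reservoir with a one-step input pulse, then let it run freely."""
    x = np.zeros(N)
    states = []
    for t in range(steps):
        u = pulse if t == 0 else 0.0
        x = np.tanh(W @ x + u)
        states.append(x.copy())
    return np.array(states)

a = run(rng.normal(0, 1, N))
b = run(rng.normal(0, 1, N))
dist = np.linalg.norm(a - b, axis=1)
# Early on the two inputs are clearly separable; as the activity dies out
# the trajectories become indistinguishable -- the memory has "faded".
print(f"distance at t=1: {dist[1]:.3f}, at t=50: {dist[-1]:.4f}")
```

The distance between the two trajectories shrinks toward zero, which is exactly the point at which a detector can no longer retrieve the stored information.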
The LSM is a recurrent neural network. In its usual format (Lukosevicius and Jaeger, 2009; Maass et al., 2002b), each neuron is a biologically inspired artificial neuron, such as an integrate-and-fire (IF) neuron, a Hodgkin-Huxley (HH) neuron (Hodgkin and Huxley, 1952a) or an Izhikevich (IN) style neuron (Izhikevich, 2003). The connections between neurons define the dynamic process, and the pattern of recurrent connections is called the topology in this thesis. The properties of the artificial neurons, together with these recurrences, transform any sequence of input history into a spatiotemporal activation pattern of the liquid. The nomenclature comes from the intuitive possibility of looking at the network as a liquid, like water in a pond: the stimuli are rocks thrown into the water, and the ripples on the pond are the spatiotemporal pattern. Interestingly, since the reverberations are a function of the network and its recurrency, such temporal storage is also available using static neurons, for example McCulloch-Pitts (MP) neurons, and this has been exploited in the Echo State machine idea. (The two approaches are collectively called reservoir computing. However, see below: since the temporal aspects reside solely in the network, the Echo State machine requires stronger connectivity and inter-neuron weights to maintain a signal, which makes it less robust than the liquid.)
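As a concrete, purely illustrative example of the neuron models mentioned above, one step of a leaky integrate-and-fire (IF) liquid might be sketched as follows. The constants, the 20% connectivity, and the noisy drive are stand-in values, not the parameters used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
p_connect = 0.2                                      # illustrative connectivity
W = (rng.random((N, N)) < p_connect) * rng.normal(0, 0.5, (N, N))
np.fill_diagonal(W, 0.0)                             # no self-connections

v = np.zeros(N)                                      # membrane potentials
spikes = np.zeros(N)                                 # 0/1 spike vector
tau, v_thresh, v_reset = 20.0, 1.0, 0.0              # stand-in constants

def lif_step(v, spikes, I_ext, dt=1.0):
    """One Euler step: leak plus recurrent and external input, then threshold."""
    v = v + (-v + W @ spikes + I_ext) * (dt / tau)
    fired = v >= v_thresh
    v = np.where(fired, v_reset, v)                  # reset neurons that spiked
    return v, fired.astype(float)

for t in range(100):
    I_ext = rng.random(N) * 2.0                      # noisy drive keeps the liquid active
    v, spikes = lif_step(v, spikes, I_ext)
print(int(spikes.sum()), "neurons fired on the last step")
```

The recurrent term `W @ spikes` is what turns an input history into the reverberating spatiotemporal pattern described above.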
The use of a detector is standard in the LSM community and dates back to Maass et al. (Jaeger, 2001a; Lukosevicius and Jaeger, 2009; Maass, 2002; Maass et al., 2002c). The idea is that the detectors test whether the information for classification resides in the liquid, and thus they are not required to be biological. (How a biological network “uses” the information is a completely separate question not addressed in this model.) Thus it is theoretically possible for the detectors to recognize any spatiotemporal signal fed into the liquid, so the system could be used, for instance, for speech recognition or temporal vision.
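The detector idea can be sketched as follows: a non-biological linear readout trained on snapshots of toy liquid activity. The reservoir, the two-class task, and the ridge-regression readout are illustrative stand-ins, not the detectors used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 80
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N)) * 0.9    # illustrative reservoir

def liquid_state(pulse, steps=10):
    """State of the liquid a few steps after a one-step input pulse."""
    x = np.zeros(N)
    for t in range(steps):
        u = pulse if t == 0 else 0.0
        x = np.tanh(W @ x + u)
    return x

# Two input classes: fixed pulse templates plus small noise.
templates = [rng.normal(0, 1, N), rng.normal(0, 1, N)]
X, y = [], []
for label, tpl in enumerate(templates):
    for _ in range(40):
        X.append(liquid_state(tpl + 0.1 * rng.normal(0, 1, N)))
        y.append(label)
X, y = np.array(X), np.array(y)

# Ridge-regression readout: the detector never needs to be biological.
Xb = np.hstack([X, np.ones((len(X), 1))])            # add a bias column
w = np.linalg.solve(Xb.T @ Xb + 1e-3 * np.eye(N + 1), Xb.T @ (2 * y - 1))
acc = np.mean(((Xb @ w) > 0) == (y == 1))
print(f"training accuracy: {acc:.2f}")
```

The readout classifies the class of the original pulse from the liquid state several time steps later, which is the “recognition at a temporal distance” referred to above.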
Weaknesses of the basic LSM Models
There are two sources of potential instability. The first is the issue of small variations in the input. Systems have to balance the need for separation with that of generalization: on the one hand, inputs with small variations may need to be separated and treated distinctly; on the other hand, small variations may need to be treated as “noise” and generalized over by the trained system. For the LSM as typically presented in the literature, it is understood, e.g. from the work of (Lukosevicius and Jaeger, 2009; Maass, 2002), that the LSM and its variants do this successfully in the case of spatiotemporal signals. The second issue concerns the sensitivity of the system to small changes within itself, which we choose to call damages. This is very important if, as is the case for the LSM, it is supposed to explain biological systems.
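The separation/generalization trade-off can be made concrete with a toy measurement: states driven by noisy copies of the same input should stay close (generalization), while states driven by different inputs should stay far apart (separation). All sizes and noise levels here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 80
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N)) * 0.9    # illustrative reservoir

def state(pulse, steps=10):
    """Liquid state after several steps of a constant driving input."""
    x = np.zeros(N)
    for _ in range(steps):
        x = np.tanh(W @ x + pulse)
    return x

a = rng.normal(0, 1, N)
b = rng.normal(0, 1, N)
within = np.linalg.norm(state(a) - state(a + 0.05 * rng.normal(0, 1, N)))
between = np.linalg.norm(state(a) - state(b))
# A usable liquid keeps the within-class distance well below the
# between-class distance.
print(f"within-class: {within:.3f}, between-class: {between:.3f}")
```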
Results: LSM is not robust to internal damage
As there was little difference between the detectors, we eventually restricted ourselves to the back-propagation detector. (None of the liquid units accessed by the detectors were allowed to be input neurons of the liquid.) It turned out that while the detector is able to learn the randomly chosen test classes successfully given a sufficient average connectivity of 20%, almost any kind of damage caused a very substantial decay in the detector's recognition ability. Even with lower connectivity, which produces less feedback, the same phenomenon occurs.
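One simple form of internal damage, zeroing a random fraction of synapses, can be sketched as follows; the network, the 5% figure, and the damage type are illustrative, and the experiments reported here may use different damage procedures:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N)) * 0.95   # illustrative liquid

def trajectory(W, pulse, steps=30):
    """Liquid trajectory under a constant driving input."""
    x = np.zeros(N)
    out = []
    for _ in range(steps):
        x = np.tanh(W @ x + pulse)
        out.append(x.copy())
    return np.array(out)

pulse = rng.normal(0, 1, N)
W_damaged = W.copy()
W_damaged[rng.random(W.shape) < 0.05] = 0.0          # zero out 5% of synapses

d = np.linalg.norm(trajectory(W, pulse) - trajectory(W_damaged, pulse), axis=1)
# Even 5% synaptic damage leaves the damaged liquid on a persistently
# different trajectory, so a detector trained on the intact liquid misfires.
print(f"divergence at t=30: {d[-1]:.3f}")
```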
After further experiments, we returned to this point (see concluding remarks, below). In Figure 15 the difference in reaction of the network is illustrated by a raster (ISI) display. With 10% damage, it is quite evident that the network diverges dramatically from the noise-free situation. In Table 1 through Table 4 this is evident as well with 5% noise for purely random connectivity. Actually, with low degrees of damage the detectors even under the Maass connectivity show dramatic decay in recognition, although not to the extremes of random connectivity. These results were robust and repeatable under many trials and variants. Accordingly, we conclude that the LSM, either as purely defined with random connectivity, or as implemented in (Maass et al., 2002b), cannot serve as a biologically relevant model.
Small world topologies with double power-law distribution
Accordingly, using Algorithm 1 and Algorithm 2 we created a topology with power-law connectivity, but in reverse order for input connections and output connections, as in Figure 22. For example, neuron 1 in Figure 22 has almost 250 input connections but only two output connections, while neuron 243 in Figure 22 has almost 250 output connections but only two input connections. In general, the two algorithms create a very specific pattern of connectivity: the number of connections to each neuron is calculated according to its neighbors, considering the input and output connectivity separately. In essence, neurons that have many output connections have fewer input connections and, vice versa, neurons that have many input connections have fewer output connections.
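The double power-law constraint can be sketched as follows. This is not Algorithm 1 or Algorithm 2 themselves (which are defined elsewhere in the thesis), only an illustration of the anti-correlated in/out degree idea; the degree formula and wiring rule are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 250
ranks = np.arange(1, N + 1)
degrees = np.maximum(250 // ranks, 2)   # power-law-like degree sequence, min 2

in_deg = degrees                        # neuron 0 is an input hub (~250 inputs)
out_deg = degrees[::-1]                 # reversed: neuron 0 has only 2 outputs

# Wire each neuron's inputs, picking sources in proportion to their out-degree.
# Realized in-degrees match in_deg exactly; out-degrees only on average.
A = np.zeros((N, N), dtype=int)
p = out_deg / out_deg.sum()
for i in range(N):
    sources = rng.choice(N, size=in_deg[i], replace=False, p=p)
    A[sources, i] = 1

print(f"input hub: {in_deg[0]} in / {out_deg[0]} out; "
      f"output hub: {in_deg[-1]} in / {out_deg[-1]} out")
```

Reversing one degree sequence against the other is what produces separate hub populations for inputs and outputs, mirroring the neuron 1 vs. neuron 243 example above.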
In this work, we looked at the robustness of the LSM paradigm and, by experimenting with temporal sequences, showed that the basic structural setup in the literature is not robust to two kinds of damages, even at low levels.
We also investigated this for various degrees of connectivity. While lowering the average degree of connectivity decreased the sensitivity of all architectures to some extent, the bottom line is that decreased connectivity is ineffective. In addition, it became evident that lowering the connectivity also decreases the network's representational strength and, importantly, the persistence of the signal. That is, a low degree of connectivity causes the activity to die down quickly because of the lack of feedback; thus the network is bounded in time and cannot recognize an older input signal. We see, then, as expected from the analysis in (Jaeger, 2001a, 2001b, 2002; Maass et al., 2002b), that higher connectivity gives a larger set of filters that separate signals, but on the other hand makes the network more sensitive to changes.
In any case, even with low connectivities, neither the random nor the Maass topology was robust. (While identification did not fall to random levels, as seen, it suffered very substantial decays with even small amounts of damage. In addition, other experiments not shown here, with connectivities below 15% – 20%, show that the networks do not maintain the trace for a long time.)
We also investigated variants in the kinds of neurons. It seems that the LSM (or the reservoir computing concept) does not change much vis-à-vis robustness to internal noise based on these choices.
There was substantial improvement when supplying a window of time input to the detector rather than an instant of time. However, this alone was not sufficient.
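The windowed detector input can be sketched as follows: instead of a single state snapshot x(t), the readout sees the concatenation [x(t-k+1), ..., x(t)]. The function name, window length, and array sizes are illustrative:

```python
import numpy as np

def window_features(states, k=5):
    """states: (T, N) array of liquid states; returns (T-k+1, k*N) windows."""
    T, N = states.shape
    return np.stack([states[t - k + 1 : t + 1].ravel() for t in range(k - 1, T)])

states = np.random.default_rng(5).normal(size=(30, 10))
feats = window_features(states, k=5)
print(feats.shape)   # (26, 50)
```

Each row gives the detector a short stretch of liquid history rather than one instant, which smooths over momentary disruptions in the activity.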
The major effect came from changing the topology of connectivity to accommodate the idea of hubs, power-law and small-world connectivity. Under these topologies the liquids are robust to damages, with the best results occurring when both the input and the output connectivity of the neurons have power-law histograms, with separate neurons serving as hubs in each direction.
It has been shown experimentally that the basic LSM is not robust to damages in its underlying neurons and thus without elaboration it cannot be seen as a good fit for a model for biological computation. (Data not shown here indicates that this result holds even if training is continued while the network is suffering damage.)
In the thesis (Bassett and Bullmore, 2006; Varshney et al., 2011), a distribution was chosen for biological reasons to allow preference for close neurons. It is superior to the totally random one, but still not sufficiently robust. Choosing a power-law distribution and taking care to make different assignments for in- and out-connectivity proved to be the best. Since this is thought of as a potentially biological arrangement (Barabási and Albert, 1999; Bassett and Bullmore, 2006), LSM-style networks with this additional topological constraint can, as of now, be considered sufficiently biological. Other distributions may also work.
Source code for the Liquid State Machine (LSM) can be found here:
Snippets from the article Topological constraints and robustness in liquid state machines