If you ask five cognitive scientists what it really means to understand something, you are likely to get five different answers.

In a Hopfield network, the state of the neural net at a certain time is described by a vector which records which neurons are firing, a binary word of $N$ bits. When a pattern is introduced to the network, the net acts on the neurons so that each one switches on when the weighted sum of its inputs exceeds its threshold value (often taken to be 0). Two update rules are implemented: asynchronous and synchronous. The network has an energy function that depends on the activities of all the neurons, and the temporal derivative of this energy function is given by [25]. The memory storage capacity of these networks can be calculated for random binary patterns [1]. A simple example [7] of the modern Hopfield network can be written in terms of binary variables representing the active and inactive states of the model neurons; in general, the outputs can depend on the currents of all the neurons in that layer, so that $g_i = g(\{x_i\})$. One incremental learning rule (the Storkey rule) updates the weights as each new pattern $\epsilon^{\nu}$ is stored:

$$
w_{ij}^{\nu} = w_{ij}^{\nu-1} + \frac{1}{n}\epsilon_i^{\nu}\epsilon_j^{\nu} - \frac{1}{n}\epsilon_i^{\nu}h_{ji}^{\nu} - \frac{1}{n}\epsilon_j^{\nu}h_{ij}^{\nu},
$$

where $n$ is the number of neurons in the net and $h_{ij}^{\nu}$ is a local field. A learning system that was not incremental would generally be trained only once, with a huge batch of training data.

Several challenges hindered progress on RNNs in the early 90s (Hochreiter & Schmidhuber, 1997; Pascanu et al., 2012). The amount that the weights are updated during training is referred to as the step size or the "learning rate." Why does this matter? Consider the task of predicting a vector $y = \begin{bmatrix} 1 & 1 \end{bmatrix}$ from inputs $x = \begin{bmatrix} 1 & 1 \end{bmatrix}$ with a multilayer perceptron with 5 hidden layers and tanh activation functions: propagating errors through that many layers can make the gradients shrink or blow up, and the exploding gradient problem will completely derail the learning process. Originally, Elman trained his architecture with a truncated version of BPTT, meaning that he only considered two time-steps for computing the gradients, $t$ and $t-1$. Nevertheless, I'll sketch BPTT for the simplest case, as shown in Figure 7: a generic non-linear hidden layer similar to the Elman network without context units (some like to call this a "vanilla RNN," a name I avoid because I believe it is derogatory against vanilla!). On the right of the figure, the unfolded representation incorporates the notion of time-step calculations.

Memory units also have to learn useful representations (weights) for encoding temporal properties of the sequential input. The candidate memory function is a hyperbolic tangent combining the same elements as the input gate $i_t$. If you look at the diagram in Figure 6, $f_t$ performs an elementwise multiplication with each element of $c_{t-1}$, so a forget gate that outputs zeros reduces every value of the previous cell state to $0$. In the examples below, I'll define a relatively shallow network with just one hidden LSTM layer.
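Since the original figures are not reproduced here, it helps to write out the standard LSTM update that the last paragraph refers to. The gate names match the prose ($f_t$, $i_t$, $\tilde{c}_t$, $c_t$), but the weight symbols $W_\ast$, $U_\ast$, $b_\ast$ are the usual textbook convention rather than anything taken from the original figures:

$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate memory}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory cell}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
$$

Written this way, the two claims above are easy to see: $\tilde{c}_t$ is a tanh over exactly the same inputs that feed $i_t$, and $f_t \odot c_{t-1}$ is the elementwise multiplication that can wipe out the previous cell state.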
The memory cell effectively counteracts the vanishing gradient problem: it preserves information for as long as the forget gate does not erase past information (Graves, 2012). Nevertheless, LSTM can be trained with pure backpropagation.

Elman (1990) also showed that the internal (hidden) representations learned by his network grouped into meaningful categories, that is, semantically similar words group together when analyzed with hierarchical clustering. This was remarkable, as it demonstrated the utility of RNNs as a model of cognition in sequence-based problems.

Discrete Hopfield nets describe relationships between binary (firing or not-firing) neurons, and these interactions are "learned" via Hebb's law of association. This is called associative memory because it recovers memories on the basis of similarity. If you keep iterating with new configurations, the network will eventually settle into a global energy minimum (conditioned on the initial state of the network). If, in addition to this, the energy function is bounded from below, the non-linear dynamical equations are guaranteed to converge to a fixed-point attractor state. The dynamical equations for the neurons' states can be written as in [25]; the main difference between these equations and those of conventional feedforward networks is the presence of a second term, which is responsible for the feedback from higher layers.

How should a sequence be represented in the first place? Consider a vector $x = [x_1, x_2, \cdots, x_n]$, where element $x_1$ represents the first value of a sequence, $x_2$ the second element, and $x_n$ the last element. Hence, the spatial location in $\mathbf{x}$ indicates the temporal location of each element. Among other drawbacks, this encoding can't easily distinguish relative temporal position from absolute temporal position. The implicit approach, in contrast, represents time by its effect in intermediate computations.
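To make the explicit, spatial encoding concrete, here is a minimal sketch; the sequence and the vocabulary are made up purely for illustration:

```python
import numpy as np

# Made-up 4-step sequence over a 3-symbol vocabulary {a, b, c}.
seq = ["b", "a", "c", "b"]
vocab = {"a": 0, "b": 1, "c": 2}

# "Spatial" encoding: row t holds the one-hot vector of the t-th element,
# so *where* a row sits in x encodes *when* that symbol occurred.
x = np.zeros((len(seq), len(vocab)))
for t, symbol in enumerate(seq):
    x[t, vocab[symbol]] = 1.0
print(x)

# Shifting the whole sequence by one step changes every row of x, even
# though the relative order of the symbols is untouched: absolute and
# relative position are entangled in this representation.
print(np.roll(x, shift=1, axis=0))
```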
On the theory side, the continuous formulation consists of layers of recurrently connected neurons with states described by continuous variables, $V_i$ being a continuous variable representing the output of neuron $i$. An energy function quadratic in these activities can be defined, and convergence is generally assured: Hopfield proved that the attractors of this nonlinear dynamical system are stable, not periodic or chaotic as in some other systems. Thus, if a state is a local minimum in the energy function, it is a stable state for the network. A subsequent paper [14] further investigated the behavior of any neuron in both discrete-time and continuous-time Hopfield networks when the corresponding energy function is minimized during an optimization process. This line of work completes the proof [10] that the classical Hopfield network with continuous states [4] is a special limiting case of the modern Hopfield network (1) with energy (3). In certain situations one can assume that the dynamics of the hidden neurons equilibrates at a much faster time scale compared to the feature neurons, which makes it possible to reduce the general theory (1) to an effective theory for feature neurons only.

The Hebbian theory was introduced by Donald Hebb in 1949 in order to explain "associative learning," in which simultaneous activation of neuron cells leads to pronounced increases in synaptic strength between those cells. A Hopfield network operates in a discrete fashion; in other words, the input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, -1) in nature. For each stored pattern $x$, the negation $-x$ is also a spurious pattern.

For a detailed derivation of BPTT for the LSTM, see Graves (2012) and Chen (2016). Here, again, we have to add the contributions of $W_{xh}$ via $h_3$, $h_2$, and $h_1$: that's BPTT for a simple RNN. In very deep networks this becomes a problem because more layers amplify the effect of large gradients, compounding into very large updates to the network weights, to the point where the values completely blow up.

Elman showed that the error pattern followed a predictable trend: the mean squared error was lower every 3 outputs and higher in between, meaning the network learned to predict the third element in the sequence, as shown in Chart 1 (the numbers are made up, but the pattern is the same as the one found by Elman (1990)).

In a strict sense, LSTM is a type of layer rather than a type of network, and all of the above makes LSTMs useful in a wide range of applications (https://en.wikipedia.org/wiki/Long_short-term_memory#Applications).

We can download the dataset with a few lines of code (note: this time I also imported TensorFlow, and from there the Keras layers and models). For our purposes (classification), the cross-entropy function is appropriate, and to inspect the results we compute a confusion matrix, cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions), passing the true labels test_labels as well as the network's predicted labels rounded_predictions for the test set.
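For completeness, a self-contained version of that call is sketched below. The two arrays are placeholders standing in for the real test labels and the thresholded network outputs produced earlier, and confusion_matrix is assumed to come from scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder values: in the real workflow these come from the test split
# and from rounding the network's sigmoid outputs to 0 or 1.
test_labels = np.array([0, 1, 1, 0, 1])
rounded_predictions = np.array([0, 1, 0, 0, 1])

cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions)
print(cm)  # rows are true classes, columns are predicted classes
```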
In 1982, physicist John J. Hopfield published a fundamental article in which a mathematical model commonly known as the Hopfield network was introduced ("Neural networks and physical systems with emergent collective computational abilities," John J. Hopfield, 1982). The Hopfield network's idea is that each configuration of binary values $C$ in the network is associated with a global energy value $E$. Here is a simplified picture of the training process: imagine you have a network with five neurons in a configuration $C_1 = (0, 1, 0, 1, 0)$; if a new configuration $C_2$ yields a lower value of $E$, let's say $1.5$, you are moving in the right direction.

Back to the question of understanding: Marcus, for instance, has said that the fact that GPT-2 sometimes produces incoherent sentences is somehow proof that human thoughts (i.e., internal representations) can't possibly be represented as vectors (as neural nets do), which I believe is a non sequitur. Is lack of coherence enough? The exercise of comparing computational models of cognitive processes with full-blown human cognition makes as much sense as comparing a model of bipedal locomotion with the entire motor control system of an animal.

We begin by defining a simplified RNN with a hidden state $h_t$ and an output $z_t$; a standard way to write it is something like $h_t = f(W_{xh}x_t + W_{hh}h_{t-1})$, with $z_t$ obtained by applying an output layer to $h_t$ (here $W_{xh}$ maps the input into the hidden layer and $W_{hh}$ maps the previous hidden state into the current one). We didn't mention the bias before, but it is the same bias that all neural networks incorporate, one for each unit in $f$. What we need to do is to compute the gradients separately: the direct contribution of $W_{hh}$ on $E$ and the indirect contribution via $h_2$. Table 1 shows the XOR problem and a way to transform it into a sequence.

LSTMs' long-term memory capabilities make them good at capturing long-term dependencies. As a side note, if you are interested in learning Keras in depth, Chollet's book is probably the best source, since he is the creator of the Keras library. It's time to train and test our RNN; if you run this, it may take around 5 to 15 minutes on a CPU. I'll utilize Adadelta (to avoid manually adjusting the learning rate) as the optimizer and the mean squared error as the loss (as in Elman's original work); the model eventually obtains a test set accuracy of ~80%, echoing the results from the validation set. Concretely, I'll define the network as a linear stack of layers and add the output layer with a sigmoid activation, as sketched below.
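This is only a sketch of the kind of model described above: the layer width, input shape, and metric are illustrative assumptions, and the comments echo the ones that survive in the original text (a linear stack of layers, a sigmoid output layer, Adadelta, and a mean-squared-error loss):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Define a network as a linear stack of layers.
model = keras.Sequential()

# One hidden LSTM layer; 32 units over 10 time-steps of 1 feature each
# are arbitrary choices for illustration.
model.add(layers.LSTM(32, input_shape=(10, 1)))

# Add the output layer with a sigmoid activation.
model.add(layers.Dense(1, activation="sigmoid"))

# Adadelta adapts the step size per parameter, which is what saves us
# from tuning the learning rate by hand; MSE follows Elman's original setup.
model.compile(optimizer="adadelta", loss="mean_squared_error", metrics=["accuracy"])
model.summary()
```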
The summation indicates we need to aggregate the cost at each time-step. Consider a three-layer RNN (i.e., unfolded over three time-steps). Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0. I reviewed backpropagation for a simple multilayer perceptron here. Many techniques have been developed to address all these issues, from architectures like LSTM, GRU, and ResNets to techniques like gradient clipping and regularization (Pascanu et al., 2012; for an up-to-date, i.e., 2020, review of these issues see Chapter 9 of the book by Zhang et al.).

The Hopfield network is a form of recurrent artificial neural network described by John Hopfield in 1982. It is composed of $N$ fully-connected neurons and weighted edges, and it has just one layer of neurons, whose size matches that of the input and output, which must be the same. Moreover, each node has a state which consists of a spin equal to either +1 or -1. Memory vectors can be slightly degraded, and this would still spark the retrieval of the most similar vector in the network: the net can be used to recover, from a distorted input, the trained state that is most similar to that input.

Requirements: Python >= 3.5, numpy, matplotlib, skimage, tqdm, and keras (to load the MNIST dataset). Usage: run train.py or train_mnist.py.
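Independently of those scripts, the storage-and-recall idea fits in a few lines of NumPy. The following is a minimal sketch with made-up patterns, using the textbook Hebbian rule, asynchronous updates, and a threshold of 0; it is not the code behind train.py:

```python
import numpy as np

# Two made-up bipolar (+1/-1) patterns to store.
patterns = np.array([[ 1, -1,  1, -1,  1],
                     [-1, -1,  1,  1, -1]])
n = patterns.shape[1]

# Hebbian storage: sum of outer products, no self-connections.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

def energy(s):
    # Lower energy means a more stable configuration.
    return -0.5 * s @ W @ s

def recall(s, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    s = s.copy()
    for _ in range(steps):
        i = rng.integers(n)                 # asynchronous: one random neuron at a time
        s[i] = 1 if W[i] @ s >= 0 else -1   # threshold taken to be 0
    return s

# A distorted version of the first pattern (one flipped spin); repeated
# updates typically converge back to the stored pattern it resembles most.
distorted = np.array([1, 1, 1, -1, 1])
recovered = recall(distorted)
print(recovered, energy(recovered))
```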
The hierarchical layered network is thus indeed an attractor network with the global energy function, and the activation function for each neuron is defined as a partial derivative of the Lagrangian with respect to that neuron's activity. The discrete-time Hopfield network always minimizes exactly a pseudo-cut [13][14], while the continuous-time Hopfield network always minimizes an upper bound to a weighted cut [14]. After learning, the weights of the network remain fixed, showing that the model is able to switch from a learning stage to a recall stage, and repeated updates would eventually lead to convergence to one of the retrieval states.

Turns out, training recurrent neural networks is hard; I won't discuss these issues again here. Elman based his approach on the work of Michael I. Jordan on serial processing (1986), and Elman networks can be seen as a simplified version of an LSTM, so I'll focus my attention on LSTMs for the most part. This exercise will also allow us to review backpropagation and to understand how it differs from BPTT. Neuroscientists have used RNNs to model a wide variety of phenomena as well (for reviews see Barak, 2017; Güçlü & van Gerven, 2017; Jarne & Laje, 2019).

Depending on your particular use case, there is general recurrent neural network architecture support in TensorFlow, mainly geared towards language modelling, and defining an RNN with LSTM layers is remarkably simple with Keras (considering how complex LSTMs are as mathematical objects). You can think about the LSTM as making three decisions at each time-step; decisions 1 and 2 will determine the information that keeps flowing through the memory storage at the top. More formally, each matrix $W$ has dimensionality equal to (number of incoming units, number of connected units); keep this in mind to read the indices of the $W$ matrices in subsequent definitions.
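One quick way to see both points, the (incoming units, connected units) dimensionality and the fact that the four gates are just stacked blocks of one big weight matrix, is to inspect a Keras LSTM layer directly. The sizes are arbitrary, and the order and shapes of the returned arrays are what current tf.keras versions expose; treat the exact numbers as an assumption to verify on your own install:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes: 16 input features per time-step, 32 LSTM units.
lstm = layers.LSTM(32)
model = keras.Sequential([keras.Input(shape=(None, 16)), lstm])

kernel, recurrent_kernel, bias = lstm.get_weights()
print(kernel.shape)            # (16, 128): incoming units x (4 gates * 32 units)
print(recurrent_kernel.shape)  # (32, 128): connected (recurrent) units x (4 gates * 32 units)
print(bias.shape)              # (128,)
```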