Reinforcement Learning with TensorFlow

Recurrent neural networks

Recurrent neural networks, abbreviated as RNNs, are used in cases of sequential data, whether as input, output, or both. The reason RNNs became so effective is their architecture, which aggregates the learning from past data and uses it along with the new data to enhance learning. In this way, they capture the sequence of events, which was not possible in a feedforward neural network, nor in earlier approaches to statistical time series analysis.

Consider time series data such as stock prices, audio, or video datasets, where the sequence of events matters a great deal. In this case, apart from the collective learning from the whole dataset, the order in which the data is encountered over time matters, and capturing it helps to model the underlying trend.

The ability to perform sequence-based learning is what makes RNNs highly effective. Let's take a step back and try to understand the problem. Consider the following data diagram:

Imagine you have a sequence of events similar to the ones in the diagram, and at each point in time you want to make a decision based on the sequence of events so far. If your sequence is reasonably stationary, you could use a classifier with the same weights at every time step, but here's the glitch. If you train the same classifier separately on the data from different time steps, it will not converge to similar weights across time steps. If you train a single classifier on the whole dataset containing the data from all time steps, the weights will be the same, but the sequence-based learning is lost. For our solution, we want to share weights over different time steps and utilize what we have learned up to the previous time step, as shown in the following diagram:

As per the problem, we have understood that our neural network should be able to consider the learnings from the past. This notion can be seen in the preceding diagrammatic representation: the first part shows that at each time step, the network training the weights should consider the data learning from the past, and the second part gives the solution. We use a state representation, the classifier output from the previous time step, as an input along with the new time step's data to learn the current state representation. This state representation can be defined, recursively, as the collective learning (or summary) of everything that happened up to the previous time step. The state is not the predicted output of the classifier; rather, when it is subjected to a softmax activation function, it yields the predicted output.

Remembering further back would, in principle, require a deeper neural network. Instead, we use a single model that summarizes the past and provides that summary, along with the new information, to our classifier.

Thus, at any time step, t, in a recurrent neural network, the following calculations occur:

  • h_t = φ([h_{t-1}, x_t] · W + b).

  • W and b are weights and biases shared over time.

  • φ is the activation function (typically tanh).

  • [h_{t-1}, x_t] refers to the concatenation of these two pieces of information. Say your input, x_t, is of shape (n, d), that is, n samples/rows and d dimensions/columns, and h_{t-1} is of shape (n, h). Then, your concatenation would result in a matrix of shape (n, d + h).

Since the shape of any hidden state, h_t, is (n, h), the shape of W is (d + h, h) and the shape of b is (1, h).
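These shape relationships can be sanity-checked with a tiny pure-Python sketch (the sizes n, d, and h below are hypothetical, chosen only for illustration):

```python
n, d, h = 4, 3, 2  # hypothetical sizes: samples, input dims, hidden units

x_t    = [[0.0] * d for _ in range(n)]             # input at step t: (n, d)
h_prev = [[0.0] * h for _ in range(n)]             # previous state: (n, h)
concat = [hr + xr for hr, xr in zip(h_prev, x_t)]  # [h_{t-1}, x_t]: (n, d + h)
W      = [[0.0] * h for _ in range(d + h)]         # shared weights: (d + h, h)
b      = [0.0] * h                                 # shared bias: (1, h)

# Multiplying (n, d + h) by (d + h, h) and adding b yields h_t of shape (n, h)
assert (len(concat), len(concat[0])) == (n, d + h)
assert (len(W), len(W[0])) == (d + h, h)
```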

Since [h_{t-1}, x_t] is of shape (n, d + h) and W is of shape (d + h, h), the product [h_{t-1}, x_t] · W + b is of shape (n, h), which matches h_t. The predicted output is then obtained by applying the softmax activation to the state, ŷ_t = softmax(h_t).
These operations in a given time step, t, constitute an RNN cell unit. Let's visualize the RNN cell at time step t, as shown here:
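As a concrete sketch of the cell just described, here is a minimal pure-Python forward step, unrolled over a sequence so the same W and b are reused at every time step (the function names and list-based matrix layout are my own, illustrative choices):

```python
import math

def rnn_cell_step(x_t, h_prev, W, b):
    """One RNN cell step: h_t = tanh([h_prev, x_t] . W + b).

    x_t: length-d input row, h_prev: length-h previous hidden state,
    W: (h + d) x h weight matrix as a list of rows, b: length-h bias.
    """
    concat = h_prev + x_t                       # concatenation, length h + d
    h_t = []
    for j in range(len(b)):                     # one unit of the new state
        z = b[j] + sum(concat[i] * W[i][j] for i in range(len(concat)))
        h_t.append(math.tanh(z))                # squash through the activation
    return h_t

def rnn_forward(xs, h0, W, b):
    """Unroll over the sequence, reusing the SAME W and b at every step."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_cell_step(x_t, h, W, b)         # state summarizes the past
        states.append(h)
    return states
```

With all-zero weights and biases, the state stays at zero for every step, which is a quick way to check the wiring before training.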

Once we are done with the calculations up to the final time step, our forward propagation task is done. The next task is to minimize the overall loss by backpropagating through time to train our recurrent neural network. The total loss of one such sequence is the summation of the losses across all time steps; that is, for a given input sequence of X values and its corresponding output sequence of Y values, the loss is given by:

L(Y, Ŷ) = Σ_{t=1}^{T} L_t(y_t, ŷ_t)

Thus, the cost function of the whole dataset containing m examples would be (where k refers to the example):

J = (1/m) Σ_{k=1}^{m} L^{(k)}
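Summing per-step losses can be sketched as follows, assuming a classification output where each step's prediction is the softmax of the state and the per-step loss is cross-entropy (the function names and integer class targets are my own choices for illustration):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]           # stabilized exponentials
    s = sum(e)
    return [v / s for v in e]

def sequence_loss(states, targets):
    """Total loss L = sum over time steps of L_t(y_t, ŷ_t).

    states:  one hidden-state vector per time step
    targets: one integer class index per time step
    """
    total = 0.0
    for h_t, y_t in zip(states, targets):
        p = softmax(h_t)                       # predicted distribution ŷ_t
        total += -math.log(p[y_t])             # per-step cross-entropy L_t
    return total
```

For a dataset of m sequences, the cost J would then be the mean of `sequence_loss` over all m examples.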

Since RNNs incorporate sequential data, backpropagation is extended to backpropagation through time (BPTT). Here, time is a series of ordered time steps connected one to the other, which allows the error to be backpropagated from later time steps to earlier ones.
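To make backpropagation through time concrete, here is a minimal scalar RNN (a sketch with made-up numbers, not the chapter's model): because the weight w is shared across time steps, walking the sequence in reverse accumulates a gradient contribution for w from every step, and the result can be checked against a finite-difference estimate:

```python
import math

def forward(w, u, xs, h0=0.0):
    """Scalar RNN: h_t = tanh(w * x_t + u * h_{t-1})."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w * x + u * h)
        hs.append(h)
    return hs

def loss(w, u, xs, ys):
    """Squared-error loss summed over all time steps."""
    return sum((h - y) ** 2 for h, y in zip(forward(w, u, xs), ys))

def bptt_grad_w(w, u, xs, ys):
    """dL/dw via backpropagation through time: walk the steps in
    reverse, accumulating w's gradient from EVERY time step."""
    hs = forward(w, u, xs)
    grad, dh_next = 0.0, 0.0
    for t in reversed(range(len(xs))):
        dh = 2 * (hs[t] - ys[t]) + dh_next   # direct loss grad + grad from t+1
        dz = dh * (1 - hs[t] ** 2)           # back through tanh
        grad += dz * xs[t]                   # shared w: contributions add up
        dh_next = dz * u                     # flows back to h_{t-1}
    return grad

# Made-up numbers for illustration; check against a finite difference.
xs, ys, w, u = [0.5, -1.0, 0.25], [0.1, -0.2, 0.3], 0.4, 0.3
g = bptt_grad_w(w, u, xs, ys)
eps = 1e-5
g_num = (loss(w + eps, u, xs, ys) - loss(w - eps, u, xs, ys)) / (2 * eps)
```

The agreement between `g` and `g_num` is exactly what makes gradient training of shared-weight recurrent networks possible.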