Reinforcement Learning with TensorFlow
上QQ阅读APP看书,第一时间看更新

Logistic regression as a neural network

Logistic regression is a classifier algorithm. Here, we try to predict the probability of the output classes. The class with the highest probability becomes the predicted output. The error between the actual and predicted output is calculated using cross-entropy and minimized through backpropagation. Check the following diagram for binary logistic regression and multi-class logistic regression. The difference is based on the problem statement. If the unique number of output classes is two then it's called binary classification, if it's more than two then it's called multi-class classification. If there are no hidden layers, we use the sigmoid function for the binary classification and we get the architecture for binary logistic regression. Similarly, if there are no hidden layers and we use use the softmax function for the multi-class classification, we get the architecture for multi-class logistic regression.

Now a question arises, why not use the sigmoid function for multi-class logistic regression ?

The answer, which is true for all predicted output layers of any neural network, is that the predicted outputs should follow a probability distribution. In normal terms, say the output has N classes. This will result in N probabilities for an input data having, say, d dimensions. Thus, the sum of the N probabilities for this one input data should be 1 and each of those probabilities should be between 0 and 1 inclusive.

On the one hand, the summation of the sigmoid function for N different classes may not be 1 in the majority of cases. Therefore, in case of binary, the sigmoid function is applied to obtain the probability of one class, that is, p(y = 1|x), and for the other class the probability, that is, p(y = 0|x) = 1 ? p(y = 1|x). On the other hand, the output of a softmax function is values satisfying the probability distribution properties. In the diagram, refers to the sigmoid function:

A follow-up question might also arise: what if we use softmax in binary logistic regression?

As mentioned previously, as long as your predicted output follows the rules of probability distribution, everything is fine. Later, we will discuss cross entropy and the importance of probability distribution as a building block for any machine learning problem especially dealing with classification tasks.

A probability distribution is valid if the probabilities of all the values in the distribution are between 0 and 1, inclusive, and the sum of those probabilities must be 1.

Logistic regression can be viewed in a very small neural network. Let's try to go through a step-by-step process to implement a binary logistic regression, as shown here: