Hands-On Neural Networks with Keras

Cross entropy

Cross entropy is yet another mathematical notion, one that allows us to compare two distinct probability distributions, denoted by p and q. In fact, as you will see later, we often employ entropy-based loss functions in neural networks when dealing with categorical features. Essentially, the cross entropy between two probability distributions (https://en.wikipedia.org/wiki/Probability_distribution), p and q, defined over the same underlying set of events, measures the average number of bits of information needed to identify an event picked at random from that set, under one condition: the coding scheme used is optimized for the predicted probability distribution, q, rather than the true distribution, p. We will revisit this notion in later chapters to clarify and implement our understanding:
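As a minimal sketch of the idea (not the book's own code), the discrete cross entropy can be written as H(p, q) = -Σ p(x) log q(x) and computed directly with NumPy. The distributions p_true and q_pred below are purely hypothetical examples of a one-hot target and a classifier's predicted probabilities:

```python
import numpy as np

# Hypothetical example: a true (one-hot) distribution p and a
# predicted distribution q over the same four categories.
p_true = np.array([0.0, 1.0, 0.0, 0.0])
q_pred = np.array([0.10, 0.70, 0.15, 0.05])

# Cross entropy H(p, q) = -sum_x p(x) * log(q(x)).
# np.log is the natural logarithm, so the result is in nats;
# a small epsilon guards against log(0) for zero-probability predictions.
eps = 1e-12
cross_entropy = -np.sum(p_true * np.log(q_pred + eps))

print(cross_entropy)  # ~0.357, the cost of encoding p with a code tuned to q
```

This per-sample quantity is essentially what Keras's categorical_crossentropy loss computes when training a classifier against one-hot targets, which is why it will reappear once we work with categorical outputs in later chapters.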