
Bayesian decision theory
Bayesian decision theory is an approach to statistical inference in which the probabilities are not interpreted as frequencies, proportions, or similar concepts, but rather as levels of confidence in the occurrence of a given event. The name is derived from Bayes' theorem, which is the foundation of this approach. To understand the concepts behind this theory, it is necessary to introduce some definitions. Consider the following example.
In a bag, there are seven white balls and three black balls. Except for their color, the balls are identical: they are made of the same material, they are the same size, they are perfectly spherical, and so on. Suppose I put my hand in the bag without looking inside and pull out a ball at random. What is the probability that the ball I pulled out is black? The reasoning is as follows:
- In all, there are 7 + 3 = 10 balls, so pulling out a ball has ten possible cases. I have no reason to think that some balls are privileged, that is, more likely to be pulled out than others. Therefore, the ten possible cases are equally probable.
- Of these ten possible cases, there are only three in which the ball pulled out is black. These are the cases that are favorable to the event of interest.
Pulling out a black ball therefore corresponds to three favorable cases out of ten possible cases. We define its probability as the ratio between the favorable and the possible cases. Therefore, we get the following:
Probability (black ball) = 3/10 = 0.3 = 30%
As we have shown in the previous example, the probability of an event can be expressed as follows:
- As a fraction, for example, 3/10
- As a decimal number, for example, 3/10 = 0.30
- As a percentage, for example, 0.30 = 30%
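To make this concrete, here is a minimal Python sketch (an illustration, not part of the original example) that estimates the same probability by simulating many random draws from the bag:

```python
import random

# The bag from the example: 7 white balls and 3 black balls.
bag = ["white"] * 7 + ["black"] * 3

trials = 100_000
black_count = sum(1 for _ in range(trials) if random.choice(bag) == "black")

# The estimate should be close to the theoretical value of 3/10 = 0.3.
print("Estimated P(black):", black_count / trials)
print("Theoretical P(black):", 3 / 10)
```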
Solving this problem gives us the opportunity to state the classical definition of the probability of an event: the a priori probability that a given event (E) occurs is the ratio between the number (s) of cases favorable to the event and the total number (n) of possible cases, provided that all the cases considered are equally probable. This can be better represented using the following formula:
P(E) = s / n
Let's take a look at two simple examples:
- When tossing a coin, what is the probability that it shows heads? The possible cases are two, heads and tails {H, T}, and the favorable cases are one, {H}. Therefore, P(heads) = 1/2 = 0.5 = 50%.
- When throwing a die, what is the probability that a 5 is thrown? The possible cases are six, {1, 2, 3, 4, 5, 6}, and the favorable cases are one, {5}. Therefore, P(5) = 1/6 ≈ 0.167 = 16.7%.
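If you want to experiment with this definition, the following short Python sketch (with a hypothetical helper named classical_probability, introduced only for illustration) computes probabilities as exact ratios of favorable to possible cases:

```python
from fractions import Fraction

def classical_probability(favorable, possible):
    """Classical (a priori) probability: favorable cases over possible cases."""
    return Fraction(favorable, possible)

# Coin toss: 1 favorable case (heads) out of 2 possible cases.
print(classical_probability(1, 2), float(classical_probability(1, 2)))   # 1/2 0.5

# Die throw: 1 favorable case (a 5) out of 6 possible cases.
print(classical_probability(1, 6), float(classical_probability(1, 6)))   # 1/6 ~0.167
```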
To define probability, we used the concept of equally likely events. It is therefore necessary to clarify what is meant by equally likely events. To this end, we can introduce the principle of insufficient reason (or principle of indifference), which states the following: if there is no reason to believe that one of the possible events is more likely to occur than the others, all the events under consideration must be assigned the same probability.
Counting the possible and the favorable cases often requires combinatorial calculations. Previously, we defined probability as the ratio between the number of favorable cases and the number of possible cases. But what values can it take? The probability of an event, P(E), is always a number between 0 and 1:
0 ≤ P(E) ≤ 1
The extreme values are defined as follows:
- An event that has a probability of 0 is called an impossible event. Suppose we have six red balls in a bag—what is the probability of picking a black ball? The possible cases are 6; the favorable cases are 0 because there are no black balls in the bag. Thus, P(E) = 0/6 = 0.
- An event that has a probability of 1 is called a certain event. Suppose we have six red balls in a bag—what is the probability of picking a red ball? The possible cases are 6; the favorable cases are 6 because there are only red balls in the bag. Thus, P(E) = 6/6 =1.
The classical definition of probability, based on a discrete and finite number of events, is not easily extended to the case of continuous variables. The ideal condition of perfect uniformity, in which all possible outcomes (the space of events) are known in advance and all are equally probable, is a weak point of that definition. Moreover, the equal-probability condition is imposed before the notion of probability has been defined, which makes the definition circular.
An important advance over the classical concept, in which probability is established a priori, before looking at the data, is contained in the frequentist definition of probability, where probability is instead obtained a posteriori, after examining the data. According to this concept, the probability of an event is the limit to which the relative frequency of the event tends as the number of trials tends to infinity. This definition can be applied without prior knowledge of the space of events and without assuming the condition of equally likely events. However, it assumes that the experiment can be repeated many times, ideally infinitely, under the same conditions.
We can then say that, in a series of trials repeated under the same conditions, each of the possible events occurs with a relative frequency that is close to its probability. This can be defined as follows:
P(E) = lim (n → ∞) k / n
Here, k is the number of trials in which the event E occurs and n is the total number of trials.
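As an illustration of this idea, the following Python sketch simulates repeated coin tosses and shows the relative frequency of heads approaching the theoretical probability of 0.5 as the number of trials grows:

```python
import random

# The relative frequency of heads in repeated coin tosses approaches
# the probability of 0.5 as the number of trials increases.
for trials in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(random.choice(["H", "T"]) == "H" for _ in range(trials))
    print(f"{trials:>7} trials -> relative frequency of heads: {heads / trials:.4f}")
```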
In the Bayesian approach, probability is a measure of the degree of credibility of a proposition. This definition applies to any event. Bayesian probability is an inverse probability: we switch from observed frequencies to probability. In the Bayesian approach, the probability of a given event is determined before performing the experiment, on the basis of personal considerations. The a priori probability is therefore tied to the degree of credibility of the event, set in a subjective way. With Bayes' theorem, on the basis of the observed frequencies, we can adjust the a priori probability to reach the a posteriori probability. With this approach, an estimate of the degree of credibility of a given hypothesis before observing the data is used to associate a numerical value with the degree of credibility of that hypothesis after the data have been observed.
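To see the prior-to-posterior update in action, here is a minimal Python sketch based on a hypothetical scenario (not taken from the text): a bag is either bag A, with 7 white and 3 black balls, or bag B, with 3 white and 7 black balls, and each option is considered equally credible a priori; we then observe a single black ball:

```python
# A minimal sketch of the prior-to-posterior update with Bayes' theorem.
prior_A, prior_B = 0.5, 0.5            # a priori degrees of credibility
likelihood_black_A = 3 / 10            # P(black | bag A)
likelihood_black_B = 7 / 10            # P(black | bag B)

# Total probability of observing a black ball.
evidence = prior_A * likelihood_black_A + prior_B * likelihood_black_B

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior_A = likelihood_black_A * prior_A / evidence
posterior_B = likelihood_black_B * prior_B / evidence

print("P(bag A | black) =", posterior_A)   # ≈ 0.3
print("P(bag B | black) =", posterior_B)   # ≈ 0.7
```

Observing a black ball shifts the degree of credibility of bag A from 0.5 to about 0.3, and that of bag B from 0.5 to about 0.7.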
So far, we've talked about the probability of a single event, but what happens when there is more than one possible event? Two random events, A and B, are independent if the probability of the occurrence of event A does not depend on whether event B has occurred, and vice versa. For example, let's say we have two decks of 52 French playing cards. When extracting a card from each deck, the following two events are independent:
- E1: The card extracted from the first deck is an ace
- E2: The card extracted from the second deck is a clubs card
The two events are independent, and each can happen with the same probability, independently of the other's occurrence.
Conversely, a random event, A, is dependent on another event, B, if the probability of event A depends on whether event B has occurred or not. Suppose we have a deck of 52 cards. By extracting two cards in succession without putting the first card back in the deck, the following two events are dependent:
- E1: The first extracted card is an ace
- E2: The second extracted card is an ace
To be precise, the probability of E2 depends on whether or not E1 occurs. Hence, we can see the following:
- The probability of E1 is 4/52
- The probability of E2 if the first card was an ace is 3/51
- The probability of E2 if the first card was not an ace is 4/51
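The following Python sketch expresses these dependent probabilities as exact fractions and, as an extra check not discussed in the text, combines them with the law of total probability:

```python
from fractions import Fraction

# Drawing two cards without replacement: the probability of E2 (the second
# card is an ace) depends on whether E1 (the first card is an ace) occurred.
# Note that Fraction reduces automatically, so 4/52 is shown as 1/13.
p_e1 = Fraction(4, 52)                 # P(E1) = 4/52
p_e2_given_e1 = Fraction(3, 51)        # P(E2 | E1) = 3/51
p_e2_given_not_e1 = Fraction(4, 51)    # P(E2 | not E1) = 4/51

# Law of total probability: P(E2) turns out to be 4/52 as well.
p_e2 = p_e1 * p_e2_given_e1 + (1 - p_e1) * p_e2_given_not_e1
print(p_e2)    # 1/13, that is, 4/52
```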
Now let's deal with other cases of mutual interaction between events. Random events that cannot occur simultaneously in a given trial are said to be mutually exclusive or disjoint. By extracting a card from a deck of 52, the following two events are mutually exclusive:
- E1: The ace of hearts comes out
- E2: One face card comes out
Indeed, the two events just mentioned cannot occur simultaneously, since an ace is not a face card. Two events are, however, exhaustive if at least one of them must occur in a given trial. By extracting a card from a deck of 52, the following two events are exhaustive:
- E1: One face card comes out
- E2: One number card comes out
These events are exhaustive because their union covers all possible outcomes. Now let's deal with the case of joint probability, for both independent and dependent events. Given two events, A and B, if the two events are independent (the occurrence of one does not affect the probability of the other), the joint probability of the two events is equal to the product of the probabilities of A and B:
P(A ∩ B) = P(A) * P(B)
Let's look at an example. We have two decks of 52 cards. By extracting a card from each deck, let's consider the two independent events:
- A: The card extracted from the first deck is an ace
- B: The card extracted from the second deck is a clubs card
What is the probability that both of them occur?
- The probability that A will occur is 4/52
- The probability that B will occur is 13/52
- The probability that both will occur is therefore 4/52 * 13/52 = 52/(52 * 52) = 1/52
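The same product rule can be checked with exact fractions; the following minimal Python sketch reproduces the calculation above:

```python
from fractions import Fraction

# Joint probability of two independent events: P(A and B) = P(A) * P(B).
p_a = Fraction(4, 52)    # an ace from the first deck
p_b = Fraction(13, 52)   # a club from the second deck

print(p_a * p_b)         # 1/52
```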
If the two events are dependent (that is, the occurrence of one affects the probability of the other), the rule becomes the following, where P(B|A) is the probability of event B given that event A has occurred:
P(A ∩ B) = P(A) * P(B|A)
This introduces conditional probability, which we are going to dive into shortly. First, consider the following example.
A bag contains two white balls and three red balls. Two balls are pulled out from the bag in two successive extractions without reintroducing the first ball that was pulled out of the bag.
Calculate the probability that the two balls extracted are both white:
- The probability that the first ball is white is 2/5
- The probability that the second ball is white, provided that the first ball is white, is 1/4
The probability of having two white balls is as follows:
P(two whites) = 2/5 * 1/4 = 2/20 = 1/10
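As a quick sanity check, the following Python sketch (an illustration only) simulates the two draws without replacement and compares the estimated probability with the exact value of 1/10:

```python
import random

# Simulate the example: 2 white and 3 red balls, drawn twice without replacement.
trials = 100_000
both_white = 0
for _ in range(trials):
    bag = ["white"] * 2 + ["red"] * 3
    first, second = random.sample(bag, 2)   # sampling without replacement
    if first == "white" and second == "white":
        both_white += 1

# The estimate should be close to the exact value 2/5 * 1/4 = 1/10.
print("Estimated P(two whites):", both_white / trials)
```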
As promised, it is now time to introduce the concept of conditional probability. The probability that event A occurs, calculated on the condition that event B has occurred, is called conditional probability and is indicated by the symbol P(A | B). It is calculated using the following formula:
P(A | B) = P(A ∩ B) / P(B)
Conditional probability usually applies when A depends on B, that is, when the events are dependent on each other. In the case where A and B are independent, the formula becomes as follows:
P(A | B) = P(A)
In fact, in this case, the occurrence of B does not affect the probability P(A).
Let's look at an example. What is the probability that, by extracting two cards from a deck of 52, the second one is a diamond, given that the first one was a diamond? The joint probability of drawing two diamonds is as follows:
P(diamond ∩ diamond) = 13/52 * 12/51
Therefore, the conditional probability is given by the following:
P(diamond | diamond) = (13/52 * 12/51) / (13/52) = 12/51
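The following Python sketch reproduces this calculation with exact fractions, applying the conditional probability formula directly:

```python
from fractions import Fraction

# Conditional probability from the definition P(A | B) = P(A and B) / P(B),
# applied to the two-diamond example.
p_first_diamond = Fraction(13, 52)
p_both_diamonds = Fraction(13, 52) * Fraction(12, 51)

p_second_given_first = p_both_diamonds / p_first_diamond
print(p_second_given_first)   # 4/17, that is, 12/51
```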
As a further example, let's calculate the probability of getting the number one by throwing a die, given that the result is an odd number. The conditional probability we want to calculate is that of the event B|A, that is, getting a one knowing that the result is odd, where A is the event of getting an odd number and B is the event of getting the number one.
The intersection event A ∩ B corresponds to the event of getting the number one and an odd number (which is equivalent to the event of getting the number one, since one is odd).
Therefore, the probability of getting an odd number is equal to the following:
P(A) = 3/6 = 1/2
The probability of getting the number one (and therefore of the intersection A ∩ B) is as follows:
P(A ∩ B) = 1/6
Therefore, it is possible to calculate the conditional probability of event B with respect to event A using the following formula:
P(B | A) = P(A ∩ B) / P(A) = (1/6) / (1/2) = 1/3
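The same result can be verified by enumerating the sample space of the die, as in the following Python sketch:

```python
from fractions import Fraction

# Enumerate the sample space of a die to check P(B | A) = P(A and B) / P(A),
# where A = "odd result" and B = "the result is 1".
outcomes = [1, 2, 3, 4, 5, 6]

p_a = Fraction(sum(1 for x in outcomes if x % 2 == 1), len(outcomes))        # 1/2
p_a_and_b = Fraction(sum(1 for x in outcomes if x == 1 and x % 2 == 1),
                     len(outcomes))                                          # 1/6

print(p_a_and_b / p_a)   # 1/3
```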
Let's recall, in this regard, that playing dice is always a loss-making activity, even for a statistician.