Problems in Bayesian Epistemology

The amount of Huel meal replacement drinks I had over the course of making this article is ridiculous. Enjoy :)

Sep 27, 2024

1) This article will attempt to be as conceptual as possible (math is hard to explain well in writing) but will include math when required. 2) There are, in my opinion, solid responses to most (if not all) of these objections, but I thought I would spend some (understatement) time putting together the best objections. 3) This is a non-exhaustive list collected from many different sources and people; while I may have rephrased some arguments or put a different spin on them, they were not created by me.

Bayesian Epistemology focuses on the degree to which we should believe propositions, how we should update our probabilities (credences) in light of new evidence, and which principles we should uphold in these projects. This contrasts with traditional epistemology, in which we discover/invent what words like truth and knowledge really mean and how we achieve them.

I think Bayesian Epistemology is really helpful and should be taken seriously by people both in philosophy and in other subjects (yay, philosophy does something helpful for once!). For a primer on the concept see here, and for good tips on how to use Bayesian reasoning in practice see here.

Defining Terms:

Some concepts important to Bayesian Epistemology include (* denotes a concept that is not necessary for Bayesian Epistemology but is accepted by many):

Bayes’ Theorem: the equation used in Bayesian Epistemology to update beliefs in light of new evidence. Here is the formula:

P(H|E) = (P(E|H)*P(H))/P(E)
(Formula image source: “Bayes' rule with a simple and practical example” by Tirthajyoti Sarkar, Towards Data Science.)

Probability Distribution: The mathematical function that gives the probabilities of occurrence of possible outcomes for an experiment.

Prior Probability: The assumed probability distribution before some evidence is taken into account.

Posterior Probability: A type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule.

Probability Axioms/ Probabilism: 1) Non-Negativity: the probability of an event must be a non-negative number, 2) Normalization: the probability of the entire sample space must be equal to 1, 3) For any two mutually exclusive events A and B, the probability of their union is equal to the sum of their probabilities. The axioms also have various consequences, which I don’t think are necessary to mention here.
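To make the three axioms concrete, here is a minimal Python sketch that checks a credence function over a finite set of worlds against them. The function name `probabilism_violations` is hypothetical, and representing each proposition as the frozenset of worlds in which it is true is an assumption made for illustration.

```python
from itertools import combinations

def probabilism_violations(credence, worlds, tol=1e-9):
    """Return a list of the ways `credence` breaks the probability axioms.

    credence: dict mapping propositions (frozensets of worlds) to numbers.
    worlds: the sample space, an iterable of world labels.
    """
    problems = []
    omega = frozenset(worlds)
    # 1) Non-negativity
    for prop, c in credence.items():
        if c < -tol:
            problems.append(f"negative credence in {sorted(prop)}")
    # 2) Normalization: the whole sample space must get credence 1
    if omega not in credence or abs(credence[omega] - 1) > tol:
        problems.append("credence in the sample space is not 1")
    # 3) Additivity, for each disjoint pair whose union also has a credence
    for a, b in combinations(credence, 2):
        if not (a & b) and (a | b) in credence:
            if abs(credence[a | b] - (credence[a] + credence[b])) > tol:
                problems.append(f"additivity fails for {sorted(a)} and {sorted(b)}")
    return problems

worlds = {"rain", "dry"}
rain, dry, either = frozenset({"rain"}), frozenset({"dry"}), frozenset(worlds)
print(probabilism_violations({rain: 0.3, dry: 0.7, either: 1.0}, worlds))  # []
print(probabilism_violations({rain: 0.6, dry: 0.6, either: 1.0}, worlds))
```

The second credence function gives .6 to rain and .6 to no rain, so only the additivity check fires.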

Conditionalization: Upon learning some evidence E, update your credence in a hypothesis H from the prior P(H) to the posterior P(H|E), computed with Bayes’ theorem: P(H|E) = (P(E|H)*P(H))/P(E) (where P is your prior probability function and P(H|E) is the probability of H given evidence E).
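The rule is easy to run. The sketch below uses a hypothetical helper, `conditionalize`, with made-up screening-test numbers (1% base rate, 90% sensitivity, 5% false-positive rate). It computes P(E) with the law of total probability and then shows the defining feature of conditionalization: today’s posterior becomes tomorrow’s prior.

```python
from fractions import Fraction

def conditionalize(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) = P(E|H) * P(H) / P(E).

    P(E) comes from the law of total probability:
    P(E) = P(E|H) * P(H) + P(E|not-H) * (1 - P(H)).
    """
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Hypothetical numbers for a screening test.
prior = Fraction(1, 100)
sens, false_pos = Fraction(9, 10), Fraction(1, 20)

after_one = conditionalize(prior, sens, false_pos)      # about 0.154
after_two = conditionalize(after_one, sens, false_pos)  # the posterior is the new prior
print(after_one, after_two)  # 2/13 36/47
```

This assumes the two positive results are independent conditional on H and conditional on not-H. Under that assumption, updating twice gives the same answer as updating once on both results together: `conditionalize(prior, sens**2, false_pos**2)` is also 36/47.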

Reference Classes*: A set of similar instances or events that can be used to make probabilistic judgements about a new case.

The Regularity Principle*: For any contingent proposition (one that is neither a tautology nor a contradiction), one’s credence must be strictly between 0 and 1.

The Principal Principle*: (First of all, what an awesome name!) A rational agent’s credence in some proposition should match the objective chance of that proposition occurring, given no other evidence.

Deference Principle*: An agent should defer to the probabilistic judgments of an expert or another agent when the expert's opinion is more informed or reliable.

Sure-Thing Principle*: If a decision maker would take a certain action if they knew event E had occurred, and would also take it if they knew E had not occurred, then they should take that same action when they know nothing about whether E occurred.

Dominance Principle*: If some option, A, is better than all the alternatives (with respect to preferences) in all possible worlds, one should choose option A.

Reflection Principle*: An agent’s current beliefs should be consistent with what they expect their future beliefs to be given anticipated new evidence.

Principle of Indifference*: Given a partition (a set of mutually exclusive and jointly exhaustive propositions) of x propositions and no evidence favoring any of them, you should assign a credence of 1/x to each.

Package Principle (see page 4)*: “If an agent considers £r as a fair price for bet 1 and £r′ as a fair price for bet 2, then she should consider £(r + r′) as a fair price for the book of bets that consists of bets 1 and 2.”

Maximum Entropy Principle*: Typically used to establish priors, the principle states that, given precisely stated prior data, the probability distribution that best represents the current state of knowledge about a system is the one with the largest entropy among those consistent with that data.
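As a sketch of how the principle is used, consider the dice example associated with E. T. Jaynes: all we know about a six-sided die is its long-run average. With only a constraint on the mean, the maximum-entropy distribution has the form p(x) ∝ exp(λx), so the code below (a hypothetical `maxent_die` helper, not a library function) searches for λ by bisection. With an average of 3.5 and nothing else, it returns the uniform distribution, which is exactly what the Principle of Indifference recommends.

```python
import math

def maxent_die(target_mean, faces=range(1, 7), tol=1e-12):
    """Maximum-entropy distribution over `faces` whose mean equals `target_mean`.

    With only a mean constraint, the maximizer has the form p(x) = exp(lam * x) / Z,
    so we search for the multiplier `lam` by bisection (the mean increases with lam).
    """
    faces = list(faces)
    if not min(faces) < target_mean < max(faces):
        raise ValueError("target mean must lie strictly between the smallest and largest face")

    def dist(lam):
        weights = [math.exp(lam * x) for x in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), faces))

    lo, hi = -50.0, 50.0  # wide enough for the faces of an ordinary die
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

print([round(p, 3) for p in maxent_die(3.5)])  # [0.167, 0.167, 0.167, 0.167, 0.167, 0.167]
print([round(p, 3) for p in maxent_die(4.5)])  # probabilities rise toward the 6
```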

Arguments For Bayesianism:

The following three arguments (the Dutch Book, accuracy, and representation theorem arguments) are usually seen as the best ways to convince a skeptic that rational credences must satisfy Bayesian constraints. Before I get to the counterarguments, I will briefly discuss the arguments themselves (though there is certainly much more to be said about each of them).

Dutch Book Argument:

A Dutch Book is a set of bets, each of which an agent regards as acceptable, that together guarantee the agent a loss of utility in all possible worlds. Here’s an example (it exploits intransitive preferences, but violating the probability axioms leads to Dutch Books as well):

Imagine Alice prefers A to B, B to C, and C to A, and that she currently holds B. From this, you can make a Dutch Book on her: you offer to let her trade B for A if she pays a small fee, $1 for example, and she accepts because she prefers A to B. Next, you offer to trade her C for her A for another $1, which she accepts because she prefers C to A. Finally, you offer to trade her B for her C for a third $1, which she accepts because she prefers B to C. Now she is worse off than where she started: she holds B again, but she has lost $3.
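The trades above can be simulated in a few lines. This is a minimal sketch with a hypothetical `money_pump` helper: Alice’s cyclic preferences are stored as (better, worse) pairs, and she pays a $1 fee for any trade she strictly prefers.

```python
def money_pump(prefers, start, offers, fee=1):
    """Offer each item in `offers` in turn; the agent pays `fee` whenever she prefers it.

    prefers: set of (better, worse) pairs describing the agent's strict preferences.
    Returns what she ends up holding and how much she has paid in total.
    """
    holding, paid = start, 0
    for item in offers:
        if (item, holding) in prefers:  # she strictly prefers the offered item
            holding, paid = item, paid + fee
    return holding, paid

alice = {("A", "B"), ("B", "C"), ("C", "A")}  # A over B, B over C, C over A
print(money_pump(alice, start="B", offers=["A", "C", "B"]))  # ('B', 3)
```

Repeating the same three offers drains another $3 each round. With transitive preferences such as A over B, B over C, and A over C, the same offers stop after one trade: she holds A, which she ranks above what she started with.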

The Dutch Book Theorem states: An agent with complete preferences whose credences violate the probability axioms (that is, are incoherent) can be subjected to a Dutch Book, and an agent who fails to update by conditionalization can be subjected to a diachronic Dutch Book.

From this, given that a rational agent should not hold credences and utilities that leave them open to a guaranteed loss of utility, we can conclude that a rational agent’s credences must satisfy the constraints of Bayesianism.
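For the probability axioms themselves, the simplest case is a partition whose credences do not sum to 1. The sketch below makes the assumption usual in Dutch Book arguments: the agent treats her credence as the fair price of a bet that pays $1 if the proposition is true, and she is willing to take either side of that bet. The helper name `dutch_book` is hypothetical.

```python
def dutch_book(credences):
    """Agent's net payoff, in each world, from one bet on every cell of a partition.

    credences: dict mapping each cell of a partition (exactly one cell is true)
    to the agent's credence, which she treats as the fair price of a $1 bet on it.
    """
    total = sum(credences.values())
    # If her prices sum to more than $1, the bookie sells her every bet;
    # if they sum to less, the bookie buys every bet from her.
    agent_buys = total > 1
    payoffs = {}
    for true_cell in credences:
        payout = sum(1 for cell in credences if cell == true_cell)  # exactly one bet pays $1
        net = payout - total  # what a buyer of all the bets nets in this world
        payoffs[true_cell] = net if agent_buys else -net
    return payoffs

print(dutch_book({"rain": 0.6, "dry": 0.6}))  # she loses about $0.20 either way
print(dutch_book({"rain": 0.3, "dry": 0.5}))  # she loses about $0.20 here too
```

When her credences over the partition sum to exactly 1, every payoff is zero, so this particular strategy finds no book.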

Accuracy Argument:

Accuracy arguments for Bayesianism often use the Brier Score to evaluate how close a belief is to the truth (or how far it is from falsehood), measured as squared Euclidean distance. Some use the terminology of epistemic utility, in which inaccuracy (distance from the truth) decreases one’s epistemic utility. For example, a .9 credence in a true proposition is .1 away from the truth, which the Brier Score counts as .1² = .01 of inaccuracy.

Adapted from here:

De Finetti’s Theorem: Assume you have a credence function c defined over a finite set of propositions F. Then

If c is non-probabilistic, then there is some probabilistic credence function c′ defined on F that strongly dominates it.
If c is probabilistic, then there is no other credence function that even weakly dominates it.

For a visual representation, see the picture below (where c′ is a probabilistic credence function that follows Bayesian constraints and c* is a credence function that does not):

This image was taken (with permission) from the slides of my Bayesian Epistemology class taught by Prof. Mikayla Kelley.

The Brier Score measures the distance between points; here, it measures the distance between a credence function and the truth. A rational agent should be as close to the truth as possible given their evidence. Any point off the line (a credence function that violates the Bayesian constraints above) is strictly dominated, in terms of epistemic utility or closeness to the truth, by some point on the line (a credence function that satisfies them). So a rational agent’s credence function must satisfy Bayesian constraints.
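Here is one simple case worked out in code. It illustrates the first half of de Finetti’s theorem and is not a proof. The setup is assumed: two propositions, A and not-A, with an incoherent c* = (0.6, 0.6). Projecting c* onto the line c(A) + c(not-A) = 1 gives a coherent c′ that has lower Brier inaccuracy in both possible worlds. All helper names are hypothetical.

```python
def brier(credences, truth_values):
    """Inaccuracy as squared Euclidean distance between credences and truth values (1 or 0)."""
    return sum((c - t) ** 2 for c, t in zip(credences, truth_values))

# Two propositions, A and not-A; each possible world is a vector of truth values.
WORLDS = {"A true": (1, 0), "A false": (0, 1)}

def project_to_coherent(credences):
    """Orthogonal projection of (c(A), c(not-A)) onto the line c(A) + c(not-A) = 1.

    This is the closest coherent point only when the result stays inside [0, 1] x [0, 1].
    """
    excess = (sum(credences) - 1) / 2
    return tuple(c - excess for c in credences)

def strictly_dominates(better, worse):
    """True if `better` is strictly less inaccurate than `worse` in every world."""
    return all(brier(better, w) < brier(worse, w) for w in WORLDS.values())

c_star = (0.6, 0.6)                    # incoherent: credences sum to 1.2
c_prime = project_to_coherent(c_star)  # (0.5, 0.5), up to rounding
for name, world in WORLDS.items():
    print(name, round(brier(c_star, world), 3), round(brier(c_prime, world), 3))
print(strictly_dominates(c_prime, c_star))  # True
```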

Representation Theorem Argument:

Premise 1: Every rational agent has preferences that satisfy certain axioms (completeness, transitivity, etc.).

Premise 2: Every rational agent maximizes expected utility given their credences and utilities.

Premise 3: If an agent’s preferences satisfy these axioms, then there exist a credence function, unique up to multiplication by a positive scalar, and a utility function, unique up to positive affine transformation, that together represent those preferences. Normalizing the credence function picks out the member of that family that adheres to the rules of probability (Probabilism), ensuring that all preferences and decisions are coherent and consistent.
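The “up to” clauses in Premise 3 say that certain rescalings do not change which option an expected-utility maximizer picks. The minimal sketch below uses made-up credences and utilities (the umbrella example and all names are hypothetical). Three operations leave the choice unchanged: applying a positive affine transformation to the utilities, multiplying every credence by the same positive number, and normalizing the scaled credences, which recovers the original probability function.

```python
from fractions import Fraction

def expected_utility(credences, utilities):
    return sum(credences[w] * utilities[w] for w in credences)

def best_option(credences, options):
    """The option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(credences, options[name]))

credences = {"rain": Fraction(3, 10), "dry": Fraction(7, 10)}
options = {
    "umbrella": {"rain": 5, "dry": 2},
    "no umbrella": {"rain": -10, "dry": 6},
}

# Positive affine transformation of utilities: u -> a*u + b with a > 0.
rescaled = {name: {w: 3 * u + 100 for w, u in us.items()} for name, us in options.items()}
# Multiply every credence by the same positive scalar, then normalize again.
scaled = {w: 5 * c for w, c in credences.items()}
normalized = {w: c / sum(scaled.values()) for w, c in scaled.items()}

print(best_option(credences, options))   # umbrella
print(best_option(credences, rescaled))  # umbrella
print(best_option(scaled, options))      # umbrella
print(normalized == credences)           # True
```

A negative multiplier would reverse the ranking. That is why the utility function is unique only up to *positive* affine transformation.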

Conclusion: Every rational agent has credences that satisfy the axioms required for Bayesianism.

Responding to the Arguments:

Dutch Books:

Too Practical:

Bayesian Epistemology is usually taken to make claims about theoretical rationality: what one should believe, independent of action. However, one might argue that the Dutch Book cannot count as an argument for Bayesian Epistemology, because it deals with merely practical concerns (having preferences that cannot be Dutch-booked). The irrationality of losing one’s utility, one might argue, is merely practical irrationality.

Not Practical Enough:

For most reasonable real agents, it will likely be quite hard for a would-be bookie (perhaps even impossible, given time and computational constraints) to find the propositions on which the agent’s credence function violates the probability axioms. Even if a bookie did find such a set of credences, the person being Dutch-booked would likely revise them fast enough to avoid being Dutch-booked. Therefore, for practical purposes, having credences that don’t fit Bayesian constraints likely won’t get you Dutch-booked.

Too Strong:

In order to make a Dutch Book, one must assume that the agent has complete preferences (a preference, or indifference, between the options in every possible trade-off), has unbounded utility, and accepts both the Package Principle and the Sure-Thing Principle. If the possibility of being Dutch-booked is the problem, one can become immune simply by rejecting at least one of these assumptions.

Not Strong Enough:

Bayesianism is usually taken to involve more than the probability axioms and conditionalization, so the Dutch Book argument is too weak to establish it. For example, Bayesianism usually comes with a subjective interpretation of probability, which is not necessary for avoiding Dutch Books. There are also other rationality constraints typically associated with Bayesianism (the Regularity Principle, for instance) that the Dutch Book argument does not support.

The Czech Book Argument:

Finally, we have an argument for having non-probabilistic credences that takes a similar form to the Dutch Book Argument.

Czech Book Theorem: “If you violate probability theory, there exists a set of bets, each of which you consider fair, which collectively guarantee your gain.”

There is also a Converse Czech Book Theorem (also found in the linked paper): “If you obey probability theory, there does not exist a set of bets, each of which you consider fair or favorable, which collectively guarantee your gain.”
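The Czech Book simply reverses who benefits from the bets. This sketch makes the usual fair-price assumption: the agent treats her credence as the fair price of a bet that pays $1 if the proposition is true. If her credences over a partition sum to more than 1, she sells every bet; if they sum to less than 1, she buys every bet. Either way, her gain is guaranteed. The helper name `czech_book` is hypothetical, and the sketch assumes a counterparty willing to trade at her prices.

```python
def czech_book(credences):
    """Agent's guaranteed gain, in each world, from taking the favorable side of every bet.

    credences: dict mapping each cell of a partition to the agent's credence, which
    she treats as the fair price of a bet paying $1 if that cell is true.
    """
    total = sum(credences.values())
    payoffs = {}
    for true_cell in credences:
        payout = 1  # exactly one cell of the partition is true, so exactly one bet pays $1
        # Prices summing to more than $1: she sells every bet, collecting `total` and paying $1.
        # Prices summing to less than $1: she buys every bet, paying `total` and collecting $1.
        payoffs[true_cell] = total - payout if total > 1 else payout - total
    return payoffs

print(czech_book({"rain": 0.6, "dry": 0.6}))  # about +$0.20 in every world
```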

One can then claim that a rational agent has good reason not to follow the probability axioms, because violating them could, hypothetically, guarantee a gain in utility in all possible worlds.

Note: At first glance, this might suggest that violating probability theory could be advantageous, since it could theoretically allow an agent to secure a guaranteed gain. However, while the Czech Book Theorem is an interesting counterpoint to the Dutch Book Argument, it is important to clarify that this argument is largely conceptual and lacks the same practical implications as the Dutch Book. This is because the scenarios where inconsistency leads to a guaranteed gain are far less straightforward and depend on specific and often contrived conditions. Additionally, the author, Alan Hájek, certainly did not intend this to be an endorsement of having a credence function that doesn’t follow the probability axioms.

Accuracy Arguments:

Other Virtues:

While few would deny that accuracy is a large component of being a rational agent, one might argue that other virtues matter too; in other words, one might reject credal virtue monism. Some commonly cited examples are: proportioning your belief to the evidence, weighting beliefs by their importance (e.g., knowing the 37,986th digit of pi is not as important for a rational agent as knowing how to safely cross the street), ethics (especially in collecting evidence for your beliefs), avoiding information hazards (cases in which gaining some true information leaves one worse off), being able to justify one’s views, and more. The standard accuracy argument, however, does not (and, for some of these virtues, cannot in principle) account for these other epistemic virtues. If this is true, accuracy arguments (and therefore Bayesianism) are missing significant aspects of rationality.

Against Epistemic Consequentialism:

When doing Bayesianism in practice, it would be nice to have a measure of how rational one is being, rather than treating "Bayesian or not" as a binary, especially considering that full Bayesianism is computationally intractable in practice. However, under this form of the accuracy argument, the only way to measure how rational an agent is would be by accuracy. This leads to an approach that looks like Epistemic Consequentialism: the idea that the most rational person is the one whose beliefs have the highest net accuracy, so that being rational is a matter of maximizing accuracy.

However, as Professor Selim Berker argues in his paper The Rejection of Epistemic Consequentialism, Epistemic Consequentialism has many flaws. Optimizing for true beliefs would probably look something like staying in your mom’s basement doing simple arithmetic problems for the rest of your life. But this seems clearly irrational! So we cannot define rationality as coming closest to the Bayesian ideal in terms of accuracy. This suggests that the accuracy of your beliefs is only one part of rationality, which calls into question whether we can argue for a system of theoretical rationality by saying that it maximizes accuracy.

Risk Weighted Epistemic Utility:

In decision theory, many argue that a 50% chance of 100 utility need not be worth the same to everyone as a sure 50 utility, even though the two have the same expected utility (computed by multiplying each outcome’s probability by its utility and adding the results). This is because some people are risk averse: they tend to prefer outcomes with low uncertainty to outcomes with high uncertainty, even when the average outcome of the latter is equal to or higher in value than the more certain outcome.
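One well-known way to formalize this is Lara Buchak’s risk-weighted expected utility. In that framework, a risk function r reweights the probability of doing at least as well as each outcome. The sketch below implements that formula under a hypothetical function name, `reu`. With r(p) = p it reduces to ordinary expected utility. The convex choice r(p) = p² is an illustrative assumption that models a risk-averse agent.

```python
def reu(gamble, r=lambda p: p):
    """Risk-weighted expected utility of a gamble.

    gamble: list of (probability, utility) pairs.
    r: risk function on [0, 1]; r(p) = p recovers ordinary expected utility.
    """
    outcomes = sorted(gamble, key=lambda pu: pu[1])  # worst outcome first
    value = outcomes[0][1]  # you get at least the worst outcome for sure
    for j in range(1, len(outcomes)):
        prob_at_least = sum(p for p, _ in outcomes[j:])  # chance of doing at least this well
        value += r(prob_at_least) * (outcomes[j][1] - outcomes[j - 1][1])
    return value

coin_flip = [(0.5, 100), (0.5, 0)]
sure_thing = [(1.0, 50)]
risk_averse = lambda p: p ** 2  # hypothetical risk function

print(reu(coin_flip), reu(sure_thing))                            # 50.0 50
print(reu(coin_flip, risk_averse), reu(sure_thing, risk_averse))  # 25.0 50
```

The risk-neutral agent values the two options equally, while the risk-averse agent prefers the sure 50.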

If we allow rational agents to have various risk tolerances for expected utility, then, to be consistent, it seems we should allow various risk tolerances for expected epistemic utility. With respect to epistemic utility, there may be Risk Averse people, who care more about avoiding false beliefs than about holding true beliefs; Risk Loving people, who care more about holding true beliefs than about avoiding false ones; and Risk Neutral people, who care about both equally. While one can build an expected epistemic utility function that accounts for this, doing so may require us to significantly broaden the definition of what it means to be a rational agent.

Losing information:

There are two main ways accuracy approaches lose information:

1) The Problem of Averages: Imagine you have a die that you have good reason to believe is completely fair: every outcome has a 1/6 probability. If you take the average to maximize accuracy, it seems you should predict 3.5, the expected value of a roll. This answer, however, loses crucial information about the process: it doesn’t distinguish whether the number you give is an average or what you actually believe the die will land on. This seems like an important distinction that the accuracy argument does not account for. Therefore, one may conclude that the accuracy argument leaves out important parts of rationality.

2) Levels of uncertainty: Another piece of information lost in this process is how certain you are of your probability distribution (in other words, how much evidence it would take to change it). Suppose you say that a coin is unbiased and therefore has 50% odds for either side. You are much more certain in that case than in one where you have no evidence about a proposition with two possible outcomes in the sample space, so you use the principle of indifference and give each .5. The accuracy arguments, however, do not account for these different levels of certainty, which are very important for rational decision making. This is yet another instance where the accuracy arguments, and therefore Bayesianism in general, seem to lose information.
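Both kinds of lost information are easy to exhibit; all distributions and numbers below are illustrative assumptions. First, a fair die and a hypothetical die that only ever lands on 3 or 4 have the same expected value, so a single point estimate cannot tell them apart. Second, for levels of uncertainty, a coin’s bias is modeled with a Beta prior (an assumed modeling choice, not something the accuracy arguments specify). Beta(1, 1) stands in for the principle-of-indifference case and Beta(500, 500) for a well-tested coin. Both give 0.5 for heads, yet three heads in a row move them very differently.

```python
from fractions import Fraction

def mean(dist):
    return sum(p * x for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
three_or_four = {3: Fraction(1, 2), 4: Fraction(1, 2)}  # hypothetical loaded die
print(mean(fair_die), mean(three_or_four))          # 7/2 7/2
print(variance(fair_die), variance(three_or_four))  # 35/12 1/4

def beta_credence_in_heads(a, b, heads=0, tails=0):
    """Posterior mean of a coin's bias under a Beta(a, b) prior after some flips."""
    return Fraction(a + heads, a + b + heads + tails)

for a, b in [(1, 1), (500, 500)]:
    before = beta_credence_in_heads(a, b)
    after = beta_credence_in_heads(a, b, heads=3)
    print(f"Beta({a}, {b}): {before} -> {after}")
```

The indifference prior jumps from 1/2 to 4/5, while the well-tested prior only moves to 503/1003 (about 0.5015).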

Domination:

Just as with the accuracy argument, one can argue that one is worse off in every possible world if one assigns a credence other than one or zero. It would go something like this:

For every credence assigned to a contingent proposition (a proposition that is neither a tautology nor a logical contradiction), there is a whole number, 0 or 1, that dominates it. If the correct answer is always zero or one (which should be the case for rigorously defined propositions), one should assign a credence of zero or one, so as not to guarantee oneself a loss in every possible world.

Representation Theorems:

Too pragmatic:

Some argue that, like Dutch Book arguments, representation theorems for Bayesianism are too pragmatic because they rely on ideas like expected utility maximization. While Bayesianism claims to be about theoretical rationality—focusing on what beliefs one should hold regardless of actions—appealing to expected utility maximization introduces a practical element. Expected utility is concerned with how beliefs affect decisions and outcomes, not necessarily whether those beliefs are justified by evidence alone.

This raises the critique that representation theorems might show that Bayesianism is a useful model for decision-making under uncertainty, but not necessarily a good theory of rational belief in the abstract, theoretical sense. In other words, Bayesianism may just be a helpful tool in specific practical contexts, rather than a comprehensive account of how rational agents should form and hold beliefs.

The Objection From Voodoo Spirits (which is an awesome name!):

In his paper Arguments for — Or Against — Probabilism?, Alan Hájek argues that the fact that an agent can be represented in some way doesn’t mean the representation is true of that agent; there are alternatives:

The concern is that for all we know, the mere possibility of representing you one way or another might have less force than we want; your acting as if the representation is true of you does not make it true of you. To make this concern vivid, suppose that I represent your preferences with Voodooism. My voodoo theory says that there are warring voodoo spirits inside you. When you prefer A to B, then there are more A favoring spirits inside you than B favoring spirits. I interpret all of the usual rationality axioms in voodoo terms. Transitivity: if you have more A favoring spirits than B favoring spirits, and more B favoring spirits than C favoring spirits, then you have more A favoring spirits than C favoring spirits. Connectedness: any two options can be compared in the number of their favoring spirits. And so on. I then ‘prove’ Voodooism: if your preferences obey the usual rationality axioms, then there exists a Voodoo representation of you. That is, you act as if there are warring voodoo spirits inside you in conformity with Voodooism. Conclusion: rationality requires you to have warring Voodoo spirits in you. Not a happy result.

Similarly, contrary to what the Representation Theorem Argument claims, the fact that a rational agent can be modeled as having credences and utilities that fit Bayesian constraints doesn’t mean that a rational agent must satisfy those constraints in this way. This argument serves as a good reminder not to conflate the map (the way you model some agent) with the territory (the algorithm that the agent actually uses for decision making).

Problems in Evaluating Evidence:

Multiple Partitions Problem:

Assigning credences to events in the same space is going to be difficult if there are many possible partitions (ways of splitting up the possible events in your sample space). This is known as the multiple (or many) partitions problem: there are often many reasonable but inconsistent ways of dividing up the sample space (especially for vaguer categories).

Here’s an example: You’re a doctor seeing a coughing patient, and you want to establish a partition over potential diagnoses. You assign .5 to a viral infection, .3 to a bacterial infection, and .2 to other possibilities. Given the same evidence, you later break these broad categories into more specific diagnoses: you put the common cold at .3, influenza at .2, strep throat at .2, and pneumonia at .1. The issue is that your credences in the specific diagnoses don’t fit your broader categories, even though both seem like reasonable partitions. Pneumonia can be either viral or bacterial, and the viral diagnoses alone (cold and influenza) already add up to .5. So any credence that the pneumonia is viral pushes your credence in a viral infection above the .5 you assigned before. This reflects a problem for probabilistic reasoning, like Bayesianism, that depends on how you draw your partition.
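The clash becomes sharp if the credences come from the Principle of Indifference. In this sketch, the two partitions are hypothetical simplifications of the doctor’s case. Spreading credence evenly over the coarse partition, and then over a finer one, gives two different credences in the very same proposition: "the infection is viral".

```python
from fractions import Fraction

def indifference(partition):
    """Assign equal credence to each cell of a partition."""
    return {cell: Fraction(1, len(partition)) for cell in partition}

coarse = indifference(["viral", "bacterial", "other"])
# Hypothetical finer partition; each cell is tagged with the coarse category it falls under.
fine_cells = {"common cold": "viral", "influenza": "viral",
              "strep throat": "bacterial", "other": "other"}
fine = indifference(list(fine_cells))
viral_from_fine = sum(c for cell, c in fine.items() if fine_cells[cell] == "viral")

print(coarse["viral"], viral_from_fine)  # 1/3 1/2
```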

Reference Class Problems:

Usually, when one makes a probabilistic judgment, one bases it on previous similar data, that is, on a reference class. For example, a random coin flip is likely to be 50/50 between heads and tails because the vast majority of coins are not biased toward either side. The reference class problem points out that, outside such simple cases, there are many features that could define the reference class for a specific case, and it is hard to know which one will be most predictive.

The following is an example:

Imagine you are buying a used car and want to estimate the likelihood of it being reliable over the next 10 years. There are many potential reference classes that may work for estimating the probability, but some are likely going to be better than others. One can reasonably use any of these reference classes: 1) all used cars, 2) used cars of the same make and model, 3) used cars with similar mileage. Without data on the predictability of these different reference classes in previous cases, Bayesianism doesn’t give us an answer to which one we should use to form our priors. Furthermore, while this is a common case, in more obscure cases (for example, evaluating the risk of an existential threat), the list of potential reasonable reference classes to use becomes much larger. The problem sheds light on the difficulty in assigning priors in Bayesianism.

The Problem of Grue:

Similar to reference class problems, the Problem of Grue (also known as Nelson Goodman’s New Riddle of Induction) asks how we can properly use induction to guide our priors. Here is a version of Goodman’s example:

Imagine you and I are both miners in a cave looking for emeralds a few hours before noon (let’s call noon time T). Every single emerald that’s ever been found has been green, and we each have a hypothesis about the color of future emeralds: you think that all emeralds are green, and I think all emeralds are grue (which means that all emeralds found before time T will be green and all emeralds found after time T will be blue). It seems like every green emerald ever collected (all of which were found before time T) should raise the probability of both of our hypotheses equally. In practice, however, it really seems like we have reason to think the next emerald is more likely to be green than blue.

“No,” you say. You claim that I am misunderstanding: “green is actually a simpler term and therefore has a higher likelihood — you arbitrarily change the color at time T.”

I disagree. I say that “you have the more complex term. You are changing the state of the emerald from grue before time T to bleen (all emeralds found before time T will be blue and all emeralds found after time T will be green) after time T.”

While there are many potential solutions to this problem, it raises an issue for choosing priors: in particular, how to decide which hypotheses count as simpler and therefore deserve higher priors.
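The Bayesian bookkeeping behind the puzzle can be made explicit. In this sketch, the noon cutoff, the observation times, and the helper names are all illustrative assumptions. Every emerald found before T is green under both hypotheses, so each observation has likelihood 1 under both, and the posterior ratio between the hypotheses stays exactly equal to the prior ratio. Whatever makes "green" more credible than "grue" has to be built into the priors.

```python
from fractions import Fraction

T = 12  # noon, in hours (hypothetical)

def predicted_color(hypothesis, found_at):
    if hypothesis == "green":
        return "green"
    if hypothesis == "grue":
        return "green" if found_at < T else "blue"
    raise ValueError(f"unknown hypothesis: {hypothesis}")

def posterior(priors, observations):
    """Conditionalize on (time found, color) observations; each hypothesis predicts deterministically."""
    unnormalized = {}
    for h, prior in priors.items():
        likelihood = Fraction(1)
        for found_at, color in observations:
            likelihood *= 1 if predicted_color(h, found_at) == color else 0
        unnormalized[h] = prior * likelihood
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

morning_finds = [(hour, "green") for hour in (8, 9, 10, 11)]
for priors in ({"green": Fraction(1, 2), "grue": Fraction(1, 2)},
               {"green": Fraction(9, 10), "grue": Fraction(1, 10)}):
    print(posterior(priors, morning_finds))  # unchanged from the priors
```

Only an emerald examined after T separates the hypotheses; for example, a blue one found at 13:00 would leave all the credence on grue.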

The Problem of Precise Credences:

Assigning precise credences/ a precise probability distribution to qualitative evidence is going to be really difficult in many cases. Here’s an example:

Imagine you are trying to predict the success of a new product that you want to put on the market. There are certain metrics one can use to get a rough estimate of the market landscape (such as analyzing competitors, surveying target demographics, conducting beta-testing, quality-testing, and refining the design). Still, as any good entrepreneur will tell you, you can never guarantee the success of a product until you have some tested market feedback loops. Despite this uncertainty, just as you would assign a probability to a coin flip landing heads, you’re expected to assign a credence to the product’s success. Given only qualitative evidence, any precise credence one assigns is going to be somewhat arbitrary.

Even if one uses a range instead (for instance, saying that one’s credence is between .05 and .1), as Miriam Schoenfield suggests, one still has a precision problem at the edges of the range: for example, why should the lower bound be .05 rather than .050000921? Without clear, quantitative evidence, any precise figure feels somewhat artificial, leaving room to doubt the exactness of the credence one assigns.
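One common way to model an interval credence like [.05, .1] is as a set of probability functions, each updated by conditionalization. In this minimal sketch the likelihoods are invented for illustration: some encouraging market evidence E has P(E | success) = 0.8 and P(E | failure) = 0.2. Here the posterior increases with the prior, so updating the two endpoints gives the new interval. Note that the endpoints themselves still had to be picked precisely.

```python
from fractions import Fraction

def conditionalize(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem, with P(E) from the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Imprecise credence in the product's success, represented by its endpoints.
credal_interval = (Fraction(5, 100), Fraction(10, 100))
# Hypothetical likelihoods of some encouraging evidence E.
p_e_if_success, p_e_if_failure = Fraction(8, 10), Fraction(2, 10)

posterior_interval = tuple(
    conditionalize(p, p_e_if_success, p_e_if_failure) for p in credal_interval
)
print(posterior_interval)
print([round(float(p), 3) for p in posterior_interval])  # [0.174, 0.308]
```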

The Problem of Confirming Evidence:

In order to properly update a belief, one needs to confirm that some evidence is true. While this seems theoretically easy, the process is actually quite difficult in practice.

One reason confirmation is difficult is that it tends to rely on empirical data: we see that it is raining, for instance. However, all empirical data is subject to skepticism. One can always say that you don’t know for certain that you’re not on drugs or in a dream and that there is no actual rain. If so, one can never confirm any evidence and so can never conditionalize, which would make conditionalization useless in the vast majority of cases.

Another case where confirming evidence is going to be difficult is in assigning probabilities to qualitative propositions. Imagine economists are working on some intervention, x, that is supposed to reduce poverty in a given area, but they can’t run a randomized controlled trial for ethical reasons. At what point do you confirm the proposition that intervention x reduced poverty in the region? While this seems like a reasonable question, the proposition seems both too vague and too subject to future revision for total confirmation to be possible. Even if there is a really high probability that the intervention reduced poverty, you still can’t conditionalize on it, since conditionalization requires becoming certain of the evidence.

Problem of Old Evidence:

According to Bayesianism, the only thing that can affect your credence in some proposition is the confirmation of new evidence. There is, however, a problem with this model of rational agents: we often update (and sometimes have to update) on the basis of old evidence. There are generally two ways of doing this:

Confirmed Evidence Supporting New Beliefs:

Here is a classic case that is discussed in the literature:

The following explanation of the case is from ChatGPT:

The Evidence ( e ): The anomalous precession of Mercury's perihelion was a well-documented phenomenon in astronomy by the late 19th century. Newtonian mechanics could not fully account for this precession, leaving it as an unexplained anomaly.
The Hypothesis ( h ): In 1915, Einstein proposed General Relativity, which among other things, provided a new explanation for the precession of Mercury’s perihelion.
From a Bayesian perspective, if the evidence e (Mercury’s perihelion precession) was already known, then P(e) should be 1, meaning that the evidence e does not change the probability of the hypothesis h. This leads to the unintuitive conclusion that e cannot confirm h at all: P(e) = 1 also forces P(e∣h) = 1, so the posterior probability P(h∣e) equals the prior P(h).
However, intuitively, many believe that Einstein’s theory was indeed confirmed by its ability to account for this long-standing anomaly, suggesting that the Bayesian account is missing something important.
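The step from P(e) = 1 to "no confirmation" uses only the probability axioms and the definition of conditional probability (assuming P(h) > 0):

```latex
\begin{aligned}
P(e) = 1 &\;\Rightarrow\; P(h \wedge \neg e) \le P(\neg e) = 0
          \;\Rightarrow\; P(h \wedge e) = P(h) \\
P(e \mid h) &= \frac{P(h \wedge e)}{P(h)} = 1 \\
P(h \mid e) &= \frac{P(e \mid h)\,P(h)}{P(e)} = \frac{1 \cdot P(h)}{1} = P(h)
\end{aligned}
```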

Re-evaluating the Reliability of Old Evidence:

Say you have a trustworthy and epistemically humble friend named Nancy P, and she tells you that she is quite certain that the price of some stock is going to go up by 30% in five days (don’t ask how she got that information) — let’s call this proposition P. Because Nancy P is trustworthy and certain, you conditionalize and end up raising your credence in P from .001 to .8. However, your friend Chuck S. unrelatedly tells you that Nancy has been lying recently and gives you a bunch of examples of things that she recently predicted with confidence that turned out to be false.

This is going to be an issue for Bayesianism: namely, you already conditionalized on the information that Nancy told you about P given her reliability. Once you conditionalize, you cannot go back and fix it.

While there are some ways of partially resolving this issue (like conditionalizing on new evidence about your friend’s reliability), it’s going to be extremely difficult, if not impossible (given that the initial belief was already integrated into one’s network of beliefs), to reevaluate the evidence as you would have had you known that Nancy was less trustworthy.

Circularity:

A general problem with foundationalist approaches to epistemology (in which you root knowledge in a particular idea or concept) is that the root of knowledge needs to somehow be justified. This necessarily leads to an infinite regress or circularity (in which you have to use some concept x to show why x is true).

Here, unlike in other areas, Bayesian epistemology is in the same position as traditional epistemology. Bayesianism is going to require some probability that Bayesianism is a rational procedure, but this claim itself seems to require either another procedure or circularity (using Bayesian probabilities to evaluate Bayesianism). Relying on a different methodology has its own problem: it makes Bayesianism self-effacing (a theory that tells you to use another theory), which is often viewed as a problem in philosophical methodology.

Conciliationism:

As stated before, the deference principle says that Bayesians should take the beliefs of rational experts (or epistemic peers) as evidence. Relatedly, Robert Aumann, a Nobel Prize-winning economist, proved the following: if two agents share a common prior, update by Bayesian conditionalization, and have common knowledge of each other’s posterior beliefs, they cannot agree to disagree. Their posteriors must be equal. For an intuitive explanation of why this makes sense, see my friend’s post here.

Building on these ideas, philosophers discuss a view called Conciliationism: the idea that one should revise one’s beliefs in the face of epistemic disagreement. However, there are a few issues with this approach:

Self Undermining Problems:

Consider how to figure out whom to conciliate toward (in other words, who the experts are) on some object-level belief. It seems one must also conciliate toward experts on that question (a meta-conciliation). That meta-conciliation applies again at every meta level (you need to conciliate about whom to conciliate toward when deciding whom to conciliate toward, and so on). At every level, however, this could drastically change whom you should trust, and therefore what you should think about the object-level question, with no sign of convergence. If this problem holds, maintaining consistent Bayesian beliefs in the face of epistemic disagreement looks very difficult.

Similarly, there is a self-undermining problem in light of disagreements about how one should appeal to experts: how much one should weigh oneself versus others, the scope of conciliationism, and so on. While this seems less catastrophic, it still runs into problems of circularity and infinite regress.

Information Cascade:

Another potential issue is the problem of information cascades.

Imagine you are appealing to the beliefs of some experts, say economists, about whether some policy, say school choice, should be implemented. It seems you should take a weighted view of what the experts say and form your belief around that. However, the experts’ beliefs should already be based on the weighted beliefs of their peers, so you may end up double counting evidence: you updated on some piece of evidence, and then updated again on someone else updating on that same piece of evidence. This is bad because it gives some pieces of evidence more weight than they warrant. This creates an information cascade.

While there are some potential solutions here (according to Aumann’s Agreement Theorem, ideal Bayesian agents should know what evidence the expert is updating on before updating their own beliefs), in practice this might be quite difficult. This, some would argue, is a problem with the Bayesian model, since it requires a form of conciliation.
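Here is a minimal sketch of the double-counting worry. It works in log-odds, where independent pieces of evidence simply add. The prior of 0.5 and the likelihood ratio of 3 for a single study are hypothetical numbers chosen for illustration, and nothing in the argument depends on them.

```python
import math


def to_log_odds(p: float) -> float:
    return math.log(p / (1 - p))


def to_prob(log_odds: float) -> float:
    return 1 / (1 + math.exp(-log_odds))


def update(prior: float, likelihood_ratios: list[float]) -> float:
    """Bayesian update that treats every likelihood ratio as independent evidence."""
    log_odds = to_log_odds(prior)
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return to_prob(log_odds)


prior = 0.5     # hypothetical prior that the policy is good
study_lr = 3.0  # hypothetical likelihood ratio of one published study

# You and the expert both rely on the same study, so it should count once.
counted_once = update(prior, [study_lr])
# Cascade: you update on the study, then again on the expert's opinion,
# which is itself just a report of that same study.
counted_twice = update(prior, [study_lr, study_lr])

print(f"counted once:  {counted_once:.2f}")   # 0.75
print(f"counted twice: {counted_twice:.2f}")  # 0.90
```

The remedy mentioned above amounts to knowing that the expert's opinion rests on the same study, so the second `study_lr` never enters the list. In practice, that bookkeeping is exactly the hard part.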

The Problem of Priors:

In order for someone to update their credences in some proposition, they need to start with a prior probability of some proposition being true. There is general disagreement and potential arbitrariness about how to establish a prior, especially in cases without evidence. While Objective Bayesians hold that there are constraints on priors (some common examples are the principle of indifference, maximum entropy principle, and well defined reference classes) and that there is a unique credence one can rationally hold related to the objective chances of some event, Subjective Bayesians are going to hold that there are few (if any) constraints on priors (aside from the probability axioms). This problem is known as the problem of priors.

This debate has massive effects on how strong Bayesianism is. If the Subjective Bayesians are right (and you don’t have to accept the Principal Principle, for example), one can rationally assign a credence of .3 (30% probability) to a coin landing heads, even with no reason to think the coin is weighted. To many, this seems completely irrational. If the Objective Bayesians are right, all rational agents given the same evidence must converge even on propositions that seem extremely qualitative and subjective, as in cases of sparse data (for example, given five-minute interviews with many candidates for a job, two Bayesians would need to give the same credence to each candidate being the best one).

There’s also a more specific problem here known as the problem of the initial prior. This is the idea that you have to start from some prior at some point with no evidence whatsoever. These seem entirely arbitrary, and one may argue that rational agents can pick whatever they want (as long as it adheres to the simple Bayesian constraints). While this is less of a problem as evidence is collected (bayesians with different priors will mostly converge after enough new evidence), this remains a problem for cases with sparse or no evidence in any direction.
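A quick illustration of priors washing out uses the simplest textbook setup, which is an assumption and not something the article specifies. Two agents estimate a coin’s bias with Beta priors, one starting near 0.1 and one near 0.9. Both see the same hypothetical data, in which exactly half the flips land heads. With a Beta(α, β) prior, the posterior mean after h heads in n flips is (α + h) / (α + β + n).

```python
def posterior_mean(alpha: float, beta: float, heads: int, flips: int) -> float:
    """Mean of the Beta(alpha + heads, beta + tails) posterior for a coin's bias."""
    return (alpha + heads) / (alpha + beta + flips)


pessimist = (1, 9)  # Beta prior with mean 0.1
optimist = (9, 1)   # Beta prior with mean 0.9

for flips in (0, 10, 100, 1000, 10000):
    heads = flips // 2  # hypothetical data: exactly half the flips land heads
    p = posterior_mean(*pessimist, heads, flips)
    o = posterior_mean(*optimist, heads, flips)
    print(f"{flips:>6} flips: pessimist {p:.4f}, optimist {o:.4f}, gap {o - p:.4f}")
```

In this setup the gap works out to 8 / (10 + n). It is large after ten flips and tiny after ten thousand. With zero flips it is the full 0.8, which is exactly the sparse-evidence case the paragraph worries about.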

Problems in Decision Theory

Many argue that Bayesian Epistemology is largely important for its influence on decision making and decision theory. Given that, one critique that one could make against Bayesianism is that decision theory has problems of its own. There are three major problems to point out:

Newcomb-like Problems:

People in decision theory are still trying to figure out the correct decision procedure (Evidential Decision Theory, Causal Decision Theory, or something else) in Newcomb-like and Smoking-Lesion-like cases.

Newcomb’s Problem:

There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. The player knows the following:
Box A is transparent and always contains a visible $1,000.
Box B is opaque, and its content has already been set by the predictor:
If the predictor has predicted that the player will take both boxes A and B, then box B contains nothing.
If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
The player does not know what the predictor predicted or what box B contains while making the choice.

According to Evidential Decision Theory, the correct answer is to take one box — taking one box is strong evidence that the predictor predicted that you only take box B.

According to Causal Decision Theory, the correct answer is to take both boxes. The predictor has already made its prediction, and there is nothing you can do to change it (barring backwards causation). Since the contents are already fixed, you should choose the dominant strategy: taking both boxes always gets you $1,000 more than taking only box B.
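Here is a minimal sketch of why the two theories disagree. It assumes a predictor that is right 99% of the time, which is a hypothetical figure, since the problem only says the predictor is reliable. EDT lets your choice shift the probability that box B is full. CDT holds that probability fixed at whatever credence `p_full` you already had.

```python
ACCURACY = 0.99  # hypothetical predictor reliability
BOX_A, BOX_B_FULL = 1_000, 1_000_000


def edt_value(one_box: bool) -> float:
    """EDT: your choice is evidence about what the predictor predicted."""
    p_full = ACCURACY if one_box else 1 - ACCURACY
    return p_full * BOX_B_FULL + (0 if one_box else BOX_A)


def cdt_value(one_box: bool, p_full: float) -> float:
    """CDT: box B's contents are already fixed, so p_full ignores your choice."""
    return p_full * BOX_B_FULL + (0 if one_box else BOX_A)


print("EDT one box vs two boxes:", edt_value(True), edt_value(False))  # one-boxing wins
for p_full in (0.0, 0.5, 1.0):
    # Whatever you believe about box B, two-boxing adds exactly $1,000.
    print("CDT one box vs two boxes:", cdt_value(True, p_full), cdt_value(False, p_full))
```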

The Smoking Lesion Problem:

In the world of the Smoking Lesion, smoking is correlated with cancer but does not cause cancer. Instead, there is a genetic lesion which, if present, increases a person’s chance of smoking and their chance of developing cancer. The Lesion is either already present or not present. You do not know if it is present in you. The question is this, if you like to smoke (but strongly dislike having cancer), should you smoke?
The desired answer in this situation seems to be that you should smoke – doing so does not increase your chances of developing cancer and it gives you pleasure. What do causal and evidential decision theories do here?
Remember what Evidential Decision Theory looks for – it asks whether a decision would act as evidence for a possible outcome. So smoking would be evidence that you were more likely to have the genetic lesion. Thus, evidential decision theory would advise that you don’t smoke. Causal Decision Theory, meanwhile, would say that smoking has no causal effect on cancer and so you should feel free to smoke.
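The same contrast can be made numerical. In this minimal sketch, every probability and utility is a hypothetical number chosen for illustration. The lesion raises the chance of both smoking and cancer, and smoking itself has no causal effect on cancer. EDT conditions the lesion probability on your choice via Bayes’ rule. CDT leaves it at the prior.

```python
# All probabilities and utilities are hypothetical, chosen only for illustration.
P_LESION = 0.2                       # prior chance you have the lesion
P_SMOKE = {True: 0.8, False: 0.2}    # P(smoking | lesion present?)
P_CANCER = {True: 0.8, False: 0.05}  # P(cancer | lesion present?); smoking plays no causal role
PLEASURE, CANCER = 10, -1000         # utility of smoking, utility of getting cancer


def p_lesion_given(smoke: bool) -> float:
    """Bayes' rule: what your choice to smoke (or not) suggests about the lesion."""
    like_lesion = P_SMOKE[True] if smoke else 1 - P_SMOKE[True]
    like_none = P_SMOKE[False] if smoke else 1 - P_SMOKE[False]
    joint = like_lesion * P_LESION
    return joint / (joint + like_none * (1 - P_LESION))


def value(smoke: bool, p_lesion: float) -> float:
    p_cancer = p_lesion * P_CANCER[True] + (1 - p_lesion) * P_CANCER[False]
    return (PLEASURE if smoke else 0) + p_cancer * CANCER


def edt(smoke: bool) -> float:
    return value(smoke, p_lesion_given(smoke))  # the choice counts as evidence


def cdt(smoke: bool) -> float:
    return value(smoke, P_LESION)  # the choice cannot change the lesion


print("EDT recommends smoking:", edt(True) > edt(False))  # False
print("CDT recommends smoking:", cdt(True) > cdt(False))  # True
```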

These cases generally reflect the unintuitive conclusions of these decision theoretic principles, leading many to believe that we do not have the proper decision procedure yet. If Bayesianism is important for decision making, but we don’t have a standard and proper decision making procedure for Bayesianism to use in every case, it seems like that should be counted as an argument against Bayesianism.

Issues in Expected Utility Theory:

Expected Utility Theory is a way many philosophers, economists, and others think about making well-informed decisions about which actions to take under uncertainty. The idea is that, for each action, you multiply the probability of each possible outcome by the value of that outcome and add up the results. You should then choose the action with the highest expected utility. Unfortunately, while this decision procedure is quite robust in the vast majority of cases, many argue that it goes terribly wrong in the following two types of scenarios:

Pascal’s Mugging:

A Pascal’s mugging is a type of scenario in which some action involves high uncertainty, a low probability of payoff, and a high expected value. Here, it seems like using expected value leads to terrible decisions. Take for instance the following example (adapted from Nick Bostrom):

Imagine you’re walking down an alleyway, and a strange-looking man walks up to you.

Stranger: Give me your wallet! Give me your wallet or I’ll shoo—wait. Fuck! *mumbles to himself for a whole minute.* Hi, my name is Oz, and I’m a wizard. Can I please have your wallet in return for a million utilons tomorrow.

You: Lol, what?? What’s a utilon?

Stranger: It’s a hypothetical unit of goodness that does not face diminishing returns and has no bound. One utilon (util for short) is the equivalent of some muzak and potatoes, but 10 utils is the equivalent of an extremely happy day. Because I’m a wizard, I can give you utils tomorrow (tomorrow because there’s a lot of bureaucracy slowing down utility wire transfers where I’m from), but I will only do so in exchange for your wallet, as human money is very valuable to me.

You: Are you literally insane? You want me to believe that you, the guy who just walked up to me and thought he had a gun, are actually a wizard and will give me fake goodness tokens tomorrow because you like my money? I’m not dumb; I’m literally a Bayesian expected value maximizer.

Stranger: Oh, yeah? What’s your credence in me being a wizard? It better not be zero — me being a wizard is a contingent proposition, and a good Bayesian would accept the Regularity principle.

You: Of course I accept the Regularity Principle; I don’t wanna get Dutch booked or anything like that. My credence is quite low because you’re pretty obviously not a wizard: .0000000000000001 to be exact.

Stranger: How lucky you are! I am going to give you 9999999999999999999 utils, which, when multiplied by your credence, gives you an expected value of about 1000 utils. Hand over the wallet please.

You: Wait what? This feels super weird. I don’t want to give it to you.

Stranger: Are you an expected value maximizer, or not? Do you follow the math or just follow your intuition?

You: You’re right. I need to maximize expected utility. Here’s the wallet.

Stranger: *Takes it.* Thanks man — have a good day!

You: Wait, do you need my name or number to transfer the utils?

Stranger: LOL, no. I totally lied to you; I’m obviously not a wizard. Thanks again for the money, though. *Runs away.*
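The stranger’s arithmetic checks out, and that is the whole problem. This minimal sketch uses exact fractions and the credence and promise from the dialogue. The 5-util value of the wallet is a hypothetical figure added for illustration.

```python
from fractions import Fraction

credence = Fraction(1, 10**16)        # your credence that the stranger is a wizard
promised = 9_999_999_999_999_999_999  # utils the stranger promises
wallet_value = 5                      # hypothetical: utils lost by handing over the wallet

expected_gain = credence * promised   # just under 1000 utils
print(float(expected_gain))
print("Naive EV says hand it over:", expected_gain > wallet_value)

# However small your credence, the mugger only has to promise more than this:
break_even_promise = wallet_value / credence
print(break_even_promise)
```

Lowering the credence does not help. For any nonzero credence, a bigger promise restores the mugger’s advantage.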

Infinite Utility:

Imagine you are given the option to open one of two doors:

One: If you open this door, you get a 99.99999% chance of infinite utility bliss for eternity.

Two: If you open this door, you get a .0000001% chance of infinite utility bliss for eternity.

The answer is pretty obvious: you should open door number one because you’d rather have a high chance of infinite value than a very low chance.

However, because multiplying either probability by infinity gives the same value (infinity), a pure expected utility maximizer should be indifferent between these options, and would die if he were a mule (or something like that).
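Floating-point arithmetic follows the same rule the paragraph describes, so Python makes a compact illustration. The two doors’ chances are written here as fractions rather than percentages.

```python
import math

p_door_one = 0.9999999    # 99.99999%
p_door_two = 0.000000001  # 0.0000001%

ev_one = p_door_one * math.inf
ev_two = p_door_two * math.inf

print(ev_one, ev_two)    # inf inf
print(ev_one == ev_two)  # True: expected utility cannot tell the doors apart
print(ev_one > ev_two)   # False
```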

The Rationality of Risk Aversion:

Imagine you are given the option to take the following gamble:

Several studies (including some with real money payoffs) have shown that given this bet most people choose Gamble 1A.

However, these studies also show that, given the choice above, people tend to choose Gamble 2B.

Maurice Allais, an economist, famously showed that choosing 1A and 2B together is inconsistent with expected utility maximization. To be consistent, a person should choose either 1A and 2A or 1B and 2B.
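Here is a minimal sketch of why the 1A/2B pattern is inconsistent for any utility function. It assumes the classic payoffs usually used to present the paradox, in millions of dollars, which may differ from the exact figures in the original gambles. Gamble 1A pays $1M for certain. Gamble 1B pays $1M with 89%, $5M with 10%, and nothing with 1%. Gamble 2A pays $1M with 11% and nothing otherwise. Gamble 2B pays $5M with 10% and nothing otherwise. The utility functions tried below are arbitrary examples.

```python
import math

# Assumed classic Allais payoffs, in millions of dollars: outcome -> probability.
GAMBLES = {
    "1A": {1: 1.00},
    "1B": {0: 0.01, 1: 0.89, 5: 0.10},
    "2A": {0: 0.89, 1: 0.11},
    "2B": {0: 0.90, 5: 0.10},
}


def expected_utility(gamble: dict[int, float], u) -> float:
    return sum(p * u(x) for x, p in gamble.items())


utilities = {
    "linear": lambda x: x,
    "square root": math.sqrt,
    "log(1 + x)": math.log1p,
    "very concave": lambda x: 1 - math.exp(-5 * x),
}

for name, u in utilities.items():
    d1 = expected_utility(GAMBLES["1A"], u) - expected_utility(GAMBLES["1B"], u)
    d2 = expected_utility(GAMBLES["2A"], u) - expected_utility(GAMBLES["2B"], u)
    # Both differences equal 0.11*u(1) - 0.01*u(0) - 0.10*u(5), so they always
    # share a sign: preferring 1A forces preferring 2A, and 1B forces 2B.
    print(f"{name:>12}: EU(1A)-EU(1B) = {d1:+.4f}, EU(2A)-EU(2B) = {d2:+.4f}")
```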

This is known as the Allais Paradox — the idea that people often don’t make decisions that maximize expected utility. Because of this case and others like it, philosophers and economists are still debating over whether risk aversion, as defined previously, can be rational or not (as it seems to be reasonable to be inconsistent in these cases) and if there are any limits to how risk averse a rational agent can be.

Some, like Lara Buchak, argue for Risk-Weighted Expected Utility, under which you can rationally choose both 1A and 2B. Others argue that risk aversion is entirely irrational. If Bayesian epistemology is important because of decision theory, and decision theory has unsolved problems, that’s going to be a problem for Bayesian epistemology, at least in terms of limiting its useful application.
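Buchak’s proposal works roughly as follows. Rank the outcomes from worst to best. Start from the worst outcome’s utility, then add each improvement, weighted by a risk function r applied to the probability of getting at least that much. Ordinary expected utility is the special case r(p) = p. The sketch below assumes the classic Allais payoffs, linear utility in money, and a hypothetical r(p) = p³. These choices are illustrative and not Buchak’s own numbers, but they show an agent who consistently prefers both 1A and 2B.

```python
# Assumed classic Allais payoffs, in millions of dollars: outcome -> probability.
GAMBLES = {
    "1A": {1: 1.00},
    "1B": {0: 0.01, 1: 0.89, 5: 0.10},
    "2A": {0: 0.89, 1: 0.11},
    "2B": {0: 0.90, 5: 0.10},
}


def reu(gamble: dict[int, float], u, r) -> float:
    """Risk-weighted expected utility: the worst outcome's utility, plus each step up
    in utility weighted by r(probability of getting at least that outcome)."""
    outcomes = sorted(gamble)
    total = u(outcomes[0])
    for i in range(1, len(outcomes)):
        p_at_least = sum(gamble[x] for x in outcomes[i:])
        total += r(p_at_least) * (u(outcomes[i]) - u(outcomes[i - 1]))
    return total


def linear_u(x: float) -> float:
    return x


def risk_averse_r(p: float) -> float:
    return p ** 3  # hypothetical convex risk function: small chances count for less


for pair in (("1A", "1B"), ("2A", "2B")):
    scores = {g: reu(GAMBLES[g], linear_u, risk_averse_r) for g in pair}
    print(scores, "-> choose", max(scores, key=scores.get))
```

With r(p) = p, the same function reduces to ordinary expected utility and picks 1B and 2B, which matches the consistent pattern described above.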

The Superbaby Problem:

This is more commonly known as the problem of idealized agents, but Superbaby, a term coined by David Lewis, is obviously a cooler name, so we’ll stick with that.

The reason I call this the Superbaby Problem is because all of the following conditions would have to hold as soon as some agent is a newborn baby:

All Possible Evidence:

For conditionalization to work, Bayesianism requires every rational agent to already know the probability of each proposition given every possible sequence of evidence. This is a problem because there are potentially infinitely many such sequences. Knowing all of them would require brains to do something that is not computable given size and energy constraints (unless Penrose is right, a brain cannot do an infinite amount of processing).

If it doesn’t work like this, then when some unexpected evidence arrives, it will be much harder to estimate the credence you would have assigned conditional on that evidence. One of the best parts about conditionalization is that you had to commit to your conditional credences beforehand, and so could not fall prey to motivated reasoning.
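This sketch makes the scale concrete under a deliberately modest assumption. The agent tracks only n yes/no propositions but wants a credence ready for every combination of them, which is what having conditional credences for every possible evidence history amounts to. A full joint distribution over n binary propositions needs 2^n − 1 numbers.

```python
def joint_credences_needed(n_propositions: int) -> int:
    """Free parameters in a full joint credence over n yes/no propositions.
    Every conditional credence P(H | E1, ..., Ek) can be read off this joint."""
    return 2 ** n_propositions - 1


for n in (10, 30, 100, 300):
    count = joint_credences_needed(n)
    print(f"{n:>3} propositions -> {count:.3e} numbers to store")
```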

Knowing All Math:

Among the probability axioms that Bayesians must accept, one must assign a probability of 0 to every contradiction and 1 to every tautology. While this may seem simple (e.g. P(Q&~Q) = 0), it can be extremely difficult when dealing with complicated mathematical statements that haven’t been proved yet (e.g. the Riemann Hypothesis, P vs NP, and others). You would also need a credence of 1 (100% probability) in every mathematical truth. For example, if I asked you what the 20,000th digit of pi is, you would need to instantaneously tell me with 100% certainty and accuracy.
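To see what “knowing all math” asks for, here is Jeremy Gibbons’ unbounded spigot algorithm for the digits of pi, written in Python. This procedure fixes the digit in any position, so the probability axioms demand credence 1 in the right answer. Actually finding that answer, however, means running the computation. The function names are my own, chosen for illustration.

```python
from itertools import islice


def pi_digits():
    """Gibbons' unbounded spigot: yields the decimal digits of pi (3, 1, 4, 1, 5, ...)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain; emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet; absorb another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def decimal_digit(position: int) -> int:
    """The digit `position` places after the decimal point (position >= 1)."""
    return next(islice(pi_digits(), position, None))


print([decimal_digit(i) for i in range(1, 6)])
```

`decimal_digit(20_000)` would return the digit from the paragraph’s example, but only after the spigot has ground through every earlier digit with ever-larger integers. The answer is fully determined, yet far from instantaneous.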

Complete Preferences:

In order for Bayesian decision theory to work, an agent must have complete preferences (as in, they have an answer for what they would do in every hypothetical trade-off situation), even in situations where it seems that a rational agent could be uncertain (for example, deciding between career paths with very different but incomparable benefits, deciding if you want to order Chinese or Mexican food, or deciding whether to push the button in the trolley problem). Having complete preferences doesn’t seem essential to rational decision making, so we may conclude that Bayesianism is asking for too much of agents.

Idealized Memory:

Because of the way that Bayesians conditionalize, a Bayesian’s beliefs must reflect all the evidence they have ever received. This, however, means that a Bayesian can never forget anything, since every experience one has had can bear on some hypothetical future belief, and one must conditionalize on all of those experiences. Asking for a great memory not only seems somewhat irrelevant to being a rational agent (it’s like asking a rational agent to have perfect eyesight), but also seems like asking for way too much.

Problems With Idealized Agents:

Saying that one rationally ought to be some way seems to imply that they can be that way. However, human beings can’t do these things even hypothetically (their brains have limited computing power, for instance). Therefore, these cannot be constraints on rational agents.

It is also strange to suggest that, in order to be rational, you must have attributes that are beyond your control (great math skills or a perfect memory). This is like suggesting that in order for you to be rational you must have a great sense of smell, which doesn’t seem related to the concept of rationality.

While one could attempt to reconcile this by using something like a Brier score to measure how close people come to being Bayesians, there are many issues with this approach as well (including the examples raised in responses to the accuracy argument).
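For readers unfamiliar with it, the Brier score is a standard accuracy measure. It is the mean squared gap between probability forecasts and outcomes, where an outcome is 1 if the proposition turned out true and 0 otherwise. Lower is better. The forecasts and outcomes below are hypothetical. Grading non-ideal agents this way is the proposal the paragraph describes, not an established fix.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared gap between probability forecasts and outcomes (1 = true, 0 = false).
    Lower is better; 0 is a perfect score."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 0, 1, 1]                # hypothetical: which propositions turned out true
calibrated = [0.9, 0.2, 0.8, 0.7]      # hedged, fairly accurate credences
overconfident = [1.0, 1.0, 0.0, 1.0]   # all-or-nothing credences

print(f"calibrated:    {brier_score(calibrated, outcomes):.3f}")
print(f"overconfident: {brier_score(overconfident, outcomes):.3f}")
```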

Frequentist Concerns:

In statistics, there is a competing method for thinking about both interpretations of probability and assigning probabilities: it’s called frequentism.

Frequentism defines the probability of an event as the long run relative frequency of that event occurring, calculated as the number of times the event has occurred divided by the total number of opportunities for it to occur, as the number of trials approaches infinity. For example, the probability that a fair coin lands heads is 50% because if you flipped the coin infinitely many times, the relative frequency of heads would converge to 50%.

Unlike Bayesianism, frequentism leaves no room for subjective probabilities based on prior information, nor does it use Bayes’ rule to update beliefs in light of new evidence. Frequentist probabilities are seen as objective features of the world, determined by the inherent properties of the system, even though they may be unobservable given a finite number of trials.
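A simulation can only ever run finitely many trials, which is exactly why the frequentist definition is stated as a limit. Here is a small Python sketch that assumes a simulated fair coin driven by Python’s pseudo-random generator. It is seeded so that reruns match.

```python
import random


def relative_frequency(n_flips: int, p_heads: float = 0.5, seed: int = 0) -> float:
    """Fraction of heads in n_flips simulated flips of a coin with bias p_heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips


for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} flips: {relative_frequency(n):.4f}")
```

The printed fractions wander around 0.5 and tend to settle closer to it as n grows, but no finite run ever reaches the limit itself.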

Practical Concerns:

Moral encroachment:

Moral encroachment is the idea that morals should sometimes have influences on our beliefs. For example, Sarah Moss, in her paper entitled Moral Encroachment, gives the following case:

Suppose that a pedestrian sees a pit bull in front of her, and she crosses the street in order to avoid coming into close proximity with it. She forms an opinion about the pit bull on the basis of her knowledge of general statistics about pit bulls, including their disproportionate representation among dogs that harm pedestrians. If someone challenges her opinion by saying ‘You don’t know that pit bull’, the pedestrian could truly respond by saying:
(1) I know it is more likely to bite me than any dog across the street.
By contrast, consider an example of racial profiling. A pedestrian sees Beaty in front of her, and she crosses the street in order to avoid coming into close proximity with him. She forms an opinion about Beaty on the basis of her knowledge of general statistics about race and crimes of robbery. Beaty challenges her opinion by saying ‘You don’t know me’. By contrast with (1), the following response sounds false:
(2) I know you are more likely to steal my purse than anyone across the street.
The contrast between (1) and (2) reflects an epistemic difference. The pedestrian uses statistical evidence to form an opinion about a pit bull and also to form an opinion about a person. The former opinion is knowledge and the latter is not. The moral encroachment thesis accounts for this contrast, by allowing that the moral status of a profiled object can make a difference to the epistemic features of opinions that are formed by profiling it.

Bayesian Epistemology has no way to let morality rationally influence its decision-making process, but that seems necessary in some cases, like the one Sarah Moss describes. If this is true, Bayesianism doesn’t fully encompass what it means to be a good epistemic agent.

Instrumental Beliefs:

Not all true beliefs are equal. Some are more costly to hold than others, and some have very little influence on how we should live our lives. Bayesianism doesn’t tell us anything about which beliefs we should focus on. In fact, Bayesianism’s focus on accuracy may lead us astray.

Normally, having true beliefs is helpful. What we are doing in conversation is trying to get evidence to improve our beliefs about propositions. I don’t think this is always the case, however. The concept of rational irrationality describes cases in which being epistemically irrational is instrumentally rational. Take, for instance, someone deciding which political beliefs to vote on; let’s call this person Alice. If Alice believes that her vote doesn’t matter, it doesn’t make sense for her to spend much time on questions about politics. Suppose Alice’s community is very right wing and she would be treated worse for holding left-wing politics. Given that she thinks her political beliefs won’t influence the outcome of the election, it is more “rational” for her to hold right-wing beliefs, whether or not they are better. (This sense of rational is instrumental rationality, as opposed to epistemic rationality.)
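Alice’s reasoning can be written as a back-of-the-envelope expected value comparison. In this minimal sketch, every number is a hypothetical stand-in. The example also assumes that accurate political beliefs pay off only through a decisive vote, which is a simplification and not a general truth.

```python
# All numbers are hypothetical, chosen only to illustrate the structure of the argument.
P_DECISIVE = 1e-7       # chance that Alice's single vote decides the election
POLICY_STAKES = 10_000  # how much better (to Alice) the better side winning would be
SOCIAL_COST = 50        # cost to Alice of holding unpopular views in her community


def value_of_accurate_politics(p_decisive: float, stakes: float) -> float:
    """In this toy model, accurate beliefs pay off only through a decisive vote."""
    return p_decisive * stakes


benefit = value_of_accurate_politics(P_DECISIVE, POLICY_STAKES)
print(f"expected benefit of accuracy: {benefit:.4f}, social cost: {SOCIAL_COST}")
print("instrumentally better to conform:", benefit < SOCIAL_COST)
```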

Descriptively, the pressure to conform and avoid social isolation may make it harder to accept left-wing beliefs (for example, through motivated reasoning). However, it may not make sense to be perfectly rational in this debate (looking at arguments on both sides, thinking very critically about your beliefs, etc.) if beliefs have more value than just epistemic rationality. Given that Bayesianism cannot account for this, many would argue that it doesn’t fully capture what it means to be a rational agent.

Blue Bus Paradox:

From this paper:

A bus causes harm.
In the first scenario, an eyewitness recognizes the bus as belonging to the Blue Bus Company. The witness is imperfectly reliable; let us say that she is roughly 70 percent reliable in matters such as this one. The law has no qualms about accepting the eyewitness testimony as evidence and indeed basing a positive finding that the bus was a Blue Bus bus (and perhaps also that the Blue Bus Company is liable) on the testimony.
In the second scenario, there is no eyewitness, but we have uncontested data regarding the distribution of buses in the relevant area; in particular, the Blue Bus Company owns roughly 70 percent of the buses there. Here, though, the law will typically not be willing to base a positive finding of fact—and certainly not liability—on just this kind of evidence, sometimes called statistical evidence.

While the paper offers some justifications for why these cases are practically different (it argues for sensitivity requirements), one may argue that, even though the two cases are statistically the same, they involve different sorts of evidence that require different treatment. If so, Bayesianism seems to be missing something when it comes to distinguishing statistical evidence from other types of evidence in practice.
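Here is a minimal Bayesian sketch of why the two scenarios look statistically the same. It makes two assumptions of my own, not the paper’s. First, the witness’s 70% reliability holds whatever colour the bus actually was. Second, the eyewitness case starts from an uninformative 50/50 prior.

```python
def posterior_blue(prior_blue: float, reliability: float) -> float:
    """P(bus was Blue | witness says Blue), assuming the witness is right with
    probability `reliability` whatever colour the bus actually was."""
    says_blue_if_blue = reliability
    says_blue_if_not = 1 - reliability
    joint = says_blue_if_blue * prior_blue
    return joint / (joint + says_blue_if_not * (1 - prior_blue))


# Scenario 1: eyewitness testimony on top of an assumed 50/50 prior.
eyewitness = posterior_blue(prior_blue=0.5, reliability=0.7)
# Scenario 2: no witness; the 70% share of local buses is the only evidence.
statistical = 0.7

print(f"eyewitness: {eyewitness:.2f}, statistical: {statistical:.2f}")  # both 0.70
```

Bayesian conditionalization sees only these numbers. Any difference the law draws between the two cases has to come from somewhere outside this calculation.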

Extreme Epistemic Humility:

Many forms of Bayesianism seem to imply the deference principle, which states that you should defer to rational experts on beliefs. Given that you think a lot of people are much smarter or more rational than you on many issues, it seems that, even when you are quite confident about your expertise in some area, the vast majority of your belief should come from the “outside view.”

If taken to its logical conclusion, this seems extremely unintuitive and impractical. It really seems like one shouldn’t rely almost entirely on that year’s PhilPapers survey every time one wants to figure out which philosophical belief to rationally hold, but under many seemingly plausible forms of Bayesianism, experts’ beliefs should hold the vast majority of epistemic weight. On the impracticality dimension, it seems really difficult to actually set your beliefs in accordance with the experts’ beliefs. Given our bias toward holding our own beliefs in higher regard, Bayesianism seems to be asking for way too much.
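One simple way to model deference is a linear opinion pool, which takes a weighted average of credences. This minimal sketch assumes equal weights for you and each of 100 hypothetical experts. Real deference models may weight people very differently, so treat the numbers as illustrative.

```python
def linear_pool(credences: list[float], weights: list[float]) -> float:
    """Weighted average of credences (a linear opinion pool)."""
    return sum(c * w for c, w in zip(credences, weights)) / sum(weights)


my_credence = 0.9                 # hypothetical: you are fairly confident
expert_credences = [0.3] * 100    # hypothetical: 100 experts who each say 0.3
credences = [my_credence] + expert_credences
weights = [1.0] * len(credences)  # assumption: you count as just one more peer

pooled = linear_pool(credences, weights)
my_share = weights[0] / sum(weights)
print(f"pooled credence: {pooled:.3f}")
print(f"weight given to your own view: {my_share:.1%}")
```

Even though you start at 0.9, your own view carries under 1% of the weight, and the pooled credence lands close to the experts’ 0.3.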

Lastly, this convergence of belief would have negative effects on science and entrepreneurship:

Imagine a world where all the rational scientists converged on one theory because it was most probable given all the evidence — I think this would be awful. Disagreement and arguments persisting are usually what drives scientific progress and technological innovation.

Imagine an entrepreneur who thinks probabilistically about his chances of success and updates on others telling him that he can’t succeed. Given how few entrepreneurs become successful, it seems very possible that this would dramatically decrease the number of entrepreneurs. This would be terrible from a societal perspective, as a significant amount of innovation comes from entrepreneurship.

The World is Too Messy:

The world is often too messy to even try to be a bayesian agent. We often don’t have enough time to think through beliefs this precisely, there are too many factors other than pure epistemic belief in developing beliefs, and most decisions and beliefs are too qualitative to be simply reduced to probabilities. While one can argue that rational agents should attempt to follow the probability axioms even though they won’t be amazing, in practice, this can often be seen as people doing exactly what normal people do with different names, maybe trying to sound smarter (instead of qualifying one’s belief by saying “I think”, an attempted bayesian puts a probability, and instead of talking about changing their mind or their bias, they talk about conditionalizing using Bayes’ theorem and priors).

As always, tell me why I’m wrong!

