# Chapter 7 Responding to Evidential Decision Theory

## 7.1 So Why Ain’t You Rich?

There is a familiar complaint against causal decision theory that goes back to the modern origins of decision theory in the 1970s. Here is a recent version of it due to Ahmed and Price (2012). While their version is primarily directed against proceduralist forms of causal decision theory, this particular objection does not turn on the proceduralism. If the objection works, it also works against my defensivist version of causal decision theory. (I’ve slightly changed some of the wording, but otherwise this argument is quoted from page 16 of their paper.)

1. In Newcomb problems, the average returns to one-boxing exceed that to two-boxing.
2. Everyone can see that (1) is true.
3. Therefore one-boxing foreseeably does better than two-boxing. (by 1, 2)
4. Therefore Causal Decision Theory (CDT) is committed to the foreseeably worse option for anyone facing Newcomb’s problem.

Here’s what they, and many other proponents of Evidential Decision Theory (EDT), say follows from 4.

The point of the argument is that if everyone knows that the CDT-irrational strategy will in fact do better on average than the CDT-rational strategy, then it’s rational to play the CDT-irrational strategy.

This is what Lewis (1981b) called the “Why Ain’cha Rich” argument, and what following Bales (2018) I’ll call the WAR argument. I’m going to argue the last step of the WAR argument doesn’t follow. Or, at the very least, that proponents of EDT cannot coherently say that it follows. For there are several cases where EDT foreseeably does worse than CDT. This section will go over three of them.

### 7.1.1 Example One - Split Newcomb

This game takes place over three stages.

1. At stage one, the human player chooses Play or Exit. If they choose Exit, the player gets 5 and Demon gets 1. If they choose Play, we move on to stage two.
2. At stage two, Demon chooses Left or Right, and this choice is announced.
3. At stage three, Demon and the player simultaneously choose either Up or Down. Demon is very good at predicting what the player’s choice will be, and indeed was already very good at predicting it at stage two. Demon wants to use these predictive powers to get as high a payoff as possible, and all of this is common knowledge.

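If Demon chose Left at stage two, stage three involves the game in table 7.1. The rows are the player’s choice, the columns are Demon’s prediction (PU for predicting Up, PD for predicting Down), and the player’s payoff is listed first in each cell.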

Table 7.1: The left hand side of Split Newcomb

|   | PU | PD |
|---|----|----|
| U | $2, 1$ | $4, 0$ |
| D | $1, 0$ | $3, 3$ |

But if Demon chose Right at stage two, stage three involves the game in table 7.2.

Table 7.2: The right hand side of Split Newcomb

|   | PU | PD |
|---|----|----|
| U | $12, 4$ | $14, 0$ |
| D | $11, 0$ | $13, 2$ |

If you’d prefer it as a game tree, it is presented in figure 7.1.

We start at the hollow node in the middle, and Chooser (here denoted as ‘C’) either goes up by Playing, or goes down by Exiting. Then Demon moves Left or Right. Then Demon moves again, making a prediction. But this second move isn’t revealed to Chooser, which is why on either side Chooser’s nodes are in an information set. That’s to say, Chooser chooses Up or Down knowing whether Demon has gone Left or Right, but not knowing whether Demon has predicted Up or Down. And then we get the payoffs.

Whether Demon goes Left or Right, the CDTer will choose Up, and the EDTer will choose Down. Either choice Chooser faces is a fairly straightforward Newcomb Problem. In both sub-games Up causally dominates Down, but Down will get a higher return if you assume, as we did assume, that demon mostly makes correct predictions.

So at stage two, Demon will know that if the person facing them is an EDTer, they will get a return of 3 from Left and 2 from Right. (They’ll end up in the Down-Down cell either way.) So they will rationally choose Left. On the other hand, if the person facing them is a CDTer, they will get a return of 1 from Left and 4 from Right. (They’ll end up in the Up-Up cell either way.) So they will rationally choose Right. And everything in this paragraph can be deduced by a rational player at stage 1.

So at stage one, a CDTer will know that if they Play, they expect to get 12 (the game will go Right then Up-Up), and if they Exit, they know they’ll get 5. So they’ll Play. But an EDTer will know that if they Play, they expect to get 3 (the game will go Left then Down-Down), and if they Exit, they know they’ll get 5. So they’ll Exit.

The result of all this is that the CDTer will get 12, and the EDTer will get 5. So the CDTer will predictably do better than the EDTer. Indeed, the EDTer will voluntarily choose at stage one to take a lower payout than the CDTer ends up with. This seems bad for EDT, at least if we think that predictably ending up with a lower outcome is bad.
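The backward induction in the last few paragraphs can be checked mechanically. What follows is only a sketch under simplifying assumptions: Demon’s predictions are treated as perfectly accurate, CDT is modelled crudely as “play a strictly dominant move”, and EDT as “compare the cells where the prediction is correct”. The function names (`stage_three_move`, `solve`) are my own labels for illustration, not anything from the literature.

```python
# Cells map (Chooser's move, Demon's prediction) -> (Chooser payoff, Demon payoff),
# copied from tables 7.1 and 7.2.
LEFT = {("U", "U"): (2, 1), ("U", "D"): (4, 0),
        ("D", "U"): (1, 0), ("D", "D"): (3, 3)}
RIGHT = {("U", "U"): (12, 4), ("U", "D"): (14, 0),
         ("D", "U"): (11, 0), ("D", "D"): (13, 2)}
EXIT_PAYOFFS = (5, 1)  # (Chooser, Demon) if Chooser Exits at stage one


def stage_three_move(theory, table):
    """Chooser's stage-three move under a deliberately crude model of each theory."""
    if theory == "EDT":
        # Assumption: treat the prediction as accurate, so compare diagonal cells.
        return max("UD", key=lambda move: table[(move, move)][0])
    # CDT: hold the prediction fixed and look for a strictly dominant move.
    for move, other in (("U", "D"), ("D", "U")):
        if all(table[(move, p)][0] > table[(other, p)][0] for p in "UD"):
            return move
    raise ValueError("no dominant move; this sketch only covers dominance cases")


def solve(theory):
    """Backward induction: stage three, then Demon's stage two, then stage one."""
    outcomes = {}
    for side, table in (("Left", LEFT), ("Right", RIGHT)):
        move = stage_three_move(theory, table)
        outcomes[side] = table[(move, move)]  # Demon predicts the move correctly
    side = max(outcomes, key=lambda s: outcomes[s][1])  # Demon maximises own payoff
    play_value = outcomes[side][0]
    if play_value > EXIT_PAYOFFS[0]:
        return ("Play", side, play_value)
    return ("Exit", None, EXIT_PAYOFFS[0])


print(solve("CDT"))
print(solve("EDT"))
```

Nothing in the sketch treats the two theories differently except the stage-three rule. The difference in final payoffs comes entirely from how Demon’s best reply at stage two depends on that rule.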

Now you might object that this is because at stage two the demon chooses to treat the EDTer differently to how they treat the CDTer. I don’t really agree for two reasons, though I’m not sure either of these reasons work. (Hence the second and third examples that are about to come.) One is that the demon isn’t trying to harm the EDTer; they are just trying to maximise their return. It so happens that EDT is such an impermissive theory that it doesn’t allow for any flexibility, and the Demon, knowing this, is forced to take choices that are bad for EDT, and indeed worse for Demon than if they ended up at Right-Up-Up. But this isn’t Demon’s fault; it’s the fault of EDT being so impermissive. The other reason is that Demon does not in fact make any choices that hurt the EDTer. The EDTer should expect that Demon will in fact make such choices, in response to their theory, but that’s not quite the same thing. The only player who moves at all in the EDT version of the game is Chooser. So it’s a little hard to say this is just a case where the EDTer is harmed by the demon’s malicious choices.

I think those responses work, but I’m not completely sure that they do. So let’s look at a different example, one where Demon doesn’t have these variable payouts.

### 7.1.2 Example Two - Coins and Signals

This example is a version of a signaling game of the kind introduced by Lewis (1969). And in particular it’s a version of the broadly adversarial kinds of signaling games that are central to the plot of Cho and Kreps (1987), and which we discussed a lot in chapter (coherent). Again, it will involve three stages.

At the first stage a fair coin is flipped, and the result shown to Chooser, but not to Demon.

At the second stage, Chooser will choose Up or Down, and the choice will be publicly announced.

At the third stage, Demon will try to guess what the coin showed. Demon knows the payoff table I’m about to show you, and is arbitrarily good at predicting Chooser’s strategy. That is, Demon can make accurate predictions of the form “If Heads, Chooser will make this choice, and if Tails, they will make that choice.”

The payoffs to each player are a function of what happens at each of the three steps, and are given by the following table.

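Table 7.3: Payoffs for the coins and signals game.

| Coin | Chooser | Demon | Chooser Payoff | Demon Payoff |
|------|---------|-------|----------------|--------------|
| H | U | H | 40 | 1 |
| H | U | T | 400 | 0 |
| H | D | H | 0 | 1 |
| H | D | T | 0 | 0 |
| T | U | H | 40 | 0 |
| T | U | T | 28 | 1 |
| T | D | H | 28 | 0 |
| T | D | T | 36 | 1 |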

Figure 7.2 shows the game they are playing in tree form.

Demon’s payoffs are just as you’d expect - they get rewarded iff they figure out how the coin landed. Chooser’s payoffs are more complicated, but the big thing to note is they get the biggest rewards if they manage to play Up while Demon makes an incorrect prediction.

One last thing to stipulate about Demon before we analyse the game. If Demon predicts Chooser will do one thing if Heads and another if Tails, they will use the information from Chooser’s choice to make their guess about how the coin landed. But if they predict Chooser will say the same thing whether the coin landed Heads or Tails, they won’t know how the coin landed, and will flip their own coin to make a guess. So in that case it will be 50/50 whether Demon says Heads or Tails.

Onto the analysis. It should be fairly clear that if the coin lands Heads, the human should say Up. The worst possible return from Up is 40, the best possible return from Down is 0. So that’s what both a CDTer and an EDTer would do, and hence what Demon would predict that they will do.

So what happens if the coin lands Tails? Given Demon will predict Up if Heads, we can work out the value of Up and Down if Tails to the EDTer. If they play Up, Demon will predict that, and hence Demon will flip a coin to choose Heads or Tails. So they have a 50/50 shot at getting either 40 or 28, and so their expected return is 34. If they play Down, Demon will predict that, and hence Demon will say Tails, and they will get a return of 36. Since 36 > 34, they will play Down if Tails.

That’s the unique solution to the game for the EDTer. They play Up if Heads, Down if Tails. Demon can figure out that they’ll do this, so will correctly guess what the coin showed. And they will get 40 if the coin landed Heads, and 36 if it landed Tails, for an expected return of 38.
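The EDTer’s arithmetic can be reproduced in a few lines. This is a sketch, assuming Demon predicts Chooser’s whole strategy correctly and guesses exactly as stipulated above: read the coin off the announced move when the strategy separates Heads from Tails, flip a coin otherwise. The helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# (coin, Chooser's move, Demon's guess) -> Chooser's payoff, copied from table 7.3.
PAYOFF = {
    ("H", "U", "H"): 40, ("H", "U", "T"): 400,
    ("H", "D", "H"): 0,  ("H", "D", "T"): 0,
    ("T", "U", "H"): 40, ("T", "U", "T"): 28,
    ("T", "D", "H"): 28, ("T", "D", "T"): 36,
}


def guess_distribution(strategy, move):
    """Demon's guess after seeing `move`, given a correct prediction of `strategy`."""
    if strategy["H"] != strategy["T"]:
        # Separating strategy: the announced move reveals the coin.
        coin = "H" if strategy["H"] == move else "T"
        return {coin: Fraction(1)}
    # Pooling strategy: the move is uninformative, so Demon flips a coin.
    return {"H": HALF, "T": HALF}


def value(coin, strategy):
    """Chooser's expected payoff after `coin`, when Demon has predicted `strategy`."""
    move = strategy[coin]
    dist = guess_distribution(strategy, move)
    return sum(p * PAYOFF[(coin, move, guess)] for guess, p in dist.items())


def expected_return(strategy):
    """Expected payoff before the fair coin is flipped."""
    return sum(HALF * value(coin, strategy) for coin in "HT")


# EDT evaluates each option after Tails as if Demon had predicted that very option.
up_if_tails = value("T", {"H": "U", "T": "U"})
down_if_tails = value("T", {"H": "U", "T": "D"})
print(up_if_tails, down_if_tails, expected_return({"H": "U", "T": "D"}))
```

The key modelling choice is in the last three lines: when the EDTer evaluates Down after Tails, the predicted strategy passed to `value` changes along with the act.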

What should the CDTer do? And, in particular, what should a causal defensivist do? Well, it turns out this is another problem where the theory is not decisive. Doing exactly what the EDTer does is defensible. But it’s also defensible to say Up no matter what. Let’s go over why this is defensible. The question is whether Chooser can endorse their decision to play Up no matter what after each possible result of the coin toss. They can clearly endorse it if the coin lands Heads; in that case Up strictly dominates Down, and strictly dominant options are always defensible. What if the coin lands Tails? Well, they think they’ll play Up. So they think the demon will flip a coin to guess in this situation. So they think the expected return of Up is 34 (like the EDTer thinks), and the expected return of Down is 32. The key difference here is that when working out the expected return of a non-chosen option, the Chooser who believes in causal defensivism does not change the expected behavior of the demon, while the EDTer does. (This disposition is why dominance reasoning works for them.) So Chooser will think that even if the coin lands Tails, they would do worse on average if they switched to playing Down if Tails. So it follows that they can defensibly play Up either way.

And if they do play this, the rewards are handsome. The demon won’t have any information about the coin, so the demon will flip their own coin. So lines 1, 2, 5 and 6 of the table are all equally likely to appear. So if Chooser plays this strategy, they are equally likely to get a return of 40, 400, 40 or 28, for an overall expected return of 127. And this is much higher than the 38 the EDTer is expected to receive. By changing the payout on line 2 of the table, we can make the gap in expected returns be arbitrarily large.
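Here is a sketch of the defensivist’s calculation, for readers who want to check the numbers. The assumption doing the work is that Demon’s behaviour is held fixed at a 50/50 guess when Chooser evaluates the option they are not taking. The function names and the `line_two` parameter (the payoff on line 2 of table 7.3) are my own illustrative choices.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def payoff_table(line_two=400):
    """Chooser's payoffs from table 7.3, with the line 2 payoff left adjustable."""
    return {
        ("H", "U", "H"): 40, ("H", "U", "T"): line_two,
        ("H", "D", "H"): 0,  ("H", "D", "T"): 0,
        ("T", "U", "H"): 40, ("T", "U", "T"): 28,
        ("T", "D", "H"): 28, ("T", "D", "T"): 36,
    }


def value_against_coin_flip(coin, move, table):
    """Expected payoff of `move` when Demon's guess is held fixed at a 50/50 flip."""
    return sum(HALF * table[(coin, move, guess)] for guess in "HT")


def always_up_is_defensible(table):
    """Up is endorsable after each coin result if it does at least as well as Down."""
    return all(
        value_against_coin_flip(coin, "U", table)
        >= value_against_coin_flip(coin, "D", table)
        for coin in "HT"
    )


def always_up_return(table):
    """Expected payoff, before the coin flip, of playing Up no matter what."""
    return sum(HALF * value_against_coin_flip(coin, "U", table) for coin in "HT")


table = payoff_table()
print(value_against_coin_flip("T", "U", table))  # Up after Tails
print(value_against_coin_flip("T", "D", table))  # Down after Tails
print(always_up_is_defensible(table), always_up_return(table))
print(always_up_return(payoff_table(line_two=4000)))
```

Raising `line_two` raises the return to playing Up no matter what without changing anything in the EDTer’s reasoning, which is why the gap between the two can be made as large as you like.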

Now you might object that while the causal defensivist can do better, it doesn’t follow that EDT is wrong. After all, we’ve just said here that a rival theory may do better. I don’t think that matters much. The point of the WAR is to refute a theory, and if the EDTer does foreseeably worse than one kind of Chooser who follows causal defensivism, that should be enough to refute them. But just in case you think this objection is stronger, we’ll include one last example.

### 7.1.3 Example Three - Coins and Newcomb

This is just like Example Two, with one twist. If the game goes Tails, Down, Tails, then we don’t immediately end the game and make payouts to the players. Instead we play another game, with a familiar structure, and the payouts shown in table 7.4. As always, Demon is really good at predicting Chooser’s play, and Chooser’s payouts are listed first in every cell. (I’m not going to include a tree here, because it is more confusing than helpful.)

Table 7.4: The payouts after tails–down–tails in coins and signals.

|   | PU | PD |
|---|----|----|
| U | $20, 1$ | $40, 0$ |
| D | $16, 0$ | $36, 1$ |

The EDTer will think they’ll get 36 from this game, so the example will be just like Example Two. And the EDTer will play Up if Heads, Down if Tails, for an expected return of 38.

But if Chooser follows causal defensivism, then they will think that if the game gets to this stage, they’ll get 20. So now they think that in the original game, Up dominates Down no matter whether the coin lands Heads or Tails. So now they will definitely play Up no matter what, and get an expected return of 127.
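The dominance claim can be verified by substituting each theory’s valuation of the table 7.4 game into the Tails, Down, Tails cell of table 7.3. As before, this is a sketch under the assumption that Demon’s prediction in the embedded game is accurate. The names `subgame_value`, `main_game` and `up_dominates` are hypothetical.

```python
# Chooser's payoffs in table 7.4: (Chooser's move, Demon's prediction) -> payoff.
SUBGAME = {("U", "U"): 20, ("U", "D"): 40, ("D", "U"): 16, ("D", "D"): 36}


def subgame_value(theory):
    """What Chooser expects from the table 7.4 game under each theory."""
    if theory == "EDT":
        # Compare the cells where the prediction is correct; Down-Down wins.
        return max(SUBGAME[(move, move)] for move in "UD")
    # Causal reasoning: Up strictly dominates, so Up is played and predicted.
    assert all(SUBGAME[("U", p)] > SUBGAME[("D", p)] for p in "UD")
    return SUBGAME[("U", "U")]


def main_game(tails_down_tails):
    """Table 7.3 payoffs for Chooser, with the Tails, Down, Tails cell replaced."""
    return {
        ("H", "U", "H"): 40, ("H", "U", "T"): 400,
        ("H", "D", "H"): 0,  ("H", "D", "T"): 0,
        ("T", "U", "H"): 40, ("T", "U", "T"): 28,
        ("T", "D", "H"): 28, ("T", "D", "T"): tails_down_tails,
    }


def up_dominates(table):
    """True if Up beats Down for every coin result and every guess by Demon."""
    return all(
        table[(coin, "U", guess)] > table[(coin, "D", guess)]
        for coin in "HT"
        for guess in "HT"
    )


for theory in ("EDT", "CDT"):
    v = subgame_value(theory)
    print(theory, v, up_dominates(main_game(v)))
```

With the EDTer’s valuation of 36 the game is unchanged from Example Two, so Up does not dominate. With the causal valuation of 20 it does.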

### 7.1.4 Why The Examples Matter

I’ve argued against EDT elsewhere in the book, but note that this section is very much not an argument against EDT; instead, it’s part of a war on WAR. The point of the first example is that any theory whatsoever is subject to a WAR argument. That’s because for any theory whatsoever, you can construct pairs of choices like Left and Right, where the theory says to take choices that lead Demon to preferring to go Left. So for any theory whatsoever, or at least any theory that is consequentialist in the sense popularised by Hammond (1988), there is an example where the theory leads to worse returns. So any consequentialist theory is subject to an objection by WAR. It’s the paradigm of an over-generating objection.

There is perhaps something a bit interesting about the second example, though it isn’t a problem especially for EDT. What makes the second example work is that Chooser is in a situation that rewards unpredictability, but EDT is decisive, and hence predictable. And any decisive theory will be subject to a WAR-style objection from cases like this. Now this isn’t part of my argument for indecisiveness, since I think WAR-style objections are bad. But following the recipe from subsection 7.1.2, it is possible to construct a case where any decisive theory will lead to predictably bad outcomes. It’s always incoherent to endorse the combination of WAR reasoning and decisive decision theories. Even though I reject both of these, I think it’s pretty clear in this case that WAR reasoning should be the first to go. And with it goes not EDT, but one of the historically key arguments for EDT. Let’s turn to a more contemporary argument for EDT next.

## 7.2 To Bet the Impossible Bet

The Ahmed cases all go away if you focus on what’s possible at the end of deliberation, not at the start of it.

This might require some contextualism about ability, à la Hawthorne and Pettit, to really make stick.

See page 63 of Joyce book for the premise that counterfactuals about the bet must be specified in order for us to have a real bet.