9  Evidence

9.1 A Puzzle About Evidence

Think back to the red-blue game from section 2.1, and consider a variant on it with the following two characteristics.1

  • 1 The first five sections of this chapter are based on my (2018).

    • The red sentence is that two plus two equals four.
    • The blue sentence is something that, if known, would be part of the agent’s evidence.

    I’m going to argue that there are cases where the only rational play is Red-True, but the blue sentence is something that we would ordinarily say is part of the subject’s evidence. And I’ll argue that this is a problem for the theory I have described so far. It is not a problem that shows that anything I’ve said so far is untrue. But it does suggest that what I’ve said so far is incomplete, and in a key respect unexplanatory.

    I have tried so far to argue that belief, rational belief, and knowledge are all interest-relative. And I have tried to tell a story about when they are interest-relative. In the case of knowledge, the story is reasonably simple. One loses knowledge that p when the situation changes in such a way that one is no longer entitled to take p as given in deliberation. But what does it mean to say that one is entitled to take something as given? I haven’t given anything like a full theory of this, but the suggestion has been to interpret this on broadly evidentialist lines. For example, one is not entitled to take p as given if the optimal choice, given one’s evidence, is different unconditionally to what it is conditional on p.

    That story doesn’t explain when practical considerations might affect what evidence one has. Indeed, it can’t explain anything about evidence, since it takes evidence as a given. So if the arguments for the interest-relativity of knowledge can be repurposed to show that evidence too is interest-relative, we have a problem. Since I think they can be repurposed in just this way, my project has a problem. The aim of this chapter is to set out just what the problem is, and to suggest a solution to it. I’ll start by arguing that evidence is interest-relative, then come back to what the problem is, and how I’ll aim to solve it.

    The main example I’ll work through is the example of Parveen from subsection 2.3.4. Recall that she’s in a restaurant and notices an old friend, Rahul, across the restaurant. The conditions for detecting people aren’t perfect, and she’s surprised Rahul is here. But still we’d ordinarily say it is part of her evidence that Rahul is in this restaurant. She doesn’t infer this from other facts, and she would not be called on to defend it if she relies on it in ordinary circumstances. She then plays the red-blue game, with these sentences.

    • The red sentence is Two plus two equals four.
    • The blue sentence is Rahul is in this restaurant.

    And the intuitions that raise problems for my view are:

    • The unique rational play for Parveen is Red-True; and
    • If evidence is interest-invariant, then the evidential probability that Rahul is in the restaurant is the same as the evidential probability that two plus two is four.

    Now these intuitions are not inconsistent if evidence is interest-relative. And the point of this chapter will be to investigate, and ultimately endorse, this possibility. But I haven’t told you a story about how evidence can be interest-relative. I haven’t even started such a story. All the stories I’ve told you so far about interest-relativity have presupposed that the relevant evidence can be identified, and then we ask what the evidence warrants as circumstances change. That model is by its nature incapable of saying anything about when interests, or practical situations, affect evidence. The model isn’t wrong - but it is in a crucial respect incomplete. On the one hand, all models are incomplete. On the other hand, it would be odd to have the model’s explanatory ambitions stop somewhere between Anisa’s case and Parveen’s. That’s the kind of explanatory failure that makes one wonder whether one has got the original cases right.

    There are two ways out of this problem that I don’t want to take, but are notable enough that I want to set them aside explicitly.

    One is to say that propositions like Rahul is in this restaurant are never part of Parveen’s evidence. Perhaps her evidence just consists of things like I am being appeared to Rahul-like. Such an approach is problematic for two reasons. The first is that it is subject to all the usual objections to psychological theories of evidence (Williamson 2007). The second is that we can re-run the argument with the blue sentence being some claim about Parveen’s psychological state, and still get the result that the only rational play is Red-True. A retreat to a psychological conception of evidence will only help with this problem if agents are infallible judges of their own psychological states, and that is not in general true (Schwitzgebel 2008).

    Another option is to deny that any explanation is needed here. Perhaps pragmatic effects, like the particular sentences that are chosen for this instance of the red-blue game, mean that Parveen’s evidence no longer includes facts about Rahul, and this is a basic epistemic fact without explanation. Now we shouldn’t assume that everything relevant to epistemology will have an epistemic explanation. Facts about the way that proteins work in the brain do not have explanations within epistemology, although they are vitally important for there even being a subject matter of epistemology. So in principle there could be facts around here that ground epistemic explanations without having explanations within epistemology. But in practice things look less rosy. Without an explanation of why Parveen loses evidence, we don’t have a theory that makes predictions about how interests affect knowledge. And we don’t have a satisfying explanation of why playing Blue-True is irrational for Parveen. And we are forced, as already noted, to draw an implausible distinction between Anisa and Parveen.

    We shouldn’t be content with simply saying Parveen loses evidence when playing the red-blue game. We should say why this is so. The aim of the rest of this chapter is to tell a story that meets this explanatory desideratum.

    9.2 A Simple, but Incomplete, Solution

    Let’s take a step back and look at the puzzle more abstractly. We have a person who has some option o, and it really matters whether or not the expected value of o, i.e., v(o), is at least x. (I am assuming that Parveen is in the business of maximising expected utility here; I don’t think the considerations from chapter 6 that tell against expected utility maximisation in some situations are relevant.) It is uncontroversial that her evidence includes some background K, and controversial whether it includes some contested proposition p. It is also uncontroversial that v(o | p) ≥ x, and we’re assuming that for any proposition q that is in her evidence, v(o | q) = v(o). That is, we’re assuming the relevant values are conditional on evidence. We can capture that last assumption with one big assumption that probably isn’t true, but is a harmless idealisation for the purposes of this chapter. Say there is a prior value function v-, with a similar metaphysical status to the mythical, mystical prior probability function. Then for any choice c, v(c) = v-(c | E), where E is the evidence Parveen has.

    Now I can offer a simple, but incomplete, solution. Let p be the proposition that she might or might not know, and the question of whether v(o) ≥ x be the only salient question to which p is relevant. Then she knows p only if [v-(o | K) + v-(o | K ∧ p)]/2 ≥ x. That is, we work out the value of o with and without the evidence p, and if the average is at least x, good enough!

    That solves the problem of Parveen and Rahul. Parveen’s evidence may or may not include that Rahul is in the restaurant. If it does, then Blue-True has a value of $50. If it does not, then Blue-True’s value is somewhat lower. Even if the evidence includes that someone who looks a lot like Rahul is in the restaurant, the value of Blue-True might only be $45. Averaging them out, the value is less than $50. But she’d only play Blue-True if it was worthwhile to play it instead of Red-True, which is worth $50. So she shouldn’t play Blue-True.
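
    Putting the illustrative numbers from this paragraph into the formula above:

    \[
    \frac{v^{-}(\text{Blue-True} \mid K \wedge p) + v^{-}(\text{Blue-True} \mid K)}{2} = \frac{50 + 45}{2} = 47.5 < 50
    \]

    So on the proposed test, the averaged value of Blue-True falls short of the guaranteed $50 from Red-True.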

    Great! Well, great except for two monumental problems.

    The first problem is that what I’ve said here really only helps with very simple cases, where there is a single decision problem that a single contested proposition is relevant to. There has to be some way to generalise the case to less constrained situations.

    The second (and bigger) problem is that the solution is completely ad hoc. Why should the arithmetic mean of these two things have any philosophical significance? Why not the mean of two other things? Why not some other function, like the geometric mean of them? This looks like a formula plucked out of the air, and there are literally infinitely many other formulae that would do just as well by the one criterion I’ve laid down so far: Parveen must play Red-True.

    Pragmatic encroachment starts with a very elegant, very intuitive, principle: you only know the things you can reasonably take to be settled for the purposes of current deliberation. It does not look like any such elegant, intuitive, principle will lead to some theorem about averaging out the value of an option with and without new evidence.

    Happily, the two problems have a common solution. But the solution requires a detour into some technical work concerning coordination games.

    9.3 The Radical Interpreter

    Many philosophical problems can be usefully thought of as games, and hence studied using game theoretic techniques.2 This is an especially useful move when the problems involve interactions of rational agents. Here, for example, is the game table for Newcomb’s problem, with the human who is usually the focus of the problem as Row, and the demon as Column.

  • 2 The idea of writing Newcomb’s problem as this kind of game is due to William Harper (1986).

  • Table 9.1: Newcomb’s Problem as a game.
    Predict 1 Box Predict 2 Boxes
    Choose 1 Box 1000, 1 0, 0
    Choose 2 Boxes 1001, 0 1, 1

    This game has a unique Nash equilibrium; the bottom right corner. And that’s one way of motivating the view that (a) the game is possible, and (b) the rational move for the human is to choose two boxes.

    Let’s look at a more complicated game. I’ll call it The Interpretation Game. The game has two players. Just like in Newcomb’s problem, one of them is a human, the other is a philosophical invention. In this case the invention is not a demon, but The Radical Interpreter. To know the payouts for the players, we need to know their value function. More colloquially, we need to know their goals.

    • The Radical Interpreter assigns mental states to Human in such a way as to predict Human’s actions given Human rationality. We’ll assume here that evidence is a mental state, so saying what evidence Human has is among Radical Interpreter’s tasks. (Indeed, in the game play to come, it will be their primary task.)
    • Human acts so as to maximise the expected utility of their action, conditional on the evidence that they have. Human doesn’t always know what evidence they have; it depends on what The Radical Interpreter says.

    The result is that the game is a coordination game. The Radical Interpreter wants to assign evidence in a way that predicts rational Human action, and Human wants to do what’s rational given that assignment of evidence. Coordination games typically have multiple equilibria, and this one is no exception.

    Let’s make all that (marginally) more concrete. Human is offered a bet on p. If the bet wins, it wins 1 util; if the bet loses, it loses 100 utils. Human’s only choice is to Take or Decline the bet. The proposition p, the subject of the bet, is like the claim that Rahul is in the restaurant. It is something that is arguably part of Human’s evidence. Unfortunately, it is also arguable that it is not part of Human’s evidence. We will let K be the rest of Human’s evidence (apart from p, and things entailed by K together with p), and stipulate that Pr(p | K) = 0.9. Each party now faces a choice.

    • The Radical Interpreter has to choose whether p is part of Human’s evidence or not.
    • Human has to decide whether to Take or Decline the bet.

    The Radical Interpreter achieves their goal if this biconditional is true: Human takes the bet iff p is part of their evidence. If p is part of the evidence, then The Radical Interpreter thinks that the bet has positive expected utility, so Human will take it. And if p is not part of the evidence, then The Radical Interpreter thinks that the bet has negative expected utility, so Human will decline it. Either way, The Radical Interpreter wants Human’s action to coordinate with theirs. And Human wants to maximise expected utility. So we get the following table for the game.

    Table 9.2: The Radical Interpreter game.
    p ∈ E p ∉ E
    Take the Bet 1, 1 -9.1, 0
    Decline the Bet 0, 0 0, 1

    We have, in effect, already covered The Radical Interpreter’s payouts. They win in the top-left and lower-right quadrants, and lose otherwise. Human’s payouts are only a little trickier. In the bottom row, they are guaranteed 0, since the bet is declined. In the top-left, the bet is a sure winner; their evidence entails it wins. So they get a payout of 1. In the top-right, the bet wins with probability 0.9, so the expected return of taking it is 1 × 0.9 - 100 × 0.1 = -9.1.
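
    As a sanity check on those numbers, here is a minimal sketch in Python (the chapter itself contains no code; the payoffs are just those of tables 9.1 and 9.2) that recomputes the risky expected return and enumerates the pure-strategy Nash equilibria of both games by brute force.

    ```python
    # Minimal sketch: enumerate the pure-strategy Nash equilibria of a 2x2 game.
    # Each cell holds a (row payoff, column payoff) pair.

    def pure_nash(payoffs):
        """Return the cells (r, c) from which neither player gains by deviating."""
        equilibria = []
        for r in range(2):
            for c in range(2):
                row_pay, col_pay = payoffs[r][c]
                best_row = all(row_pay >= payoffs[other][c][0] for other in range(2))
                best_col = all(col_pay >= payoffs[r][other][1] for other in range(2))
                if best_row and best_col:
                    equilibria.append((r, c))
        return equilibria

    # Table 9.1: rows are Choose 1 Box / Choose 2 Boxes; columns are Predict 1 / Predict 2.
    newcomb = [[(1000, 1), (0, 0)],
               [(1001, 0), (1, 1)]]

    # Table 9.2: rows are Take / Decline; columns are p in E / p not in E.
    # The risky cell is the expected return of taking the bet when p is not evidence.
    risky = 1 * 0.9 - 100 * 0.1   # -9.1
    interpreter = [[(1, 1), (risky, 0)],
                   [(0, 0), (0, 1)]]

    print(pure_nash(newcomb))      # [(1, 1)]: only Choose 2 Boxes / Predict 2 Boxes
    print(pure_nash(interpreter))  # [(0, 0), (1, 1)]: the top-left and bottom-right cells
    ```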

    There are two Nash equilibria for the game - the top left and the bottom right. That there are two equilibria to this game should not come as a surprise. It’s a formal parallel to the fact that the pragmatic encroachment theory I’ve developed so far doesn’t make a firm prediction about this game. It is consistent with the theory developed so far that Human’s evidence includes p, and they should take the bet, or that due to interest-sensitive features of the case, it does not include p, and they should not take the bet. The aim of this chapter is to supplement that theory with one that, at least most of the time, makes a firm pronouncement about what the evidence is.

    But to do that, I need to delve into somewhat more contested areas of game theory. In particular, I need to introduce some work on equilibrium choice. And to do this, it helps to think about a game that is inspired by an example of Rousseau’s.

    9.4 Risk-Dominant Equilibria

    At an almost maximal level of abstraction, a two player, two option each game looks like this.

    Table 9.3: A generic 2 by 2 by 2 game.
    a b
    A r11, c11 r12, c12
    B r21, c21 r22, c22

    We’re going to focus on games that have the following eight properties:

    • r11 > r21
    • r22 > r12
    • c11 > c12
    • c22 > c21
    • r11 > r22
    • c11 ≥ c22
    • r21 + r22 > r11 + r12
    • c12 + c22 ≥ c11 + c21

    The first four clauses say that the game has two (strict) Nash equilibria: Aa and Bb. The fifth and sixth clauses say that the Aa equilibrium is Pareto-optimal: neither player prefers Bb to Aa. In fact they say something a bit stronger: one of the players strictly prefers the Aa equilibrium, and the other player does not prefer Bb. The seventh and eighth clauses say that the Bb equilibrium is risk-optimal.

    I’m going to set out an argument presented by Hans Carlsson and Eric van Damme (1993) for the idea that in these games, rational players will end up at Bb. The game that Human and The Radical Interpreter are playing fits these eight conditions, and The Radical Interpreter is perfectly rational, so this will imply that in that game, The Radical Interpreter will say that p ∉ E, which is what we aimed to show. Now I don’t think their argument works in full generality. In particular, I don’t think it works when it is common knowledge that both players are rational, and both players know precisely the values of each of the eight payoffs. But I think it does work in the special case where one player has imperfect access to what the payouts are. And that, it turns out, is the special case that matters to us. But let’s start with their argument.

    Games satisfying these eight inequalities are sometimes called Stag Hunt games. There is some flexibility, and some vagueness, in which of the eight inequalities need to be strict, but that level of detail isn’t important here. The name comes from a thought experiment in Rousseau’s Discourse on Inequality.

    They were perfect strangers to foresight, and were so far from troubling themselves about the distant future, that they hardly thought of the morrow. If a deer was to be taken, every one saw that, in order to succeed, he must abide faithfully by his post: but if a hare happened to come within the reach of any one of them, it is not to be doubted that he pursued it without scruple, and, having seized his prey, cared very little, if by so doing he caused his companions to miss theirs.  (Rousseau 1913, 209–10)

    It is rather interesting to think through which real-life situations are best modeled as Stag Hunts. Many situations that theorists instinctively treat as Prisoners’ Dilemmas turn out, on closer inspection, to be Stag Hunts (Skyrms 2001). This kind of thought is one way in to appreciating the virtues of Rousseau’s political outlook, and especially the idea that social coordination might not require anything like the heavy regulatory presence that, say, Hobbes thought was needed. But that’s a story for another day. What I’m going to focus on is why Rousseau was right to think that a ‘stranger to foresight’, who is just focussing on this game, should take the rabbit.

    To make matters a little easier, we’ll focus on a very particular instance of Stag Hunt, as shown here. (From here I’m following Carlsson and van Damme very closely; this is their example, with just the labelling slightly altered.)

    Table 9.4: A simple version of Stag Hunt.
    a b
    A 4, 4 0, 3
    B 3, 0 3, 3

    At first glance it might seem like Aa is the right choice; it produces the best outcome. This isn’t like Prisoners’ Dilemma, where the best collective outcome is dominated. In fact Aa is the best outcome for each individual. But it is risky, and Carlsson and van Damme suggest a way to turn that risk into an argument for choosing Bb.
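
    Before following that argument, it is worth confirming that table 9.4 really is an instance of the template from the start of this section. Here is a minimal sketch that plugs its payoffs into the eight inequalities listed above.

    ```python
    # Minimal sketch: check table 9.4 against the eight conditions on 2x2 games above.

    r11, c11 = 4, 4   # payoffs in Aa
    r12, c12 = 0, 3   # payoffs in Ab
    r21, c21 = 3, 0   # payoffs in Ba
    r22, c22 = 3, 3   # payoffs in Bb

    conditions = [
        r11 > r21, r22 > r12, c11 > c12, c22 > c21,    # Aa and Bb are both Nash equilibria
        r11 > r22, c11 >= c22,                         # Aa is the Pareto-optimal equilibrium
        r21 + r22 > r11 + r12, c12 + c22 >= c11 + c21, # Bb is the risk-optimal equilibrium
    ]

    print(all(conditions))  # True: table 9.4 is a Stag Hunt in the sense defined above
    ```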

    Embed this game in what they call a global game. We’ll start the game with each player knowing just that they will play a game with the following payout table, with x to be selected at random from a flat distribution over [-1, 5].

    Table 9.5: The global game.
    a b
    A 4, 4 0, x
    B x, 0 x, x

    Before they play the game, each player will get a noisy signal about the value of x. There will be signals sR and sC chosen (independently) from a flat distribution over [x - 0.25, x + 0.25], and shown to Row and Column respectively. So each player will know the value of x to within ¼, and know that the other player knows it to within ¼ as well. This is a margin of error model, and in those models there is very little that is common knowledge. That, Carlsson and van Damme argue, makes a huge difference.

    In particular, they prove that iterated deletion of strictly dominated strategies (almost) removes all but one strategy pair. (I’ll go over the proof of this in the next subsection.) Each player will play B/b if the signal is greater than 2, and A/a otherwise.3 Surprisingly, this shows that players should play the risk-optimal strategy even when they know the other strategy is Pareto-optimal. When a player gets a signal in (2, 3.75), then they know that x < 4, so Aa is the Pareto-optimal equilibrium. But the logic of the global game suggests the risk-dominant equilibrium, Bb, is what to play.

  • 3 Strictly speaking, we can’t rule out various mixed strategies when the signal is precisely 2, but this makes little difference, since that occurs with probability 0.

  • Carlsson and van Damme go on to show that many of the details of this case don’t matter. As long as (a) there is a margin of error in each side’s estimation of the payoffs, and (b) every choice is a dominant option in some version of the global game, then iterated deletion of strongly dominated strategies will lead to each player making the risk-dominant choice.

    Now what does this show about the game where players know precisely what the value of x is? They argue that it shows that the risk-dominant choice is the right choice there as well. After all, the game where there is perfect knowledge just is a margin of error game, where the margin of error is 0. In previous work I’d endorsed this argument (Weatherson 2018). I now think this was a mistake.4 The limit case, where the players know the value of x, is special. But, I’ll argue, this doesn’t matter for our purposes.

  • 4 It would take us far too far afield to go into the reasons why. The short version is that I’ve been convinced that a version of the argument against single-solution concepts in an early note by David Pearce (1983) is sound, and this rules out typical Stag Hunt games having a unique solution.

  • Note that in any game we’re considering, between Human and The Radical Interpreter, Human won’t know precisely what the payoffs are. To see this, think about the case involving Parveen. Given that Parveen’s evidence is not luminous (Williamson 2000), she won’t know precisely what the expected value is of acting as if it’s part of her evidence that Rahul is in the restaurant. And since the payoffs in a game table are expected payoffs, she won’t know precisely what her payoffs are. So like the player in Carlsson and van Damme’s global game, she won’t know precisely what game she’s playing. And that’s enough for the iterated dominance argument that she should play the risk-dominant equilibrium to go through.

    To be sure, The Radical Interpreter, who is just an idealisation, presumably does know the payouts in the different states of the game. It turns out, as I’ll go over in subsection 9.4.2, that Carlsson and van Damme’s result only needs that one player is uncertain of the payouts. And given that human evidence is not luminous, that will be the case.

    So assuming rationality requires playing strategies that survive iterated deletion of strongly dominated strategies, in games like the one Human and The Radical Interpreter are playing, Human should play the risk-dominant strategy as long as they don’t know precisely what their own evidence is. And since The Radical Interpreter is rational, they too will play the risk-dominant strategy when Human’s evidence isn’t luminous. In the game with Human, that means the rational strategy for The Radical Interpreter is to say p ∉ E. And in the case of Parveen and Rahul, the rational strategy for The Radical Interpreter is to say that it is not part of Parveen’s evidence that Rahul is in the restaurant. This is an interest-relative theory of evidence; had Parveen been playing a different game, The Radical Interpreter would have said that it is part of Parveen’s evidence that Rahul was in the restaurant.

    From this point all the intuitions about the case fall into place. If it is part of Parveen’s evidence that Rahul is in the restaurant, then she knows this. Conversely, if she knows it, then The Radical Interpreter would have said it is part of her evidence, so it is part of her evidence. Parveen will perform the action that maximises expected utility given her evidence. And she will lose knowledge when that disposition makes her do things that would be known to be sub-optimal if she didn’t lose knowledge.

    In short, this model keeps what was good about the pragmatic encroachment theory developed in the previous chapters, while also allowing that evidence can be interest-relative. It does require a considerably more complex theory of rationality than was previously used. Rather than just model rational agents as utility maximisers, they are modelled as playing risk-dominant strategies in coordination games under uncertainty about what the payouts are. Still, it turns out that this is little more than assuming that they maximise evidential expected utility, and they expect others (at least perfectly rational abstract others) to do the same, and they expect those others to expect they will maximise expected utility, and so on.

    The rest of this section goes into more technical detail about Carlsson and van Damme’s example. Readers not interested in these details can skip ahead to the next section. In the first subsection I summarise their argument that we only need iterated deletion of strictly dominated strategies to get the result that rational players will play the risk-dominant strategies. In the second subsection I offer a small generalisation of their argument, showing that it still goes through when one of the players gets a precise signal, and the other gets a noisy signal.

    9.4.1 The Dominance Argument for Risk-Dominant Equilibria

    Two players, Row (or R) and Column (or C) will play the game depicted in table 9.5. They won’t be told what x is, but they will get a noisy signal of x, drawn from an even distribution over [x - 0.25, x + 0.25]. Call these signals sR and sC. Each player must then choose A, getting either 4 or 0 depending on the other player’s choice, or choose B, getting x for sure.

    Before getting the signal, the players must choose a strategy. A strategy is a function from signals to choices. Since the higher the signal is, the better it is to play B, we can equate strategies with ‘tipping points’, where the player plays B if the signal is above the tipping point, and A below the tipping point. Strictly speaking, a tipping point will pick out not a strategy but an equivalence class of strategies, which differ in how they act if the signal is the tipping point. But since that happens with probability 0, the strategies in the equivalence class have the same expected return, and so I won’t distinguish them.

    Also, strictly speaking, there are strategies that are not tipping points, because they map signals onto probabilities of playing A, where the probability decreases as the signal rises. I won’t discuss these directly, but it isn’t too hard to see how these are shown to be suboptimal using the argument that is about to come. It eases exposition to focus on the pure strategies, and to equate these with tipping points. And since my primary aim here is to explain why the result holds, not to simply repeat an already existing proof, I’ll mostly ignore these mixed strategies.

    Call the tipping points for Row and Column respectively TR and TC. Since the game is symmetric, we’ll just have to show that in conditions of common knowledge of rationality, TR = 2. It follows by symmetry that TC = 2 as well. And the only rule that will be used is iterated deletion of strictly dominated strategies. That is, neither player will play a strategy where another strategy does better no matter what the opponent chooses, and they won’t play strategies where another strategy does better provided the other player does not play a dominated strategy, and they won’t play strategies where another strategy does better provided the other player does not play a strategy ruled out by these first two conditions, and so on.

    The return to a strategy is uncertain, even given the other player’s strategy. But given the strategies of each player, each player’s expected return can be computed. And that will be treated as the return to the strategy pair.

    Note first that TR = 4.25 strictly dominates any strategy where TR = y > 4.25. If sR ∈ (4.25, y), then the strategy with tipping point 4.25 plays B, which is guaranteed to return more than 4, while the alternative strategy plays A, which returns at most 4. In all other cases, the strategies have the same return. And there is some chance that sR ∈ (4.25, y). So we can delete all strategies TR = y > 4.25, and similarly all strategies TC = y > 4.25. By similar reasoning, we can rule out TR < -0.25 and TC < -0.25.

    If sR ∈ [-0.75, 4.75], then it is equally likely that x is above sR as it is below it. Indeed, the posterior distribution of x is flat over [sR - 0.25, sR + 0.25]. From this it follows that the expected return of playing B after seeing signal sR is just sR.

    Now comes the important step. For arbitrary y > 2, assume we know that TC ≤ y. Now consider the expected return of playing A given various values for sR > 2. Given that the higher TC is, the higher the expected return of playing A, we’ll just work on the simple case where TC = y, realizing that this gives an upper bound on the expected return of A given TC ≤ y. The expected return of A is 4 times the probability that Column will play a, i.e., 4 times the probability that sC < TC. Given all the symmetries that have been built into the puzzle, we know that the probability that sC < sR is 0.5. So the expected return of playing A is at most 2 if sR ≥ y. But the expected return of playing B is, as we showed in the last paragraph, sR, which is greater than 2. So it is better to play B than A if sR ≥ y. And the difference is substantial, so even if sR is epsilon less than y, it will still be better to play B. (This is rather hand-wavy, but I’ll go over the more rigorous version presently.)

    So for any y > 2, if TC ≤ y we can prove that TR should be lower still, because given that assumption it is better to play B even if the signal is just less than y. Repeating this reasoning over and over again pushes us to it being better to play B than A as long as sR > 2. And the same kind of reasoning from the opposite end pushes us to it being better to play A than B as long as sR < 2. So we get TR = 2, and by symmetry TC = 2, as the uniquely rational solution to the game.

    Let’s make that a touch more rigorous. Assume that TC = y, and sR is slightly less than y. In particular, we’ll assume that z = y - sR is in (0, 0.5). Then the probability that sC < y is 0.5 + 2z - 2z². So the expected return of playing A is 2 + 8z - 8z². And the expected return of playing B is, again, sR. These will be equal when z = (9 - √(145 - 32y))/16. (The working out is a tedious but trivial application of the quadratic formula, plus some rearranging.) So if we know that TC ≤ y, we know that TR ≤ y + (√(145 - 32y) - 9)/16, which will be less than y if y > 2. And then by symmetry, we know that TC must be at most as large as that as well. And then we can use that fact to derive a further upper bound on TR and hence on TC, and so on. And this will continue until we push both down to 2. It does require quite a number of steps of iterated deletion. Here is the upper bound on the threshold after n rounds of deletion of dominated strategies. (These numbers are precise for the first two rounds, then just to three decimal places after that.)

    Table 9.6: How the threshold moves towards 2.
    Round Upper Bound on Threshold
    1 4.250
    2 3.875
    3 3.599
    4 3.378
    5 3.195
    6 3.041
    7 2.910
    8 2.798
    9 2.701
    10 2.617

    That is, TR = 4.25 dominates any strategy with a tipping point above 4.25. And TR = 3.875 dominates any strategy with a higher tipping point than that, assuming TC ≤ 4.25. And TR ≈ 3.599 dominates any strategy with a higher tipping point than that, assuming TC ≤ 3.875. And so on.
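
    A short sketch makes it easy to reproduce those numbers: iterate the bound derived above, starting from the round 1 value of 4.25.

    ```python
    # Minimal sketch: iterate the upper bound on the tipping point, as in table 9.6.
    # Given T_C <= y, playing B beats playing A whenever the signal exceeds
    # y + (sqrt(145 - 32*y) - 9)/16, so that becomes the new upper bound on T_R
    # (and, by symmetry, on T_C).

    from math import sqrt

    bound = 4.25  # round 1: a signal above 4.25 guarantees x > 4, so B strictly dominates
    print(1, round(bound, 3))
    for n in range(2, 11):
        bound += (sqrt(145 - 32 * bound) - 9) / 16
        print(n, round(bound, 3))
    # Prints 4.25, 3.875, 3.599, 3.378, 3.195, 3.041, 2.91, 2.798, 2.701, 2.617,
    # matching table 9.6; continuing the loop pushes the bound down towards 2.
    ```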

    And similar reasoning shows that at each stage not only are all strategies with higher tipping points dominated, but so are strategies that assign positive probability (whether it is 1 or less than 1), to playing A when the signal is above the ‘tipping point’. So this kind of reasoning rules out all mixed strategies (except those that respond probabilistically to sR = 2).

    So it has been shown that iterated deletion of dominated strategies will rule out all strategies except the risk-optimal equilibrium. The possibility that x is greater than the maximal return for A is needed to get the iterated dominance going. And the signal needs to have an error bar to it, so that each round of iteration removes more strategies. But that’s all that was needed; the particular values used are irrelevant to the proof.

    9.4.2 Making One Signal Precise

    The aim of this sub-section is to prove something that Carlsson and van Damme did not prove, namely that the analysis of the previous subsection goes through with very little change if one party gets a perfect signal, while the other gets a noisy signal. So I’m going to discuss the game that is just like the game of the previous subsection, but where it is common knowledge that the signal Column gets, sC, equals x.

    Since the game is no longer symmetric, I can’t just appeal to the symmetry of the game as frequently as in the previous subsection. This slows the proof down, but doesn’t stop it.

    We can actually rule out slightly more at the first step in this game than in the previous game. Since Column could not be wrong about x, Column knows that if sC > 4 then playing b dominates playing a. So one round of deleting dominated strategies rules out TC > 4, as well as ruling out TR > 4.25.

    At any stage, if for y > 2 we know TC ≤ y, then TR = y dominates TR > y. That’s because if sR ≥ y, and TC ≤ y, then the probability that Column will play a (given Row’s signal) is at most 0.5. After all, the signal is just as likely to be above x as below it (as long as the signal isn’t too close to the extremes). So if sR is at or above TC, then it is at least 0.5 likely that sC = x is at or above TC. So the expected return of playing A is at most 2. But the expected return of playing B equals the signal, which is greater than 2. So if Row knows TC ≤ y, for y > 2, Row also knows it is better to play B if sR ≥ y. And that just means that TR ≤ y.

    Assume now that it is common knowledge that TR ≤ y, for some y > 2. And assume that x = sC is just a little less than y. In particular, define z = y - x, and assume z ∈ (0, 0.25). We want to work out the upper bound on the expected return to Column of playing a. (The return of playing b is known; it is x.) This will be highest when TR is highest, so assume TR = y. Then the probability that Row plays A is (1 + 4z)/2. So the expected return of playing a is 2 + 8z, i.e., 2 + 8(y - x). That will be greater than x only when x < (2 + 8y)/9.

    So if it is common knowledge that TR ≤ y, then it is best for Column to play b unless x < (2 + 8y)/9. That is, if it is common knowledge that TR ≤ y, then TC must be at most (2 + 8y)/9.

    We proceed in a zig-zag fashion. At one stage, we show that TR must be as low as TC. At the next, we show that if it has been proven that TR takes a particular value greater than 2, then TC must be lower still. And this process will eventually rule out all values for TR and TC greater than 2.
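
    Here is a minimal numerical sketch of that zig-zag, under the assumptions above (Column sees x exactly; Row’s signal is uniform on [x - 0.25, x + 0.25]). Rather than relying on the closed form just derived, it finds each round’s bound by bisection, and the bounds are again pushed down to 2.

    ```python
    # Minimal sketch: the zig-zag argument when Column sees x exactly and Row sees a
    # noisy signal. Given a shared upper bound y on the tipping points, Column's
    # expected return from a is 4 * P(Row's signal < y), while b returns x for sure.
    # The new bound on T_C is the largest x at which a is still at least as good as b;
    # T_R then inherits that bound, and the process repeats.

    def prob_row_plays_A(x, y):
        """P(signal < y) when Row's signal is uniform on [x - 0.25, x + 0.25]."""
        return min(1.0, max(0.0, (y - (x - 0.25)) / 0.5))

    def new_bound(y):
        """Largest x at which playing a is at least as good as playing b."""
        lo, hi = 0.0, y
        for _ in range(60):          # bisection on a monotone condition
            mid = (lo + hi) / 2
            if 4 * prob_row_plays_A(mid, y) >= mid:
                lo = mid
            else:
                hi = mid
        return lo

    bound = 4.0  # Column knows x, so b dominates a as soon as x > 4
    for _ in range(200):
        bound = new_bound(bound)
    print(round(bound, 4))  # 2.0: the shared bound converges to the risk-dominant threshold
    ```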

    This case is crucial to the story of this chapter because The Radical Interpreter does not have an error bar in their estimation of the game they are playing. But it turns out the argument for risk-dominant equilibria being the unique solution to interpretation games is consistent with that. As long as one player has a margin of error, each player should play the risk-dominant equilibrium.

    9.5 Objections and Replies

    Objection: The formal argument requires that in the ‘global game’ there are values for x that make A the dominant choice. These cases serve as a base step for an inductive argument that follows. But in Parveen’s case, there is no such setting for x, so the inductive argument can’t get going.

    Reply: What matters is that there are values of x such that A is the strictly dominant choice, and Human (or Parveen) doesn’t know that they know that they know, etc., that those values are not actual. And that’s true in our case. For all Human (or Parveen) knows that they know that they know that they know…, the proposition in question is not part of their evidence under a maximally expansive verdict on The Radical Interpreter’s part. So the relevant cases are there in the model, even if both players know that they know that they know … that the models don’t obtain, for a high but finite number of repetitions of ‘that they know’.

    Objection: This model is much more complex than the simple motivation for pragmatic encroachment.

    Reply: Sadly, this is true. I would like to have a simpler model, but I don’t know how to create one. I suspect any such simple model will just be incomplete; it won’t say what Parveen’s evidence is. In this respect, any simple model will look just like applying tools like Nash equilibria to coordination games. So more complexity will be needed, one way or another. I think paying this price in complexity is worth it overall, but I can see how some people might think otherwise.

    Objection: Change the case involving Human so that the bet loses 15 utils if p is false, rather than 100. Now the risk-dominant equilibrium is that Human takes the bet, and The Radical Interpreter says that p is part of Human’s evidence. But note that if it was clearly true that p was not part of Human’s evidence, then this would still be too risky a situation for them to know p. So whether it is possible that p is part of Human’s evidence matters.
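
    (To spell out the arithmetic behind that claim: with the loss reduced from 100 to 15 utils, taking the bet when p ∉ E has expected value

    \[
    1 \times 0.9 - 15 \times 0.1 = -0.6,
    \]

    so the row sums from the risk-dominance conditions in section 9.4 are 1 + (-0.6) = 0.4 for Take and 0 + 0 = 0 for Decline, while the column sums are tied at 1. By those conditions, Take/p ∈ E, rather than Decline/p ∉ E, is now the risk-dominant equilibrium.)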

    Reply: This is all true, and it shows that the view I’m putting forward is incompatible with some programs in epistemology. In particular, it is incompatible with E=K, since what it takes to be evidence on this story is slightly different from what it takes to be knowledge. I will come back to this point in section 9.6.

    Objection: Carlsson and van Damme discuss one kind of global game. But there are other global games that have different equilibria. For instance, changing the method by which the noisy signal is selected would change the equilibrium of the global game. So this kind of argument can’t show that the risk-dominant equilibrium is the one true solution.

    Reply: This is somewhat true. There are other ways of embedding the game involving Human and The Radical Interpreter in global games that lead to different outcomes. They are usually somewhat artificial; e.g., by having the signal be systematically biased in one way. This game is important because it is the game where the error in Human’s knowledge of the payoffs is determined by their actual epistemic limitations. The precise details are ultimately less important to me than whether we can provide a motivated story of how interests affect knowledge and evidence that does not presuppose we know what the agent’s evidence is. And the method I’ve outlined here shows that we can do that, even if we end up tinkering a bit with the details.

    9.6 Evidence, Knowledge and Cut-Elimination

    In the previous section I noted that my theory of evidence is committed to denying Williamson’s E=K thesis. This is the thesis that says one’s evidence is all and only what one knows. What I say is consistent with, and arguably committed to, one half of that thesis. Nothing I’ve said here provides a reason to reject the implication that if p is part of one’s evidence, then one knows p. Indeed, the story I’m telling would have to be complicated even further if that fails. But I am committed to denying the other direction. On my view, there can be cases where someone knows p, but p is not part of their evidence.

    My main reason for this comes from the kind of cases that Shyam Nair (2019) describes as failures of ‘cut-elimination’. I’ll quickly set out what Nair calls cut-elimination, and why it fails, and then look at how it raises problems for E=K.

    Start by assuming that we have an operator ⊨ such that Γ ⊨ A means that A can be rationally inferred from Γ. I’m following Nair (and many others) in using a symbol usually associated with logical entailment here, though this is potentially misleading. A big plotline in what follows will be that ⊨, so understood, behaves very differently from familiar notions of entailment.

    For the purposes of this section, I’m staying somewhat neutral on what it means to be able to rationally infer A from Γ. In particular, I want everything that follows to be consistent with the interpretation that an inference is rational only if it produces knowledge. I don’t think that’s true; I think folks with misleading evidence can rationally form false beliefs, and I think the traveler in Dharmottara’s example rationally believes there is a fire. But there is a dialectical reason for staying neutral here. I’m arguing against one important part of the ‘knowledge first’ program, and I don’t want to do so by assuming the falsity of other parts of it. So for this section (only), I’ll write in a way that is consistent with saying rational belief requires knowledge.

    So one way to interpret Γ ⊨ A is that A can be known on the basis of Γ. What can be known on the basis of what is a function of, among other things, who is doing the knowing, what their background evidence is, what their capacities are, and so on. Strictly speaking, that suggests we should have some subscripts on ⊨ for who is the knower, what their background evidence is, and so on. In the interests of readability, I’m going to leave all those implicit. In the next section it will be important to come back and look at whether the force of some of these arguments is diminished if we are careful about this relativisation.

    So that’s our important notation. The principle Cut that Nair focuses on is that if 1 and 2 are true, so is 3.

    1. Γ ⊨ A
    2. {A} ∪ Δ ⊨ B
    3. Γ ∪ Δ ⊨ B

    The principle is intuitive. Indeed, it is often implicit in a lot of reasoning. Here is one instance of it in action.

    I heard from a friend that Jack went up the hill. This friend is trustworthy, so I’m happy to infer that Jack did indeed go up the hill. I heard from another friend that Jack and Jill did the same thing. This friend is also trustworthy, so I’m happy to infer that Jill did the same thing as Jack, i.e., go up the hill.

    Normally we wouldn’t spell out the ‘happy to infer’ steps, but I’ve included them in here to make the reasoning a bit more explicit. But note what I didn’t need to make explicit, even in this laborious reconstruction: that Jack went up the hill goes from being a conclusion of the first little argument to a premise in a later argument. The later argument says that we can conclude from the fact that Jack went up the hill, and that a friend said Jack and Jill did the same thing, that Jill went up the hill. And what matters for our purposes is that there doesn’t seem to be a gap between the rationality of inferring that Jack went up the hill, and the rationality of using that as a premise in later reasoning. The idea that there is no gap here just is the idea that the principle Cut is true.

    But while Cut seems intuitive in cases like this, Nair argues that it can’t be right in general. (And so we have a duty, one he takes up, to explain why cases like Jack and Jill seem like good reasoning.) For my purposes, it is helpful to divide the putative counterexamples to Cut into two categories. I’ll call them monotonic and non-monotonic counterexamples. The categorisation turns on whether Γ ∪ Δ ⊨ A is true assuming that Γ ⊨ A is true. I’ll call cases where it is true monotonic instances of Cut, and cases where it is false non-monotonic instances.

    That Cut fails in non-monotonic cases is fairly obvious. We can see this with an example that was hackneyed a generation ago.

    Γ = {Tweety is a bird}
    Δ = {Tweety is a penguin}
    A = B = Tweety can fly

    From Tweety is a bird we can rationally infer that Tweety flies. And given that Tweety is a flying penguin, we can infer that she flies. But given that Tweety is a penguin and a bird, we cannot infer this. So principles 1 and 2 in Cut are true, but 3 is false. And the same pattern will recur any time Δ provides a defeater for the link between Γ and A.

    These cases will matter in what follows, but they are rather different from the monotonic examples. The monotonic example I’ll set out (in the next three paragraphs) is very similar to one used in an argument against E=K by Alvin Goldman (2009). In many ways the argument against E=K I’m going to give is just a notational variant on Goldman’s, but I think the notation I’m borrowing from Nair helps bring out the argument’s strength.

    Here’s the crucial background assumption for the example. (I’ll come back to how plausible this is after setting the example up.) The nature of F around here varies, but it varies very very slowly. If we find a pattern in common to all the F within distance d of here, we can rationally infer that the pattern extends another mile. That’s just boring induction. But we can’t infer that it extends to infinity. This is to say, we’re doing work that’s more like working out the diet of local wildlife than working out the mass of an electron. If you know the mass of electrons around here, and what pigeons around here eat, there are some inferences you can make. You can come to know what the mass of electrons will be in the next town over, and what pigeons eat in the next town over. But there is a difference between the cases. You can also infer from this evidence what the mass of electrons will be on the other side of the world. But you can’t make very confident inferences about what pigeons eat on the other side of the world; they may have adapted their diet to local conditions. In our case F and G concern things more like pigeon diets than electron masses. Now here is the counterexample.

    Γ = Δ = {Every F within 3 miles of here is G.}
    A = Every F between 3 and 4 miles of here is G.
    B = Every F between 4 and 5 miles of here is G.

    If what I said was right, then this is a counterexample to Cut. Γ ⊨ A is true because it says given evidence about all the F within 3 miles of here, we can infer that all the F within 4 miles are like them. And {A} ∪ Δ ⊨ B is true because it says given evidence about all the F within 4 miles of here, we can infer that all the F within 5 miles are like them. But Γ ∪ Δ ⊨ B is false, because it purports to say that given evidence about the F within 3 miles of here, we can infer that all the F within 5 miles are alike. And that’s an inductive bridge too far.

    I don’t know if there are instances of F and G where this particular pattern obtains. That is, I don’t know if there are instances of F and G where given a perfect correlation holding within d miles, we can rationally infer it holds within d + 1 miles, but not d + 2 miles. It seems likely to me that something like this could be right, but it’s hard to say for sure.

    What I really need for the argument is independent of how we think spatial distance relates to rational inductive inference. All that’s needed is that there is some similarity metric such that inductive inference is rational across short jumps in that similarity metric, but not across long jumps. One kind of similarity is physical distance from a salient point. That’s not the only kind of similarity, and rarely the most important kind.

    As long as there is some ‘inductive margin of inference’, the argument works. What I mean by an inductive margin of inference is that given that all the F that differ from a salient point (along this metric) by amount d are G, it is rational to infer that all the F that differ from that salient point by amount d + m are G, but not that all the F that differ from that salient point by amount d + 2m are G. And it seems very plausible to me that there are some metrics, and values of F, G, d, m such that that’s true.

    For example, given what I know about Miami’s weather, I can infer that it won’t snow there for the next few hundred Christmases. Indeed, I know that. But I can’t know that it won’t snow there for the next few million Christmases. There is some point, and I don’t know what it is, where my inductive knowledge about Miami’s snowfall (or lack thereof) gives out.

    While it is plausible that such cases are possible, any particular case fitting this pattern is weird. Here’s what is weird about them. It will be easier to go back to the case where the metric is physical distance to set this out, but the weirdness will extend to all cases. Imagine we investigate the area within 3 miles of here thoroughly, and find that all the F are Gs. We infer, and now know, that all the F within 4 miles of here are Gs. We keep investigating, and keep observing, and after a while we’ve observed all the F within 4 miles. And they are all G, as we knew they would be. But now we are in a position to infer that all the F within 5 miles are G. Observing something that we knew to be true gives us a reason to do something, i.e., make a further inference, that we couldn’t do before. That’s weird, and I’m going to come back in the next section to how it relates to the story I told about knowledge in chapter 4.

    But for now I want to note that it undermines the E=K principle. There is a difference between knowing A and being able to use A to support further inductive inferences. It is very natural to call that the difference between knowing A and having A as evidence.

    The reasoning that I’ve been criticising violates a principle Jonathan Weisberg calls No Feedback (Weisberg 2010, 533–34). This principle says that if a conclusion is derived from some premises, plus some intermediary conclusions, then it is only justified if it could, at least in principle, be derived from those premises alone. A natural way to read this is that we have some evidence, and things that we know on the basis of that evidence have a different functional role from the evidence. They can’t do what the evidence itself can do, even if known. This looks like a problem for E=K, as Weisberg himself notes (2010, 536).

    The non-monotonic cases where cut elimination fails are also tricky for the E=K theorist, but ultimately not as problematic. Here’s how to bring out the problem, and also ultimately how to solve it.

    On day 1, Ankita gets as evidence that Tweety is a flying bird, while Bojan gets as evidence that Tweety is a bird, and infers that Tweety flies. At this stage he knows, as Ankita does, that Tweety flies; this was a perfectly good inference. On day 2, they both get as evidence that Tweety is a penguin. Now Ankita knows something special: Tweety is a flying penguin. But Bojan doesn’t know this. He can no longer infer that Tweety flies, so doesn’t know that Tweety is a flying penguin. And the mystery is to explain what’s happened.

    The theorist who rejects E=K has an easy explanation. Ankita and Bojan had different evidence on day 1, though they knew the same things. Then when more evidence was added into their evidence set, they could do different things. That’s the full mystery solved.

    The theorist who accepts E=K can’t say just this. They have to say that although Bojan did have as part of his evidence that Tweety flies back on day 1, on day 2 this is no longer part of his evidence. Why is it not? Presumably because it was defeated by the new information. Why was it defeated? The explanation for that can’t be that given the new information, his old evidence didn’t support the belief that Tweety flies. That can’t be right because Tweety’s being a penguin doesn’t get in the way of the ‘inference’ Tweety flies, therefore she flies. Instead the story must be, somehow, that this old evidence was defeated, not just the inference from the evidence to this knowledge.

    I’m not sure that the E=K theorist has a good story to tell here. But I’m not sure that they don’t either. Alexander Bird (2004), in the context of replying to a similar objection, points out that everyone is going to need a theory of evidential defeat. That’s right. Unless evidence is taken to be something that is infallible and indefeasible, we have to have some story for how it can be lost. And I certainly don’t want evidence to be infallible and indefeasible; if that were true we wouldn’t have very much evidence. So the puzzle for the E=K theorist - why does Bojan lose this evidence at this time - is a puzzle for everyone. This case is still a problem for E=K. The theorist who rejects E=K has, at least in my opinion, a much nicer story to tell about the difference between Ankita and Bojan’s knowledge. But a problem is not a refutation; and the puzzle this case raises for E=K is a puzzle everyone has to solve.

    The real problems for E=K come from the monotonic counter-instances to cut-elimination. If any such cases exist, it looks like we need to distinguish between things the thinker knows by inference, and things they know by observation, in order to assess their inferences. That’s to say, some knowledge will not play the characteristic role of evidence. And that suggests that E=K is false.

    9.7 Basic Knowledge and Non-Inferential Knowledge

    It would be natural to conclude from the examples I’ve discussed that evidence is something like non-inferential knowledge. This is very similar to a view defended by Patrick Maher (1996). And it is, I will argue, close to the right view. But it can’t be exactly right, for reasons Alexander Bird (2004) brings out.

    I will argue that evidence is not non-inferential knowledge, but rather basic knowledge. The primary difference between these two notions is that being non-inferential is a diachronic notion, it depends on the causal source of the knowledge, while being basic is a synchronic notion, it depends on how the knowledge is currently supported. In general, non-inferential knowledge will be basic knowledge, and basic knowledge will be non-inferential. But the two notions can come apart, and when they do, the evidence is what is basic, not what is non-inferential.

    The following kind of case is central to Bird’s objection to the idea that evidence is non-inferential knowledge. Assume that our inquirer sees that A and rationally infers B. On the view that evidence is non-inferential knowledge, A is evidence but B is not. Now imagine that at some much later time, the inquirer remembers B, but has forgotten that it is based on A. This isn’t necessarily irrational. As Harman (1986) stresses, an obligation to remember our evidence is wildly unrealistic. The inquirer learns C and infers B ∧ C. This seems perfectly rational. But why is it rational?

    If evidence is non-inferential knowledge, then this is a mystery. Since B was inferred, that can’t be the evidence that justifies B ∧ C. So the only other option is that the evidence is the, now forgotten, A. It is puzzling how something that is forgotten can now justify. But a bigger problem is that if A is the inquirer’s evidence, then they should also be able to infer A ∧ C. But this would be an irrational inference.

    So I agree with Bird that we can’t identify evidence with non-inferential knowledge, if by that we mean knowledge that was not originally gained through inference. (And what else could it mean?) But a very similar theory of evidence can work. The thing about evidence is that it can play a distinctive role in reasoning, it provides a distinctive kind of reason. In particular, it provides basic reasons.

    Evidence stops regresses. That’s why we can say that our fundamental starting points are self-evident. Now there is obviously a controversy about what things are self-evident. I don’t find it particularly likely that claims about the moral rights we were endowed with by our Creator are self-evident. But I do think it is true that a lot of things are self-evident. (Even including, perhaps, that we have moral rights.) And we should take this notion of self-evidence seriously. Our evidence is that knowledge which provides basic reasons.

    What is it for a reason to be basic? It isn’t that it was not originally inferred. Something that was once inferred from long forgotten premises may now be a basic reason. Rather, it is something that needs no further reason given as support. (Its support is itself, since it is self-evident.) What makes a reason need further support? I’m an interest-relative epistemologist, so I think this will be a function of the agent’s interests. For example, I think facts reported in a reliable history book are pieces of basic evidence when we are thinking about history, but not when we are thinking about the reliability of that book. But this kind of interest-relativity is not essential to the story. What is essential is that evidence provides a reason that does not in turn require more justification.

    This picture suggests an odd result about cases of forgotten evidence. There is a much discussed puzzle about forgotten evidence that was set in motion by Gilbert Harman (1986). He argued that if someone irrationally believes p on the basis of some evidence, and later forgets the evidence but retains the belief, the belief may now be rational. It would not be rational if they remembered both the evidence, and that it was the evidence for p. But, and this is what I want to take away from the case, there is no obligation on thinkers to keep track of why they believe each of the things they do. There is a large literature now on this case; Sinan Dogramaci (2015) both provides a useful guide to the debate and moves it forward by considering what we might aim to achieve by offering one or other evaluation of the believer in this case. The view I’m offering here is, as far as I can tell, completely neutral on Harman’s original case. But it has something striking to say about a similar case.

    So imagine an inquirer, call him Jaidyn, believes p for the excellent reason that he read it in a book from a reliable historian H. Six months later, he has forgotten that that’s where he learned that p, though he still believes that p. In a discussion about historians, a friend of Jaidyn’s says that H is really unreliable. Jaidyn is a bit shocked, and literally can’t believe it. And this is for the best since H is in fact reliable, and his friend is suffering from a case of mistaken identity. But he is moved enough by the testimony to not believe that H is reliable, and so he forms a disposition to not believe anything H says without corroboration. Since he doesn’t know that he believes p because H says so, he doesn’t do anything about this belief. What should we say about Jaidyn’s belief that p?

    Here’s what I want to say. I don’t claim this is particularly intuitive, but I’m not sure there is anything particularly intuitive; it’s best to just see what a theory says about the case. My theory says that Jaidyn still knows that p. This knowledge was once based on H’s testimony, but it is no longer based on that. Indeed, it is no longer based on anything. Presumably, if Jaidyn is rational, the knowledge will be sensitive to the absence of counter-evidence, or to incoherence with the rest of his world-view. But these are checks and balances in Jaidyn’s doxastic system, they aren’t the basis of the belief. Since the belief is knowledge, and is a basic reason for Jaidyn, it is part of his evidence.

    Note three things about that last conclusion. First, this is a case where a piece of inferential knowledge can be in someone's evidence. Once the source of the knowledge is (reasonably) forgotten, the knowledge converts into evidence. Second, almost any knowledge could make this jump. Whenever someone has no obligation to remember the source or basis of some knowledge, they can reasonably forget the source, and the basis, and the knowledge will become basic. And then it is evidence. The picture I'm working with is that pieces of knowledge can easily move in and out of one's evidence set; sometimes all it takes is forgetting where the knowledge came from. But third, if Jaidyn had done better epistemically, and remembered the source, he would no longer know that p.

    It is somewhat surprising that knowledge can be dependent on forgetting. Jaidyn knows that p, but if he’d done better at remembering why he believes p, he wouldn’t know it. Still, the knowledge isn’t grounded in forgetting. It’s originally grounded in testimony from an actually reliable source, and Jaidyn did as good a job as he needed to in checking the reliability of the source before accepting the testimony. Now since Jaidyn is finite, he doesn’t have any obligation to remember everything. And it seems odd to demand that Jaidyn adjust his beliefs on the basis of where they are from if he isn’t even required to track where they are from. It would be very odd to say that Jaidyn’s evidence now includes neither p (if it is undermined by his friend’s testimony), nor the fact that someone said that p. That suggests any p-related inferences Jaidyn makes are totally unsupported by his evidence, which doesn’t seem right.

    So the picture of evidence as basic knowledge, combined with a plausible theory of when forgetting is permissible, suggests that the forgetful reader knows more than the reader with a better memory. I suspect the same thing will happen in versions of Goldman’s explosive inductive argument. Imagine a thinker observes all the Fs within 3 miles, sees they are all G, and rationally infers that all the Fs within 4 miles are G. Some time later they retain the belief, the knowledge actually, that all Fs within 4 miles are G. But they forget that this was partially inferential knowledge, like Jaidyn forgot the source of his knowledge that p. They then make the seemingly sensible inductive inference that all Fs within 5 miles are G. Is this rational, and can it produce knowledge? I think the answer is yes; if they (not unreasonably) forget the source of their knowledge that the Fs 3 to 4 miles away are G, then this knowledge becomes basic. If it’s basic, it is evidence. And if it is evidence, it can support one round of inductive reasoning.
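    Schematically (my notation for the case just described, with Gn abbreviating 'all Fs within n miles are G' and '⇒' marking a single step of inductive inference):

    $$
    \begin{aligned}
    &\text{Observation: } G_3 && \text{(the observed Fs, those within 3 miles, are all G)}\\
    &\text{Step 1: } G_3 \Rightarrow G_4 && \text{(rational; } G_4 \text{ becomes inferential knowledge)}\\
    &\text{Step 2: } G_4 \Rightarrow G_5 && \text{(legitimate only if } G_4 \text{ has itself become evidence)}
    \end{aligned}
    $$

    On the picture defended here, reasonably forgetting that the intermediate claim was partly inferred is what lets it become basic, hence evidence, hence able to support the one further round of induction in Step 2. A thinker who remembers the inferential history still has only the original observations as evidence, and so cannot take that second step.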

    I’ve drifted a fair way from discussing interest-relativity. And a lot of what I say here is inessential to defending IRT. So I’ll return to the main plotline with a discussion of how my view of evidence helps respond to a challenge Ram Neta issues to IRT, and implies a rejection of a key principle in Jeremy Fantl and Matthew McGrath’s theory of knowledge.

    9.8 Holism and Defeaters

    The picture of evidence I’ve outlined here grounds a natural response to a nice puzzle case due to Ram Neta (2007).5

  • 5 This section draws on section 5 of my (2011).

  • Kate needs to get to Main Street by noon: her life depends upon it. She is desperately searching for Main Street when she comes to an intersection and looks up at the perpendicular street signs at that intersection. One street sign says “State Street” and the perpendicular street sign says “Main Street.” Now, it is a matter of complete indifference to Kate whether she is on State Street–nothing whatsoever depends upon it. (Neta 2007, 182)

    Neta argues that IRT implies Kate knows that she is on State Street, but does not know that she is on Main Street. And, he suggests, this is intuitively implausible. I think I agree with that intuition, so let’s take it for granted and ask whether IRT has this problematic implication.

    Let’s also assume that it is not rational for Kate to take the street sign’s word for it. I’m not sure that’s true actually, but let’s assume it to get the argument going. I think the reason we’re meant to think this is plausible, given IRT, is that her life depends on the sign being correct. And if high stakes make it hard to know things, we can’t know something when our life depends on it on the say-so of a sign. But I often take actions that my life depends on going by the say so of signs. For example, I often turn onto the freeway ramp labelled on ramp, and not the ramp labelled off ramp, without really double checking. And if I was wrong about this it would be a mistake that is frequently fatal. But maybe Kate has some other way of checking where she is - like a map on a phone in her pocket - and it would be irrational to take the sign for granted and not check that other map. So I’m not going to push on this assumption.

    So what evidence should The Radical Interpreter assign to Kate? It doesn't seem to be at issue that Kate sees that the signs say State and Main. The big question is whether she can simply take it as evidence that she is on State and Main. That is, do the contents of the signs simply become part of Kate's evidence? (Assume that the signs are accurate and there is no funny business going on, so it is plausible that the signs contribute this evidence.) There are three natural options.

    1. Both signs supply evidence directly to Kate, so her evidence includes that she is on State and that she is on Main.
    2. Neither sign contributes evidence directly to Kate, so her evidence includes what the signs say, but nothing directly about her location.
    3. One sign contributes evidence directly to Kate, but the other does not.

    Option 1 implies that it would be rational for Kate not to check further whether she is on Main Street. We have assumed that would be irrational, so option 1 is out.

    Option 3 implies that the signs behave differently, and that The Radical Interpreter will assign them different roles in Kate's cognitive architecture. But this will be true even though the signs are equally reliable, and Kate's evidence about their reliability is identical. So Kate treating them differently would be irrational, and The Radical Interpreter does not want to make Kate irrational if it can be helped. So option 3 is out.

    That leaves Option 2. Kate’s evidence does not include that she is on State, and does not include that she is on Main. The latter ‘non-inclusion’ is directly explained by pragmatic factors. The former is explained by those factors plus the requirement that Kate’s evidence is what The Radical Interpreter says it is, and The Radical Interpreter’s desire to make Kate rational.

    So Kate’s evidence doesn’t distinguish between the streets. It does, however, include that the signs say she is on State and that she is on Main. Could she be entitled in inferring that she is on State, but not that she is on Main?

    It is hard to see how this could be so. Street signs are hardly basic epistemic sources. They are the kind of evidence we should be 'conservative' about in the sense of Pryor (2004). We should only use them if we antecedently believe they are correct. So for Kate to believe she's on State, she'd have to believe the street signs she can see are correct. If not, she'd incoherently be relying on a source she doesn't trust, even though it is not a basic source. But if she believes the street signs are correct, she'd believe she was on Main, and that would lead to practical irrationality. So there's no way to coherently add the belief that she's on State Street to her stock of beliefs. So she doesn't know, and can't know, either that she's on State or that she's on Main. This is, in a roundabout way, due to the practical situation Kate faces.

    Neta thinks that the best way for IRT to handle this case is to say that the high stakes associated with the proposition that Kate is on Main Street imply that certain methods of belief formation do not produce knowledge. And he argues, plausibly, that such a restriction will lead to implausibly sceptical results. What to say about this suggestion turns on how we understand what a ‘method’ is. If methods are individuated very finely, like Trust street signs right here, then it’s plausible that Kate should restrict what methods she uses, but implausible that this is badly sceptical. If methods are individuated very coarsely, like Trust written testimony, then it’s plausible that this is badly sceptical, but implausible that Kate should give up on methods this general. I can rationally treat some parts of a book as providing direct evidence about the world, and other, more speculative, parts as providing direct evidence about what the author says, and hence indirect evidence about the world. Similarly, Kate can treat these street signs as indirect evidence about her location, while still treating other signs around her as providing direct evidence. So there is no sceptical threat here.

    But while the case doesn't show IRT is false, it does tell us something interesting about the implications of IRT. When a practical consideration defeats a claim to know that p, it will often also knock out nearby knowledge claims. Some of these are obvious: for instance, the practical consideration also defeats the claim to know 0=0 → p. But some of these are more indirect. When the inquirer knows what her evidence is, and knows that she has just the same evidence for q as for p, then if a practical consideration defeats a claim to know p, it also defeats a claim to know q. In practice, this makes IRT a somewhat more sceptical theory than it may have first appeared. It's not so sceptical as to be implausible, but it's more sceptical than is immediately obvious. This kind of result, where IRT ends up being somewhat sceptical but not implausibly so, has been a theme of many different cases throughout the book.

    9.9 Epistemic Weakness

    The cases where cut-elimination fails raise a problem for the way that Jeremy Fantl and Matthew McGrath spell out their version of IRT. Here is a principle they rely on in motivating IRT.

    When you know a proposition p, no weaknesses in your epistemic position with respect to p—no weaknesses, that is, in your standing on any truth-relevant dimension with respect to p—stand in the way of p justifying you in having further beliefs. (Fantl and McGrath 2009, 64)

    And a few pages later they offer the following gloss on this principle.

    We offer no analysis of the intuitive notion of ‘standing in the way’. But we do think that, when Y does not obtain, the following counterfactual condition is sufficient for a subject’s position on some dimension d to be something that stands in the way of Y obtaining: whether Y obtains can vary with variations in the subject’s position on d, holding fixed all other factors relevant to whether Y obtains. (Fantl and McGrath 2009, 67)

    This gloss suggests that the difference between knowledge and evidence is something that stands in the way of an inference. The inquirer who knows that nearby Fs are Gs, but does not know that somewhat distant Fs are Gs, has many things standing in the way of that further knowledge. One of them is, according to this test, that her evidence does not include that all nearby Fs are Gs. Yet this is something she knows. So a weakness in her epistemic position with respect to the nature of nearby Fs, that it is merely knowledge and not evidence, stands in the way of it justifying further beliefs.

    The same thing will be true in the monotonic cases of cut-elimination failure. The thinker whose evidence includes Γ ∪ Δ, and whose inferential knowledge includes A, cannot infer B. But if they had A as evidence, and not merely as knowledge, then they could infer B. So the weakness in their epistemic position, the gap between evidence and knowledge, stands in the way of something.
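    To make the shape of these cases explicit, write X ≻ p for 'an inquirer whose evidence is X may rationally infer p'; this notation is mine, not Fantl and McGrath's, and I'm assuming the inferential knowledge A was inferred from the Γ part of the evidence. The cases then look roughly like this:

    $$
    \Gamma \succ A, \qquad \Delta \cup \{A\} \succ B, \qquad \text{yet} \qquad \Gamma \cup \Delta \not\succ B .
    $$

    If cut held for ≻, the third claim could not be true given the first two; its failure just is the gap between having A as inferential knowledge and having it as evidence.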

    I didn’t endorse the principle of Fantl and McGrath’s quoted above, but I did endorse very similar principles, and one might wonder whether they are subject to the same criticism. The main principle I endorsed was that if one knows that p, one is immune from criticism for using p on the grounds that p might be false, or is too risky to use. Equivalently, if the use of p in an inference is defective, but p is known, the explanation of why it is defective cannot be that p is too risky. But now won’t the same problem arise? Our inquirer in the monotonic cut-elimination example can’t use A in reasoning to B. If A was part of their evidence, then it wouldn’t be risky, and they would be able to use it. So the risk is part of what makes the use of it mistaken.

    I reject the very last step in that criticism. The fact that something is wrong, and that it wouldn’t have been wrong if X, does not mean the non-obtaining of X is part of the ground, or explanation, for why it is wrong. If I break a law, then what I do is illegal. Had the law in question been struck down by a constitutional court, then my action wouldn’t have been illegal. Similarly, if the law had been repealed, my action would not have been illegal. But that doesn’t imply that the ground or explanation of the illegality of my action is the court’s not striking the law down, or the later legislature not repealing the law. That is to put too much into the notion of ground or explanation. No, what makes the act illegal is that a particular piece of legislation was passed, and this act violates it. This explanation is defeasible - it would be defeated if a court or later legislature had stepped in - but it is nonetheless complete.

    The same thing is true in the case of knowledge and evidence. Imagine an inquirer who observes all the Fs within 3 miles being G, and infers both that all the Fs within 4 miles are G, and, therefore, that all the Fs within 5 miles are G. The intermediate step is, in a sense, risky. And the final step is bad. And the final step wouldn’t have been bad if the intermediate step hadn’t been risky. But it’s not the riskiness that makes the second inference bad. No, what makes the second inference bad is that it violates Weisberg’s No Feedback principle. That’s what the reasoner can be criticised for, not for taking an epistemic risk.

    There are two differences then between the core principle I rely on - using reasons that are known provides immunity to criticism for taking epistemic risks - and the principle Fantl and McGrath rely on. I use a concept of epistemic risk where they use a concept of strength of epistemic position. I don't think these are quite the same thing, but they are clearly similar. But the bigger difference is that they endorse a counterfactual gloss of their principle, and I reject any such counterfactual gloss. I don't say that the person who uses known p is immune to all criticisms that would have been vitiated had p been less risky. I just say that the risk can't be the ground of the criticism; something else must be. In some cases, including this one, that 'something else' might be correlated with risk. But it is that something else, not the risk, that must do the explaining.

    Of course, this difference between my version of IRT and Fantl and McGrath’s is tiny compared to how much our theories have in common. And indeed, it’s tiny compared to how much my theory simply borrows from theirs. But it’s helpful I think to highlight the differences to understand the choice points within versions of IRT.