Mixing Expert Opinion


This paper contributes to the project of articulating and defending the supra-Bayesian approach to judgment aggregation. I discuss three cases where a person is disposed to defer to two different experts, and ask how they should respond when they learn the opinion of each. The guiding principles are that this learning should go by conditionalisation, and that they should aim to update on the evidence that the experts updated on. But this doesn’t settle how the update on pairs of experts should go, because we also need to know how the experts are related. I work through three examples showing how the results change given different prior beliefs about this relationship.

Brian Weatherson (http://brian.weatherson.org), University of Michigan (https://umich.edu)
03-30-2021

What should you do if two experts, each of whom you are disposed to defer to, disagree? The answer depends on what you know about the relationship between the experts’ evidence. I’m going to argue for this dependence claim, and work through three examples that start the process of illustrating the nature of the dependence. The first example concerns a case where the evidence the experts have is maximally independent. This case has been well analysed by Easwaran et al. (2016), and my main contribution is to offer a new (and perhaps more explanatory) proof of their primary conclusion. The second case is where you know what proportion of the experts’ evidence is shared. And the third is where you know that one expert is more informed, but you don’t know which. In each of the last two cases I’ll show the computed exact values of the posterior probabilities after conditionalising on the expert credences, and also show some simple methods for approximating these exact values. The approximations are, I suspect, a little more robust when we move from the simple examples I’ll describe to more realistic ones.

So let’s get more precise about the question we’re asking, and also give names to the characters in the story. (It feels weird to talk about you when I don’t know who you are, so I prefer having named characters.) Assume Player regards Ivy and Zack as experts about \(p\) in the following sense.

  1. If Player learns that Ivy’s credence in \(p\) is \(x\), and nothing else, he will change his credence in \(p\) to \(x\).
  2. If Player learns that Zack’s credence in \(p\) is \(x\), and nothing else, he will change his credence in \(p\) to \(x\).

Given that, what is the answer to the following question?

  3. If Player learns that Ivy’s credence in \(p\) is \(y\), and Zack’s credence in \(p\) is \(z\), and nothing else, what should his credence in \(p\) become?

Following Baccelli and Stewart (2021), let’s distinguish two kinds of answers to this question. The supra-Bayesian says that this case, like every other case, calls for conditionalisation. This is going to be the kind of answer I defend. Here’s how we spell this answer out. First, we rewrite (1) and (2) as (4) and (5).

  4. \(\forall x: Cr_P(p | Cr_I(p) = x) = x\)
  5. \(\forall x: Cr_P(p | Cr_Z(p) = x) = x\)

Where \(Cr_P\), \(Cr_I\), and \(Cr_Z\) are Player’s, Ivy’s, and Zack’s credence functions respectively. Then (3) gets rephrased as a request for the value of

  6. \(Cr_P(p | Cr_I(p) = y \wedge Cr_Z(p) = z)\)

That’s good as far as it goes, but it raises two natural questions. First, what reasonable credal functions make (4) and (5) true, and what do they tend to say about (6)? Second, given the massive computational difficulty in calculating values like (6) in real time, are there heuristics for approximating its value in realistic cases? This paper aims to make progress on both questions. It offers some examples of reasonable credal functions satisfying (4) and (5), and uses them to suggest some heuristics for approximating (6) in somewhat realistic cases.

But before we get to those answers, we should look at the other kind of answer Baccelli and Stewart (2021) mention: pooling answers. A pooling answer to (3) says that we should find some function that in some way ‘pools’ \(y\) and \(z\) to answer (3). One obvious such function is the arithmetic mean: the answer to (3) is just \((y + z)/2\). Unfortunately, this won’t do, for three reasons. One reason, as proven independently by Gallow (2018) and Bradley (2017), is that it is incompatible with supra-Bayesianism. A second reason, as stressed by Russell, Hawthorne, and Buchak (2015), is that in cases where Player defers to Ivy and Zack across a range of questions, this answer is incompatible with Player, Ivy and Zack all updating on external evidence by conditionalisation. A third reason, as stressed by Levinstein (2015) and Easwaran et al. (2016), is that in some cases the intuitively correct answer to (3) is not between \(y\) and \(z\).

The last of these reasons is most pressing. The natural response to the first two reasons is to move to some other kind of pooling. Both Russell, Hawthorne, and Buchak (2015) and Baccelli and Stewart (2021) suggest that we should use some kind of geometric pooling instead of linear pooling. In this context, to use geometric pooling is to give an answer to (3) something like

\[ \frac{\sqrt{yz}}{\sqrt{yz} + \sqrt{(1-y)(1-z)}} \]

And that pooling function can be shown to avoid the first two reasons for not using linear pooling. But it can’t avoid the third, and that’s what I’m going to focus on here.
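To make the two pooling rules concrete, here is a minimal Python sketch of both (the function names are mine, chosen purely for illustration):

```python
def linear_pool(y, z):
    """Linear pooling: the arithmetic mean of the two credences."""
    return (y + z) / 2

def geometric_pool(y, z):
    """Geometric pooling: geometric means of the credences in p and
    in not-p, renormalised so the result is again a probability."""
    num = (y * z) ** 0.5
    return num / (num + ((1 - y) * (1 - z)) ** 0.5)

# If Ivy announces 0.6 and Zack announces 0.8:
print(linear_pool(0.6, 0.8))     # 0.7
print(geometric_pool(0.6, 0.8))  # roughly 0.71
```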

There are three somewhat distinct reasons you might use pooling to answer (3).

First, you might use it as a replacement for supra-Bayesianism. I’m going to argue that if you do this, you also have to give up on Bayesianism across the board. Sometimes the recipient of expert opinion can reliably infer the evidence behind the opinion. In those cases, regular Bayesianism implies that the recipient should update on just that evidence. And that regular, not supra, Bayesian principle is enough to dispose of pooling answers.

There are two more plausible uses for a pooling answer. Second, you might use it as a constraint on supra-Bayesianism. You could argue that if the values that (6) takes for various \(y, z\) do not look like some kind of pooling function, that’s evidence the prior \(Cr_P\) was irrational to start with. And third, you might use it as an approximation for supra-Bayesianism. It’s a lot easier to calculate linear or geometric means than to work out precisely the value of (6). Both of the last two uses are intuitively very plausible. One of the arguments of this paper is that they are, unfortunately, ultimately untenable. There just isn’t much use around here for pooling.

Pooling answers to (3) look a lot like conciliationist approaches to peer disagreement. Indeed, the form of pooling that uses linear averaging is sometimes thought to be an application of the Equal Weight View (Elga 2007). Supra-Bayesian answers look like evidentialist approaches to peer disagreement. In particular, they look a lot like the Total Evidence View (Lackey 2010). I’m going to use an even older motivation for them: the evidentialist approach to testimony defended by Frank Jackson (1987). On Jackson’s view, testimony that \(p\) is evidence that the speaker has evidence for \(p\). The way to rationally update on it depends on what kind of evidence you think the speaker is likely to have, given they’ve concluded \(p\), and what you would (rationally) do with that evidence. Typically, the answer is Conclude p. Jackson argues that while this is typical, it isn’t always the right answer. And it fails to be the right answer in just the cases where you shouldn’t accept the speaker’s testimony.

So to simplify here, I’m going to look at some cases where Player can simply deduce, given one of the experts’ credences, what their evidence must have been. And then Player will update on that evidence. As we’ll see, different assumptions about how the evidence of the experts interacts lead to different answers to (3).

Two quick notes. First, I’m only going to look at cases where the experts are treated symmetrically. That’s a restriction, but it’s a useful one for letting us see the range of cases. Second, I’m going to be agreeing with Easwaran et al. (2016) a lot, especially in the first half of the paper. I’m ultimately going to consider some different kinds of cases from those they consider - but that’s a difference in focus, not a difference in conclusions. (They also look at several kinds of cases that I won’t consider; it’s not like I’m going strictly beyond their work.) This paper is intended as a complement to theirs, not at all a substitute. But I think it’s a valuable complement, because I’ll show how some very realistic cases require a generalisation of their model, and make some suggestions for what that generalisation should look like.

Case One: Conditionally Independent Evidence

In our first case, the experts’ evidence is as independent as possible. Here’s a story to illustrate how that could be. Carmen has an urn with 50 marbles, 25 black and 25 white. She draws one at random and marks it with invisible ink. She has a scanner that can detect which marble is marked, but no one else can tell it apart from the other marbles. Let \(p\) be the proposition that the marked marble is white - that’s what we’ll focus on from now on.

After selecting one marble to be marked, she puts together a jar containing the marked marble and 9 other marbles drawn at random from the urn. (I’ll use ‘urn’ for where Carmen keeps all the unused marbles, and ‘jar’ for what she constructs to show the experts.) She shows the jar to one of the experts, let’s say Ivy. Ivy gets to inspect the jar, i.e., count how many marbles in it are white and how many are black. She then reports to Player, but crucially not to Zack, her credence in \(p\).

In this example, the next thing that happens is that Carmen takes the jar back, removes the 9 unmarked marbles, puts them back in the urn, and draws a new set of 9 marbles. (That set may overlap with the first set of course.) She puts these 9 in the jar, along with the marked marble, and shows the jar to Zack. He examines the jar, and reports to Player his credence in \(p\).

Now in this case we can work out precisely how Player should update on these two pieces of information. When one expert reports a credence of \(x\) in \(p\), Player can infer that they saw \(10x\) white marbles. After all, what the expert knows is just that the marked marble is equally likely to be any of the marbles in the jar they see. So given \(Cr_I(p) = y\) and \(Cr_Z(p) = z\), Player can infer how many white marbles were in each jar. And he can work out the probability of each of those jars turning up given \(p\) and given \(\neg p\). And that’s enough to plug into Bayes’s Theorem to work out a posterior probability for \(p\). When you do that, you get the following result.

  7. \(Cr_P(p | Cr_I(p) = y \wedge Cr_Z(p) = z) = \frac{yz}{yz + (1-y)(1-z)}\)

I’m not going to work through the derivation of this, because it’s a straightforward consequence of something I will derive below. If you do want to check it for yourself, the key input is that the probability of drawing \(x\) white balls in \(t\) draws without replacement from an urn with \(w\) white balls and \(b\) black balls is

\[ \frac{\binom{w}{x} \binom{b}{t-x}}{\binom{w+b}{t}} \]
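As a sanity check, here is a short Python sketch of my own (not code from Easwaran et al.) that uses this hypergeometric formula to compute the likelihood of each announced credence pair by brute force and then conditionalises. The numbers (50 marbles, 25 white, jars of 10) are from the story above.

```python
from math import comb

def hyp(x, w, b, t):
    """Probability of drawing x white marbles in t draws, without
    replacement, from an urn with w white and b black marbles."""
    return comb(w, x) * comb(b, t - x) / comb(w + b, t)

def jar_likelihood(white_in_jar, marked_white):
    """Probability that an expert's jar of 10 shows this many white
    marbles, given whether the marked marble is white. The jar is the
    marked marble plus 9 drawn at random from the remaining 49."""
    if marked_white:  # urn remainder: 24 white, 25 black
        return hyp(white_in_jar - 1, 24, 25, 9)
    else:             # urn remainder: 25 white, 24 black
        return hyp(white_in_jar, 25, 24, 9)

def posterior(y, z):
    """Player's posterior in p after both announcements, with a 0.5
    prior. A credence of x reveals a jar with 10x white marbles."""
    wy, wz = round(10 * y), round(10 * z)
    like_p = jar_likelihood(wy, True) * jar_likelihood(wz, True)
    like_not = jar_likelihood(wy, False) * jar_likelihood(wz, False)
    return like_p / (like_p + like_not)

y, z = 0.6, 0.8
print(posterior(y, z))                      # 0.8571...
print(y * z / (y * z + (1 - y) * (1 - z)))  # (7) agrees: 0.8571...
```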

More importantly, (7) looks just like a special case of the central formula (Upco) that Easwaran et al. (2016) use. And that’s not surprising, since this case uses the same conditional independence assumption that they make through much of their paper. To say that \(A\) and \(B\) are conditionally independent given \(C\) is just to say that \(\Pr(A \wedge B | C) = \Pr(A | C)\Pr(B | C)\). In this case, any pair of claims about how many white balls are in the jars shown to Ivy and to Zack are conditionally independent, both conditional on \(p\) and on \(\neg p\).

The right hand side of (7) also looks a lot like the geometric mean described above. The big difference is that the square root signs have disappeared. And that makes a difference, because it means the result violates what Baccelli and Stewart (2021) call Unanimity. This principle requires that \(Cr_P(p | Cr_I(p) = y \wedge Cr_Z(p) = y) = y\). If (7) is true then Unanimity is violated in every case except where \(y\) equals 0, 0.5 or 1. For example, if both experts announce a credence of 0.7, (7) says Player’s new credence should be \(0.49/(0.49 + 0.09) \approx 0.84\). But this is bad news for Unanimity, because the case for (7) in this case seems very strong. Player really knows how many white marbles were in each jar, and it’s just a bit of algebra to get from there to (7) via conditionalisation. And it’s very plausible that conditionalisation is the right way to update on evidence about how many marbles are in a jar. So any principle incompatible with (7) is false.

It turns out that varying how many marbles are in the urn Carmen starts with does not change (7). But changing the ratio of white marbles to black marbles in the urn does change the formula. If the proportion of the initial urn that is white is \(r\), then the general result is

  8. \(Cr_P(p | Cr_I(p) = y \wedge Cr_Z(p) = z) = \frac{yz(1-r)}{yz(1-r) + (1-y)(1-z)r}\)

Again, this isn’t a new result; Easwaran et al. (2016, 27) derive an even more general formula from which this falls out as a special case. But my way of deriving it is just different enough to be worth including.

Let \(I_x\) be the disjunction of all possible evidence propositions that would lead Ivy to have credence \(x\) in \(p\). In this case \(I_x\) is a simple proposition that there are \(10x\) white marbles in the jar, but we don’t need to assume that \(I_x\) will be anything like that simple. Everything that follows about \(I_x\) also holds for \(Z_x\), the disjunction of all possible evidence propositions that would lead Zack to have credence \(x\) in \(p\), but I won’t repeat the derivations. Since Player defers to Ivy, i.e., (4) is true, we have the following proof. (All credences are Player’s, so I’ll drop the subscripts.)

\[\begin{align*} Cr(p | I_x) &= x &&\therefore \\ Cr(p \wedge I_x) &= x \cdot Cr(I_x) \\ &= x (Cr(p \wedge I_x) + Cr(\neg p \wedge I_x)) &&\therefore \\ (1-x)Cr(p \wedge I_x) &= x \cdot Cr(\neg p \wedge I_x) &&\therefore \\ Cr(\neg p \wedge I_x) &= \frac{1-x}{x} Cr(p \wedge I_x) &&\therefore \\ Cr(I_x | \neg p) &= \frac{(1-x)Cr(p)}{x\cdot Cr(\neg p)}Cr(I_x | p) \end{align*}\]

So we know the ratio of \(Cr(I_x | p)\) to \(Cr(I_x | \neg p)\). That will become useful in what follows. Assuming evidentialism, what matters for (6) is working out the value of \(Cr(p | I_y \wedge Z_z)\). But we now know enough to do that.

\[ Cr(p | I_y \wedge Z_z) = \frac{Cr(p \wedge I_y \wedge Z_z)}{Cr(I_y \wedge Z_z)} \]

Using the general fact that \(X\) is equivalent to \((p \wedge X) \vee (\neg p \wedge X)\), and the fact that Player’s credences are probabilistic, so his credence in an exclusive disjunction equals the sum of his credences in the disjuncts, we know this equals

\[ \frac{Cr(p \wedge I_y \wedge Z_z)}{Cr(p \wedge I_y \wedge Z_z) + Cr(\neg p \wedge I_y \wedge Z_z)} \]

Since \(Cr(p \wedge X) = Cr(X | p)Cr(p)\), we can rewrite this as

\[ \frac{Cr(I_y \wedge Z_z | p) Cr(p)}{Cr(I_y \wedge Z_z | p)Cr(p) + Cr(I_y \wedge Z_z | \neg p)Cr(\neg p)} \]

And since \(I_y\) and \(Z_z\) are independent given both \(p\) and \(\neg p\), this becomes

\[ \frac{Cr(I_y| p) Cr(Z_z | p) Cr(p)}{Cr(I_y| p) Cr(Z_z | p) Cr(p) + Cr(I_y| \neg p) Cr(Z_z | \neg p) Cr(\neg p)} \]

If we assume the initial value of \(Cr(p) = r\), and use the earlier derived fact that \(Cr(I_x | \neg p) = \frac{(1-x)r}{x(1-r)}Cr(I_x | p)\), this becomes

\[ \frac{Cr(I_y| p) Cr(Z_z | p)r}{Cr(I_y| p) Cr(Z_z | p)r + \frac{(1-y)r}{y(1-r)} Cr(I_y| p) \frac{(1-z)r}{z(1-r)} Cr(Z_z | p) (1-r)} \]

Now we can finally cancel \(Cr(I_y| p) Cr(Z_z | p)r\) from the top and bottom, so this becomes

\[ \frac{1}{1 + \frac{(1-y)(1-z)r}{yz(1-r)}} \]

Or in other words

\[ \frac{yz(1-r)}{yz(1-r) + (1-y)(1-z)r} \]

And that’s the completely general result when the evidence the experts have is conditionally independent given both \(p\) and \(\neg p\), and Player starts with credence \(r\) in \(p\).
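Stated as code (again a sketch, with a function name of my own choosing), the general result is:

```python
def independent_update(y, z, r):
    """Posterior in p given announced credences y and z, a prior of r,
    and expert evidence that is conditionally independent given p and
    given not-p."""
    num = y * z * (1 - r)
    return num / (num + (1 - y) * (1 - z) * r)

print(independent_update(0.6, 0.8, 0.5))   # 0.8571..., matching (7)
# With a lower prior, the same announcements signal stronger evidence
# for p (the experts' credences already incorporate the prior), so
# the posterior is higher:
print(independent_update(0.6, 0.8, 0.25))  # roughly 0.947
```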

But this case is surely rare. Experts typically have some training in common that isn’t shared by non-experts. So their reasons for having a credence in \(p\) that differs from our prior will not be completely independent. Easwaran et al. (2016) note that sometimes we can adjust for this by conditionalising on the common evidence to come up with a new ‘prior’, or perhaps I should say ‘intermediate’, credence \(r\), and then applying this formula. This is slightly more general, but still not a lot. Part of what makes non-experts non-experts is that we lack this common training, so we can’t identify what’s common between the experts. Let’s see if we can come up with a slightly more general case.

Case Two: Common Marbles

In our second case, Carmen once again has an urn with 50 marbles, 25 black and 25 white. She draws one at random and marks it with invisible ink. She can tell which one this is, but no one else can. And \(p\) is still the proposition that the marked marble is white - that’s what we’ll focus on from now on. After selecting the marble to be marked, she puts together a jar containing the marked marble and 9 other marbles drawn at random from the urn. She shows that to one of the experts, let’s say Ivy. She gets to inspect the jar, i.e., count how many marbles in it are white and black. She then reports to Player, but crucially not to Zack, her credence in \(p\).

So far, it’s just like the last case. But what happens next is (possibly) different. In this case, Carmen removes \(m\) unmarked marbles from the jar, puts them back in the urn, and draws a new set of \(m\) marbles to put in the jar. It’s all random, so this could include some of the marbles she just removed. She shows the jar to Zack, he inspects it, and reports his credence in \(p\) to Player. And, crucially, Player knows \(m\), the number of unmarked marbles that were swapped out between the two viewings, and hence how much the two jars have in common. So \(m\) is a measure of the independence of the experts’ opinions.

Once again, we can work out precisely what Player’s credence should be given \(m\) and the two credences. Unfortunately, it’s just a long formula that doesn’t seem to reduce nicely. But if you’ve got a machine that’s good at calculating hypergeometric distributions, and you, dear reader, are probably reading this paper on one, it’s not that hard to calculate the values by brute force. I won’t list all the values (there are several hundred of them), but I’ll present them graphically here. (Note that I’ll leave off the cases where one or other expert announces a credence of 0 or 1; in those cases Player knows whether \(p\) is true, so the question of how to merge the credences is easy.)
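Here is a sketch of that brute-force calculation (my own Python, modelling the story above; the function names are mine). It enumerates the possible draw histories, sums their hypergeometric probabilities given \(p\) and given \(\neg p\), and conditionalises.

```python
from math import comb

def hyp(x, w, b, t):
    """P(x white in t draws without replacement from w white, b black)."""
    if x < 0 or t - x < 0:
        return 0.0
    return comb(w, x) * comb(b, t - x) / comb(w + b, t)

def pair_likelihood(wy, wz, m, marked_white):
    """P(Ivy's jar shows wy white and Zack's shows wz white), given the
    marked marble's colour and that m unmarked marbles were swapped."""
    uw = 24 if marked_white else 25  # white left among the other 49
    mark = 1 if marked_white else 0
    total = 0.0
    for k in range(10):              # white among Ivy's 9 unmarked
        if k + mark != wy:
            continue
        p_k = hyp(k, uw, 49 - uw, 9)
        for j in range(m + 1):       # white among the m removed
            p_j = hyp(j, k, 9 - k, m)
            # the urn now holds 40 + m marbles, uw - k + j of them white
            for l in range(m + 1):   # white among the m redrawn
                if mark + k - j + l == wz:
                    total += p_k * p_j * hyp(l, uw - k + j,
                                             40 + m - (uw - k + j), m)
    return total

def posterior_case_two(y, z, m):
    """Player's posterior in p, with a 0.5 prior, in Case Two."""
    wy, wz = round(10 * y), round(10 * z)
    like_p = pair_likelihood(wy, wz, m, True)
    like_not = pair_likelihood(wy, wz, m, False)
    return like_p / (like_p + like_not)

print(posterior_case_two(0.7, 0.7, 0))  # same jar: exactly 0.7
print(posterior_case_two(0.7, 0.7, 9))  # fresh jar: 0.8448..., as in (7)
```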

Here is how to read the graph. Each row corresponds to a particular credence announced by Ivy; that credence is shown on the right. Each column corresponds to a particular credence announced by Zack; that credence is shown on the top. The x-axis of the individual graphs shows the value of \(m\), the number of marbles removed. And the y-axis shows Player’s final credence in \(p\). There are more dots on some graphs than others because some combinations of Ivy’s credence, Zack’s credence and \(m\) are impossible. The announced credences can’t, by the rules of the game, differ by more than \(0.1m\).

One notable feature of that graph is that as \(m\) gets larger, the final credence tends to move away from 0.5; it tends to get more opinionated. Another notable feature, though probably not one you can see in this resolution, is that this move towards greater opinionation happens in a surprisingly linear fashion. To a first approximation, Player’s credence moves away from 0.5 roughly the same amount for each addition to \(m\), at least holding \(y\) and \(z\) fixed.

It’s not perfectly linear, but it’s much closer than I would have guessed, given how non-linear the inputs are. Let’s zoom in on a part of the graph to see this more vividly.

The curve in the bottom right panel is not really linear; it definitely curves downwards. But as you move your eye upwards and leftwards in the table, the curves look much, much straighter. The panel where they both announce 0.7 is really remarkably straight. If we focus on the middle of the big graph, this is even more striking. (I’ve left off the cases where Zack announces a credence under 0.5, because those graphs are just mirror images of graphs already shown.)

Why does this matter? Because pooling functions are easy to use, and the supra-Bayesian needs something to match that ease of use. It’s a cliche that for every problem there is a solution that is simple, intuitive, and wrong. And the version of the pooling approach that uses linear averages is very simple, very intuitive, and very wrong. The version that uses geometric averages strikes most people as less simple and intuitive (or maybe I’m just bad at explaining it), but it is less wrong. But still, sometimes simple, intuitive and wrong is exactly what you need! Computation is hard, life is short, precision is overrated. Why not just average if you are just looking to get something roughly right?

The supra-Bayesian can exploit the more-or-less linearity of the graphs above to come up with an approximation to these ideal Bayesian credences. And the approximation isn’t that much harder to calculate than the geometric average. Intuitively, it works like this. If the experts have exactly the same evidence, we take the geometric average of their opinions. If the experts’ evidence is conditionally independent, we use the formula from Easwaran et al. (2016) that I rederived in the last section. In between, we just need a guess \(k\) about what proportion of the evidence is independent, and we use that guess to average those two things: the geometric average and the formula for conditionally independent evidence. So our estimate of the new credence is this, where \(y\) and \(z\) are the announced credences, and \(k\) is the measure of the independence of the evidence.

\[ (1-k)\frac{\sqrt{yz}}{\sqrt{yz} + \sqrt{(1-y)(1-z)}} + k\frac{yz}{yz + (1-y)(1-z)} \]
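As a sketch (the function name is mine), the estimate is just a \(k\)-weighted average of geometric pooling and the conditionally independent formula:

```python
def blended_estimate(y, z, k):
    """Approximate posterior: weight k on the conditionally independent
    formula, 1 - k on geometric pooling, where k is the guessed
    proportion of the experts' evidence that is independent."""
    geo = (y * z) ** 0.5
    geo = geo / (geo + ((1 - y) * (1 - z)) ** 0.5)
    ind = y * z / (y * z + (1 - y) * (1 - z))
    return (1 - k) * geo + k * ind

# At the extremes it recovers the two exact cases:
print(blended_estimate(0.7, 0.7, 0.0))  # 0.7: fully shared evidence
print(blended_estimate(0.7, 0.7, 1.0))  # 0.8448...: fully independent
```

In Case Two, one natural guess (my assumption, to be tested against the exact values) is \(k = m/9\), since \(m = 9\) is the fully independent case.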

Let’s check visually how this does against the exact calculations. In the graphs that follow, I’ll use circles for the ideally calculated posterior credences, and triangles for the estimates made using this formula.