Re: Pascal's Mugging

For those who are unfamiliar with Pascal’s Mugging, here’s an excerpt from Wikipedia:

“In Bostrom’s description, Blaise Pascal is accosted by a mugger who has forgotten their weapon. However, the mugger proposes a deal: the philosopher gives them his wallet, and in exchange the mugger will return twice the amount of money tomorrow. Pascal declines, pointing out that it is unlikely the deal will be honoured. The mugger then continues naming higher rewards, pointing out that even if it is just one chance in 1000 that they will be honourable, it would make sense for Pascal to make a deal for a 2000 times return. Pascal responds that the probability of that high return is even lower than one in 1000. The mugger argues back that for any low but strictly greater than 0 probability of being able to pay back a large amount of money (or pure utility) there exists a finite amount that makes it rational to take the bet. In one example, the mugger succeeds by promising Pascal 1,000 quadrillion happy days of life. Convinced by the argument, Pascal gives the mugger the wallet.”

The justification for the possibility of the mugger giving Pascal 1,000 quadrillion happy days of life is basically that “anything’s possible”, however unlikely it may be. For the sake of the argument, I’ll grant that premise. There’s always this general possibility that there’s something we don’t understand that goes beyond where reasoning, evidence, and science can take us. Maybe we’re in a simulation. Maybe I’m a Boltzmann brain. Maybe an evil demon is tricking me about everything. I’m fine with that part of the thought experiment.

The problem I see with Nick Bostrom’s version of Pascal’s Mugging is that it suggests a false dichotomy: either Pascal just loses his wallet or he gets 1,000 quadrillion happy days of life. Clearly, these are not the only two possibilities if the justification is that “anything’s possible”.

Since there’s no evidence that the mugger actually has the ability to grant happy days, I would assign the 1,000 quadrillion happy days outcome the same probability as an outcome where Pascal suffers for 1,000 quadrillion miserable days after giving the mugger his wallet. I can justify the possibility of the misery outcome with the same “anything’s possible” reasoning. In fact, any outcome the mugger suggests with a massive reward for giving up the wallet can be countered by an equally unlikely outcome which is just as bad as the good outcome is good. The two cancel, wiping out any expected gain from giving up the wallet and leaving only the certain loss of the wallet itself.
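The cancellation can be checked with a small calculation. This is only a sketch: the probability `p`, the payoff `reward`, and the cost `wallet` are made-up numbers in arbitrary utility units, not figures from Bostrom or Wikipedia.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Sum probability * value over (probability, value) pairs."""
    return sum(prob * value for prob, value in outcomes)

# Hypothetical numbers, chosen only for illustration.
p = Fraction(1, 10**9)   # assumed chance the mugger can deliver
reward = 10**18          # 1,000 quadrillion happy days, one unit each
wallet = 1               # assumed value of the wallet, same units

# The mugger's framing: either the reward arrives or the wallet is simply lost.
mugger_framing = expected_value([
    (p, reward - wallet),
    (1 - p, -wallet),
])

# Same framing plus an equally likely, equally large misery outcome.
with_counter_outcome = expected_value([
    (p, reward - wallet),
    (p, -reward - wallet),
    (1 - 2 * p, -wallet),
])

print(mugger_framing)        # large and positive
print(with_counter_outcome)  # just the lost wallet
```

With the misery outcome included, the huge terms cancel exactly and the expected value is simply minus the wallet, whatever values are chosen for `p` and `reward`.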

Pascal’s Mugging appears to be dead in the water, but I still think it brings up a point which can be elucidated with another, simpler thought experiment:

Imagine an unfair coin. It has a one-third chance of landing heads and a two-thirds chance of landing tails. If it lands heads, you’ll certainly spend three years in pure bliss. If tails, you’ll certainly spend one year in pure agony which is just as bad as the bliss is good. Or you can choose not to flip the coin and neither outcome occurs. What do you do?

Treating each year as one unit of value (positive for bliss, negative for agony), the expected value of flipping the coin is positive one-third, given by the expected value formula:

(3 * 1/3) + (-1 * 2/3) = 1 - 2/3 = +1/3

The expected value of not flipping the coin would be zero. One-third is greater than zero, so flipping the coin is supposedly the rational choice.
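For anyone who wants to check the arithmetic, here is a minimal sketch of the same calculation using exact fractions. The variable names are my own; the probabilities and payoffs are the ones from the coin example.

```python
from fractions import Fraction

# (probability, value in years) for each side of the unfair coin.
coin_outcomes = [
    (Fraction(1, 3), 3),    # heads: three years of bliss
    (Fraction(2, 3), -1),   # tails: one year of agony
]

ev_flip = sum(prob * value for prob, value in coin_outcomes)
ev_no_flip = 0  # declining the flip means neither outcome occurs

print(ev_flip)               # 1/3
print(ev_flip > ev_no_flip)  # True
```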

I don’t think most people would flip the coin though. That doesn’t necessarily mean that the expected value equation can’t account for human values or that humans behave irrationally in this situation. It’s more probable that we’re just not considering everything we value. If it’s even possible to meaningfully quantify human values using expected value, then there are probably many missing terms in the above equation that vary by individual and change over time.

For example, one possible explanation of the discrepancy between the expected value calculation above and human behavior could be that we humans positively value having a baseline level of well-being. In other words, I’ll take a guaranteed so-so outcome over one that is potentially bad. Another possible explanation is that we value the absence of pleasure and the absence of pain asymmetrically.
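To see how a single missing term could change the answer, here is a toy sketch. It assumes, purely for illustration, that a person weighs each year of agony `pain_weight` times as heavily as a year of bliss. That weight is a hypothetical stand-in for whatever is actually missing, not a claim about how human values really work.

```python
from fractions import Fraction

def weighted_ev(pain_weight):
    """Expected value of flipping the coin when agony counts pain_weight times as much."""
    p_heads, p_tails = Fraction(1, 3), Fraction(2, 3)
    bliss_years, agony_years = 3, 1
    return p_heads * bliss_years - p_tails * agony_years * pain_weight

for weight in (1, Fraction(3, 2), 2):
    print(weight, weighted_ev(weight))
```

In this toy model the flip stops looking worthwhile once agony counts for more than one and a half times as much as bliss, so even a modest asymmetry is enough to match the choice not to flip.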

Whatever the true reason is, I don’t see either Pascal’s Mugging or my thought experiment as paradoxical. There is a discrepancy between how humans act and what expected value says, but that can be explained by the fact that the expected value calculations are overly simplistic and don’t fully account for the complexity of human values as shaped by our society, culture, and evolution.

Others have suggested that, to resolve the apparent paradox of Pascal’s Mugging, we should bound utility functions, penalize prior probabilities, or “abandon quantitative decision procedures in the presence of extremely large risks”. I’m fine with these measures, but only insofar as they reflect the way human values behave quantitatively and not for any other justification.

To conclude, I want to say something about the importance of Pascal’s Mugging, the Trolley Problem, and moral thought experiments in general.

They can seem very theoretical, but they’re actually very practical in that they help us figure out what we value by asking us to imagine extreme scenarios. The question of what our values are is relevant because, if we don’t destroy ourselves, we’ll eventually create artificial general intelligence and we really need it to be aligned with those values.