• @[email protected]
    9 days ago

    Person B’s predicted outcome was closer to the truth.

    Perhaps person A’s prediction would improve if multiple trials were allowed. Perhaps their underlying assumptions are wrong (i.e., the coins are actually weighted).

    • @[email protected]
      49 days ago

      Perhaps person A’s prediction would improve

      But in this hypothetical scenario of explicitly unweighted coins, Person A was entirely correct in the odds they gave. There’s nothing to improve.

      • @[email protected]
        9 days ago

        We are talking about testing a model in the real world. When you evaluate a model, you also evaluate the assumptions made by the model.

        Let’s consider a similar example. You are at a carnival. You hand a coin to a carny. He offers to pay you $100 if he flips heads. If he flips tails, you owe him $1.

        You: The coin I gave him was unweighted so the odds are 50-50. This bet will pay off.

        Your spouse: He’s a carny. You’re going to lose every time.

        The coin is flipped, and it’s tails. Who had the better prediction?

        You maintain you had the better prediction because you know you gave him an unweighted coin. So you hand him a dollar to repeat the trial. You end up losing $50 without winning once.

        You finally reconsider your assumptions. Perhaps the carny switched the coin. Perhaps the carny knows how to control the coin in the air. If it turns out that your assumptions were violated, then your spouse’s original prediction was better than yours: you’re going to lose every time.
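        The arithmetic behind this can be made concrete. Below is a minimal Python sketch, assuming independent flips and treating the spouse’s claim as a hypothetical “always tails” model. The 1% prior on a rigged game and all function names are illustrative assumptions, not part of the discussion. Under the fair-coin model the bet looks excellent (+$49.50 per flip), yet 50 straight tails has probability 2^-50 under that model, so even a skeptical prior swings almost entirely toward “rigged.” A single tail, by contrast, barely moves the estimate.

```python
from fractions import Fraction

# Payoffs from the carnival example: win $100 on heads, lose $1 on tails.
WIN, LOSS = 100, 1

def expected_value(p_heads):
    """Bettor's expected profit per flip, given P(heads)."""
    return p_heads * WIN - (1 - p_heads) * LOSS

def prob_all_tails(p_heads, n):
    """Probability of n tails in a row, assuming independent flips."""
    return (1 - p_heads) ** n

def posterior_rigged(prior_rigged, n):
    """Bayes' rule: P(rigged | n straight tails).

    Compares two hypothetical models: a fair coin (P(heads) = 1/2)
    and a rigged game that always lands tails (P(heads) = 0).
    """
    like_rigged = prob_all_tails(Fraction(0), n)      # always 1
    like_fair = prob_all_tails(Fraction(1, 2), n)     # 2**-n
    num = prior_rigged * like_rigged
    return num / (num + (1 - prior_rigged) * like_fair)

fair = Fraction(1, 2)
prior = Fraction(1, 100)  # illustrative assumption: 1% chance the game is rigged

print(expected_value(fair))          # 99/2, i.e. +$49.50 per flip if the coin is fair
print(posterior_rigged(prior, 1))    # 2/101: one tail barely changes the picture
print(float(posterior_rigged(prior, 50)))  # very close to 1 after 50 tails
```

        The key design choice is comparing two explicit models rather than judging one prediction by a single outcome: the fair-coin model is not refuted by one tail, but it becomes untenable after fifty.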

        Likewise, to evaluate Silver’s model we need to consider the possibility that its many assumptions contain flaws. This matters especially when its prediction, like yours in this example, differs sharply from real-world outcomes. If the assumptions are flawed, the prediction may well be flawed too.