Journal of Experimental Social Psychology 96 (2021) 104154

Contents lists available at ScienceDirect

Journal of Experimental Social Psychology
journal homepage: www.elsevier.com/locate/jesp

Retrospective and prospective hindsight bias: Replications and extensions
of Fischhoff (1975) and Slovic and Fischhoff (1977)☆
Jieying Chen a, *, 1, Lok Ching Kwan (Roxane)b, 1, Lok Yeung Ma (Loren)b, 1, Hiu Yee Choi
(HayleyAnne)b, 1, Ying Ching Lo (Lita)b, 1, Shin Yee Au (Sarah)b, 1, Chi Ho Tsang (Toby)b, 1,
Bo Ley Cheng b, Gilad Feldman b, *
a Department of Business Administration, University of Manitoba, Canada
b Department of Psychology, University of Hong Kong, Hong Kong SAR, China

ARTICLE INFO
Keywords:
Hindsight bias
Knew-it-all-along effect
Outcome knowledge
Judgment and decision making
Surprise
Confidence
Pre-registered replication

ABSTRACT
Hindsight bias refers to the tendency to perceive an event outcome as more probable after being informed of that outcome. We conducted very close replications of two classic experiments on hindsight bias and a conceptual replication testing hindsight bias regarding the perceived replicability of hindsight bias. In Study 1 (N = 890), we replicated Experiment 2 in Fischhoff (1975) and found support for hindsight bias in retrospective judgments (mean d = 0.60). In Study 2 (N = 608), we replicated Experiment 1 in Slovic and Fischhoff (1977) and found support for hindsight bias in prospective judgments (mean d = 0.40). In Study 3 (N = 520), we found strong support for hindsight bias regarding the perceived likelihood of our replication of hindsight bias (d = 0.43–1.03). We also included extensions examining surprise, confidence, and task difficulty, yet found mixed evidence with weak to no effects. We conclude that there is support for hindsight bias in both retrospective and prospective judgments, and in evaluations of replication findings, and we therefore call for establishing measures to address hindsight bias in evaluations of replication work and in the interpretation of research outcomes. All materials, data, and code were shared on: https://osf.io/nrwpv/.

1. Hindsight bias
Hindsight bias refers to the tendency to perceive an event outcome as more probable after being informed of that outcome, resulting in the illusion that the outcome “was known all along” (Fischhoff, 1975; Hawkins & Hastie, 1990; Roese & Vohs, 2012). Examples of hindsight bias include claims that a surprising movie ending was actually predictable, post-election claims that it was obvious who would get elected, students feeling that they knew in advance that an unlikely question would be on the exam, and financial analysts claiming to have predicted market changes after they happened. Hindsight bias may also affect researchers' interpretations of study findings, leading them to overestimate their ability to have predicted the results beforehand and to underestimate their reliance on the observed outcomes when reconstructing their previous predictions (Fischhoff, 1977).
The earliest empirical investigation touching upon the idea of hindsight bias that we know of dates back to Forer's (1949) study of students' beliefs about a personality test (see Hoffrage & Pohl, 2003). Students were asked to rate the extent to which the test revealed basic characteristics of their personality, and then to recall their ratings after learning that all students had received the same feedback. Although Forer (1949) focused on examining how individuals could be fooled by universal statements about personality (e.g., “At times you are extroverted, affable, sociable, while at other times you are introverted, wary, reserved”), this study uncovered the unexpected finding that feedback may affect memory.

☆ This paper has been recommended for acceptance by Professor Michael Kraus.
* Corresponding author.
E-mail addresses: jieying.chen@umanitoba.ca (J. Chen), rk1128@hku.hk, rk1128@connect.hku.hk (L.C. Kwan), loren14@connect.hku.hk (L.Y. Ma), hychoi@connect.hku.hk (H.Y. Choi), u3527928@connect.hku.hk (Y.C. Lo), u3519865@connect.hku.hk (S.Y. Au), tbtsang@connect.hku.hk, 13tsangtc1@kgv.hk (C.H. Tsang), boleystudies@gmail.com (B.L. Cheng), gfeldman@hku.hk (G. Feldman).
1 Contributed equally, joint first authors.
https://doi.org/10.1016/j.jesp.2021.104154
Received 22 May 2020; Received in revised form 1 April 2021; Accepted 19 April 2021
0022-1031/© 2021 Elsevier Inc. All rights reserved.

A more formal investigation of hindsight bias came in the mid-1970s, when Fischhoff (1975) published a study that explicitly compared probability estimates of outcomes made before (in foresight) and after (in hindsight) knowing which outcome actually occurred. In this pioneering study, participants were presented with four scenarios, each followed by four possible outcomes, and were asked to estimate the probabilities of the possible outcomes. Some participants were informed of the outcome of each scenario, whereas the rest were not. Fischhoff found that participants with outcome knowledge estimated the probability of the reported outcome to be higher than did participants who were not given any outcome information, demonstrating hindsight bias. Because this effect held despite instructions to ignore outcome knowledge, Fischhoff (1975) suggested that individuals were either unaware of their bias or, if they were aware, unable to make judgments in a foresightful state of mind (though Dietvorst & Simonsohn, 2019, suggested an alternative accuracy-based account).
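In the hypothetical design described above, hindsight bias for a given outcome amounts to a between-group contrast: the estimates of Before participants versus those of After participants told that this outcome occurred. The sketch below is illustrative only and rests on our own assumptions (a pooled-SD Cohen's d, and made-up numbers); it is not the analysis code shared on OSF, but it shows how such a contrast can be expressed as a standardized effect size of the kind reported in the abstract.

```python
from math import sqrt
from statistics import mean, stdev


def hindsight_effect(before, after):
    """Contrast probability estimates (in %) for one focal outcome.

    before: estimates from participants without outcome knowledge (Before)
    after:  estimates from participants told that this outcome occurred (After)
    Returns the raw mean difference (After - Before) and Cohen's d,
    standardized by the pooled standard deviation of the two groups.
    """
    n1, n2 = len(before), len(after)
    diff = mean(after) - mean(before)
    pooled_var = ((n1 - 1) * stdev(before) ** 2
                  + (n2 - 1) * stdev(after) ** 2) / (n1 + n2 - 2)
    return diff, diff / sqrt(pooled_var)


# Made-up illustrative numbers, not data from Fischhoff (1975) or this paper.
before = [20, 25, 30, 35, 40]
after = [30, 35, 40, 45, 50]
diff, d = hindsight_effect(before, after)
print(f"After - Before = {diff} points, d = {d:.2f}")
```

A positive d indicates hindsight bias for that outcome. In a design with several possible outcomes, one such contrast can be computed per informed outcome and event.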
Since the Fischhoff (1975) article was published, hindsight bias has
attracted much scholarly attention and led to a sizable body of follow-up
research. Several studies investigated whether hindsight bias was “real,”
or whether it was induced by demand characteristics. For example,
Fischhoff (1977) and Wood (1978) found that hindsight bias still held
when outcome knowledge was provided as isolated statements, when
outcome knowledge was provided with a delay, and when participants
were asked to respond as if they were a general college student who
might not have known the outcome. These findings alleviated the
concern about demand characteristics.
Later studies also differentiated between two main ways to examine hindsight bias (Pohl, 2007). The design used by Fischhoff (1975) is termed the hypothetical design, as participants in the hindsight condition receive feedback about the actual outcome (or the correct answer) but are asked to answer as if they did not know the outcome. These “as if” answers are then compared with the answers of participants in the foresight condition, who receive no feedback. The other design is the memory design, in which participants in the hindsight condition first answer some questions, are then informed of the correct answers, and at the end are asked to recall their initial answers (Fischhoff & Beyth, 1975; Wood, 1978). Their recalled answers are then compared with their initial answers.
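Whereas the hypothetical design compares groups, the memory design compares each participant's recalled answer with their own initial answer. One simple within-person index, assumed here for illustration (the text does not specify which index the cited studies used), subtracts the recalled answer's distance from the correct answer (the feedback) from the initial answer's distance; the function name and data are hypothetical.

```python
from statistics import mean


def shift_toward_feedback(initial, recalled, correct):
    """Hypothetical per-item index for the memory design.

    Distance of the initial answer from the correct answer (the feedback)
    minus distance of the recalled answer from it. Positive values mean the
    recalled answer moved toward the feedback; 0 means no net shift
    (e.g., perfect recall); negative values mean a shift away from it.
    """
    return abs(initial - correct) - abs(recalled - correct)


# Made-up items: (initial answer, recalled answer, correct answer).
items = [(1200, 1350, 1500), (40, 40, 55), (10, 6, 3)]
shifts = [shift_toward_feedback(*item) for item in items]
print(shifts, mean(shifts))
```

Averaging the index across items and participants gives one number per condition; a reliably positive mean would indicate hindsight bias in memory.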
The hypothetical design and the memory design share many similarities, yet one distinction between them is noteworthy: hindsight bias detected using the memory design is mostly associated with memory distortion and/or the feeling that the known outcome was bound to happen, whereas hindsight bias that occurs in the hypothetical design may entail more complex psychological processes (Roese & Vohs, 2012).
Hindsight bias has had a significant impact on a wide array of disciplines beyond psychology, such as economics, management, health science, and law (e.g., Bukszar & Connolly, 1988; Casper, Benedict, & Perry, 1989; Kaplan & Barach, 2002; Thaler, 2016).

2. Reasons for hindsight bias: Emotions
Multiple factors have been suggested as possible causes of hindsight bias (Blank, Musch, & Pohl, 2007; Hawkins & Hastie, 1990; Roese & Vohs, 2012), including 1) cognitive processes such as memory impairment, biased reconstruction, and sense-making, 2) metacognitive processes involving experiences such as surprise, confidence, experienced fluency, and ease of reasoning, and 3) social-motivational processes to increase controllability and enhance self-image.
Several models have been proposed to explain hindsight bias. The Reconstruction After Feedback with Take the Best (RAFT; Hoffrage, Hertwig, & Gigerenzer, 2000) model suggests that when a direct recall of the initial answer is not possible, individuals try to reconstruct their initial answer by using relevant cues to reevaluate the question. Both the initial evaluation and the reconstructed evaluation are based on a Take the Best heuristic, in which the decision is based on the highest-validity cue among those that discriminate between the choices. Because feedback transforms the values of elusive cues into discriminating ones and shifts cue values asymmetrically toward the feedback, the reconstructed answer will also be biased toward the feedback, demonstrating hindsight bias.
The Selective Activation and Reconstructive Anchoring (SARA; Pohl, Eisenhauer, & Hardt, 2003) model assumes that individuals generate answers, encode feedback, and recall answers based on a probabilistic sampling of associations among external cues and units in the knowledge base. When individuals encode the feedback into their knowledge base, the associations among external cues, the feedback, and units that are similar to the feedback are strengthened. This renders units that are more similar to the feedback more likely to be activated in a memory search using those external cues (i.e., selective activation). In addition, after seeing the feedback, individuals may still maintain it in working memory, or have increased cognitive accessibility to it due to its recent activation. In these cases, the feedback may be used as an internal retrieval cue, making units similar to the feedback more likely to be retrieved into working memory (i.e., biased reconstruction). According to SARA, either selective activation or biased reconstruction, or both, can lead to hindsight bias.
In both RAFT and SARA, the changes to the knowledge base, cue values, and associations that accompany the encoding of feedback occur automatically. Such knowledge updating is often seen as an adaptive learning process (e.g., Hawkins & Hastie, 1990; Hertwig, Fanselow, & Hoffrage, 2003; Hoffrage et al., 2000; Pohl, Bender, & Lachmann, 2002). However, as Bernstein et al. (2011, p. 389) wrote, “the downside of such automatic knowledge updating is that people tend to forget their original, naive thoughts, views, and predictions.”
Other prominent models of the psychological processes underlying hindsight bias include the causal model theory (Blank & Nestler, 2007), Pezzo's (2003) sense-making model, Roese and Vohs' (2012) three-level model, and Sanna and Schwarz's (2006) metacognitive model.
3. Role of surprise, overconfidence, and task difficulty
Emotions such as surprise and overconfidence have been suggested as factors in the cognitive and metacognitive processes leading to hindsight bias (Bernstein, Aßfalg, Kumar, & Ackerman, 2016). Fischhoff and Beyth (1975, p. 12) argued that “the occurrence of an event increases its reconstructed probability and makes it less surprising than it would have been had the original probability been remembered.” They operationalized surprise as “the occurrence of an unlikely event or the nonoccurrence of a likely event” (Fischhoff & Beyth, 1975, p. 12), and found that outcome knowledge reduced surprise (i.e., participants made lower probability estimates of unlikely events and higher probability estimates of likely events after knowing the outcome). Slovic and Fischhoff (1977, Experiment 3) was the first study that we know of to examine the relationship between subjective feelings of surprise and hindsight bias. In this experiment, “hindsight subjects assessed the surprisingness of the reported outcome, and foresight subjects assessed how surprising each of the two possible outcomes would seem were they obtained” (Slovic & Fischhoff, 1977, p. 549). They found direct support for the hypothesis that hindsight participants, who had outcome knowledge, felt less surprised about the outcome than foresight participants, who had no outcome knowledge. Later studies investigating the role of surprise in hindsight bias either measured surprise as a subjective feeling (e.g., Hoch & Loewenstein, 1989; Ofir & Mazursky, 1997) or manipulated surprise using expected outcomes or high cognitive load (e.g., Mazursky & Ofir, 1990; Müller & Stahlberg, 2006).
In addition, some studies found that when experiencing surprise about a highly unusual outcome, individuals may show a reversed hindsight bias, such that their reconstructed probability estimates of the outcome become lower than their initial probability estimates (Mazursky & Ofir, 1990; Müller & Stahlberg, 2007; Ofir & Mazursky, 1997). The underlying rationale is that hindsight bias often results from a cognitive failure to become aware of the distorted memory and evidence reconstruction, and to recognize how much one has learned from the outcome knowledge prior to the estimation. The feeling of surprise is linked with an awareness that the outcome is different from
what they would have expected given their knowledge of the event. Therefore, when experiencing high levels of surprise, individuals are more likely to conclude that they “never would have known it,” estimating the outcome probability to be lower (rather than higher) than the estimates made by individuals without outcome knowledge (Mazursky & Ofir, 1990; Müller & Stahlberg, 2007; Ofir & Mazursky, 1997; Sanna & Schwarz, 2006).
Whereas surprise may help individuals overcome hindsight bias, overconfidence may exacerbate it, as it reduces individuals' scrutiny of their own decision-making process and hinders recognition of the impact of outcome knowledge (Bernstein et al., 2016). Winman, Juslin, and Björkman (1998) found support for a confidence-hindsight mirror effect: tasks that yielded overconfidence led to hindsight bias, whereas tasks that yielded underconfidence led to a reversed hindsight bias.
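The mirror effect depends on classifying a task as yielding over- or underconfidence. A common calibration score, assumed here because the passage does not state Winman et al.'s exact scoring, is mean confidence minus proportion correct; the sketch below uses made-up responses.

```python
from statistics import mean


def over_underconfidence(confidence, correct):
    """Mean confidence minus proportion correct (a common calibration score).

    confidence: subjective probabilities (0-1) that each answer is correct
    correct:    1 if the corresponding answer was correct, else 0
    Positive values indicate overconfidence, negative values underconfidence.
    """
    if len(confidence) != len(correct):
        raise ValueError("confidence and correct must have the same length")
    return mean(confidence) - mean(correct)


# Made-up task: average confidence 0.80, accuracy 0.60 -> overconfident.
score = over_underconfidence([0.9, 0.8, 0.7, 0.8, 0.8], [1, 0, 1, 1, 0])
print(f"over/underconfidence = {score:+.2f}")
```

Under the mirror-effect account, a task with a positive score would be expected to produce hindsight bias and a task with a negative score a reversed bias.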
The effects of overconfidence and hindsight bias may escalate. For example, physicians may become more overconfident about their judgments of certain physiological indices over time due to accumulated outcome knowledge, which can lead to increasingly strong hindsight bias (Arkes, 2013). However, studies indicated little to no relationship between physicians' confidence in their judgments of physiological indices and the actual accuracy of those judgments (e.g., Dawson et al., 1993; Yang & Thompson, 2010). Thus, without proper caution, the escalation of overconfidence and hindsight bias may lead to undesirable consequences in high-stakes decisions.
Other studies investigated the role of task difficulty in hindsight bias (e.g., Harley, Carlsen, & Loftus, 2004), based on the assumption that task difficulty is related both to surprise about the outcome and to confidence about the accuracy of one's own judgment (Winman et al., 1998). The arguments are similar to those regarding surprise and confidence.
4. Implications of hindsight bias for Science
Hindsight bias holds implications for science, and shows the importance of the ongoing credibility revolution in promoting open science practices (Hom Jr & Van Nuland, 2019; Kerr, 1998; Nosek, Ebersole, DeHaven, & Mellor, 2018; Shrout & Rodgers, 2018; Veldkamp, 2017). First, retrospective hindsight bias suggests that being presented with a study's outcome may lead to overestimating the probability of that outcome. This may result in the skewed perception that this outcome was the expected result and in line with one's own expectations even when that was not the case. Past research has shown that when evaluating research findings, individuals who had outcome knowledge perceived the findings to be more obvious and inevitable than individuals who had no outcome knowledge (Wong, 1995). The false belief of having known the outcome all along may lead to Hypothesizing After the Results are Known (HARKing; i.e., presenting a post-hoc hypothesis as if it were an a priori hypothesis; Kerr, 1998), which has been identified as a questionable research practice (QRP). HARKing makes exploratory analyses seem as if they were confirmatory, thereby leading to overconfidence in the reported findings and fewer follow-up confirmatory studies, and overall increasing the rate of false-positive findings in the literature (Bosco, Aguinis, Field, Pierce, & Dalton, 2016; Hom Jr & Van Nuland, 2019; John, Loewenstein, & Prelec, 2012; Shrout & Rodgers, 2018). To guard against hindsight bias, researchers have recommended adopting open-science best practices such as pre-registration, Registered Reports, and openly sharing all predictions and decisions throughout the entire research lifecycle (Nosek et al., 2018; van't Veer & Giner-Sorolla, 2016).
Second, prospective hindsight bias may result in overestimating the robustness and generalizability of an initial finding, believing that replications of a study would produce the same findings, and that replications are therefore of no value and a waste of resources. There are currently immense pressures for novelty in science, discouraging researchers from conducting replications (Nosek, Spies, & Motyl, 2012). Then, even if researchers do conduct a replication study, the combination of hindsight bias and confirmation bias (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012) may lead them to analyze the data and interpret replication findings in a way that favors the initial findings, or to feel pressured to do so by original authors, reviewers, editors, and other gatekeepers in the publication, promotion, and grant systems who take the original findings for granted or perceive them as more authoritative. One way of addressing these problems is by encouraging direct, close, open replications by multiple third-party researchers (Brandt et al., 2014; Nosek et al., 2012; Nosek et al., 2018). Several mass open-science collaboration teams have been formed in the last decade to pursue this direction, such as the Psychological Science Accelerator (Moshontz et al., 2018), the Collaborative Replications and Education Project (Wagge et al., 2019), and Many Labs (e.g., Ebersole et al., 2020; Klein et al., 2018).
However, the success of these initiatives depends on slow-to-change publication, granting, and promotion systems that may hinder these efforts. For example, grant authorities may be reluctant to fund replications, and reviewers and editors reluctant to publish them, perceiving that the research question has already been addressed and that replications therefore make no contribution. This proposed impact of hindsight bias on the estimation of replication outcomes and on the evaluation of the contribution of replication studies awaits empirical tests. Initial findings regarding Registered Reports, a format in which journals accept peer-reviewed pre-registrations for publication prior to data collection, both demonstrate these issues and show promise in addressing them (Chambers & Tzavella, 2020; Scheel, Schijen, & Lakens, 2021).
5. Current investigation: Two replications, extensions, and a new study
In this research, we conducted a close replication of hindsight bias in retrospective judgment (Study 1), a close replication of hindsight bias in prospective judgment (Study 2), and a study examining possible hindsight bias regarding the replicability of hindsight bias (Study 3).
We aimed to address mixed evidence regarding the magnitude and generalizability of hindsight bias. An early meta-analysis conducted by Christensen-Szalanski and Willham (1991) on 122 studies of hindsight bias suggested a small effect size of d = 0.35, 95% confidence interval (CI) [0.28, 0.41] (sample-size corrected effect size d = 0.52, 95% CI [0.43, 0.61]). A more recent meta-analysis based on 252 independent effect sizes revealed a similar sample-size-corrected effect of d = 0.39, 95% CI [0.36, 0.42] (Guilbault, Bryant, Brockway, & Posavac, 2004). In contrast to the two meta-analyses, the initial study of hindsight bias by Fischhoff (1975) suggested a much larger effect size (d = 1.13) for the supported contrasts between foresight and hindsight. A replication of Fischhoff and colleagues' classic hindsight bias studies may help examine the replicability of the effect using the same stimuli four and a half decades later, and provide an up-to-date estimate of the effect to aid researchers in designing follow-up studies (Simons, Holcombe, & Spellman, 2014).
We aimed to revisit and examine the replicability of these classic findings, in response to calls for a credibility revolution following what has been termed a “replication/reproducibility crisis” in psychology (e.g., Klein et al., 2018; Open, 2015) and in science overall (Camerer et al., 2016; Camerer et al., 2018; Gelman & Loken, 2013; Ioannidis, 2005). Datasets and code for the three studies were shared on: https://osf.io/nrwpv/.
5.1. Two pre-registered close replications
We chose Experiment 2 in Fischhoff (1975) as a target for replication for three reasons. First, this article is one of the first rigorous demonstrations of hindsight bias (Fischhoff, 2007; Hoch & Loewenstein, 1989). At the time of writing, the article had 3073 citations according to Google Scholar. Second, the study was conducted in the 1970s and employed simplified statistics and reporting. By revisiting these classic methods and stimuli, we aimed to refresh and update the methods and reporting to meet current best practices in psychological science. To our knowledge, and based on our communication with the author, this study is the first direct replication of the target experiment.
We chose Slovic and Fischhoff's (1977) Experiment 1 for replication for three key reasons. First, this experiment investigates prospective judgments, in which participants predict the probability of outcomes in future trials. In such judgments, hindsight bias is thought to have occurred if the forecast of the probability in future trials is affected by outcome knowledge of the initial trial. The article has received much attention, with 531 citations according to Google Scholar at the time of writing. Examining prospective judgments is important because hindsight bias may lead to biases in generalized evaluations of research and investigations based on initial, preliminary findings (Slovic & Fischhoff, 1977). By examining both retrospective judgments (in Study 1) and prospective judgments (in Study 2), we aimed to provide a more complete view of how outcome knowledge affects judgment and decision making.
Second, although Davis and Fischhoff (2014) conducted a replication of the target experiment, we thought it worthwhile to conduct a pre-registered replication by an independent external research team with no direct relationship to the original authors. As suggested by various replication protocols (e.g., KNAW: Royal Dutch Academy of Arts and Sciences, 2018; Simons et al., 2014), independent replications by researchers from a different team can help reduce biases and increase credibility. Unlike Davis and Fischhoff (2014), our study was pre-registered, and it was conducted on a larger sample (N = 608 versus N = 173 after filtering out the responses of 95 participants who failed the attention checks). Pre-registration is increasingly seen as important in limiting researchers' degrees of freedom and protecting against hindsight fallacy, as it helps reduce the possibility of consciously or unconsciously modifying beliefs about the hypotheses and about the planned handling of data collection and analysis.
Overall, the two close replications answer calls for more pre-registered direct replication studies and open-science transparent reporting to increase the credibility and trustworthiness of published findings (Gelman & Loken, 2013; Munafò et al., 2017; Nosek & Lakens, 2014). Such efforts are particularly important in light of recent findings of lower-than-expected replicability rates of classic findings in mass pre-registered replications (Camerer et al., 2018; Klein, Hardwicke, et al., 2018; Open, 2015).
Both replication experiments were pre-registered on the Open Science Framework prior to data collection (Study 1: https://osf.io/5bfjg; Study 2: https://osf.io/75h98).
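The effect sizes cited in this section (meta-analytic d = 0.35 and d = 0.39 versus d = 1.13 in Fischhoff, 1975) imply very different sample sizes for a new study. As a rough illustration only, assuming a simple two-group comparison of means, a two-sided α of .05, and 80% power (not necessarily the authors' own power analysis), the normal approximation n = 2(z₁₋α/₂ + z_power)² / d² per group gives:

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2.
    It slightly underestimates what an exact t-test calculation requires.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for label, d in [("Fischhoff (1975)", 1.13),
                 ("Guilbault et al. (2004)", 0.39),
                 ("Christensen-Szalanski & Willham (1991)", 0.35)]:
    print(f"{label}: d = {d}, n per group ~ {n_per_group(d)}")
```

The planned sample grows roughly with 1/d², which is why a study powered on an inflated original estimate can be badly underpowered if the true effect is closer to the meta-analytic values.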

5.2. Extensions: Surprise and overconfidence
In addition, we added several extensions. Although the role of surprise and (over)confidence in hindsight bias seems widely accepted, our knowledge about their effects is in fact limited. First, the relationship between receiving outcome knowledge and surprise about the outcome needs further clarification. Some studies found that participants with outcome knowledge were less surprised by the outcome than those without outcome knowledge (e.g., Slovic & Fischhoff, 1977), whereas other studies found surprise to be a moderator of hindsight bias (e.g., Ofir & Mazursky, 1997). Second, there are multiple ways of manipulating and measuring surprise (e.g., high/low probability of the outcome, warning/no warning about a stimulus, congruence/incongruence with outcome expectation; see Ash, 2009; Nestler & Egloff, 2009; Ofir & Mazursky, 1997; Pezzo, 2003; Slovic & Fischhoff, 1977), yet these are often disjointed. For example, Pezzo (2003) manipulated surprise by outcome feedback that was either congruent or incongruent with participants' expectations, yet found that “regardless of whether outcomes were generally congruent or incongruent, people who found them to be still surprising after 5 minutes of thought showed less hindsight bias” (p. 430). Third, theoretical arguments in past research suggested that surprise and confidence may mediate and/or moderate the relationship between hindsight (vs. foresight) and probability estimates, yet past studies seldom explicitly and systematically tested these mechanisms.
We therefore proposed extensions regarding the roles of surprise and confidence. In Study 1, we tested the mediating and moderating roles of surprise. In Study 2, we tested the mediating and moderating roles of surprise, overconfidence, and task difficulty.
5.3. New study: Hindsight bias over replicability of hindsight bias
The purpose of the third study was to examine hindsight bias regarding the perceived replicability of hindsight bias. In our other replication work, we are often faced with reviewers who argue that our replication findings are not surprising, regardless of whether the replications were successful or not, and who claim that our replications add nothing new. Study 3 aimed to address these issues directly, and to show the importance and generalizability of hindsight bias research, by testing whether, ironically, judgments about replications of hindsight bias may themselves be subject to hindsight bias.
In this study, we asked participants to contemplate the study design of Fischhoff's (1975) Experiment 2 and then to estimate the probabilities of a successful replication and of a failed replication. If hindsight bias holds, then participants who were informed of the outcome of the replication study would estimate the probability of that outcome to be higher than would participants who did not know the outcome and participants who were informed of the opposite outcome.
This study was pre-registered on the Open Science Framework prior to data collection (Study 3: https://osf.io/qyznw).
6. Study 1: Replicating Experiment 2 of Fischhoff (1975)
6.1. Target experiment and hypotheses
6.1.1. Replication: Retrospective hindsight bias
In Experiment 2 of Fischhoff (1975), 172 students from an introductory statistics class at an Israeli university participated in the study (details available in the Supplementary Materials). Participants first read a passage describing an event, and were then asked to estimate the probabilities of four possible outcomes of the event. Participants were randomly assigned to two types of conditions: those in the Before condition did not have any outcome knowledge (i.e., they did not know which of the four outcomes actually occurred), whereas those in the After conditions were given the outcome knowledge but were asked to estimate as if they had not known the outcome. Because there were four possible outcomes for each event, there were four After conditions, each stating that one of the presented outcomes had actually occurred. Despite being asked to ignore their knowledge of the outcome, participants in the After conditions estimated a higher probability for the outcome that they were told had occurred, demonstrating hindsight bias.
We made the following prediction for the replication of Experiment 2 of Fischhoff (1975):
H1: Probability estimates (hindsight bias). Compared with participants in the Before condition, participants in the After conditions estimate a higher probability of the outcome that they knew had occurred.
6.1.2. Extension: Surprise
We proposed extension hypotheses regarding the processes leading to hindsight bias. Feelings of surprise signal the difficulty of generating alternatives to the outcome, increase the need to scrutinize the cognitive process, and deepen the extent of sense-making after receiving outcome knowledge (Bernstein et al., 2016; Pezzo, 2003; Sanna & Schwarz, 2006). Our literature review suggested that surprise could play one or both of two roles in hindsight bias. The first role is as an indicator or an accompanying outcome of hindsight bias. An implicit and untested inference of this line of reasoning is that surprise is an intermediate outcome in the cognitive processes leading to hindsight bias. For example, Slovic and Fischhoff (1977) suggested that hindsight bias occurred when outcome knowledge led individuals to feel less surprised and biased their probability estimates toward the known outcome. The second role is as a required condition that shapes the magnitude of hindsight bias, that is, a moderator of hindsight bias. For example, Sanna and Schwarz (2006) argued that hindsight bias occurs when individuals feel the outcome is unsurprising, and that it could reverse when individuals feel the outcome is surprising (i.e., the “I never would have known it” effect or the “backfire effect”; Hawkins & Hastie, 1990; Hoch & Loewenstein, 1989). Some models considered both roles of surprise simultaneously. For example, Pezzo's (2003) sense-making model suggested that a surprising outcome is required to trigger sense-making activities (surprise as a moderator); while the person might experience some initial surprise (surprise as a mediator), successful sense-making activities lead to hindsight bias and reduce end-state feelings of surprise (surprise as an accompanying outcome).
We therefore tested three effects of surprise: as an outcome of experimental condition, as a mediator of the effect of experimental condition on probability estimates, and as a moderator of the effect of experimental condition on probability estimates.² To test these effects, we asked participants to report their feelings of surprise about the outcome. We proposed that:
H2: Surprise ratings (extension).
H2a: Compared with participants in the Before condition, participants in the After conditions report lower levels of surprise regarding the outcome that they knew had occurred.
H2b: Surprise mediates the relationship between outcome knowledge and probability estimates (exploratory).
H2c: Surprise moderates the relationship between outcome knowledge and probability estimates, such that hindsight bias is stronger in the low-surprise group than in the high-surprise group (exploratory).
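H2b and H2c assign surprise two different statistical roles. The sketch below is a simplified illustration under our own assumptions, not the pre-registered analysis: condition is coded 0 (Before) / 1 (After), mediation is estimated with the product-of-coefficients approach using ordinary least squares, and moderation is shown as a median split into low- and high-surprise groups, mirroring H2c's wording. All function names and data are hypothetical. In practice, an indirect effect is usually reported with a bootstrap confidence interval, and moderation is often tested with an interaction term rather than a split.

```python
from statistics import mean, median


def slope(x, y):
    """OLS slope from regressing y on a single predictor x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate of the X -> M -> Y indirect effect.

    a: effect of condition x (0 = Before, 1 = After) on surprise m.
    b: effect of m on estimates y controlling for x. By the Frisch-Waugh-Lovell
       theorem this equals the slope of residualized y on residualized m.
    """
    a = slope(x, m)
    b = slope(residualize(x, m), residualize(x, y))
    return a * b


def effect_by_surprise(x, m, y):
    """Hindsight effect (mean After - mean Before) within a median split on m.

    Raises statistics.StatisticsError if a split lacks Before or After cases.
    """
    cut = median(m)
    groups = {"low": lambda s: s <= cut, "high": lambda s: s > cut}
    result = {}
    for name, in_group in groups.items():
        after = [yi for xi, mi, yi in zip(x, m, y) if in_group(mi) and xi == 1]
        before = [yi for xi, mi, yi in zip(x, m, y) if in_group(mi) and xi == 0]
        result[name] = mean(after) - mean(before)
    return result


# Synthetic data built so that y = 10*x - 2*m exactly (a = -3, b = -2).
x = [0, 0, 0, 0, 1, 1, 1, 1]
m = [6, 5, 6, 5, 3, 2, 3, 2]
y = [10 * xi - 2 * mi for xi, mi in zip(x, m)]
print("indirect effect a*b:", indirect_effect(x, m, y))

# Synthetic data where the After-Before gap is larger at low surprise.
m2 = [1, 5, 2, 6, 1, 5, 2, 6]
y2 = [30, 30, 30, 30, 50, 35, 50, 35]
print("hindsight effect by surprise:", effect_by_surprise(x, m2, y2))
```

In the first synthetic example, outcome knowledge lowers surprise (a < 0) and lower surprise raises estimates (b < 0), so the indirect effect a*b is positive, the pattern a mediation account in the spirit of H2b would predict. The second example shows the H2c pattern: a larger hindsight effect in the low-surprise group.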

2 A variable can be both a mediator and a moderator of a relationship (James & Brett, 1984; Judd, Kenny & McClelland, 2001; Karazsia & Berlin, 2018). Such relationships have been tested in previous studies (e.g., Connor-Smith & Compas, 2002; Wei, Mallinckrodt, Russell & Abraham, 2004; Zhou, Wang, Chen & Shi, 2012).

6.2. Method
6.2.1. Power analysis
The planned sample size for the replication study was calculated based on an effect size of d = 1.13, 95% CI [0.44, 1.82] for a single before-after contrast, estimated from the target experiment (see Supplementary Materials for details). We conducted a power analysis using G*Power (Faul, Erdfelder, Buchner, & Lang, 2009). To achieve a statistical power of 95% with an alpha of 0.05 (two-tailed), a sample size of 46 per comparison would be required. Because the study adopted a between-subjects design (4 events with 4 possible outcomes each), we approximated a total sample size of 46 × 4 × 4 = 736. In consultation with the original author and the editor, we removed the stimuli and results relating to Events C and D. We therefore updated this analysis post hoc to indicate a total required sample size of 46 × 2 × 4 = 368.

6.2.2. Participants
A total of 442 American participants were recruited from Amazon Mechanical Turk through CloudResearch (Litman, Robinson, & Abberbock, 2017) (245 females, 196 males, 1 undisclosed; Mage = 39.78, SDage = 11.46; see Supplementary Materials for details about sample characteristics). The descriptives in this section were updated to reflect the exclusion of data collected for Events C and D, explained below.

6.2.3. Procedure and materials
The materials used in this replication study were obtained from the author of the target experiment (see Supplementary Materials). There were four events: Event A, the British-Gurka struggle; Event B, the near-riot in Atlanta; Event C, Mrs. Dewar in therapy; and Event D, George in therapy. In consultation with the original author and the editor, we removed the descriptions of the stimuli of Events C and D and the related findings. Together, we strongly believe that these stimuli should no longer be used in future research.
Events A and B were each described in a passage ranging from 185 to 235 words in length, followed by four possible outcomes. For example, Event A described a war between the British and the Gurkas in South Asia in 1814. The four possible outcomes were: (1) British resulted in victory; (2) Gurka resulted in victory; (3) The two sides reached a military stalemate, but were unable to come to a peace settlement; (4) The two sides reached a military stalemate and came to a peace settlement.
This study used a between-subjects design. Participants were randomly assigned to one of five experimental conditions: one Before condition and four After conditions (each associated with one informed outcome). Each participant was presented with one of the two events used in the target experiment; that is, each participant was assigned to one of the 5 (condition) × 2 (event) cells. Participants in the Before condition read the assigned passage alone, whereas participants in the After conditions read the assigned passage followed by a sentence that provided the outcome knowledge (e.g., Outcome: British resulted in victory).
Participants were then asked a comprehension question: “To make sure you read and understood the scenario, please answer the following comprehension question: What was the outcome of the event?” To proceed to the next stage of the experiment, participants in the Before condition had to choose “The case did not indicate the outcome,” whereas participants in the After conditions had to choose the informed outcome.
6.2.3.1. Probability estimates. Participants were asked to provide probability estimates for each of the four possible outcomes of the event. For the Before condition, the question read, “In light of the information appearing in the passage, please estimate the probability of occurrence of each of the four possible outcomes listed below. There are no right or wrong answers, answer based on your intuition. (The probabilities should sum to 100%)”. For the After conditions, in addition to the sentences above, participants also read “Answer as if you do not know the outcome, estimating the case at that time before outcomes were known.”
6.2.3.2. Surprise ratings. Following the probability estimates, participants were asked to rate their levels of surprise (i.e., “How surprised would you be if the outcome was that the (outcome)?”) on a 7-point Likert scale (1 = Not surprised at all, 7 = Very surprised). Participants in the Before condition were asked to rate their surprise regarding all four possible outcomes; participants in the After conditions were only asked to rate their surprise regarding the informed outcome.

6.2.4. Replication evaluation: Very close replication
Our replication study is a very close replication based on the criteria proposed in LeBel, Berger, Campbell, and Loving (2017) and LeBel, McCarthy, Earp, Elson, and Vanpaemel (2018). According to LeBel and colleagues' taxonomy, a very close replication shares the same independent variable (IV) operationalization, dependent variable (DV) operationalization, IV stimuli, and DV stimuli with the original study; only the procedural details, physical setting, and contextual variables (e.g., linguistic or cultural adaptations) differ from the original study. Similarly, Brandt et al. (2014, p. 218) wrote that “close replications refer to those replications that are based on methods and procedures as close as possible to the original study … ideally the only differences between the two are the inevitable ones (e.g., different participants…).” In Study 1, the IV operationalization, DV operationalization, IV stimuli, and DV stimuli were all the same as those used in the original study, with a few necessary adjustments to improve on the design or to accommodate contextual requirements. See Table 1 for a summary of the classification, necessary adjustments, and theoretical extensions.
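The sample-size figures in the power analysis (Section 6.2.1) can be cross-checked with a normal approximation, sketched below. This is not G*Power's exact computation, which works with the noncentral t distribution, so it is expected to land slightly below G*Power's figure. The text does not say how the 46 participants per comparison are divided, so we assume they are split evenly between the Before and After groups. We also assume the Mann-Whitney U test is handled through its asymptotic relative efficiency of 3/π relative to the t-test under normality. `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.95, are=1.0):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison with standardized effect size d.

    `are` is the asymptotic relative efficiency of the chosen test relative
    to the t-test (1.0 = t-test; 3/pi = Mann-Whitney U under normality).
    Dividing by it inflates the required n for the less efficient test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) / d) ** 2 / are
    return math.ceil(n)


n_t = n_per_group(1.13)                    # t-test approximation
n_mw = n_per_group(1.13, are=3 / math.pi)  # Mann-Whitney, efficiency-adjusted
print(n_t, n_mw)  # 21 22

# Scaling the reported 46 per comparison up to the full and reduced designs
per_comparison = 46
print(per_comparison * 4 * 4, per_comparison * 2 * 4)  # 736 368
```

With the efficiency adjustment, the approximation gives 22 per group (44 per comparison). That is within one participant per group of the assumed 23 + 23 = 46.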

Table 1
Study 1: Classification of the replication, based on LeBel et al. (2018).

Design facet                  Replication
IV operationalization         Same
DV operationalization         Same
IV stimuli                    Same
DV stimuli                    Same
Procedural details            Similar
Physical settings             Different
Contextual variables          Similar
Replication classification    Very close replication

Details of deviation:
• Changed the word “Negro” into “African American” in the passage of Event A.
• Added surprise measure after the replication.
• Used a larger sample size: Original study: 172; Replication study: 890.
• Added one comprehension question for each scenario.
• Added funnel questions at the end of the study.
• Changed from offline data collection (participants were students from Hebrew University and the University of the Negev) to online data collection (participants were recruited from CloudResearch).

Note. IV = independent variable, DV = dependent variable.

6.3. Results
6.3.1. Replication: Probability estimates
We summarized the descriptives of the probability estimates in Table 2. Violin plots of the probability estimates are available in Supplementary Materials. The numbers of interest are the probability estimate of an outcome in the Before condition and the probability estimate of that same outcome in the After condition in which this outcome was reported to have occurred (the key comparisons marked in Table 2).
Because there are two events with four outcomes each, we conducted 8 sets of Mann-Whitney U tests. As shown in Table 3, in 7 of the 8 sets of comparisons (all except Event A-Outcome 2), the probability estimates in the After condition were significantly higher than those in the Before condition. The results remained largely the same when we adjusted the p values using the Benjamini and Hochberg (1995) false discovery rate control method.
Historically, the correct outcome of both Events A and B was Outcome 1, yet the mean probability estimates of these two outcomes in the Before condition were not higher than chance (25% with four possible outcomes; 21.40% and 7.46%, respectively). Specifically, the probability estimate for Outcome 1 (British

Table 2
Study 1: Means and standard deviations of probability estimates (%).

                                   Outcome evaluated
Condition  n   Outcome informed    Outcome 1        Outcome 2        Outcome 3        Outcome 4
                                   M (SD)           M (SD)           M (SD)           M (SD)
Event A: British-Gurka struggle
Before     43  None                21.40* (18.17)   38.61* (26.60)   23.49* (19.93)   16.51* (15.53)
After      45  Outcome 1           45.51* (28.59)   21.18 (19.45)    19.69 (16.25)    13.62 (11.46)
After      42  Outcome 2           26.05 (20.35)    43.62* (23.62)   18.48 (18.66)    11.86 (9.52)
After      44  Outcome 3           21.93 (17.13)    23.18 (16.14)    31.59* (19.61)   23.30 (14.10)
After      43  Outcome 4           25.49 (17.84)    28.40 (22.72)    18.72 (15.97)    27.40* (23.98)
Event B: Near-riot in Atlanta
Before     46  None                7.46* (9.25)     25.91* (23.88)   12.91* (18.43)   53.72* (26.66)
After      46  Outcome 1           25.44* (23.11)   22.63 (17.58)    22.28 (21.88)    29.65 (18.76)
After      44  Outcome 2           11.61 (12.50)    50.02* (29.13)   9.52 (10.34)     28.84 (22.18)
After      44  Outcome 3           15.23 (13.64)    17.50 (12.60)    29.77* (28.53)   37.50 (24.53)
After      45  Outcome 4           9.87 (12.18)     12.98 (12.24)    11.20 (16.82)    65.96* (27.76)

Note: Asterisks (*) mark the key sets of comparison of interest (i.e., the Before and After probability estimates of the same outcome). The foresight ratings of all four outcomes came from the same participants in the foresight condition. The hindsight ratings of the four outcomes came from participants in the four hindsight conditions, respectively. Following a discussion with the lead original author and the editor, Events C and D (about therapy) have been removed from reporting due to problematic stimuli in the target article.

Table 3
Study 1: Mann-Whitney U tests of probability estimate differences between Before and After conditions.

After − Before     Mean rank diff.  U      z     p       padjusted  r     ϕ     95% CI for ϕ   d     95% CI for d
Event A Outcome 1  23.0             462    4.24  <0.001  <0.001     0.45  0.76  [0.65, 0.84]   1.00  [0.53, 1.46]
Event A Outcome 2  5.8              780    1.09  0.277   0.277      0.12  0.57  [0.45, 0.68]   0.20  [−0.23, 0.63]
Event A Outcome 3  11.5             695    2.15  0.032   0.043      0.23  0.63  [0.51, 0.74]   0.41  [−0.02, 0.84]
Event A Outcome 4  14.0             624.5  2.62  0.009   0.014      0.28  0.66  [0.54, 0.76]   0.54  [0.10, 0.97]
Event B Outcome 1  26.7             444    4.87  <0.001  <0.001     0.51  0.79  [0.68, 0.87]   1.02  [0.56, 1.48]
Event B Outcome 2  24.6             459.5  4.48  <0.001  <0.001     0.47  0.77  [0.66, 0.85]   0.91  [0.45, 1.36]
Event B Outcome 3  20.9             543    3.82  <0.001  <0.001     0.40  0.73  [0.62, 0.82]   0.71  [0.26, 1.14]
Event B Outcome 4  11.3             778.5  2.05  0.041   0.047      0.21  0.62  [0.50, 0.73]   0.45  [0.03, 0.87]

Note. We calculated three effect sizes for the Mann-Whitney U tests: r (the correlation between being in the hindsight condition and winning in the rank comparison with the other condition; see Fritz, Morris, & Richler, 2012), ϕ (the probability that a score in the hindsight condition was higher than that in the foresight condition; see Fay & Malinovsky, 2018), and Cohen's d (the standardized difference in mean ranking between the hindsight condition and the foresight condition, assuming that the rankings follow a normal distribution; see Cohen, 1988). p values were adjusted using the Benjamini and Hochberg (1995) false discovery rate control method. Following a discussion with the lead original author and the editor, Events C and D (about therapy) have been removed from reporting due to problematic stimuli in the target article.


Table 4
Study 1 Extension: Means and standard deviations of surprise ratings.

                   Outcome evaluated
                   Outcome 1         Outcome 2         Outcome 3         Outcome 4
Condition          n   M     SD      n   M     SD      n   M     SD      n   M     SD
Event A: British-Gurka struggle
Before             43  4.35  2.14    43  3.95  2.16    43  3.42  1.76    43  4.53  1.84
After              45  3.20  2.00    42  4.10  1.88    44  3.41  1.76    43  4.60  1.55
Event B: Near-riot in Atlanta
Before             46  5.89  1.55    46  2.78  1.55    46  5.46  1.57    46  1.96  1.38
After              46  5.17  1.70    44  2.91  1.65    44  5.36  1.94    45  1.91  1.44

Note. The foresight ratings of all four outcomes came from the same participants in the foresight condition. The hindsight ratings of the four outcomes came from participants in the four hindsight conditions, respectively. Hindsight participants only rated their surprise about the outcome that they knew had occurred. Following a discussion with the lead original author and the editor, Events C and D (about therapy) have been removed from reporting due to problematic stimuli in the target article.

resulted in victory) in Event A (Before condition) was not significantly different from chance (one-sample t-test: t = −1.30, df = 42, p = .200, d = −0.20). The probability estimate for Outcome 1 (dispersion and no outbreak of violence) in Event B (Before condition) was the lowest among those for all four outcomes, and it was significantly smaller than chance (one-sample t-test: t = −12.87, df = 45, p < .001, d = −1.90). These results suggest that the participants did not have much knowledge about the historical background of these two events. This alleviates the concern that knowledge gained before the study affected participants' reactions to these two experimental stimuli. Event B is the only event linked to American history. These findings therefore address the concern that using an American sample (versus the Israeli sample used in the original study) reduced the task difficulty of this question or affected the magnitude of hindsight bias.
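The two chance-level tests above can be reproduced from the Table 2 summary statistics alone. The sketch below assumes chance is 25% (one of four outcomes). It computes d as the mean difference from chance divided by the sample SD. Because the inputs are rounded to two decimals, Event B's t comes out as −12.86 rather than the reported −12.87. A p value would also require the t distribution (for example, SciPy's `scipy.stats.t.sf`), which is omitted here to keep the sketch dependency-free. The helper name `one_sample_vs_chance` is hypothetical.

```python
import math


def one_sample_vs_chance(mean, sd, n, chance=25.0):
    """One-sample t statistic, degrees of freedom, and Cohen's d comparing
    a mean probability estimate (in %) with a chance level, computed from
    summary statistics rather than raw data."""
    diff = mean - chance
    se = sd / math.sqrt(n)  # standard error of the mean
    t = diff / se
    d = diff / sd           # standardized against the sample SD
    return t, n - 1, d


# Before-condition estimates of Outcome 1, taken from Table 2
t_a, df_a, d_a = one_sample_vs_chance(21.40, 18.17, 43)  # Event A
t_b, df_b, d_b = one_sample_vs_chance(7.46, 9.25, 46)    # Event B
print(round(t_a, 2), df_a, round(d_a, 2))  # -1.3 42 -0.2
print(round(t_b, 2), df_b, round(d_b, 2))  # -12.86 45 -1.9
```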
Because Mann-Whitney U tests are nonparametric, we calculated three effect sizes:
(1) r, the correlation between experimental group membership and whether the rank is higher or lower than the other group, computed as r = z/√N (see Fritz et al., 2012);
(2) ϕ, the probabilistic index reflecting the likelihood that a score in the After condition is higher than a score in the Before condition, with ties counted as one half. This index equals the area under the receiver operating characteristic curve; its confidence interval was estimated under the proportional odds assumption (see Fay & Malinovsky, 2018);
(3) Cohen's d, the standardized difference between the mean rankings of the two groups, assuming that the rankings in the two groups follow a normal distribution (Cohen, 1988).
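The three effect sizes can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' analysis code. z uses the plain normal approximation without a tie correction, and Cohen's d is computed on the pooled ranks with a pooled within-group SD. The proportional-odds confidence interval for ϕ (Fay & Malinovsky, 2018) is not implemented. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_effects(before, after):
    """U, z, r, phi, and rank-based Cohen's d for After vs. Before."""
    n1, n2 = len(before), len(after)
    ranks = rank_with_ties(list(before) + list(after))
    r_before, r_after = ranks[:n1], ranks[n1:]
    # U for the After group = number of (before, after) pairs with
    # after > before, counting ties as one half
    u_after = sum(r_after) - n2 * (n2 + 1) / 2
    u_before = n1 * n2 - u_after
    phi = u_after / (n1 * n2)  # P(after > before) + 0.5 * P(tie)
    # Normal approximation for z (no tie correction in this sketch)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_after - mu) / sigma
    r = z / math.sqrt(n1 + n2)  # r = z / sqrt(N), Fritz et al. (2012)
    # Cohen's d on the ranks, using a pooled within-group SD
    sp = math.sqrt(((n1 - 1) * stdev(r_before) ** 2
                    + (n2 - 1) * stdev(r_after) ** 2) / (n1 + n2 - 2))
    d = (mean(r_after) - mean(r_before)) / sp
    return {"U": min(u_before, u_after), "z": z, "r": r, "phi": phi, "d": d}


print(mann_whitney_effects([1, 2], [2, 3])["phi"])  # 0.875
```

The reported U in Table 3 appears to be the smaller of the two U statistics. For Event A Outcome 1, 1 − 462/(43 × 45) ≈ 0.76 matches the reported ϕ, and 4.24/√(43 + 45) ≈ 0.45 matches the reported r.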
As shown in Table 3, the correlations r between being in the hindsight condition and winning in the rank comparison with the other condition were all positive. They ranged from small to large (r = 0.12 to 0.51; Cohen, 1988). The 95% confidence intervals of ϕ, the probability that a score in the hindsight condition was higher than that in the foresight condition, excluded 0.50 in all but one comparison (Event A-Outcome 2). However, when we calculated Cohen's d under the assumption that the rankings were normally distributed, two comparisons had confidence intervals that included zero (Event A-Outcome 2 and Event A-Outcome 3). The Cohen's d effects were mostly medium to large.

6.3.2. Robustness checks: Alternative tests and exclusion criteria
To examine the robustness of the findings, we conducted additional analyses on the probability estimates (see Supplementary Materials). Results of Student's independent samples t-tests of probability estimates were largely consistent with the results of the Mann-Whitney U tests. We also analyzed only the participants who met a set of preregistered criteria: they understood the English used in the study, were serious about the study, and did not correctly guess its purpose. In this subsample, the results regarding the probability estimates remained mostly the same. We concluded robust support for Hypothesis 1.
6.3.3. Extension: Surprise ratings
We detailed the descriptives of the surprise ratings in Table 4. Violin plots of the surprise ratings are available in Supplementary Materials.
As in the analyses of probability estimates, we conducted 8 sets of Mann-Whitney U tests to compare surprise ratings between the Before condition and the After conditions. As shown in Table 5, two sets of comparisons were significant, based on the p value and the confidence interval of ϕ. Specifically, for Event A Outcome 1 and Event B Outcome 1, surprise ratings in the After condition were significantly lower than those in the Before condition, and the effect sizes were small to medium. The remaining six comparisons were not significant. When we adjusted the p values using the Benjamini and Hochberg (1995) false discovery rate control method, these two comparisons remained significant (padjusted = 0.044; Table 5). Results of Student's independent samples t-tests of surprise ratings (see Supplementary Materials) were largely consistent with the results of the Mann-Whitney U tests. Overall, the results provided limited support for Hypothesis 2(a) regarding surprise ratings.
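The Benjamini and Hochberg (1995) adjustment used in Tables 3 and 5 is a short step-up procedure. A minimal Python sketch follows; the function name is ours. Applied to the unadjusted p values in Table 5, it reproduces the reported padjusted column.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (1995) adjusted p values, returned in input order.

    For the i-th smallest p value, adj_(i) = min over j >= i of p_(j) * m / j,
    capped at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p value, keeping a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Unadjusted p values from Table 5 (surprise ratings), in table order
p_surprise = [0.011, 0.752, 0.897, 0.725, 0.010, 0.795, 0.719, 0.833]
print([round(p, 3) for p in benjamini_hochberg(p_surprise)])
# [0.044, 0.897, 0.897, 0.897, 0.044, 0.897, 0.897, 0.897]
```

The same function reproduces Table 3's adjusted values (0.043, 0.014, 0.047) if the entries reported as <0.001 are replaced by any placeholder below 0.001. Those placeholders only affect their own adjusted values.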
We found no support for exploratory Hypothesis 2b that surprise acted as a mediator of the relationship between outcome knowledge and probability estimates. We found mixed support for exploratory Hypothesis 2c that surprise acted as a moderator, such that the relationship between outcome knowledge and probability estimates was stronger when surprise was lower rather than higher. However, in our original analysis, when all four events were included, we did not find support for the moderating effect of surprise. We decided to remove results related to Events C and D, which is a deliberate deviation from the preregistration. We therefore caution readers that the evidence for the moderating effect of surprise in Study 1 conflicts depending on which events are included in the analysis. We provided all related details and analyses in the Supplementary Materials.

Table 5
Study 1 Extension: Mann-Whitney U tests of differences in surprise between Before and After conditions.

After − Before     Mean rank diff.  U       z      p      padjusted  r      ϕ     95% CI for ϕ   d      95% CI for d
Event A Outcome 1  −13.76           665     −2.56  0.011  0.044      −0.27  0.34  [0.24, 0.46]   −0.56  [−0.99, −0.12]
Event A Outcome 2  1.67             867.5   0.32   0.752  0.897      0.03   0.52  [0.40, 0.64]   0.07   [−0.36, 0.50]
Event A Outcome 3  −0.69            931     −0.13  0.897  0.897      −0.01  0.49  [0.38, 0.61]   −0.01  [−0.43, 0.42]
Event A Outcome 4  1.86             884.5   0.35   0.725  0.897      0.04   0.52  [0.40, 0.64]   0.04   [−0.38, 0.46]
Event B Outcome 1  −13.80           740.5   −2.57  0.010  0.044      −0.27  0.35  [0.25, 0.46]   −0.44  [−0.86, −0.02]
Event B Outcome 2  1.40             980.5   0.26   0.795  0.897      0.03   0.52  [0.40, 0.63]   0.08   [−0.34, 0.49]
Event B Outcome 3  1.91             969     0.36   0.719  0.897      0.04   0.52  [0.41, 0.63]   −0.05  [−0.47, 0.36]
Event B Outcome 4  −1.03            1011.5  −0.21  0.833  0.897      −0.02  0.49  [0.39, 0.59]   −0.03  [−0.44, 0.38]

Note. p values were adjusted using the Benjamini and Hochberg (1995) false discovery rate control method. Following a discussion with the lead original author and the editor, Events C and D (about therapy) have been removed from reporting due to problematic stimuli in the target article.

Table 6
Study 1: Comparison of results of the original study and the replication study.

                       Cohen's d [95% CI]   p-value   Note
Fischhoff (1975)       1.13 [0.44, 1.82]    <0.001
Replication
  Event A Outcome 1    1.00 [0.53, 1.46]    <0.001    Signal – consistent
  Event A Outcome 2    0.20 [−0.23, 0.63]   0.277     No signal – inconsistent, smaller
  Event A Outcome 3    0.41 [−0.02, 0.84]   0.032     No signal – inconsistent, smaller
  Event A Outcome 4    0.54 [0.10, 0.97]    0.009     Signal – inconsistent, smaller
  Event B Outcome 1    1.02 [0.56, 1.48]    <0.001    Signal – consistent
  Event B Outcome 2    0.91 [0.45, 1.36]    <0.001    Signal – consistent
  Event B Outcome 3    0.71 [0.26, 1.14]    <0.001    Signal – consistent
  Event B Outcome 4    0.45 [0.03, 0.87]    0.041     Signal – inconsistent, smaller

Note: Following a discussion with the lead original author and the editor, Events C and D (about therapy) have been removed from reporting due to problematic stimuli in the target article. According to LeBel et al. (2019), there is a signal if the confidence interval of the replication effect size excludes zero. The replication result is considered consistent with the original study if the confidence interval of the replication effect size includes the effect size of the original study.

6.4. Discussion
We aimed to replicate Experiment 2 of Fischhoff (1975), a classic study of hindsight bias. Following the original study, we hypothesized that participants provided with outcome knowledge would estimate a greater probability for the outcome that they knew had occurred, compared with participants without outcome knowledge. This hypothesis was supported in 7 of the 8 sets of comparisons of probability estimates, and the effect sizes were mostly medium to large. Once participants were informed of the outcome, they perceived it to be more probable, even when they were asked to ignore it, demonstrating hindsight bias. These findings therefore support the idea that participants were either unaware of or unable to resist the influence of outcome knowledge.

6.4.1. Evaluation of replication findings: Mostly successful replication
In Table 6 we compared the results of the target experiment and the replication study using the criteria described in LeBel, Vanpaemel, Cheung, and Campbell (2019). All 8 sets of comparisons of probability estimates were in the same direction as in the original study. The replication effects were mostly medium to large, though smaller than the effect found in the original study. In 4 of the 8 comparisons, the confidence intervals of the replication effect sizes (Cohen's ds) included d = 1.13, the effect size estimated from the target experiment. In Fig. 1 we provided a forest plot of the probability estimate contrasts. Overall, we consider this replication of hindsight bias mostly successful.

Fig. 1. Study 1: forest plot for probability estimates.
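The two criteria in the Table 6 note can be written as a small classifier. The sketch below is our own illustration, and the function name `lebel_outcome` is hypothetical. It adds the “smaller”/“larger” qualifier used in the Table 6 labels but does not handle replication effects in the opposite direction. Run on the Study 1 rows, it reproduces the Note column of Table 6.

```python
def lebel_outcome(d, ci_low, ci_high, original_d):
    """Label a replication effect using the two criteria in the Table 6 note:
    'Signal' if the replication CI excludes zero, and 'consistent' if the
    replication CI includes the original effect size. Opposite-direction
    effects are not treated separately in this sketch."""
    signal = not (ci_low <= 0 <= ci_high)
    consistent = ci_low <= original_d <= ci_high
    label = ("Signal" if signal else "No signal") + " – "
    if consistent:
        label += "consistent"
    else:
        label += "inconsistent, " + ("smaller" if d < original_d else "larger")
    return label


# Event A Outcome 1 and Event A Outcome 4 from Table 6 (original d = 1.13)
print(lebel_outcome(1.00, 0.53, 1.46, 1.13))  # Signal – consistent
print(lebel_outcome(0.54, 0.10, 0.97, 1.13))  # Signal – inconsistent, smaller
```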

6.4.2. Extension: Surprise ratings
Beyond the replication, we extended the experiment by investigating an intuitive yet understudied dependent variable: the level of surprise associated with the known outcome. Judging from null hypothesis significance testing (NHST), effect sizes, and confidence intervals, 2 of the 8 sets of surprise rating comparisons were significant in the predicted direction.
Contrary to our expectations, we found no support for surprise as a mediator of the relationship between outcome knowledge and probability estimates. Additional analyses showed that surprise ratings and probability estimates were negatively correlated in both the Before condition and the After conditions (see Supplementary Materials). These results suggest that the negative correlation between surprise ratings and probability estimates may be driven by factors other than hindsight bias. The findings for the exploratory hypothesis that surprise acted as a moderator of the relationship between outcome knowledge and probability estimates were inconclusive.

7. Study 2: Replicating Experiment 1 of Slovic and Fischhoff (1977)
7.1. Target experiment and hypotheses
7.1.1. Replication: Prospective hindsight bias
In Experiment 1 of Slovic and Fischhoff (1977), 184 American participants were recruited via a university newspaper. All participants read four vignettes about scientific research. For each vignette, participants in the foresight condition read that two outcomes were possible in the first trial. Participants in the hindsight condition instead read that the first trial had been conducted and one of the two outcomes had occurred. They were then asked why they thought the outcome(s) might occur, and then predicted the probability that the previously observed outcome would recur in future research trials. The results suggested a sense of inevitability of the disclosed outcome among hindsight participants. Their predicted probabilities that the previously observed outcome would repeat were higher than those of participants in the foresight condition (d = 0.36). Davis and Fischhoff (2014) replicated this experiment and found similar effects (overall effect: 0.27–0.33, d = 0.20 to 0.44): the disclosed outcome of the initial trial was perceived to be more likely to occur in future trials in hindsight than in foresight.
We extended the original design with exploratory analyses of the mechanisms underlying hindsight bias, using a different set of materials and decisions (i.e., prospective judgments). In addition to surprise, we asked participants to report their confidence in the accuracy of their own judgments. To better understand whether the nature of the task affects hindsight bias, we also measured participants' overall perceived difficulty of the prediction task.
Following Experiment 1 in Slovic and Fischhoff (1977), we predicted that hindsight bias would be observed in prospective judgments. Individuals often use past information to form judgments about the future (Aarts, Verplanken, & Van Knippenberg, 1998; Ouellette & Wood, 1998). Outcome knowledge may change individuals' beliefs about past events. If so, those changed beliefs may produce hindsight bias when people use them to make prospective judgments. In addition, knowing the outcome of the initial trial may increase the perceived inevitability of the outcome, which in turn increases the expectation that the outcome will recur in the future. Therefore, we predicted:
H3: Participants in the hindsight condition estimate a greater probability that the outcome will continue to occur in future trials, compared with participants in the foresight condition.

7.2. Extension: Surprise, confidence, and task difficulty
For the extension hypotheses, we first examined the effects of surprise and confidence. By surprise, we refer to individuals' feelings of surprise if a particular outcome were to occur in future trials (Slovic & Fischhoff, 1977). By confidence, we refer to individuals' confidence in the accuracy of their own judgments (Granhag, Strömwall, & Allwood, 2000). We chose to study these two factors because they correspond to two mechanisms suggested to affect hindsight bias (Roese & Vohs, 2012). These are beliefs about the objective likelihood of events and subjective beliefs about one's own predictive ability.
As in Study 1, we hypothesized that surprise ratings would be lower among participants in the hindsight condition than among those in the foresight condition. We also tested whether surprise mediates or moderates the relationship between hindsight condition and probability estimates, as in Study 1.
H4: Surprise ratings (extension).
(H4a) Participants in the hindsight conditions report lower levels of surprise regarding the outcome that they knew had occurred in the initial trial, compared with participants in the foresight condition.
(H4b) Surprise mediates the relationship between the hindsight condition and probability estimates. (exploratory)
(H4c) Surprise moderates the relationship between hindsight condition and probability estimates, such that hindsight bias is stronger in the low-surprise group than in the high-surprise group. (exploratory)
Like surprise, confidence has been theorized and examined in multiple roles in hindsight bias. For example, overconfidence is often proposed as a consequence of outcome knowledge (Davis & Fischhoff, 2014; Slovic, Lichtenstein, & Fischhoff, 1988). Other studies examined the moderating role of confidence in hindsight bias. For example, Arkes, Wortmann, Saville, and Harkness (1981) found that asking for reasons for each possible outcome, a procedure that reduces overconfidence, reduced hindsight bias. Also, Werth and Strack (2003) found that the magnitude of hindsight bias was contingent on the feeling of confidence, which served as a signal of whether the individual would have known the answer. Participants who experienced higher confidence showed greater hindsight bias than participants who experienced lower confidence.
Therefore, we hypothesized that participants in the hindsight condition would report greater confidence in the accuracy of their estimates than participants in the foresight condition. Furthermore, as with surprise, we examined whether confidence mediates or moderates the relationship between hindsight condition and probability estimates.
H5: Confidence ratings (extension).
(H5a) In prospective judgments, compared with participants in the foresight condition, participants in the hindsight conditions report higher levels of confidence about the accuracy of their judgments.
(H5b) Confidence mediates the relationship between hindsight condition and probability estimates. (exploratory)
(H5c) Confidence moderates the relationship between hindsight condition and probability estimates, such that hindsight bias is stronger in the high-confidence group than in the low-confidence group. (exploratory)
To examine the effect of task characteristics, we also measured the extent to which participants perceived the task to be difficult. We expected that participants in the hindsight condition would report lower levels of task difficulty than participants in the foresight condition. The foresight condition could dilute participants' attention by asking them to consider two outcomes simultaneously. In contrast, the hindsight condition could cue participants to ignore the outcome that did not occur in the initial trial (Slovic & Fischhoff, 1977). Lower perceived task difficulty, in turn, may contribute to hindsight bias. The subjective difficulty of generating alternative outcomes can be taken as an indication that those outcomes are implausible (Harley et al., 2004; Roese & Vohs, 2012; Sanna & Schwarz, 2006). We therefore tested the following:
H6: Task difficulty (exploratory extension).

Table 7
Study 2: Questions asked in the virgin rat scenario.

Foresight condition
1. Try and estimate, what are the probabilities of the following outcomes (these probabilities should total 100%)
   Virgin rat will exhibit maternal behavior: _______
   Virgin rat will NOT exhibit maternal behavior: _______
   Total: ________
2. If the virgin rat does exhibit maternal behavior, what is the probability that in a replication of this experiment with 10 additional virgin female rats (these probabilities should total 100%)
   a. All will exhibit maternal behavior?: _______
   b. Some will exhibit maternal behavior?: _______
   c. None will exhibit maternal behavior?: _______
   Total: ________
3. If the virgin rat does exhibit maternal behavior, how surprised would you be? 1 = Not surprised at all … 5 = Extremely surprised
4. If the virgin rat does NOT exhibit maternal behavior, what is the probability that in a replication of this experiment with 10 additional virgin female rats (these probabilities should total 100%)
   a. All will exhibit maternal behavior?: _______
   b. Some will exhibit maternal behavior?: _______
   c. None will exhibit maternal behavior?: _______
   Total: ________
5. If the virgin rat does NOT exhibit maternal behavior, how surprised would you be? 1 = Not surprised at all … 5 = Extremely surprised
6. How confident are you about the accuracy of your predictions on the probability of the future outcomes of the Virgin Rat experiment? 0 = Extremely not confident … 6 = Extremely confident

Hindsight outcome A condition
Outcome: The initial virgin rat exhibited maternal behavior in the first trial.
1. What is the probability that in a replication of this experiment with 10 additional virgin female rats (these probabilities should total 100%)
   a. All will exhibit maternal behavior?: _______
   b. Some will exhibit maternal behavior?: _______
   c. None will exhibit maternal behavior?: _______
   Total: ________
2. Do you think the finding that the virgin rat exhibited maternal behavior is surprising? 1 = Not surprising at all … 5 = Extremely surprising
3. How confident are you about the accuracy of your predictions on the probability of the future outcomes of the Virgin Rat experiment? 0 = Extremely not confident … 6 = Extremely confident

Hindsight outcome B condition
Outcome: The initial virgin rat did NOT exhibit maternal behavior in the first trial.
1. What is the probability that in a replication of this experiment with 10 additional virgin female rats (these probabilities should total 100%)
   a. All will exhibit maternal behavior?: _______
   b. Some will exhibit maternal behavior?: _______
   c. None will exhibit maternal behavior?: _______
   Total: ________
2. Do you think the finding that the virgin rat did not exhibit maternal behavior is surprising? 1 = Not surprising at all … 5 = Extremely surprising
3. How confident are you about the accuracy of your predictions on the probability of the future outcomes of the Virgin Rat experiment? 0 = Extremely not confident … 6 = Extremely confident

For all three conditions, after reading all four scenarios:
How difficult was it to make estimations of outcomes probabilities? 1 = Extremely easy … 7 = Extremely difficult

Note. Questions italicized in the table are the extension questions; they were not italicized in the Qualtrics survey.

Table 8
Study 2: Classification of the Replication, based on LeBel et al. (2018)
Design facet | Replication
IV operationalization | Same
DV operationalization | Same
IV stimuli | Same
DV stimuli | Similar
Procedural details | Similar
Physical settings | Different
Contextual variables | Different
Replication classification | Very close replication

Details of deviation
• Changed outcome B in the Y-Test scenario from “Places in Area B” to “Places in Area C,” so that outcome A and outcome B were symmetric.
• Removed the request to write down reasons for why the outcome had occurred.
• Added surprise, confidence, and task difficulty measures.
• Used a larger sample size: Original study: 184 (sample size per group varied from 24 to 37); Replication study: 604 (197 foresight, 204 hindsight outcome A, 203 hindsight outcome B).
• Added one comprehension question for each scenario.
• Added funnel questions at the end of the study.
• Changed from offline data collection (participants were recruited via a student newspaper at the University of Oregon) to online data collection (participants were recruited from CloudResearch).

Note. IV = Independent variable, DV = dependent variable.

(H6a) In prospective judgments, compared with participants in the foresight condition, participants in the hindsight condition report lower levels of task difficulty.
(H6b) Task difficulty mediates the relationship between hindsight condition and probability estimates.
(H6c) Task difficulty moderates the relationship between hindsight condition and probability estimates, such that hindsight bias is stronger among those who perceive the task to be easy than among those who perceive the task to be difficult.

(See Supplementary Materials for the full materials.) We use the virgin rat scenario to illustrate the materials and the question format:
Virgin Rat.
Several researchers intend to perform the following experiment:
They will inject blood from a mother rat into a virgin rat immediately after the mother rat has given birth. After the injection, the virgin rat will be placed in a cage with the newly born baby rats, after removal of the actual mother.
The possible outcomes were:
(a) the virgin rat exhibited maternal behavior, or
(b) the virgin rat failed to exhibit maternal behavior.
Following each scenario, participants were required to correctly answer a comprehension question before proceeding to the next stage of the study. For the virgin rat scenario, the comprehension question was, “Which rat will be placed in a cage with the newly born baby?” The correct answer was “Virgin rat with mother rat blood injection.”
Then, participants were asked questions measuring probability estimates (of the initial trial for the foresight condition, and of the future trials for both the foresight and hindsight conditions), followed by our extension questions measuring surprise and confidence. We present the questions for the virgin rat scenario in Table 7.

7.3. Method
7.3.1. Power analysis
The planned sample size for the replication study was estimated from the target experiment (see Supplementary Materials for details). We estimated the effect sizes based on p values, because they were the only statistics available from the target experiment. The p values of the pairwise comparisons ranged from 0.001 to 0.05 (i.e., 0.001, 0.01, or 0.05). We chose the largest of these, p = .05, which yields the smallest and therefore most conservative effect size estimate, d = 0.36, 95% CI [0.00, 0.72]. We conducted a power analysis using G*Power (Faul et al., 2009). To achieve a statistical power of 95% with an alpha of .05 (one-tailed), at least 168 participants would be required for each condition, totaling a sample size of 504 for the three conditions: foresight, hindsight outcome A, and hindsight outcome B. To allow for unexpected situations such as careless responses, and to ensure that the study would remain adequately powered, we planned to recruit about ten more participants per comparison.
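The per-condition sample size above can be approximated without G*Power. The following is a minimal sketch, assuming equal group sizes and a two-sample t-test; it uses the normal approximation with Guenther's small-sample correction rather than G*Power's exact noncentral-t routine, and `n_per_group` is a hypothetical helper, not the authors' code.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.95,
                one_tailed: bool = True) -> int:
    """Approximate n per group for a two-sample t-test with equal group sizes.

    Normal approximation 2 * ((z_alpha + z_beta) / d) ** 2, plus Guenther's
    correction z_alpha ** 2 / 4 for using a t test rather than a z test.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_beta = z(power)
    n = 2 * ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 4
    return ceil(n)


# Study 2 settings: d = 0.36, one-tailed alpha = .05, power = .95
per_group = n_per_group(0.36)
print(per_group, 3 * per_group)  # per condition, and total for three conditions
```

With the Study 2 inputs the sketch returns 168 per condition (504 in total); with d = 0.4 and a two-tailed test, as in the Study 3 power analysis, it returns 164. For samples of this size the approximation tracks the exact method closely, but for small samples the exact noncentral-t computation should be preferred.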

7.3.2. Participants
A total of 604 American participants were recruited online through CloudResearch (300 females, 302 males, 2 undisclosed, Mage = 38.5, SDage = 12.00; see Supplementary Materials for details about sample characteristics). We did not allow participants who took part in Study 1 to take part in Study 2.

7.3.3. Procedure and materials
The study used a between-subjects design. Participants were randomly assigned to one of three conditions. In the foresight condition, participants were not presented with any outcome of an initial trial. In the hindsight conditions, because there were two possible outcomes for each scientific trial scenario, half of the hindsight participants read that outcome A had occurred in the initial trial (hindsight outcome A condition), and the other half read that outcome B had occurred in the initial trial (hindsight outcome B condition). All participants read all four scenarios (virgin rat, hurricane seeding, gosling imprinting, and Y test) in a random order.
The descriptions of the four scenarios were adapted from Slovic and Fischhoff's (1977) Experiment 1 on hindsight bias (see Supplementary Materials).

7.3.3.1. Probability estimates of future trials. Participants were asked to estimate the probability that the outcome would occur in “all,” “some,” and “none” (or “A,” “B,” and “C” for the Y-test scenario) of future trials. The percentages of the three items needed to add up to 100%. Participants in the foresight condition rated these probabilities for each of the two possible outcomes; participants in the hindsight conditions rated them only for the outcome that they knew had occurred in the initial trial.

7.3.3.2. Extension: Surprise ratings. Following the probability estimates, participants were asked to rate their level of surprise regarding the outcome(s) (i.e., “Do you think the (outcome) is surprising?”) on a 5-point Likert scale (1 = not surprising at all, 5 = extremely surprising). Participants in the foresight condition rated their surprise regarding both possible outcomes; participants in the hindsight conditions rated only the outcome that they knew had occurred in the initial trial.

7.3.3.3. Confidence ratings. For each scenario, participants were asked to rate their confidence (i.e., “How confident are you about the accuracy of your predictions on the probability of the future outcomes of the (scenario)?”) on a 7-point Likert scale (0 = extremely not confident, 6 = extremely confident).
7.3.3.4. Task difficulty. After reading all four scenarios, participants were required to rate the difficulty of the prediction task (i.e., “How difficult was it to make estimations of outcomes probabilities?”) on a 7-point Likert scale (1 = extremely easy, 7 = extremely difficult).
7.3.4. Replication evaluation: Very close replication
Our replication study is a very close replication based on the criteria proposed in LeBel et al. (2017) and LeBel et al. (2018). Our IV operationalization and DV operationalization were the same as those used in the original study. For IV stimuli, we made the necessary adjustment of changing outcome B in the Y-Test scenario from “Places in Area B” to “Places in Area C,” so that outcome A and outcome B were symmetric. For DV stimuli, we removed the request to write down the reasons for why the outcome had occurred, in order to reduce the time required for the experiment in an online setting, where participants might have a shorter attention span than in a physical laboratory. These adjustments were necessary and did not fundamentally change the stimuli used in the replication study. We therefore consider this replication a very close replication of the original study. See Table 8 for a summary of the classification, necessary adjustments, and theoretical extensions.

Table 9
Study 2: Mean Probabilities in Future Trials (in percentage %).
Initial result and kind of replication | Foresight Mean | Foresight SD | Hindsight Mean | Hindsight SD
Virgin rat experiment
Outcome A: Shows maternal behavior (Foresight N = 197; Hindsight N = 204)
a. All show maternal behavior** (key) | 29.16 | 28.09 | 38.42 | 29.19
b. Some show maternal behavior | 34.57 | 26.04 | 36.58 | 25.37
c. None show maternal behavior*** | 36.27 | 31.44 | 25.00 | 26.04
Outcome B: Fails to show maternal behavior (Foresight N = 197; Hindsight N = 203)
a. All show maternal behavior | 17.73 | 23.68 | 13.89 | 21.81
b. Some show maternal behavior | 28.08 | 23.90 | 25.90 | 23.56
c. None show maternal behavior (key) | 54.20 | 32.83 | 60.21 | 33.18
Hurricane seeding experiment
Outcome A: Intensity increases (Foresight N = 197; Hindsight N = 204)
a. All increase (key) | 47.74 | 30.13 | 49.35 | 28.73
b. Some increase | 33.80 | 24.37 | 34.99 | 24.98
c. None increase | 18.45 | 20.60 | 15.66 | 18.59
Outcome B: Intensity weakens (Foresight N = 197; Hindsight N = 203)
a. All weaken (key) | 29.59 | 25.52 | 34.00 | 26.39
b. Some weaken** | 34.51 | 23.60 | 41.24 | 25.47
c. None weaken*** | 35.91 | 30.19 | 24.77 | 24.50
Gosling imprinting experiment
Outcome A: Approaches duck (Foresight N = 197; Hindsight N = 204)
a. All approach duck* (key) | 39.14 | 27.63 | 45.26 | 30.62
b. Some approach duck | 38.50 | 25.96 | 39.63 | 27.93
c. None approach duck*** | 22.36 | 24.58 | 15.10 | 17.73
Outcome B: Approaches goose (Foresight N = 197; Hindsight N = 203)
a. All approach goose** (key) | 38.10 | 30.38 | 46.39 | 33.13
b. Some approach goose | 38.98 | 27.09 | 36.42 | 27.95
c. None approach goose* | 22.92 | 24.71 | 17.19 | 21.90
Y-test experiment
Outcome A: Places dot in Area A (Foresight N = 197; Hindsight N = 204)
a. Places in Area A (key) | 59.62 | 23.92 | 61.96 | 22.66
b. Places in Area B | 13.90 | 14.67 | 15.80 | 17.53
c. Places in Area C* | 26.48 | 17.98 | 22.24 | 16.21
Outcome B: Places dot in Area C (Foresight N = 197; Hindsight N = 203)
a. Places in Area A | 51.54 | 24.18 | 47.52 | 23.36
b. Places in Area B | 14.68 | 15.04 | 13.76 | 14.84
c. Places in Area C* (key) | 33.78 | 21.56 | 38.73 | 22.70
Note. Rows marked (key) represent the kind of replication that was reported to have occurred in the initial trial (hindsight) or could possibly occur in the initial trial (foresight). The foresight ratings of both outcome A and outcome B came from the same participants in the foresight condition. The hindsight ratings came from participants in the hindsight outcome A condition or the hindsight outcome B condition, respectively. *p < .05, **p < .01, ***p < .001.

Table 10
Study 2: Independent Samples Student’s T-Tests of Probability Estimates between Foresight and Hindsight (Outcome A/B) Conditions.
Hindsight vs. Foresight | Mean Difference | t | df | p | p adjusted | Cohen’s d | 95% CI of d [Lower, Upper]
Virgin rat experiment
Outcome A: Shows maternal behavior
a. All show maternal behavior** (key) | 9.26 | 3.24 | 399 | 0.001 | 0.006 | 0.32 | [0.12, 0.52]
b. Some show maternal behavior | 2.01 | 0.78 | 399 | 0.434 | 0.521 | 0.08 | [-0.12, 0.27]
c. None show maternal behavior*** (a) | -11.27 | -3.92 | 399 | <0.001 | <0.001 | -0.39 | [-0.59, -0.19]
Outcome B: Fails to show maternal behavior
a. All show maternal behavior | -3.83 | -1.69 | 398 | 0.093 | 0.159 | -0.17 | [-0.37, 0.03]
b. Some show maternal behavior | -2.18 | -0.92 | 398 | 0.359 | 0.453 | -0.09 | [-0.29, 0.10]
c. None show maternal behavior (key) | 6.01 | 1.82 | 398 | 0.069 | 0.151 | 0.18 | [-0.02, 0.38]
Hurricane seeding experiment
Outcome A: Intensity increases
a. All increase (key) | 1.61 | 0.55 | 399 | 0.584 | 0.637 | 0.05 | [-0.14, 0.25]
b. Some increase | 1.18 | 0.48 | 399 | 0.632 | 0.659 | 0.05 | [-0.15, 0.24]
c. None increase | -2.79 | -1.43 | 399 | 0.155 | 0.248 | -0.14 | [-0.34, 0.05]
Outcome B: Intensity weakens
a. All weaken (key) | 4.41 | 1.70 | 398 | 0.090 | 0.159 | 0.17 | [-0.03, 0.37]
b. Some weaken** | 6.73 | 2.74 | 398 | 0.006 | 0.029 | 0.27 | [0.08, 0.47]
c. None weaken*** (a) | -11.14 | -4.06 | 398 | <0.001 | <0.001 | -0.41 | [-0.61, -0.21]
Gosling imprinting experiment
Outcome A: Approaches duck
a. All approach duck* (key) | 6.12 | 2.10 | 399 | 0.036 | 0.086 | 0.21 | [0.01, 0.41]
b. Some approach duck | 1.13 | 0.42 | 399 | 0.674 | 0.674 | 0.04 | [-0.15, 0.24]
c. None approach duck*** (a) | -7.26 | -3.40 | 399 | 0.001 | 0.006 | -0.34 | [-0.54, -0.14]
Outcome B: Approaches goose
a. All approach goose** (a) (key) | 8.29 | 2.61 | 398 | 0.009 | 0.036 | 0.26 | [0.06, 0.46]
b. Some approach goose | -2.56 | -0.93 | 398 | 0.353 | 0.453 | -0.09 | [-0.29, 0.10]
c. None approach goose* | -5.73 | -2.46 | 398 | 0.014 | 0.042 | -0.25 | [-0.44, -0.05]
Y-test experiment
Outcome A: Places dot in Area A
a. Places in Area A (key) | 2.34 | 1.00 | 399 | 0.316 | 0.446 | 0.10 | [-0.10, 0.30]
b. Places in Area B (a) | 1.90 | 1.18 | 399 | 0.240 | 0.360 | 0.12 | [-0.08, 0.31]
c. Places in Area C* | -4.24 | -2.48 | 399 | 0.013 | 0.042 | -0.25 | [-0.45, -0.05]
Outcome B: Places dot in Area C
a. Places in Area A | -4.02 | -1.69 | 398 | 0.091 | 0.159 | -0.17 | [-0.37, 0.03]
b. Places in Area B | -0.93 | -0.62 | 398 | 0.535 | 0.611 | -0.06 | [-0.26, 0.13]
c. Places in Area C* (key) | 4.95 | 2.23 | 398 | 0.026 | 0.069 | 0.22 | [0.03, 0.42]
Note. Rows marked (key) indicate the pairs of comparisons of interest. (a) Levene’s test was significant. *p < .05, **p < .01, ***p < .001. p values were adjusted using the Benjamini and Hochberg (1995) false discovery rate control method.
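To show how the entries in Table 10 relate to the summary statistics in Table 9, Student's t and Cohen's d can be recomputed approximately from the reported means, SDs, and group sizes. The following is a minimal sketch with a hypothetical helper name; because the inputs are rounded summary statistics, results may differ from Table 10 in the second decimal, and the confidence interval uses a large-sample normal approximation to the standard error of d rather than the noncentral-t method that statistical packages often use.

```python
from math import sqrt
from statistics import NormalDist


def student_t_and_d(m1, sd1, n1, m2, sd2, n2, conf=0.95):
    """Pooled-variance (Student's) t and Cohen's d for group 2 minus group 1.

    Returns (t, df, d, (ci_lower, ci_upper)); the CI for d uses the
    large-sample normal approximation to its standard error.
    """
    df = n1 + n2 - 2
    sd_pooled = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    diff = m2 - m1
    t = diff / (sd_pooled * sqrt(1 / n1 + 1 / n2))
    d = diff / sd_pooled
    se_d = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return t, df, d, (d - z * se_d, d + z * se_d)


# Virgin rat, outcome A, "all show maternal behavior" (Table 9):
# foresight 29.16 (SD 28.09, n = 197) vs. hindsight 38.42 (SD 29.19, n = 204)
t, df, d, (lo, hi) = student_t_and_d(29.16, 28.09, 197, 38.42, 29.19, 204)
print(f"t({df}) = {t:.2f}, d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

For this comparison the sketch gives t(399) ≈ 3.24 and d ≈ 0.32, in line with Table 10, while its approximate CI lower bound (about 0.13) differs slightly from the reported 0.12.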

7.4. Results
7.4.1. Probability estimates
We summarized the descriptive statistics of probability estimates in Table 9. Violin plots of the probability estimates are available in the Supplementary Materials. As there were four scenarios (virgin rat, hurricane seeding, gosling imprinting, Y-test), two possible outcomes (A or B) for the initial trial, and three possible outcomes of future trials (all, some, none for the first three scenarios; A, B, C for the Y-test scenario), we conducted 24 sets of independent samples Student's t-tests.
Of these, eight key sets of comparisons, one for each combination of scenario and initial outcome, are highlighted in Tables 9 and 10. For the virgin rat, hurricane seeding, and gosling imprinting scenarios, among the three options (i.e., all, some, and none repetition), we were particularly interested in the probability estimates that the initial outcome would repeat in all future trials. For the Y-test scenario with only one future trial, we were interested in the probability estimate of the dot being placed in the same area as in the initial trial.
As shown in Table 10, in four of the eight key comparisons (virgin rat-outcome A, gosling imprinting-outcome A, gosling imprinting-outcome B, and Y-test-outcome B), the probability estimates in the hindsight condition were higher than those in the foresight condition at p < .05 (unadjusted), demonstrating hindsight bias. In the other four comparisons, the differences in the probability estimates between the hindsight condition and the foresight condition were weaker.
Overall, the results provide moderate support for Hypothesis 3. The effects in all eight key comparisons were in the direction of participants in the hindsight condition providing higher estimates than those in the foresight condition, although there was variation depending on the scenario and the outcome.

7.4.2. Extension: Surprise ratings
We summarized the descriptive statistics of surprise ratings in Table 11, and the violin plots are available in the Supplementary Materials. As in the analyses of probability estimates, we conducted eight sets of independent samples Student's t-tests to compare the surprise ratings in the foresight and hindsight conditions.
As shown in Table 12, three of the eight comparisons of surprise ratings supported hindsight bias, with participants in the hindsight condition rating the outcome as less surprising: hurricane seeding-outcome B, gosling imprinting-outcome B, and Y-test-outcome B. Overall, the results provide some support for Hypothesis 4(a) regarding surprise ratings.
7.4.3. Extension: Confidence ratings
As shown in Table 12, only one of the eight comparisons supported a difference in confidence ratings between the foresight condition and the hindsight condition: virgin rat-outcome B. The result for virgin rat-outcome A was contrary to our expectation, with lower confidence in the hindsight condition. All other comparisons of confidence ratings showed much weaker effects. We concluded that the results provide no support for Hypothesis 5(a) regarding confidence ratings.

7.4.4. Task difficulty
We conducted an independent samples Student's t-test to examine the difference in perceived task difficulty. Participants in the hindsight outcome A condition (M = 4.41, SD = 1.61) reported lower levels of task difficulty than participants in the foresight condition (M = 4.98, SD = 1.43), t(399) = −3.79, p < .001, d = −0.38, 95% CI [−0.58, −0.18]. Similarly, participants in the hindsight outcome B condition (M = 4.40, SD = 1.51) reported lower levels of task difficulty than participants in the foresight condition (M = 4.98, SD = 1.43), t(398) = −3.98, p < .001, d = −0.40, 95% CI [−0.60, −0.20]. Overall, we conclude that there is strong support for Hypothesis 6(a) that participants in the hindsight conditions perceived the task to be less difficult than participants in the foresight condition.
7.4.5. Robustness checks: Alternative tests and exclusion criteria
To examine the robustness of the findings, we conducted additional analyses (see Supplementary Materials for details). First, we tested Hypotheses 3, 4(a), 5(a), and 6(a) using Mann-Whitney U tests, and the results were highly similar to those obtained using Student's independent samples t-tests. Second, when we analyzed the data with only participants who met a set of pre-registered exclusion criteria (i.e., self-reported English proficiency and seriousness, and guessing the study purpose), we found little to no difference.
7.4.6. Mediation and moderation analyses
We tested the mediation and the moderation hypotheses (see Supplementary Materials for details). Surprise partially mediated the relationship between hindsight (vs. foresight) and probability estimates, supporting H4(b), and confidence moderated the relationship between hindsight (vs. foresight) and probability estimates, supporting H5(c). We found no support for the mediating effects of confidence in H5(b) or task difficulty in H6(b), and no support for the moderating effects of surprise in H4(c) or task difficulty in H6(c).
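Tables 10 and 12 report p values adjusted with the Benjamini and Hochberg (1995) false discovery rate procedure. The following minimal sketch (a hypothetical helper, not the authors' analysis code) implements the standard step-up adjustment. Applied to the eight confidence-rating p values in Table 12, treated as a single family, it reproduces the adjusted values reported there to three decimals.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p value, keeping the adjusted values monotone
    # and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Confidence ratings, Table 12: outcome A (virgin rat, hurricane seeding,
# gosling imprinting, Y-test), then outcome B in the same order.
confidence_p = [0.006, 0.454, 0.616, 0.436, 0.049, 0.885, 0.091, 0.232]
print([round(p, 3) for p in benjamini_hochberg(confidence_p)])
```

The same function can be applied to the 24 comparisons in Table 10, but the rows reported only as p < .001 would first need their exact values, which are not given in the table.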

Table 11
Study 2: Means and Standard Deviations of Surprise Ratings and Confidence Ratings.
Scenario | Outcome A: Foresight Mean (SD) | Outcome A: Hindsight Mean (SD) | Outcome B: Foresight Mean (SD) | Outcome B: Hindsight Mean (SD)
Surprise
Virgin rat | 3.13 (1.40) | 2.93 (1.25) | 1.75 (1.05) | 1.57 (0.95)
Hurricane seeding | 2.03 (1.14) | 2.13 (1.19) | 3.01 (1.26) | 2.67 (1.16)
Gosling imprinting | 2.20 (1.21) | 2.08 (1.10) | 2.16 (1.14) | 1.90 (1.13)
Y-test | 1.81 (1.06) | 1.66 (0.95) | 2.46 (1.17) | 2.14 (1.01)
Confidence
Virgin rat | 3.61 (1.56) | 3.17 (1.58) | 3.61 (1.56) | 3.91 (1.5)
Hurricane seeding | 3.27 (1.68) | 3.39 (1.61) | 3.27 (1.68) | 3.25 (1.45)
Gosling imprinting | 3.41 (1.62) | 3.49 (1.53) | 3.41 (1.62) | 3.67 (1.48)
Y-test | 3.52 (1.47) | 3.63 (1.47) | 3.52 (1.47) | 3.34 (1.41)
Note. Surprise ratings: 1 = not surprising at all, 5 = extremely surprising. Confidence ratings: 0 = extremely not confident, 6 = extremely confident. The foresight ratings of both outcome A and outcome B came from the same participants in the foresight condition. The hindsight ratings came from participants in the hindsight outcome A condition or the hindsight outcome B condition, respectively. Hindsight participants only rated their surprise over the outcome which they knew had occurred in the initial trial.
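Section 7.5.1 and Table 13 classify each key comparison with the criteria of LeBel et al. (2019). As a sketch, assuming the decision rule that a replication shows a signal when its 95% CI for d excludes zero and is consistent when that CI contains the original point estimate (with inconsistent signals further labeled smaller or larger than the original), the following hypothetical helper, which is not the authors' code, reproduces all eight labels in Table 13.

```python
def classify_replication(original_d, rep_d, rep_ci):
    """Label a replication result under the signal/consistency rule above."""
    lo, hi = rep_ci
    signal = lo > 0 or hi < 0            # replication CI excludes zero
    consistent = lo <= original_d <= hi  # CI contains the original estimate
    label = ("Signal" if signal else "No signal") + " – " + (
        "consistent" if consistent else "inconsistent")
    if signal and not consistent:
        label += ", smaller" if abs(rep_d) < abs(original_d) else ", larger"
    return label


# (scenario, original d, replication d, replication 95% CI) from Table 13
rows = [
    ("Virgin Rat A", 0.36, 0.32, (0.12, 0.52)),
    ("Virgin Rat B", 0.0, 0.18, (-0.02, 0.38)),
    ("Hurricane Seeding A", 0.61, 0.05, (-0.14, 0.25)),
    ("Hurricane Seeding B", 0.36, 0.17, (-0.03, 0.37)),
    ("Gosling Imprinting A", 0.61, 0.21, (0.01, 0.41)),
    ("Gosling Imprinting B", 0.0, 0.26, (0.06, 0.46)),
    ("Y-Test A", 0.61, 0.10, (-0.10, 0.30)),
    ("Y-Test B", 0.61, 0.22, (0.03, 0.42)),
]
for name, orig_d, rep_d, ci in rows:
    print(f"{name}: {classify_replication(orig_d, rep_d, ci)}")
```

Note that the original effect sizes here are themselves estimates derived from reported p-value thresholds, so the labels inherit that uncertainty.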

Table 12
Study 2: Independent Samples Student’s T-Tests of Surprise and Confidence Ratings between Foresight and Hindsight Conditions.
Hindsight vs. Foresight | t | df | p | p adjusted | d | 95% CI of d [Lower, Upper]
Surprise
Outcome A
a. Virgin rat | -1.48 | 399 | .140 | .187 | -.15 | [-.35, .05]
b. Hurricane seeding | .88 | 399 | .382 | .382 | .09 | [-.11, .29]
c. Gosling imprinting | -.67 | 399 | .320 | .366 | -.10 | [-.30, .10]
d. Y-test | -1.54 | 399 | .124 | .187 | -.15 | [-.29, -.01]
Outcome B
a. Virgin rat | -1.79 | 398 | .074 | .148 | -.18 | [-.38, .02]
b. Hurricane seeding** | -2.82 | 398 | .005 | .020 | -.28 | [-.48, -.08]
c. Gosling imprinting* | -2.30 | 398 | .022 | .059 | -.23 | [-.43, -.03]
d. Y-test** | -2.92 (a) | 398 | .004 | .020 | -.29 | [-.49, -.09]
Confidence
Outcome A
a. Virgin rat** | -2.79 | 399 | .006 | .048 | -.28 | [-.48, -.08]
b. Hurricane seeding | .75 | 399 | .454 | .605 | .07 | [-.13, .27]
c. Gosling imprinting | .50 | 399 | .616 | .704 | .05 | [-.15, .25]
d. Y-test | .78 | 399 | .436 | .605 | .08 | [-.12, .28]
Outcome B
a. Virgin rat* | 1.98 | 398 | .049 | .196 | .20 | [.002, .40]
b. Hurricane seeding | -.14 (a) | 398 | .885 | .885 | -.01 | [-.21, .19]
c. Gosling imprinting | 1.70 | 398 | .091 | .243 | .17 | [-.03, .37]
d. Y-test | -1.20 | 398 | .232 | .464 | -.12 | [-.32, .08]
Note. (a) Levene’s test was significant. *p < .05, **p < .01, ***p < .001. p values were adjusted using the Benjamini and Hochberg (1995) false discovery rate control method.

7.5. Discussion
We aimed to replicate Slovic and Fischhoff's (1977) Experiment 1, a study of hindsight bias in prospective judgments. In line with the findings in the original study, we found support for our predictions in four of the eight sets of comparisons. Overall, our findings provide moderate support for hindsight bias in prospective judgments.
7.5.1. Replication: Mostly successful
We compared the results of the target experiment and the replication study based on the criteria described in LeBel et al. (2019). As summarized in Table 13 and Fig. 2, in four of the eight sets of probability estimate comparisons, we found signals of successful replication. The effect sizes observed in the replication study were similar to those of the target experiment for one outcome, smaller for two outcomes, and larger for one outcome. Overall, we conclude that this was a mostly successful replication.

Table 13
Study 2: Comparison of Results in the Original Study and the Replication Study.
Scenario | p-value original | Original effect: Cohen's d (a) | p-value replication | Replication effect: Cohen's d [95% CI] | Replication summary
Slovic & Fischhoff, 1977 | < 0.05 | 0.36 [0, 0.72]
Present Study
Virgin Rat A | < 0.05 | 0.36 | 0.001 | 0.32 [0.12, 0.52] | Signal – consistent
Virgin Rat B | > 0.05 | 0 | 0.069 | 0.18 [−0.02, 0.38] | No signal – consistent
Hurricane Seeding A | < 0.001 | 0.61 | 0.584 | 0.05 [−0.14, 0.25] | No signal – inconsistent
Hurricane Seeding B | < 0.05 | 0.36 | 0.090 | 0.17 [−0.03, 0.37] | No signal – consistent
Gosling Imprinting A | < 0.001 | 0.61 | 0.036 | 0.21 [0.01, 0.41] | Signal – inconsistent, smaller
Gosling Imprinting B | > 0.05 | 0 | 0.009 | 0.26 [0.06, 0.46] | Signal – inconsistent, larger
Y-Test A | < 0.001 | 0.61 | 0.316 | 0.10 [−0.10, 0.30] | No signal – inconsistent
Y-Test B | < 0.001 | 0.61 | 0.026 | 0.22 [0.03, 0.42] | Signal – inconsistent, smaller
Note: (a) Estimated using the largest possible p-values (e.g., 0.001 if p < .001; 0.05 if p < .05; 0.99 if p > .05; see the power analysis in the Supplementary Materials for details).

Fig. 2. Study 2: Forest Plot of the Effect Size of Probability Estimates.

8. Study 3: Predictions on the replicability of Fischhoff (1975)
8.1. Design and procedure
In this study, we asked participants to predict the replicability of Experiment 2 of Fischhoff (1975), and expected to observe hindsight bias in their predictions about the replicability of hindsight bias itself.
All participants first read a brief introduction to the main findings of Experiment 2 of Fischhoff (1975). To ease participants' understanding, we 1) removed “Experiment 2” and simply used “Fischhoff (1975)” in this introduction, and 2) focused only on the results about probability estimates in Fischhoff (1975). Participants were then randomly assigned to one of three conditions: Foresight, Hindsight Outcome Success, and Hindsight Outcome Fail. Those in the Foresight condition were told that a group of researchers intended to conduct a replication of Fischhoff (1975), and that there were two possible outcomes: successful replication or failed replication. In addition, those in the Hindsight Outcome Success condition were told that the replication was successful; those in the Hindsight Outcome Fail condition were told that the replication had failed. All participants were asked to write down the reasons for a successful replication and the reasons for a failed replication. They then provided probability estimates of successful and failed replications. They also answered questions about surprise, confidence, and task difficulty.
8.2. Hypotheses
Because Study 2 replicated the finding that people tend to use past findings to predict future research outcomes, we expected that:
H7: Participants in the Foresight condition will predict the probability of a successful replication to be higher than chance (50%).
In addition, as suggested by previous research on hindsight bias, outcome knowledge might bias probability estimates toward the known outcome. If participants' probability estimates are influenced by knowledge about the replication outcome, then those who were informed of a successful replication would perceive a successful replication to be more probable than those who did not have outcome knowledge, whereas those who were informed of a failed replication would perceive a successful replication to be less probable than those who did not have outcome knowledge. Such hindsight bias may occur through cognitive processes such as memory impairment, biased reconstruction, sense-making, and meta-cognitive experiences, as well as through social-motivational processes that increase perceived controllability and enhance self-image (Blank et al., 2007). For example, information about a successful replication may impact the person's memory by strengthening the association between relevant cues (e.g., the type of study to be replicated and the research question) and the outcome of a successful replication, or by unconsciously overwriting old knowledge with the newly acquired knowledge (e.g., Blank & Nestler, 2007; Hoffrage et al., 2000; Pohl et al., 2003).
Hence, if hindsight bias occurs, participants in the Hindsight Outcome Success condition should predict the highest probability of a successful replication, followed by participants in the Foresight condition, and lastly participants in the Hindsight Outcome Fail condition.
Therefore:
H8: Participants in the Hindsight Outcome Success condition estimate the
probability of a successful replication to be higher than that estimated by
participants in the Hindsight Outcome Fail condition.
H9: Participants in the Hindsight conditions estimate a greater probability
for the informed outcome of replication, compared with participants in the
Foresight condition.

Table 14
Study 3: Mean Estimations of Outcomes of a Replication of Fischhoff (1975) (in percentage %).
Measure | Foresight (n = 154) Mean (SD) | Hindsight Outcome Success: Successful Replication (n = 178) Mean (SD) | Hindsight Outcome Fail: Failed Replication (n = 188) Mean (SD)
Estimated probabilities
a. Successful replication | 65.36 a (18.08) | 73.07 b (17.46) | 52.22 c (22.62)
b. Failed replication | 34.64 a (18.08) | 26.93 b (17.46) | 47.78 c (22.62)
Surprise
a. Successful replication | 2.22 a (1.28) | 2.16 a (1.24) | 2.42 a (1.26)
b. Failed replication | 3.06 a (1.13) | 3.38 b (1.12) | 2.89 a,c (1.14)
Confidence | 3.99 a (1.29) | 4.18 a (1.30) | 3.64 b (1.39)
Task difficulty | 3.98 a (1.66) | 3.89 a (1.73) | 4.19 a (1.58)
Note. *p < .05, **p < .01, ***p < .001. Means with different superscripts (a, b, c) were significantly different from each other.

8.3. Method

8.3.1. Power analysis
The planned sample size for this study was calculated based on pretests indicating an effect size of d = 0.4 (see Supplementary Materials for details). Achieving a power of 95% with an alpha of .05 (two-tailed) required 164 participants per condition, totaling a sample size of 492. We collected slightly more responses to allow for unexpected exclusions.

8.3.2. Participants
A total of 520 American participants were recruited online through
CloudResearch (228 females, 289 males, 3 undisclosed, Mage = 38.96,
SDage = 12.18, see Supplementary Materials for details about sample
characteristics).
Table 15
Study 3: Independent Samples Student's T-Tests of Estimations of Outcomes of a Replication of Fischhoff (1975).
Hindsight vs. Foresight | Mean Difference | t | df | p | Cohen's d | 95% CI of Cohen's d [Lower, Upper]
Estimated probabilities of successful replication
Hindsight Outcome Success vs. Foresight | 7.71 | 3.95 | 330 | <0.001 | 0.43 | [0.21, 0.65]
Hindsight Outcome Fail vs. Foresight | −13.15 | −5.84 | 340 | <0.001 | −0.64 | [−0.86, −0.41]
Hindsight Outcome Success vs. Hindsight Outcome Fail | 20.85 | 9.84 (a) | 364 | <0.001 | 1.03 | [0.80, 1.26]
Surprise about successful replication
Hindsight Outcome Success vs. Foresight | −0.06 | −0.42 | 330 | 0.677 | −0.05 | [−0.27, 0.17]
Hindsight Outcome Fail vs. Foresight | 0.20 | 1.45 | 340 | 0.149 | 0.16 | [−0.05, 0.37]
Hindsight Outcome Success vs. Hindsight Outcome Fail | −0.26 | −1.97 | 364 | 0.050 | −0.21 | [−0.42, 0.00]
Surprise about failed replication
Hindsight Outcome Success vs. Foresight | 0.32 | 2.56 | 330 | 0.011 | 0.28 | [0.06, 0.50]
Hindsight Outcome Fail vs. Foresight | −0.16 | −1.33 | 340 | 0.184 | −0.14 | [−0.35, 0.07]
Hindsight Outcome Success vs. Hindsight Outcome Fail | 0.48 | 4.07 | 364 | <0.001 | 0.43 | [0.22, 0.64]
Confidence
Hindsight Outcome Success vs. Foresight | 0.19 | 1.31 | 330 | 0.192 | 0.14 | [−0.08, 0.36]
Hindsight Outcome Fail vs. Foresight | −0.35 | −2.40 | 340 | 0.017 | −0.26 | [−0.47, −0.04]
Hindsight Outcome Success vs. Hindsight Outcome Fail | 0.54 | 3.80 | 364 | <0.001 | 0.40 | [0.19, 0.61]
Task difficulty
Hindsight Outcome Success vs. Foresight | −0.09 | −0.50 | 330 | 0.620 | −0.05 | [−0.27, 0.17]
Hindsight Outcome Fail vs. Foresight | 0.21 | 1.17 | 340 | 0.243 | 0.13 | [−0.08, 0.34]
Hindsight Outcome Success vs. Hindsight Outcome Fail | −0.30 | −1.73 | 364 | 0.085 | −0.18 | [−0.39, 0.03]
Note. a. Levene's test was nonsignificant for all comparisons.

8.3.3. Procedure and materials
The study used a between-subjects design. Participants were randomly assigned to one of three conditions. In the Foresight condition, participants did not receive any knowledge about the actual outcome of the replication study. In the hindsight conditions, because there were two possible outcomes of the replication, half of the hindsight participants read that the replication was successful (Hindsight Outcome Success condition), and the other half read that the replication failed (Hindsight Outcome Fail condition). After reading this information, participants were required to correctly answer two comprehension questions before proceeding to the next stage of the study. Participants then responded to two open-ended questions asking for the reasons for a successful or a failed replication.

probability estimates for a successful replication (MeanProb = 65.36%, SDProb = 18.08%) were higher than chance (50%), t(153) = 10.55, p < .001, d = 0.85. We concluded that H7 was supported.
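The H7 test compares the Foresight mean against a fixed value of 50%. The following is a minimal sketch of that one-sample t-test computed from the reported summary statistics; the helper name is hypothetical, and because the inputs are rounded it gives t ≈ 10.54 rather than the reported 10.55.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic, df, and Cohen's d relative to a fixed value mu0."""
    se = sd / sqrt(n)        # standard error of the mean
    t = (mean - mu0) / se
    d = (mean - mu0) / sd    # standardized distance from mu0
    return t, n - 1, d


# Foresight condition, Study 3: M = 65.36%, SD = 18.08%, n = 154, chance = 50%
t, df, d = one_sample_t(65.36, 18.08, 154, 50.0)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```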
We conducted independent samples t-tests to test H8 and H9. As shown in Table 15, participants who were informed of Outcome Success estimated a successful replication to be more probable than participants who were informed of Outcome Fail, t(364) = 9.84, p < .001, Cohen's d = 1.03, 95% CI [0.80, 1.26]. In addition, participants who were informed of Outcome Success estimated a successful replication to be more probable than participants who did not know the outcome, t(330) = 3.95, p < .001, Cohen's d = 0.43, 95% CI [0.21, 0.65]. In contrast, participants who were informed of Outcome Fail estimated a successful replication to be less probable than participants who did not know the outcome, t(340) = −5.84, p < .001, Cohen's d = −0.64, 95% CI [−0.86, −0.41]. The results therefore provided strong support for H8 and H9.

8.3.4. Probability estimates of replication outcomes
Participants were then asked to provide probability estimates for
both Outcome A (the hindsight bias effect will be successfully repli­
cated) and Outcome B (the hindsight bias effect will fail to replicate). In
the Foresight condition, the instructions were: “In light of the informa­
tion appearing in the paragraphs provided, please estimate the proba­
bilities of occurrence of the two possible outcomes in the replication
study. There are no right or wrong answers, answer based on your
intuition. (The probabilities should sum to 100%).” In the Hindsight
conditions, the instructions contained an additional sentence: “Answer
as if you do not know the outcome, estimating the probabilities at that
time before the replication study was launched.”

8.5. Robustness checks
To examine the robustness of the findings, we conducted additional
analyses (see Supplementary Materials for details). When we analyzed
the data with only participants who met a set of pre-registered exclusion
criteria (i.e., self-reported English proficiency and seriousness, and
guessing study purpose), we found little to no differences between the
results with the full sample and the results after exclusion.
8.6. Exploratory extensions

8.3.5. Surprise, confidence, and task difficulty ratings: exploratory
We added exploratory measures of surprise, confidence, and task
difficulty. Exploratory hypotheses and findings are reported in the
supplementary.
Participants were asked to rate their surprise about both Outcome A
and Outcome B, confidence about the accuracy of their estimation, and
perceived task difficulty. Measures of surprise, confidence, and task
difficulty were similar or identical to those used in Study 2.

We found some support for the mediating role and the moderating
role of surprise over the alternative outcome for the relationship be­
tween Hindsight Outcome Success condition and probability estimates
of Outcome A. However, there was no support for any other hypothe­
sized mediating or moderating effects, and we concluded weak to no
support for the mediating or moderating effects. Hypotheses, analyses,
and results are provided in the supplementary.
8.7. Discussion

8.4. Results

We found strong support of hindsight bias for the replicability of
hindsight bias. First, being presented with an outcome of Fischhoff's
(1975) original study, participants' probability estimates of a successful
replication were higher than chance. Second, participants' probability
estimates of a certain outcome were higher when they knew the

We summarized the descriptive statistics of probability estimates,
surprise, confidence, and task difficulty in Table 14. Violin plots of these
variables are available in Supplementary Materials.
We conducted a one-sample t-test to test H7. We found that the
16

J. Chen et al.

Journal of Experimental Social Psychology 96 (2021) 104154

outcome than when they did not know the outcome.
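The mediation tests mentioned in the exploratory extensions ask whether the effect of an outcome-knowledge condition on probability estimates runs through a mediator such as surprise. The authors' exact procedure is described in the Supplementary Materials and is not reproduced here. What follows is a generic sketch of one common approach, the product-of-coefficients indirect effect (a × b) with a percentile bootstrap confidence interval, run on synthetic data only. The function names and the simulated effect sizes are hypothetical.

```python
import random


def _centered_sums(x, m, y):
    """Centered sums of squares and cross-products used by both OLS fits."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    return sxx, smm, sxm, sxy, smy


def indirect_effect(x, m, y):
    """Product of coefficients a * b.

    a: slope of the mediator M on condition X (M ~ X).
    b: coefficient of M in the outcome model Y ~ X + M.
    """
    sxx, smm, sxm, sxy, smy = _centered_sums(x, m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_indirect(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # resample participants with replacement
        draws.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    draws.sort()
    lo = draws[int(n_boot * alpha / 2)]
    hi = draws[int(n_boot * (1 - alpha / 2)) - 1]
    return indirect_effect(x, m, y), lo, hi


def simulate(n, a, b, c_prime, seed=1):
    """Synthetic illustration data only; not data from any of the studies."""
    rng = random.Random(seed)
    x = [i % 2 for i in range(n)]                 # 0 = foresight, 1 = hindsight
    m = [a * xi + rng.gauss(0, 1) for xi in x]    # e.g., a surprise rating
    y = [b * mi + c_prime * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y


if __name__ == "__main__":
    x, m, y = simulate(1000, a=0.5, b=0.4, c_prime=0.2)
    est, lo, hi = bootstrap_indirect(x, m, y, n_boot=500)
    print(f"indirect effect a*b = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With the simulated values the true indirect effect is 0.5 × 0.4 = 0.2, so the estimate should land near 0.2 with a confidence interval that excludes zero; a mediation claim of this kind rests on that interval, not on the significance of a or b alone.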

9. General Discussion
We conducted very close replications of Experiment 2 in Fischhoff (1975) and Experiment 1 in Slovic and Fischhoff (1977), and found support for hindsight bias in both retrospective and prospective judgments. In retrospective judgments (Study 1: replication of Fischhoff, 1975), participants were asked to estimate the probability of an outcome of a past event. Compared to participants who had no knowledge about the actual outcome of the event, participants who knew the actual outcome estimated the probability of that outcome to be higher, even when they were asked to answer as if they did not know it. In prospective judgments (Study 2: replication of Slovic & Fischhoff, 1977), participants were told that researchers had conducted an initial trial of an experiment and would conduct either one or multiple trials of the same kind in the future. The participants' task was to predict the outcome of those future trials. Compared to participants who had no knowledge of the actual outcome of the initial trial, participants who knew the actual outcome of the initial trial estimated the probability of that outcome in future trials to be higher.
Building on these two replication studies, we added a third study to examine hindsight bias in estimating the replicability of hindsight bias. Our findings suggest that estimates of replication outcomes were heavily influenced by outcome knowledge. Overall, participants predicted a successful replication of Fischhoff (1975). Probability estimates of a successful replication were highest among those who were informed of a successful replication, moderate among those who were not informed of an outcome, and lowest among those who were informed of a failed replication. These findings suggest that probability estimates regarding research and replication outcomes were affected by hindsight bias.

9.1. Replications: comparison with original findings
In our two replication studies, results were mostly in line with the original findings, with some minor deviations. We considered these replications mostly successful despite these deviations, for two reasons. First, the study materials were designed almost half a century ago, and some participants may have been more knowledgeable about some of these stimuli than participants in the 1970s. For example, in the Y-test scenario of Study 2, a 4-year-old child was asked to determine the position of a dot relative to the letter Y when viewed from the back of the easel, as in a left-right mirror image. In the 1970s, people might not have known which choice the child was more likely to make. Today, however, following wider dissemination of findings in developmental and cognitive psychology, more people may know that mirror-image confusions are prevalent among children, because the abilities required to make the correct choice, such as spatial cognition (Colby, 2009) and theory of mind (Wellman & Liu, 2004), are not well developed among 4-year-olds (Gregory, Landau, & McCloskey, 2011). In the target experiment, the average probability of outcome A (“places in area A”, showing a lack of spatial cognition and theory of mind) in the foresight condition was 0.29. In the replication study, however, it was much higher (0.60), possibly indicating a shift in knowledge about this phenomenon over the decades. Similarly, in the hurricane seeding scenario in Study 2, the average probability of outcome A (“All increase”) was 0.29 in the target experiment and 0.48 in our replication study. When participants hold relevant knowledge before taking part in the study, their probability estimates may be less influenced by the study's manipulation of outcome knowledge (of the initial trial), weakening hindsight bias. Given these changes, we consider our findings an impressive demonstration of the generalizability and relevance of the effect.
Second, for Study 2, while the target experiment asked participants to write down why they thought the outcome would happen, we did not include this question in the replication study. When asked to provide explanations of an outcome, a person has to temporarily assume that the outcome is true and then assess its plausibility. Such cognitive processes can lead the person to perceive the outcome as more plausible, persuasive, or even inevitable (Koehler, 1991). It is therefore possible that writing down the reasons for the outcome reinforces participants' belief that the outcome is true, which in turn intensifies hindsight bias. In our replication study, we had to adjust the design to remove the step of providing explanations, and this may have led to a smaller observed effect size than when participants are asked to provide explanations. We note, however, that this explanation does not account for the weaker effects in Study 1. It could be that the effect size of hindsight bias is larger for retrospective judgments and smaller for prospective judgments. This possibility awaits further investigation.

9.2. Extensions
We added several extensions. In Study 1, we found no support for the mediating effect of surprise in the relationship between hindsight condition and probability estimates, and inconclusive results for the moderating effect of surprise on that relationship. In Study 2, we found some support for surprise, but not for confidence, as a mediator of the relationship between hindsight condition and probability estimates. In addition, we found support for confidence, but not for surprise, as a moderator of this relationship: hindsight bias was evident when confidence about one's own judgments was high, but was reversed when confidence was low. In Study 3, we found weak to no support for the mediating and moderating roles of surprise. Other than that, there was no support for mediating or moderating effects of surprise, confidence, or task difficulty.
Given these mixed findings, we are hesitant to draw conclusions regarding surprise and confidence. Past findings regarding the effect of surprise have been equivocal. Although many articles argued that hindsight bias could be caused by a lack of scrutiny and consideration of alternatives associated with a lack of feelings of surprise (Sanna & Schwarz, 2006; Slovic & Fischhoff, 1977), other research noted that a certain level of surprise is required for hindsight bias to occur: after all, if the person already had the knowledge (and thus would not feel surprised), then their estimate of the probability should not be affected by the outcome knowledge provided by the researcher (Pezzo, 2003). In testing the robustness of hindsight bias, some research found that hindsight bias persisted even when the materials and outcome knowledge were difficult or unexpected for participants (e.g., Ash, 2009; Fischhoff, 1977; Hoch & Loewenstein, 1989; Roese & Olson, 1996; Wood, 1978), suggesting that surprise does not necessarily prevent hindsight bias. Furthermore, Schkade and Kilbourne (1991) found that hindsight bias was larger when outcomes were inconsistent with expectations than when they were consistent. The authors reasoned that this could be because the process of assimilating outcome knowledge into what is already known is immediate and at least partially automatic. Thus, the more the outcome knowledge differed from prior knowledge, and the more surprising it was, the larger the hindsight bias; the more consistent the outcome knowledge was with prior knowledge, the less likely a cognitive reconstruction leading to hindsight bias would be. More research is needed to clarify these competing theoretical arguments and mixed findings about the role of surprise in hindsight bias.
Previous studies have linked hindsight bias to confidence, yet other studies failed to detect such associations. Ross (2012) found that the effect of outcome knowledge on probability estimates and its effect on confidence were disconnected. In addition, Schatz (2019) failed to find support for a relationship between receiving outcome knowledge and confidence across ten studies. These and our findings suggest that more research is needed to understand the role of confidence in hindsight bias, and it is possible that these links have been overestimated.
In addition, studies in the literature tend to treat surprise and confidence as two sides of the same coin, based on an assumption that feelings of surprise may reduce a person's confidence about a judgment. However, we found no indication of such an association. Future studies may aim to differentiate and contrast surprise and confidence in hindsight bias.
We found no support for the mediating or moderating effect of subjective task difficulty in the relationship between hindsight condition and probability estimates. Although participants in the hindsight condition perceived the task to be easier, this decreased perceived difficulty did not seem to predict probability estimates. Task difficulty was negatively associated with confidence about one's own judgments, and weakly positively associated with surprise about the outcome. As with surprise, the literature shows discrepancies in whether hindsight bias is larger in more difficult or less difficult tasks (see, for example, Arkes et al., 1981; Harley et al., 2004). More research is needed to address these discrepancies and clarify the role of task difficulty in hindsight bias.
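The Study 2 moderation pattern described above (hindsight bias evident at high confidence, reversed at low confidence) is what a condition × confidence interaction looks like when the conditional effect of condition changes sign across the moderator. The following is a minimal sketch of that idea, not the authors' analysis: it assumes a binary condition coded 0 (foresight) and 1 (hindsight), a continuous moderator, and ordinary least squares, and it probes the interaction at the moderator's mean ± 1 SD, a common convention. The function names and the noise-free example data are hypothetical.

```python
from statistics import mean, stdev


def fit_line(z, y):
    """Ordinary least-squares intercept and slope of y on z."""
    mz, my = mean(z), mean(y)
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    slope = szy / szz
    return my - slope * mz, slope


def moderation_binary(x, z, y):
    """Interaction of a 0/1 condition x with a continuous moderator z.

    For binary x, the fully interacted model y ~ x + z + x:z yields the same
    fitted lines as regressing y on z separately within each condition, so
    the x:z coefficient equals the difference between the two slopes.
    """
    z0 = [zi for xi, zi in zip(x, z) if xi == 0]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    z1 = [zi for xi, zi in zip(x, z) if xi == 1]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    i0, s0 = fit_line(z0, y0)
    i1, s1 = fit_line(z1, y1)

    def effect_at(z_value):
        """Conditional effect of condition (1 minus 0) at moderator value z_value."""
        return (i1 + s1 * z_value) - (i0 + s0 * z_value)

    return s1 - s0, effect_at


if __name__ == "__main__":
    # Synthetic, noise-free data: the condition effect is 0.8 * z, so it is
    # positive for high moderator values and reversed for low ones.
    z = [-2, -1, 0, 1, 2] * 2
    x = [0] * 5 + [1] * 5
    y = [1 + 0.3 * zi + 0.8 * xi * zi for xi, zi in zip(x, z)]
    interaction, effect_at = moderation_binary(x, z, y)
    centre, sd = mean(z), stdev(z)
    print(f"interaction = {interaction:.2f}")
    print(f"effect at -1 SD = {effect_at(centre - sd):.2f}, "
          f"at +1 SD = {effect_at(centre + sd):.2f}")
```

The per-group shortcut is used only because it keeps the sketch dependency-free; with more than two conditions or additional covariates, a full regression with an explicit interaction term would be the usual choice.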

9.3. Take-aways for Science: Endorsement of Open Science practices
In the introduction we discussed direct and important implications of hindsight bias for science. Beyond our successful replications of classic hindsight bias studies, we also demonstrated hindsight bias in judgments about our very own replication of hindsight bias.
We were asked by the editor and reviewers to discuss our views on possible ways to address hindsight bias in the scientific process. First, there is the issue of raising awareness of hindsight bias pitfalls. To overcome this bias, there needs to be some awareness that the problem exists, and some scholars in the open-science community have been trying to raise awareness of the impact of cognitive biases and to study these systematically using meta-research (e.g., Bishop, 2019, 2020a, 2020b). Second, pre-registrations, if done appropriately, seem a promising safeguard against researchers fooling themselves, because they involve a public commitment regarding hypotheses, design, procedures, and data analysis plans (Nosek et al., 2018; Shrout & Rodgers, 2018; van't Veer & Giner-Sorolla, 2016). These may at the very least address the issues of unintended memory reconstruction and HARKing, since researchers can easily go back to their pre-registrations and examine their findings against their prior plans. These may also partly serve to assure others of the researchers' open and transparent research process, and demonstrate researchers' public commitment to overcoming their own biases.
Third, the Registered Reports publication format (Chambers & Tzavella, 2020; Simons et al., 2014) and results-blind review (Button, Bal, Clark, & Shipley, 2016) can reduce hindsight bias in the publication review process by addressing outcome-driven interpretations and the pressures on authors to adhere to a certain outcome. Determining whether to accept or reject a replication study prior to data collection also helps address outcome bias (Baron & Hershey, 1988; Savani & King, 2015), where a failed replication (i.e., a bad outcome) leads to perceiving the study or the replicators as lower quality compared to a successful replication (i.e., a good outcome). Endorsement of Replication Registered Reports as an integral part of the scientific process, with directions like the Pottery Barn rule (if you publish it, you commit to publishing replications of it; Edlund, Cuccolo, Irgens, Wagge, & Zlokovich, 2020; Srivastava, 2012) and a commitment to publishing all well-executed replications (e.g., Chambers, 2018), may help overcome inherent biases against replications as being more predictable and of lower value (Zwaan, Etz, Lucas, & Donnellan, 2018).
Lastly, and most importantly, systematically documenting and openly sharing everything about the research life cycle, from initial idea and research question, through process, design, and decisions, to materials, data, and code, with public commitment and openness toward third-party open peer review, can greatly reduce human biases introduced in the scientific process and encourage collaboration and sharing. This is the essence of open science.

9.4. Limitations and future research
In all three studies, we used the hypothetical design to test hindsight bias (“answer as if you did not know the outcome”). However, this design makes it difficult to examine the psychological processes underlying hindsight bias. We therefore encourage future studies to 1) replicate further studies of hindsight bias that focus more strongly on the underlying psychological processes, and 2) extend our findings in Study 3 using other designs, such as memory recall (Pohl, 2007) and multinomial processing trees (Bernstein et al., 2011; Groß & Bayen, 2015; Hell, Gigerenzer, Gauggel, Mall, & Müller, 1988).
We conducted all studies using an American sample, and future studies may extend our efforts to samples from other, diverse cultures.
We discussed possible implications of hindsight bias for science, yet these were inferred rather than directly tested. We believe that this is a promising and much-needed area of research. Future research may directly examine whether, and to what extent, hindsight bias influences researchers' decisions to embark on replications and reviewers' and editors' decisions to publish a replication study. If such a bias is found, it would be imperative to further examine the impact of the solutions suggested above and other potential remedies to overcome this bias.
This replication presented us with a special challenge regarding some of the events included in the original stimuli of Fischhoff (1975). Events C and D in the original were taken from a classic clinical psychology book by Ellis from the 1960s. The original author reflected on the use of these stimuli and noted that the scenarios described patients "in terms that fit now–antiquated mores and theories" (Fischhoff, 2007, p. 11; also see interview in Klein, Hegarty, & Fischhoff, 2017). In correspondence with the original author and the editor, we felt it necessary to include a warning that these stimuli should no longer be used in follow-up research. We removed the reporting of these materials and the analyses of these events from the manuscript and the supplementary materials.

10. Conclusion
We conducted two close replication studies and one novel study to investigate hindsight bias. In Study 1, we found support for hindsight bias as in Experiment 2 of Fischhoff (1975): participants estimated the probability of an outcome to be higher when they knew that the outcome had actually occurred. In Study 2, we found some support for hindsight bias as in Experiment 1 of Slovic and Fischhoff (1977): when informed of the outcome of an initial trial, participants were more likely to predict that the same outcome would recur in future trials. In Study 3, we found support for hindsight bias in estimates of the replicability of hindsight bias. We found mixed, weak to no support for the mediating and moderating roles of surprise, confidence, and task difficulty. We conclude that, almost five decades after the original studies were published, we found consistent evidence for hindsight bias.
Financial disclosure/funding
This research was supported by the European Association for Social
Psychology seedcorn grant.
Authorship declaration
Gilad led the reported replication effort with the team listed below.
Gilad supervised each step of the project, conducted the pre-registration,
and ran data collection. Jieying followed up on initial work by the other
coauthors to verify analyses and conclusions, added advanced tables and
plots, designed, ran, and analyzed the third study, and completed the
manuscript submission draft. Jieying and Gilad jointly finalized the
manuscript for submission.
Lok Ching (Roxane) Kwan, Lok Yeung (Loren) Ma, Hiu Yee (HayleyAnne) Choi, Ying Ching (Lita) Lo, Shin Yee (Sarah) Au, and Chi Ho (Toby) Tsang conducted the two replication studies as part of university coursework. They conducted an initial analysis of the paper, designed the replication, initiated the extensions, wrote the pre-registrations, conducted initial data analyses, and wrote initial replication reports.
Bo Ley Cheng guided and assisted the replication effort.

Contributor roles taxonomy
In the table below, we employ CRediT (Contributor Roles Taxonomy) to identify the contributions and roles of the contributors in the current replication effort. Please refer to https://www.casrai.org/credit.html for details and definitions of each of the roles listed below.
Role

Jieying
Chen

Gilad
Feldman

Conceptualization
Pre-registrations
Data curation
Formal analysis
Funding acquisition
Investigation
Methodology
Pre-registration peer
review /
verification
Data analysis peer
review /
verification
Project
administration
Resources
Software
Supervision
Validation
Visualization
Writing-original
draft
Writing-review and
editing

X
X

X
X
X
X
X
X
X
X

X
X
X
X
X

Lok Ching Kwan, Lok
Yeung (Loren) Ma, Hiu
Yee (HayleyAnne)
Choi, Ying Ching (Lita)
Lo, Shin Yee (Sarah)
Au, and Chi Ho (Toby)
Tsang
X
X
X
X
X

X
X
X
X

X

X
X

X

Bo Ley
Cheng

X
X
X
X

X
X

X

X
X

Declaration of Competing Interest
The author(s) declared no potential conflicts of interests with respect to the authorship and/or publication of this article.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jesp.2021.104154.
References
Aarts, H., Verplanken, B., & Van Knippenberg, A. (1998). Predicting behavior from actions in the past: Repeated decision making or a matter of habit? Journal of Applied Social Psychology, 28, 1355–1374.
Arkes, H. R. (2013). The consequences of the hindsight bias in medical decision making. Current Directions in Psychological Science, 22, 356–360.
Arkes, H. R., Wortmann, R. L., Saville, P. D., & Harkness, A. R. (1981). Hindsight bias among physicians weighing the likelihood of diagnoses. Journal of Applied Psychology, 66, 252–254.
Ash, I. K. (2009). Surprise, memory, and retrospective judgment making: Testing cognitive reconstruction theories of the hindsight bias effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 916–933.
Baron, J., & Hershey, J. C. (1988). Outcome bias in decision evaluation. Journal of Personality and Social Psychology, 54, 569–579.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300.
Bernstein, D., Aßfalg, A., Kumar, R., & Ackerman, R. (2016). Looking backward and forward on hindsight bias. Handbook of Metamemory (pp. 289–304). Oxford, UK: Oxford University Press.
Bernstein, D. M., Erdfelder, E., Meltzoff, A. N., Peria, W., & Loftus, G. R. (2011). Hindsight bias from 3 to 95 years of age. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 378–391.
Bishop, D. (2019). Fixing the replication crisis: The need to understand human psychology. APS Observer, 32(10).
Bishop, D. (2020a). How scientists can stop fooling themselves over statistics. Nature, 584(7819), 9.
Bishop, D. (2020b). The psychology of experimental psychologists: Overcoming cognitive constraints to improve research: The 47th Sir Frederic Bartlett lecture. Quarterly Journal of Experimental Psychology, 73(1), 1–19.
Blank, H., Musch, J., & Pohl, R. F. (2007). Hindsight bias: On being wise after the event. Social Cognition, 25, 1–9.
Blank, H., & Nestler, S. (2007). Cognitive process models of hindsight bias. Social Cognition, 25, 132–146.
Bosco, F. A., Aguinis, H., Field, J. G., Pierce, C. A., & Dalton, D. R. (2016). HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology, 69, 709–750.
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., … Van’t Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217–224.
Bukszar, E., & Connolly, T. (1988). Hindsight bias and strategic choice: Some problems in learning from experience. Academy of Management Journal, 31, 628–641.
Button, K. S., Bal, L., Clark, A., & Shipley, T. (2016). Preventing the ends from justifying the means: Withholding results to address publication bias in peer-review. BMC Psychology, 4(1), 1–7.
Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., … Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436.
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., … Altmejd, A. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637–644.
Casper, J. D., Benedict, K., & Perry, J. L. (1989). Juror decision making, attitudes, and the hindsight bias. Law and Human Behavior, 13, 291–310.
Chambers, C. D. (2018). Reproducibility meets accountability: Introducing the replications initiative at Royal Society Open Science. In Royal Society Open Science. Retrieved from https://royalsociety.org/blog/2018/10/reproducibility-meets-accountability/.
Chambers, C. D., & Tzavella, L. (2020). Registered Reports: Past, Present and Future. https://doi.org/10.31222/osf.io/43298.
Christensen-Szalanski, J. J., & Willham, C. F. (1991). The hindsight bias: A meta-analysis. Organizational Behavior and Human Decision Processes, 48, 147–168.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Colby, C. L. (2009). Spatial Cognition. Encyclopedia of Neuroscience, 165–171.
Davis, A. L., & Fischhoff, B. (2014). Communicating uncertain experimental evidence. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 261–274.
Dawson, N. V., Connors, A. F., Jr., Speroff, T., Kemka, A., Shaw, P., & Arkes, H. R. (1993). Hemodynamic assessment in the critically ill: Is physician confidence warranted? Medical Decision Making, 13, 258–266.
Ebersole, C. R., Mathur, M. B., Baranski, E., Bart-Plange, D. J., Buttrick, N. R., Chartier, C. R., … Szecsi, P. (2020). Many labs 5: Testing pre-data-collection peer review as an intervention to increase replicability. Advances in Methods and Practices in Psychological Science, 3(3), 309–331.
Edlund, J., Cuccolo, K., Irgens, M. S., Wagge, J. R., & Zlokovich, M. S. (2020). Saving Science Through Replication Studies. https://doi.org/10.31234/osf.io/efypc.
Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149–1160.
Fay, M. P., & Malinovsky, Y. (2018). Confidence intervals of the Mann-Whitney parameter that are compatible with the Wilcoxon-Mann-Whitney test. Statistics in Medicine, 37, 3991–4006.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 104, 288–299.
Fischhoff, B. (1977). Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance, 3, 349–358.
Fischhoff, B. (2007). An early history of hindsight research. Social Cognition, 25(1), 10–13.
Fischhoff, B., & Beyth, R. (1975). I knew it would happen: Remembered probabilities of once-future things. Organizational Behavior and Human Performance, 13, 1–16.
Forer, B. R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility. Journal of Abnormal and Social Psychology, 44, 118–123.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2–18.
Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Retrieved from https://osf.io/n3axs.
Granhag, P. A., Strömwall, L. A., & Allwood, C. M. (2000). Effects of reiteration, hindsight bias, and memory on realism in eyewitness confidence. Applied Cognitive Psychology, 14, 397–420.
Gregory, E., Landau, B., & McCloskey, M. (2011). Representation of object orientation in children: Evidence from mirror-image confusions. Visual Cognition, 19, 1035–1062.
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring
incentives and practices to promote truth over publishability. Perspectives on
Psychological Science, 7, 615–631.
Ofir, C., & Mazursky, D. (1997). Does a surprising outcome reinforce or reverse the
hindsight bias? Organizational Behavior and Human Decision Processes, 69, 51–57.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Ouellette, J. A., & Wood, W. (1998). Habit and intention in everyday life: The multiple
processes by which past behavior predicts future behavior. Psychological Bulletin,
124, 54–74.
Pezzo, M. (2003). Surprise, defence, or making sense: What removes hindsight bias?
Memory, 11, 421–441.
Pohl, R., Eisenhauer, M., & Hardt, O. (2003). SARA: A cognitive process model to
simulate the anchoring effect and hindsight bias. Memory, 11, 337–356.
Pohl, R. F. (2007). Ways to assess hindsight bias. Social Cognition, 25, 14–31.
Pohl, R. F., Bender, M., & Lachmann, G. (2002). Hindsight bias around the world.
Experimental Psychology, 49, 270–282.
Roese, N. J., & Olson, J. M. (1996). Counterfactuals, causal attributions, and the
hindsight bias: A conceptual integration. Journal of Experimental Social Psychology, 32(3), 197–227.
Roese, N. J., & Vohs, K. D. (2012). Hindsight bias. Perspectives on Psychological Science, 7,
411–426.
Ross, M. (2012). The hindsight bias: Judgment task differentiation. Doctoral dissertation, Old Dominion University.
Sanna, L. J., & Schwarz, N. (2006). Metacognitive experiences and human judgment: The
case of hindsight bias and its debiasing. Current Directions in Psychological Science, 15,
172–176.
Savani, K., & King, D. (2015). Perceiving outcomes as determined by external forces: The
role of event construal in attenuating the outcome bias. Organizational Behavior and
Human Decision Processes, 130, 136–146.
Schatz, D. A. (2019). Boundaries of the hindsight bias. Doctoral dissertation, University of California, Berkeley.
Scheel, A. M., Schijen, M., & Lakens, D. (2021). An excess of positive results: Comparing the standard psychology literature with registered reports. Advances in Methods and Practices in Psychological Science.
Schkade, D. A., & Kilbourne, L. M. (1991). Expectation-outcome consistency and
hindsight bias. Organizational Behavior and Human Decision Processes, 49, 105–123.
Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction:
Broadening perspectives from the replication crisis. Annual Review of Psychology, 69,
487–510.
Simons, D. J., Holcombe, A. O., & Spellman, B. A. (2014). An introduction to registered
replication reports at perspectives on psychological science. Perspectives on
Psychological Science, 9, 552–555.
Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of
Experimental Psychology: Human Perception and Performance, 3, 544–551.
Slovic, P., Lichtenstein, S., & Fischhoff, B. (1988). Decision-making. In R. C. Atkinson,
et al. (Eds.), Learning and cognition: Vol. 2. Stevens’ handbook of experimental
psychology (pp. 673–738). New York, NY: Wiley.
Srivastava, S. (2012). A Pottery Barn rule for scientific journals. Retrieved from:
https://hardsci.wordpress.com/2012/09/27/a-pottery-barn-rule-for-scientific-journals/.
van’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology—A
discussion and suggested template. Journal of Experimental Social Psychology, 67,
2–12.
Thaler, R. H. (2016). Behavioral economics: Past, present, and future. American Economic
Review, 106, 1577–1600.
Veldkamp, C. (2017). The human fallibility of scientists: Dealing with error and bias in
academic research. Doctoral dissertation. Tilburg University.
Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A.
(2012). An agenda for purely confirmatory research. Perspectives on Psychological
Science, 7, 632–638.
Wagge, J. R., Brandt, M. J., Lazarevic, L. B., Legate, N., Christopherson, C., Wiggins, B., &
Grahe, J. E. (2019). Publishing research with undergraduate students via replication
work: The collaborative replications and education project. Frontiers in Psychology,
10, 247.
Wellman, H. M., & Liu, D. (2004). Scaling of theory-of-mind tasks. Child Development, 75,
523–541.
Werth, L., & Strack, F. (2003). An inferential approach to the knew-it-all-along
phenomenon. Memory, 11(4–5), 411–419.
Winman, A., Juslin, P., & Björkman, M. (1998). The confidence–hindsight mirror effect in
judgment: An accuracy-assessment model for the knew-it-all-along phenomenon.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(2), 415.
Wong, L. Y. S. (1995). Research on teaching: Process-product research findings and the
feelings of obviousness. Journal of Educational Psychology, 87(3), 504.
Wood, G. (1978). The knew-it-all-along effect. Journal of Experimental Psychology: Human
Perception and Performance, 4, 345–353.
Yang, H., & Thompson, C. (2010). Nurses’ risk assessment judgements: A confidence
calibration study. Journal of Advanced Nursing, 66, 2751–2760.
Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2018). Making replication
mainstream. Behavioral and Brain Sciences, 41.

Groß, J., & Bayen, U. J. (2015). Adult age differences in hindsight bias: The role of recall
ability. Psychology and Aging, 30, 253–258.
Guilbault, R. L., Bryant, F. B., Brockway, J. H., & Posavac, E. J. (2004). A meta-analysis
of research on hindsight bias. Basic and Applied Social Psychology, 26, 103–117.
Harley, E. M., Carlsen, K. A., & Loftus, G. R. (2004). The “saw-it-all-along” effect:
Demonstrations of visual hindsight bias. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 30, 960–968.
Hawkins, S. A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the
outcomes are known. Psychological Bulletin, 107, 311–327.
Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight bias: An
interaction of automatic and motivational factors? Memory & Cognition, 16,
533–538.
Hertwig, R., Fanselow, C., & Hoffrage, U. (2003). Hindsight bias: How knowledge and
heuristics affect our reconstruction of the past. Memory, 11, 357–377.
Hoch, S. J., & Loewenstein, G. F. (1989). Outcome feedback: Hindsight and information.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 605–619.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of
knowledge updating? Journal of Experimental Psychology: Learning, Memory, and
Cognition, 26, 566–581.
Hoffrage, U., & Pohl, R. (2003). Research on hindsight bias: A rich past, a productive
present, and a challenging future. Memory, 11, 329–335.
Hom, H. L., Jr., & Van Nuland, A. L. (2019). Evaluating scientific research: Belief,
hindsight bias, ethics, and research evaluation. Applied Cognitive Psychology, 33,
675–681.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine,
2(8), Article e124.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of
questionable research practices with incentives for truth telling. Psychological
Science, 23, 524–532.
Kaplan, H., & Barach, P. (2002). Incident reporting: Science or protoscience? Ten years
later. BMJ Quality & Safety, 11, 144–145.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and
Social Psychology Review, 2, 196–217.
Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Hofelich Mohr, A., …
Frank, M. C. (2018). A practical guide for transparency in psychological science.
Collabra: Psychology, 4, 1–15.
Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Jr., Alper, S., …
Sowden, W. (2018). Many labs 2: Investigating variation in replicability across
samples and settings. Advances in Methods and Practices in Psychological Science, 1(4),
443–490.
KNAW: Royal Dutch Academy of Arts and Sciences. (2018). Replication studies:
Improving reproducibility in the empirical sciences. Amsterdam, Netherlands.
Retrieved from https://knaw.nl/en/news/publications/replication-studies.
Koehler, D. J. (1991). Explanation, imagination, and confidence in judgment.
Psychological Bulletin, 110(3), 499–519.
LeBel, E. P., Berger, D., Campbell, L., & Loving, T. J. (2017). Falsifiability is not optional.
Journal of Personality and Social Psychology, 113, 254–261.
LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & Vanpaemel, W. (2018). A unified
framework to quantify the credibility of scientific findings. Advances in Methods and
Practices in Psychological Science, 1, 389–402.
LeBel, E. P., Vanpaemel, W., Cheung, I., & Campbell, L. (2019). A brief guide to evaluate
replications. Meta-Psychology, 3. MP.2018.843.
Litman, L., Robinson, J., & Abberbock, T. (2017). TurkPrime.com: A versatile
crowdsourcing data acquisition platform for the behavioral sciences. Behavior
Research Methods, 49(2), 433–442.
Mazursky, D., & Ofir, C. (1990). “I could never have expected it to happen”: The reversal
of the hindsight bias. Organizational Behavior and Human Decision Processes, 46,
20–33.
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., …
Chartier, C. R. (2018). The psychological science accelerator: Advancing psychology
through a distributed collaborative network. Advances in Methods and Practices in
Psychological Science, 1(4), 501–515.
Müller, P. A., & Stahlberg, D. (2006). Surprise as information: Metacognitive influences on
hindsight bias. Unpublished manuscript. Germany: University of Mannheim.
Müller, P. A., & Stahlberg, D. (2007). The role of surprise in hindsight bias: A
metacognitive model of reduced and reversed hindsight bias. Social Cognition, 25,
165–184.
Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., Du Sert, N. P.,
… Ioannidis, J. P. (2017). A manifesto for reproducible science. Nature Human
Behaviour, 1, 1–9.
Nestler, S., & Egloff, B. (2009). Increased or reversed? The effect of surprise on hindsight
bias depends on the hindsight component. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 35, 1539–1544.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration
revolution. Proceedings of the National Academy of Sciences, 115, 2600–2606.
Nosek, B. A., & Lakens, D. (2014). A method to increase the credibility of published
results. Social Psychology, 45, 137–141.

Jieying Chen is an assistant professor at the Department of Business Administration,
University of Manitoba. Her research focuses on judgment and decision-making, cross-cultural interactions, strategic human resource management, and mindfulness.

Bo Ley Cheng was the teaching assistant at the University of Hong Kong psychology
department during the academic year 2018–9.
Gilad Feldman is an assistant professor at the University of Hong Kong psychology
department. His research focuses on judgment and decision-making.

Lok Ching (Roxane) Kwan, Lok Yeung (Loren) Ma, Hiu Yee (HayleyAnne) Choi, Ying
Ching (Lita) Lo, Shin Yee (Sarah) Au, and Chi Ho (Toby) Tsang were students at the
University of Hong Kong during the academic year 2018–9.


