Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022,
pp.
449–486
Both
better
and
worse
than
others
depending
on
difficulty:
Replication
and
extensions
of
Kruger’s
(1999)
above
and
below
average
effects
Max
Korbmacher
∗
Ching
(Isabelle)
Kwan
†
Gilad
Feldman
‡
Abstract
Above-and-below-average
effects
are
well-known
phenomena
that
arise
when
com-
paring
oneself
to
others.
Kruger
(1999)
found
that
people
rate
themselves
as
above
average
for
easy
abilities
and
below
average
for
difficult
abilities.
We
conducted
a
suc-
cessful
pre-registered
replication
of
Kruger’s
(1999)
Study
1,
the
first
demonstration
of
the
core
phenomenon
(N
=
756,
US
MTurk
workers).
Extending
the
replication
to
also
include
a
between-subject
design,
we
added
two
conditions
manipulating
easy
and
dif-
ficult
interpretations
of
the
original
ability
domains,
and
with
an
additional
dependent
variable
measuring
perceived
difficulty.
We
observed
an
above-average-effect
in
the
easy
extension
and
below-average-effect
in
the
difficult
extension,
compared
to
the
neu-
tral
replication
condition.
Both
extension
conditions
were
perceived
as
less
ambiguous
than
the
original
neutral
condition.
Overall,
we
conclude
strong
empirical
support
for
Kruger’s
above-and-below-average
effects,
with
boundary
conditions
laid
out
in
the
extensions
expanding
both
generalizability
and
robustness
of
the
phenomenon.
Keywords:
above-average
effect,
below-average
effect,
bias,
anchoring,
egocentrism
∗
Co-first-author.
Department
of
Health
and
Functioning,
Western
Norway
University
of
Applied
Sciences,
Bergen,
Norway,
https://orcid.org/0000-0002-8113-2560.
†
Co-first-author.
Department
of
Psychology,
University
of
Hong
Kong,
Hong
Kong
SAR.
‡
Corresponding
author.
Department
of
Psychology,
University
of
Hong
Kong,
Hong
Kong
SAR.
https://
orcid.org/0000-0003-2812-6599.
Email:
gfeldman@hku.hk.
We
would
like
to
thank
Leo
Chan
for
reviewing
the
materials
during
an
early
project
stage
and
Raj
Aiyer,
Hirotaka
Imada,
Matan
Mazor,
Nicole
Russel,
Burak
Tunca,
and
Meng-Yun
Wang
for
reviewing
the
manuscript
prior
to
submission.
Their
work
led
to
many
helpful
comments,
which
improved
the
project
output
substantially.
We
would
also
like
to
thank
Prasad
Chandrashekar
for
his
help
with
mixed
modelling.
All
materials,
data,
and
code
are
available
in
the
OSF
supplement
at
https://osf.io/7yfkc/.
Copyright:
©
2022.
The
authors
license
this
article
under
the
terms
of
the
Creative
Commons
Attribution
3.0
License.
449
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
1
Introduction
1.1
Background
The
above-average
effect
refers
to
the
tendency
to
perceive
oneself
as
better
than
the
average
person
across
different
aspects.
Kruger
(1999)
was
the
first
to
present
instances
of
the
opposite
–
a
below-average
effect
–
the
tendency
to
view
oneself
as
worse
than
the
average
person,
and
he
proposed
that
this
opposing
effect
depends
on
the
difficulty
of
the
ability
domain.
The
above-average
effect
was
observed
when
self-perceived
skills
in
an
ability
domain
were
high,
whereas
the
below-average
effect
occurred
when
self-perceived
skills
were
low.
Hence,
Kruger
identified
the
two
effects’
underlying
mechanism
to
be
the
egocentric
nature
of
comparative
ability
judgments
and
suggested
an
anchoring-and-
adjustment
account.
Individuals
anchor
onto
their
own
skills
and
then
adjust
away
from
their
own
anchor
when
judging
the
skill
of
others.
Therefore,
when
considering
easy
activities,
people
perceive
their
ability/skill
as
high
and
display
the
above-average
effect,
thus
failing
to
account
for
the
“true”
distribution
curve
of
such
abilities/skills
which
includes
others
who
are
also
highly
skilled.
When
activities
are
difficult
and
hence
absolute
domain
ability
is
generally
low,
a
below-average
effect
results
from
the
failure
to
consider
that
others
are
also
not
highly
skilled.
This
result
was
first
operationalized
in
Study
1
in
Kruger
(1999)
using
a
questionnaire
in
which
participants
first
compared
themselves
with
their
peers
on
four
relatively
easy
and
four
relatively
difficult
ability
domains
(or
activities).
Participants
then
answered
a
series
of
questions
concerning:
1)
estimates
of
their
own
and
classmates’
absolute
abilities
(termed
“comparative
ability”);
2)
desirability;
3)
ambiguity
of
each
ability;
and
4)
past
experience
of
each
ability.
A
strong
negative
correlation
between
domain
difficulty
and
participants’
comparative
ability
judgments
supported
both
above
and
below-average
effects
(Kruger,
1999).
The
study
demonstrated
correlational
evidence
for
the
egocentric
nature
of
compar-
ative
ability
judgments,
in
the
form
of
a
strong
positive
correlation
between
participants’
ratings
of
their
own
and
their
comparative
abilities.
For
all
ability
domains,
participant
judg-
ments
of
their
own
absolute
abilities
better
predicted
their
comparative
ability
judgments
than
did
participants’
judgments
of
their
peers’
skills.
Additional
experimental
studies
(2
and
3
in
Kruger,
1999)
used
a
situation
in
which
participants
received
either
a
very
easy
or
a
difficult
test,
leading
to
similar
results
as
in
Study
1.
The
anchoring-and-adjustment
account
was
deemed
consistent
with
the
fact
that
cognitive
load
increased
bias
during
comparative
ability
judgments.
We
conducted
a
close
replication
and
extensions
of
Kruger
(1999)
with
two
main
goals;
1)
test
the
robustness
of
above-
and
below-average
effects,
and
2)
examine
extensions
to
test
whether
ambiguities
regarding
domain
difficulty
may
moderate
this
effect.
Two
between-subject
conditions
were
added
to
the
original
design
to
test
whether
an
easier
or
more
difficult
version
of
Kruger’s
original
ability
domains
would
moderate
the
effects.
Furthermore,
we
added
an
additional
dependent
variable
to
assess
the
phenomenon
using
450
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
ratings
of
perceived
domain
difficulty
more
directly.
We
begin
by
introducing
the
literature
on
above-and-below-average
effects
and
the
choice
of
target
article
for
replication,
then
provide
information
on
the
original
findings,
and
outline
our
added
extensions.
1.2
Above-and-below-average
effects
In
the
1980s,
researchers
began
to
assess
subjects’
self-evaluations
in
relationship
to
their
peers
with
the
results
showing
over-estimations
of
own
chances
for
positive
outcomes
compared
to
the
average
population
(e.g.,
Weinstein,
1980,
1983).
Focusing
on
comparisons
with
others,
the
phenomenon
became
later
known
as
above
or
better-than-average
effect
(Kruger,
1999).
Research
picked
up
quickly
on
the
above-average
effect,
testing
boundary
conditions
such
as
culture
(Heine
&
Lehman,
1997)
or
self-appraisal
(Wilson
&
Ross,
2001).
Kruger
(1999)
was
the
first
to
add
that
there
is
not
only
an
above-
but
also
below-average
effect.
1.2.1
Underlying
mechanisms
Throughout
the
last
decades,
a
range
of
different
underlying
mechanisms
was
proposed
to
explain
the
above-average
effect
(less
research
focused
on
the
below-average
effect),
such
as
informational
differences
(i.e.,
knowing
more
about
oneself
than
others),
focalism
(i.e.,
focussing
on
oneself
during
comparative
judgments),
naïve
realism,
and
egocentrism
(Brown,
2012).
The
final
mechanism
was
also
used
in
the
chosen
study
for
replication
(Kruger,
1999);
when
people
assess
how
they
compare
with
their
peers,
they
may
focus
egocentrically
on
their
own
skills
and
insufficiently
account
for
the
skills
of
the
comparison
group.
However,
Kruger
(1999)
reported
not
only
an
above-average
effect,
but
also
a
below-average
effect,
both
explained
by
egocentrism.
1.2.2
Theoretical
grounding
Originally,
the
above-average
effect
has
been
described
as
motivated
by
self-enhancement
needs
(i.e.,
to
induce
positive
affect
towards
oneself)
or
a
byproduct
of
motivated
reasoning
(Alicke,
1985;
Brown,
1986;
Kunda,
1990;
Taylor
&
Brown,
1988).
Self-enhancement
enables
the
maintenance
of
a
global
self-concept
allowing
for
both
positive
attributes
un-
der
personal
control
and
negative
attributes
resulting
from
factors
beyond
personal
control
(Alicke,
1985).1
Self-verification
can
be
used
as
another
explanation
for
the
above-average
effect
(Zell
et
al.,
2020).
Expanding
on
self-enhancement,
the
self-verification
theory
de-
scribes
that
both
self-enhancement
and
exposure
to
information
which
creates
and
strength-
ens
a
biased
view
of
oneself
can
lead
to
phenomena
such
as
the
above-and-below-average
1See
Ziano
et
al.
(2021)
for
a
recent
successful
direct
replication
of
Alicke
(1985),
showing
that
people
rate
more
desirable
traits
to
be
more
descriptive
of
themselves
than
of
others,
and
extending
that
the
effect
was
stronger
for
more
controllable
traits.
This
study
was
different
from
the
current
work
as
it
focused
on
traits
whereas
the
focus
here
is
on
skills.
451
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
effects
(Zell
et
al.,
2020).
In
that
sense,
higher
self-esteem
has
been
linked
with
stronger
above-average
effects
(e.g.,
Bosson
et
al.,
2000;
Chung
et
al.,
2016).
Support
for
the
mo-
tivational
perspective
and
the
ubiquity
of
the
above-average
effect
was
provided
by
those
objectively
being
below-average
in
certain
characteristics
displaying
the
above-average
ef-
fect
(e.g.,
Sedikides
et
al.,
2014).
For
instance,
prisoners
comparing
themselves
with
non-prisoners
on
pro-social
characteristics
rated
themselves
as
above-average
in
most
char-
acteristics
(Sedikides
et
al.,
2014).
Another
explanation
can
be
found
in
social
comparisons
during
which
people
evaluate
their
social
position
compared
to
relevant
peers
–
with
the
tendency
of
positioning
oneself
as
higher-standing
(Gerber
et
al.,
2018).
An
example
of
both
effects
applying
during
social
comparisons
is
when
Democrats
and
Republicans
compare
their
own
warmth
and
competency
with
the
average
person
of
their
in-
and
out-
group
(Eriksson
&
Funcke,
2013).
In-group
comparisons
lead
to
below-average
ratings
for
warmth
among
Democrats
and
above-average
effects
among
Republicans,
which
reversed
for
outgroup
comparisons
(Eriksson
&
Funcke,
2013).
Above-and-below-average
effects
have
also
been
found
to
vary
across
ages,
with
egocentrism
accounting
for
age
differences
(Zell
&
Alicke,
2011).
Young,
middle-aged,
and
older
adults
displayed
an
above-average
effect
for
most
ability
and
trait
dimensions,
whereas
a
below-average
effect
was
observed
for
older
adults
with
clear
deficiencies
(Zell
&
Alicke,
2011).
1.2.3
Follow-up
research
Due
to
the
large
number
of
citations
of
Kruger’s
(1999)
findings,
it
is
difficult
to
generalize
the
publication’s
impact.
However,
focusing
on
follow-up
research
on
the
above
and
below-average
effects’,
more
recent
studies
provided
information
about
the
effects’
wide
applicability
and
boundary
conditions,
with
a
large
body
of
work
supporting
the
original
findings
(e.g.,
Aucote
&
Gold,
2005;
Burson
et
al.,
2006;
Johansson
&
Allwood,
2007;
Sweeny
&
Shepperd,
2007).
For
example,
building
on
the
original
findings,
Giladi
and
Klar
(2002)
demonstrated
that
individual
items
within
a
positive
group
tend
to
be
rated
as
above-
average
and
individual
items
within
a
negative
group
tend
to
be
rated
as
below-average.
These
effects
can
be
reversed
depending
on
the
timing
of
the
denotation
of
the
target
item,
which
affects
the
direction
and
size
of
the
comparative
biases
(Windschitl
et
al.,
2008b).
Much
subsequent
research
also
continued
to
explore
underlying
mechanisms,
such
as
motivations
and
debiasing
factors
influencing
egocentrically
biased
comparative
judgments.
Epley
and
Caruso
(2004)
discussed
how
unconscious,
automatic
features
of
human
judgment
result
in
egocentric
judgments
that
appear
objective
to
the
judges
themselves.
Windschitl
et
al.’s
(2008a)
experiments
attempting
to
debias
over-optimism
for
easy
tasks
and
under-
optimism
for
hard
tasks
through
feedback
was
only
successful
under
restrictive
conditions.
Yet,
their
results
support
the
pervasiveness
of
egocentric
biases
as
participants
failed
to
generalize
non-egocentric
tendencies
to
new
contexts.
452
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
1.3
Choice
of
study
for
replication
Kruger’s
(1999)
work
made
an
important
contribution
to
the
field
by
introducing
the
below-
average
effect
and
conditions
in
which
occurs,
which
adds
to
the
understanding
of
a
highly
prevalent
effect
with
importance
to
daily
reasoning.
A
recent
meta-analysis
of
better-than-
average-effect
studies
found
the
effect
to
be
robust
across
studies,
yet,
with
the
effect
being
smaller
for
abilities
compared
to
personality
traits
(Zell
et
al.,
2020).
Problematically,
definitions
and
measurement
of
skill
are
incongruent
which
leads
to
biased
assessment
and
operationalizations
differ
strongly
between
studies
testing
above-and-below
average-
effects,
generally
(Zell
et
al.,
2020),
and
in
specific
contexts
such
as
drivers’
overconfidence
in
their
driving
skills
(Sundström,
2008).
Hence,
despite
the
prolific
literature
that
followed,
the
above-average
effect’s
robustness
has
been
repeatedly
called
into
question
(Sundström,
2008;
Zell
et
al.,
2020).
However,
some
studies
failed
to
conceptually
replicate
mechanisms
and
boundary
con-
ditions
originally
reported
by
Kruger,
such
as
the
relationship
of
estimates
about
others
in
relationship
to
estimates
about
oneself.
For
example,
Moore
and
Kim
(2003)
found
mixed
evidence
for
the
relationship
between
comparative
ability
and
the
evaluations
of
others’
ability.
This
was
also
shown
in
a
practical
context
by
Walsh
and
Ayton
(2009).
After
presenting
an
imaginary
scenario
in
which
a
doctor
provides
information
about
a
serious
diagnosis
applying
to
the
participant
and
how
that
affects
others’,
own
happiness
estimates
by
participants
were
indeed
influenced
by
information
about
others’
happiness.
We
chose
Kruger’s
(1999)
study
for
replication
based
on
the
following
factors:
impact,
open
questions
about
boundary
conditions
of
the
above
and
below-average
effects,
and
absence
of
direct
replications.
To
the
best
of
our
knowledge,
no
direct
replications
of
Kruger
(1999)
have
been
publshed.
Yet,
the
article
has
had
a
significant
impact
on
several
scientific
and
practical
fields,
including
management
(Bazerman
&
Moore,
2012),
economy
(DellaVigna,
2009;
Koellinger
et
al.,
2007),
medicine
(Stewart
et
al.,
2013),
education,
or
the
workplace
in
general
(Dunning
et
al.,
2004).
At
the
time
of
writing
(May
2021),
there
were
1178
Google
Scholar
citations
of
the
article
and
many
important
follow-up
theoretical
and
empirical
articles
(Chambers
&
Windschitl,
2004;
Moore,
2007;
Moore
&
Cain,
2007;
Moore
&
Small,
2007;
Whillans
et
al.,
2020;
Windschitl
et
al.,
2008b).
We
chose
Study
1,
as
it
was
the
first
demonstration
of
the
core
phenomenon.
We
aimed
to
revisit
this
classic
phenomenon
in
a
well-powered
preregistered
close
replication
(e.g.,
Brandt
et
al.,
2014).
1.4
Original
hypotheses
in
target
article
In
the
original
study,
participants
compared
themselves
to
their
peers
on
eight
ability
domains
of
varying
difficulty.
Kruger
proposed
that
(H
orig1
:)
compared
to
judgments
of
their
peers’
abilities,
people’s
judgments
of
their
own
abilities
account
for
more
variance
in
their
comparative
ability
judgments.
453
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Past
research
on
reasons
for
people’s
tendency
to
focus
on
their
own
ability
when
comparing
themselves
to
others
offers
insight
on
why
comparative
ability
judgments
are
egocentric
in
nature.
One’s
own
skills
are
more
likely
to
be
assessed
first
when
comparing
the
self
to
others
(Srull
&
Gaelick,
1983),
are
easier
to
conceptualize
than
skills
of
the
average
person
(Higgins
et
al.,
1982;
Higgins
&
Bargh,
1987;
Srull
&
Gaelick,
1983),
and
have
a
larger
database
to
refer
to
than
others’
skills
(Ross
&
Sicoly,
1979).
These
explanations
formed
the
basis
of
Kruger’s
primary
hypothesis.
When
comparing
one’s
own
ability
to
peers’
ability,
assessments
are
predominantly
based
on
the
perception
of
one’s
own
skills
and
less
on
the
perceptions
of
peers’
skills,
and
therefore,
perceptions
of
one’s
own
absolute
ability
better
predict
comparative
ability
judgments.
Based
on
that,
Kruger
proposed
that
(H
orig2
:)
people
tend
to
perceive
themselves
as
above
average
when
considering
easy
abilities,
and
that
(H
orig3
:)
people
tend
to
perceive
them-
selves
as
below
average
when
considering
difficult
abilities.
We
merged
the
dichotomized
hypotheses
to
propose
that
the
more
difficult
the
ability
domain
is
perceived
to
be,
the
more
likely
a
person
is
to
shift
from
perceiving
oneself
as
above
average
to
perceiving
oneself
as
below
average.
1.5
Original
findings
in
target
article
Kruger
(1999)
used
a
combination
of
correlational
studies,
one-sample
t-tests,
and
multiple
regression
and
found
support
for
all
hypotheses
(Table
1).
Above
and
below-average
effects
were
prevalent
for
all
but
one
difficult
item:
telling
jokes.
He
observed
an
inverse
association
between
the
domain
difficulty
and
comparative
ability:
as
ability
domains
increased
in
difficulty,
the
perception
of
their
comparative
ability
decreased.
Participants
believed
to
be
above
average
for
easy
abilities
and
below
average
for
difficult
abilities.
To
examine
the
relationship
between
one’s
own
absolute
ability
and
comparative
ability
judgments,
we
conducted
multiple
regressions
predicting
comparative
ability
from
their
own
ability,
and
others’
ability
for
each
of
the
eight
abilities.
Participants’
perception
of
their
own
ability
better
predicted
their
comparative
ability
judgments.
Participants
anchored
onto
their
own
absolute
ability,
as
opposed
to
their
peers’
absolute
ability
when
comparing
themselves
to
others
across
ability
domains.
Here
we
summarize
effect
sizes
and
power
analysis
for
the
original
study
results
in
the
sections
“effect
size
calculations
of
the
original
study
effects”
and
“power
analysis
of
original
study
effect
to
assess
required
sample
for
replication”
in
the
OSF
supplement.
1.6
Extensions
to
the
Original
Study
Design
1.6.1
Extension
1:
Manipulating
domain
difficulty
We
aimed
to
extend
the
replication
study
by
considering
the
ambiguities
in
the
definitions
of
easy
and
difficult
used
in
the
domains
of
the
original
study.
The
ability
domains
in
the
target
article
were
only
succinctly
described
(see
Table
2).
Each
ability
domain
may
connote
454
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
1:
Kruger’s
(1999)
findings:
Mean
comparative
ability
estimates
and
judgmental
weight
of
own
and
peers’
abilities.
Ability
Domain
difficulty
1
Comparative
Judgmental
weight
Judgmental
weight
ability
2
of
Own
ability
3
of
Others’
ability
3
Easy
Using
a
mouse
Driving
Riding
a
bicycle
Saving
money
3.1
3.6
3.9
4.3
58.8
∗∗
65.4
∗∗∗∗
64.0
∗∗∗∗
61.5
∗∗
0.21
.89
∗∗∗∗
.61
∗∗∗∗
.90
∗∗∗∗
0.06
–.25
∗
–0.02
–.25
∗∗∗
Difficult
Telling
jokes
Playing
chess
Juggling
Programming
6.1
7.1
8.3
8.7
46.5
27.8
∗∗∗∗
26.5
∗∗∗∗
24.8
∗∗∗∗
.91
∗∗∗∗
.96
∗∗∗∗
.89
∗∗∗∗
.85
∗∗∗∗
–0.03
–.22
∗∗
–0.16
–0.1
1
Higher
numbers
reflect
greater
difficulty.
2
Mean
percentile
estimates
above
50
reflect
an
above-average
effect,
estimates
below
50
reflect
a
below-average
effect.
3
Standardised
betas
from
multiple
regressions
predicting
participants’
comparative
ability
(percentile)
estimates
from
their
estimates
of
their
own
absolute
ability
and
the
absolute
ability
of
their
peers,
respectively.
∗
p
<
.05.
∗∗
p
<
.01.
***
p<
.001.
∗∗∗∗
p
<
.0001.
different
meanings,
depending
on
how
participants
interpret
the
domains.
For
instance,
the
ability
“saving
money”
was
categorized
as
an
easy
ability.
Yet,
the
amount
of
money
saved
was
not
specified,
and
that
may
matter
for
perceived
difficulty,
as
saving
3%
of
income
per
month
is
likely
to
be
perceived
as
easier
than
saving
20%
of
income
per
month.
Therefore,
we
manipulated
domain
difficulty.
In
our
replication,
we
randomly
assigned
participants
to
one
of
the
three
conditions
receiving
different
definitions
of
the
ability
domains,
either:
1)
original
domain
condition
(replication);
2)
easy
domain
condition
(extension)
with
an
easy
reinterpretation
of
the
original
domains;
or
3)
difficult
domain
condition
(extension)
with
a
difficult
reinterpretation
of
the
original
domains
(Table
2).
For
the
two
extension
groups,
the
extension
domains
aim
to
be
specifically
defined
in
measurable
terms.
More
context
is
provided
for
the
domains
to
be
more
specific,
such
as
the
hand
used
(dominant
versus
non-dominant
hand)
for
using
a
mouse,
the
location
and
type
of
car
(home
country
and
automatic
gear
car
versus
foreign
country
and
manual
gear
car)
for
driving,
and
the
help
received
for
computer
programming
(someone
very
knowledgeable
versus
someone
not
very
knowledgeable),
which
is
an
ability
domain
most
participants
may
not
have
experience
with.
Additionally,
an
objective
measure
should
be
455
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
2:
Extension:
Manipulation
of
perceived
domain
difficulty
in
target’s
domains.
Original
domain
group
Easy
domain
group
(replication)
(extension)
Easy
domains
Using
a
mouse
Driving
Riding
a
bicycle
Saving
money
Difficult
domains
Telling
jokes
Playing
chess
Juggling
Programming
Difficult
domain
group
(extension)
Using
a
mouse
with
your
dominant
hand
Driving
a
car
with
automatic
gear
in
your
home
country
Using
a
mouse
with
your
non-dominant
hand
Driving
a
car
with
manual
gear
in
a
foreign
country
where
people
drive
on
the
opposite
side
of
the
road
Riding
a
bicycle
for
10
minutes
Riding
a
bicycle
for
an
hour
up
on
a
flat
road
a
road
with
an
upwards
incline
slope
Saving
3%
of
your
income
each
Saving
20%
of
your
income
month
each
month
Telling
a
joke
to
one
person
Telling
a
joke
in
front
of
a
live
you
know
well
(e.g.,
friend,
audience
in
an
improv
stand-up
family
member,
etc.)
comedy
club
Win
a
game
of
chess
against
an
Win
a
game
of
chess
against
an
AI
(computer)
in
advanced
AI
(computer)
in
beginners’
mode
mode
Juggling
2
balls
Juggling
4
balls
Programming
guided
by
Programming
guided
by
someone
very
knowledgeable
someone
not
knowledgeable
in
in
programming
programming
quantitatively
determined
in
units
that
can
be
measured
(e.g.,
length
of
time,
amount
of
money)
or
counted
(e.g.,
number
of
people;
Roth
et
al.,
2008).
Therefore,
the
extension
domains
also
use
criteria
such
as
time
(10
minutes
versus
1
hour),
number
of
people
(one
person
versus
a
live
audience
in
an
improv
stand-up
comedy
club),
and
difficulty
(beginner
mode
versus
advanced
mode).
1.6.2
Extension
2:
Measuring
domain
difficulty
For
the
second
extension,
we
added
an
additional
dependent
variable
measuring
domain
difficulty.
In
the
original
study,
domain
difficulty
was
determined
in
a
pretest
by
a
separate
group
of
participants
(n
=
39).
They
rated
their
absolute
ability
–
the
extent
of
how
skilled
they
are
–
on
the
eight
abilities
on
a
10-point
scale
(higher
number
indicates
higher
456
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
skill
level):
“For
this
ability,
please
rate
your
own
ability
from
1
(very
unskilled)
to
10
(very
skilled)“.
The
ratings
were
then
reverse-scored
and
higher
numbers
indicated
greater
domain
difficulty.
The
four
ability
domains
lower
than
the
midpoint
of
the
scale
were
categorized
as
easy
domains,
whereas
the
four
ability
domains
higher
than
the
midpoint
of
the
scale
were
categorized
as
difficult
domains.
Due
to
problems
associated
with
categorizing
the
continuous
variable
of
the
difficulty
level
of
ability
domains
into
easy
domains
or
difficult
domains,
in
the
current
replication,
we
measured
domain
difficulty
on
a
continuous
scale:
“Please
rate
the
difficulty
of
this
ability
from
1
(very
easy)
to
10
(very
difficult)”.
Details
on
the
adjustment
can
be
found
in
the
section
below
“adjustments
to
the
original
study”.
In
contrast
to
the
original
study,
domain
difficulty
ratings
were
scored
on
a
similar
scale
as
comparative
ability,
(own
and
others’)
comparative
ability,
desirability,
and
ambiguity.
We
examined
difficulty
ratings
across
all
domains
to
assess
whether
perceived
difficulty
was
as
expected
in
the
original
and
conditions
in
which
difficulty
was
manipulated.
For
the
easy
domain
condition,
we
hypothesized
that
easy
interpretations
of
the
original
domains
would
result
in
lower
domain
difficulty
ratings
across
all
abilities
compared
to
ratings
of
the
original
domain
group.
For
the
difficult
domain
condition,
we
hypothesized
that
difficult
interpretations
of
the
original
domains
would
result
in
higher
domain
difficulty
ratings
across
all
abilities
compared
to
original
domain
group
ratings.
We
expected
the
ambiguity
ratings
for
both
easy
and
difficult
conditions
to
be
lower
than
that
in
the
original’s
domains.
Additionally,
we
tested
whether
comparative
ability
would
be
influenced
by
our
easy/difficult
manipulations.2
1.7
Hypotheses
Based
on
the
original
study
and
the
current
extension
hypotheses,
this
replication
aims
to
test
four
central
hypotheses
(Table
3).
1.8
Adjustments
to
the
original
study
In
the
original
study,
the
eight
ability
domains
were
divided
into
two
categories:
four
easy
domains
and
four
difficult
domains.
On
a
10-point
scale
from
very
easy
to
very
difficult,
easy
domains
had
domain
difficulty
ratings
below
5
(the
midpoint
of
the
scale),
and
difficult
domains
above
5,
respectively.
The
above-average
effect
was
tested
for
the
easy
domains,
whereas
the
below-average
effect
was
tested
for
the
difficult
domains.
Yet,
several
issues
may
arise
from
treating
continuous
variables
as
categorical.
First,
the
categorization
of
continuous
variables,
especially
dichotomization
of
placing
variables
into
two
groups,
might
lead
to
misclassifications,
loss
of
information
and
power
(Naggara
2Although
this
test
was
the
reason
for
the
preregistration,
due
to
an
error,
neither
hypotheses
or
tests
related
to
the
core
questions
of
the
extensions
were
part
of
the
preregistration.
Hence,
analyses
connected
to
this
question
in
the
extension
will
be
treated
as
exploratory.
457
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
3:
Summary
of
the
hypotheses.
Hypothesis
Statement
H1
Compared
to
judgments
of
others’
Own
absolute
abilities,
participant
judgments
of
ability;
others’
absolute
ability;
their
own
abilities
better
predict
comparative
their
comparative
ability
ability
judgments.
Replication
and
extension
conditions
Comparative
ability;
domain
difficulty;
desirability;
ambiguity
Replication
and
extension
conditions
(Original)
H2
Variables
The
more
difficult
the
ability
domain,
the
more
likely
a
person
is
to
shift
from
perceiving
oneself
as
above
average
to
perceiving
oneself
as
below
average.
(Original
reframed)
Domain
H3
Compared
to
the
replication
difficulty;
(Extension)
condition
participants,
the
easy
ambiguity
domain
condition
participants
assign
lower
domain
difficulty
and
ambiguity
ratings
to
abilities.
Domain
H4
Compared
to
the
replication
difficulty;
(Extension)
condition,
the
difficult
domain
ambiguity
condition
participants
assign
higher
domain
difficulty
and
lower
ambiguity
ratings
to
abilities.
Conditions
Replication
and
easy
domain
conditions
Replication
and
difficult
domain
conditions
et
al.,
2011).
Second,
the
loss
of
power
by
dichotomizing
variables
at
the
median
is
equal
to
discarding
one-third
of
the
data
(Cohen,
1983;
MacCallum
et
al.,
2002).
Third,
variation
between
categorized
groups
may
be
underestimated
as
close
response
scores
divided
into
different
groups
are
defined
as
being
very
different
instead
of
very
similar.
It
has
thus
been
suggested
to
keep
variables
continuous
using
methods
such
as
linear
regressions
instead
of
t-tests
(Altman
&
Royston,
2006).
For
the
above
reasons,
we
did
not
assign
ability
domains
to
specific
dichotomic
easy/difficult
categories.
The
above-
and
below-average-effects
were
tested
on
a
continuous
scale:
instead
of
using
one-sample
t-tests,
correlations
were
used
to
test
the
relationship
between
domain
difficulty
and
comparative
ability
in
three
different
ways:
item-wise,
com-
piled
items
in
a
vector
(but
not
averaging
across
them),
and
row-wise
averaged
for
the
three
conditions.
Applying
this
method
is
a
more
direct
assessment
of
perceived
difficulty
with
the
same
sample.
For
a
full
overview
of
differences
between
the
current
and
the
original
study
see
the
OSF
supplement,
section
“Comparisons
and
deviations”.
458
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
1.9
Pre-registration
and
open
science
Before
data
collection,
the
experiment
was
pre-registered
(see
the
OSF
supplement).
Pre-
registrations,
power
analyses,
materials,
data,
exclusions,
manipulations,
power
analyses,
and
other
details
and
disclosures
are
available
in
the
OSF
supplement.
Data
collection
was
completed
before
analyses.
2
Method
2.1
Participants
and
power
analyses
We
conducted
power
analyses
in
R
using
the
pwr
package
(Champely
et
al.,
2018).
The
power
analyses
suggested
a
sample
size
of
160
to
be
sufficient
for
reaching
95%
power
with
an
alpha-level
=
.05
assuming
an
effect
size
of
f2
=
0.099
(informed
by
Kruger,
1999)
for
a
2-factor
multiple
linear
regression
analysis
(see
OSF
supplement,
section
“Power
analysis
of
original
study
effect
to
assess
required
sample
for
replication”).
We
tried
to
exceed
this
estimate
(following
replication
recommendation
such
as
Simonsohn,
2015)
and
added
extensions
thereby
leading
to
the
recruitment
of
756
Amazon
MTurkers.
A
total
of
65
participants
failed
to
meet
the
pre-registered
inclusion
criteria
and
were
excluded,
resulting
in
a
total
of
691
included
participants
(see
Table1
in
the
OSF
supplement
for
sample
comparison
and
exclusion
details).
2.2
Design
The
original
study
used
a
within-subject
design
with
one-sample
analyses
conducted
for
each
condition
(easy
versus
difficult
domains),
yet
in
the
current
replication,
we
used
a
3
(between
difficult
conditions:
original,
easy,
difficult)
x
2
(within
difficulty
conditions:
easy,
difficult)
mixed-design.
All
participants
were
presented
with
eight
items
(within-subjects;
see
Table2).
We
used
the
same
methods
as
in
the
original
study
for
within-group
analyses
and
added
additional
analyses
for
the
between-group
comparisons
(see
the
OSF
supplement
for
more
details
and
full
measures).
2.3
Procedure
Participants
were
recruited
through
MTurk
on
TurkPrime/CloudResearch
(Litman
et
al.,
2017)
and
completed
questionnaires
via
a
provided
“Qualtrics”
link
after
giving
consent.
Participants
were
randomly
assigned
to
one
of
three
conditions:
1)
Original
domains
(8
original
domains;
4
easy
and
4
difficult
domains),
2)
Easy
domains
extension
(easy
reinterpretations
of
the
8
original
domains),
or
3)
Difficult
domains
extension
(difficult
reinterpretations
of
the
8
original
domains).
459
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
4:
Comparison
of
original
study
and
replication’s
samples.
Kruger
(1999)
Sample
size
37
Geographic
origin
US
American
Gender
8
males,
29
females
Medium
(location)
Questionnaire
(Cornell
University)
Compensation
Course
credit
Year
1999
MTurk
sample
(pre-exclusion)
MTurk
sample
(post-exclusion)
756
691
US
American
US
American
442
males,
307
397
males,
288
females,
7
unspecified
females,
6
unspecified
Computer
(online)
Computer
(online)
Nominal
payment
2020
Nominal
payment
2020
Based
on
the
categorization
in
the
original
study,
of
the
eight
ability
domains,
four
were
categorized
as
easy
and
the
other
four
as
difficult
(see
Table2),
presented
in
randomized
order.
2.4
Measures
The
original
study
had
six
dependent
variables
and
the
current
study
added
an
additional
dependent
variable
of
perceived
domain
difficulty.
Across
all
conditions,
the
dependent
variables
were
measured
as
participant
ratings
for
each
of
the
eight
ability
domains
(Table
2).
We
computed
Cronbach’s
𝛼-scores
for
the
original
and
extension
eight-item
scales,
first
for
all
domains
together,
and
then
divided
using
the
original’s
categorization
of
easy
and
difficult
domains,
being
𝛼
all
>.63,
𝛼
all
>.46,
𝛼
all
>.47
(see
the
OSF
supplement
section
“Reliability
for
domains
across
conditions”).
2.5
Exclusion
criteria
The
following
exclusion
criteria
were
pre-registered:
1)
low
proficiency
of
English
(less
than
5
on
a
scale
of
1
to
7);
2)
not
being
serious
(less
than
4
on
a
scale
of
1
to
5);
3)
correctly
guessing
one
of
the
hypotheses;
4)
having
seen
or
done
the
survey
before;
5)
failure
to
complete
the
survey;
and
6)
not
in
or
from
the
United
States,
to
keep
sample
characteristics
as
close
to
the
original
study
as
possible.
2.6
Evaluation
criteria
for
replication
findings
We
compare
the
replication
effects
with
the
original
effects
in
the
target
article
using
the
criteria
set
by
LeBel
et
al.
(2019)
(See
the
OSF
supplement
sections
“Criteria
for
evaluation
of
replications”
and
“Replication
evaluation”).
460
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
We
categorized
the
current
replication
as
a
“close
replication”
and
provided
details
in
Table5.
Variables
and
questions
were
the
same
as
in
the
original,
with
the
addition
of
extensions
and
adjustments
to
fit
the
MTurk
sample,
instead
of
Cornell
university
students.
3
Results
We
analyzed
the
data
using
R
v3.6.3
(R
Core
Team,
2020),
with
analyses
conducted
both
on
a
participant-
and
an
item-level.
To
allow
for
a
broader
assessment
of
the
data,
we
conducted
preprocessing
by
both
calculating
mean
scores
(Table
6
for
correlation
matrices
for
each
condition),
and
compiling
the
values
for
variables’
eight
items
(abilities)
in
their
raw
form,
resulting
in
8
rows
per
participant
(see
“Correlations
per
condition”
subsection
in
the
OSF
supplement
for
correlation
matrices
for
each
condition).
For
analyses
conducted
on
an
item
level,
participant
ratings
for
each
of
the
eight
abilities
were
examined.
3.1
Domain
difficulty
comparisons
by
conditions
We
conducted
paired-sample
Wilcoxon
tests
comparing
difficulty
ratings
between
the
grouped
4
easy
and
4
difficult
replication/original
and
extension
domain
items
and
found
do-
main
difficulty
ratings
to
be
higher
for
difficult
abilities
across
all
comparisons
(summarized
in
Table7,
ps
<
.001),
supporting
Kruger’s
(1999)
original
categorization.3
Hence,
all
con-
ditions
were
analyzed
as
in
the
original
study,
including
correlations
between
the
variables
across
the
eight
domains,
and
one-sample
Wilcoxon-tests
testing
for
the
above-average
ef-
fect
in
easy
ability
domains
and
the
below-average
effect
in
difficult
ability
domains
(Tables
8.1–8.3
in
the
OSF
supplement).
3.2
Replication:
original
domain
condition
We
conducted
all
analyses
in
this
section
on
the
original
domain
condition
(n
=
240).
3.2.1
H
1
:
Relationship
between
absolute
and
comparative
ability
In
a
linear
regression
model,
own
and
others’
absolute
ability
ratings
predicted
mean
comparative
ability
judgments
(F(2,
237)
=
323.9,
p
<
.001,
R
adj
2
=
.73,
95%
CI
[0.68,
0.79]).4
However,
we
found
support
only
for
participants’
judgments
of
their
own
absolute
ability
as
predictors
of
their
comparative
ability
judgments
(𝛽
=
0.90,
t(239)
=
19.93,
p
<
.001).
On
an
item
level,
we
conducted
multiple
regressions
for
each
of
the
eight
abilities
to
examine
how
participants’
estimates
of
both
own
and
others’
absolute
abilities
predict
comparative
ability
estimates
(see
Table8
for
standardized
betas).
Own
absolute
abilities
3T-statistics
for
the
distinction
of
ability
items
into
easy
and
difficult
were
not
reported
in
Kruger
(1999).
4See
“Additional
Tables
and
Figures”
in
the
OSF
supplement
for
regression
plots
and
tables.
461
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
5:
Classification
of
the
replication,
based
on
LeBel
et
al.
(2018).
Design
facet
Replication
IV
opera-
tionalization
DV
opera-
tionalization
IV
stimuli
Same
DV
stimuli
Same
Similar,
with
an
added
extension
Similar,
with
an
added
extension
DV
stimuli
Similar,
with
an
added
extension
Procedural
details
Similar,
with
an
added
extension
Different
Physical
settings
Contextual
variables
Replication
classification
Details
of
deviation
Different
Close
replication
IV1
ability
domains
is
changed
from
one
condition
of
4
easy
and
4
difficult
abilities,
to
3
conditions
of
the
replication
group,
the
easy
domain
group,
and
the
difficult
domain
group.
Participants
were
presented
with
either
the
original
ability
domains,
easy
interpretations
of
the
original
ability
domains,
or
difficult
interpretations
of
the
original
ability
domains.
An
additional
dependent
variable,
DV1
(domain
difficulty),
is
added.
For
DV2
(comparative
ability
judgment),
the
scale
was
changed
from
0–99
to
0–100
for
easier
comprehension.
For
DV2
(comparative
ability
judgment)
and
DV4
(Judgmental
weight
of
others’
absolute
abilities),
the
comparison
group
was
changed
from
“other
students
from
the
course”
to
“other
MTurk
workers”
to
ensure
applicability
for
all
Mturk
participants.
For
DV7
(experience
in
the
ability
domain),
the
scale
used
to
measure
prior
experience
was
unspecified
in
the
original
study.
Similar
to
the
majority
of
other
dependent
variables,
it
is
measured
using
a
scale
of
1
(no
experience
at
all)
to
10
(very
experienced).
Participants
are
all
assigned
to
the
same
condition
in
the
original
study.
In
the
replication,
they
are
randomly
assigned
to
one
of
the
three
conditions.
From
a
questionnaire
to
filling
out
an
online
Qualtrics
survey.
From
Cornell
University
undergraduates
to
American
MTurk
workers
as
participants.
With
two
added
extensions.
were
generally
better
in
explaining
changes
in
comparative
ability
judgments
than
others’
skills,
which
supports
H
1
.
462
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
6:
Mean
ratings
across
all
abilities
for
the
three
conditions.
Original
domains
(n
=
240)
Easy
domains
(n
=
225)
Difficult
domains
(n
=
226)
Variable
Mean
SD
Mean
SD
Mean
SD
Mean
domain
difficulty
Mean
comparative
ability
Mean
own
absolute
ability
Mean
others’
absolute
ability
Mean
desirability
Mean
ambiguity
∗
6.05
53.29
6.04
6.22
8.15
3.00
1.15
14.5
1.37
1.14
0.99
1.24
5.22
58.86
6.64
6.59
7.9
2.68
1.63
14.92
1.44
1.33
1.15
1.23
7.39
46.97
4.77
5.06
7.54
2.76
1.19
18.86
2.02
1.79
1.35
1.43
∗
Ambiguity
scores
were
reversed
to
indicate
increasing
ambiguity
from
1
to
10.
Table
7:
Asymptotic
Wilcoxon-Mann-Whitney
tests
comparing
perceived
domain
difficulty
ratings
between
easy
and
difficult
abilities
(within
conditions).
Condition
Original
(replication)
Easy
domain
(extension)
Difficult
domain
(extension)
Effect
Mean
p-value
size
r
difference
95%
CI
T-statistic
df
668.5
238
2.78
<.001
0.82
[0.79,
0.85]
1416
223
1.99
<.001
0.75
[0.69,
0.80]
1917
224
1.22
<.001
0.69
[0.62,
0.75]
For
the
relationship
between
absolute
and
comparative
ability
ratings
across
all
abilities
(240
participants
*
8
items),
we
found
a
strong
relationship
between
comparative
ability
estimates
and
others’
ability
ratings
(r(6)
=
0.94,
p
<
.001,
95%
CI
[0.71,
.99]);
and
between
comparative
ability
estimates
and
own
ability
ratings
(r(6)
=
0.99,
p
<
.001,
95%
CI
[0.96,
.99]).
Hotelling’s
(1940)
t
indicated
these
correlations
to
be
different
from
each
other
(t(5)
=
4.66,
p
=
.006).
3.2.2
H
1
:
Additional
correlation
analyses
for
the
relationship
between
absolute
and
comparative
ability
When
adding
two
modes
of
analysis,
namely,
vector-compiled
scores
and
inventory
mean
scores5,
Pearson’s
rs,
calculated
for
vector-compiled
scores
of
comparative
ability
estimates
5Vector-compiled
scores
were
each
participant
(in
the
replication
condition)
scores
in
all
8
domains
lined
up
in
one
vector
with
8
(domains)
*
240
(participants)
=
1920
rows.
Inventory
mean
scores
were
calculated
by
463
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
8:
Replication
condition:
Mean
comparative
ability
estimates
and
judgmental
weight
of
own
versus
peers’
abilities.
Ability
Domain
difficulty1
Percentile
estimate2
Using
mouse
Driving
Riding
bicycle
Saving
money
Telling
jokes
Playing
chess
Juggling
Programming
2.70
(2.63)
5.19
(2.41)
4.14
(2.44)
6.63
(2.08)
6.10
(2.06)
7.74
(1.75)
7.64
(1.97)
8.29
(1.74)
71.2
∗∗∗
(17.90)
65.2
∗∗∗
(22.08)
61.0
∗∗∗
(20.48)
62.9
∗∗∗
(21.10)
52.4
(22.63)
41.0
∗∗∗
(27.00)
32.0
∗∗∗
(27.67)
40.7
∗∗∗
(29.22)
Judgment
weight:
Judgment
weight:
Own
ability3
Others’
ability3
0.29
∗∗∗
0.85
∗∗∗
0.76
∗∗∗
0.79
∗∗∗
0.75
∗∗∗
0.82
∗∗∗
0.59
∗∗∗
0.83
∗∗∗
0.04
–0.11
∗∗
–0.06
–0.05
0.04
–0.03
0.18
∗∗
–0.06
Note:
Tablepresented
as
in
original
study
(Kruger,
1999,
Table
2)
encompassing
descriptive
statistics,
one-sample
t-tests,
and
regressions.
1
Mean
(SD)
scores
for
item-wise
domain
difficulty.
Higher
numbers
reflect
greater
diffi-
culty.
2
Mean
(SD)
scores
for
item-wise
comparative
ability/percentile
estimates.
Scores
above
50
reflect
an
above-average
effect,
estimates
below
50
reflect
a
below-average
effect.
See
supplementary
tables
8.1
and
9.1
for
test
statistics
and
CI’s.
3
Standardised
betas
from
multiple
regressions
predicting
participants’
comparative
ability
(percentile)
estimates
from
own
absolute
ability
and
peers’
absolute
ability,
respectively.
∗∗
p
<
.01.
∗∗∗
p
<
.001.
and
other’s
absolute
ability,
were
r(1918)
=
0.50
(95%
CI
[0.46,
0.53]);
and
between
comparative
ability
estimates
and
own
absolute
ability
were
r(1918)
=
0.81
(95
%
CI
[0.79,
0.82]);
with
these
correlations
being
different
from
each
other
(Hotelling’s
(1940)
t(1917)
=
27.61,
p
<
0.001).
For
inventory
mean
scores,
correlations
between
comparative
ability
estimates
and
other’s
absolute
ability
were
r(238)
=
0.53
(p
<
.001,
95%
CI
[0.43,
0.62]);
and
between
own
and
comparative
ability
r(238)
=
0.85
(
p
<
.001,
95%
CI
[0.82,
0.89]);
with
these
correlations
being
different
from
each
other
(Hotelling’s
t(237)
=
11.75,
p
<
0.001).
However,
when
using
a
mixed-effects
model
with
random
intercepts
at
the
level
of
participants
to
explain
comparative
ability,
positive
changes
in
own
ability
explained
positive
changes
in
comparative
ability
and
the
relationship
between
others’
and
comparative
ability
being
the
opposite
(Table
9).
The
findings
from
both
replicated
and
the
new
analyses
present
strong
support
for
H
1
.
averaging
the
8
domains
for
each
participant
(row-wise),
resulting
in
240
rows.
P-values
for
vector-compiled
scores
correlations
are
not
provided
as
those
do
not
account
for
repeated
responses
of
the
same
person.
464
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
9:
Estimated
fixed-effects
coefficients
of
the
mixed-effects
regression
model
with
changes
in
Comparative
Ability
explained
by
Others’
and
Own
Ability.
Predictors
(Intercept)
Own
Ability
Others’
Ability
B
S.E.
CI
p
12.56
7.18
–0.42
1.33
0.16
0.21
[9.95,
15.18]
[6.86,
7.50]
[–0.84,
–0.01]
<
0.001
<
0.001
0.04
Note.
The
table
presents
the
fixed-effects
coefficients
with
all
the
model
predictors.
See
supplementary
section
“Mixed
Models”
for
step-wise
regression
results.
3.2.3
H
2
:
Relationship
between
comparative
ability,
domain
difficulty,
and
desirabil-
ity.
We
conducted
one-sample
t-tests
to
examine
domain-wise
comparative
ability
ratings
using
the
50
th
percentile
estimates
of
comparative
ability
to
classify
above
and
below
average
effects
(as
in
Kruger,
1999).
Similar
as
in
Kruger‘s
(1999)
findings,
participants
indicated
to
be
above-average
for
all
easy
ability
domains
(ps
<
.001)
and
below-average
for
three
of
the
four
difficult
ability
domains
(ps
<
.001;
see
Table
8
column
2
for
descriptive
statistics,
and
tables
8.1
and
9.1
in
the
OSF
supplement
for
test
statistics
and
CI’s).
For
the
above
and
below-average
effects
across
all
abilities,
we
found
a
strong
negative
correlation
between
comparative
ability
estimates
and
domain
difficulty
(r(6)
=
–0.85,
p
=
.0073,
95%
CI
[–
0.97
-0.37]).6
Item-wise
comparative-ability-domain-difficulty
correlations
are
provided
in
the
supplementary
under
‘Replication
condition:
Item-wise
correlations
between
domain
difficulty
and
comparative
ability
ratings
for
each
ability
domain’.
When
comparing
desirability
ratings
between
easy
(M
=
8.731,
SD
=
1.01)
and
difficult
ability
domains
(M
=
7.58,
SD
=
1.40),
a
paired-samples
Wilcoxon
test
revealed
easy
abilities
to
be
more
desirable
(M
difference
=
1.16,
Z(238)
=
9.42,
p
<
.001,
r
=
0.66,
95%
CI
[0.59,
0.73]).
One-sample
Wilcoxon
tests
revealed
that
all
domain-specific
desirability
scores
were
higher
than
the
scale
midpoint
(ps
<
.001;
supplementary
Table
9.4).
That
corresponded
with
a
strong
positive
relationship
between
comparative
ability
and
desirability
(r(6)
=
0.72,
p
=
.0448,
95%
CI
[0.03,
0.95]).
3.2.4
H
2
:
Additional
Analyses
for
the
relationship
between
comparative
ability,
do-
main
difficulty,
and
desirability.
Similarly,
we
found
a
negative
association
between
comparative
ability
and
domain
difficulty
ratings
when
using
vector-compiled
scores
(r(1918)
=
–0.35,
95%
CI
[–0.39,
–0.31]).7
6See
Table
11
in
the
OSF
supplement:
equivalence
tests
1–2.
7See
Table
11
in
the
OSF
supplement:
equivalence
tests
2–3.
The
presented
correlation
on
vector
compiled
scores
is
not
an
optimal
measure
as
these
do
not
account
for
dependence
in
several
measures
provided
by
the
same
individual.
Hence,
p-values
are
not
informative
and
therefore
not
reported.
465
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
However,
when
using
inventory
mean
scores,
opposite
to
the
original
study,
we
found
a
positive
association
between
comparative
ability
and
mean
domain
difficulty
ratings
(r(238)
=
0.16,
p
=
.013,
95%
CI
[0.04,
0.28]).8
As
this
inventory
mean
scores
correlation
did
not
correspond
to
the
other
results,
we
conducted
an
exploratory
analysis9,
revealing
a
small
positive
correlation
between
comparative
ability
and
domain
difficulty
ratings
in
easy
(r(238)=
0.03,
95%
CI
[–0.10,
0.15],
p
=
.70);
and
a
small
negative
correlation
in
difficult
ability
domains
(r(238)=
–0.10,
95%
CI
[–0.23,
0.02],
p
=
.11).
Using
mixed
models
with
random
intercepts
at
the
participant
level,
H
2
was
not
supported
as
difficulty
did
not
predict
changes
in
comparative
ability
(Table
10).
Table
10:
Estimated
fixed-effects
coefficients
of
the
mixed-effects
regression
model
with
changes
in
Comparative
Ability
explained
by
Others’
and
Own
Ability
in
the
Replication
Con-
dition.
Predictors
B
S.E.
CI
p
(Intercept)
Own
Other
Difficulty
Desirability
Ambiguity
8.72
7.07
–0.48
–0.04
0.57
0.12
2.4
0.18
0.21
0.16
0.21
0.17
[4.01,
13.43]
[6.73,
7.42]
[–0.90,
–0.06]
[–0.36,
0.28]
[0.16,
0.99]
[–0.21,
0.45]
<
.001
<
.001
0.025
0.817
0.007
0.48
Note.
The
table
presents
the
fixed-effects
coefficients
with
all
the
model
predictors.
See
supplementary
section
“Mixed
Models”
for
step-wise
regression
results.
The
original
analysis’
methods
provided
support
for
H
2
.
Additionally,
a
Simpson’s
paradox
can
be
observed
when
averaging
all
eight
domains
into
one
score
over
various
manipulated
factors
for
each
participant
and
then
correlating
them.10
3.3
Extension:
Easy
domain
and
difficult
domain
conditions
3.3.1
Comparative
ability
for
easy
and
difficult
items
by
conditions
We
conducted
paired-sample
Wilcoxon
tests
comparing
difficulty
ratings
between
the
easy
and
difficult
replication/original
and
extension
domains
and
found
comparative
ability
to
be
estimated
higher
for
easy
abilities
across
all
comparisons
(summarized
in
Table
7,
all
p
<
.001).
8See
Table
11
in
the
OSF
supplement:
equivalence
tests
4–5.
9Not
included
in
the
preregistration.
10For
an
overview
of
all
correlations
between
mean
scores
across
inventories
for
the
replication
condition
see
Tables
3.1
and
3.2
in
the
OSF
supplement.
466
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
3.3.2
Above
and
below
average
Relationship
between
absolute
and
comparative
ability
We
conducted
multiple
linear
regression
analyses
to
test
how
ratings
of
both
own
and
others’
ability
predict
comparative
ability
judgments
across
all
abilities.
Models
in
both
conditions
predicted
variance
in
comparative
ability
judgments
(F
easy
(2,
222)
=
246.6,
p
<
.001,
R
adj
2
=
.69,
95%
CI
[0.62,
0.76];
and
F
difficult
(2,
223)
=
342.9,
p<.001,
R
adj
2
=
.75,
95%
CI
[0.70,
0.81]).
Yet,
the
only
significant
predictors
of
participants’
own
absolute
ability
were
comparative
ability
judgments
in
both
the
easy
(𝛽
=
0.86,
t(222)
=
17.32,
p
<
.001)
and
the
difficult
domain
condition
(𝛽
=
0.90,
t(223)
=
15.61,
p
<
.001).
Table
11:
Extension
conditions:
Mean
comparative
ability
estimates
and
judgmental
weight
of
own
and
peers’
abilities
by
domain
difficulty.
Easy
domain
condition
Ability
Difficult
domain
condition
Judgmental
weight
Judgmental
weight
Judgmental
weight
Judgmental
weight
of
own
ability
1
of
others’
ability
1
of
own
ability
1
of
others’
ability
1
Using
mouse
Driving
Riding
bicycle
Saving
money
Telling
jokes
Playing
chess
Juggling
Programming
0.48
∗∗∗
0.75
∗∗∗
0.65
∗∗∗
0.81
∗∗∗
0.70
∗∗∗
0.79
∗∗∗
0.78
∗∗∗
0.85
∗∗∗
0.03
–0.1
0.06
–0.03
0.10
∗
0.02
0.05
–0.03
0.58
∗∗∗
0.78
∗∗∗
0.79
∗∗∗
0.78
∗∗∗
0.70
∗∗∗
0.75
∗∗∗
0.68
∗∗∗
0.79
∗∗∗
0.15
∗
–0.02
0.06
–0.07
0.14
∗∗
0.01
0.05
0.03
1
Standardised
betas
(𝛽)
from
multiple
regressions
predicting
participants’
comparative
ability
(percentile)
estimates
from
their
estimates
of
their
own
absolute
ability
and
the
absolute
ability
of
their
peers,
respectively.
∗
p
<
.05.
∗∗
p
<
.01.
∗∗∗
p
<
.001.
Item-wise
multiple
linear
regression
analyses
showed,
consistent
with
the
original
study
and
replication
condition,
that
extension
condition
participants
weighted
own
ability
es-
timates
stronger
than
others’
ability
estimates
when
assessing
their
comparative
abilities
(Table
11).
All
standardized
betas
(𝛽)
of
own
absolute
abilities
were
positive
and
ps
<.001
(for
all
abilities),
whereas
𝛽s
of
others’
absolute
abilities
were
bi-directional
and
smaller.
For
the
easy
domain
condition,
the
correlation
between
own
ability
and
comparative
ability
was
r(6)
=
0.99
(p
<
.001,
95%
CI
[0.97,
0.999]);
and
the
correlation
between
others’
and
comparative
ability
was
r(6)
=
0.96
(p
<
.001,
95%
CI
[0.78,
0.99]);
and
these
correlations
were
different
from
each
other
(Hotelling’s
(1940)
t(5)
=
2.85,
p
=
0.037).
For
the
difficult
domain
condition,
the
correlation
between
own
ability
and
comparative
ability
was
r(6)
=
0.97
(p
<
.001,
95%
CI
[0.85,
0.995]);
and
the
correlation
between
others’
and
467
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
comparative
ability
was
r(6)
=
0.92
(p
=
.001,
95%
CI
[0.60,
0.99]);
with
weaker
support
found
for
these
correlations
as
being
different
from
each
other
(Hotelling’s
t(5)
=
2.24,
p
=
0.075).
3.3.3
Additional
Analyses:
Relationship
between
absolute
and
comparative
ability
The
vector-compiled
score
correlation
for
the
easy
domain
condition
between
own
and
comparative
ability
was
r(1798)
=
0.78
(95%
CI
[0.76,
0.80]);
and
between
others’
and
comparative
ability
was
r(1798)
=
0.47
(95%
CI
[0.43,
0.51]).
For
the
difficult
domain
condition
correlations
between
own
and
comparative
ability
was
r(1806)
=
0.78
(95%
CI
[0.76,
0.80]);
and
between
others’
and
comparative
ability
was
r(1806)
=
0.45
(95%
CI
[0.41,
0.48]).
Additionally,
also
mixed
models
indicated
that
own
ability
was
a
better
predictor
of
comparative
ability
than
others’
ability
(Table
12).
Table
12:
Estimated
fixed-effects
coefficients
of
the
mixed-effects
regression
model
with
changes
in
Comparative
Ability
explained
by
Others’
and
Own
Ability
in
the
Extension
Con-
ditions.
Predictors
B
S.E.
CI
p
Easy
condition
extension
(Intercept)
15.48
Own
6.56
Other
–0.04
1.3
0.16
0.2
[12.94,
18.02]
[6.25,
6.88]
[–0.43,
0.36]
<0.001
<0.001
0.861
Difficult
condition
extension
(Intercept)
16.53
Own
6.4
Other
–0.02
1.49
0.16
0.22
[13.61,
19.44]
[6.09,
6.72]
[–0.44,
0.41]
<0.001
<0.001
0.94
Note.
Fixed-effects
coefficients
with
all
model
predictors.
Participants
represented
the
random
effect.
See
supplementary
section
“Mixed
Models”
for
step-wise
regression
results.
Inventory
mean
score
correlations
for
the
easy
domain
condition
between
own
and
comparative
ability
was
r(223)
=
0.83
(p
<
.001,
95%
CI
[0.78,
0.87]);
and
between
others’
and
comparative
ability
was
r(223)
=
0.52
(p
<
.001,
95%
CI
[0.42,
0.61]).
In
the
difficult
domain
condition
the
correlation
between
own
and
comparative
ability
was
r(224)
=
0.87
(p
<
.001,
95%
CI
[0.83,
0.90]);
and
between
others’
and
comparative
ability
r(224)
=
0.70
(p
<
.001,
95%
CI
[0.62,
0.76]).
468
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
13:
Extensions:
Mean
domain
difficulty
and
mean
comparative
ability
estimates
tested
against
the
average
(scale
midpoint).
Easy
domain
condition
Difficult
domain
condition
Ability
Domain
difficulty
Percentile
estimate
1
Domain
difficulty
Percentile
estimate
2
Using
mouse
Driving
Riding
bicycle
Saving
money
Telling
jokes
Playing
chess
Juggling
Programming
3.13
(2.90)
4.58
(2.77)
3.88
(2.77)
5.31
(2.76)
4.64
(2.67)
6.71
(2.56)
6.07
(2.60)
7.46
(2.13)
71.27
(20.51)
∗∗∗
66.32
(22.74)
∗∗∗
65.76
(21.92)
∗∗∗
63.63
(25.55)
∗∗∗
59.74
(20.55)
∗∗∗
47.81
(27.55)
46.76
(27.75)
49.57
(25.71)
5.77
(2.36)
7.16
(2.15)
7.85
(2.12)
6.32
(2.60)
7.67
(1.98)
8.34
(1.89)
7.81
(2.00)
8.15
(1.84)
55.79
(21.12)
∗∗∗
40.63
(29.28)
∗∗∗
48.90
(27.54)
62.68
(25.77)
∗∗∗
40.82
(27.03)
∗∗∗
41.86
(27.22)
∗∗∗
39.67
(27.66)
∗∗∗
45.36
(26.02)
∗∗
*p<.05,
**p<.01,
∗∗∗
p<.001.
Note:
Scores
are
displayed
with
the
following
structure:
Mean
(SD).
1
Scores
above
50
reflect
an
above-average
effect,
estimates
below
50
reflect
a
below-
average
effect.
See
Table
9.2
in
supplementary
for
test
statistics
and
CI’s.
2
See
Table
9.2
in
supplementary
for
test
statistics
and
CI’s.
3.3.4
Relationship
between
domain
difficulty
and
comparative
ability.
As
indicated
above,
one-sample
t-tests
indicated
above-average-effect
for
the
easy
and
below-average
effect
for
the
difficult
condition
(Table
13
for
mean
scores
and
SD’s,and
Ta-
bles
9.2–9.3
in
the
OSF
supplement
for
test
statistics).
However,
the
below-average-effect
was
not
expressed
in
the
easy
extension
condition,
and
the
above-average-effect
was
not
clearly
expressed
in
the
difficult
extension
condition.
Item-wise
correlations
between
com-
parative
ability
and
domain
difficulty
for
each
ability
are
provided
in
the
OSF
supplement
under
‘Extension
conditions:
correlations
between
comparative
ability
and
domain
diffi-
culty
ratings
for
each
ability
domain’.
The
easy
domain
condition
contains
mixed
results
of
medium
to
no
associations
(p
<.936),
whereas
the
difficult
domain
condition
contains
negative
associations
for
all
abilities
(p
<.001).
Congruent
with
original
and
replication
findings,
there
were
negative
relationships
between
domain
difficulty
and
comparative
abil-
ity
in
the
easy
r(6)
=
–0.90
(p
=
.002,
95%
CI
[–0.982,
–0.537])11;
and
difficult
conditions
(r(6)
=
-0.75,
p
=
.033,
95%
CI
[–0.951,
–0.092]).12
11See
OSF
supplement:
equivalence
tests
7–8.
12See
OSF
supplement:
equivalence
tests
9–10.
469
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
3.3.5
Above
and
below
average
Additional
analyses
for
the
relationship
between
domain
difficulty
and
compar-
ative
ability.
Congruent
with
both
original
and
replication
findings,
correlations
between
comparative
ability
and
mean
domain
difficulty
were
negative
for
vector-compiled
score
in
the
easy
(r(1798)
=
–0.27,
95%
CI
[–0.31,
–0.22])
and
difficult
(r(1798)
=
–0.31,
95%
CI
[–0.35,
–0.27])
conditions.
When
averaging
across
the
inventory
(inventory
mean
scores),
this
relationship
changes
to
r(223)
=
0.32
(p
<
.001,
95%
CI
[.19,
.43])
in
the
easy
condition
and
r(223)
=
–0.13
(p
=
.0498,
95%
CI
[–0.26,
–0.0002])
in
the
difficult
condition
–
showing
the
possibility
of
a
Simpson’s
paradox,
just
as
in
the
replication
condition.13
Different
from
the
replication
data,
in
both
easy
and
difficult
conditions,
with
decreasing
difficulty,
comparative
ability
increases
(Table
14).
Table
14:
Estimated
fixed-effects
coefficients
of
the
mixed-effects
regression
model
with
changes
in
Comparative
Ability
explained
by
Others’
and
Own
Ability
in
the
Extension
Con-
ditions.
Predictors
B
S.E.
CI
p
Comparative
Ability
Easy
Condition
(Intercept)
18.6
2.35
Own
6.37
0.18
Other
–0.13
0.21
Difficulty
–0.41
0.16
Desirability
0.15
0.21
Ambiguity
–0.1
0.19
[13.99,
23.21]
[6.02,
6.71]
[–0.54,
0.28]
[–0.72,
–0.11]
[–0.26,
0.56]
[–0.47,
0.27]
<0.001
<0.001
0.546
0.008
0.468
0.6
Comparative
Ability
Difficult
Condition
(Intercept)
25.57
2.66
Own
6.11
0.17
Other
–0.16
0.22
Difficulty
–1.04
0.2
Desirability
0.16
0.2
Ambiguity
–0.17
0.19
[20.36,
30.79]
[5.78,
6.45]
[–0.59,
0.26]
[–1.43,
–0.64]
[–0.22,
0.55]
[–0.55,
0.21]
<0.001
<0.001
0.451
<0.001
0.405
0.37
Note.
The
table
presents
the
fixed-effects
coefficients
with
all
the
model
predictors.
Participants
represented
the
random
effect.
See
supplementary
section
“Mixed
Models”
for
step-wise
regression
results.
13See
OSF
supplement
Tables
5.1,
5.2,
7.1
and
7.2
for
correlations
between
mean
scores
across
inventories
in
the
extension
conditions.
470
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
3.3.6
Above
and
below
average
Comparisons
of
ambiguity
and
difficulty
ratings
between
the
three
conditions
As
parametric
assumptions
were
not
met14,
to
test
whether
different
domain
definitions
from
the
original
domains
would
result
in
different
domain
difficulty
and
ambiguity
ratings,
we
first
conducted
a
Kruskal-Wallis
test
that
showed
differences
in
difficulty
scores
across
conditions
(H(2)
=
237,
p
<
.001,
𝜂
2
=
0.34;
Figure
1).
Supporting
the
first
part
of
H
3–4
,
post-hoc
Bonferroni
corrected
Mann-Whitney
tests
showed
that,
compared
to
the
replication
condition
(Mdn
replication
=
6.00,
M
replication
=
6.05,
SD
=
1.15),
participants
in
the
easy
domain
condition
(Mdn
easy
=
5.00,
M
easy
=
5.22,
SD
=
1.63)
rated
lower
domain
difficulty
(p
<
.001).
Participants
in
the
difficult
domain
condition
(Mdn
difficult
=
7.78,
M
difficult
=
7.39,
SD
=
1.19)
rated
higher
domain
difficulty
than
in
the
other
conditions
(ps
<
.001;
Figure
1A).
We
conducted
a
second
Kruskal-Wallis
test
and
found
differences
in
participants’
ambiguity
ratings
between
the
three
conditions
(H(2)
=
11.47,
p
=
.003,
𝜂
2
=
0.014;
Figure
1B).
As
predicted
in
the
second
part
of
H
3–4
,
post-hoc
Bonferroni
corrected
Mann-Whitney
tests
showed
replication
condition
ambiguity
ratings
(Mdn
replication
=
2.88
M
replication
=
3.00,
SD
=
1.24)
to
be
lower
than
both
the
easy
extension
condition
(Mdn
easy
=
2.38,
M
easy
=
2.68,
SD
=
1.23;
p
adj
=
0.01)
and
the
difficult
extension
condition
ambiguity
ratings
(Mdn
difficult
=
2.38,
M
difficult
=
2.76,
SD
=
1.43;
p
adj
=
0.01).
We
found
no
support
for
differences
between
easy
and
difficult
extension
conditions’
ambiguity
ratings,
(p
adj
≈
1.00).
3.3.7
Relationship
between
comparative
ability,
and
domain
difficulty
and
desirabil-
ity
(examining
H
2
in
the
extension
conditions)
In
the
following
section,
the
easy
(n
=
225)
and
difficult
(n
=
226)
extension
conditions
results
are
analyzed
in
the
same
way
as
reported
above
for
the
replication
condition.
For
the
above-
and
below-average
effects
across
all
abilities,
we
found
a
strong
negative
correlation
between
comparative
ability
estimates
and
domain
difficulty
in
both
extension
conditions
(see
above).
Item-wise
comparative-ability–domain-difficulty
correlations
are
provided
in
the
OSF
supplement
‘Extension
conditions:
correlations
between
comparative
ability
and
domain
difficulty
ratings
for
each
ability
domain’.
When
comparing
desirability
ratings
between
easy
and
difficult
ability
domains
via
Wilcoxon
signed
ranks
test,
in
the
easy
extension
condition
easy
(M
=
4.23,
SD
=
2.13)
abilities
to
be
more
desirable
than
difficult
abilities
(M
=
6.22,
SD
=
1.56;
Z(223)
=
–10.62,
p
<
.001,
r
=
0.75,
95%
CI
[0.70,
0.80]),
as
well
as
in
the
difficult
extension
condition
easy
abilities
(M
=
6.78,
SD
=
1.44),
difficult
(M
=
7.99,
SD
=
1.30;
Z(224)
=
–9.26,
p
<
.001,
r
=
0.69,
95%
CI
[0.62,
0.75]).
One-sample
Wilcoxon
tests
revealed
that
all
domain-specific
desirability
scores
were
higher
than
the
scale
midpoint
(ps
<
.001;
OSF
supplement
Tables
9.5–9-6).
Moreover,
correlations
between
comparative
ability
and
desirability
in
easy
(r(6)
14See
“Statistical
assumptions
and
normality
Tests”
section
in
the
detailed
supplementary
on
OSF
for
parametric
tests.
471
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
A
B
12.5
12.5
10.0
Mean
Ambiguity
Accross
Conditions
10.0
Mean
Difficulty
Accross
Conditions
Above
and
below
average
7.5
5.0
7.5
5.0
2.5
2.5
Replication
Easy
Ext
Difficult
Ext
Replication
Conditions
Easy
Ext
Difficult
Ext
Conditions
Figure
1:
Box
and
violin
plots
of
domain
difficulty
and
ambiguity
ratings
across
replication,
easy
extension,
and
difficult
extension
conditions
with
uncorrected
p-values
for
group-wise
comparisons
and
overall
models.
Panel
A:
Mean
difficulty
across
conditions.
Panel
B:
Mean
ambiguity
across
conditions.
ns
p>.05,
∗
p<.05,
∗∗
p<.01,
∗∗∗
p<.001,
∗∗∗∗
p<.0001.
=
0.66,
p
=
.074,
95%
CI
[–0.08,
0.93])
and
difficult
extension
conditions
(r(6)
=
0.15,
p
=.72,
95%
CI
[–0.62,
0.77])
remain
uncertain.
3.3.8
Extension
H
2
:
Additional
Analyses
for
the
relationship
between
comparative
ability,
and
domain
difficulty
and
desirability
Similarly,
we
found
a
negative
association
between
comparative
ability
and
domain
difficulty
ratings
when
using
vector-compiled
scores
in
the
easy
extension
condition
(r(1798)
=
–0.27,
472
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
95%
CI
[–0.31,
–0.22])15
as
well
as
in
the
difficult
extension
condition
(r(1806)
=
-0.31,
95%
CI
[–0.35,
–0.27])16.
Similar
to
our
findings
for
the
replication
condition,
when
using
inventory
mean
scores,
we
found
a
positive
association
between
comparative
ability
and
mean
domain
difficulty
ratings
in
the
easy
extension
condition
(r(223)
=
0.32,
p
<
.001,
95%
CI
[0.19,
0.43])17
and
a
negative
association
in
the
difficult
extension
condition
(r(223)
=
–0.13,
p
=
.05,
95%
CI
[–0.26,
–0.0002])18.
3.3.9
Exploratory
Analysis:
comparative
ability
across
conditions
In
an
exploratory
analysis
using
a
3
(Condition)
x
2
(Difficulty)
mixed
design,
an
aligned
rank-transform
nonparametric
factorial
ANOVA
showed
both
main
effects
of
condition
(F(2,
1376)
=
47.03,
p
<
.0001,
𝜂
2
G
=
0.064)
and
difficulty
(F(1,
1376)
=
302.17,
p
<
.0001,
𝜂
2
G
=
0.169),
as
well
as
the
interaction
effect
(F(1,
1376)
=
15.23,
p
<
.0001,
𝜂
2
G
=
0.022),
were
significant.19
Post-hoc
multiple
comparisons
revealed
significant
differences
between
all
comparisons
at
Bonferroni
corrected
ps
<.001,
except
the
comparison
between
easy
items
in
replication
compared
to
easy
items
in
the
easy
extension,
difficult
items
in
the
replication
compared
to
difficult
extension,
and
difficult
easy-extension
compared
to
easy
difficult-extension
(as
expected
from
power-simulations),
with
ps
≈
1.00.
3.4
Replication
Evaluation
The
following
section
compares
the
original
study
and
current
replication
based
on
the
replication
evaluation
criteria
by
LeBel
et
al.
(2019).
We
found
clear
support
for
replication
hypotheses
H
1
and
H
2
.
Both
correlations
between
own
absolute
ability
and
comparative
ability
across
all
abilities
displayed
as
conducted
in
the
original
study
and
additional
analyses
detected
strong
effects
in
the
same
direction
as
the
original,
but
we
found
no
support
for
difficulty
as
a
predictor
of
comparative
ability
in
a
mixed-effects
model
using
the
replication
data
(Table
15).
Positive
and
significant
standardized
betas
for
all
own
absolute
abilities,
and
predominantly
negative
and
non-significant
standardized
betas
for
others’
absolute
abilities
were
replicated
(Table
16).
The
strong
evidence
bolsters
Kruger’s
research
on
egocentrism
as
comparative
ability
judgments
are
based
on
participants’
own
levels
of
ability
instead
of
their
perceptions
of
others’
level
of
ability
(Kruger,
1999;
Kruger
&
Burrus,
2004).
An
underlying
mechanism
might
be
focalism,
a
complementary
bias
on
people’s
tendency
to
place
more
judgmental
weight
on
the
target
(self)
and
less
weight
on
the
referent
(others)
15See
OSF
supplement:
equivalence
tests
11–12.
16See
OSF
supplement:
equivalence
tests
13–14.
17See
OSF
supplement
Table
11:
equivalence
tests
15–16.
18See
OSF
supplement
Table
11:
equivalence
tests
17–18.
19As
this
analysis
was
an
oversight
in
our
preregistration,
an
additional
power
simulation
was
executed,
showing
excellent
power
for
observing
main
and
interaction
effects
of
a
3x2
mixed
ANOVA.
See
OSF
supplement
“Power
Simulation
for
Exploratory
Analysis”
for
more
information.
473
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
A
B
Diff
Diff
125
Rep
125
****
****
100
****
●
●
●
Mean
Comparative
Ability
C
Con
Easy
●
Easy
Rep
●
●
Easy
●
Diff
125
****
****
****
ns
****
100
Con
Diff
ns
●
75
Above
and
below
average
●
●
●
75
100
75
●
●
50
50
50
●
●
●
25
25
0
●
Rep
Easy
Condition
Diff
25
0
●
Difficult
Easy
Difficulty
0
Difficult
Easy
Difficulty
Figure
2:
Comparative
ability
across
conditions.
Panel
A.
Mean
easy
and
difficult
mean
com-
parative
ability
ratings
by
condition.
Panel
B.
Mean
comparative
ability
ratings
by
difficulty.
Panel
C.
Mean
easy
and
difficult
mean
comparative
ability
ratings
by
condition
with
SD.
ns
p>.05,
∗
p<.05,
∗∗
p<.01,
∗∗∗
p<.001,
∗∗∗∗
p<.0001.
when
making
direct
comparisons
between
the
two
(Krizan
&
Suls,
2008).
An
alternative
explanation
is
that
people
simply
have
more
information
about
themselves
than
they
do
about
others.
Paired
with
expectations
about
distributions
of
values
of
luck
and
skills,
participants
might
have
rationally
judged,
based
on
their
best
guess,
that
their
own
abilities
are
higher
compared
to
others’
abilities
when
tasks
were
easy
and
vice
versa
when
tasks
were
difficult
(Moore
&
Healy,
2008).
Above
and
below-average
effects
(H
2
)
replicated
with
a
slightly
smaller
effect.
Ad-
ditional
analyses
revealed
a
smaller
effect
in
the
same
direction,
but
when
averaging
the
474
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
15:
Comparison
of
correlational
study
effect
sizes
between
the
original
article
and
replication
based
on
criteria
created
by
LeBel
et
al.
(2019).
p
Correlation
coefficient
(r)
and
95%
CI
p
Correlation
coefficient
(r)
and
95%
CI
Variables
(across
all
abilities)
Original
study
Replication
condition
Replication
evaluation
Own
ability
and
comparative
ability
Inventory
mean
and
absolute
own
ability
and
comparative
ability
<.001
r(6)
=
.95
[0.90,
0.97]
<.001
r(6)
=
0.99,
[0.96,
1.00]
Signal-
consistent
/
/
<.001;
<.001
Additional
analyses
Domain
difficulty
and
comparative
ability
Inventory
mean
and
absolute
domain
difficulty
and
comparative
ability
<.001
r(6)
=
–.96,
[–0.98,
–0.92]
0.007
r(238)
=
.85
[0.82,
0.89];
r(1918)
=
0.50,
[0.46,
0.53];
Own
(B
=
7.18)
vs
others’
ability
(B
=
–0.42)
r(6)
=
–0.85,
[–0.97
-0.37]
/
/
.013;
<.001
r(238)
=
0.16
[0.04,
0.28];
r(1918)
=
–0.35,
[–0.39,
–0.31];
Difficulty
as
predictor
of
comparative
ability
B
=
-0.04
Additional
analyses
Signal-
consistent,
smaller
entire
inventory
for
each
participant
and
thereby
reducing
the
variability
in
responses,
a
Simpson’s
paradox
seems
to
occur.
Additionally,
we
found
no
support
for
difficulty
as
a
predictor
of
comparative
ability
in
a
mixed
regression
model
using
the
replication
data,
but
we
found
support
in
both
extensions.
Participants
tended
to
indicate
higher
rather
than
lower
comparative
ability
in
both
the
replication
and
the
easy
conditions,
where
difficulty
ratings
were
normally
distributed.
This
was
not
the
case
for
the
difficult
condition,
where
475
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Table
16:
Comparison
of
mean
comparative
ability
estimates
and
judgmental
weight
of
own
versus
others’
abilities
by
domain
difficulty
between
the
original
study
and
replication
condition.
Original
study
Ability
Replication
condition
Replication
outcome
Judgmental
Judgmental
Judgmental
Judgmental
weight
of
own
weight
of
weight
of
own
weight
of
ability1
others’
ability1
ability1
others’
ability1
Using
mouse
0.21
0.06
0.29
∗∗∗
0.04
Driving
Riding
bicycle
Saving
money
Telling
jokes
Playing
chess
Juggling
Programming
.89
∗∗∗∗
.61
∗∗∗∗
.90
∗∗∗∗
.91
∗∗∗∗
.96
∗∗∗∗
.89
∗∗∗∗
.85
∗∗∗∗
–.25
∗
–0.02
–.25
∗∗∗
–0.03
–.22
∗∗
–0.16
–0.1
0.85
∗∗∗
0.76
∗∗∗
0.79
∗∗∗
0.75
∗∗∗
0.82
∗∗∗
0.59
∗∗∗
0.83
∗∗∗
–0.11
∗∗
–0.06
–0.05
0.04
–0.03
0.18
∗∗
–0.06
Replicated,
own
absolute
abilities
are
all
positive
(same
direction)
and
significant
(all
p
<.001)
Note.
The
original
study
only
provided
the
standardized
betas
and
p-values.
The
transformed
R
2
and
F
2
values
would
only
represent
the
effect
size
of
one
predictor
instead
of
the
overall
regression,
so
only
the
p-values
and
directions
were
compared.
1
Standardised
betas
from
multiple
regressions
predicting
participants’
comparative
ability
(per-
centile)
estimates
from
their
estimates
of
their
own
absolute
ability
and
the
absolute
ability
of
their
peers,
respectively.
∗
p
<
.05.
∗∗
p
<
.01.
∗∗∗
p
<
.001.
∗∗∗∗
p
<
.0001.
difficulty
ratings
were
right-skewed.
In
other
words,
the
Simpson
paradox
was
produced
by
the
above-average-effect
being
stronger
than
the
below-average-effect
in
the
replication
and
the
easy
conditions.
Overall,
this
shows
the
contextual
effects
of
the
inventory’s
difficulty
on
participants’
ratings
of
tasks
difficulty
and
comparative
ability.
Using
both
one-sample
Wilcoxon
and
t-tests,
both
above-and-below-average
effects
replicated
with
smaller
effects,
whereas
above-average
effect
sizes
replicated
closer
to
the
original
study
(Table
17).
De-
spite
smaller
effect
sizes,
the
observed
results
support
above-and-below-average
effects.
The
prevalence
of
the
below-average-effect
also
demonstrates
that
motivated
reasoning
to
476
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
see
oneself
as
superior
fails
to
account
for
certain
situations,
such
as
for
difficult
abilities
in
the
replication.
Table
17:
Comparison
of
one-sample
t-test
effect
sizes
between
the
original
article
and
replication
based
on
criteria
created
by
LeBel
et
al.
(2019).
Cohen’s
d
and
95%
CI
Original
study
(n=37)
Each
easy
ability
Each
difficult
ability
(excluding
telling
jokes)
Replication
outcome
0.90
[0.22,
1.57]
–1.44
[–2.17,
–0.72]
Replication
condition
(n=240)
Easy
abilities
Using
mouse
Driving
Riding
bicycle
Saving
money
Difficult
abilities
Telling
jokes
Playing
chess
Juggling
Programming
1.18
[1.02,
1.35]
0.69
[0.55,
0.83]
0.54
[0.40,
0.67]
0.61
[0.47,
0.75]
Signal-consistent
Signal-consistent,
smaller
Signal-consistent,
smaller
Signal-consistent,
smaller
0.11
[–0.02,
0.23]
No
signal
–0.33
[–0.46,
–0.20]
Signal-consistent,
smaller
–0.65
[–0.79,
–0.51]
Signal-consistent,
smaller
–0.32
[–0.45,
–0.19]
Signal-consistent,
smaller
4
Discussion
We
replicated
and
extended
the
findings
in
Kruger’s
(1999)
Study
1.
Both
the
replication
and
the
extension
results
provide
strong
support
for
above-
and
below-average
effects,
depending
on
difficulty.
In
addition,
we
present
important
boundary
conditions.
First,
above-and-below-average
effects
appear
stronger
the
more
difficult
the
domain
abilities
are
(compare
Tables
8
and
11).
Second,
the
difficulty
of
different
activities
(ability
domains)
might
provoke
or
suppress
below
-or
above-average-effects;
we
observed
a
below-average-
effect
when
the
presented
abilities
were
difficult,
and
vice
versa,
an
above-average
effect
when
the
presented
abilities
were
easy.
In
that
context,
we
observed
an
interaction
effect
between
manipulations
(making
the
original
scale
easier
or
more
difficult)
and
item-group
difficulty
(easy
vs
difficult
items),
looking
at
comparative
ability.
Ambiguity
was
low
across
conditions
with
additional
information
introduced
in
the
extensions
decreasing
ambiguity.
477
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
4.1
Replication
outcomes
Egocentrism
is
a
compelling,
yet
only
one
of
many
explanations
for
above-and-below-
average-effects
(Zell
et
al.,
2020).
Alternatively,
judgments
might
be
rationally
based
on
differential
access
to
information
influencing
predictions
(Moore
&
Small,
2007).
In
other
words,
by
having
more
information
about
the
own
than
others’
performance
in
different
activities,
others’
performance
is
evaluated
less
extremely
than
the
own
performance
(Moore
&
Healy,
2008).
Moreover,
the
replication
advances
our
understanding
of
the
conditions
in
which
the
above
or
below-average
effects
are
more
pronounced,
i.e.,
when
abilities’
difficulty
and
supplied
information
about
them
differ.
It
complements
a
recent
meta-analysis
on
the
above-average-effect
(Zell
et
al.,
2020),
showing
a
larger
effect
when
using
the
direct
(compare
oneself
to
others
on
a
single
scale
with
the
midpoint
defined
as
average)
rather
than
indirect
testing
method
(assess
oneself
and
the
comparison
group
independent
from
each
other,
with
the
average
being
defined
as
the
difference
between
the
two
values).
Fewer
research
center
on
the
below-average-effect,
yet
success
in
replicating
the
effect
suggest
that
the
same
conditions
may
also
be
applicable
in
strengthening
the
below-average
effect.
On
the
other
hand,
the
replication’s
smaller
effect
sizes
challenge
the
influence
of
certain
established
factors
on
the
effects.
For
instance,
people
showed
the
strongest
biases
in
comparative
ability
judgments
when
the
comparison
group
was
abstract
instead
of
concrete,
and
no
specific
information
and
contact
with
the
comparison
group
contributes
to
that
abstractness
(Alicke
et
al.,
1995).
A
notable
discrepancy
between
the
original
and
replication
is
the
comparison
group:
original
study
participants
compared
themselves
to
other
students
from
their
psychology
course,
which
was
much
more
concrete
than
replication
participants
comparing
themselves
to
others
of
the
same
age,
gender,
and
socioeconomic
background.
The
replication’s
smaller
effects
suggest
that
in
contrast
to
past
explanations,
people
may
not
display
tendencies
to
choose
vulnerable
comparison
targets
to
compare
themselves
with
when
given
an
abstract
referent
group
(Chambers
&
Windschitl,
2004).
As
people
display
preferences
in
selecting
representative
targets,
they
might
choose
comparison
targets
of
varying
ability
depending
on
task
difficulty,
and
the
availability
of
information
and
cognitive
resources
(Nisbett
et
al.,
1983).
This
may
have
been
the
case
for
the
current
replication
and
is
a
promising
direction
for
future
research.
4.2
Outcomes
of
the
extensions
to
the
original
study
Both
H
3
and
H
4
were
supported.
We
found
lower
domain
difficulty
ratings
in
the
easy
domain
condition
than
the
replication
condition
(d
=
0.59)
and
higher
domain
difficulty
ratings
in
the
difficult
domain
condition
than
the
replication
condition
(d
=
1.15)
supporting
the
first
part
of
the
extension
hypotheses
(H
3–4
)
on
differences
in
domain
difficulty.
As
interpretations
of
easy
or
difficult
abilities
contribute
to
different
perceptions
of
domain
478
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
difficulty,
the
observed
results
provide
insight
on
how
this
affects
participant
interpretation
of
“average”
ability.
In
a
study
by
Kim
et
al.
(2017),
people
construed
below-median
averages
and
showed
above-average
effects
for
abilities
perceived
as
easy,
and
construed
averages
at
or
above
the
median
for
abilities
perceived
as
difficult.
For
accurate
assessments
of
comparative
ability
judgments,
researchers
not
only
need
to
ascertain
how
people
interpret
“average”
ability,
but
also
place
efforts
in
lowering
variations
in
the
perceived
difficulty
of
abilities.
Hence,
the
original
domain
definitions
may
have
been
open
to
interpretation,
influencing
the
results.
Moreover,
we
found
support
for
the
second
part
of
H
3–4
,
that
ambiguity
was
lower
in
the
replication
conditions.
Eventually,
more
information
provided
might
have
led
to
clarification
and
hence
decreased
perceptions
of
ambiguity.
Previous
research
showed
a
tendency
to
view
oneself
as
above-average
for
ambiguous
abilities
(Dunning
et
al.,
1989),
and
to
select
favorable,
self-serving
definitions
amongst
ambiguous
traits
describing
a
wide
variety
of
behaviors
(Gilovich,
1983;
Kunda,
1987),
which
could
not
be
reflected
from
our
data.
Finally,
comparing
comparative
ability
scores
across
conditions
(replication
vs
extensions)
and
by
the
difficulty
of
the
items
(easy
vs
difficult),
show
an
interaction
effect.
That
indicates
that
both
domain
difficulty
and
ambiguity
might
influence
comparative
ability
ratings
and
thereby
above-and-below-average-effects.
However,
despite
the
presented
extensions
potentially
presenting
the
influence
of
abilities’
difficulty
and
their
definitions’
ambiguity
on
the
effects,
more
research
is
needed
to
address
above-and-below-average-
effects’
boundary
conditions.
4.3
Limitations
and
future
directions
Deviating
from
the
original
study,
in
our
replication
we
measured
the
continuous
rela-
tionship
between
variables
and
analyzed
data
on
participant
and
item
levels.
Moreover,
possible
inferences
from
comparisons
between
added
and
original
study
correlations
be-
tween
domain
difficulty
and
comparative
ability
are
limited.
Our
tests
supported
original
ability
categorizations
as
easy
or
difficult,
all
original
study
tests
(including
one-sample
tests
and
correlations
of
ratings
across
all
abilities)
were
also
carried
out
for
the
replication
condition.
While
we
recommend
future
replications
testing
the
continuous
relationship
between
variables
to
avoid
limitations
in
performing
study
comparisons,
misclassification,
and
issues
in
categorizing
continuous
variables,
we
also
caution
of
low
reliability
when
using
the
presented
scale
and
particularly
the
suggested
(easy
and
difficult
ability)
subscales
(Table
5).
Furthermore,
the
replication’s
ability
domain
definitions
are
all
based
on
Kruger’s
(1999)
original
domains.
Yet,
these
domains
may
not
be
as
accurate
and
widely
applicable
at
present.
For
example,
a
recent
survey
indicated
that
the
easy
ability
“saving
money”
is
challenging
for
the
majority,
with
69%
of
Americans
having
less
than
$1000
in
their
savings
accounts
(Huddleston,
2019).
For
future
tests,
the
current
ability
domains
can
be
updated
and
pretested.
Although
Kim
et
al.
(2017)
found
the
above-average-effect,
most
of
479
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
the
14-items
they
used
were
general
abilities
such
as
written
or
spoken
expression.
More
relevant
and
comprehensive
items
can
be
included
in
future
studies
and
bigger
pretest
samples
(original
study:
n
=
39)
used
to
select
ability
domains
and
validate
the
instrument.
How
do
people
assess
task
difficulty?
This
question
goes
beyond
the
scope
of
the
current
investigation
yet
is
a
critical
open
question
if
difficulty
serves
as
a
moderator
between
the
above
and
below
average
effects.
Difficulty
has
been
described
in
previous
research
to
increase
as
a
function
of
cognitive
and/or
physical
load,
with
those
loads
being
rather
additive
than
interactive
components
in
making
difficulty
(Feghhi
&
Rosenbaum,
2019).
Different
factors
might
be
linked
to
such
perceptions,
such
as
error
probability,
weights
of
errors
(one
error
is
worse
than
another),
attention
demands
or
potentially
a
cost-benefit
calculation
determining
judgments
of
difficulty
(Feghhi
&
Rosenbaum,
2019).
The
underlying
mechanisms
of
task
difficulty
judgments
remain
unclear,
yet
in
our
extension’s
stimuli,
we
attempted
to
embed
quantitative
numerical
information
regarding
load
constructed
to
be
perceived
as
more
and
less
difficult.
We
found
that
these
were
indeed
rated
as
more
and
less
difficult
by
the
participants.
This
allows
for
the
use
of
a
quantifiable
latent
concept
such
as
load
as
a
predictor
of
difficulty.
The
operationalization
of
such
latent
concepts
requires
systematic
testing
in
future
research.
Together
with
many
past
studies,
the
present
replication
only
establishes
the
ubiquity
of
the
above
and
below-average
effects.
Much
less
is
known
about
the
effects’
impacts,
especially
for
the
below-average
effect.
The
directionality
of
the
above-average
effect’s
impacts
is
still
debated.
Tendencies
to
see
oneself
as
better
than
others
can
serve
a
wide
variety
of
affective,
cognitive,
and
social
functions
such
as
temporary
boosts
in
task
per-
formance,
longer
life
expectancy,
and
well-being
(Bopp
et
al.,
2012;
Ehrlinger
&
Dunning,
2003;
Taylor
&
Brown,
1988;
Zell
et
al.,
2020).
But
it
can
also
result
in
harmful
long-term
consequences
of
having
unrealistic
expectations,
heightened
disengagement,
and
decreased
self-esteem
(Polivy
&
Herman,
2000;
Robins
&
Beer,
2001).
In
contrast,
less
research
has
been
conducted
on
the
below-average-effect’s
impacts,
predominantly
focusing
on
its
negative
consequences,
such
as
lower
grades
(Mattern
et
al.,
2010),
or
worse
subjective
well-being
(Goetz
et
al.,
2006).
Other
research
suggested
that
the
below-average-effect
can
also
induce
positive
motivational
and
behavioral
consequences
in
the
long
run
(Whillans
et
al.,
2020).
This
highlights
the
need
for
continued
research
on
the
below-and-above-average-
effects’
consequences.
5
Conclusion
We
closely
replicated
Kruger’s
(1999)
study,
showing
the
above-
and
below-average
effects
to
be
robust.
Manipulating
the
difficulty
of
(easy
and
difficult)
ability
domains
participants
were
to
compare
themselves
with
others,
which
showed
that
easier
items
might
provoke
the
above-average
effect
but
dampen
the
below-average
effect
and
vice
versa
for
more
difficult
items.
480
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
References
Alicke,
M.
D.
(1985).
Global
self-evaluation
as
determined
by
the
desirability
and
con-
trollability
of
trait
adjectives.
Journal
of
Personality
and
Social
Psychology,
49(6),
1621–1630.
http://dx.doi.org/10.1037/0022-3514.49.6.1621.
Alicke,
M.
D.,
Klotz,
M.
L.,
Breitenbecher,
D.
L.,
Yurak,
T.
J.,
&
Vredenburg,
D.
S.
(1995).
Personal
contact,
individuation,
and
the
better-than-average
effect.
Journal
of
Personality
and
Social
Psychology,
68(5),
804–825.
https://psycnet.apa.org/doi/10.1037/0022-3514.
68.5.804.
Altman,
D.
G.,
&
Royston,
P.
(2006).
The
cost
of
dichotomizing
continuous
variables.
BMJ,
332(7549),
1080.
http://dx.doi.org/10.1136/bmj.332.7549.1080.
Aucote,
H.
M.,
&
Gold,
R.
S.
(2005).
Non-equivalence
of
direct
and
indirect
measures
of
unrealistic
optimism.
Psychology,
Health
and
Medicine,
10(2),
194–201.
http://dx.doi.
org/10.1080/13548500512331315443.
Bazerman,
M.
H.,
&
Moore,
D.
A.
(2012).
Judgment
in
Managerial
Decision
Making.
John
Wiley
&
Sons.
Burson,
K.
A.,
Larrick,
R.
P.,
&
Klayman,
J.
(2006).
Skilled
or
unskilled,
but
still
unaware
of
it:
How
perceptions
of
difficulty
drive
miscalibration
in
relative
comparisons.
Journal
of
Personality
and
Social
Psychology,
90(1),
60–77.
http://dx.doi.org/10.1037/0022-
3514.90.1.60.
Bopp,
M.,
Braun,
J.,
Gutzwiller,
F.,
&
Faeh,
D.
(2012).
Health
risk
or
resource?
gradual
and
independent
association
between
self-rated
health
and
mortality
persists
over
30
years.
PLoS
ONE,
7(2),
1-10.
http://dx.doi.org/10.1371/journal.pone.0030795.
Bosson,
J.
K.,
Swann,
W.
B.,
Jr.,
&
Pennebaker,
J.
W.
(2000).
Stalking
the
perfect
measure
of
implicit
self-esteem:
The
blind
men
and
the
elephant
revisited?
Journal
of
Personality
and
Social
Psychology,
79(4),
631–643.
https://dx.doi.org/10.1037/0022-3514.79.4.631.
Brandt,
M.
J.,
Ijzerman,
H.,
Dijksterhuis,
A.,
Farach,
F.,
Geller,
J.,
Giner-Sorolla,
R.,
.
.
.
Veer,
A.
V.
(2014).
The
replication
recipe:
What
makes
for
a
convincing
replication?
Journal
of
Experimental
Social
Psychology,
50,
217-224.
http://dx.doi.org/10.2139/ssrn.
2283856.
Brown,
J.
D.
(1986).
Evaluations
of
self
and
others:
Self-enhancement
biases
in
social
judgments.
Social
Cognition,
4(4),
353-376.
http://dx.doi.org/10.1521/soco.1986.4.4.
353.
Brown,
J.
D.
(2012).
Understanding
the
better
than
average
effect:
Motives
(still)
matter.
Personality
and
Social
Psychology
Bulletin,
38(2),
209–219.
http://dx.doi.org/10.1177/
0146167211432763.
Chambers,
J.
R.,
&
Windschitl,
P.
D.
(2004).
Biases
in
social
comparative
judgments:
The
role
of
nonmotivated
factors
in
above-average
and
comparative-optimism
effects.
Psychological
Bulletin,
130(5),
813–838.
http://dx.doi.org/10.1037/0033-2909.130.5.
813.
481
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Champely,
S.,
Ekstrom,
C.,
Dalgaard,
P.,
Gill,
J.,
Weibelzahl,
S.,
Anandkumar,
A.,
...
&
De
Rosario,
M.
H.
(2018).
Package
‘pwr’
(1.3-0).
https://cran.r-project.org/web/packages/
pwr/pwr.pdf.
Chung,
J.,
Schriber,
R.
A.,
&
Robins,
R.
W.
(2016).
Positive
illusions
in
the
aca-
demic
context:
A
longitudinal
study
of
academic
self-enhancement
in
college.
Per-
sonality
and
Social
Psychology
Bulletin,
42(10),
1384–1401.
http://dx.doi.org/10.1177/
0146167216662866.
Cohen,
J.
(1983).
The
cost
of
dichotomization.
Applied
Psychological
Measurement,
7(3),
249–253.
http://dx.doi.org/10.1177/014662168300700301.
Cohen,
J.
(1988).
Statistical
power
analysis
for
the
behavioral
sciences.
Hillsdale,
N.J:
L.
Erlbaum
Associates.
http://dx.doi.org/10.1177/014662168300700301.
Ehrlinger,
J.,
&
Dunning,
D.
(2003).
How
chronic
self-views
influence
(and
potentially
mislead)
estimates
of
performance.
Journal
of
Personality
and
Social
Psychology,
84(1),
5–17.
https://psycnet.apa.org/doi/10.1037/0022-3514.84.1.5.
Epley,
N.,
&
Caruso,
E.
M.
(2004).
Egocentric
ethics.
Social
Justice
Research,
17(2),
171–187.
https://doi.org/10.1023/B:SORE.0000027408.72713.45.
DellaVigna,
S.
(2009).
Psychology
and
economics:
Evidence
from
the
field.
Journal
of
Economic
Literature,
47(2),
315–72.
http://dx.doi.org/10.1257/jel.47.2.315.
Dunning,
D.,
Meyerowitz,
J.
A.,
&
Holzberg,
A.
D.
(1989).
Ambiguity
and
self-evaluation:
The
role
of
idiosyncratic
trait
definitions
in
self-serving
assessments
of
ability.
Journal
of
Personality
and
Social
Psychology,
57(6),
1082–1090.
https://doi.org/10.1037/0022-
3514.57.6.1082
Dunning,
D.,
Heath,
C.,
&
Suls,
J.
M.
(2004).
Flawed
self-assessment:
Implications
for
health,
education,
and
the
workplace.
Psychological
Science
in
the
Public
Interest,
5(3),
69–106.
http://dx.doi.org/10.1111/j.1529-1006.2004.00018.x.
Eriksson,
K.,
&
Funcke,
A.
(2013).
A
below-average
effect
with
respect
to
American
political
stereotypes
on
warmth
and
competence.
Political
Psychology,
36(3),
341–350.
http://dx.doi.org/10.1111/pops.12093.
Feghhi,
I.,
&
Rosenbaum,
D.
A.
(2019).
Judging
the
subjective
difficulty
of
different
kinds
of
tasks.
Journal
of
Experimental
Psychology:
Human
Perception
and
Performance,
45(8),
983–994.
http://dx.doi.org/10.1037/xhp0000653.
Gerber,
J.
P.,
Wheeler,
L.,
&
Suls,
J.
(2018).
A
social
comparison
theory
meta-analysis
60+
years
on.
Psychological
bulletin,
144(2),
177–197.
http://dx.doi.org/10.1037/
bul0000127.
Giladi,
E.
E.,
&
Klar,
Y.
(2002).
When
standards
are
wide
of
the
mark:
Nonselective
superiority
and
inferiority
biases
in
comparative
judgments
of
objects
and
concepts.
Journal
of
Experimental
Psychology:
General,
131(4),
538–551.
https://psycnet.apa.
org/doi/10.1037/0096-3445.131.4.538.
Gilovich,
T.
(1983).
Biased
evaluation
and
persistence
in
gambling.
Journal
of
Personality
and
Social
Psychology,
44(6),
1110–1126.
https://psycnet.apa.org/doi/10.1037/0022-
482
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
3514.44.6.1110.
Goetz,
T.,
Ehret,
C.,
Jullien,
S.,
&
Hall,
N.
C.
(2006).
Is
the
grass
always
greener
on
the
other
side?
Social
comparisons
of
subjective
well-being.
The
Journal
of
Positive
Psychology,
1(4),
173–186.
http://dx.doi.org/10.1080/17439760600885655.
Heine,
S.
J.,
&
Lehman,
D.
R.
(1997).
The
cultural
construction
of
self-enhancement:
An
examination
of
group-serving
biases.
Journal
of
Personality
and
Social
Psychology,
72(6),
1268–1283.
http://dx.doi.org/10.1037/0022-3514.72.6.1268.
Higgins,
E.
T.,
King,
G.
A.,
&
Mavin,
G.
H.
(1982).
Individual
construct
accessibility
and
subjective
impressions
and
recall.
Journal
of
Personality
and
Social
Psychology,
43(1),
35–47.
https://psycnet.apa.org/doi/10.1037/0022-3514.43.1.35.
Higgins,
E.
T.,
&
Bargh,
J.
A.
(1987).
Social
cognition
and
social
perception.
Annual
Review
of
Psychology,
38(1),
369–425.
https://doi.org/10.1146/annurev.ps.38.020187.
002101.
Hotelling,
H.
(1940).
The
selection
of
variates
for
use
in
prediction,
with
some
comments
on
the
general
problem
of
nuisance
parameters.
Annals
of
Mathematical
Statistics,
11(3),
271–283.
http://dx.doi.org/10.1214/aoms/1177731867.
Huddleston,
C.
(2019).
Survey:
69%
of
Americans
have
less
than
$1,000
in
savings.
Retrieved
July
27,
2020,
from
https://www.gobankingrates.com/saving-money/savings-
advice/americans-have-less-than-1000-in-savings/.
Johansson,
M.,
&
Allwood,
C.
M.
(2007).
Own-other
differences
in
the
realism
of
some
metacognitive
judgments.
Scandinavian
Journal
of
Psychology,
48(1),
13–21.
http://dx.
doi.org/10.1111/j.1467-9450.2007.00565.x.
Kim,
Y.,
Kwon,
H.,
&
Chiu,
C.
(2017).
The
better-than-average
effect
is
observed
because
“average”
is
often
construed
as
below-median
ability.
Frontiers
in
Psychology,
8,
898.
http://dx.doi.org/10.3389/fpsyg.2017.00898.
Klar,
Y.,
Medding,
A.,
&
Sarel,
D.
(1996).
Nonunique
invulnerability:
Singular
versus
distributional
probabilities
and
unrealistic
optimism
in
comparative
risk
judgments.
Or-
ganizational
Behavior
and
Human
Decision
Processes,
67(2),
229–245.https://doi.org/
10.1006/obhd.1996.0076.
Koellinger,
P.,
Minniti,
M.,
&
Schade,
C.
(2007).
“I
think
I
can,
I
think
I
can”:
Overconfi-
dence
and
entrepreneurial
behavior.
Journal
of
Economic
Psychology,
28(4),
502–527.
https://doi.org/10.1016/j.joep.2006.11.002.
Krizan,
Z.,
&
Suls,
J.
(2008).
Losing
sight
of
oneself
in
the
above-average
effect:
When
egocentrism,
focalism,
and
group
diffuseness
collide.
Journal
of
Experimental
Social
Psychology,
44(4),
929–942.
http://dx.doi.org/10.1016/j.jesp.2008.01.006.
Kruger,
J.
(1999).
Lake
Wobegon
be
gone!
The
“below-average
effect”
and
the
egocentric
nature
of
comparative
ability
judgments.
Journal
of
Personality
and
Social
Psychology,
77(2),
221–232.
https://doi.org/10.1037/0022-3514.77.2.221.
Kruger,
J.,
&
Burrus,
J.
(2004).
Egocentrism
and
focalism
in
unrealistic
optimism
(and
pessimism).
Journal
of
Experimental
Social
Psychology,
40(3),
332–340.
http://dx.doi.
483
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
org/10.1016/j.jesp.2003.06.002.
Kunda,
Z.
(1987).
Motivated
inference:
Self-serving
generation
and
evaluation
of
causal
theories.
Journal
of
Personality
and
Social
Psychology,
53(4),
636–647.
https://psycnet.
apa.org/doi/10.1037/0022-3514.53.4.636.
Kunda,
Z.
(1990).
The
case
for
motivated
reasoning.
Psychological
Bulletin,
108(3),
480–498.
https://doi.org/10.1037/0033-2909.108.3.480.
LeBel,
E.
P.,
McCarthy,
R.
J.,
Earp,
B.
D.,
Elson,
M.,
&
Vanpaemel,
W.
(2018).
A
uni-
fied
framework
to
quantify
the
credibility
of
scientific
findings.
Advances
in
Methods
and
Practices
in
Psychological
Science,
1(3),
389-402.https://doi.org/10.1177/2515245918787489.
LeBel,
Vanpaemel,
Cheung,
&
Campbell.
(2019).
A
brief
guide
to
evaluate
replications.
Meta-Psychology,3.
https://dx.doi.org/10.15626/MP.2018.843
Litman,
L.,
Robinson,
J.,
&
Abberbock,
T.
(2017).
TurkPrime.com:
A
versatile
crowdsourc-
ing
data
acquisition
platform
for
the
behavioral
sciences.
Behavior
Research
Methods,
49(2),
433–442.
https://doi.org/10.3758/s13428-016-0727-z.
MacCallum,
R.
C.,
Zhang,
S.,
Preacher,
K.
J.,
&
Rucker,
D.
D.
(2002).
On
the
practice
of
dichotomization
of
quantitative
variables.
Psychological
Methods,
7(1),
19–40.
https://
psycnet.apa.org/doi/10.1037/1082-989X.7.1.19.
Mattern,
K.
D.,
Burrus,
J.,
&
Shaw,
E.
(2010).
When
both
the
skilled
and
unskilled
are
unaware:
Consequences
for
academic
performance.
Self
and
Identity,
9(2),
129–141.
http://dx.doi.org/10.1080/15298860802618963.
Moore,
D.
A.,
&
Kim,
T.
G.
(2003).
Myopic
Social
Prediction
and
the
Solo
Comparison
Effect.
Journal
of
Personality
and
Social
Psychology,
85(6),
1121–1135.
http://dx.doi.
org/10.1037/0022-3514.85.6.1121.
Moore,
D.
A.
(2007).
Not
so
above
average
after
all:
When
people
believe
they
are
worse
than
average
and
its
implications
for
theories
of
bias
in
social
comparison.
Organizational
Behavior
and
Human
Decision
Processes,
102(1),
42–58.
https://doi.org/10.1016/j.
obhdp.2006.09.005.
Moore,
D.
A.,
&
Cain,
D.
M.
(2007).
Overconfidence
and
underconfidence:
When
and
why
people
underestimate
(and
overestimate)
the
competition.
Organizational
Behavior
and
Human
Decision
Processes,
103(2),
197–213.
http://dx.doi.org/10.1016/j.obhdp.2006.
09.002.
Moore,
D.
A.,
&
Small,
D.
A.
(2007).
Error
and
bias
in
comparative
judgment:
On
being
both
better
and
worse
than
we
think
we
are.
Journal
of
Personality
and
Social
Psychology,
92(6),
972–989.
http://dx.doi.org/10.1037/0022-3514.92.6.972.
Moore,
D.
A.,
&
Healy,
P.
J.
(2008).
The
trouble
with
overconfidence.
Psychological
Review,
115(2),
502–517.
https://psycnet.apa.org/doi/10.1037/0033-295X.115.2.502.
Naggara,
O.,
Raymond,
J.,
Guilbert,
F.,
Roy,
D.,
Weill,
A.,
&
Altman,
D.
(2011).
Analysis
by
categorizing
or
dichotomizing
continuous
variables
is
inadvisable:
An
example
from
the
natural
history
of
unruptured
aneurysms.
American
Journal
of
Neuroradiology,
32(3),
437–440.
http://dx.doi.org/10.3174/ajnr.a2425.
484
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Nisbett,
R.
E.,
Krantz,
D.
H.,
Jepson,
C.,
&
Kunda,
Z.
(1983).
The
use
of
statistical
heuristics
in
everyday
inductive
reasoning.
Psychological
Review,
90(4),
339–363.
http://dx.doi.
org/10.1037/0033-295x.90.4.339.
Polivy,
J.,
&
Herman,
C.
P.
(2000).
The
false-hope
syndrome:
Unfulfilled
expectations
of
self-change.
Current
Directions
in
Psychological
Science,
9(4),
128–131.
https://
psycnet.apa.org/doi/10.1111/1467-8721.00076.
R
Core
Team.
(2020).
R:
A
language
and
environment
for
statistical
computing.
R
Foundation
for
Statistical
Computing,
Vienna,
Austria.
https://www.R-project.org/.
Robins,
R.
W.,
&
Beer,
J.
S.
(2001).
Positive
illusions
about
the
self:
Short-term
benefits
and
long-term
costs.
Journal
of
Personality
and
Social
Psychology,
80(2),
340–352.
https://psycnet.apa.org/doi/10.1037/0022-3514.80.2.340.
Ross,
M.,
&
Sicoly,
F.
(1979).
Egocentric
biases
in
availability
and
attribution.
Journal
of
Personality
and
Social
Psychology,
37(3),
322–336.
https://doi.org/10.1037/0022-3514.
37.3.322.
Roth,
A.
V.,
Schroeder,
R.,
Huang,
X.,
&
Kristal,
M.
(2008).
Handbook
of
metrics
for
research
in
operations
management:
Multi-item
measurement
scales
and
objective
items.
London:
SAGE.
Sedikides,
C.,
Meek,
R.,
Alicke,
M.
D.,
&
Taylor,
S.
(2014).
Behind
bars
but
above
the
bar:
Prisoners
consider
themselves
more
prosocial
than
non-prisoners.
British
Journal
of
Social
Psychology,
53(2),
396–403.
https://doi.org/10.1111/bjso.12060.
Simonsohn,
U.
(2015).
Small
Telescopes:
Detectability
and
the
Evaluation
of
Replication
Results.
Psychological
Science,
26(5),
559–569.
https://doi.org/10.1177/0956797614567341.
Srull,
T.
K.,
&
Gaelick,
L.
(1983).
General
principles
and
individual
differences
in
the
self
as
a
habitual
reference
point:
An
examination
of
self-other
judgments
of
similarity.
Social
Cognition,
2(2),
108–121.
https://psycnet.apa.org/doi/10.1521/soco.1983.2.2.108.
Stewart,
M.,
Brown,
J.
B.,
Weston,
W.,
McWhinney,
I.
R.,
McWilliam,
C.
L.,
&
Freeman,
T.
(2013).
Patient-centered
medicine:
transforming
the
clinical
method.
CRC
Press.
Sundström,
A.
(2008).
Self-assessment
of
driving
skill.
A
review
from
a
measurement
perspective.
Transportation
Research
Part
F:
Traffic
Psychology
and
Behaviour,
11(1),
1–9.
https://psycnet.apa.org/doi/10.1016/j.trf.2007.05.002.
Sweeny,
K.,
&
Shepperd,
J.
A.
(2007).
Do
people
brace
sensibly?
Risk
judgments
and
event
likelihood.
Personality
and
Social
Psychology
Bulletin,
33(8),
1064–1075.
http://
dx.doi.org/10.1177/0146167207301024.
Tavakol,
M.,
&
Dennick,
R.
(2011).
Making
sense
of
Cronbach’s
alpha.
International
Journal
of
Medical
Education,
2,
53–55.
http://dx.doi.org/10.5116/ijme.4dfb.8dfd[
Taylor,
S.
E.,
&
Brown,
J.
D.
(1988).
Illusion
and
well-being:
A
social
psychological
perspective
on
mental
health.
Psychological
Bulletin,
103(2),
193–210.http://dx.doi.org/
10.1037/0033-2909.103.2.193.
Walsh,
E.,
&
Ayton,
P.
(2009).
My
imagination
versus
your
feelings:
Can
personal
affective
forecasts
be
improved
by
knowing
other
peoples’
emotions?
Journal
of
Experimental
485
Judgment
and
Decision
Making,
Vol.
17,
No.
1,
January
2022
Above
and
below
average
Psychology:
Applied,
15(4),
351–360.
http://dx.doi.org/10.1037/a0017984.
Weinstein,
N.
D.
(1980).
Unrealistic
optimism
about
future
life
events.
Journal
of
Per-
sonality
and
Social
Psychology,
39,
806–820.
http://dx.doi.org/10.1037/0022-3514.39.
5.806.
Weinstein,
N.
D.
(1983).
Reducing
unrealistic
optimism
about
illness
susceptibility.
Health
Psychology,
2(1),
11–20.
http://dx.doi.org/10.1037/0278-6133.2.1.11.
Whillans,
A.
V.,
Jordan,
A.
H.,
&
Chen,
F.
S.
(2020).
The
upside
to
feeling
worse
than
average
(WTA):
A
conceptual
framework
to
understand
when,
how,
and
for
whom
WTA
beliefs
have
long-term
benefits.
Frontiers
in
Psychology,
11,
642.
http://dx.doi.org/10.
3389/fpsyg.2020.00642.
Wilson,
A.
E.,
&
Ross,
M.
(2001).
From
chump
to
champ:
People’s
appraisals
of
their
earlier
and
present
selves.
Journal
of
Personality
and
Social
Psychology,
80(4),
572–584.
https://psycnet.apa.org/doi/10.1037/0022-3514.80.4.572.
Windschitl,
P.
D.,
Rose,
J.
P.,
Stalkfleet,
M.
T.,
&
Smith,
A.
R.
(2008a).
Are
people
excessive
or
judicious
in
their
egocentrism?
A
modeling
approach
to
understanding
bias
and
accuracy
in
people’s
optimism.
Journal
of
Personality
and
Social
Psychology,
95(2),
253–273.
https://psycnet.apa.org/doi/10.1037/0022-3514.95.2.253.
Windschitl,
P.
D.,
Conybeare,
D.,
&
Krizan,
Z.
(2008b).
Direct-comparison
judgments:
When
and
why
above-
and
below-average
effects
reverse.
Journal
of
Experimental
Psychology:
General,
137(1),
182–200.
https://doi.org/10.1037/0096-3445.137.1.182.
Zell,
E.,
&
Alicke,
M.
D.
(2011).
Age
and
the
better-than-average
effect.
Journal
of
Applied
Social
Psychology,
41(5),
1175–1188.
https://doi.org/10.1111/j.1559-1816.2011.00752.
x.
Zell,
E.,
Strickhouser,
J.
E.,
Sedikides,
C.,
&
Alicke,
M.
D.
(2020).
The
better-than-average
effect
in
comparative
self-evaluation:
A
comprehensive
review
and
meta-analysis.
Psy-
chological
Bulletin,
146(2),
118–149.
https://psycnet.apa.org/doi/10.1037/bul0000218.
Ziano,
I.,
Mok,
P.
Y.,
&
Feldman,
G.
(2021).
Replication
and
extension
of
Alicke
(1985)
better-than-average
effect
for
desirable
and
controllable
traits.
Social
Psychological
and
Personality
Science,
12(6),
1005–1017.
https://doi.org/10.1177/1948550620948973.
486