This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
We explored how students interpret the relative likelihood of capturing a population parameter at various points of a CI in two studies. First, an online survey of 101 students found that students’ beliefs about the probability curve within a CI take a variety of shapes, and that in fixed choice tasks, 39%, 95% CI [30, 48], of students’ responses deviated from true distributions. For open ended tasks, this proportion rose to 85%, 95% CI [76, 90]. We interpret this as evidence that, for many students, intuitions about CI distributions are ill-formed, and their responses are highly susceptible to question format. Many students also falsely believed that there is substantial change in likelihood at the upper and lower limits of the CI, resembling a cliff effect (
Think of a 95% confidence interval (CI). Do all points inside a CI have the same likelihood of capturing the population parameter? Are some points in the interval more likely than others? The former intuition implies a uniform distribution, the latter may describe a normal or
Cat’s eye confidence intervals. Points inside a CI are not equally likely to land on the μ.
Using CIs instead of
A CI is a point estimate surrounded by uncertainty. When sampling from a population distribution, our sample mean gives us an estimate of the population mean (μ). As sampling is repeated and new sample means are calculated, the sample means vary. As sampling continues (drawing samples of the same
The relative likelihood of each point across an interval falling on μ is not equal (
We use the term SLD to refer to a cognitive representation of the relative likelihood of each point across and beyond a CI in landing on the population parameter. For example, a uniform SLD reflects the (incorrect) belief that every point inside a CI is equally likely to have landed on μ.
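To make the normative comparison concrete, the following minimal Python sketch computes how likely a point at the limit of a CI is relative to the sample mean. It assumes a normal sampling distribution with a known standard error (so z rather than t critical values); the function name `relative_likelihood_at_limit` is illustrative and not part of the study materials.

```python
import math
from statistics import NormalDist


def relative_likelihood_at_limit(confidence):
    """Likelihood at a CI limit relative to the sample mean (normal model).

    Assumption: the sampling distribution is normal, so a point z standard
    errors from the sample mean has relative likelihood exp(-z**2 / 2).
    """
    # Two-sided critical value, e.g. about 1.96 for a 95% CI.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.exp(-z ** 2 / 2)


for level in (0.50, 0.95):
    ratio = relative_likelihood_at_limit(level)
    print(f"{level:.0%} CI limit: relative likelihood {ratio:.2f}")
```

Under these assumptions a point at the limit of a 95% CI is roughly one seventh as likely as the sample mean, and likelihood declines smoothly through the limit rather than dropping sharply.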
We can think of a CI as comprised of three main sections: (1) The interval between the lower limit and upper limit, (2) The limits themselves, and (3) The regions beyond the limits. Observing a person’s judgments about likelihood of points in these sections allows us to plot a shape that represents their SLD. For example, if a person judges all the points in Section 1 as equally likely, the points in Section 3 as substantially less likely than those in Section 1 and as all equally unlikely, then their cognitive representation of a CI can be plotted as a square, or uniform distribution. Note that this may not necessarily mean that if asked explicitly ‘what distribution does a CI have?’ they would answer ‘uniform.’ In fact, we suspect that many students could not answer this question at all. What we elicit in our judgment tasks of the likelihood of points in different sections is their intuition—a hint at what their SLD might be.
One important determinant of the shape of an SLD may be whether an individual thinks of a CI as a substitute for Null Hypothesis Significance Tests (NHSTs).
Strict adherence to the Neyman–Pearson decision making model may lead to uniform SLDs with sharp cliffs at the lower and upper limits (or perhaps the inability to make any statement about relative likelihood when looking at a single CI), whereas Fisherian sympathies may lead to SLDs that reflect a more gradually decreasing distribution such as a normal or
Our research program followed previous experiments (
Experiment 1 aimed to elicit students’ SLDs for CIs. Although this study was exploratory, we hypothesized that shape categories would include: (1) a square shape or uniform distribution, (2) a linear decrease or triangle distribution, (3) a normal distribution, and (4) various hybrids of the previous three. We used three different tasks to elicit the students’ distributions, and assessed consistency between tasks.
Final year honors undergraduates and graduate students were recruited through official course online noticeboards, email, social networking websites, and word of mouth; 101 students agreed to participate. Two thirds (66%) of students identified psychology as their main discipline; other disciplines included social science (13%), neuroscience (6%), and medicine (5%). The remainder (10%) did not identify their discipline. Most students (63%) were enrolled in a postgraduate program, and the remaining students were completing their honors (fourth year undergraduate).
We developed an internet survey consisting of three judgment tasks. The first task in Experiment 1 asked students to rate the relative likelihood of points across a CI landing on μ, the parameter being estimated. The second task asked students to choose the shape that best represented their intuitions about the distribution of a CI from a set of six multiple choice options.
Task 3 presented students with a 50% CI, and asked them to adjust its width to correspond to an 80% CI and a 95% CI. This task was repeated for adjusting a 95% CI to an 80% and a 50% CI. We also asked two open ended questions about how the student approached the tasks, and about their familiarity with the concept of likelihood distributions. Finally, we asked for the students’ familiarity with CIs and some demographic details.
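As a point of reference for the width adjustment task, the sketch below computes how much wider or narrower a CI should become when only the confidence level changes. It assumes a normal sampling distribution and a fixed standard error, so that width is proportional to the two-sided critical value; the function name `width_ratio` is hypothetical.

```python
from statistics import NormalDist


def width_ratio(target, start):
    """Ratio of CI widths at two confidence levels for the same data.

    Assumption: normal sampling distribution with a fixed standard error,
    so width is proportional to the two-sided critical value z.
    """
    z_target = NormalDist().inv_cdf(0.5 + target / 2)
    z_start = NormalDist().inv_cdf(0.5 + start / 2)
    return z_target / z_start


for target, start in [(0.80, 0.50), (0.95, 0.50), (0.50, 0.95)]:
    ratio = width_ratio(target, start)
    print(f"{start:.0%} -> {target:.0%}: width x {ratio:.2f}")
```

Under these assumptions a 95% CI is nearly three times as wide as a 50% CI for the same data, rather than double.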
Potential student respondents were emailed an invitation with a link to the survey, and a brief explanation of the tasks involved in the questionnaire. We also invited potential respondents to pass on the link to others who might be interested in participating. The survey was designed to take around 15 min to complete; however, no time limit was set. The average time taken to complete the survey was 16 min.
Students were provided with a simple research vignette as well as
A CI was presented as part of Task 1. Students rated the likelihood of points L_{1} to L_{9} relative to the sample mean (
The 19-point scale ranged from (1) ‘More likely [to] land on the μ,’ (3) ‘About equally likely [to] land on the μ,’ (5) ‘Very slightly less likely to be the μ’ to (19) ‘Almost zero likelihood.’ The range was skewed so that students had 16 choices below ‘About equally likely…’ An identical figure (except for the label ‘50% CI’) and a comparable vignette were presented, for a 50% CI.
The upper dashed and lower dotted lines represent the differences in relative likelihood from
Likelihood distribution curves, for 50% (gray lines) and 95% CIs (dark lines).
A student’s SLD was categorized as Correct if the comparison distribution accounted for at least 97% of variance of the students’ SLD. It was sometimes difficult to demarcate this category from the Gradual Curve and the Triangle categories below. Even though we set the
Percentage [and 95% CIs] of students with each shape in Task 2.
Shape | % Students (95% CI) | % Students (50% CI) | % Students (consistent across 95% and 50% CIs)^{b} | |
---|---|---|---|---|
Correct | 15 [9, 23] | 17 [11, 25] | 10 [6, 17] | |
Bell shape | 12 [7, 20] | 18 [11, 26] | 5 [2, 11] | |
Triangle | 4 [2, 10] | 7^{a} [3, 14] | 2^{a} [0, 7] | |
Half Circle | 10 [6, 17] | 5 [2, 11] | 4 [2, 10] | |
Mesa | 16 [10, 24] | 12 [7, 17] | 9 [5, 16] | |
Square | 19 [12, 28] | 13 [8, 21] | 11 [6, 18] | |
Other | 25 [17, 34] | 36 [27, 45] | 14 [8, 22] |
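The text does not state how the bracketed intervals for percentages were computed. As a hedged illustration only, the sketch below uses the Wilson score interval, one common method for a proportion. Treating a tabled 15% as a count of 15 of 101 students is an assumption, as are the function and variable names.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion, returned as (lower, upper)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical count: 15 of 101 students in one shape category.
lower, upper = wilson_interval(15, 101)
print(f"{15 / 101:.0%} [{lower * 100:.0f}, {upper * 100:.0f}]")
```

Other interval methods (for example the simple Wald interval) give slightly different limits for the same counts, so small discrepancies between reported intervals may reflect the method used.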
Gradual Curve: We categorized students’ SLDs as Gradual Curve for both 50% and 95% CIs if the curve had a curvilinear drop across the CI. This category was very lenient. If an SLD was not classified as Correct, had no cliff between points L_{5} and L_{6} (these points lie either side of the upper limit of the CI, see
Square and Mesa: For Square or Mesa classifications, a student must have rated L_{1} to L_{5} as equally likely, and then dropped the likelihood rating between points L_{5} and L_{6}. A Square is defined by final points (L_{6} to L_{9}) with equal likelihood ratings, whereas a Mesa shows a continued reduction in likelihood from L_{6} to L_{9}.
Half Circle: For the Half Circle classification, several points from L_{1} to L_{5} had to show a drop in likelihood. However, the drop in likelihood between L_{5} and L_{6}, located just inside and just outside the CI, had to be greater than for the other pairs of Ls.
Triangle: An SLD was classified as Triangle if it had a negative linear trend, that is, equal drops in likelihood over at least seven consecutive points.
Other: A classification of Other was given to all SLDs that did not fit any of the above criteria, including showing no change in likelihood across any of the points (two respondents showed this response).
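The shape rules above can be sketched in code. This is an illustrative simplification, not the scoring procedure used in the study: it assumes nine integer ratings for L_{1} to L_{9} on the 19-point scale (higher numbers mean lower likelihood), omits the Correct and Gradual Curve categories, and interprets "several points" as at least two drops. All function names are hypothetical.

```python
def longest_equal_positive_run(drops):
    """Length of the longest run of identical, positive step sizes."""
    best = run = 0
    previous = None
    for drop in drops:
        if drop > 0 and drop == previous:
            run += 1
        elif drop > 0:
            run = 1
        else:
            run = 0
        previous = drop
        best = max(best, run)
    return best


def classify_sld(ratings):
    """Classify nine ratings (L1..L9); a higher rating means less likely."""
    if len(ratings) != 9:
        raise ValueError("expected nine ratings, L1 to L9")
    # Positive drop: the next point was rated as less likely.
    drops = [later - earlier for earlier, later in zip(ratings, ratings[1:])]
    inside, cliff, outside = drops[:4], drops[4], drops[5:]

    # Square and Mesa: flat inside the CI, then a drop across the limit.
    if all(d == 0 for d in inside) and cliff > 0:
        if all(d == 0 for d in outside):
            return "Square"
        if all(d >= 0 for d in outside) and any(d > 0 for d in outside):
            return "Mesa"
    # Triangle: seven consecutive points means six equal steps.
    if longest_equal_positive_run(drops) >= 6:
        return "Triangle"
    # Half Circle: drops inside the CI, with the largest drop at the limit.
    if sum(d > 0 for d in inside) >= 2 and cliff > max(inside + outside):
        return "Half Circle"
    return "Other"


print(classify_sld([3, 3, 3, 3, 3, 15, 15, 15, 15]))
```

A response with no change across all nine points falls through every rule and is returned as Other, in line with the description above.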
Task 2 asked students to choose a shape that best represented their SLD from a set of six shapes (
The underlying distribution of a CI is directly linked to the width of a CI given a
The likelihood distribution of each point of a 95% CI landing on μ. The 50, 80, and 95% marks indicate the upper limit of the CI with each of those
In Task 3, students were shown a CI and asked to click on the slider to the right of the screen to set the lower interval to what they felt was the right length at a different
Percentages of students for all shape categories for Task 1 are presented in
Six likelihood distributions presented as response options for Task 3.
Three quarters (75%, [67, 84]) of students adjusted the intervals in the correct direction. However, adjustments were on average not sufficiently large. When asked to adjust from an initial
A quarter of students (25%, 25 of 101, 95% CI [16, 33]), incorrectly believed that CI width would
We asked students two open ended questions; the first was “How did you go about answering these questions, for example, did you have any particular model in mind? Did you use any rules of thumb?” We received informative responses from 68 of the 101 students (the remainder merely said, for example, “no” or “from lectures”).
Of the informative responses, 38% (26 of 68) mentioned having a normal distribution model in mind, for example “I believe that plausible values of a 95% CI are normally distributed around the mean. Thus, the further away a data point is from the mean in a 95% CI, the less likely it is to be the μ. Correspondingly, a 50% CI would be measuring a narrower distribution than a 95% CI.” Eight responses mentioned SD and one response mentioned SE.
Seventeen responses defined the CIs as a range. Of these, eight students mentioned that they thought everything inside the CI is equally likely. Six students mentioned that they found the tasks challenging or confusing. For example:
“There is a 95% chance that the true μ is within those parameters. I did not think the likelihood would be affected by a value’s distance to the mean as long as the values were within the 95% CI. However, when I saw the graphs, it does make more sense for the [normal distribution]…now I’m just confusing myself.”
The second open ended question was “…we introduced likelihood distributions, are you familiar with this concept? If so where have you come across it before?” We received 87% (87 of 101) usable responses. Of these, 53% expressed familiarity with likelihood distributions.
Our investigation has revealed four new CI misconceptions (
Misconceptions observed in Experiment 1.
 | Description of misconception |
---|---|
New | All points inside a CI are equally likely to land on the μ |
New | All points outside a CI are equally unlikely to land on the μ |
 | There is a likelihood cliff at the end of a CI (both 50% and 95% CIs) |
New | 50% CIs and 95% CIs have the same distribution. |
 | Likelihood decreases in a linear way as we move away from the sample mean. |
 | As confidence level increases, CI width decreases (for the same data). |
New | A 95% CI is roughly double the width of a 50% CI. |
Students were asked to give a rating of their familiarity with CIs on a six-point Likert scale that ranged from 1 “not at all familiar with CIs” to 6 “Very familiar with CIs…Often use them in research.” The median score was 4, which corresponds to the statement “I have seen CIs in research and know what they tell me.” We compared the familiarity ratings of students grouped by their chosen SLDs in Task 3. The CIs overlapped considerably for all groups except Mesa: the difference between Mesa (3.2) and Square (4.0) was 0.8 [1.4, 0.17], between Mesa and Bell shaped (3.8) was 0.6 [1.0, 0.17], and between Mesa and Half circle (4.1) was 0.9 [1.7, 0.19].
Our results indicate two main findings. First, almost three quarters (74%) of students gave a response inconsistent with the normative in at least one of the three tasks. Second, although 75% of students adjust CIs in the correct direction, one quarter (25%) of students believed that, for given data, as the
There was a large difference between the proportion of ‘correct’ SLDs we elicited in Task 1 compared to the proportion who selected the correct option in the fixed choice Task 2. In Task 1, only 27% of students’ SLDs were categorized as Bell or Triangle categories, which we considered the most correct answers, whereas in Task 2, 61% chose Bell. Clearly, students’ responses were heavily shaped by the question format. This perhaps reflects the fragility of their understanding of the concept of the CI distribution.
Students’ shapes suggested several different types of SLDs. The Square shape indicated dichotomous decision making, a binary decision about the importance of results based on the arbitrary limit of the CI. This kind of intuition is consistent with Neyman–Pearson dichotomous decision making insofar as it prohibits discussion of varying likelihoods within the interval. The Half circle and Mesa shapes may also reflect a tendency to interpret CIs dichotomously; however, at the same time they reveal another SLD, a gradual reduction of likelihood for values further away from the sample mean. They may thus represent a combination of Neyman–Pearson and Fisherian NHST, possibly the hybrid approach outlined by
Only four students were classified as Triangle in Task 1 and nobody selected the Triangle in Task 2. It remains ambiguous whether it is just a poor approximation to a normal SLD or whether it exists as a linear SLD. If a linear SLD does exist, at the very least it is not a representation of dichotomous decision making. However, we cannot rule out the former as it would be difficult to plot out an accurate Bell shape in SLD Tasks 1 and 2. This was explored further in Experiment 2.
Around half the students had the same shape for both
A quarter (25%) of our student sample thought that as
The ambiguity of the word ‘confidence’ may be the best explanation for the results found in both studies. Confidence is associated with surety, and more precise predictions are judged as preferable (
There are considerable differences in student judgments depending on question format. For example, in Task 2, the majority of the students (61%) identified the normal distribution as their 95% CI SLD. Yet in Task 1, only 27% of student responses fit the normal distribution (Bell) category. One explanation is the availability heuristic (
After accounting for limitations, there is sufficient support to believe that the SLDs elicited in Tasks 1 and 2 represent something real. This evidence comes from triangulating the elicited SLDs with open ended comments. For example, Student #8’s answers to all tasks were classified as Square and that student stated “I answered these questions believing that for a 95% CI there is a 95% chance that the true μ is within [the limits], I did not think the likelihood would be affected by a value’s distance to the mean as long as the values were within the 95% CI.” In addition only three open ended responses directly mentioned conceptual difficulty with the tasks.
It is important to consider whether these results are due to students guessing their answers given the question format or whether there are consistent and coherent beliefs. The validity of capturing arguably ‘fuzzy’ intuitions using the exactness of a 19-point scale needs to be considered. The informativeness of these qualitative responses prompted us to design a second study to interview the students and explore whether their thoughts and intuitions can be adequately measured using such tasks.
In our second study, we wanted to further explore student intuitions as well as trial a new visual CI presentation, namely, cat’s eyes. We were interested in how students would interpret the figure and to what extent it might mitigate misconceptions. Cat’s eyes were designed to provide information about the relative distribution of likelihood across a CI (
Of the original sample of 101 students in Experiment 1, 24 agreed to an interview. All interviewees (
Summary of Study 2 participants’ results from Study 1 (Tasks 1 and 3).
# | SLD 95% Task 1 | SLD 50% Task 1 | Inverse length^{a} | Double | | |
---|---|---|---|---|---|---|
1 | Bell | Bell | No | No | 61.6 | 82.3 |
2 | Other | Other | Yes | No | 100^{c} | 12 |
3 | Half Circle | Square | No | No | 61.6 | 90.1 |
4 | Correct | Mesa | Yes | No | 100^{c} | 6 |
5 | Square | Square | Yes | No | 100^{c} | 29.2 |
6 | Correct | Correct | No | No | 41.4 | 93 |
7 | Mesa | Mesa | Yes | No | 100^{c} | 11.9 |
8 | Other | Half Circle | No | No | 42 | 86.6 |
9 | Mesa | Flat line | No | No | 61.6 | 92 |
10 | Other | Other | No | No | 55.4 | 93 |
11 | Other | Bell | Yes | No | 100^{c} | 23.6 |
12 | Mesa | Mesa | Yes | No | 100^{c} | 6 |
13 | Other | Other | No | Yes | 67.3 | 86.6 |
14 | Correct | Correct | Yes | No | 100^{c} | 6 |
15 | Square | Square | No | No | 70.8 | 89.4 |
16 | Correct | Bell | No | Yes | 67.4 | 92.8 |
17 | Half Circle | Other | No | No | 48.8 | 99.0 |
18 | Correct | Mesa | Yes | No | 100^{c} | 15 |
19 | Mesa | Mesa | No | Yes | 67.4 | 82.3 |
20 | Bell | Bell | Yes | No | 100^{c} | 25.9 |
21 | Half Circle | Half Circle | No | No | 55.6 | 96.6 |
22 | Square | Square | Yes | No | 100^{c} | 20.6 |
23 | Bell | Bell | No | No | 48.8 | 96.7 |
24 | Square | Flat line | No | Yes | 67.4 | 89.4 |
Students were provided with paper, and pencils, so they could annotate and add diagrams to their explanations if they chose. As the interview progressed they were gradually given copies of their answers to the original survey questions. All interviews were recorded, transcribed and coded. All transcripts were double coded. Students were also provided with a computer with the cat’s eye program developed in ESCI (
The interview consisted of three parts. The first focused on Task 1 from Experiment 1, where the participants judged the relative likelihood of nine points on a CI relative to the mean. The second part explored Task 3, and the third part introduced the participants to cat’s eyes. Students were encouraged to speak their thoughts candidly.
Part 1 began by picturing the students’ SLD as elicited in Task 2 of Experiment 1.
Three steps used to elicit students’ CIs during Section “Introduction” of the interview: (1) dots are marked by the interviewer to represent that student’s responses in Experiment 1; (2) the student joins the dots; (3) the interviewer extends the shape across the page to mirror the shape and draw the student’s SLD. L_{1} to L_{9} represent the corresponding points on
The second task discussed in the interview focused on students’ intuitions about
Part 3 of the interview introduced students to the novel concept of cat’s eyes. First, the students saw a CI and the on-screen controls were explained; these controls enabled the students to change N and SD, and to toggle the overall distribution and the cat’s eye (the amount of shaded area inside the overall distribution represented by the interval) on and off. Then the task was explained: the interviewer would ask the student to make a series of guesses about changes in length given
The audio from student interviews was recorded then independently transcribed by a research assistant (
Coding sheets and a coding manual were developed and each interview was independently double coded. The coding process involved identifying each mention of 22 possible concepts. When a misconception or correct conception (from the list 1.1 to 4.5 in Results below) was found in a student’s interview transcript it was coded as
Twenty-one of the concepts (both CI misconceptions and correct CI conceptions) investigated were developed
Overall every participant held at least one CI misconception, with a mean of 4.6 misconceptions per participant. At the end of the interviews (after exposure to cat’s eye CIs) the mean number of misconceptions dropped to 2.0 and one participant had no CI misconceptions.
The conceptions were grouped into four categories. Three categories were misconceptions: definitional, relational, and shape. The fourth category comprised correct conceptions.
We coded six ‘Definitional’ misconceptions, which reflect misunderstandings about what a CI is or represents. The following is a list of definitional misconceptions, followed by an example quote from the interviews.
“You can be 95% confident that if you replicated the study the mean will fall within this gray area.” (Student #21)
[Interviewer] “What does the [Confidence Interval] represent?”
[Student] “That’s where the sample means can be considered to be within the 95% CI” (#9)
“The [
This misconception could either be due to simple miscommunication or a genuine misunderstanding about the difference between SDs and SEs. Of course any SE is actually a SD—of a sampling distribution. But we are making the extra assumption that by SD a student means the SD of the data, not of the sampling distribution of the sample mean.
“I remember I know that a CI [is] kind of four SDs wide.” (#4)
“I’m not quite sure but I suppose having a 50% CI means you probably don’t have a lot of data… with a 95% CI you probably have a wider range of data and therefore can make a larger percentage judgment of where the μ might lie.” (#5)
“[A 50% CI means] It means it’s just as equally likely as it is unlikely… so if I were to like flip a coin the mean might be there, if I did it again it might be there and if I did it again it might be down there because it’s 50–50. And it …if it’s 50% in here and 50% outside, it means it’s the same [likelihood] all across the board.” (#14)
‘Relational’ misconceptions are misconceptions about how the different aspects of a CI relate, for example, how length changes as a function of
“It [CI length] stays the same [as sample size increases], it just stretches up and down… it wouldn’t do that in real life as there is no point in doing that but it stays the same.” (#14)
“If you have more participants… for some reason my intuition [is that the CI] just gets wider because you cover more range.” (#24)
“I guess you’re casting the net wider [as
Interviewer: “With independent groups, you are looking for an overlap of less than a quarter of the CIs to be statistically significant at 0.05.”
Student: “I thought they couldn’t overlap at all?”
“So if you reduce the sample size the CI should go wider because you are less confident.” (#24)
Shape misconceptions are misconceptions about the shape of the distribution underlying a CI. One of these CI shape misconceptions (the cliff effect) was previously identified by
“My reasoning was CI just tells you that I’m 95% confident that the mean falls within this range but it doesn’t mean that [if a point] is closer to the mean, it doesn’t tell you that it’s more likely to happen.” (#24)
“The others [points inside the CI] are equally likely and these [points outside the CI] are equally unlikely.” (#22)
“Well that point is just outside the CI so it’s much less likely.” (#15)
“So the likelihood that the mean decreases is the same regardless of the percentage of the CI.” (#13)
“The further away you move from the mean the less likely it is to be representative of the μ, and [I was] really just working on predominantly a linear scale but I kind of ran out of room, perhaps if I could do it again I’d move it all backward so it would look a bit more linear.” (#23)
“[A 95% CI is] just under double [the length of a 50% CI]. One full 50% CI would fit in one of the MOEs (margin of error; i.e., one arm) of the 95%CI.” (#21)
We were also interested in students’ correct conceptions. ‘Correct conceptions’ are those which fit with normative statistical theory.
“We are looking at a smaller area of the distribution.” (#17)
“I was trying to represent the curved nature… how the normal distribution curved and tapered off.” (#4)
“So when we start with a 50% CI a 95% CI would have to cover a lot more area than twice the amount. Probably two and a half times the amount.” (#1)
“Well assuming that the results stay similar [the length] would decrease, as long as you are not introducing more variability to the data… say the extra data is something totally different from the first data.” (#5)
“… it just decreases in likelihood as we get away from the mean.” (#10)
Frequency of students stating intuitions at first mention (Time 1), at last mention immediately before exposure to cat’s eye program (Time 2), and after exposure to cat’s eye program (Time 3) for definitional misconceptions (
Frequency of students stating intuitions at first mention (Time 1), at last mention immediately before exposure to cat’s eye program (Time 2), and after exposure to cat’s eye program (Time 3) for relational misconceptions (
Frequency of students stating intuitions at first mention (Time 1), at last mention immediately before exposure to cat’s eye program (Time 2), and after exposure to cat’s eye program (Time 3) for shape misconceptions (
Frequency of students stating intuitions at first mention (Time 1), at last mention immediately before exposure to cat’s eye program (Time 2), and after exposure to cat’s eye program (Time 3) for correct conceptions (
To check whether introducing the cat’s eye had an effect on students’ intuitions, we ran a Wilcoxon signed-rank test for each intuition. Responses were ranked as follows. For misconceptions (Categories 1–3) the rankings were Present = 0, Unsure = 1, Implicit Absent = 2, Explicit Absent = 3.
For the correct conceptions the rankings were Present = 3, Implicit Present = 2, Unsure = 1, Explicit Absent = 0.
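The coding and test described above can be illustrated with a small sketch. It is not the analysis code used in the study: it assumes the common normal approximation to the Wilcoxon signed-rank test with zero differences dropped, average ranks for ties, a tie-corrected variance, and no continuity correction. The function name and the example codes are hypothetical, and statistical packages differ in sign convention (here Z is based on the smaller rank sum, so it is never positive).

```python
import math

# Ordinal codes for misconceptions, as described in the text.
MISCONCEPTION_CODES = {
    "Present": 0, "Unsure": 1, "Implicit Absent": 2, "Explicit Absent": 3,
}


def wilcoxon_signed_rank(before, after):
    """Return (z, two-sided p) from the normal approximation."""
    # Zero differences carry no information about direction and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and abs_sorted[j] == abs_sorted[i]:
            j += 1
        ties = j - i
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += ties ** 3 - ties
        i = j
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    smaller = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (smaller - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return z, p


# Hypothetical data: two of four students drop a misconception.
before = [MISCONCEPTION_CODES[s] for s in
          ["Present", "Present", "Explicit Absent", "Explicit Absent"]]
after = [MISCONCEPTION_CODES["Explicit Absent"]] * 4
z, p = wilcoxon_signed_rank(before, after)
print(f"Z = {z:.3f}, p = {p:.3f}")
```

With so few non-zero differences the normal approximation is rough, which is one reason results for rarely mentioned intuitions should be read cautiously.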
Wilcoxon signed-rank test for student misconceptions at baseline (Time 1), at last mention immediately before exposure to cat’s eye program (Time 2), and after exposure to cat’s eye program (Time 3).
Intuition | Time 1 to Time 2: Z | Time 1 to Time 2: p | Time 2 to Time 3: Z | Time 2 to Time 3: p |
---|---|---|---|---|
1.1 | –1 | 0.317 | 1 | 0.317 |
1.2 | –2.07 | 0.038 | 0 | 1 |
1.3 | –1.41 | 0.157 | –2.165 | 0.03 |
1.4 | –1 | 0.317 | –1.414 | 0.157 |
1.5 | –1 | 0.317 | –1 | 0.317 |
2.1 | 0 | 1 | –3.272 | 0.001 |
2.2 | –1 | 0.317 | –2.06 | 0.039 |
2.3 | 0 | 1 | –2.549 | 0.011 |
2.4 | 0 | 1 | –2.232 | 0.026 |
2.5 | 0 | 1 | –3.491 | <0.001 |
3.1 | –1.089 | 0.276 | –3.109 | 0.002 |
3.2 | –0.772 | 0.47 | –1.823 | 0.068 |
3.3 | –1.414 | 0.157 | –1.89 | 0.059 |
3.4 | –1.342 | 0.18 | –1.89 | 0.059 |
4.1 | 0 | 1 | –3.464 | 0.001 |
4.2 | 0 | 1 | –1 | 0.317 |
4.3 | 0 | 1 | –1.414 | 0.157 |
4.4 | 0 | 1 | –1.493 | 0.135 |
4.5 | –1.414 | 0.157 | –1.134 | 0.257 |
By interviewing students we confirmed that the misconceptions discussed in Experiment 1 were held by some students and that the measures we used to elicit SLDs had good construct validity. In
Student quotes demonstrated both misconceptions and correct conceptions that students have about CIs. In addition, interviews identified a misconception not previously described in the literature, that the sample size has an effect of the
Initially we had expected to find clusters of definitional and relational misconceptions that could link to false beliefs about CI distributions. In reality, students’ understanding is too fragile and our sample size too small for any robust patterns to be identified. Instead the frequently diverse responses to the different tasks suggest that students often hold several competing intuitions (good and bad), and that misconceptions about shape do not translate into logically consistent relational misconceptions. Overall, cat’s eyes reduced the number of definitional misconceptions by 25%, relational misconceptions by 91%, and shape misconceptions by 53%.
As expected, there were few mentions of definitional misconceptions. Also, exposure to cat’s eyes had no substantial effect on removing the definitional misconceptions mentioned. This can be explained by the focus of the interview. Students were never asked to define a CI. Instead students were asked to clarify any definitions they gave. Also the cat’s eye program was designed to improve students’ SLDs and reasoning on the effects of sample size
The interviews were much more successful at eliciting relational and shape misconceptions. Cat’s eyes were effective in removing these misconceptions in most but not all cases. Some students still found the relational and shape concepts challenging even when provided with a tool that demonstrates relational and shape concepts. One student reflected on why a CI is such a difficult concept:
“I guess it’s hard just because there are so many things to hold in your head at the same time. There is the distribution, there is the
The cat’s eye makes explicit the normal distribution underlying the CI (
“I don’t think I could reason that [the interaction between length,
The cat’s eye was particularly helpful at reducing the relational misconception about
“The larger the CI, the less confident we are of the [results]. The smaller the CI, the more power a study would have, the more confident we are of the results.” (#11)
Here is the same student gaining insight through the cat’s eye:
“So changing the
Misconception 2.2; ‘length of a CI increases as
“A 95% CI is smaller because you are more certain that the μ lies within this area. The 50% you are less certain as to where the population mean is so you have to widen your… scope.” (#5)
This idea of equating confidence to precision is also present in our previous quote by student (#12). As mentioned, in colloquial language, a highly confident person gives a precise estimate while a person with low confidence in their guess will often give a vaguer estimate. This intuition would logically also be responsible for the misconception ‘a 50% CI means we lack data.’ A person with little information would not be very confident about any guess they made. In fact student (#5) had both of these misconceptions.
Cat’s eyes also helped students understand that the underlying distribution of a CI was a normal distribution. Fourteen (of 24) participants interviewed mentioned that they thought that a CI had an underlying uniform distribution, meaning all points inside the CI were equally likely to land on the μ:
“Basically if you can say 95% of the time [the μ] is within the CI, then that means that any of these points are equally likely along that line.” (#18)
Exposure to the cat’s eyes removed this misconception for all but three of these participants. For example the student quoted above responded:
“The confidence interval, it represents… the curve represents the likelihood that a point is the μ.” (#18)
However, not all participants were satisfied by this:
“If [μ] is there (M), we don’t know that it’s definitely there, or definitely there (points to upper limit). So you can’t say that something is less likely. It could fall anywhere in that range… I suppose I could say it’s less and less likely because that’s what the picture tells me, but it doesn’t tell me why.” (#22)
One interesting insight gained from these interviews is that although the tasks we presented to the students were understood and were valid as representations of the students’ intuitions, students’ intuitions were fuzzy, conflicting but coherent. Students were able to communicate their intuitions and even reason them out despite the intuitions sometimes conflicting with one another. As mentioned, the relationship between
Interviewer (I): “What would happen to a CI when we decrease the
Student (S) (#17): “I would say it gets smaller”
I: “What if we decreased the number of participants?”
S: “It would get bigger… because wouldn’t you be less certain?”
I: “Let’s go with what you are thinking.”
S: “Well if there are less participants then you are less certain so that means you have to make the CI itself smaller, like a smaller percentage …and then if you did that I would…Oh no! Now everything is conflicting! If it is smaller, then doesn’t that mean it’s better? Or…No wait…I don’t know!”
Student #17 had basically correct intuitions when each concept was presented separately, but became conflicted when the concepts were presented together. She understood that decreasing the sample size increases the uncertainty (by reducing the number of participants, and therefore increasing variability). She also believed that decreasing the
Interviews also provided some new and unexpected directions for future research. Student #13 provided an SLD indicating that the likelihood of capturing μ initially increased with distance from the sample mean before gradually dropping off.
Initially we thought that the student made a mistake and was trying to plot a normal distribution. The interview provided a much more interesting explanation for this SLD:
Interviewer (I): “One thing I noticed is that [point L_{2} and L_{3}] are more likely to be the μ than the sample mean, did you mean that?”
Student #13 (S): “Yes”
I: “If you don’t mind let’s go behind your reasoning…”
S: “I always overestimate.”
I: “What did you mean by ‘you always overestimate’?”
S: “I just [have to] account for variables I haven’t thought of before.”
I: “So when you don’t account for variables, what does that mean? Does it mean you’re likely to get the estimate wrong or that you don’t have faith in your sample mean?”
S: “I’m just not 100% that it’s the correct mean, that it’s most likely reflecting the μ.”
I: “How is it [the μ] more likely to be around the mean but not the sample mean?”
S: “Um…I don’t know. I just don’t take the [sample] mean as a correct reflection so I always go outside the [sample] mean.”
Student #13 seems to have correct conceptions about sampling variability. She is correct that the μ is much more likely to land within the interval between L_{1} and L_{3} than to land on exactly the sample mean. She also seems to understand why error bars are so important: there may well be variables that a researcher has not considered that make the sample mean an incorrect estimate. However, these otherwise correct intuitions overlook two normative principles. First, any interval will take up more area under a likelihood distribution than any single point. Second, researchers often assume a normal distribution when running statistical tests, and this assumption accounts for variability in the sampling data (but perhaps not for other variables that have not been considered). Overall, Student #13 showed correct conceptions about variability but did not seem to understand how variability is represented using CIs.
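The contrast between a point and an interval can be made concrete with a small sketch. It assumes the normal curve pictured by a cat’s eye, measured in standard-error units from the sample mean M; the helper names `relative_height` and `band_area` are hypothetical and serve only as illustration.

```python
from statistics import NormalDist

# Standard normal curve; distances are in standard-error units from M.
std = NormalDist()


def relative_height(z: float) -> float:
    """Height of the curve at distance z from M, relative to its height at M."""
    return std.pdf(z) / std.pdf(0.0)


def band_area(center: float, width: float) -> float:
    """Area under the curve for a band of the given width around `center`."""
    return std.cdf(center + width / 2) - std.cdf(center - width / 2)


# The curve is highest at M and falls away smoothly towards the 95% limits.
for z in (0.0, 0.5, 1.0, 1.96):
    print(f"z = {z:4.2f}: relative height = {relative_height(z):.3f}")

# Bands of equal width: the one centered on M holds the most area.
for center in (0.0, 0.5, 1.0):
    print(f"band at {center:3.1f}: area = {band_area(center, 0.5):.3f}")
```

Under this assumption, the curve at a 95% limit is roughly 0.15 times as high as at M, with no abrupt drop at the limits. A wide interval can hold more area than a narrow region around M simply because it is wider, but among bands of equal width the one centered on M holds the most.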
One limitation of the study was participant fatigue. After a 45-min interview about statistics, several participants seemed exhausted and may not have had the motivation or energy to address their misconceptions with care. Also, any improvements found in the interviews after introducing cat’s eyes may have been merely temporary rather than a stable and permanent conceptual change. Long-term follow-ups were beyond the scope of this study. A classroom intervention with follow-ups could measure this, using a shorter, more direct qualitative survey to reduce participant fatigue. Nonetheless, for many (71%) participants in our study, cat’s eyes helped with some difficult relational concepts. We also did not have a control or comparison intervention; conceivably, a simpler intervention might also improve intuitions.
To reduce investigator bias, a standard interviewing script was created and piloted several times before the first participant was interviewed. The script was identical for all participants. However, the interviewer and the participants were able to depart from the script when necessary, and it is possible that on occasion some implicit investigator bias colored the participants’ responses. To further reduce investigator bias in the coding process, all interviews were independently coded by a research assistant, yielding an adequate mean Cohen’s kappa of 0.81.
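For readers unfamiliar with the statistic, Cohen’s kappa compares the observed agreement between two coders with the agreement expected by chance from each coder’s category frequencies. The sketch below computes a single kappa for one coding category set. The function name `cohens_kappa` and the example codes are hypothetical and are not the study’s data; the mean kappa reported above would presumably average such values, but the exact procedure is not described here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same code.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for eight interview excerpts (illustration only).
coder_1 = ["uniform", "normal", "normal", "uniform", "normal", "normal", "uniform", "normal"]
coder_2 = ["uniform", "normal", "normal", "normal", "normal", "normal", "uniform", "normal"]
print(round(cohens_kappa(coder_1, coder_2), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.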
One methodological limitation involved the phrasing of the text presented to describe the 19 points in
Finally, our sample was small, and a sample of convenience. The proportion of students holding any conception does not necessarily provide a good estimate of the proportion holding that conception in the population of all graduate students. Experiment 1 gives a better estimate of the proportions of students with relational and shape misconceptions. For example, in our previous survey, 25% of students believed that as you increase a
Overall, the educational implications of using cat’s eyes are promising. We argue that they are a useful conceptual tool for students to encounter and discuss, and may be useful to keep in mind as a guide for thinking about and interpreting CIs. In this study, a very minimal intervention produced reasonably favorable results, at least in the short term. By providing a computer simulation that helped students explore and experience the relationships between length,
Interviewing participants in Study 2 triangulated the evidence from Experiment 1. Students do have varying SLD shapes, and these shapes reflect how they think about CIs. Also, the measures used to plot students’ SLDs have good construct validity. In addition, the interviews confirmed previously reported CI misconceptions and revealed new ones. Students were able to articulate their intuitions and, in some cases, justify them. The interview data suggest that cat’s eyes improve student intuitions, particularly misconceptions about likelihood distributions and relational misconceptions. Students seem to benefit from exploring their intuitions and testing whether these intuitions match the cat’s eye program. Finally, another important implication of the results is that students are able to hold several seemingly contradictory intuitions at one time, such as the presence of a normal distribution and the idea that everything inside the CI is equally likely to land on μ.
Studies 1 and 2 have provided a narrative on common misconceptions as well as additional insights into how students think about CIs. It is reasonable to assume that many researchers and clinicians also hold such misconceptions about CIs (although such claims would need to be verified). The much-advocated move to estimation (effect sizes and confidence intervals) is unlikely to return substantial benefits if CIs are routinely misinterpreted, or merely used as a substitute dichotomous decision-making criterion. It is important to provide students, researchers, and clinicians with easy-to-use, intuitive tools that can help them overcome CI misconceptions. Cat’s eyes show promise for improving SLDs and reducing relational and shape misconceptions.
This study was carried out in accordance with the recommendations of the Human Research Ethics Guidelines, University Human Ethics Committee. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the University Human Ethics Committee.
All research reported in the paper was approved by the La Trobe Human Ethics Committee: HEC Approval Number 06-171.
PK designed the study, analyzed the data, and wrote the initial draft. JL was involved with the design of the study and the drafting of the paper. GC was the principal supervisor overseeing the study, guiding the analysis and interpretation of the results. GC was also involved with drafting the paper.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.