Edited by: Laura Martignon, Ludwigsburg University, Germany

Reviewed by: Ruomeng Zhao, LinkedIn, United States; Bernd Lachmann, University of Ulm, Germany

This article was submitted to Educational Psychology, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Mathematical problem-solving and spatial visualization are areas in which performance has been shown to vary with sex. This article describes the impact of gender on spatial relations measured in 331 secondary school students (202 males, 129 females), 145 (105 males, 40 females) of whom had been selected to participate in a mathematical talent stimulation project after passing a complex problem-solving test. In the two tests administered, the

Although the importance of visualization in mathematical problem solving has been highlighted in mathematics education (

A number of studies have focused on gender differences in these two areas, suggesting possible relationships between them (

This exploration of the effect of gender and mathematical performance on the differences observed in secondary school students’ visual abilities includes a review of the literature on gender differences in the two types of skills.

Review papers and meta-analyses have identified greater mathematical problem-solving aptitudes among men (

Other factors to be considered in gender difference studies are the date on which they are conducted and the group of people participating. A meta-analysis conducted 18 years later by

Although women continue to be underrepresented in science, technology, engineering and mathematics (STEM) education and careers (

Further to a meta-analysis of differences between the sexes in mathematics covering a number of countries (

Meta-analyses have consistently reported males to be more spatially skilled than females (

Different performance factors have been identified in the effect of gender on mental rotation results, depending on the measuring instrument used and the conditions in which the tests were administered and scored. In a 3D mental rotation test measuring speed of performance as one such factor, time limits and the use of raw scores were found to benefit males (

The effect of time is associated with the strategy used to complete tests, with women being shown to be less self-assured when sitting mental rotation tests (

The literature review conducted for this article revealed wider differences between the sexes in mental rotation than other spatial exercises. No consensus was detected, however, on how such differences may be impacted by scoring criteria, i.e., by the use of absolute values or the ratio of each to the number of items answered. The review also identified the early years of secondary school as the time when gender differences appear in complex mathematical problem solving. No conclusive evidence was found of interaction between spatial skills and complex problem-solving abilities in the differences between the sexes observed, particularly among Spanish students.

With a view to contributing to this issue, the research questions posed in this study were: do gender and the ability to solve complex problems affect the differences observed in the spatial aptitudes of the participants in the current study? If so, what performance measurements reflect that effect? To this end, the results of 13- to 16-year-old Spanish students are compared in two different tests assessing spatial ability (mental rotation and visualization of an object in three dimensions from a two-dimensional model), together with factors related to performance, completion time, and the strategies used to answer the items.

A total of 331 2nd-, 3rd- and 4th-year secondary education students participated in this study. The mean age of the sample was 15 (±0.97) and the range 13 to 16. Part of the sample, 105 males and 40 females from nine provinces in Spain, had been selected to participate in ESTALMAT, a project to encourage mathematically talented students, on the basis of a math test in which the problems were divided into sections by level of difficulty. The participants did not receive any incentives. The test assessed students’ aptitude for and attitudes toward mathematical knowledge. The difference in the number of boys and girls in this group was in line with the differences between the sexes in complex problem-solving reported for youths of those ages, especially where the questionnaires combined areas such as geometry, arithmetic and logical reasoning (

‘The vertices of a triangle bear the number 1 or −1 and the product of the three is shown in the middle. If we add the four numbers: (a) What values may the sum take? What combination yields zero? (b) What would the sum be if instead of a triangle we had a square? (c) If we use a polygon with an even number of sides, can the sum be zero? Why? (d) What sort of polygons with an odd number of sides could give us zero? Why?’
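The structure of this sample item can be explored by brute force. The following is a minimal sketch, not part of the original selection test: the function name `possible_sums` is hypothetical, and the code simply enumerates every assignment of 1 or −1 to the vertices of an n-sided polygon and adds the product of the labels to their sum.

```python
from itertools import product
from math import prod


def possible_sums(n_sides):
    """Return every value that (sum of vertex labels + their product) can take.

    Each vertex of the polygon carries 1 or -1; the product of all the
    labels is the number written in the middle of the figure.
    """
    sums = set()
    for labels in product((1, -1), repeat=n_sides):
        sums.add(sum(labels) + prod(labels))
    return sums


# Triangle, square, and a few larger polygons for parts (c) and (d).
for n in (3, 4, 5, 6, 7):
    values = possible_sums(n)
    print(n, sorted(values), "zero possible:", 0 in values)
```

With an even number of sides the sum of the vertex labels is even and the product is odd, so the total is odd and can never be zero; the enumeration is only a way of checking that kind of argument on small cases.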

The 186 students (97 males and 89 females) in the other group were enrolled in 2nd, 3rd, or 4th-year secondary education in two schools, each in a different Spanish province. According to their teachers, these students (‘NCPs’) had exhibited no complex problem-solving talent.

With a view to exploring the issue in greater depth, this study analyzed the effect of gender and mathematical ability on performance in two spatial tests frequently used to diagnose spatial aptitudes in Spain.

The following instruments were used in this study:

PMA-SR measures the ability to mentally rotate two-dimensional figures quickly and accurately (

Each test item consists of a two-dimensional drawing, which subjects must match to only one of four three-dimensional figures. This test is often used to study gender differences (

Hereafter, the two aforementioned tests are referred to as PMA-SR and DAT-SR. The working hypothesis defined to explore the impact of gender differences and mathematical abilities on performance indicators was based on the earlier findings described above. The PMA-SR test was therefore deemed more appropriate to detect gender differences in spatial ability, for it measures mental rotation in a specific plane, whereas the DAT-SR test measures the ability to construct a three-dimensional object from its two-dimensional representation. The PMA-SR test might better identify gender differences in speed-related factors, given the short time afforded subjects to complete the exercise. The DAT-SR test, in turn, might furnish a more reliable measure of strategy-based self-confidence. Since there is only one correct answer to each item in DAT-SR, items left blank are a more sensitive indication of student uncertainty and therefore their level of self-confidence. More self-confident subjects would not need to analyze all the options as intensely and could consequently answer more quickly without leaving items blank.

The tests were administered according to the original recommendations on instructions and timing. The talented complex problem-solvers sat the tests during one of their ESTALMAT project sessions, routinely conducted outside class time (on Saturday mornings). The PMA-SR instructions were delivered in 5 min, after which students were allowed 5 min to complete the test. After a 30 min break, the DAT-SR test was administered, again with a 5 min explanation followed in this case by 20 min to do the exercise. The same procedure was deployed with the control group students, who participated during normal classroom time.

As students were given no prior information about the scoring procedure, they did not know that the total score in PMA-SR was found as the difference between the number of correct and incorrect answers and in DAT-SR as the number of correct responses. They were, however, told that the number of correct choices per item in PMA-SR was indeterminate and that there was only one per item in DAT-SR.

All the subjects gave their consent to participate voluntarily in the study, which complied with the guidelines on human subjects issued by the Bioethics Committees of both UNED and the University of Granada.

A 2 × 2, bi-factorial intergroup design was used, in which Gender (categories: male and female) and Ability (categories: CP, talented complex problem-solvers; and NCP, no complex problem-solving talent) were the independent variables. The dependent variables were performance, speed and confidence, measured in terms of the following indicators.

Number of correct items (A1): in PMA-SR an item was deemed correct only if, of the six options given, all the actual rotations and no others were chosen. In DAT-SR an item was deemed correctly answered if the single correct option was chosen.

Number of incorrect items (A2): in PMA-SR an item was deemed incorrect if any actual rotation was not chosen, or any non-rotations were. In DAT-SR, items were deemed incorrect when the wrong option was chosen.

Number of items attempted (B1): the number of items attempted was the number answered: B1 = A1 + A2.

Number of blank items (C1): blank items were all the ones where students chose none of the options. In PMA-SR, B1 + C1 = 20 and in DAT-SR, B1 + C1 = 50.

Test score (A3): in PMA-SR the score was found by subtracting the number of incorrect from the number of correct items. In DAT-SR the score was the number of correctly answered items.

Last item answered (B2): as the items were sorted correlatively, the value was the item answered that was numbered highest.

Number of omissions (C2): the number of omissions was the number of items left blank prior to the last item answered. For PMA-SR, C2 + (20-B2) = C1 and for DAT-SR C2 + (50-B2) = C1.

Performance is measured by the A3 indicator, which in DAT-SR coincides with A1, whereas in PMA-SR its calculation also involves A2. B1 and B2 are speed indicators. C1 and C2 are used to measure confidence, as together they distinguish whether an item was left blank because of doubt about the correct answer or for lack of time to answer it. The ratios of the number of correct answers and the number of items omitted to the number of items answered were used to infer the effectiveness of the strategy deployed (

Number of correct answers/number of items answered (AR1).

Number of items omitted/number of items answered (CR2).
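The item-level indicators and the two ratios defined above can be made concrete with a short sketch. It assumes a hypothetical encoding of one student's protocol as a list with one entry per item in test order (`True` for a correct answer, `False` for an incorrect one, `None` for a blank); the function name `indicators` and the example protocol are illustrative and are not taken from the study's data. The test score A3 is left out because its calculation differs between the two tests.

```python
def indicators(responses):
    """Compute the item-level indicators for one protocol.

    responses: list ordered by item number, with True (correct),
    False (incorrect) or None (left blank) for each item.
    """
    n_items = len(responses)
    a1 = sum(1 for r in responses if r is True)    # correct items
    a2 = sum(1 for r in responses if r is False)   # incorrect items
    b1 = a1 + a2                                   # items attempted
    c1 = n_items - b1                              # blank items
    # Last item answered: highest item number with any answer (0 if none).
    b2 = max((i + 1 for i, r in enumerate(responses) if r is not None), default=0)
    # Omissions: blanks that occur before the last item answered.
    c2 = sum(1 for r in responses[:b2] if r is None)
    ar1 = a1 / b1 if b1 else None                  # correct / answered
    cr2 = c2 / b1 if b1 else None                  # omitted / answered
    return {"A1": a1, "A2": a2, "B1": b1, "C1": c1,
            "B2": b2, "C2": c2, "AR1": ar1, "CR2": cr2}


# Hypothetical 20-item protocol (the length of PMA-SR): one item skipped
# mid-test and the last six items not reached.
example = [True] * 8 + [None] + [False] * 2 + [True] * 3 + [None] * 6
result = indicators(example)
print(result)

# The identities stated in the definitions hold by construction.
assert result["B1"] + result["C1"] == len(example)
assert result["C2"] + (len(example) - result["B2"]) == result["C1"]
```

In this example the single blank before item 14 counts as an omission (C2), while the six unreached items at the end count only toward C1, which is the distinction the confidence indicators rely on.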

To perform the statistical analyses, subjects whose protocols were incomplete or contained errors were removed from the dataset. First, the mean and standard deviation of the different scores were calculated (see

Mean, standard deviation, and

PMA-SR

Indicator | CP males M | CP males SD | CP females M | CP females SD | NCP males M | NCP males SD | NCP females M | NCP females SD | F Ability | F Gender | F Ability × Gender
Right (A1) | 11.40 | 3.85 | 10.83 | 3.67 | 7.73 | 4.52 | 5.95 | 4.01 | 77.60** | 5.86* | 1.54
Wrong (A2) | 2.22 | 1.91 | 1.88 | 1.56 | 4.61 | 4.22 | 4.88 | 4.02 | 46.60** | 0.012 | 0.594
Score (A3) | 32.26 | 10.92 | 30.55 | 10.06 | 23.40 | 12.57 | 18.49 | 11.63 | 58.41** | 5.84* | 1.36
Attempted (B1) | 13.62 | 3.83 | 12.70 | 3.66 | 12.34 | 3.94 | 10.82 | 3.57 | 12.29** | 7.36** | 0.436
Last item (B2) | 13.96 | 3.85 | 13.53 | 3.95 | 12.83 | 4.07 | 11.34 | 3.86 | 12.55** | 4.26* | 1.28
Blank (C1) | 6.38 | 3.83 | 7.30 | 3.66 | 7.66 | 3.94 | 9.17 | 3.57 | 12.29** | 7.36** | 0.436
Omitted (C2) | 0.34 | 1.06 | 0.83 | 2.37 | 0.49 | 1.94 | 0.51 | 1.69 | 0.163 | 1.56 | 1.30

DAT-SR

Indicator | CP males M | CP males SD | CP females M | CP females SD | NCP males M | NCP males SD | NCP females M | NCP females SD | F Ability | F Gender | F Ability × Gender
Right (A1) | 43.98 | 7.73 | 44.38 | 7.92 | 32.39 | 11.16 | 30.77 | 9.66 | 127.47** | 0.299 | 0.811
Wrong (A2) | 4.33 | 5.84 | 2.55 | 2.55 | 13.73 | 10.72 | 14.11 | 8.87 | 116.95** | 0.515 | 1.24
Score (A3) | 43.98 | 7.73 | 44.38 | 7.92 | 32.39 | 11.15 | 30.77 | 9.61 | 127.47** | 0.299 | 0.811
Attempted (B1) | 48.31 | 4.85 | 46.92 | 7.06 | 46.11 | 6.34 | 44.88 | 6.78 | 8.42** | 3.20 | 0.011
Last item (B2) | 48.57 | 4.37 | 47.95 | 5.04 | 46.40 | 6.16 | 45.85 | 6.70 | 10.04** | 0.744 | 0.003
Blank (C1) | 1.69 | 4.85 | 3.08 | 7.06 | 3.89 | 6.34 | 5.11 | 6.78 | 8.42** | 3.20 | 0.011
Omitted (C2) | 0.26 | 1.00 | 1.03 | 5.21 | 0.28 | 0.91 | 0.97 | 2.44 | 0.005 | 6.85** | 0.021

Mean, standard deviations, and

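PMA-SR

Ratio | CP males M | CP males SD | CP females M | CP females SD | NCP males M | NCP males SD | NCP females M | NCP females SD | F Ability | F Gender | F Ability × Gender
Right (AR1) | 0.83 | 0.13 | 0.84 | 0.14 | 0.62 | 0.29 | 0.54 | 0.30 | 78.61** | 1.50 | 0.250
Omitted (CR2) | 0.03 | 0.09 | 0.09 | 0.30 | 0.07 | 0.41 | 0.05 | 0.19 | 0.010 | 0.456 | 1.25

DAT-SR

Ratio | CP males M | CP males SD | CP females M | CP females SD | NCP males M | NCP males SD | NCP females M | NCP females SD | F Ability | F Gender | F Ability × Gender
Right (AR1) | 0.91 | 0.12 | 0.93 | 0.07 | 0.70 | 0.22 | 0.68 | 0.18 | 128.24** | 0.129 | 1.46
Omitted (CR2) | 0.007 | 0.04 | 0.05 | 0.31 | 0.007 | 0.02 | 0.02 | 0.06 | 1.16 | 5.11* | 1.11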

CPs scored significantly higher than NCPs in all the performance indicators in both tests: more correct answers (A1) [

Gender had a significant effect on two of the performance indicators in PMA-SR, with males answering more items correctly (A1) [

The CPs scored consistently higher in the speed indicators than the NCPs: more items attempted (B1) [

In the PMA-SR test male subjects earned higher speed indicator scores, answered more items (B1) [

Problem-solving capacity had no significant effect on the number of items omitted (C2) in either test, although talented complex problem-solvers left significantly fewer items blank (C1) [

Although no differences were observed between the sexes in the total number of items left blank in the DAT-SR test, significant differences were recorded in the number omitted (C2) [

The gender differences in the number of speed-related blank items found in PMA-SR were not observed in connection with omissions. In this test the mean number of omissions was less than half an item, an indication that subjects only exceptionally failed to answer due to uncertainty. As in the other indicators, no inter-variable interaction was observed in omissions.

CPs exhibited significantly higher AR1 scores than NCPs in both tests, denoting a higher percentage of correct answers and fewer errors [

Males’ statistically significant higher absolute performance in terms of number of correct answers, scores and number of items answered in the PMA-SR test was absent in the AR1 findings. In other words, the differences between the sexes in the fraction of correct answers relative to the number of items answered were not significant.
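The arithmetic behind this pattern can be sketched with the reported group means for the CP students in PMA-SR. The snippet below is only an illustration: the dictionary layout and the function name `implied_correct` are hypothetical, and multiplying a mean ratio by a mean count is an approximation, since the mean of per-student ratios is not the ratio of the means.

```python
# Reported PMA-SR group means for the CP students:
# A1 = correct items, B1 = items attempted, AR1 = correct / answered.
cp_pma = {
    "males": {"A1": 11.40, "B1": 13.62, "AR1": 0.83},
    "females": {"A1": 10.83, "B1": 12.70, "AR1": 0.84},
}


def implied_correct(group):
    """Approximate correct items as success rate times items attempted."""
    return group["AR1"] * group["B1"]


for sex, means in cp_pma.items():
    print(sex,
          "implied correct:", round(implied_correct(means), 2),
          "reported correct:", means["A1"])
```

Under this approximation the males' higher number of correct answers follows from attempting more items at a success rate that is no higher than the females', which is the pattern described in the text.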

In DAT-SR, as with the absolute values, which showed no differences in performance by sex, the AR1 ratio revealed no significant difference between males’ and females’ likelihood of responding correctly to the items answered. In contrast, a significantly higher ratio of items omitted to items answered was observed for females (CR2) [

This study used two spatial tests, PMA-SR and DAT-SR, to analyze the effect of gender and the ability to solve complex mathematical problems on performance. Gender (male/female) and mathematical ability (complex problem solvers/non-solvers) were the independent variables, while the performance indicators were score, number of correct and incorrect answers, number of items attempted, number left blank, number omitted and the last item answered, along with the ratios of the number of correct answers and the number of omissions to the total number of items answered. The study’s four major contributions to the effect of gender and mathematical talent on spatial aptitudes are highlighted below.

CP students performed better and faster than NCPs in both tests administered here. The former were found to score significantly better than the latter in both tests: making fewer mistakes, leaving fewer items blank, answering more items, and exhibiting a higher success rate per item answered. The present findings therefore corroborate the positive relationship between mathematical talent and visual ability reported earlier (

Although gender differences have been frequently and separately reported in studies of mathematical performance and visual skills, no interaction was observed in any of the indicators analyzed here. When explored together, the effect of one variable on the other was not determinant and the differences in mathematical ability were unrelated to the gender differences found in the tests. Nor did gender determine the differences observed in mathematical ability. Unlike other studies, the research conducted here was unable to confirm that differences between the sexes revealed by spatial tests concur with differences in complex problem-solving abilities (

The inference drawn from the data, according to which none of the indicators denoted gender differences in both tests, is that the differences between the sexes in the performance factors were related to characteristics specific to each test. In other words, this study failed to find males more visually skilled, faster or more confident, for the differences in men’s and women’s scores were not observed consistently across the instruments and assessment criteria applied (

In this study, the performance differences observed in the PMA-SR test were speed-related, with males answering more items and completing more of the test, although at a success rate no higher than the females’ in any of the items. In this test, boys implemented a better strategy because it was faster, whereas they did not outperform the females in terms of success per item or number of omissions. Therefore, the strategy of answering more items per unit of time yields more correct responses per unit of time, as reported by other authors for mental rotation tests (

No differences between the sexes were observed in the speed or effectiveness indicators for DAT-SR. Differences were observed in that test with respect to omissions, with females more willing to leave an item blank when they were unsure of the answer. That finding was not consistent with results reported for an abridged version of the DAT-SR test, which revealed significant gender differences in the number of correct answers and items answered, but not in the absolute number of omissions or the ratio of omissions to the items answered (

Gender-related differences in the strategy implemented varied depending on the test. In PMA-SR boys deployed faster strategies, whereas in DAT-SR girls proved more reluctant to guess.

The findings for the CP group were the same whether expressed as the absolute value of the variables or the value relative to the number of items attempted. The absolute DAT test results were likewise unchanged in any of the indicators when ratioed to the number of items attempted. In PMA-SR in contrast, the differences observed between the sexes in the absolute number of correct answers were absent when expressed as a fraction of the number of items answered, as observed by earlier authors (

Two limitations of this study are the sample size and the smaller proportion of women. As regards the sample, the results obtained are specific to the Spanish students who participated in the study, with the ability to solve complex mathematical problems used as an indicator of mathematical ability and the results of the PMA-SR and DAT-SR tests as an indicator of spatial ability. Any further generalization of the findings on gender differences in mathematical performance and visualization should take this limitation into account, as well as the heterogeneity of students with mathematical talent (

The inequalities between the CP and control groups were consistent with previous reports (

Although some of the test scores attest to differences between the sexes, an analysis of the cognitive aspects associated with such differences is believed to be in order. Despite the dependence of the reluctance to guess on personality factors, the parameter of greatest relevance may be the time invested in mentally rotating objects rather than the speed in answering or the decision to answer an item.

The datasets generated for this study are available on request to the corresponding author.

The studies involving human participants were reviewed and approved by the Bioethics Committee of the University of Granada. Written informed consent to participate in this study was provided by the participants’ legal guardian/next of kin.

IR-U performed the overall planning and design of the study, the bibliographical revision, and the methodology and statistical analyses in the study. RR-U performed the assessment of the subjects, wrote the theoretical background, and derived the conclusions according to the results in the study.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.