An Analysis on Students’ Thinking Complexity through Their Writing in a General Education Course

To understand and evaluate students’ thinking habits and abilities, as well as to assess the effectiveness of the course, a Narrative Qualitative Analysis (NQA) was conducted on students’ writing assignments submitted to a compulsory General Education course, “In Dialogue with Nature”, at the Chinese University of Hong Kong. Based on the Wolcott-Lynch (WL) model, students’ written assignments in this course were analyzed, from which their overall thinking patterns and characteristics were studied. It was found that around 80% of the students are clustered within the lowest two thinking performance patterns of the WL model, called Confused Fact-Finder and Biased Jumper. Moreover, the students' representative thinking components were extracted to reveal their general thinking habits. Although the overall thinking performance patterns of the students generally stay the same within one term, improvement can be observed by analyzing individual thinking components.


Introduction
Since its full launch in 2012, the General Education Foundation (GEF) Programme, consisting of two courses, "In Dialogue with Humanity" and "In Dialogue with Nature", has become an important part of the core curriculum for all undergraduate students at the Chinese University of Hong Kong (CUHK). The two GEF courses aim to improve students' thinking capabilities and enhance their academic preparedness for university study and life. Both courses emphasize classics reading and seminar discussion, engaging students in dialogues with and on those classics which speak to the enduring questions of humanity and have profoundly influenced human society. "In Dialogue with Humanity" guides students to examine ideas about the good life and the good society; in the course, students encounter excerpts from the Bible, the Qur'an, Plato's Symposium, the Analects of Confucius, Rousseau's Social Contract, Marx's Economic and Philosophic Manuscripts of 1844 and more. "In Dialogue with Nature," on the other hand, voyages through human knowledge that attempts to understand and reflect on our position in nature. In this course, students read Plato's Republic, Darwin's On the Origin of Species, Carson's Silent Spring and James Watson's DNA: The Secret of Life. The complete reading lists of the two courses can be found in Appendix A.
Besides reading, students participate in weekly tutorial discussions to share and construct their understanding of these classics. In the middle and at the end of the term, they are required to submit two to three written assignments to organize, consolidate, and reflect on their understanding. With such course and assessment designs, the GEF Programme aims to help students to develop active attitudes and critical thinking skills essential for independent learning.
To monitor and evaluate the effectiveness of the GEF Programme, both quantitative and qualitative approaches have been centrally administered to measure the reception of the two courses since the 2012 launch. This has been done through a termly Course and Teaching Evaluation (CTE) questionnaire and yearly focus group interviews, respectively. The results show that both courses are generally well received by students and other stakeholders, from the university to society at large. Beyond these measures, the GEF Programme and individual course teachers hope to find other ways to understand their students and further inform their teaching, which motivates the study discussed in this paper.

Narrative Qualitative Approach and Thinking Complexity
The study stems from the Qualitative Narrative Assessment Project organized by the Association for Core Texts and Courses (ACTC); it led to the formation of the Narrative Qualitative Analysis (NQA) team in the GEF Programme at CUHK and the NQA Project from 2014 to 2017. The narrative qualitative approach provides an attractive alternative to supplement the above-mentioned methods of course assessment. First, the narrative qualitative approach differs from both CTE and focus group interviews, which depend on students' subjective recollection of their learning experience; instead, it aims to identify objective evidence of student learning by examining their written assignments. Second, the narrative approach enables the course assessment to go beyond a mere set of numbers obtained from CTE and to provide systematic and concrete descriptions of student learning. The 2014-2017 NQA project employs the narrative qualitative approach and puts the research focus on students' thinking complexity, a shared intended learning outcome of both GEF courses.
In this paper, thinking complexity refers to the so-called reflective thinking discussed by John Dewey in the 1930s (Dewey, 1933, 1938; King & Kitchener, 1994). According to Dewey, people start to think truly reflectively only when dealing with open-ended problems, which contain controversy and uncertainty where no answer can be provided by formal logic. For instance, calculating the trajectory of a Newtonian motion may require acute thinking skills, but it has nothing to do with reflective thinking; yet, in the two GEF courses, the questions under discussion often fall into the category of open-ended ones. For example, the students are also required to understand Newtonian Mechanics, but the objective is no longer to complete a calculation, but to evaluate its importance.
In the 1990s, King and Kitchener developed a new theoretical model, the Reflective Judgement Model (RJM), to describe the developmental progression of reflective thinking from childhood to adulthood. Inspired by the developmental stage theories of Piaget (1960, 1970, [1956] 1974; King & Kitchener, 1994) and the skill theory model of Fischer (1980; King & Kitchener, 1994), King and Kitchener proposed seven developmental stages in RJM based on an interrelated network of assumptions about the nature of knowledge and approaches to justification as people reason about open-ended problems (King & Kitchener, 1994).
Combining the new theoretical model of RJM and Fischer's later developed dynamic skill theory (Fischer & Bidell, 1998), Wolcott and Lynch (2001; Wolcott, 2006) proposed a model that can be applied in educational practices to boost and evaluate students' performance. The Wolcott-Lynch Model forms the theoretical foundation for the NQA study discussed in this paper and is introduced briefly in the following section.

The Wolcott-Lynch Model
The Wolcott-Lynch Model compressed the seven developmental stages in RJM into five consecutive steps. Each step shows a qualitative difference characterized by the corresponding thinking skill, namely, knowing, identifying, exploring, prioritizing, and envisioning from Step 0 to Step 4, respectively (Wolcott & Lynch, 2001; Wolcott, 2006). As shown in Figure 1, at Step 0, students already have foundation knowledge and skills to solve problems, including repeating and paraphrasing information from textbooks, conducting computations, etc. However, at this step, they do not acknowledge or even perceive the uncertainties underlying open-ended problems and dichotomously consider knowledge as only "right" or "wrong". Regarding students' approach to justification, they either depend on unexamined personal opinions or heavily rely on experts to provide the "correct" answer. Basically, Step 0 skills show no reflective thinking at all, corresponding to the pre-reflective thinking in RJM (King & Kitchener, 1994).
Steps 1 and 2 are quasi-reflective according to RJM (King & Kitchener, 1994), but they differ in their approaches to justification. At Step 1, individuals not only can identify relevant information related to the open-ended problems but also realize such problems cannot be solved with certainty. They accept that individuals may have different opinions on open-ended problems due to enduring uncertainties; experts, too, disagree with each other because of their own biases. However, individuals with Step 1 skills cannot differentiate their personal opinions from experts' examined views and cannot differentiate among experts' various views either. When asked to justify their viewpoints, they tend to choose and stack up supporting evidence and ignore contrary information. When moving to Step 2 skills, individuals gradually realize the inter-coordination between knowledge and justification, namely, to "relate evidence and arguments to knowing" (King & Kitchener, 1994, p. 63). "Quality of evidence, and not just quantity, is important in Step 2" (Wolcott, 2006, p. 1-8). They value the process of exploring, going beyond personal perspectives to examine different solutions to an open-ended problem and analyze the underlying assumptions, leading to a well-balanced analysis of the open-ended problem. However, at this step, individuals still find it difficult, and are reluctant, to make decisions among different options.
Further up, Steps 3 and 4 are truly reflective in RJM (King & Kitchener, 1994). At Step 3, individuals begin to generate overarching guidelines to prioritize among different perspectives of the open-ended problem and to evaluate different options carefully and as objectively as possible. The ability to prioritize enables individuals to make well-founded decisions and put them into practice. Compared with Step 4, however, individuals at Step 3 still lack the ability to take the "long view" in dealing with open-ended problems, namely, the ability to constantly refine their approaches under new situations. This higher level of thinking ability, reached at Step 4, allows individuals to consider open-ended problems as ongoing inquiries and stay alert to the limitations of current solutions. When new information becomes available, they may be able to envision innovative strategies beyond current approaches.
As explained in the previous paragraphs, the stepwise progression of thinking complexity in the Wolcott-Lynch Model reveals the influence of stage theories in the developmental psychology of education. Indeed, Wolcott and Lynch believe that these Steps are developed consecutively, in the sense that less complex lower-step skills are necessary precursors to more complex upper-step skills (Wolcott, 2006). Better performance in lower-step skills can promote the appearance and development of higher-step skills. Nevertheless, in practice, when dealing with open-ended problems, students do not perform at a single static step as in the ideal model. Instead, according to Fischer's dynamic skill theory, a person performs in a developmental range covering multiple steps. This is represented in the Wolcott-Lynch Model by the ladder used by the person to move between different steps. Based on a person's performances in different steps, the Wolcott-Lynch conceptual model evolves into the Wolcott-Lynch Thinking Performance Patterns.

Wolcott-Lynch Thinking Performance Patterns
Based on a person's performance in the different thinking steps of the Wolcott-Lynch Model, he/she can be categorized into one of five Thinking Performance Patterns, which are, from less to more complex, "Confused Fact-Finder", "Biased Jumper", "Perpetual Analyzer", "Pragmatic Performer" and "Strategic Re-Visioner". For example, given the hierarchical progression from lower to higher steps, if a person performs poorly on foundation knowledge and skills, he/she cannot perform well on the more complex steps. In this case, he/she belongs to Performance Pattern 0, "Confused Fact-Finder". Another person with improved thinking complexity may already master the skills in Steps 1 and 2 but still be weak in the more complex skills in Steps 3 and 4. Then, he/she can be categorized as Performance Pattern 2, "Perpetual Analyzer". "Strategic Re-Visioner" is the most complex Performance Pattern, meaning that an individual in this pattern can perform well in all four steps. Each Thinking Performance Pattern demonstrates a distinct overall approach to addressing open-ended problems and has its strengths and weaknesses, as shown and described in Table 1.

Purpose and Objectives
The Wolcott-Lynch Model and Thinking Performance Patterns lay the theoretical foundation for the NQA study in this paper, which can be divided into two phases, as summarized in Table 2. Phase I is the NQA project conducted from 2014 to 2017. In this phase, the NQA team explored the Wolcott-Lynch Model and applied the Thinking Performance Patterns to evaluate students' written assignments and understand their overall thinking complexities. Based on the findings from the NQA project, members of the NQA team have carried out various extended studies in Phase II. One extension is introducing students' self-evaluation of their thinking complexity and comparing it with teachers' evaluation. Furthermore, in addition to students' overall Thinking Performance Patterns, micro-scale developments indicated by individual thinking components were also analyzed in the extended study. The framework of the two phases of study and their main results can be found in our recent paper (Wu et al., 2022). More detailed discussions on Phase I and some extended studies can also be found in the final report submitted to ACTC and in the paper presented at the CUHK EXPO 2017 (Chat et al., December 2017). The purpose of the current paper is to present the results of a specific extended study, focusing on the data collected in 2016-2017, Term 1 and the analyses carried out afterwards. In this study, students' self-evaluation was first introduced, and micro-scale analysis of individual thinking components was conducted systematically. This study hopes to shed light on the following questions: 1. How do students in the GEF courses understand and evaluate their thinking complexity? 2. Are the GEF courses able to improve or trigger any change in students' thinking complexity within one term?

Method
In order to answer the above research questions, a systematic study was conducted in the academic year 2016-2017, Term 1, in the GEF course "In Dialogue with Nature". It comprises both students' self-evaluation and the teacher's evaluation, as summarized in Table 3. After collecting students' first written assignment, called Reflective Journal, the teacher analyzed each student's writing based on the Thinking Performance Patterns. After students submitted their final Term Papers, the teacher carried out the same analysis again.
In the first class, the course teacher invited students to self-evaluate their thinking complexity. The teacher first introduced the Wolcott-Lynch Model and the Thinking Performance Patterns. Each student was given a sheet of paper with Table 1, which explains the overall approaches, strengths and weaknesses of each Thinking Performance Pattern. Then, the students were invited, on a voluntary basis, to evaluate their own Thinking Performance Patterns using Table 1. The self-evaluation results were sent to the teacher anonymously. In total, 75 student self-evaluations were collected in this way.
The teacher's evaluation was based on the two written assignments submitted in the middle and at the end of the term. In the middle of the term, the teacher released several open-ended problems related to the course to the students. They chose one topic and wrote a short essay, called Reflective Journal, of 600-800 words in English or 900-1500 words in Chinese. Similarly, at the end of the term, they had to submit a longer final Term Paper of 1300-1500 words in English or 1900-2500 words in Chinese to address another open-ended problem related to the course. For both Reflective Journal and Term Paper, the teacher used the same Table 1 to evaluate students' thinking complexity. For each submitted written assignment, the teacher assigned an overall Thinking Performance Pattern and highlighted the corresponding strengths and weaknesses based on the student's performance demonstrated in the writing. Samples of the assessment can be found in Appendix B. In practice, the teacher sometimes found it difficult to label the student within a single Thinking Performance Pattern; instead, the student's performance lay between two consecutive Thinking Performance Patterns. In this case, the teacher would label the student in the middle of the two Thinking Performance Patterns. One sample of the intermediate case can also be found in Table B2 of Appendix B.
In the end, a total of 95 valid teacher evaluation data were collected, meaning that 95 students were assessed twice on both their Reflective Journals and Term Papers to trace their performance change within the term. Further analyses were carried out. The results are discussed in the following.

Results and Discussion
With the 75 student self-evaluation and 95 teacher evaluation data, further analyses were conducted to understand students' overall thinking performance patterns, analyze students' performances on micro-level thinking components, and trace their performance change within the term. This section will discuss the results of these studies.

Teacher's Evaluation
The result of the teacher's evaluations of students' overall Thinking Performance Patterns is shown in Figure 2. As mentioned previously, the teacher might label the overall performance of a student's written assignment with a single Thinking Performance Pattern or with an intermediate state between two consecutive Thinking Performance Patterns, e.g., Confused Fact-Finder/Biased Jumper (see Table B2 in Appendix B). According to Figure 2, the teacher's evaluations of Reflective Journal and Term Paper yield similar distributions of students' overall Thinking Performance Patterns, suggesting that their overall thinking complexity has no pronounced change within one term. To confirm the above observation, the change of every student's Thinking Performance Pattern from Reflective Journal to Term Paper was also traced, as depicted in Figure 3. Every student was assessed twice by the same teacher, once in the middle and once at the end of the term. To calculate the change, each Thinking Performance Pattern was assigned a number: 0 to Confused Fact-Finder, 1 to Biased Jumper, 2 to Perpetual Analyzer, 3 to Pragmatic Performer and 4 to Strategic Re-Visioner. An intermediate state was assigned an extra 0.5; for example, if the student's assignment was assigned an overall Thinking Performance Pattern of Confused Fact-Finder/Biased Jumper, it would get a value of 0.5. Similarly, Biased Jumper/Perpetual Analyzer would get 1.5. The change of a student's overall Thinking Performance Pattern was thus defined as the value obtained at Term Paper minus the value obtained at Reflective Journal. For example, if a student got 1 for Reflective Journal and 1.5 for Term Paper, the change would be 1.5 - 1 = 0.5. A positive change means an improvement from the middle to the end of the term, whereas a negative change corresponds to a regression.
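The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in the study: the names `PATTERN_VALUES`, `pattern_value` and `pattern_change` are hypothetical, and the assumption that an intermediate label is written with a slash between two consecutive pattern names follows the "Confused Fact-Finder/Biased Jumper" notation in the text.

```python
# Numeric values for the five Thinking Performance Patterns, as given in the text.
PATTERN_VALUES = {
    "Confused Fact-Finder": 0,
    "Biased Jumper": 1,
    "Perpetual Analyzer": 2,
    "Pragmatic Performer": 3,
    "Strategic Re-Visioner": 4,
}


def pattern_value(label: str) -> float:
    """Convert a single or intermediate pattern label into its numeric value.

    Assumption: an intermediate state is written as two consecutive
    pattern names joined by "/", and counts as the lower value plus 0.5.
    """
    values = [PATTERN_VALUES[part.strip()] for part in label.split("/")]
    if len(values) == 1:
        return float(values[0])
    if len(values) == 2 and abs(values[0] - values[1]) == 1:
        return min(values) + 0.5
    raise ValueError(f"Not a single or consecutive-pair label: {label!r}")


def pattern_change(journal_label: str, paper_label: str) -> float:
    """Change = value at Term Paper minus value at Reflective Journal."""
    return pattern_value(paper_label) - pattern_value(journal_label)


# The worked example from the text: 1 at Reflective Journal, 1.5 at Term Paper.
example_change = pattern_change("Biased Jumper", "Biased Jumper/Perpetual Analyzer")
print(example_change)  # 0.5
```

Under this encoding, a value between -0.5 and 0.5 corresponds to a student whose overall pattern moved by at most half a step between the two assignments.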
Figure 3 shows the change of the students' overall Thinking Performance Patterns from Reflective Journal to Term Paper. For around 75% of the students, the overall performance stayed almost unchanged, namely, the change was marginal or within one step, with the values of change lying between -0.5 and 0.5. Therefore, we may conclude that students' overall Thinking Performance Patterns are stable within one term, and no substantial change, such as a stepwise improvement, usually happens in such a short period of time. The result may sound frustrating for educators who aim to improve students' thinking capabilities, but it is, in fact, consistent with other related studies. For instance, King and Kitchener's 10-year longitudinal study shows that cognitive development takes a long time to advance to a higher level. On average, it takes 3 to 4 years to move to the next cognitive stage (King & Kitchener, 1994).

[Figure: axes "Number of Students" and "Percentage"; legend "Standard Normal Distribution"]
[...] more complex ones; it explains why some students' overall Thinking Performance Patterns lie between two consecutive patterns, which suggests that the student might be on the way to developing a more complex pattern but has not yet stably mastered the more advanced thinking skills. Therefore, in these cases, the students should still be categorized into the less complex Thinking Performance Patterns. For example, the teacher might assign an overall Thinking Performance Pattern of Biased Jumper/Perpetual Analyzer to a student's Reflective Journal. In order to compare the result with other studies, this written assignment is counted as Biased Jumper in the comparison. Table 4 shows the distributions of students' overall Thinking Performance Patterns, combining both students' self-evaluation and the teacher's evaluation of their two written assignments, together with the results of the 2014-2017 NQA project. [...] and comparable with other research on college students in the U.S. (King & Kitchener, 1994; Lynch & Wolcott, 2001; Wolcott, 2006). Furthermore, as in the NQA project, more than 20% of the students performed like Confused Fact-Finder, and more than 50% performed like Biased Jumper. Added together, around 80% of the students are clustered in the two lowest Thinking Performance Patterns, again comparable with other studies (Wolcott, 2006).
Interestingly, a distinct gap could be observed when comparing the mean values from the students' self-evaluation with those from the teacher's evaluation. The mean value from the student side is 1.73, much higher than that from the teacher's evaluation and close to Perpetual Analyzer, suggesting that students, on average, believe they are performing almost like Perpetual Analyzer. This nearly one-step difference between the students' self-evaluation and the teacher's analysis enables educators to visualize the different expectations of the student and the teacher, a scenario that often arises in the classroom. On the one hand, it suggests that one must be careful about relying solely on students' self-reflection, e.g., CTE, to assess the effectiveness of a course; on the other hand, the difference also indicates a region that could inform classroom teaching and deserves further study. Part of our continuing research follows this direction.

Analysis on Individual Thinking Components
Other than overall Thinking Performance Patterns, which show no significant change within the term, individual thinking components of each Thinking Performance Pattern were also analyzed to obtain detailed characteristics of students' thinking and demonstrate the micro-scale development within the term.

Individual Thinking Components: Improvements and Weaknesses
In the discussion, the term "individual thinking components" refers to the items listed in the "Common Weaknesses" and "Major Improvements" of each Thinking Performance Pattern in Table 1. A complete list of these individual thinking components is given in Table 5. For clarity, individual thinking components in the same Thinking Performance Pattern have the same color, and each thinking component is assigned a code for later discussion. As explained in the Method section, for every assessed written assignment, the teacher not only assigned an overall Thinking Performance Pattern to it but also highlighted the corresponding thinking components that were demonstrated in the writing. Further analyses were conducted based on the 95 teacher evaluation data. Figure 4 and Figure 5 show the counts and percentages of each individual thinking component (represented by the code) in Reflective Journal and Term Paper, corresponding to the improvements and the weaknesses, respectively. Each count means one highlight of the corresponding thinking component, and the percentage is defined as the total counts of the component over the total number of students, namely, the occurring frequency of the corresponding component. Results from Reflective Journal and Term Paper are shown separately to trace the change from the middle to the end of the term.

A Narrative Portrait of Students' Thinking Complexity
Beyond a series of numbers, results depicted in Figure 4 and Figure 5 can be illustrated in word clouds to generate qualitative descriptions of students' strengths and weaknesses demonstrated in their writing. The word clouds are shown in Figure 6. The phrases in the word clouds are the short descriptions of the individual thinking components as listed in Table 5. Sizes of the words are proportional to the occurring frequencies (or the total counts), calculated as the average from both written assignments. From the word clouds, a general impression of students' thinking performance in their writing can be obtained. Most students could acknowledge the multiple perspectives lying behind open-ended problems and use evidence logically. Furthermore, some of them could identify the assumptions and biases behind different perspectives, and even attempted to control their own biases to give coherent and balanced descriptions. However, on the other hand, they were generally weak in justification. When making arguments, they tended to stack up evidence and jump to conclusions. Sometimes, they failed to break down the problems into small questions and had difficulties separating opinions from evidence and might insist on their own opinions.
With the word clouds in Figure 6, the NQA analysis starts to yield qualitative descriptions beyond merely simple labels of overall Thinking Performance Patterns. However, it is also obvious from Figure 4 and Figure 5 that students' performances vary not only between different patterns but also within each pattern. To draw an accurate and representative description of students' average thinking complexity and habits, further analysis can be conducted where the thinking components are classified into different Tiers based on their frequencies of occurrence.
The situation in the improvement thinking components is clear and straightforward. Individual thinking components in each Thinking Performance Pattern appear with similar frequencies. For example, in both Reflective Journal and Term Paper analyses, thinking components of Biased Jumper appear most frequently. These thinking components are considered Tier 1. The improvement thinking components from Perpetual Analyzer come next, and they are Tier 2 components. One complication comes from "2iii Control own biases", which appears much less frequently in Reflective Journal; yet, when combined with the counts in Term Paper, its average is still much higher than those of the components in Pragmatic Performer. Therefore, it is still considered a Tier 2 component. Following similar arguments, improvement thinking components in Pragmatic Performer and Strategic Re-Visioner belong to Tier 3 and Tier 4, respectively.
The situation is more complicated regarding the thinking components of weaknesses. As with the improvements, thinking components of Biased Jumper, except for "1f View experts as being opinionated", appear most frequently and belong to Tier 1. Things are different for Tier 2. In this case, the next most frequent thinking components of weaknesses are not from Perpetual Analyzer but from Confused Fact-Finder. However, not all thinking components in this pattern behave equivalently. Most counts are clustered on the three thinking components "0f Apply evidence inappropriately", "0g Inappropriately cite textbook and 'facts'", and "0h Draw conclusion intuitively". Therefore, these three components from Confused Fact-Finder are considered Tier 2. Following the same logic, the thinking components 0d-0e, 1f plus 2a-2c are Tier 3, and the rest belong to Tier 4. Table 6 summarizes the four Tiers of thinking components and the average counts and percentages in each Tier. The "Average Counts" were calculated as the total counts in the Tier divided by the total number of thinking components from both Reflective Journal and Term Paper, and the "Average Percentage" was the average count divided by the total number of students. According to Table 6, over 50% of the students demonstrated the Tier 1 thinking components in their written assignments, and over 25% demonstrated those in Tier 2. Thinking components in Tier 3 only appeared in around 10% of the students' writing and Tier 4 components barely appeared; thus, components in these two Tiers could be neglected. Focusing on the frequent thinking components in Tiers 1 and 2, a narrative portrait of students' thinking complexity demonstrated in their writing could be drawn, with both the strengths and the weaknesses. These thinking components are representative and characterize the students' thinking complexity when dealing with open-ended problems.
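One possible reading of the "Average Counts" and "Average Percentage" definitions above is sketched below. It is an assumption that each component contributes two entries (one count per assignment) to the denominator. The function name `tier_average` is hypothetical, and apart from the counts for 1a (59 and 56, reported later in the paper), all counts in the example are made-up placeholders, not data from the study.

```python
def tier_average(counts_by_component, n_students):
    """Return (average count, average percentage) for one Tier.

    counts_by_component maps a component code to a pair
    (Reflective Journal count, Term Paper count). The average count is
    the total count divided by the number of component entries across
    both assignments; the average percentage divides that by n_students.
    """
    all_counts = [c for pair in counts_by_component.values() for c in pair]
    average_count = sum(all_counts) / len(all_counts)
    average_percentage = average_count / n_students
    return average_count, average_percentage


# Illustrative input: only the 1a counts come from the paper;
# the 1b and 1c counts are hypothetical placeholders.
example_tier = {"1a": (59, 56), "1b": (50, 54), "1c": (48, 52)}
avg_count, avg_pct = tier_average(example_tier, n_students=95)
print(f"average count = {avg_count:.1f}, average percentage = {avg_pct:.1%}")
```

With real per-component counts from Figure 4 and Figure 5 substituted in, the same function would yield one row of a summary like Table 6.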
Figure 7 shows the narrative portrait, which enables the educators to see more details. It ranges over three Thinking Performance Patterns from Confused Fact-Finder to Perpetual Analyzer, suggesting a developmental region where teaching might take effect. As expected, thinking components of Biased Jumper appear most frequently in both students' strengths and weaknesses, but differences still exist between different thinking components. Moreover, thinking components from nearby performance patterns also appear frequently in the students' writing, including the strength items from Perpetual Analyzer and some of the weaknesses from Confused Fact-Finder. More interestingly, although no change in the overall Thinking Performance Patterns was observed within the term, micro-scale changes seem to appear in individual thinking components from Reflective Journal to Term Paper, which leads to our last analysis discussed in the paper.

Individual Thinking Components: Changes within the Term
As mentioned above, it is interesting to investigate the changes that happened in individual thinking components from Reflective Journal to Term Paper. Again, only those representative thinking components in Tiers 1 and 2 are considered because their total counts are substantial enough that their changes might have some indication beyond mere random fluctuations.
To carry out the analysis, we define the mean value of each thinking component as the total counts in this thinking component divided by the total number of students, equivalent to its occurring frequency. Take the thinking component "1a Jump to conclusions" as an example. It got 59 counts in Reflective Journal and 56 counts in Term Paper, out of 95 students. Therefore, the mean value of 1a at Reflective Journal equals 59/95 = 0.62, and its mean value at Term Paper is 56/95 = 0.59. With this definition, all Tier 1 and Tier 2 thinking components can be plotted in terms of their mean values and changes from Reflective Journal to Term Paper. It is interesting to note that all the improvement thinking components show visible increases from Reflective Journal to Term Paper, indicating general and stable progress in students' thinking complexity within the term. Performance on weakness components fluctuates, and no clear pattern could be identified. Visible increases could be observed in a few components, meaning that the students' performance on these thinking components, unfortunately, worsened within the term. However, most weakness components increase only very slightly, and a few of them even drop in mean value. Given that Term Paper is longer than Reflective Journal, the students often find it more challenging to prepare; meanwhile, essay writing by itself is a task with enduring uncertainties. The observed complication might therefore be understandable. On average, it is reasonable to conclude that, compared with the improvement thinking components, the changes in the weaknesses are not obvious. To make this clear and specific, the averages of the changes of the improvements and the weaknesses are calculated and compared. As shown in Table 7, the mean values of the improvements increase on average by 0.12 (55.6%), while those of the weaknesses increase marginally by 0.04 (11%).
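The mean-value definition can be checked with the worked example for component 1a. The short sketch below only reproduces the arithmetic stated in the text; the function name `mean_value` is a hypothetical helper, and the counts (59, 56) and the number of students (95) are the ones reported above.

```python
def mean_value(count: int, n_students: int) -> float:
    """Mean value of a thinking component: total counts / number of students."""
    return count / n_students


N_STUDENTS = 95

# Component "1a Jump to conclusions": 59 counts in Reflective Journal,
# 56 counts in Term Paper, as reported in the text.
journal_mean = mean_value(59, N_STUDENTS)
paper_mean = mean_value(56, N_STUDENTS)
change = paper_mean - journal_mean  # negative: the weakness became less frequent

print(f"{journal_mean:.2f} {paper_mean:.2f} {change:+.2f}")  # 0.62 0.59 -0.03
```

For a weakness component such as 1a, a negative change indicates that the weakness was highlighted slightly less often in Term Paper than in Reflective Journal.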
We interpret the results as follows: although the overall Thinking Performance Patterns of the students are stable within one term without any apparent improvement, visible improvement can still be identified by examining micro-scale individual thinking components. Furthermore, this analysis suggests a possible pathway for students to improve their thinking complexity. In a supportive educational environment, they may be exposed to some novel and sophisticated higher-level thinking skills, then gradually learn them and put them into practice, as shown by the visible improvement in their written assignments. On the other hand, in this process, it is still challenging for them to overcome the weaknesses at the lower level of thinking that have already become part of their thinking habits. Only through continuous educational support and practice might the students be able to strengthen the higher-level skills, get rid of the old habits, and eventually advance to the next level of thinking. These conclusions are supported by the data and consistent with the internal logic of the Wolcott-Lynch Model and Fischer's dynamic skill theory. According to Fischer (1980; Fischer & Bidell, 1998), the stage theories, including RJM, "capture only large jumps between levels, not the smaller micro-development steps described by this process" (King & Kitchener, 1994, pp. 34-35). Instead, these microdevelopment steps, named "skill acquisition", are highlighted in Fischer's dynamic skill theory. A new level of thinking must be constructed through a series of microdevelopment steps. A supportive and effective educational environment provides a platform for students to develop and practice these higher-level skills, which are only haphazardly required in daily life events (King & Kitchener, 1994). In this context, the improvement we observed in students' writing can be seen as a piece of encouraging evidence for the effectiveness of the course.

Conclusion
To conclude, this paper demonstrates that the NQA study, based on the Wolcott-Lynch Model, yields a wealth of information on students' thinking complexity that can inform refinements to course teaching. We now return to the two research questions raised at the beginning of the paper. For the first question, "how do students in the GEF courses understand and evaluate their own thinking complexity", a distinct gap was observed between the perceptions of the students and those of the teachers. From the teacher's side, around 80% of the students clustered at the lowest two Thinking Performance Patterns, Confused Fact-Finder and Biased Jumper. On average, they performed as Biased Jumpers. However, the students perceived their own Thinking Performance Patterns as near Perpetual Analyzer, almost one step higher than the teacher's evaluation. As for the second research question, "can we observe any change in their thinking complexity within one term", our analysis, unfortunately but consistent with other research, shows no change in students' overall Thinking Performance Patterns. However, when we turn to micro-scale individual thinking components, the results are encouraging. The study reveals a narrative portrait of students' thinking complexity, with its strengths and weaknesses. In line with the result from the overall Thinking Performance Patterns, students' performance most frequently lies in the pattern of Biased Jumper, but spreads over a developmental range from the less complex Confused Fact-Finder to the more complex Perpetual Analyzer. Overall, the students can acknowledge the uncertainties and multiple perspectives behind open-ended problems, showing that the epistemological assumptions of most students have already moved beyond the absolutist and are gradually becoming relativist. Some of them could demonstrate in their writing the more advanced and complicated skills of Perpetual Analyzer.
On the other hand, their skills of justification could not keep up with their epistemological progress. The typical weaknesses of Biased Jumper, e.g., stacking up evidence and jumping to conclusions, persistently appeared in their writing, and they are generally weak in interpreting and evaluating evidence, as well as in building up arguments to analyze problems. Moreover, visible changes within the term could be identified from the analysis of individual thinking components, a piece of encouraging evidence for the effectiveness of the course in improving students' thinking complexity. The result also suggests a possible pathway of students' cognitive development, which is intuitively reasonable.
However, this study also exposes several challenges. First, the gap between the students' self-evaluation and the teacher's evaluation is interesting and deserves further exploration. Students' self-evaluation was introduced for the first time in this study, but it was conducted in a straightforward manner; thus, the reliability of the result is open to question. One direction of our further research is to redesign the students' self-evaluation and to confirm the gap observed between the teacher and the students. Results from that study will be reported later. Second, the narrative qualitative analysis of both the overall Thinking Performance Patterns and the individual thinking components relies heavily on the Wolcott-Lynch Model. As a practical model originally designed for business students, this model does have limitations when applied to the GEF courses, which focus on reading classics and seminar discussion. Individual thinking components should be carefully chosen and fine-tuned to carry out meaningful analyses in the future. Finally, this study is based solely on students' writing, which exhibits only part of the students' thinking abilities. If a comprehensive understanding of students' overall thinking complexity is desired, other methods such as interviews must be included.

Funding
This paper is supported by the General Education Foundation (GEF) Programme of the Office of University General Education at the Chinese University of Hong Kong.