Part 4: Data Collection & Analysis Methodology


by Martin Wallace

UTA MAVS DATAVERSE

[This is the fourth part in a series of blog posts about our data gathering, processing and analysis methodology.]

In this post, I will tie up a few loose ends regarding data collection and cleanup, potential problems with our methods and how we have corrected for them with version 2.0 of the Maker Literacies program, and then provide some examples of how the data can be used for analysis.

DESCRIPTION OF THE PRE- AND POST-SELF-ASSESSMENT SURVEYS

Keep in mind that everything covered up to this point has been an experimental proof-of-concept pilot, including the program itself, data collection methods, and analysis, all built upon the foundation of our beta phase list of maker competencies. While the list of beta phase maker competencies has been rescinded and superseded by a revised list, the original list can still be downloaded for reference. The beta list may be particularly useful while exploring and analyzing the data that resulted from the pilot program.

In our pre- and post-self-assessment surveys, we included Likert scale questions for makerspace equipment knowledge plus competencies 1-6 and 9. [Refer to the beta list for the exact wording of the competencies, and to the survey question banks (included with the data) for the exact wording of the Likert scales and other questions.] We ran a Cronbach's alpha analysis on the Likert scales using the pilot data and determined that all scales were internally consistent, with the exception of the scale for competency 5, “employs effective knowledge management practices”. [Note: we have corrected for this in version 2.0, among other improvements in our surveys.]
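
For readers exploring the pilot data themselves, here is a minimal sketch of how Cronbach's alpha can be computed for one of these scales, assuming the responses are loaded into a pandas DataFrame with one column per Likert item; the column names are hypothetical and are not the actual variable names in our data files.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for one Likert scale.
    Rows are respondents; columns are the scale's items."""
    items = items.dropna()
    k = items.shape[1]                                  # number of items in the scale
    item_variances = items.var(axis=0, ddof=1).sum()    # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)      # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical usage: columns c5_a..c5_d hold the items for competency 5
# alpha_c5 = cronbach_alpha(responses[["c5_a", "c5_b", "c5_c", "c5_d"]])
```

A common rule of thumb treats an alpha of roughly 0.7 or higher as acceptable internal consistency for a scale.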

Key to our data collection and analysis is the use of a pre- measurement and two post- measurements, which I’ll call “Reflection” and “Now”. During the post-self-assessment, students are asked to think back to the beginning of the semester and re-rate their competency before completing their makerspace project, and then, of course, to rate their competency after completing the project. An example is shown here, but you can refer to the survey question banks to see how these were written for each competency. [Note: we have since revised all Likert scales for version 2.0 in line with best practices identified in the literature, making their wording more uniform.]

[Image: example of the pre-survey question alongside the post-survey “Reflection” and “Now” questions]

JUSTIFICATION FOR “REFLECTION” AND “NOW”

While it may seem odd to ask students to reflect back and re-rate their competencies prior to completing a makerspace project, we did this to avoid a well-documented problem inherent in simply asking students to rate themselves at the beginning and then again at the end of the course. Students tend to overestimate their competence at the beginning of the semester, either because they do not fully understand what it is they are rating themselves on, or because they do not yet know what they do not know. As the data will show, students tend to rate their competence lower on the post-self-assessment survey “Reflection” than they did on the pre-self-assessment survey. In some cases, they even rate themselves lower on the “Now” than they did on the pre-self-assessment, which makes it look as though they are leaving the course with less competence than they had when they entered it. By adding the “Reflection” and “Now” measurements to the post-self-assessment survey, forcing students to think back to the beginning of the course, we expect them to provide a more realistic response regarding their competence at the beginning and end of the semester. Further, they provide us with a relative measure of how much competence they believe they gained by completing the makerspace project, which is what we are primarily interested in measuring.

A well-noted criticism of asking students, at the end of the semester, to re-rate how they perceived their competence prior to completing the makerspace project is that 1) students will be biased by the knowledge they have gained from completing the project when re-rating their initial competence, and 2) students seeking social desirability will game the scales to show that they have gained competence, because they believe that is what we want to see.

As for the first concern, we were already aware that students tend to overrate their competence at the beginning of the semester and that we shouldn’t rely on the pre- rating as an accurate measure… and our data bears this out. In fact, we believe the “Reflection” and “Now” measures will be more accurate, even if biased, than simply using a pre- and post- rating. The objection is certainly pertinent when measuring objective knowledge, such as in pre- and post-testing (where students are expected to supply correct answers to specific questions), but less so when measuring subjective, self-reported levels of competence.

As for the second concern, in version 2.0 we have incorporated Reynolds Short Form C of the Marlowe-Crowne Social Desirability Scale into the post-self-assessment surveys in order to detect these types of responses and weight them accordingly in our analysis. Keep in mind that the data used in this analysis have not been corrected for social desirability.
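
The exact weighting scheme is beyond the scope of this post, but the general idea can be sketched as follows: the short form consists of 13 true/false items, responses matching the socially desirable answer key are summed, and respondents with unusually high scores can be flagged or down-weighted. The answer key values, column names, and threshold below are placeholders for illustration, not our actual scoring.

```python
import pandas as pd

# Placeholder key: maps each of the 13 true/false items to a socially
# desirable answer. The real key follows Reynolds (1982); the all-True
# key and the column names here are illustrative only.
SD_KEY = {f"mc_sds_{i}": True for i in range(1, 14)}

def social_desirability_score(row: pd.Series) -> int:
    """Count responses that match the socially desirable key."""
    return sum(int(row[item] == desirable) for item, desirable in SD_KEY.items())

def response_weight(sd_score: int, threshold: int = 10) -> float:
    """Down-weight respondents with unusually high social desirability scores.
    The cutoff and the 0.5 weight are illustrative, not a published scheme."""
    return 0.5 if sd_score >= threshold else 1.0
```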

JUSTIFICATION FOR IDENTIFYING AND REMOVING POTENTIALLY UNRELIABLE RESPONSES

In my previous post in this series, I went into great depth to explain a heuristic model for identifying and removing potentially unreliable responses, based on a 1979 paper by Horowitz & Golob. Our assumptions were that some students would have rushed through the survey without fully reading the questions, or worse, simply clicked through the survey without reading the questions at all. We applied a modified version of the Horowitz & Golob model that compares each student’s pre-self-assessment ratings to their post-self-assessment “Reflection” ratings, believing that the “Reflection” ratings would more strongly correlate with the pre- ratings than the “Now” ratings would. While this heuristic model leaves open a lot of room for subjectivity, a handful of clearly unreliable responses do emerge from the data, allowing us to remove them from the data set prior to analysis.
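
As a rough illustration of the kind of screening involved (not the exact procedure, which is detailed in the previous post), the sketch below flags respondents whose pre- ratings and Post/Reflection ratings disagree by an implausibly large margin; the column names and the cutoff are placeholders.

```python
import pandas as pd

def flag_unreliable(df: pd.DataFrame, pre_cols, reflect_cols, max_mean_gap: float = 2.0) -> pd.Series:
    """Flag respondents whose pre- and Post/Reflection ratings diverge implausibly.

    pre_cols and reflect_cols list matching item columns, in the same order.
    max_mean_gap is an illustrative cutoff on the mean absolute difference."""
    gaps = df[pre_cols].to_numpy() - df[reflect_cols].to_numpy()
    mean_abs_gap = pd.Series(abs(gaps).mean(axis=1), index=df.index)
    return mean_abs_gap >= max_mean_gap

# Hypothetical usage:
# unreliable = flag_unreliable(responses,
#                              pre_cols=["pre_4a", "pre_4b", "pre_4c", "pre_4d"],
#                              reflect_cols=["ref_4a", "ref_4b", "ref_4c", "ref_4d"])
# reliable_responses = responses[~unreliable]
```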

Criticism of this approach warns us not to remove any results from the data, because doing so leaves us open to accusations of cherry-picking, or more seriously, of p-hacking our data in order to achieve statistically significant results. Below I will show examples from both the “Raw” data and the “Reliable” data (as determined by my own subjective heuristic criteria outlined in my previous post). It remains an open question whether to use the raw or the reliable data, and the answer will certainly depend on the context in which the data is being used. For our current purposes, which mostly pertain to program improvement and demonstration of proof of concept, we believe the “Reliable” data is… well… more reliable. When we venture into scholarly publications where p-values become more important, we’ll consider using the raw data for analysis. As you’ll see from our current preliminary analysis, there’s not much difference between the two versions of the data, but that may be because I weeded out potentially unreliable results too conservatively. I’ll be closely monitoring the significance of this dilemma as I continue working with the data.

OTHER CONSIDERATIONS

In addition to the above problems and criticisms, we have either already reconfigured our version 2.0 surveys to take the following issues into consideration, or we are actively seeking solutions. Each has been identified as problematic in the literature, and we are implementing best practices from that same literature in order to improve our survey methods; each will be addressed in future blog posts. These include:

  • “Likert scale left bias”: people are more likely to select the response options located on the left side of the scale;
  • “Negatively worded stems”: negatively worded stems tend to confuse respondents, causing them to select responses opposite to their beliefs; and
  • “Bidirectional response options”: intended to help identify potentially unreliable responses.

DATA ANALYSIS

So, finally, we can get into some simple analysis of the data. To take into consideration the criticisms of our data methodology explained above, I’m going to demonstrate four “views” of data from selected courses, defined as follows (a short sketch of how these views are assembled appears after the lists):

TWO DATA SETS

  • Raw data—responses of all students who opted in to participate in the study
  • Reliable data—the raw data with the most potentially unreliable responses removed

TWO COMPARISONS

  • Pre and Post/Now
  • Post/Reflection and Post/Now

FOUR VIEWS OF THE DATA

  • Raw Pre and Post/Now
  • Reliable Pre and Post/Now
  • Raw Post/Reflection and Post/Now
  • Reliable Post/Reflection and Post/Now
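
As mentioned above, here is a small sketch of how these four views can be assembled from a single response table; the column names and the boolean “unreliable” flag (produced by the heuristic screening step) are assumptions about how per-student aggregate scores might be laid out, not the actual structure of the published data files.

```python
import pandas as pd

def build_views(responses: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Assemble the four views from one table of per-student aggregate scores."""
    raw = responses
    reliable = responses[~responses["unreliable"]]
    return {
        "raw_pre_vs_now":          raw[["pre_score", "now_score"]],
        "reliable_pre_vs_now":     reliable[["pre_score", "now_score"]],
        "raw_reflect_vs_now":      raw[["reflection_score", "now_score"]],
        "reliable_reflect_vs_now": reliable[["reflection_score", "now_score"]],
    }
```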

EXAMPLE 1: IE 4340 ENGINEERING PROJECT MANAGEMENT

In IE 4340: Engineering Project Management, we measured students’ ability to assemble effective teams, which is competency number four in the beta list of maker competencies. We collected data over three semesters (fall 2017, spring 2018, and fall 2018) from 52 participating students.

Competency 4 is framed as follows:

Competency 4: Assembles effective teams

4.a. Recognizes opportunities to collaborate with others

4.b. Evaluates the costs & benefits of “Doing-it-Together” (DIT) vs. “Doing-it-Yourself” (DIY)

4.c. Seeks team members with skills appropriate for specific project requirements

4.d. Joins a team where his/her skills are sought and valued

In our surveys, each of the enumerated dimensions 4.a through 4.d is presented as a Likert scale on which students rate their competence from 1 to 5, where 1 = no competence and 5 = high competence.

FLAGGED DATA

In order to get my Reliable data, I apply the heuristic model for identifying potentially unreliable results and remove the potentially unreliable responses. Here are some examples, with explanations:

[Image: examples of flagged responses, with explanations]

COMPARISON OF MEANS

In order to compare the means, we do the following. Competency 4 has four scales, as indicated above, each ranging from 1 to 5, with 1 being no competence and 5 being high competence. A student’s score is calculated by averaging the four scales together, and aggregate scores are calculated by averaging all student scores together. The comparisons below use aggregate scores.
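
A short sketch of this scoring is shown below, together with a worked check of the percent gain reported for the raw data; the item column names are hypothetical. The gain figure is the relative change from the Post/Reflection mean to the Post/Now mean.

```python
import pandas as pd

ITEM_COLS = ["c4_a", "c4_b", "c4_c", "c4_d"]   # hypothetical columns for dimensions 4.a-4.d

def student_score(row: pd.Series) -> float:
    """Average a student's four 1-5 ratings into one competency score."""
    return row[ITEM_COLS].mean()

def aggregate_score(df: pd.DataFrame) -> float:
    """Average all student scores into one aggregate score."""
    return df[ITEM_COLS].mean(axis=1).mean()

# Worked check against the raw means reported below:
reflection_mean, now_mean = 3.0385, 3.5769
gain = (now_mean - reflection_mean) / reflection_mean * 100   # ~17.72%, matching the reported 17.7193%
```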

Here we have all four views combined into bar graphs. The first shows the raw data and the second shows the reliable data, and in each we can compare both Pre to Post/Now and Post/Reflection to Post/Now.

RAW ASSEMBLES EFFECTIVE TEAMS

From the data we get the following means for comparison:

  • N=52 students
  • Pre: 3.2019
  • Post/Reflection: 3.0385
  • Post/Now: 3.5769

[Bar chart: raw data means for Pre, Post/Reflection, and Post/Now]

From this we see that, on average, students overestimated their competence in assembling effective teams on the pre-self-assessment survey by 11.0639% (comparing their pre-survey responses to their Post/Reflection responses), and increased their ability to assemble effective teams by 17.7193% (comparing their Post/Reflection responses to their Post/Now responses).

RELIABLE ASSEMBLES EFFECTIVE TEAMS

From the data we get the following means for comparison:

  • N=43 students (after nine responses were removed as potentially unreliable)
  • Pre: 3.2093
  • Post/Reflection: 2.9826
  • Post/Now: 3.5233

[Bar chart: reliable data means for Pre, Post/Reflection, and Post/Now]

From this we see that, on average, students overestimated their competence in assembling effective teams on the pre-self-assessment survey by 9.32775% (comparing their pre-survey responses to their Post/Reflection responses), and increased their ability to assemble effective teams by 18.1285% (comparing their Post/Reflection responses to their Post/Now responses).

EXAMPLE 1 SUMMARY

We can see a couple of things from this analysis. First, our prediction that students tend to overestimate their competence in assembling effective teams when completing the pre-self-assessment survey holds true, as exemplified in both the raw data and the reliable data. We can begin to see that the reflection element of the post-self-assessment survey may be a more accurate measure of students’ competency prior to completing the makerspace project.

Second, we can see that the raw and reliable data show similar patterns, with negligible differences after eliminating only nine potentially unreliable responses. As already mentioned, the method for eliminating potentially unreliable results is largely subjective, and I used a conservative approach in this example that eliminated only nine responses. Perhaps eliminating more potentially unreliable results would break this pattern. Once we have collected more data and are able to remove more results while still maintaining a large sample, we may see a more significant difference between the raw and reliable data.

In conclusion, based on this analysis, we can say that the Post/Reflection data is a more accurate data point for measuring student learning than the pre-self-assessment data, and that the choice between raw and reliable data needs further review. In my next post, I will further support the first conclusion by comparing the Pre data with the Post/Now data in a course that measures two competencies (Competency 1: Identifies the need to invent, design, fabricate, build, repurpose or repair some “thing” in order to express an idea or emotion, or to solve a problem; and Competency 3: Demonstrates time management best practices), showing that students overestimate their competence before completing a makerspace project to the extent that the data make it appear they left the course with less competence than they began with.
