Part 3: Instructions For Identifying Potentially Unreliable Responses In The Maker Literacies Data


by Martin Wallace

[This is the third in a series of blog posts about our data gathering, processing and analysis methodology.]

Picking up where I left off in my last post, in this post I will explain the process for identifying unreliable responses in the Public Pre-Post Combined Assessment Data FA17-FA18. For the purposes of this post, I am including a revised version of the Excel file, where I have used three criteria to help identify potentially unreliable results: intraclass correlation coefficient (ICC), average absolute deviation (AAD), and survey response durations from both pre- and post-self-evaluations. Take a moment to download both data files and open them so you can follow along.

Note that the revised version of the data will not be placed in the Mavs Dataverse. The version I am sharing here is for example only, so that you can later take the original data and prune potentially unreliable responses according to your own criterion thresholds. Since the criteria for determining which responses might be unreliable are subjective, others may want to repeat the process to see if they come up with similar conclusions, or tweak their thresholds to be more restrictive than my conservative example. You may also opt to not prune any responses and rely on statistical methods to identify and remove outliers (a topic for another post).

As you scroll down the left side of the Data sheet, you’ll notice many cells are highlighted with an assortment of different colors. Some rows have one or two colors; some have more dotting the cells further to their right. In total there are eighteen different flag colors: eight ICCs (equipment knowledge, competencies 1-6, and competency 9), eight AADs (equipment knowledge, competencies 1-6, and competency 9), and two durations (pre-survey and post-survey). While I won’t explain every color here, each color is a flag indicating that the response is among the least reliable responses according to one of the three criteria: ICC, AAD or duration. The more colors in a row, the less reliable that response is likely to be. Rows with three or more colors you might want to remove from the data outright. Rows with fewer colors may have unreliable data in the flagged ICC, AAD or duration sub-set, but still have reliable data in their non-flagged areas. In my sample data, I have left all rows intact, with none hidden or deleted. I will determine the reliability of each row on a case-by-case basis while performing the data analysis for each specific sub-set of data (i.e., for each competency). Having them flagged will help ease and expedite the analysis process.

While you’re scrolling through the data, you should also notice just how many rows are flagged with at least one color. If I were to remove every row that was flagged, only about half of the total responses would remain (judging by a skim; I haven’t actually counted), and that is after using what I believe to be very conservative measures for placing responses into the potentially unreliable category (more on this shortly).

The process for calculating ICCs and AADs required first copying the target data sub-set from the master Data sheet into two separate sheets, one for ICC and one for AAD (I’ve left those sheets intact in my example), deleting and rearranging some columns, hiding empty rows, and then using some of Excel’s built-in functions on the remaining data. The idea is to get a discrete set of data for every competency measured into its own Excel sheet. Once you have the data copied and trimmed into new sheets, you perform the functions for ICC and AAD, and copy the data calculated in those columns into new columns on the master Data sheet. (Note, you will need to add eight columns for AAD; empty columns for all eight ICCs are already in the public data set.)

Because this process was lengthy, I’m only going to describe in depth the first iteration of calculating ICC on one sub-set of data, the Knowledge of Makerspace Equipment set. I performed the identical procedure on all remaining sub-sets of data, including competencies 1-6 and 9.

Likewise, since the process for flagging potentially problematic AADs closely mirrors the process for flagging potentially problematic ICCs, I will only explain in depth those aspects that differ from ICC, referring to relevant bits of the ICC example when possible. Again I will be using the Knowledge of Makerspace Equipment data sub-set.

HEURISTIC MODEL FOR IDENTIFYING POTENTIALLY UNRELIABLE RESPONSES

Before I explain the process for calculating ICC and AAD in detail, I will first explain why I’m using ICC, AAD and survey durations as a heuristic model for identifying potentially unreliable responses in the data. Essentially, we are comparing two sets of answers to the same sets of questions from the same student: once before the student completes a project in the makerspace, and once after the student completes their project. While one would expect to see some variance, we wouldn’t expect wildly different answers the second time around, on the whole. ICC and AAD can be thought of as ways to detect this wild variance.

Intraclass Correlation Coefficients show the strength of a relationship between two sets of data, and whether the relationship is direct or inverse. The coefficient ranges from -1 to 1 and can be used to help identify, in our case, responses that may not be reliable. The further the number is from zero, the stronger the correlation, and the more reliable the response is presumed to be. Note that ICC is a way of comparing one set of data to another; in our case it compares a set of answers to a group of pre-self-assessment survey questions against the corresponding set of answers in the post-self-assessment survey. ICC does not care about each individual answer in pre- compared to its counterpart in post-. In other words, it is looking for a pattern, or trend.
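If you want to sanity-check this calculation outside of Excel, here is a minimal sketch in Python using hypothetical answer values. Note that Excel’s CORREL(), which my formula below relies on, computes the Pearson correlation coefficient.

    import numpy as np

    # Hypothetical Likert answers (1-5) from one student to the same set of
    # equipment questions, before and after their makerspace project.
    pre  = np.array([2, 1, 3, 2, 4, 1, 2, 3])
    post = np.array([3, 2, 4, 3, 5, 2, 3, 4])

    # Excel's CORREL() is the Pearson correlation; np.corrcoef() returns
    # the same value in the off-diagonal of its 2x2 matrix.
    icc = abs(np.corrcoef(pre, post)[0, 1])
    print(f"|correlation| = {icc:.2f}")  # near 1: pre and post move together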

Average Absolute Deviations, on the other hand, measure the average variance between individual data points in two sets of data. In the case of AAD, the higher the number (the more variance), the more indicative of an unreliable response. AAD is not looking for a pattern, but at the actual variance between individual data values in the two sets. In our case, for a set of answers to a group of questions from the pre-self-assessment survey, each answer is compared to its corresponding answer from the post-self-assessment, then all the differences are averaged to calculate AAD.
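Here is a comparable sketch for AAD, again with hypothetical values. One caveat: Excel’s AVEDEV() over both ranges (which I use later) averages each value’s absolute deviation from the pooled mean of all the pre- and post-answers, so it also reflects spread within each set; the paired version below implements the description just given. Either way, the number grows as the two sets of answers diverge.

    import numpy as np

    # Same hypothetical pre/post answers as in the sketch above.
    pre  = np.array([2, 1, 3, 2, 4, 1, 2, 3])
    post = np.array([3, 2, 4, 3, 5, 2, 3, 4])

    # Paired AAD as described: average the absolute question-by-question
    # differences between pre- and post-answers.
    paired_aad = np.mean(np.abs(pre - post))

    # Excel's AVEDEV() over both ranges: average absolute deviation of
    # every value from the pooled mean of all 16 answers.
    pooled = np.concatenate([pre, post])
    avedev = np.mean(np.abs(pooled - pooled.mean()))

    print(f"paired AAD = {paired_aad:.2f}, AVEDEV = {avedev:.2f}")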

Both ICC and AAD involve a degree of subjectivity: where you set the threshold for what gets flagged as “potentially unreliable” and what makes the cut is a matter deserving further investigation beyond the scope of this blog post. Hopefully you’ll get a feel for what I mean by all of this as I go through my example using the “knowledge of makerspace equipment” survey responses in the next section.

Survey duration is pretty straightforward, requiring no calculations. The durations are listed as columns in the data. You can use a histogram to identify extreme outliers and flag those as potentially unreliable. A survey response that took less than one minute (again, this is subjective) could indicate a student who quickly clicked through the survey without reading the questions. A survey response that took hours could indicate a student who was distracted by other activities and not giving the survey their full attention.
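As a rough illustration, this sketch flags durations outside a subjective fast/slow window; the column names and thresholds are hypothetical stand-ins for whatever your export uses.

    import pandas as pd

    # Hypothetical duration columns, in seconds.
    df = pd.DataFrame({"Pre.Duration": [45, 310, 7200, 280, 95],
                       "Post.Duration": [50, 290, 260, 14400, 120]})

    # Subjective thresholds: under a minute suggests click-through;
    # over two hours suggests a distracted, interrupted response.
    too_fast, too_slow = 60, 2 * 60 * 60
    for col in ["Pre.Duration", "Post.Duration"]:
        df[col + ".flag"] = (df[col] < too_fast) | (df[col] > too_slow)

    print(df)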

IN DEPTH INSTRUCTIONS

First, download and save the data as an Excel file, then create a copy using Save As with a new name. I added the word “Reliable” to the end of mine, like this: Public Pre-Post Combined Assessment Data FA17-FA18 Reliable. This way, if you mess up, you can always go back to the original file and begin again.

INTRACLASS CORRELATION COEFFICIENTS

For “Knowledge of Makerspace Equipment” and each Competency (1-6 & 9), you will calculate an ICC following this process; my example uses the makerspace equipment data:

  1. Freeze the pane to the right of the target ICC column so that the target ICC column remains static when scrolling left and right. You may have to first unfreeze the pane if it is already frozen at another location. Use your favorite search engine for “excel freeze pane” for instructions.
  2. Scroll to the right to the first column of the target data. In my example that is the heading “Pre.knowledge of makerspace equipment - 3D Printing”. Select this and all of the target Likert scale data headings to its right. In my case, I want all Likert scale headings about knowledge of makerspace equipment selected. These will always be consecutive columns that can be selected using one Shift-click to grab them all.
    • Do NOT select the multiple choice or text entry data that may appear before or after the Likert scale data. In all cases Likert scale data will come in groups of three columns: a pre- column, a post- before/beginning column, and a post- now/end column.
  3. With the column headers still selected, scroll to the very bottom of the data, row 347. Hold down Shift and click in the cell in the bottom row (row 347) directly beneath the last column heading in your target data. If done correctly, you will now have all target data selected. Do Ctrl-C to copy the data.
  4. Open a new sheet/tab and rename it to match the ICC you are working with, for example, Equip_ICC. While in cell A1 in this new sheet, do Shift-F10 followed by the letter V to paste the values (not the equations) that you copied from the Data sheet. Immediately go to the Insert tab and select Table. If Table is not available to you (if it’s “grayed out”) then you miraculously managed to paste the data as a table.
    • When inserting the table, be sure all your data is still selected from when you pasted it. It should remain selected after pasting, but if you clicked anywhere you’ll need to go back and select it again.
    • When creating the table, since you also copied/pasted the column headings, be sure to select the checkbox “My table has headers”.

For conducting ICC and AAD analysis of two data sets, we want to use the pre-self-assessment as one data set, and specifically the reflective versions of the same questions from the post-self-assessment as our second data set. Therefore:

  1. Delete all post- now/end columns from the data (every third column), leaving only data from the pre-survey and post- before/beginning columns.
  2. Next, separate pre- columns from post- columns into two groups, while keeping everything in order. You should have an equal number of pre- columns on the left and post- columns on the right. (Note: if you are confused about any of this, refer to my example data for clarification.)

It is at this juncture that you should flag and hide all the rows where either the pre- or post- (or both) data is completely empty. But first, let’s get our work environment configured for the work ahead. This will make the remainder of this tutorial a thousand times easier. If you skip this step, you’ll find yourself scrolling left and right endlessly.

  1. Select the entire table using the southeast-pointing arrowhead in the top left of the spreadsheet (the Select All button). With the whole table selected, click and drag a column edge until all columns appear within view. Since all columns are selected, when you shrink the size of one column, the other columns automatically shrink to the same size.

Now that you no longer have to scroll left and right, you can scroll down the sheet to find empty rows, and rows where either the pre- or post- data is completely empty. These rows should certainly be flagged as potentially unreliable.

  1. Go down all 347 rows, flag the rows with a color of your choice (I recommend bright red), and hide them. Use your favorite search engine for “excel hide rows” for instructions.
  2. Now that all the data is in its proper place, insert a new empty column at the far left of the sheet. Label it ICC.
  3. Select the top cell in the new ICC column. Paste the formula =ABS(CORREL(array1,array2)) into the formula bar.
    • Still in the formula bar, click on the word array1. A small helper bar will appear underneath with the formula. In that little helper bar, again click on “array1”. Now click and drag across all the target Pre data in that row (B2 through W2 in my example).
    • Click again in the little helper bar, but this time on the word “array2”. Click and drag across all the target Post data in that row (X2 through AS2).
    • Now you have both data arrays selected. Let go of the mouse button. Don’t click anything.

After doing the above, my formula for the equipment ICC looks like this: =ABS(CORREL(B2:W2,X2:AS2)). If you are following along precisely, your data will always begin in cell B2. The rest will vary depending on how many columns are in the competency Likert data, but it will look more or less like mine.

  1. Now hit Enter. The intraclass correlation coefficient should appear in cell A2. I noticed that sometimes when I did this, Excel auto-filled the cells below with their ICCs, while at other times I had to do it manually. So:
    • If it didn’t happen automatically, fill the column down with the formula by selecting cell A2, then pressing Shift-Ctrl-Down Arrow all at once. This will select all the cells in the column. Then in your Home tab, go over to the Editing group and select Fill > Down.

Since I am looking for distance from zero to determine my threshold for what is and isn’t considered reliable, I wrapped the CORREL() function in ABS() to get the absolute value of the ICC, converting negative values to positive ones. This allows better visualization of distances from zero in a histogram. More on that later when I look at histograms.

A sticking point you will immediately notice is that many of the cells are populated with divide-by-zero errors (#DIV/0!). This error indicates that because either the pre-answers or the post-answers have a standard deviation of zero, the coefficient can’t be calculated. This happens when all values in either the pre- or post- data are equal. Since nothing can be concluded from these rows, I’m playing it safe and not flagging them as potentially unreliable. However, I am going to hide all rows with this error so that it doesn’t affect further analysis of the correlation coefficients. Use the filter in the column header to uncheck the value #DIV/0!, leaving only numerical values in the data set.
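The error makes sense if you look at the formula behind CORREL(), the Pearson correlation coefficient:

    r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_i (x_i - \bar{x})^2} \, \sqrt{\sum_i (y_i - \bar{y})^2}}

When every answer in one array is identical, its sum of squared deviations in the denominator is zero, and the division fails.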

Now I can create a histogram of the remaining visible data in the ICC column to try to identify the potentially unreliable responses. Select all the data (not the headers) in the column, switch to the Insert tab in Excel and select Histogram in the Charts pane. A histogram will appear. At the bottom of the histogram you’ll see numbers ranging from the lowest ICC to the highest ICC, subdivided into bins (bars) that represent how many rows of data fall into each range.

In my example, I will view the histogram in its default representation. For ICCs, the furthest-left bin in the histogram holds the data rows with the lowest ICCs, representing the least reliable results. In my example, this yields 19 rows with values of 0.17 or lower; I will flag those 19 rows in the data by adding a background color other than bright red, since I’ve already declared that color represents missing data. To do this without re-sorting the data (which would misalign the rows when adding our flags back into the master Data sheet), we can use the filter in the ICC column once again and deselect all values greater than 0.17.

This is where subjectivity begins to play a part in the process. I chose to be very conservative by identifying only the bin at the far left as representing potentially unreliable results. Since we are dealing with probabilities here, it is likely that some of the data in the left bin is perfectly reliable and that bins to its right contain some unreliable results. But the whole purpose of this process is to create a heuristic approach that saves us from examining every response individually and subjectively determining its reliability. You may decide to flag responses in the second, third, or even fourth bin; but remember, the more rows you flag, the fewer rows remain in your data, and as I’ve already established, even my conservative measure flagged about half of our total rows.
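For readers scripting this step, here is a rough sketch of flagging only the left-most bin. Excel chooses its own default bin width, so the ten equal-width bins and the random values below are assumptions for illustration only.

    import numpy as np
    import pandas as pd

    # Hypothetical |ICC| values for rows that produced a number (no #DIV/0!).
    rng = np.random.default_rng(0)
    icc = pd.Series(rng.uniform(0, 1, 200), name="ICC")

    # Bin the values and flag only the left-most (lowest-|ICC|) bin,
    # mirroring the conservative cutoff used above.
    counts, edges = np.histogram(icc, bins=10)
    cutoff = edges[1]
    flagged = icc[icc <= cutoff]
    print(f"cutoff = {cutoff:.2f}, flagged {len(flagged)} of {len(icc)} rows")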

Before moving on, I need to unhide all of the rows in the ICC sheet that I hid throughout this process: the empty rows, the rows where the respondent didn’t answer the questions in either the pre- or post-survey, and the rows with the divide-by-zero errors. If you skip this critical step, your data will be misaligned when you copy it back to the master Data sheet. To unhide all the rows, first use the filter in the column header and check Select All so that the #DIV/0! values and the values greater than 0.17 once again appear; then select the entire sheet, right-click anywhere in the data, and select Unhide to reveal the rows with missing data.

Copy the data (not the header) in the new column, then go back to the master Data tab. Select the top empty cell in the target ICC column (not the header). You will need to paste both the value (not the formula) and the background color from the ICC sheet. To do this, press Shift-F10 and in the menu that pops up, hover over “Paste Special” and select “Values & Source Formatting”. This will paste the values and color formatting from the ICC sheet. This concludes the instructions for calculating ICCs for each student respondent’s familiarity with makerspace equipment across the pre- and post-self-evaluations. Repeat this process for each competency in the data (competencies 1-6, plus 9) until all of the blank ICC columns are filled in.
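If you would rather script the repetition across competencies, the same per-row calculation is easy to automate. This sketch uses a toy table with hypothetical column names; the missing value stands in for Excel’s #DIV/0! rows.

    import numpy as np
    import pandas as pd

    # Toy stand-in for the master Data sheet: two respondents, three pre-
    # questions and their three post- before/beginning twins.
    df = pd.DataFrame({"Pre.q1": [2, 3], "Pre.q2": [3, 3], "Pre.q3": [4, 3],
                       "Post.q1": [3, 1], "Post.q2": [4, 5], "Post.q3": [5, 2]})

    pre_cols = [c for c in df.columns if c.startswith("Pre.")]
    post_cols = [c for c in df.columns if c.startswith("Post.")]

    def row_icc(row):
        pre = row[pre_cols].to_numpy(dtype=float)
        post = row[post_cols].to_numpy(dtype=float)
        if pre.std() == 0 or post.std() == 0:  # Excel's #DIV/0! case
            return np.nan
        return abs(np.corrcoef(pre, post)[0, 1])

    df["ICC"] = df.apply(row_icc, axis=1)
    print(df)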

AVERAGE ABSOLUTE DEVIATION

Since this procedure is nearly identical to the instructions for calculating ICC, I’m listing here only those steps that deviate from the instructions above.

  1. Blank ICC columns were already available in the worksheet. For AAD, add a new empty column to the right of each of the ICC columns. Label the column headers similarly to how you labeled the ICC columns.
  2. Open a new sheet/tab and rename it to match the AAD you are working with, for example, Equip_AAD. Follow the remainder of the instructions in step 4, above, substituting AAD for ICC throughout.
  3. Select the top cell in the new AAD column. Paste the formula =AVEDEV(array1,array2) into the formula bar.
    • Still in the formula bar, click on the word array1. A small helper bar will appear underneath with the formula. In that little helper bar, click on “number1”. Now click and drag across all the target Pre data in that row (B2 through W2 in my example).
    • Click again in the little helper bar, but this time on “number2”. Click and drag across all the target Post data in that row (X2 through AS2).
    • Now you have all the data you need selected. Ignore “number 3” in the helper bar. Let go of the mouse button. Verify your formula for the equipment AAD looks like this: =AVEDEV(B2:W2,X2:AS2).

While you should see no divide-by-zero errors (#DIV/0!) when calculating AAD, you will probably notice many zero (0) values. If you scan those rows, you’ll see that the student gave the same answer to every question, indicating that they simply clicked down the column of Likert scales. This is indicative of a student who didn’t read the questions and rushed through the survey. I have chosen to flag these as potentially unreliable right out of the gate using bright red, but since this is subjective, you may opt not to. Once they are flagged, go ahead and hide the rows using the column header filter.
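Straight-lined rows are also easy to catch programmatically: a zero AAD means every value in the row is identical, which this sketch (with hypothetical toy answers) detects directly.

    import pandas as pd

    # Toy rows combining pre- and post- Likert answers.
    answers = pd.DataFrame([[3, 3, 3, 3, 3, 3],   # straight-lined: AAD is 0
                            [2, 4, 3, 5, 1, 2],
                            [4, 4, 4, 4, 4, 4]])  # straight-lined at another value

    # nunique(axis=1) == 1 flags rows where every answer is the same.
    straight_lined = answers.nunique(axis=1) == 1
    print(answers[straight_lined])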

Now I can create a histogram of the remaining visible data in the AAD column to try to identify remaining potentially unreliable responses. Select all the data (not the headers) in the column, switch to the Insert tab in Excel and select Histogram in the Charts pane. A histogram will appear. At the bottom of the histogram you’ll see numbers ranging from the lowest AAD to the highest AAD, subdivided into bins (bars) that represent how many rows of data fall into each range. Again I will stick with the default histogram.

For AADs, the furthest-right bin in the histogram holds the data rows with the highest AADs, representing the least reliable results (remember, this is the opposite of ICCs). In my example, this yields only one row, with a value of 2.51 or greater; I will flag that row in the data by adding a background color other than bright red or the color I chose for my ICC column. To do this without re-sorting the data (which would misalign the rows when adding our flags back into the master Data sheet), we can use the filter in the AAD column once again and deselect all values lower than 2.51.

Unhide all the rows as described for ICC, above, then copy and paste the values with formatting back into the corresponding column in the master Data sheet. This concludes the instructions for calculating AADs for each student respondent’s familiarity with makerspace equipment across the pre- and post-self-evaluations. Repeat this process for each competency in the data (competencies 1-6, plus 9) until all of the blank AAD columns are filled in.

Now that I have identified potentially unreliable responses in my data, I can finally move on to performing some data analysis. That will be the topic of the next post in this series.
