9 Validity Studies

The preceding chapters and the Dynamic Learning Maps® (DLM®) Alternate Assessment System 2014–2015 Technical Manual—Integrated Model (Dynamic Learning Maps Consortium, 2016a) provide evidence in support of the overall validity argument for results produced by the DLM assessment. This chapter presents additional evidence collected during 2020–2021 for two of the five critical sources of evidence described in Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014): evidence based on test content and response process. Additional evidence can be found in Chapter 9 of the 2014–2015 Technical Manual—Integrated Model (Dynamic Learning Maps Consortium, 2016a) and the subsequent annual technical manual updates (Dynamic Learning Maps Consortium, 2017a, 2017b, 2018, 2019, 2020).

9.1 Evidence Based on Test Content

Evidence based on test content relates to the evidence “obtained from an analysis of the relationship between the content of the test and the construct it is intended to measure” (American Educational Research Association et al., 2014, p. 14). This section presents results from data collected during 2020–2021 regarding blueprint coverage and student opportunity to learn the assessed content. For additional evidence based on test content, including the alignment of test content to content standards via the DLM maps (which underlie the assessment system), see Chapter 9 of the 2014–2015 Technical Manual—Integrated Model (Dynamic Learning Maps Consortium, 2016a).

9.1.1 Evaluation of Blueprint Coverage

The Instructionally Embedded model blueprints are unique in that they specify a pool of Essential Elements (EEs) available for assessment; teachers choose EEs from the pool that meet a pre-specified set of criteria (e.g., “Choose three EEs from within Claim 1.”). For additional information about selection procedures, see Chapter 4 in the 2014–2015 Technical Manual—Integrated Model (Dynamic Learning Maps Consortium, 2016a). Teachers are responsible for ensuring that blueprint coverage is attained during both the fall and spring embedded windows; they may also test beyond what the blueprint requires to support instruction. Responses to fall and spring assessments are combined to calculate results used for summative purposes.

In 2020–2021, the fall window was available from September 2020 through December 2020, while the spring window was available from February 2021 through May 2021. Using the same procedure used in prior years, teachers selected the EEs for their students to test on from among those available on the English language arts (ELA) and mathematics blueprints in both the fall and spring windows. Teachers did not have to make the same EE selections in spring as they did in the fall.

Table 9.1 summarizes the expected number of EEs required to meet blueprint coverage and the total number of EEs available for instructionally embedded assessment for each grade and subject. A total of 255 EEs (148 in ELA and 107 in mathematics) for grades 3 through high school were available; 8,869 students in those grades participated in the instructionally embedded fall window, and 11,557 students participated in the instructionally embedded spring window. Overall, 12,004 students participated across the fall and spring windows. Histograms in Appendix A summarize the distribution of total unique EEs assessed per student in each grade and subject.

Table 9.1: Essential Elements (EEs) Expected for Blueprint Coverage and Total Available, by Grade and Subject

            English language arts           Mathematics
Grade       Expected (n)  Available (N)     Expected (n)  Available (N)
3                 8            17                 6            11
4                 9            17                 8            16
5                 8            19                 7            15
6                 9            19                 6            11
7                11            18                 7            14
8                11            20                 7            14
9–10             10            19                 6            26
11–12            10            19
Note. High school mathematics is reported in the 9–10 row. There were 26 EEs available for the 9–11 band. Although EEs were assigned to specific grades in the mathematics blueprint (eight EEs in grade 9, nine EEs in grade 10, and nine EEs in grade 11), a teacher could choose to test on any of the high school EEs, as all were available in the system.

Table 9.2 summarizes the number and percentage of students in three categories: students who did not meet all blueprint requirements, students who met all blueprint requirements exactly, and students who exceeded the blueprint requirements during the instructionally embedded fall window. In total, 93% of students in ELA and 78% of students in mathematics met or exceeded blueprint coverage requirements during the instructionally embedded fall window.
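To illustrate how students are classified into these coverage categories, the following minimal sketch (Python; function and variable names are illustrative, not part of the DLM system) compares the number of EEs a student tested on with the number required by the blueprint. The operational rules also apply requirements within claims or conceptual areas, which the sketch omits.

```python
def coverage_category(n_ees_tested: int, n_ees_required: int) -> str:
    """Classify blueprint coverage for one student in one subject.

    Simplified: operational coverage checks also apply within claims
    or conceptual areas, not only to the overall EE count.
    """
    if n_ees_tested < n_ees_required:
        return "Not met"
    if n_ees_tested == n_ees_required:
        return "Met"
    return "Exceeded"


# Example: a grade 3 ELA student (8 EEs required) who tested on 10 EEs
print(coverage_category(10, 8))  # Exceeded
```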

Table 9.2: Number and Percentage of Students in Each Blueprint Coverage Category, by Subject During the Fall Window

                     English language arts     Mathematics
Coverage category          n        %               n        %
Not met                  650      7.4           1,846     21.7
Met                    6,033     68.4           4,116     48.5
Exceeded               2,141     24.3           2,533     29.8
Met or exceeded        8,174     92.7           6,649     78.3

Table 9.3 summarizes the number and percentage of students in each of the three coverage categories for students in the instructionally embedded spring window. In total, 97% of students in ELA and 85% of students in mathematics met or exceeded blueprint coverage requirements during the instructionally embedded spring window.

Table 9.3: Number and Percentage of Students in Each Blueprint Coverage Category, by Subject During the Spring Window

                     English language arts     Mathematics
Coverage category          n        %               n        %
Not met                  366      3.2           1,663     14.7
Met                    7,309     63.3           5,244     46.4
Exceeded               3,869     33.5           4,400     38.9
Met or exceeded       11,178     96.8           9,644     85.3

Table 9.4 summarizes the number and percentage of students in each of the three coverage categories for students over the course of both the instructionally embedded fall and spring windows. In total, 96% of students in ELA and 95% of students in mathematics met or exceeded blueprint coverage requirements overall.

Table 9.4: Number and Percentage of Students in Each Blueprint Coverage Category, by Subject Overall

                     English language arts     Mathematics
Coverage category          n        %               n        %
Not met                  485      4.0             630      5.4
Met                    4,868     40.6           4,703     40.2
Exceeded               6,633     55.3           6,378     54.5
Met or exceeded       11,501     95.9          11,081     94.7

Four performance levels are used to report results for the DLM assessment: Emerging, Approaching the Target, At Target, and Advanced. Table 9.5 summarizes the distribution of students in each blueprint coverage category by performance level achieved and by subject. A larger percentage of students who exceeded blueprint coverage achieved at the Advanced level compared with students who met or did not meet blueprint requirements. Similarly, a larger percentage of students who did not test on the required number of EEs achieved at the Emerging performance level than was observed for students who met or exceeded the requirements. This finding is likely explained, at least in part, by the standard-setting approach: because the number of linkage levels mastered is summed across EEs for the subject and cut points are applied to the total to distinguish achievement, students who do not meet blueprint requirements have fewer opportunities to demonstrate linkage level mastery, while those exceeding blueprint requirements have additional opportunities to demonstrate mastery.
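As a minimal sketch of this scoring logic (Python; the cut points shown are placeholders, not operational DLM cut points, which vary by grade and subject), a student’s performance level can be derived from the total number of linkage levels mastered in a subject:

```python
# Placeholder cut points for illustration only; operational cuts are
# established through standard setting and differ by grade and subject.
CUTS = [("Advanced", 28), ("At Target", 20), ("Approaching the Target", 12)]

def performance_level(total_linkage_levels_mastered: int) -> str:
    """Map summed linkage level mastery to a performance level."""
    for level, cut in CUTS:
        if total_linkage_levels_mastered >= cut:
            return level
    return "Emerging"

print(performance_level(23))  # At Target (under the placeholder cuts)
```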

Table 9.5: Percentage of Students in Each Blueprint Coverage Category by Performance Level and Subject

                          English language arts (%)      Mathematics (%)
Performance level         Not met     Met   Exceeded     Not met     Met   Exceeded
Emerging                     67.8    47.7       39.5        75.2    63.8       52.0
Approaching the Target       22.7    32.6       28.0        21.0    30.3       32.8
At Target                     9.3    18.8       25.0         3.3     5.5       11.0
Advanced                      0.2     0.9        7.5         0.5     0.4        4.2

Before a student takes any DLM assessments, educators complete the First Contact survey, a survey of learner characteristics, for that student. Responses from the ELA, mathematics, and expressive communication portions of the survey were included in an algorithm to calculate the student’s complexity band for each subject. For more information, see Chapter 4 of the 2014–2015 Technical Manual—Integrated Model (Dynamic Learning Maps Consortium, 2016a). The complexity band was used to recommend the appropriate, corresponding linkage level during instructionally embedded assessment. Table 9.6 summarizes the percentage of students in each blueprint coverage category by complexity band for each subject across both instructionally embedded windows. In both ELA and mathematics, the Not met category included slightly larger percentages of students in the Foundational and Band 2 complexity bands, while the Met and Exceeded categories included larger percentages of Band 1 students.
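The operational First Contact algorithm is documented in Chapter 4 of the 2014–2015 Technical Manual—Integrated Model and uses specific survey items and decision rules that are not reproduced here. The sketch below is purely hypothetical (invented section scores and thresholds) and only illustrates the general input-to-band mapping from the three survey sections.

```python
def complexity_band(ela: int, math: int, expressive_communication: int) -> str:
    """Hypothetical mapping from First Contact section scores to a band.

    Section scores and thresholds are invented for illustration; the
    operational algorithm uses different items and decision rules.
    """
    total = ela + math + expressive_communication
    if total <= 3:
        return "Foundational"
    if total <= 7:
        return "Band 1"
    if total <= 11:
        return "Band 2"
    return "Band 3"

print(complexity_band(2, 3, 4))  # Band 2 (under the invented thresholds)
```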

Table 9.6: Percentage of Students in Each Blueprint Coverage Category by Complexity Band and Subject During the Fall and Spring Windows

                   English language arts (%)      Mathematics (%)
Complexity band    Not met     Met   Exceeded     Not met     Met   Exceeded
Foundational          23.1    18.8       19.8        19.8    18.0       18.6
Band 1                36.3    43.0       44.0        40.8    43.8       45.4
Band 2                33.6    31.3       30.6        37.8    34.6       32.7
Band 3                 7.0     6.9        5.6         1.6     3.6        3.2

Not meeting blueprint requirements may be due in part to students exiting midway through the window or to other external factors, but it may also reflect teacher misconceptions about blueprint coverage during the instructionally embedded windows. Of the 485 students who did not meet the blueprint requirements in ELA across both windows, 14% (n = 70) exited during the school year or had an extenuating circumstance entered into the system. In mathematics, of the 630 students who did not meet blueprint requirements across both windows, 7% (n = 45) exited or had a special circumstance entered. Because not all states make use of special circumstance codes, the number of students affected by extenuating circumstances may in fact be larger; additional research is needed to understand the factors that affect blueprint coverage.

9.1.2 Opportunity to Learn

After administration of the spring 2021 operational assessments, teachers were invited to complete a survey about the assessment (see Chapter 4 of this manual for more information on recruitment and response rates). The survey included four blocks of items. The first, third, and fourth blocks were fixed forms assigned to all teachers. For the second block, teachers received one randomly assigned section.

The first block of the survey served several purposes; results for other survey items are reported later in this chapter and in Chapter 4 of this manual. One item provided information about the relationship between students’ learning opportunities before testing and the test content (i.e., testlets) they encountered on the assessment. The survey asked teachers to indicate the extent to which they judged test content to align with their instruction across all testlets. Table 9.7 reports the results. Approximately 73% of responses (n = 4,231) indicated that most or all reading testlets matched instruction, compared with 70% (n = 4,026) for mathematics. More specific measures of instructional alignment are planned to better understand the extent to which content measured by DLM assessments matches students’ academic instruction.
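The percentages cited above can be reproduced from the counts reported in Table 9.7; a minimal check in Python:

```python
# Reproduce the "most or all testlets matched instruction" rates from
# the Table 9.7 counts (None, Some, Most, All, Not applicable).
reading = {"None": 290, "Some": 1173, "Most": 2409, "All": 1822, "NA": 83}
mathematics = {"None": 296, "Some": 1310, "Most": 2442, "All": 1584, "NA": 98}

for subject, counts in [("Reading", reading), ("Mathematics", mathematics)]:
    most_or_all = counts["Most"] + counts["All"]
    pct = 100 * most_or_all / sum(counts.values())
    print(f"{subject}: n = {most_or_all:,}, {pct:.1f}%")
# Reading: n = 4,231, 73.2%
# Mathematics: n = 4,026, 70.3%
```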

Table 9.7: Teacher Ratings of Portion of Testlets That Matched Instruction

              None          Some (< half)    Most (> half)    All              Not applicable
Subject         n     %          n      %         n      %         n      %        n     %
Reading       290   5.0      1,173   20.3     2,409   41.7     1,822   31.5       83   1.4
Mathematics   296   5.2      1,310   22.9     2,442   42.6     1,584   27.6       98   1.7

The second block of the survey was spiral-assigned so that teachers received one randomly assigned section. In three of the randomly assigned sections, a subset of teachers was asked to indicate the approximate number of hours spent instructing students on each of the conceptual areas by subject (i.e., ELA, mathematics, and science). Teachers responded using a 5-point scale: 0–5 hours, 6–10 hours, 11–15 hours, 16–20 hours, or more than 20 hours. Table 9.8 and Table 9.9 indicate the amount of instructional time spent on conceptual areas for ELA and mathematics, respectively. Using 11 or more hours per conceptual area as a criterion for instruction, 55% of the teachers provided this amount of instruction to their students in ELA, and 44% did so in mathematics.
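The median response category and the “11 or more hours” rate for any row in Table 9.8 or Table 9.9 can be computed from the ordinal counts; a minimal sketch using the “Use writing to communicate” row from Table 9.8:

```python
# Median response category and share of responses at 11+ hours for one
# conceptual area ("Use writing to communicate", Table 9.8).
CATEGORIES = ["0-5", "6-10", "11-15", "16-20", ">20"]
counts = [330, 202, 168, 156, 293]

total = sum(counts)
cumulative = 0
median_category = None
for category, n in zip(CATEGORIES, counts):
    cumulative += n
    if median_category is None and cumulative >= total / 2:
        median_category = category

pct_11_or_more = 100 * sum(counts[2:]) / total
print(median_category, round(pct_11_or_more, 1))  # 11-15 53.7
```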

Table 9.8: Instructional Time Spent on ELA Conceptual Areas

                                                         Number of hours, n (%)
Conceptual area                              Median      0–5            6–10           11–15          16–20          >20
Determine critical elements of text          11–15       324 (28.1)     223 (19.4)     170 (14.8)     134 (11.6)     300 (26.1)
Construct understandings of text             11–15       242 (21.2)     199 (17.4)     183 (16.0)     176 (15.4)     343 (30.0)
Integrate ideas and information from text    11–15       273 (24.0)     220 (19.3)     192 (16.9)     151 (13.3)     301 (26.5)
Use writing to communicate                   11–15       330 (28.7)     202 (17.6)     168 (14.6)     156 (13.6)     293 (25.5)
Integrate ideas and information in writing   6–10        385 (33.8)     203 (17.8)     181 (15.9)     152 (13.4)     217 (19.1)
Use language to communicate with others      16–20       139 (12.1)     129 (11.2)     163 (14.1)     177 (15.4)     544 (47.2)
Clarify and contribute in discussion         11–15       267 (23.3)     177 (15.4)     199 (17.3)     168 (14.6)     336 (29.3)
Use sources and information                  6–10        457 (39.8)     224 (19.5)     167 (14.6)     111  (9.7)     188 (16.4)
Collaborate and present ideas                6–10        404 (35.3)     233 (20.4)     196 (17.1)     126 (11.0)     184 (16.1)

Table 9.9: Instructional Time Spent on Mathematics Conceptual Areas

                                                                                    Number of hours, n (%)
Conceptual area                                                         Median      0–5            6–10           11–15          16–20          >20
Understand number structures (counting, place value, fraction)         16–20       225 (19.5)     161 (14.0)     146 (12.7)     174 (15.1)     446 (38.7)
Compare, compose, and decompose numbers and steps                       6–10        375 (33.0)     226 (19.9)     176 (15.5)     148 (13.0)     213 (18.7)
Calculate accurately and efficiently using simple arithmetic operations 11–15       347 (30.4)     167 (14.6)     131 (11.5)     173 (15.2)     322 (28.2)
Understand and use geometric properties of two- and three-dimensional shapes 6–10   432 (37.7)     265 (23.1)     178 (15.5)     147 (12.8)     123 (10.7)
Solve problems involving area, perimeter, and volume                    0–5         670 (58.8)     186 (16.3)     120 (10.5)      95  (8.3)      68  (6.0)
Understand and use measurement principles and units of measure          6–10        472 (41.7)     266 (23.5)     189 (16.7)     111  (9.8)      95  (8.4)
Represent and interpret data displays                                   6–10        469 (41.4)     260 (22.9)     167 (14.7)     125 (11.0)     112  (9.9)
Use operations and models to solve problems                             6–10        426 (37.2)     199 (17.4)     172 (15.0)     156 (13.6)     191 (16.7)
Understand patterns and functional thinking                             6–10        349 (30.5)     234 (20.5)     186 (16.3)     196 (17.1)     178 (15.6)

Results from the teacher survey were also correlated with the total linkage levels mastered in each conceptual area, as reported on individual student score reports. In mathematics, results were reported at the claim level rather than the conceptual area level because of the blueprint structure; the median instructional time for each mathematics claim was therefore calculated from teacher responses at the conceptual area level. A direct relationship between the amount of instructional time and the number of linkage levels mastered in an area is not expected; for example, a student may spend a large amount of time on an area yet demonstrate mastery only at the lowest linkage level for each EE. In general, however, we expect students who mastered more linkage levels in an area to have spent more instructional time in that area. More evidence is needed to evaluate this assumption.

Table 9.10 summarizes the Spearman rank-order correlations between ELA conceptual area instructional time and linkage levels mastered in the conceptual area and between mathematics claim instructional time and linkage levels mastered in the claim. Correlations ranged from 0.18 to 0.35, with the strongest correlation in ELA observed for Use writing to communicate (ELA.C2.1; 0.35) and in mathematics for number sense (M.C1; 0.29).
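A minimal sketch of the correlation computation, assuming a data set with one record per student per area that contains the instructional-time response coded 1–5 and the number of linkage levels mastered in that area (column names and values below are illustrative):

```python
# Spearman rank-order correlation between ordinal instructional time
# (1 = 0-5 hours ... 5 = more than 20 hours) and linkage levels mastered.
import pandas as pd
from scipy.stats import spearmanr

records = pd.DataFrame({
    "instruction_time_code":   [1, 2, 2, 3, 4, 5, 3, 5, 1, 4],
    "linkage_levels_mastered": [0, 2, 1, 3, 4, 6, 2, 5, 1, 3],
})

rho, p_value = spearmanr(records["instruction_time_code"],
                         records["linkage_levels_mastered"])
print(round(rho, 2), round(p_value, 3))
```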

Table 9.10: Correlation Between Instruction Time and Linkage Levels Mastered
Conceptual area Correlation with instruction time
English language arts
ELA.C1.1: Determine critical elements of text 0.18
ELA.C1.2: Construct understandings of text 0.27
ELA.C1.3: Integrate ideas and information from text 0.34
ELA.C2.1: Use writing to communicate 0.35
ELA.C2.2: Integrate ideas and information in writing 0.33
Mathematics
M.C1: Demonstrate increasingly complex understanding of number sense 0.29
M.C2: Demonstrate increasingly complex spatial reasoning and understanding of geometric principles 0.21
M.C3: Demonstrate increasingly complex understanding of measurement, data, and analytic procedures 0.20
M.C4: Solve increasingly complex mathematical problems, making productive use of algebra and functions 0.27

The third block of the survey included questions about the student’s learning and assessment experiences during the 2020–2021 school year. During the COVID-19 pandemic, students may have been instructed in a variety of instructional settings, which could have affected their opportunity to learn. Teachers were asked what percentage of time students spent in each instructional setting. Table 9.11 displays the possible settings and responses. A majority of responses indicated that students spent more than 50% of their time in school. More than a third of responses indicated at least some time spent receiving direct remote instruction from the teacher (either one-on-one or in a group), and more than a quarter indicated at least some time with a family member providing instruction or with no formal instruction. Fewer responses indicated time with the teacher present in the student’s home or time in a setting other than those presented in the survey question.

Table 9.11: Percentage of Instruction Time Spent in Each Instructional Setting

                                                    Percentage of instructional time, n (%)
Instructional setting                               None             1–25             26–50          51–75          76–100           Unknown
In school                                             398   (7.2)      360   (6.5)    422   (7.6)    739  (13.3)    3,551  (64.0)     81   (1.5)
Direct instruction with teacher remotely, 1:1       2,635  (51.7)    1,579  (31.0)    321   (6.3)    171   (3.4)      226   (4.4)    161   (3.2)
Direct instruction with teacher remotely, group     2,880  (56.0)    1,393  (27.1)    302   (5.9)    182   (3.5)      228   (4.4)    161   (3.1)
Teacher present in the home                         4,570  (91.3)      132   (2.6)     47   (0.9)     27   (0.5)       74   (1.5)    153   (3.1)
Family member providing instruction                 3,124  (61.4)    1,161  (22.8)    206   (4.0)    123   (2.4)      154   (3.0)    320   (6.3)
Absent (no formal instruction)                      3,283  (66.4)    1,174  (23.7)    109   (2.2)     51   (1.0)       59   (1.2)    269   (5.4)
Other                                               3,477  (82.6)       87   (2.1)     35   (0.8)     21   (0.5)       46   (1.1)    542  (12.9)

Teachers were also asked what instructional scheduling scenarios applied to their student that year. Table 9.12 reports the possible instructional scheduling scenarios and teacher responses. The majority of teachers reported no delayed start of the school year, no lengthened spring semester, no extended school year through summer, and that change(s) between remote and in-person learning occurred during the school year.

Table 9.12: Instructional Scheduling Scenarios Around Student Schedules

                                                                           Yes               No               Unknown
Instructional scheduling scenario                                          n (%)             n (%)            n (%)
Delayed start of the school year                                           2,062 (36.3)      3,493 (61.5)     124 (2.2)
Lengthened spring semester                                                   485  (8.7)      4,919 (88.3)     165 (3.0)
Extended school year through summer                                        1,446 (25.9)      3,854 (68.9)     293 (5.2)
Change(s) between remote and in-person learning during the school year     2,976 (51.8)      2,659 (46.3)     110 (1.9)

9.2 Evidence Based on Response Processes

The study of test takers’ response processes provides evidence about the fit between the test construct and the nature of how students actually experience test content (American Educational Research Association et al., 2014). The validity studies presented in this section include teacher survey data collected in spring 2021 regarding students’ ability to respond to testlets and a description of the test administration observations and writing samples collected during 2020–2021. For additional evidence based on response processes, including studies on student and teacher behaviors during testlet administration and evidence of fidelity of administration, see Chapter 9 of the 2014–2015 Technical Manual—Integrated Model (Dynamic Learning Maps Consortium, 2016a).

9.2.1 Test Administration Observations

To be consistent with previous years, the DLM Consortium made a test administration observation protocol available for state and local users to gather information about how educators in the consortium states deliver testlets to students with the most significant cognitive disabilities. This protocol gave observers, regardless of their role or experience with DLM assessments, a standardized way to describe how DLM testlets were administered. The test administration observation protocol captured data about student actions (e.g., navigation, responding), educator assistance, variations from standard administration, engagement, and barriers to engagement. The observation protocol was used only for descriptive purposes; it was not used to evaluate or coach educators or to monitor student performance. Most items on the protocol were a direct report of what was observed, such as how the test administrator prepared for the assessment and what the test administrator and student said and did. One section of the protocol asked observers to make judgments about the student’s engagement during the session.

During 2020–2021, 218 test administration observations were collected in four states. Because test administration observation data are anonymous, and because students completed assessments in a variety of locations (see Table 9.11), the sample of observed students may not have represented the full population of students taking DLM assessments. We therefore do not report findings from these observations as part of the assessment validity evidence.

9.2.2 Interrater Agreement of Writing Sample Scoring

All students are assessed on writing EEs as part of the ELA blueprint. Teachers administer writing testlets at two levels: emergent and conventional. Emergent testlets measure nodes at the Initial Precursor and Distal Precursor levels, while conventional testlets measure nodes at the Proximal Precursor, Target, and Successor levels. All writing testlets include items that require teachers to evaluate students’ writing processes; some testlets also include items that require teachers to evaluate students’ writing samples. Evaluation of students’ writing samples does not use a high-inference process common in large-scale assessment, such as applying analytic or holistic rubrics. Instead, writing samples are evaluated for text features that are easily perceptible to a fluent reader and require little or no inference on the part of the rater (e.g., correct syntax, orthography). The test administrator is presented with an onscreen selected-response item and is instructed to choose the option(s) that best matches the student’s writing sample. Only test administrators rate writing samples, and their item responses are used to determine students’ mastery of linkage levels for writing and some language EEs on the ELA blueprint. We annually collect student writing samples to evaluate how reliably teachers rate students’ writing samples. However, due to the COVID-19 pandemic, interrater reliability ratings for writing samples collected during the 2019–2020 administration were postponed until 2021. For a complete description of writing testlet design and scoring, including example items, see Chapter 3 of the 2015–2016 Technical Manual Update—Integrated Model (Dynamic Learning Maps Consortium, 2017a).
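When the postponed rating event yields independent second ratings, interrater agreement can be summarized with measures such as exact agreement and Cohen’s kappa. The sketch below illustrates such a computation with invented ratings; it is not the operational analysis.

```python
# Illustrative agreement summary for one writing-sample item: exact
# agreement and Cohen's kappa between the test administrator's rating
# and an independent second rating (invented data).
from sklearn.metrics import cohen_kappa_score

administrator = ["A", "B", "B", "C", "A", "B", "C", "C", "A", "B"]
second_rater  = ["A", "B", "C", "C", "A", "B", "B", "C", "A", "A"]

exact = sum(a == b for a, b in zip(administrator, second_rater)) / len(administrator)
kappa = cohen_kappa_score(administrator, second_rater)
print(f"Exact agreement: {exact:.2f}, Cohen's kappa: {kappa:.2f}")
```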

During the spring 2021 administration, five Instructionally Embedded model states opted to participate in writing sample collection. Teachers were asked to submit student writing samples within Educator Portal. Requested submissions included papers that students used during testlet administration, copies of student writing samples, or printed photographs of student writing samples. Each sample was submitted with limited identifying information so that it could be matched with test administrator response data from the spring 2021 administration.

Table 9.13 presents the number of student writing samples submitted in each grade and writing level. A total of 1,363 student writing samples were submitted from districts in five states. In several grades, the emergent writing testlet does not include any tasks that evaluate the writing sample; therefore, emergent samples submitted for these grades (e.g., grade 3 emergent writing samples) are not eligible for inclusion in the interrater reliability analysis. Additionally, writing samples that could not be matched with student data were excluded (e.g., a student name or identifier was not provided). These exclusion criteria resulted in 957 writing samples available for the evaluation of interrater agreement. Due to the suspension of on-site events during 2020–2021, the writing sample rating event for 2021 was postponed; these samples will instead be rated during the 2022 event.

Table 9.13: Number of Writing Samples Collected During 2020–2021 by Grade and Writing Level
Grade Emergent writing samples Conventional writing samples
  3     0 60
  4 109 49
  5     0 38
  6     0 52
  7 123 74
  8     0 81
  9   66 56
10   84 58
11   58 45
12     3   1

9.3 Evidence Based on Internal Structure

Analyses of an assessment’s internal structure indicate the degree to which “relationships among test items and test components conform to the construct on which the proposed test score interpretations are based” (American Educational Research Association et al., 2014, p. 16).

One source of evidence comes from the examination of whether particular items function differently for specific subgroups (e.g., male versus female). The analysis of differential item functioning (DIF) is conducted annually for DLM assessments based on the cumulative operational data for the assessment. For example, in 2019–2020, the DIF analyses were based on data from the 2015–2016 through 2018–2019 assessments. Due to the cancellation of assessment during spring 2020, additional data for DIF analyses were not collected in 2019–2020. Thus, updated DIF analyses are not provided in this update, as there are no additional data to contribute to the analysis. For a description of DIF results from 2019–2020, see Chapter 9 of the 2019–2020 Technical Manual Update—Instructionally Embedded Model (Dynamic Learning Maps Consortium, 2020).
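For reference, DIF screening for dichotomous items is commonly performed with a logistic regression approach, comparing nested models with and without group and group-by-score terms. The sketch below uses simulated data and illustrative variable names; it is not the operational DLM analysis.

```python
# Logistic regression DIF screen for one dichotomous item: likelihood
# ratio tests comparing nested models flag uniform DIF (adding group)
# and non-uniform DIF (adding the group-by-score interaction).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
total_score = rng.integers(0, 30, size=n)
group = rng.choice(["female", "male"], size=n)
# Simulate item responses that depend only on total score (i.e., no DIF).
prob = 1 / (1 + np.exp(-(0.2 * total_score - 3)))
item = rng.binomial(1, prob)

df = pd.DataFrame({"item": item, "total_score": total_score, "group": group})

m0 = smf.logit("item ~ total_score", data=df).fit(disp=0)
m1 = smf.logit("item ~ total_score + C(group)", data=df).fit(disp=0)
m2 = smf.logit("item ~ total_score * C(group)", data=df).fit(disp=0)

print(2 * (m1.llf - m0.llf))  # uniform DIF test statistic (chi-square, 1 df)
print(2 * (m2.llf - m1.llf))  # non-uniform DIF test statistic (chi-square, 1 df)
```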

Additional evidence based on internal structure is provided across the linkage levels that form the basis of reporting. This evidence is described in detail in Chapter 5 of this manual.

9.4 Conclusion

This chapter presents additional studies as evidence for the overall validity argument for the DLM Alternate Assessment System. The studies are organized by source of validity evidence (test content and response processes), as defined by the Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014), the professional standards used to evaluate educational assessments.

The final chapter of this manual, Chapter 11, references evidence presented throughout the technical manual, including Chapter 9, and expands the discussion of the overall validity argument. Chapter 11 also identifies areas for further inquiry and ongoing evaluation of the DLM Alternate Assessment System, building on the evidence presented in the 2014–2015 Technical Manual—Integrated Model (Dynamic Learning Maps Consortium, 2016a) and the subsequent annual technical manual updates (Dynamic Learning Maps Consortium, 2017a, 2017b, 2018, 2019, 2020), in support of the assessment’s validity argument.