Patterns of Literacy Achievement
The final step in our work has been to study patterns of literacy achievement in the schools under study. To do so, SII researchers followed two cohorts of students, examining differences in achievement growth among students in the four quasi-experimental groups. One cohort was followed as it passed from kindergarten through second grade, with SII researchers assessing these students’ achievement in the spring of kindergarten, the fall and spring of first grade, and the fall and spring of second grade. A second cohort was followed as it passed from third through fifth grade and was assessed during the fall and spring of third, fourth, and fifth grades. The kindergarten cohort included approximately 3,600 students, while the third-grade cohort included approximately 4,000 students. Students in the sample analyzed here were enrolled in a total of 114 schools: 28 ASP schools, 31 AC schools, 29 SFA schools, and 26 comparison schools.
SII researchers used the TerraNova assessment published by CTB/McGraw-Hill to measure students’ growth in literacy achievement. This assessment produced two literacy scale scores for students at each administration—a Reading Comprehension score and a Language score. The Reading Comprehension scale charted students’ academic growth as they moved beyond basic oral and reading comprehension to a level at which they were able to analyze and evaluate more extended text by employing various reading comprehension strategies. The Language scale charted the degree to which students moved from a basic understanding of sound/symbol relationships to a more complex understanding of the structure of the English language. In both the lower and upper grades, the findings from the Language scale closely mirror the findings from the Reading Comprehension scale. As a result, in this paper, we discuss only the findings from the Reading Comprehension scale. (Student achievement data for reading and mathematics can be downloaded here.)
The sample of students assessed during the course of the study was more disadvantaged than a representative sample of U.S. students, reflecting the fact that schools in the SII sample were disproportionately drawn from high- and medium-poverty neighborhoods in order to study the effects of specific instructional interventions on student achievement in high-poverty settings. For example, over half of the students in the SII sample were African American, while another 19% were Hispanic. Moreover, over half the parents of SII students had only a high school education or less, and over 40% of the students’ mothers were single parents. Finally, substantial percentages of students in the SII sample were at risk of school failure. For example, 20% of SII students received services for learning difficulties, 18% received special education services, and 13% repeated a grade early in their elementary school career. Although schools in the SII sample varied in their degree of disadvantage, on average the schools served a highly disadvantaged student population. (In a previous section, we detailed the sample characteristics.)
Achievement Growth Models
In order to examine general patterns in reading achievement in the schools under study, we have been fitting a series of three-level hierarchical linear models (HLM) in which multiple test scores per student are nested within students, who are in turn nested within schools (examples here). In these analyses, we have been especially interested in examining differences in rates of literacy achievement growth between each set of CSR schools and the set of comparison schools. In the evaluation literature, such analyses are commonly referred to as “intent-to-treat” models, since they examine achievement patterns for all schools nominally involved with a CSR program, regardless of their level of implementation. Since schools were not randomly assigned to treatments, we also have been using propensity score stratification to statistically equate schools on 34 observed pre-treatment characteristics and then match the schools (using optimal matching) based on their propensity to have received treatment (discussed previously, here). Under the assumption of strongly ignorable treatment assignment, the average treatment effect in our statistical models is determined by pooling within-stratum treatment effects—the difference in mean rates of achievement growth between treated and untreated schools with similar pre-treatment characteristics. We caution the reader that the results presented here are preliminary and have yet to be peer-reviewed.
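As a sketch of the general form such growth models take (a minimal illustration, not the exact SII specification, which also included student- and school-level covariates and the propensity strata):

$$
\begin{aligned}
\text{Level 1 (test occasions):}\quad & Y_{tis} = \pi_{0is} + \pi_{1is}\,\text{Time}_{tis} + e_{tis}\\
\text{Level 2 (students):}\quad & \pi_{0is} = \beta_{00s} + r_{0is}, \qquad \pi_{1is} = \beta_{10s} + r_{1is}\\
\text{Level 3 (schools):}\quad & \beta_{00s} = \gamma_{000} + u_{00s}, \qquad \beta_{10s} = \gamma_{100} + \gamma_{101}\,\text{CSR}_{s} + u_{10s}
\end{aligned}
$$

Here $Y_{tis}$ is the reading score at occasion $t$ for student $i$ in school $s$, and the coefficient $\gamma_{101}$ captures the difference in mean growth rates between CSR and comparison schools, which is the quantity of interest in the intent-to-treat analyses.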
We also have been examining differences between each set of CSR schools and all other schools in our sample (for example, SFA schools versus comparison, ASP, and AC schools). Once again, we are using propensity score stratification to statistically equate schools on 34 pre-treatment covariates and to match them using an optimal matching program. In many ways this comparison is superior to the one between each set of CSR schools and the set of comparison schools, not only because the larger sample of schools provides for better matches between treated and untreated schools, but also because some of the schools in our so-called “comparison” group participated in a variety of whole-school reform programs (e.g., Expeditionary Learning/Outward Bound, Direct Instruction, etc.).
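To make the logic of this adjustment concrete, the sketch below illustrates propensity score stratification with a pooled within-stratum treatment effect. It is a minimal illustration, not the SII code: the variable names are hypothetical, and it uses simple quintile strata rather than the optimal matching procedure described above.

```python
# Illustrative sketch of propensity score stratification (not the SII code).
# Assumes a pandas DataFrame `schools` with a 0/1 treatment indicator `treat`,
# a school-level growth estimate `growth`, and pre-treatment covariates X_cols.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def pooled_stratum_effect(schools, X_cols, n_strata=5):
    # 1. Estimate each school's propensity to receive the treatment
    #    from its observed pre-treatment covariates.
    model = LogisticRegression(max_iter=1000)
    model.fit(schools[X_cols], schools["treat"])
    schools = schools.assign(pscore=model.predict_proba(schools[X_cols])[:, 1])

    # 2. Stratify schools into groups with similar propensity scores.
    schools["stratum"] = pd.qcut(schools["pscore"], q=n_strata, labels=False)

    # 3. Within each stratum, compare mean growth for treated vs. untreated
    #    schools; pool the differences, weighting by stratum size.
    effects, weights = [], []
    for _, s in schools.groupby("stratum"):
        treated, untreated = s[s["treat"] == 1], s[s["treat"] == 0]
        if len(treated) and len(untreated):
            effects.append(treated["growth"].mean() - untreated["growth"].mean())
            weights.append(len(s))
    return np.average(effects, weights=weights)
```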
To date, the findings of these analyses follow logically from our discussion to this point. For example, we have found thus far that the school improvement strategy followed in ASP schools failed to produce instructional practices that were different from those in comparison schools. Given this lack of instructional differences, it is not surprising that patterns of achievement in ASP schools were also indistinguishable from patterns of achievement in comparison schools: for both cohorts of students (K-2 and 3-5), our preliminary analyses failed to find any significant differences in students’ rates of achievement growth over time across ASP and comparison schools—or across ASP and all other schools.
In contrast, we have found differences in patterns of achievement between SFA and comparison schools (under certain model conditions) and between SFA and all other schools. In our preliminary analyses, these differences are most apparent for the lower grades cohort, but their magnitude varies depending on the statistical adjustments used in the models. In a model with no controls other than the propensity score, for example, SFA students gain about 6 points more than comparison students over the two-year interval from the spring of kindergarten to the spring of second grade. After adjusting for patterns of student mobility in the schools under study, however, the SFA advantage over comparison schools during the same interval increases to more than 10 points. A similar 10-point advantage holds when SFA students are compared with all other students in AC, ASP, and comparison schools. This SFA advantage is especially impressive considering that students in the SII study gained in percentile rank over this interval relative to the norming population. An average student in a comparison school who began our study at the 30th percentile, for example, finished second grade at about the 40th percentile. The SFA effect moved a comparable student in the average SFA school from the 30th percentile to the 50th percentile.
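The percentile arithmetic here can be illustrated with a back-of-the-envelope normal approximation. The norming mean and standard deviation below are hypothetical placeholders, not the actual TerraNova norms; the point is only that a roughly 10-point scale-score advantage can move a student from about the 40th to about the 50th percentile when scores spread on the order of 40 points.

```python
# Back-of-the-envelope percentile arithmetic (illustrative values only;
# not the actual TerraNova norming tables).
from scipy.stats import norm

norm_mean, norm_sd = 630.0, 40.0   # hypothetical end-of-second-grade norms

comparison_score = norm.ppf(0.40, norm_mean, norm_sd)  # ends near 40th percentile
sfa_score = comparison_score + 10                      # add the ~10-point SFA advantage

print(round(norm.cdf(sfa_score, norm_mean, norm_sd) * 100))  # ~50th percentile
```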
In a similar vein, our preliminary analyses have found statistically significant differences in patterns of achievement growth for students in AC schools in the upper grades (see the HLM model here). Students in AC schools grew at a significantly faster rate than students in comparison schools and faster than students in all other schools. From the beginning of third grade to the end of fifth grade, for example, our analyses suggest that students in AC schools, on average, scored an additional 9 to 12 points on the reading comprehension outcome, depending on the model adjustments (for HLM output, see example 1, example 2, and example 3). The size and interpretation of the AC effect on reading comprehension are similar to those of the effect found for SFA schools in early-grades reading, except that in the upper grades cohort, students in the SII were losing ground relative to the norming population. For example, our statistical models suggest that the average student in a comparison school who began third grade at about the 40th percentile nationally ended the study at about the 30th percentile; by contrast, our models suggest that the equivalent student in the average AC school who began third grade at the 40th percentile would achieve at or above the 40th percentile nationally at the end of fifth grade.
We should note that the effects on reading comprehension for SFA schools in the lower grades and for AC schools in the upper grades represent the average “intent-to-treat” effects of those interventions. We also have been examining whether exposure to the treatment has influenced student growth in our schools. In preliminary work, we have been able to demonstrate effects of exposure to the treatment in a number of ways. For example, we examined the degree of implementation at each school by examining the organizational, instructional, and staff development profiles of schools and the degree to which they reflected the aims of the SFA and AC designs, respectively. Using this strategy, we have found that, in both interventions, schools with better implementation scores in fact show higher rates of achievement growth. Additionally, we observed an implementation effect for SFA schools in the upper grades. Within our sample, a number of SFA schools indicated that they participated in Roots and Wings, while others indicated they did not. Thus, while all of these schools considered themselves SFA schools, the Roots and Wings schools were more resource intensive, especially in the upper grades, where the Wings curriculum (grades 2-6) is enacted immediately upon completion of Reading Roots (grades K-2). Indeed, in the upper grade models, students in SFA schools that participated in Roots and Wings showed greater achievement growth than students in other SFA schools, and their growth was also significantly greater than that of students in comparison schools.
Additionally, it has become apparent in our work that student mobility plays a large role in moderating the effects of the CSR programs on student achievement. First, we have shown that the overall effects of both SFA in the lower grades and AC in the upper grades increase when the statistical models adjust for student mobility, demonstrating that students who stay in treated schools for a longer period of time make greater gains in achievement. Second, these same effects can be demonstrated when a variable indicating students’ entry into the study at the initial starting point is entered into the model: students remaining longer in an SFA school (in the lower grades) or an AC school (in the upper grades) show significantly higher achievement growth. Readers may examine a spreadsheet demonstrating mobility effects on HLM model outcomes for AC schools here. Third, we have run statistical models adjusting for student demographic characteristics and for school-level propensity strata (but without the CSR program variables entered into the analysis; see Figure 2, Figure 3, and Figure 4). We then examined student-level residuals, calculating the model-predicted gains for each student. When we examined bar plots with 95% confidence intervals for SFA versus other students in the lower grades and for AC versus other students in the upper grades, we found that students making the greatest gains were less mobile and located in schools with less student mobility. More importantly, the magnitude of the CSR program effect was greatest for those same students, presumably because they received greater exposure to the treatment.
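The third check can be sketched as follows. Everything below is a synthetic stand-in (the SII models and measures are not reproduced); the sketch only illustrates computing per-student residual gains and then group means with 95% confidence intervals by program and mobility status.

```python
# Sketch of the residual-based check described above (all names and numbers
# are hypothetical; the SII models and data are not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins: model-predicted gains (from a model with demographic
# and propensity-stratum controls but no CSR indicators), a CSR indicator,
# and a student-mobility indicator.
predicted = rng.normal(40, 8, n)       # gain predicted from the controls
csr = rng.integers(0, 2, n)            # 1 = student in a CSR school
mobile = rng.integers(0, 2, n)         # 1 = student changed schools
observed = predicted + 6 * csr * (1 - mobile) + rng.normal(0, 10, n)

residual = observed - predicted        # gain unexplained by the controls

def mean_ci(x, z=1.96):
    """Mean residual gain with a 95% confidence interval."""
    x = np.asarray(x, float)
    half = z * x.std(ddof=1) / np.sqrt(len(x))
    return x.mean(), x.mean() - half, x.mean() + half

for label, mask in [("CSR, stable", (csr == 1) & (mobile == 0)),
                    ("CSR, mobile", (csr == 1) & (mobile == 1)),
                    ("other, stable", (csr == 0) & (mobile == 0)),
                    ("other, mobile", (csr == 0) & (mobile == 1))]:
    m, lo, hi = mean_ci(residual[mask])
    print(f"{label:14s} mean residual gain {m:5.1f}  95% CI [{lo:5.1f}, {hi:5.1f}]")
```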
These results are important for two reasons. First, the fact that the CSR program effect varies with exposure to the treatment strengthens our causal arguments about the effects of program designs on achievement gains. Since students varied in their treatment dosages, and since higher dosages tended to predict higher achievement gains, we can infer that the treatment is more likely than not the causal agent producing the achievement gains. Second, the results show that the potential for achievement growth after the adoption of one of these interventions is greater than the average “intent-to-treat” effect if schools implement the design faithfully. In addition, our results show that the “intent-to-treat” effect is a lower bound on what students would gain if they were present to receive the treatment in successive years. Consider, for example, that the SFA effect in the lower grades is averaged over all SFA students in the model. Of these students, only 29.6% received SFA-like instruction in both first and second grade. Another 15.4% were present both years but had only one year of SFA-like instruction, while only 4.1% were present both years and received instruction very similar to that of students in comparison schools in both years of the study. About half of the SFA students were present for only one year of the study: 37.2% of the SFA students in our achievement models received SFA-like instruction in the one year they were present, while the remaining 13.2% received instruction very similar to that received by students in the comparison schools in the one year they were present.1
Thus, even for an intervention with high rates of implementation fidelity, the transfer of the treatment to individual students over successive years reaches less than a third of all SFA students in our achievement models. When we examine “intent-to-treat” effects we must keep in mind that not only is implementation fidelity incomplete (a condition that is potentially manipulable by CSR design), but also that student mobility severely limits the treatment dosage received by students (a condition that is less easily manipulated by CSR programs or, indeed, by social policies).
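A rough way to see the force of this point is to view the “intent-to-treat” effect as a dosage-weighted mean. In the sketch below, the group shares come from the figures above; the per-group effect sizes are hypothetical placeholders, not SII estimates.

```python
# Rough illustration: the "intent-to-treat" effect as a dosage-weighted
# average. Group shares come from the text above; the per-group effect
# sizes are hypothetical placeholders, not SII estimates.
dosage_groups = [
    # (share of SFA students, hypothetical effect in scale-score points)
    (0.296, 12.0),   # SFA-like instruction in both 1st and 2nd grade
    (0.154,  6.0),   # present both years, one year of SFA-like instruction
    (0.041,  0.0),   # present both years, comparison-like instruction
    (0.372,  6.0),   # present one year, SFA-like instruction that year
    (0.132,  0.0),   # present one year, comparison-like instruction
]

itt = sum(share * effect for share, effect in dosage_groups)
print(f"implied intent-to-treat effect: {itt:.1f} points")  # well below 12.0
```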
1 Daily log data submitted by teachers were used in a discriminant analysis to identify teachers as being either AC-like, SFA-like, ASP-like, or comparison-like in their instruction (see Rowan and Correnti, 2007). Once a teacher’s instruction was identified as belonging to one of these groups, the instructional type was assigned back to that teacher’s individual students, and we then constructed instructional profiles for students across the years of the study. The discriminant analysis correctly identified 76% of SFA teachers and 62% of AC teachers, but only 44% of ASP teachers and 36% of comparison-school teachers. These data show that even when intervention programs produce strong effects on instruction (as did SFA and AC), many teachers do not implement the intended instructional regimes faithfully. The analysis also shows that many teachers in both comparison and ASP schools implemented patterns of instruction that were close to the preferred regimes of SFA and AC.
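A minimal sketch of this classification step appears below. The features and data are synthetic (the actual log measures are described in Rowan and Correnti, 2007); the sketch only shows a linear discriminant analysis classifying teachers and reporting the share of each program’s teachers classified as teaching in that program’s style, analogous to the 76%/62%/44%/36% figures above.

```python
# Minimal sketch of classifying teachers by instructional profile with a
# linear discriminant analysis (synthetic data; not the SII log measures).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
labels = np.repeat(["SFA", "AC", "ASP", "comparison"], 50)

# Hypothetical per-teacher features aggregated from daily logs, e.g. rates
# of word-analysis, comprehension-strategy, and writing instruction.
centers = {"SFA": [0.60, 0.20, 0.20], "AC": [0.20, 0.30, 0.50],
           "ASP": [0.35, 0.35, 0.30], "comparison": [0.40, 0.30, 0.30]}
X = np.vstack([rng.normal(centers[g], 0.15) for g in labels])

lda = LinearDiscriminantAnalysis().fit(X, labels)
predicted = lda.predict(X)

# Share of each program's teachers classified as teaching in that
# program's preferred style.
for g in ["SFA", "AC", "ASP", "comparison"]:
    hit = np.mean(predicted[labels == g] == g)
    print(f"{g:10s} correctly classified: {hit:.0%}")
```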