Christopher H. Tienken, Ed.D.
Seton Hall University
The education blogosphere and social media are filled with commentaries about standardized testing. New state-mandated tests in Grades 3-8 and high school are being administered across the country during the 2014-2015 school year. The Smarter Balanced Assessment Consortium (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC) are two examples of the new breed of computer-based assessments aligned with the Common Core State Standards. Some pundits and education bureaucrats have heralded those tests and other state tests as the new frontier of assessment.
Assertions abound about the ability of the results from the new state standardized tests to provide fine-grained and actionable information about student learning and teacher effectiveness. There are even claims that the results from the new batch of standardized tests can categorize students in elementary school as college and career ready.
But what does the evidence say about these and other popular claims related to how the results from the new tests can be effectively used by school administrators? What responsibility do professors of education leadership / school administration have in engaging in a thorough critique of the claims with their leadership candidates?
In this commentary I address four common assertions made about the usefulness of the results from tests like SBAC and PARCC and suggest that professors have a responsibility to facilitate evidence-based discussions with their leadership candidates about the utility of results from standardized tests to make important decisions about teaching and learning.
Assertion #1: The results from the state mandated, Common Core aligned tests will be diagnostic and provide parents, teachers, school administrators, bureaucrats, and policy makers important information about student learning and the quality of the teaching that public school children receive.
Assertion #1 is not validated by the literature on diagnostic testing. As colleagues and I have written elsewhere, in order to provide diagnostic information about an individual student’s mastery of any one skill, the test results must have reliability figures of around .80 to .90. To attain that level of reliability there must be about 20-25 questions per skill (Frisbie, 1988; Tanner, 2001). Keep in mind there are multiple skills embedded in each Common Core standard, so the PARCC, SBAC, or other state tests would need hundreds of questions just to fully assess a few standards. In fact, some of the Common Core standards have 10-15 skill objectives embedded in them, requiring upwards of 200 questions to assess a single standard.
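The arithmetic behind those figures can be made explicit. Taking the 20-25 items-per-skill guideline cited above as a rough lower and upper bound:

```latex
\text{items needed per standard} \approx (\text{skills per standard}) \times (\text{items per skill})
```

So for a standard with 10-15 embedded skill objectives, a test would need roughly $10 \times 20 = 200$ to $15 \times 25 = 375$ items to reach diagnostic-level reliability for that one standard alone; far more than any state test contains.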
The tests do not have enough questions to diagnose student mastery at the individual level for any of the skills or standards. Thus, any “diagnostic” decisions made from state standardized test results about a student’s mastery of specific standards will be potentially flawed.
Another issue related to diagnostic testing is the time frame in which the results will be received by teachers, school administrators, and parents. Most results from state standardized tests given in the spring will not be returned until the end of the school year or during the summer months. How is that diagnostic or informative? Do you wait three to five months for results from your primary care doctor? Diagnostic information is generally data or information received within a short period of time so that adjustments can be made to intervention protocols immediately.
Consider further that teachers, parents, school administrators, and students will not be able to see every question from their state mandated tests. Some states are releasing a small number of questions whereas other states are not releasing any actual test items. How can teachers or parents “diagnose” needs if they do not know which questions the students answered correctly or incorrectly, or if they cannot see the actual student answers to the questions? At that point the process becomes guessing, not diagnosing.
It would be similar to a situation in which a child’s classroom teacher sent home a grade from a recent classroom test, provided only 10% or 20% of the questions from the test, and did not let the parents see the child’s answers to those questions. How is that diagnostic?
Shouldn’t school administrator candidates understand basic principles of diagnostic assessment? Isn’t it important that candidates clearly understand the limitations and appropriate uses of state mandated test results as tools to diagnose student learning?
Vendors of state mandated tests opine that the results can provide stakeholders important information about the quality of a student’s teacher and the academic achievement of students.
Regardless of what proponents of using state test results claim about the quality of information gained from testing, the results from standardized tests most often provide more information about the family and community economic environments in which a student lives than about how much a student knows or how well a teacher teaches. Colleagues and I have been able to predict results from standardized tests in New Jersey, Connecticut, Michigan, and Iowa with a good deal of accuracy. Much of a standardized test score can be accounted for by factors outside the control of school personnel (e.g., Maylone, 2002; Sackey, 2014; Tienken, 2015; Wilkins, 1999).
Through a series of cross-sectional and longitudinal studies completed in New Jersey since 2011, my colleagues and I have begun the process of demonstrating the predictive accuracy of family and community demographic variables in Grades 3-8, and high school.
For example, in New Jersey our best models predicted the percentage of students scoring proficient or above on the former Grade 6 NJASK tests in 70% of the districts for the language arts portion of the test and in 67% of the districts for the math portion in our sample of 389 school districts (Tienken, 2014). We accurately predicted the percentage of students scoring proficient or above on the Grade 7 language arts tests for 77% of the districts and 66% of the districts in math for our statewide sample of 388 school districts. We have had similar results in grades 3-8 and 11 in NJ and other states (e.g., Turnamian & Tienken, 2013).
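The kind of district-level prediction described above can be illustrated with a minimal sketch. The studies cited used real demographic variables (such as household income and parental education levels) in multiple-regression models; the single predictor and all numbers below are synthetic and purely illustrative.

```python
# Minimal sketch: predicting district proficiency rates from a single
# community demographic variable via ordinary least squares.
# All data are synthetic and illustrative only.

def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares; return (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical districts: percent of households below the poverty line (x)
# and percent of students scoring proficient or above (y).
poverty = [5, 10, 15, 20, 25, 30, 35]
proficient = [90, 84, 77, 72, 64, 58, 51]

a, b = ols_fit(poverty, proficient)

# Predict each district's proficiency rate from its poverty rate alone,
# then count how many predictions land within 5 percentage points.
predicted = [a + b * p for p in poverty]
within_5_points = sum(abs(pr - ob) <= 5
                      for pr, ob in zip(predicted, proficient))

print(f"slope: {b:.2f} proficiency points per point of poverty")
print(f"{within_5_points} of {len(poverty)} districts predicted within 5 points")
```

The point of the sketch is not the particular numbers but the structure of the argument: if a variable that school personnel do not control predicts district scores this closely, the scores say little about teaching quality.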
Should school administration candidates graduate preparation programs knowing that the results from commercially prepared standardized tests are influenced heavily by factors outside the control of teachers? Should they know not to use the results from one test to make important decisions about students or teachers?
Another common assertion made about state standardized tests is that the results will be able to indicate whether students in grades 3-8 and high school are college and career ready.
As a parent and a professor of education leadership I find the assertion stunning. The idea that results from one test can provide that level of predictive information is incredible, especially given that standardized tests like the SAT cannot even predict very accurately which students will do well during their first year of college and beyond (Atkinson & Geiser, 2009). In fact, a student’s high school GPA is generally a more accurate predictor of first-year college success and completion (College Board, 2012), yet bureaucrats and some school administrators claim that the results from their state tests will be able to tell the parent of a 9-year-old whether her son is on the path for college or a career.
Claiming that a test score from a state mandated test indicates whether a child is college or career ready is like professing the sun revolves around the Earth. Is it not reasonable to expect that school administration graduates know that the standardized testing “sun” does not revolve around the Earth?
Proponents of state standardized tests aligned to the Common Core assert that the Core is a more rigorous set of standards than previous state standards and hence that the tests are the most rigorous tests administered in public schools to date.
The claims of enhanced rigor most often come from one privately funded report by a pro-Common Core think tank (Carmichael et al., 2010). Sure, some of the Common Core standards might be more “rigorous” than some of the previous standards in some states. But most often people who make the claim of enhanced rigor are looking myopically at the verbs used in the standards. Verbs like “analyze” appear in some of the standards, but when one reviews the standards closely, one notices that students are analyzing for a single correct answer; hardly divergent, creative, innovative, or open-ended thinking.
In fact, much of the Core Standards and many of the questions on the new state standardized tests require students to find one correct answer. Many of the tests, like PARCC and SBAC, attempt to achieve the claim of increased rigor by inflating the complexity of the questions through the use of contrived directions and hard-to-follow tasks.
Should school administration candidates be expected to become critical consumers of information and dig below the headlines to review the substance of claims regarding Common Core rigor and the technical quality of the new state mandated tests and their results? Is it too much for parents to ask their school administrators to have an understanding and knowledge of the strengths and weaknesses of the interventions, such as curriculum and assessment products, that they impose upon their children?
Serving It Up
There seems to be no shortage of curricular Kool-Aid being served by proponents of standardization and testing. Is it acceptable for school administration candidates to leave our preparation programs and parrot inaccurate or incomplete information they hear from education bureaucrats or other sources? Do professors of education administration have a professional obligation to facilitate their candidates’ learning of the critical thinking, critique, and research skills necessary to be able to identify the standardization and assessment Kool-Aid? If we don’t provide those skills, who will?
References
Atkinson, R. C., & Geiser, S. (2009). Reflections on a century of college admissions tests. Educational Researcher, 38(9), 665-676.
Carmichael, S. B., Martino, G., Porter-Magee, K., & Wilson, S. (2010). The state of state standards and the Common Core in 2010. Washington, D.C.: Thomas B. Fordham Institute.
College Board. (2012). 2012 college-bound seniors. Total group profile report. Author. Retrieved from http://research.collegeboard.org/programs/sat/data/archived/cb-seniors-2012
Frisbie, D.A. (1988). Reliability of scores from teacher-made tests. Educational Measurement: Issues and Practice, 7(1), 25-35.
Maylone, N. (2002, June). The relationship of socioeconomic factors and district scores on the Michigan Educational Assessment Program tests: An analysis (Unpublished doctoral dissertation). Eastern Michigan University, Ypsilanti.
Sackey, A. N. L. (2014). The influence of community demographics on student achievement on the Connecticut Mastery Test in mathematics and language arts in grades 3 through 8 (Unpublished doctoral dissertation). Seton Hall University. Retrieved from http://scholarship.shu.edu/cgi/viewcontent.cgi?article=3033&context=dissertations
Tanner, D. E. (2001). Assessing academic achievement. Boston, MA: Allyn and Bacon.
Tienken, C. H. (in press). Standardized test results can be predicted: Stop using them to drive policy making. In Tienken & Mullen (Eds.), Education policy perils: Tackling the tough issues. New York, NY: Routledge.
Tienken, C. H. (2015). Parking the rhetoric on the PARCC [Blog post]. Retrieved from http://christienken.com/2015/01/24/parking-the-rhetoric-on-parcc/
Tienken, C. H. (2014). State test results are predictable. Kappa Delta Pi Record, 50(4), 154-156.
Turnamian, P. G., & Tienken, C. H. (2013). Use of community wealth demographics to predict statewide test results in Grade 3. In Mullen & Lane (Eds.), Becoming a global voice: National Council of Professors of Educational Administration Yearbook (pp. 134-146).
Wilkins, J. L. M. (1999). Demographic opportunities and school achievement. Journal of Research in Education, 9(1), 12-19.
Portions of this blog were adapted from my previous writing: Parking the Rhetoric on the PARCC (Tienken, 2015).