
Frequently asked questions

Who developed the assessment items, and how was the work funded?

More than 20 AAAS staff, scores of reviewers, over a thousand teachers, and more than 150,000 students have been involved in this effort since work began in 2004. The Project 2061 research and development team was led by George DeBoer, principal investigator and deputy director of Project 2061, and by Jo Ellen Roseman, co-principal investigator and director of Project 2061. A high premium was placed on content accuracy, so item development in each topic area was coordinated by a research associate with a Ph.D. in a discipline closely related to that topic. At any one time, we typically had one research associate in the physical sciences, one in the earth sciences, and two in the life sciences working on the project. In addition, all of the programming for the website and for the database where misconceptions, clarification statements, and drafts of items were stored was developed in-house. Horizon Research, Inc., served as evaluator for the project. A list of contributors (past and present) and their roles appears under Acknowledgments below. The initial work was supported by a grant from the National Science Foundation’s Division of Elementary, Secondary, and Informal Education in the Education and Human Resources Directorate (ESI grant # 03352473). Subsequent work was funded by additional grants from NSF and from the U.S. Department of Education.

Do the assessment items on the website cover the essential middle school science learning goals?

Providing comprehensive coverage of a content area, even for a single grade level, was well beyond the scope of this project. Instead, we wrote assessment items for a small number of key ideas that are fundamental to an understanding of topics widely taught in schools. Topics such as chemical reactions, interdependence in ecosystems, evolution, plate tectonics, and energy are covered in all state and national standards documents, and the specific knowledge of those topics that we tested is of central importance, but we make no claim that our coverage is comprehensive for any of those topics.

How many assessment items are available for each science idea?

The range is large. For example, under the Cells topic, for the key idea that “cells in multicellular organisms repeatedly divide to make more cells for growth and repair,” there is only a single item. But for the key idea that “although there are many different types of cells in terms of size, structure, and function, all cells have certain characteristics in common,” there are 31 items. We did not write assessment items for every aspect of the science ideas we were targeting, and we did not try to provide equal coverage of all those ideas. The items that we wrote are meant to illustrate the kinds of test items that could be used diagnostically to find out what students know and the misconceptions they hold.

In designing the assessment items, how did the developers decide on the specific knowledge and skills that students would be expected to have for each science idea?

One of the most challenging aspects of assessment design is to be as clear as possible about what the expectations are for students. Before we developed items for each topic, we first identified critical knowledge and skills—what we call key ideas—from the relevant content standards in Benchmarks for Science Literacy and in National Science Education Standards. Later work made use of the Next Generation Science Standards. We also reviewed the research literature on student learning to identify common misconceptions that students are likely to have and the learning problems they encounter with certain science ideas. With that information in hand, we then selected a set of ideas and crafted them into a coherent story, spelling out as specifically as possible what would be tested by the items, what was beyond the scope of those ideas, and the level of content sophistication that would be expected of middle school students. When writing these clarification statements, we considered the mental model we wanted students to have about a science idea and whether all parts of the idea held together to create a coherent whole that would be meaningful to the students.

Why does the science assessment website include only multiple-choice items?

Our decision to focus on multiple-choice items is largely a pragmatic one: Multiple-choice testing is the most widely used type of student assessment at all levels of education and is likely to remain so. Therefore, our goal has been to improve on the design and use of multiple-choice assessment in science education. Although there is a widely held view that multiple-choice items are useful only for testing recall of memorized definitions and trivial facts, we contend that if they are well-designed, multiple-choice items can be used effectively to probe students’ ability to explain or predict real-world phenomena or to analyze the reasons why a scientific claim might be true or not. Indeed, multiple-choice items, especially if they include misconceptions as answer choices, have the advantage of focusing students’ attention on a particular aspect of the knowledge being targeted. Open-ended items, on the other hand, often produce answers that are unrelated to the knowledge being assessed, making it difficult to use the results diagnostically. Nevertheless, our focus on multiple-choice items should not be misinterpreted as an endorsement of them as the best form of science assessment. Well-designed performance tasks, interactive online testing, and free-response items provide information that multiple-choice items cannot provide. In fact, there are several constructed response items included in the Evolution Project materials. Any type of assessment can be used effectively if developers attend to the purposes that the assessment will serve and adhere to fundamental principles of design.

How were the assessment items piloted and field tested?

Our item development process involved a two-year iterative cycle of design and revision, including national pilot and field testing. During pilot testing, about 100 middle school students and 100 high school students responded to each item. During field testing, about 1000 middle school students and 1000 high school students responded to each item. Students who took part in pilot testing received eight items and were asked to provide written explanations for why each answer choice was correct or incorrect. Students were also asked whether they had guessed or not, whether anything about the item was confusing or unfamiliar, and where they had learned about the idea that was being tested. After addressing any issues revealed by the pilot testing, revised versions of the items were then field tested with a larger national sample to determine the psychometric properties of the items. Field tests were composed of approximately 30 items. Data from field tests were then analyzed to identify any items that did not perform as expected based on various statistical analyses. Any problematic items were discarded from the item bank.
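The statistical analyses are not spelled out above, so the following is only a minimal sketch of the kind of screening such a field test might involve, using two widely used classical test theory measures: item difficulty (the proportion of students answering correctly) and point-biserial discrimination (how well an item separates higher- and lower-scoring students). The response matrix, thresholds, and flagging rule are all hypothetical, not the project's actual criteria or code.

```python
# A minimal sketch, not the project's actual analysis code. It assumes a simple
# 0/1 response matrix and uses two common classical-test-theory statistics --
# item difficulty (proportion correct) and point-biserial discrimination -- to
# show how field-test data might be screened for items that do not perform as
# expected. The data and thresholds below are illustrative only.
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical field-test form: ~1000 students responding to a ~30-item form.
responses = rng.integers(0, 2, size=(1000, 30))   # 1 = correct, 0 = incorrect

difficulty = responses.mean(axis=0)                # proportion correct per item
totals = responses.sum(axis=1)                     # each student's total score

def point_biserial(item_scores, total_scores):
    """Correlation between an item and the rest of the test (item excluded)."""
    rest = total_scores - item_scores
    return float(np.corrcoef(item_scores, rest)[0, 1])

discrimination = np.array(
    [point_biserial(responses[:, j], totals) for j in range(responses.shape[1])]
)

# Flag items that look too easy, too hard, or weakly discriminating.
for j, (p, r) in enumerate(zip(difficulty, discrimination)):
    if p < 0.20 or p > 0.90 or r < 0.20:
        print(f"Item {j:2d}: difficulty = {p:.2f}, discrimination = {r:.2f} -> review")
```

Operational analyses would of course be more involved (handling incomplete forms, multiple test forms, and possibly item response theory models), but the basic idea of flagging items that do not perform as expected is the same.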

Have the assessment items undergone bias review?

Although the items have not been reviewed by a formal bias review committee, during the development process we paid attention to the same issues that such a committee would consider. Bias review committees examine the curriculum and assessment materials that students encounter to make sure that no person will be offended or disadvantaged by the way the materials are presented and that language or context will not negatively affect the performance of any group of students. Our review protocol included criteria such as the following: “The context should not advantage or disadvantage students because of their interest or familiarity with the context,” and “Reviewers should look for unfamiliar general vocabulary and language that may not be familiar to poor readers or students whose first language is not English.” We also decided not to include names of fictitious people in the assessment items. All persons are referred to as “student,” “scientist,” etc., so that students can assign whatever race, ethnicity, or gender they want in their own minds. If a person’s gender is mentioned in an item, approximately equal numbers of items refer to males and to females. Finally, we included data on how well males and females, and students whose primary language was and was not English, performed on each item so that users of these items could exclude an item if they felt a subgroup’s score on it was lower than expected.
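As an illustration of how a user might apply those subgroup data, here is a small sketch that flags items with a large performance gap. The item labels, percent-correct values, and the 15-point cutoff are hypothetical, not figures taken from the website.

```python
# A minimal sketch of screening items by subgroup performance. The item labels,
# percent-correct values, and the gap threshold are hypothetical; the website
# reports its own field-test percentages for each item.
item_stats = {
    "ITEM_A": {"male": 58, "female": 57, "english_primary": 60, "english_not_primary": 41},
    "ITEM_B": {"male": 72, "female": 70, "english_primary": 73, "english_not_primary": 69},
}

GAP_THRESHOLD = 15  # percentage points; an assumed cutoff a teacher might choose

for item, pct in item_stats.items():
    gender_gap = abs(pct["male"] - pct["female"])
    language_gap = abs(pct["english_primary"] - pct["english_not_primary"])
    if gender_gap >= GAP_THRESHOLD or language_gap >= GAP_THRESHOLD:
        print(f"{item}: gender gap {gender_gap} pts, language gap {language_gap} pts -> consider excluding")
```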

What makes the assessment items developed for this website particularly useful for diagnostic purposes?

The assessment items on this website have been designed using protocols, including multiple levels of expert review, that ensure an extremely close match between an item and a specific science learning goal. As a result, there is a high probability that students who answer correctly have the knowledge that an item is testing and, conversely, that students who answer incorrectly do not have that knowledge. In addition to the close alignment of the items with the targeted learning goals, the incorrect answer choices used in the items are based on relevant misconceptions that many students have about the targeted science idea. When students choose one of these misconceptions as their (incorrect) answer, teachers are better able to diagnose the nature of their students’ learning difficulties and target instruction to overcome those difficulties. Finally, clustering test items around a key idea and rank ordering the items by how many students in the national field testing answered correctly makes it easy for teachers to see at a glance the relative frequency of both the correct ideas and the misconceptions and to compare those data with the performance of their own students.
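The sketch below illustrates that idea with made-up numbers: items clustered under one key idea are rank ordered by national percent correct, and a class’s answer-choice tallies are placed alongside so the most common incorrect (misconception-based) choice stands out. Item labels, answer keys, percentages, and class responses are all hypothetical.

```python
# A minimal sketch with hypothetical data: rank order items by national percent
# correct and compare a class's results, including the most common incorrect
# (misconception-based) answer choice. Item IDs, keys, and numbers are made up.
from collections import Counter

national_pct = {                       # percent of field-test students choosing each option
    "ITEM_1": {"A": 62, "B": 18, "C": 12, "D": 8},
    "ITEM_2": {"A": 9,  "B": 21, "C": 55, "D": 15},
    "ITEM_3": {"A": 14, "B": 33, "C": 13, "D": 40},
}
answer_key = {"ITEM_1": "A", "ITEM_2": "C", "ITEM_3": "D"}

class_responses = {                    # one teacher's class, one letter per student
    "ITEM_1": ["A", "A", "B", "A", "D", "A", "B", "A"],
    "ITEM_2": ["C", "B", "B", "C", "C", "B", "C", "A"],
    "ITEM_3": ["D", "B", "B", "D", "B", "D", "B", "B"],
}

# Rank order the items by national percent correct (easiest first).
ranked = sorted(national_pct, key=lambda item: national_pct[item][answer_key[item]], reverse=True)

for item in ranked:
    key = answer_key[item]
    counts = Counter(class_responses[item])
    n = sum(counts.values())
    class_correct = 100 * counts[key] / n
    wrong = {choice: c for choice, c in counts.items() if choice != key}
    top_misconception = max(wrong, key=wrong.get) if wrong else None
    print(f"{item}: national {national_pct[item][key]}% correct, "
          f"class {class_correct:.0f}% correct, "
          f"most common incorrect choice in class: {top_misconception}")
```

With real national percentages from the website and a class’s own responses, the same tally shows where a class’s most common misconception matches or differs from the national pattern.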

What is meant by the boundaries that are described for each sub-idea?

Just as we use sub-ideas to specify as precisely as possible the knowledge the test items can cover, we use boundary statements to specify what the test items do not cover. In some cases the knowledge that is excluded may be too technical or too sophisticated for the intended grade level, and in other cases the knowledge that is excluded is tested under another idea. In no way are we saying that students should not learn what is being excluded, only that we are not testing that idea in this set of test items. Boundary statements are also used to note any conditions or contexts that were taken into account in designing a set of items (e.g., “test items will involve situations in which forces are constant, not situations in which the forces are increasing or decreasing”).

Am I allowed to use the assessment items on this website in my classroom?

All of the resources on this website, including the items themselves, are intended to be used widely by science teachers and other educators as stated in the terms of use policy (see /pages/policies).

Acknowledgments

George E. DeBoer, Ph.D., Principal Investigator
Jo Ellen Roseman, Ph.D., Co-Principal Investigator

Abigail Burrows, Senior Project Coordinator
Natalie Dubois, Ph.D., Research Associate
Jean Flanagan, Research Assistant
Arhonda Gogos, Ph.D., Research Associate
Martin Fernandez, Research Associate
Peter Hanselmann, Web Applications Developer
Cari Herrmann Abell, Ph.D., Senior Research Associate
Bernard Koch, Research Associate
Mary Koppal, Director of Communications
Ed Krafsur, Technology Specialist
Kristen Lennon, Ph.D., Research Associate
Alice Lurain, Ph.D., Research Associate
An Michaels, Ph.D., Research Associate
Karina Nabors, Ph.D., Research Associate
David Pollock, Research Assistant
Tom Regan, Ph.D., Research Associate
Brian Sweeney, Manager, Applications Development
Jill Wertheim, Ph.D., Research Associate
Ted Willard, Project Director
Linda Wilson, Research Associate
Paula Wilson, Ph.D., Consultant