Deck 15: Classroom Assessment, Grading, and Standardized Testing

Question
A school administrator wants to identify the top 10 percent of the senior students in order to recommend them for scholarship competition at the highest-rated university in the state. What type of test would serve the administrator's purpose?

A)Criterion-referenced
B)Diagnostic
C)Norm-referenced
D)Standardized
Question
Kathy took the Stanford Achievement Test on Monday and again on Friday. Her two scores differed by only three points. These results may indicate a good level of what type of reliability?

A)Alternate-form
B)Internal consistency
C)Split-half
D)Test-retest
Question
Paper-and-pencil exercises, direct observations of performances, development of portfolios, and creation of artifacts are all methods of

A)assessment.
B)evaluation.
C)measurement.
D)testing.
Question
One of the most efficient and effective ways to increase the reliability of a test is to

A)have more than one person grade the test.
B)keep the test brief.
C)lengthen the test.
D)provide ample response time.
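A minimal sketch of how test length relates to reliability, assuming the classical Spearman-Brown prophecy formula applies (added items are comparable in quality to the existing ones). The function name `spearman_brown` and the sample values are illustrative and do not come from the deck.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    Spearman-Brown prophecy formula: r_new = k*r / (1 + (k - 1)*r),
    where r is the current reliability and k is the length multiplier.
    """
    k, r = length_factor, reliability
    return (k * r) / (1 + (k - 1) * r)


# Hypothetical example: a test with reliability 0.60 is doubled in length.
print(spearman_brown(0.60, 2))
```

Under these assumptions, the predicted reliability rises as the length factor grows, and falls when the test is shortened (a factor below 1).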
Question
The most important attribute of a norming sample is that it should be

A)completely random.
B)large and diverse.
C)limited in size.
D)similar to future test-takers.
Question
At the beginning of the semester, Mr. Rumstead gave a formative test for the purposes of setting objectives. At the end of the course he gave the same test to determine grades. The second time this test was given, it was used as what type of test?

A)Aptitude
B)Diagnostic
C)Formative
D)Summative
Question
A teacher who is interested in finding out how well a student is doing in class compared to students in other schools should use what type of test?

A)Criterion-referenced
B)Diagnostic
C)Norm-referenced
D)Teacher-designed
Question
The precision of a test refers to test reliability as measured by what method?

A)Alternate-form
B)Internal consistency
C)Split-half
D)Test-retest
Question
Criterion-referenced tests are used primarily to assess

A)achievement of general instructional goals.
B)each student's achievement compared to other students.
C)mastery of specific objectives.
D)the range of achievement in a large group.
Question
The term objective as used in objective testing refers to the

A)content goals of the items.
B)goal(s) of the test.
C)type of material covered.
D)way the test is scored.
Question
Which one of the following student outcomes is MOST likely to be the result of a criterion-referenced assessment tool?

A)Anita was the first student to complete the test.
B)Ben answered 10 out of 12 questions correctly.
C)Randy scored at the eighty-ninth percentile.
D)Sonia ranked fifteenth in French and ninth in music.
Question
For which one of the following situations would a criterion-referenced test be the most appropriate measure to use?

A)Assessing the range of abilities in a large, mixed-ability group of students
B)Comparing students' general ability in specific subject areas such as English, algebra, or general science
C)Measuring mastery of basic competencies in addition and subtraction
D)Selecting candidates for a teaching position when only a few openings are available
Question
"We will have weekly quizzes, but your final grade will be based only on the midterm and final exam." This decision implies that the quizzes are to be used for what type of evaluation?

A)Criterion-referenced
B)Formative
C)Norm-referenced
D)Summative
Question
A test or rating scale is objective to the extent that it

A)is free of biases of the administrators and scorers.
B)measures only one, or only a very few variables.
C)predicts an important and realistic criterion.
D)yields the same score each time an individual takes it.
Question
The fundamental purpose of educational assessment is to

A)determine the quality of the outcomes being judged.
B)identify which programs or people are superior to others.
C)obtain objective data that express performances in quantitative terms.
D)provide information to support decision-making.
Question
A local high school developed a math achievement test and used the results to determine the selection of students for an advanced placement course with a limited number of seats. What type of test should be used?

A)Criterion-referenced
B)Diagnostic
C)Norm-referenced
D)Standardized
Question
What type of test would provide the most useful information for the following question: "Are the learning outcomes for the new unit sequenced appropriately?"

A)Diagnostic
B)Formative
C)Placement
D)Summative
Question
Which one of the following situations requires a norm-referenced evaluation?

A)Assessing whether an individual has been drinking too much to drive
B)Certifying whether a newly graduated education student can perform satisfactorily as a teacher
C)Hiring one manager from a pool of ten applicants for a large department store
D)Reporting to parents about how much students have learned during the semester
Question
What type of test would provide the most useful information for the following question: "Are students making satisfactory progress in learning the metric system?"

A)Diagnostic
B)Formative
C)Placement
D)Summative
Question
A major difference between formative and summative tests is the

A)format of the test items.
B)interpretation of the test data.
C)preparation of the test directions.
D)role played by content validity in the two tests.
Question
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
The validity of this test is relatively strong.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
Question
Students from minority groups in deprived socioeconomic backgrounds who have taken culture-fair tests have generally performed

A)better than other minority groups on standardized tests.
B)better than their non-deprived peers but worse than non-minority groups.
C)on par with or even worse than their own performances on other types of standardized tests.
D)worse than their non-deprived peers if the test was individually administered.
Question
What type of validity is currently thought to include all other types of validity?

A)Content-related
B)Construct-related
C)Criterion-related
D)Prediction-related
Question
The connection between validity and reliability can be best expressed by the statement that validity

A)is essentially the same as reliability.
B)requires and may be assured through reliability.
C)requires but cannot be assured through reliability.
D)requires only a limited reliability.
Question
When a test actually measures what it purports to measure, the test is said to be

A)credible.
B)reliable.
C)usable.
D)valid.
Question
Jerry scored 81 on a test. The mean of this test was 70, the standard deviation was 10, and the standard error of measurement was 2. Given these data, we can be reasonably certain (2 times out of 3) that Jerry's true score would be in the range from

A)71 to 91.
B)75 to 85.
C)77 to 87.
D)79 to 83.
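A worked sketch of the arithmetic behind this item, assuming the usual convention that a band of one standard error of measurement on either side of the observed score contains the true score about 68 percent of the time (roughly 2 times out of 3). The function name `true_score_band` is illustrative and not part of the deck.

```python
def true_score_band(observed, sem, n_sem=1):
    """Return the (low, high) band observed +/- n_sem * sem."""
    return observed - n_sem * sem, observed + n_sem * sem


# Jerry's observed score and the test's standard error of measurement.
# The mean and standard deviation given in the item are not needed here.
low, high = true_score_band(observed=81, sem=2)
print(low, high)  # 79 83
```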
Question
Which one of the following definitions best describes "true score"?

A)Confidence score if the test were perfectly reliable
B)Hypothetical score on a student's best day
C)Observed raw score plus the confidence score
D)Obtained raw score minus measurement error
Question
A test or any assessment instrument is objective to the extent that it

A)is free of biases of the test administrators and scorers.
B)measures only a very few variables at a time.
C)predicts an important and realistic criterion.
D)yields the same score each time an individual takes it.
Question
After administering a standardized test of reading comprehension, scores are compared to teacher estimates of reading comprehension for each student. What particular technical concern of measurement is involved in this comparison?

A)Commonality
B)Discrimination
C)Reliability
D)Validity
Question
A general term for the type of testing that is used to guide planning and identify students' needs is ________ assessment.
Question
A general term for the type of testing that is used to determine final achievement in a course is ________ assessment.
Question
The evidence for validity that concerns whether the test measures the trait in question rather than some other trait is what type of evidence?

A)Construct-related
B)Content-related
C)Criterion-related
D)Test-retest
Question
On standardized tests, a difference of a few points between two raw scores is likely to be insignificant due to the

A)central limit theorem.
B)confidence interval for the scores.
C)equivalence reliability of the tests.
D)probability of chance.
Question
Research on bias in testing indicates that standardized tests predict school achievement

A)equally well for all groups of students.
B)equally well for white, English-speaking students.
C)very accurately for only a small percentage of students.
D)very accurately for high SES students in all ethnic groups.
Question
The validity of any test is related directly to the

A)difficulty of the test.
B)evaluation of expert reviewers.
C)length of the test.
D)purpose of the test.
Question
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
A raw score of 33 places a student one standard deviation below the mean.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
Question
Fred made 103 and Frank made 96 on an achievement test with a confidence interval of 4. These results indicate that

A)Fred's true score is definitely higher than Frank's true score.
B)measurement error can account for the differences in their scores.
C)the true scores of each student are probably very close.
D)the test used to generate these scores must be very reliable.
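A small sketch of how the two score bands in this item can be compared, assuming "a confidence interval of 4" means a band of 4 points on either side of each observed score. That reading, and the helper names `band` and `bands_overlap`, are assumptions made for illustration.

```python
def band(score, half_width):
    """Return the (low, high) band around an observed score."""
    return score - half_width, score + half_width


def bands_overlap(a, b):
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


fred = band(103, 4)   # (99, 107)
frank = band(96, 4)   # (92, 100)
print(bands_overlap(fred, frank))  # True
```

When the bands overlap, as they do here under the stated assumption, the observed difference between the two scores could plausibly reflect measurement error rather than a real difference in true scores.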
Question
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
The amount of measurement error in the score distribution is acceptable.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
Question
If a thermometer measured an oven's temperature as 400°F for five days in a row when the temperature was actually 350°F, this measuring instrument would be

A)both reliable and valid.
B)both unreliable and invalid.
C)reliable but not valid.
D)valid but not reliable.
Question
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
The reliability of this test was fairly weak.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
Question
Establishing scoring standards is more difficult for making norm-referenced decisions than for making criterion-referenced decisions.
Question
A test must be both reliable and valid in order to be useful.
Question
A large group of people representing a given grade level across the nation make up the ________ for a standardized achievement test.
Question
A teacher who uses informal ungraded tests is relying upon _________ assessment to inform his or her teaching. Increasingly, teachers today must show student learning and achievement. Overall, in America we are seeing a sharp increase in the use of high-stakes tests and public demand for _______.
Question
The process that includes many kinds of ways to sample and observe students' skills, knowledge, and abilities is ________.
Question
Criterion-referenced assessment is valuable in determining mastery of basic skills.
Question
The process in assessment that provides a numeric description of a characteristic or event is ________.
Question
Reliability is the degree to which a test measures what it is supposed to measure.
Question
Tests in which the scoring of items does not require interpretation are said to be ________.
Question
A fifth grade teacher gives an ________ to Tasha to measure how much Tasha has learned in social studies.
Question
A test designed to measure differences in achievement among students is ________.
Question
If scores of individuals are consistent over time, the test is said to have high ________.
Question
When test performances are compared to standards rather than to scores of others, the test is said to be ________ referenced.
Question
Woolfolk highly recommends comparing states using standardized test scores because such comparisons provide the most objective information.
Question
The standard error of measurement is related inversely to test reliability, i.e., the smaller the SEM is, the higher the reliability coefficient is.
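The inverse relationship in this statement can be illustrated with the classical test theory formula SEM = SD × sqrt(1 − r), where SD is the standard deviation of the scores and r is the reliability coefficient. The sketch below assumes that formula; the function name `standard_error_of_measurement` and the sample values are illustrative and not taken from the deck.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Hold the standard deviation fixed at 10 and vary the reliability.
for r in (0.60, 0.80, 0.96):
    print(r, round(standard_error_of_measurement(10, r), 2))
```

With the standard deviation held fixed, the computed SEM shrinks as the reliability coefficient increases.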
Question
Both criterion-referenced and norm-referenced report cards indicate student progress toward specific goals.
Question
Assessment is the term used to describe the process of gathering information about students' learning outcomes.
Question
Measurement is the quantitative component of evaluation.
Question
Content-related validity refers to the degree to which the test items cover the appropriate topics.
Question
A test that measures what it is supposed to is considered to be ________.
Question
When you write multiple-choice items, you should use

A)as much wording as possible in the distractors.
B)distractors that require fine discriminations.
C)"none of the above" less frequently than "all of the above."
D)stems that present a single problem.
Question
Compare and contrast formative and summative measurements. Identify the different uses of formative and summative tests in your answer.
Question
What guideline for writing multiple-choice items is violated in the following item stem? "A norm-referenced test is..."

A)Each alternative must fit the grammatical form of the item.
B)Item stems should be stated in simple terms.
C)The stem should include a complete question.
D)Unessential details should be omitted in the item stem.
Question
The most defensible practice for scoring essay tests is to evaluate

A)all parts of one student's paper before going on to the next student's paper.
B)each one of the items for all students with reference to its respective model answers.
C)each question as acceptable or unacceptable and assign equal weight to each question.
D)the response for each question with regard to content, organization, and mechanics, with each factor weighted equally.
Question
Which one of the following procedures would improve the reliability and validity of grading short essay tests, thus refuting the complaint of sensitivity to bias and variability in grading?

A)Administering more pretests
B)Grading on the curve
C)Implementing a contract system
D)Using a scoring rubric
Question
All of the following statements are true of essay tests EXCEPT:

A)Each question should give students a precise task.
B)Less material can be covered in essay than in multiple-choice tests.
C)Students should be able to answer the questions in a few words for the sake of efficiency.
D)Questions should measure the higher-level objectives.
Question
The most important use of essay tests is to

A)measure simple learning outcomes.
B)measure complex learning outcomes.
C)reduce grading time.
D)sample a wide variety of learning outcomes.
Question
Which one of the following sources would be the LEAST likely product to be found in a student's portfolio?

A)Artistic products
B)Peer comments
C)Standardized test results
D)Written products
Question
Which one of the following strategies does NOT tend to increase the reliability of essay test grades?

A)Base your ratings on a model answer that you have constructed.
B)Grade all essay items for each student in turn based on a pre-established point system.
C)Have students place their names on the back of their test papers.
D)Score all responses to one essay item before moving on to the next item.
Question
Which one of the following statements is TRUE regarding the use of portfolios in assessment?

A)Criterion-referenced rather than norm-referenced grading should be used.
B)Only positive samples of student performances should be selected for a portfolio.
C)Portfolios work best with older students (middle or high school).
D)Teachers rather than students should select the work to be included in the portfolio.
Question
The key feature of authentic assessments is

A)development of tests by professional evaluators.
B)high test-retest reliability.
C)testing in a realistic context.
D)use of essays as the primary form of testing.
Question
Identify the type of objective test item that is most appropriate for measuring the following specific learning outcome: "Select the best reason for a specific action from a given list of alternatives."

A)Essay
B)Multiple-choice
C)Short-answer
D)True-false
Question
Compare and contrast norm-referenced and criterion-referenced tests and identify situations where each would be appropriate.
Question
What is the major problem in scoring essay tests?

A)Difficulty with establishing time limits for responding
B)Limiting the content covered compared to objective tests
C)Restricting the tasks to more complex learning outcomes
D)Subjectivity in assessing the learning products
Question
What is the most serious problem in the multiple-choice test item that you are now reading?

A)Distractor responses should be clearly incorrect.
B)Students should not have to discriminate between the alternative choices.
C)The response choices include two distractors that essentially have the same meaning.
D)The stem does not present a single, straightforward problem.
Question
Objective tests are generally more reliable than essay tests because objective tests can

A)be corrected for guessing a response correctly.
B)contain more independent items measuring achievement.
C)eliminate subjective judgment in their preparation.
D)measure almost any important educational attainment.
Question
Which one of the following actions is a limitation of multiple-choice tests?

A)Allow for bluffing
B)Are difficult to grade
C)Can be difficult to prepare
D)Cannot measure higher-order learning
Question
Which one of the following procedures best reflects performance assessment?

A)Making a class presentation that utilizes more than one medium to demonstrate the steps to follow in designing a specific communication
B)Performing well enough on the SAT to obtain a combined Verbal and Quantitative score of at least 1080
C)Submitting a journal that will be evaluated on the basis of whether it contains notes for each class meeting
D)Writing an essay on the "Republican Revolution" in the 1994 elections, citing primary social, economic, and political forces that led to this event
Question
Define and distinguish between reliability and validity. Is it necessary for a test to be valid in order for it to be reliable? Is it necessary for a test to be reliable in order for it to be valid? Explain your decisions.
Question
Exhibitions differ from portfolios because exhibitions

A)are authentic assessments.
B)involve an immediate audience.
C)use criterion-referenced standards.
D)use norm-referenced standards.
1
A school administrator wants to identify the top 10 percent of the senior students in order to recommend them for scholarship competition at the highest-rated university in the state. What type of test would serve the administrator's purpose?

A)Criterion-referenced
B)Diagnostic
C)Norm-referenced
D)Standardized
Answer: C)Norm-referenced
2
Kathy took the Stanford Achievement Test on Monday and again on Friday. Her two scores differed by only three points. These results may indicate a good level of what type of reliability?

A)Alternate-form
B)Internal consistency
C)Split-half
D)Test-retest
Answer: D)Test-retest
3
Paper-and-pencil exercises, direct observations of performances, development of portfolios, and creation of artifacts are all methods of

A)assessment.
B)evaluation.
C)measurement.
D)testing.
Answer: A)assessment.
4
One of the most efficient and effective ways to increase the reliability of a test is to

A)have more than one person grade the test.
B)keep the test brief.
C)lengthen the test.
D)provide ample response time.
5
The most important attribute of a norming sample is that it should be

A)completely random.
B)large and diverse.
C)limited in size.
D)similar to future test-takers.
6
At the beginning of the semester, Mr. Rumstead gave a formative test for the purposes of setting objectives. At the end of the course he gave the same test to determine grades. The second time this test was given, it was used as what type of test?

A)Aptitude
B)Diagnostic
C)Formative
D)Summative
7
A teacher who is interested in finding out how well a student is doing in class compared to students in other schools should use what type of test?

A)Criterion-referenced
B)Diagnostic
C)Norm-referenced
D)Teacher-designed
8
The precision of a test refers to test reliability as measured by what method?

A)Alternate-form
B)Internal consistency
C)Split-half
D)Test-retest
9
Criterion-referenced tests are used primarily to assess

A)achievement of general instructional goals.
B)each student's achievement compared to other students.
C)mastery of specific objectives.
D)the range of achievement in a large group.
10
The term objective as used in objective testing refers to the

A)content goals of the items.
B)goal(s) of the test.
C)type of material covered.
D)way the test is scored.
11
Which one of the following student outcomes is MOST likely to be the result of a criterion-referenced assessment tool?

A)Anita was the first student to complete the test.
B)Ben answered 10 out of 12 questions correctly.
C)Randy scored at the eighty-ninth percentile.
D)Sonia ranked fifteenth in French and ninth in music.
12
For which one of the following situations would a criterion-referenced test be the most appropriate measure to use?

A)Assessing the range of abilities in a large, mixed-ability group of students
B)Comparing students' general ability in specific subject areas such as English, algebra, or general science
C)Measuring mastery of basic competencies in addition and subtraction
D)Selecting candidates for a teaching position when only a few openings are available
13
"We will have weekly quizzes, but your final grade will be based only on the midterm and final exam." This decision implies that the quizzes are to be used for what type of evaluation?

A)Criterion-referenced
B)Formative
C)Norm-referenced
D)Summative
14
A test or rating scale is objective to the extent that it

A)is free of biases of the administrators and scorers.
B)measures only one, or only a very few variables.
C)predicts an important and realistic criterion.
D)yields the same score each time an individual takes it.
15
The fundamental purpose of educational assessment is to

A)determine the quality of the outcomes being judged.
B)identify which programs or people are superior to others.
C)obtain objective data that express performances in quantitative terms.
D)provide information to support decision-making.
16
A local high school developed a math achievement test and used the results to determine the selection of students for an advanced placement course with a limited number of seats. What type of test should be used?

A)Criterion-referenced
B)Diagnostic
C)Norm-referenced
D)Standardized
17
What type of test would provide the most useful information for the following question: "Are the learning outcomes for the new unit sequenced appropriately?"

A)Diagnostic
B)Formative
C)Placement
D)Summative
18
Which one of the following situations requires a norm-referenced evaluation?

A)Assessing whether an individual has been drinking too much to drive
B)Certifying whether a newly graduated education student can perform satisfactorily as a teacher
C)Hiring one manager from a pool of ten applicants for a large department store
D)Reporting to parents about how much students have learned during the semester
19
What type of test would provide the most useful information for the following question: "Are students making satisfactory progress in learning the metric system?"

A)Diagnostic
B)Formative
C)Placement
D)Summative
20
A major difference between formative and summative tests is the

A)format of the test items.
B)interpretation of the test data.
C)preparation of the test directions.
D)role played by content validity in the two tests.
21
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
The validity of this test is relatively strong.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
22
Students from minority groups in deprived socioeconomic backgrounds who have taken culture-fair tests have generally performed

A)better than other minority groups on standardized tests.
B)better than their non-deprived peers but worse than non-minority groups.
C)on par with or even worse than their own performances on other types of standardized tests.
D)worse than their non-deprived peers if the test was individually administered.
23
What type of validity is currently thought to include all other types of validity?

A)Content-related
B)Construct-related
C)Criterion-related
D)Prediction-related
24
The connection between validity and reliability can be best expressed by the statement that validity

A)is essentially the same as reliability.
B)requires and may be assured through reliability.
C)requires but cannot be assured through reliability.
D)requires only a limited reliability.
25
When a test actually measures what it purports to measure, the test is said to be

A)credible.
B)reliable.
C)usable.
D)valid.
26
Jerry scored 81 on a test. The mean of this test was 70, the standard deviation was 10, and the standard error of measurement was 2. Given these data, we can be reasonably certain (2 times out of 3) that Jerry's true score would be in the range from

A)71 to 91.
B)75 to 85.
C)77 to 87.
D)79 to 83.
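
The arithmetic behind this kind of item can be sketched in a few lines. The sketch assumes that "2 times out of 3" corresponds to a band of one standard error of measurement on either side of the observed score, which is the usual textbook convention. The function name `true_score_band` and the `n_sem` parameter are illustrative and do not come from the deck.

```python
def true_score_band(observed, sem, n_sem=1.0):
    """Return the (low, high) band of observed score +/- n_sem * SEM.

    Assumption: n_sem=1 approximates the "2 times out of 3" (about 68%) band.
    """
    margin = n_sem * sem
    return observed - margin, observed + margin


# Values taken from the item: observed score 81, SEM 2.
low, high = true_score_band(81, 2)
print(f"True score band: {low:g} to {high:g}")
```

Note that the mean and standard deviation given in the item do not enter the calculation; only the observed score and the SEM are needed.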
27
Which one of the following definitions best describes "true score"?

A)Confidence score if the test were perfectly reliable
B)Hypothetical score on a student's best day
C)Observed raw score plus the confidence score
D)Obtained raw score minus measurement error
28
A test or any assessment instrument is objective to the extent that it

A)is free of biases of the test administrators and scorers.
B)measures only a very few variables at a time.
C)predicts an important and realistic criterion.
D)yields the same score each time an individual takes it.
29
After administering a standardized test of reading comprehension, scores are compared to teacher estimates of reading comprehension for each student. What particular technical concern of measurement is involved in this comparison?

A)Commonality
B)Discrimination
C)Reliability
D)Validity
30
A general term for the type of testing that is used to guide planning and identify students' needs is ________ assessment.
31
A general term for the type of testing that is used to determine final achievement in a course is ________ assessment.
32
The evidence for validity that concerns whether the test measures the trait in question rather than some other trait is what type of evidence?

A)Construct-related
B)Content-related
C)Criterion-related
D)Test-retest
33
On standardized tests, a difference of a few points between two raw scores is likely to be insignificant due to the

A)central limits theorem.
B)confidence interval for the scores.
C)equivalence reliability of the tests.
D)probability of chance.
34
Research on bias in testing indicates that standardized tests predict school achievement

A)equally well for all groups of students.
B)equally well for white, English-speaking students.
C)very accurately for only a small percentage of students.
D)very accurately for high SES students in all ethnic groups.
35
The validity of any test is related directly to the

A)difficulty of the test.
B)evaluation of expert reviewers.
C)length of the test.
D)purpose of the test.
36
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
A raw score of 33 places a student one standard deviation below the mean.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
37
Fred made 103 and Frank made 96 on an achievement test with a confidence interval of 4. These results indicate that

A)Fred's true score is definitely higher than Frank's true score.
B)measurement error can account for the differences in their scores.
C)the true scores of each student are probably very close.
D)the test used to generate these scores must be very reliable.
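
One way to reason about an item like this is to build a score band around each observed score and check whether the bands overlap. The sketch below assumes that "a confidence interval of 4" means 4 points on either side of each observed score; the item does not spell this out, and the helper names `score_band` and `bands_overlap` are hypothetical.

```python
def score_band(observed, half_width):
    """Return the (low, high) band around an observed score."""
    return observed - half_width, observed + half_width


def bands_overlap(score_a, score_b, half_width):
    """True if the two score bands share at least one point."""
    low_a, high_a = score_band(score_a, half_width)
    low_b, high_b = score_band(score_b, half_width)
    return low_a <= high_b and low_b <= high_a


# Values taken from the item; the +/- 4 reading is an assumption.
print(score_band(103, 4))         # band around Fred's score
print(score_band(96, 4))          # band around Frank's score
print(bands_overlap(103, 96, 4))  # whether the two bands overlap
```

When the bands overlap, the observed difference alone is not strong evidence that the two true scores differ.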
38
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
The amount of measurement error in the score distribution is acceptable.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
39
If a thermometer measured an oven's temperature as 400°F for five days in a row when the temperature was actually 350°F, this measuring instrument would be

A)both reliable and valid.
B)both unreliable and invalid.
C)reliable but not valid.
D)valid but not reliable.
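
The thermometer scenario separates two properties that are easy to conflate: how consistent the readings are and how far they sit from the actual value. A minimal sketch, treating the spread of repeated readings as a rough stand-in for consistency and the average offset from the actual temperature as a rough stand-in for accuracy (both simplifications, not formal reliability or validity coefficients):

```python
import statistics

# Values from the item: five daily readings of 400°F, actual temperature 350°F.
readings = [400, 400, 400, 400, 400]
actual_temperature = 350

# Spread of the readings: 0 means the instrument repeats itself exactly.
spread = statistics.pstdev(readings)

# Average offset from the actual value: nonzero means a systematic error.
bias = statistics.mean(readings) - actual_temperature

print(f"spread of readings: {spread}")
print(f"average offset from actual: {bias}")
```

Perfectly consistent readings can still be consistently wrong, which is why consistency alone does not establish that an instrument measures what it should.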
40
Use this information for the following question. The following descriptive statistics are given for an educational psychology examination administered to 323 students (N). There were 60 multiple-choice items (k) on this test. The items in this set are classified as Use of Knowledge (U) items.
Descriptive Statistics
The reliability of this test was fairly weak.

A)The conclusion in this item's statement is NOT supported by the above data.
B)The conclusion in this item's statement is supported by the above data.
C)There are insufficient data presented to warrant this conclusion.
41
Establishing scoring standards is more difficult for making norm-referenced decisions than for making criterion-referenced decisions.
42
A test must be both reliable and valid in order to be useful.
43
A large group of people representing a given grade level across the nation make up the ________ for a standardized achievement test.
44
A teacher who uses informal ungraded tests is relying upon _________ assessment to inform his or her teaching. Increasingly, teachers today must show student learning and achievement. Overall, in America we are seeing a sharp increase in the use of high-stakes tests and public demand for _______.
45
The process that includes many kinds of ways to sample and observe students' skills, knowledge, and abilities is ________.
46
Criterion-referenced assessment is valuable in determining mastery of basic skills.
47
The process in assessment that provides a numeric description of a characteristic or event is ________.
48
Reliability is the degree to which a test measures what it is supposed to measure.
49
Tests in which the scoring of items does not require interpretation are said to be ________.
50
A fifth grade teacher gives an ________ to Tasha to measure how much Tasha has learned in social studies.
51
A test designed to measure differences in achievement among students is ________.
52
If scores of individuals are consistent over time, the test is said to have high ________.
53
When test performances are compared to standards rather than to scores of others, the test is said to be ________ referenced.
54
Woolfolk highly recommends comparing states using standardized test scores because such comparisons provide the most objective information.
55
The standard error of measurement is related inversely to test reliability, i.e., the smaller the SEM is, the higher the reliability coefficient is.
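
The relationship between SEM and reliability can be made concrete with the classical test theory formula SEM = SD × sqrt(1 − r), where SD is the standard deviation of the scores and r is the reliability coefficient. The sketch below is illustrative only: the function name and the sample values (SD of 10, reliabilities of 0.75 and 0.96) are assumptions chosen for the example, not data from the deck.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical values: same score spread, two different reliabilities.
for r in (0.75, 0.96):
    sem = standard_error_of_measurement(10, r)
    print(f"reliability={r:.2f} -> SEM={sem:.2f}")
```

With the standard deviation held fixed, raising the reliability shrinks the SEM, and a perfectly reliable test (r = 1) would have an SEM of zero.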
56
Both criterion-referenced and norm-referenced report cards indicate student progress toward specific goals.
57
Assessment is the term used to describe the process of gathering information about students' learning outcomes.
58
Measurement is the quantitative component of evaluation.
59
Content-related validity refers to the degree to which the test items cover the appropriate topics.
60
A test that measures what it is supposed to is considered to be ________.
61
When you write multiple-choice items, you should use

A)as much wording as possible in the distractors.
B)distractors that require fine discriminations.
C)"none of the above" less frequently than "all of the above."
D)stems that present a single problem.
62
Compare and contrast formative and summative measurements. Identify the different uses of formative and summative tests in your answer.
63
What guideline for writing multiple-choice items is violated in the following item stem? "A norm-referenced test is..."

A)Each alternative must fit the grammatical form of the item.
B)Item stems should be stated in simple terms.
C)The stem should include a complete question.
D)Unessential details should be omitted in the item stem.
64
The most defensible practice for scoring essay tests is to evaluate

A)all parts of one student's paper before going on to the next student's paper.
B)each one of the items for all students with reference to its respective model answers.
C)each question as acceptable or unacceptable and assign equal weight to each question.
D)the response for each question with regard to content, organization, and mechanics, with each factor weighted equally.
65
Which one of the following procedures would improve the reliability and validity of grading short essay tests, thus refuting the complaint of sensitivity to bias and variability in grading?

A)Administering more pretests
B)Grading on the curve
C)Implementing a contract system
D)Using a scoring rubric
66
All of the following statements are true of essay tests EXCEPT:

A)Each question should give students a precise task.
B)Less material can be covered in essay than in multiple-choice tests.
C)Students should be able to answer the questions in a few words for the sake of efficiency.
D)Questions should measure the higher-level objectives.
67
The most important use of essay tests is to

A)measure simple learning outcomes.
B)measure complex learning outcomes.
C)reduce grading time.
D)sample a wide variety of learning outcomes.
68
Which one of the following sources would be the LEAST likely product to be found in a student's portfolio?

A)Artistic products
B)Peer comments
C)Standardized test results
D)Written products
69
Which one of the following strategies does NOT tend to increase the reliability of essay test grades?

A)Base your ratings on a model answer that you have constructed.
B)Grade all essay items for each student in turn based on a pre-established point system.
C)Have students place their names on the back of their test papers.
D)Score all responses to one essay item before moving on to the next item.
70
Which one of the following statements is TRUE regarding the use of portfolios in assessment?

A)Criterion-referenced rather than norm-referenced grading should be used.
B)Only positive samples of student performances should be selected for a portfolio.
C)Portfolios work best with older students (middle or high school).
D)Teachers rather than students should select the work to be included in the portfolio.
71
The key feature of authentic assessments is

A)development of tests by professional evaluators.
B)high test-retest reliability.
C)testing in a realistic context.
D)use of essays as the primary form of testing.
72
Identify the type of objective test item that is most appropriate for measuring the following specific learning outcome: "Select the best reason for a specific action from a given list of alternatives."

A)Essay
B)Multiple-choice
C)Short-answer
D)True-false
73
Compare and contrast norm-referenced and criterion-referenced tests and identify situations where each would be appropriate.
74
What is the major problem in scoring essay tests?

A)Difficulty with establishing time limits for responding
B)Limiting the content covered compared to objective tests
C)Restricting the tasks to more complex learning outcomes
D)Subjectivity in assessing the learning products
75
What is the most serious problem in the multiple-choice test item that you are now reading?

A)Distractor responses should be clearly incorrect.
B)Students should not have to discriminate between the alternative choices.
C)The response choices include two distractors that essentially have the same meaning.
D)The stem does not present a single, straightforward problem.
76
Objective tests are generally more reliable than essay tests because objective tests can

A)be corrected for guessing a response correctly.
B)contain more independent items measuring achievement.
C)eliminate subjective judgment in their preparation.
D)measure almost any important educational attainment.
77
Which one of the following actions is a limitation of multiple-choice tests?

A)Allow for bluffing
B)Are difficult to grade
C)Can be difficult to prepare
D)Cannot measure higher-order learning
78
Which one of the following procedures best reflects performance assessment?

A)Making a class presentation that utilizes more than one medium to demonstrate the steps to follow in designing a specific communication
B)Performing well enough on the SAT to obtain a combined Verbal and Quantitative score of at least 1080
C)Submitting a journal that will be evaluated on the basis of whether it contains notes for each class meeting
D)Writing an essay on the "Republican Revolution" in the 1994 elections, citing primary social, economic, and political forces that led to this event
79
Define and distinguish between reliability and validity. Is it necessary for a test to be valid in order for it to be reliable? Is it necessary for a test to be reliable in order for it to be valid? Explain your decisions.
80
Exhibitions differ from portfolios because exhibitions

A)are authentic assessments.
B)involve an immediate audience.
C)use criterion-referenced standards.
D)use norm-referenced standards.