Assessment: introduction for learning and teaching


  • Introduction
  • Why assess?
  • What is assessment?
  • Assessment related tasks
  • Selecting and unpacking ideas for assessment
  • Quality Assessment Standards - check list
  • Determining outcomes
  • What is authentic assessment?
  • When do teachers assess students' attainment of outcomes?
  • Four different times for assessment - diagnostic, formative, summative, and generative
  • Steps for Creating Criteria Referenced Assessment
  • Norm-Referenced tests
  • Accommodation information
  • Assorted assessment and evaluation links
  • Practice assessment scenarios and role play for preprofessional and professional educators:

Assessment glossary


The most important task in schooling should be deciding what students should learn. As important as this is, it is often neglected by assuming the standards provide worthy learning goals and outcomes for all students. This is not the page to lament that topic, but it is critical to emphasize that it is important to consider the appropriateness of what we expect students to know before beginning an assessment process. With that as a warning, this article's focus is on assessment.

Why assess?

Assessment is conducted continuously by both professional educators and students for the purpose of improving learning and instruction.

What is assessment?

Assessment is the collection of data to inform. It is the measurement activities educators use to attempt to make valid inferences about students' knowledge, skills, and dispositions, as well as the use of those measurements and inferences to make curricular and instructional decisions. The information should be bias free, fair, and consistent (valid and reliable) so that it can suggest developmentally appropriate activities to achieve productive, mastery-oriented growth and assist with effective communication for learning through instruction.

A cautionary note: assessment and evaluation are related, but they are not the same. Evaluation is a process of determining the value, or quality, of a response, product, or performance that is assessed. It should be based upon established criteria. It is the evaluator who puts value labels on the different levels of the criteria of performance. When grades are used to put students into categories, failing or passing, that has valued consequences and is evaluation. When teachers are fired based on their students' performance, that is evaluation. When decisions of consequence are made based on a value judgment, that is evaluation. Someone decides consequences, often rewards and punishment, based on their evaluation of achievement levels.

One may wonder if all assessment is evaluative. It doesn't have to be. If the assessment levels are constructed to describe the levels of achievement students will develop as they progress to mastery of a topic or skill, then there need be no value placed on a student's performance. There is no need to downgrade, or poorly evaluate, a young student's performance when it is below the top level. Instead, the performance is assessed and described by its level to decide where learning and instruction should start and what kinds of learning activities would best move the student closer to mastery. No value is attached. Decisions are based only on the student's performance at his or her current developmental level.

Good assessment levels are guides that describe how students progress. They can be used to anticipate what students might do, to assess their present understanding and performance, and to make instructional and learning decisions. Once a value is added, they become evaluations - like grades, GPA...

This article focuses on assessment of learning. For information related to teacher assessment see - teaching in the pedagogy directory.

Assessment related tasks

While the following tasks are numbered, they do not need to be started or completed in any specific order. They are interrelated, and work on one will often necessitate a change in another.

  1. Decide topics and big ideas students will learn.
  2. Unpack the topics or big ideas to identify facts, concepts, relationships, generalizations, skills, processes, attitudes, and habits of mind students will need to know to perform activities for the topics or big ideas.
  3. Identify problems, questions, tasks, or activities students can perform to learn the intended information and skills, and which will also provide usable information to infer students' understanding and use of the ideas, attitudes, and habits of mind; their skill in performing tasks; and their ability to use the practices and processes necessary to investigate the topic, solve problems, and gain expertise in the big idea.
  4. Describe how students can be expected to communicate and/or do the identified problems, questions, tasks, or activities when they have mastery of the topic or big idea.
  5. Describe the kinds of responses students might have as they begin to learn and on through mastery. Order them into levels that describe observations of what students will say or do, when they answer questions or perform tasks or activities. The student's performance will be the artifact to infer what students know to assess their learning and make teaching and learning decisions.


Selecting and unpacking ideas for assessment

After a topic, big idea, or skill is identified, the first thing to do is decide what information is important to know about it, how understanding can be expressed, and what level of understanding, competence, or skill students should have. This process is known as unpacking a standard or idea and the results are communicated as: big ideas, goals, facts, concepts, generalizations, objectives, or outcomes. See planning and the planning tool box for more information.

The next step is to decide how to determine what students know about the concepts or information related to the concepts. Determining how much a person knows about an idea is measurement. All measurement needs a standard or unit to use as a ruler. Since concepts are mental ideas and can't be seen, they can't be measured directly. The solution...

To determine what concepts a person knows, and the degree of conceptual understanding he or she has of those concepts, we observe students as they do or make something. What they say or do is known as an artifact. So in the assessment game, concepts or ideas are associated with what a person can do to indicate their conceptualization of a particular concept. In other words, what a person can do is used to infer what their understanding is of the ideas being assessed. Usually those ideas have different levels of understanding, and those levels are used as the measuring guide or ruler. The measuring guide, therefore, describes different levels of something a person can do to indicate the person's level of understanding or skill. This information is known as a scoring guide or rubric, and each level has information known as outcomes.

Placing a student at a particular level is the measurement aspect. Just as the length of a room can be measured differently by different people, students can be placed at different levels by different scorers. These differences or inconsistencies are attributed to bias, or to an assessment that is not valid or reliable.

For example, a scoring guide or rubric that has clear descriptions of levels, with sharp differences between level outcomes, will make it easier for people to score a student's behavior or artifact. It would also provide more agreement between different teachers scoring the same behavior or artifacts, resulting in more scores at the same level for the same student or for similar student understanding. This assessment would be considered a more valid assessment. Validity is how well the assessment measures what it is supposed to measure. Ways to increase validity.

The amount of agreement among different observers' measurements or level placements determines the assessment's reliability. Reliability is a test's consistency, or the degree to which an assessment yields consistent results; ways to check reliability include test-retest, alternate form, split-half, and inter-rater comparisons. The manner in which an assessment is created, implemented, and scored affects its reliability. Ways to increase reliability.
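As an illustration of an inter-rater comparison, the sketch below computes two common agreement measures for two teachers who scored the same set of student artifacts with the same rubric: simple percent agreement, and Cohen's kappa, which adjusts for the agreement expected by chance. The function names, the four-level rubric, and the scores are hypothetical, made up for this example only; they are not taken from any particular assessment.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of artifacts that both raters placed at the same level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of artifacts.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same level
    # if each assigned levels independently at their own observed rates.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical level for everything
    return (observed - expected) / (1 - expected)


# Hypothetical rubric levels (1 = beginning ... 4 = mastery) for eight artifacts.
teacher_1 = [1, 2, 3, 3, 2, 4, 1, 2]
teacher_2 = [1, 2, 3, 2, 2, 4, 1, 3]

print(f"Percent agreement: {percent_agreement(teacher_1, teacher_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(teacher_1, teacher_2):.2f}")
```

Values closer to 1 suggest the level descriptions are being read the same way by both scorers; low values may point to unclear descriptions or scorer bias.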

Quality Assessment Standards - check list

  1. Match and measure the intended big ideas, goals, outcomes, concepts, skills, standards (YES - NO)
  2. Students have opportunity to learn (YES - NO)
  3. Free of bias (YES - NO)
  4. Developmentally and grade level appropriate (YES - NO)
  5. Consistent, reliable and valid (YES - NO)
  6. Appropriate levels. Be aware of levels that are camouflaged evaluative statements. (YES - NO)

Determining outcomes

Each topic or subtopic can be stated as a big idea: a statement that captures the essence or power of the topic. Each topic must be unpacked, or its powerful ideas identified, along with all the necessary subtopics.

Suggestions for unpacking, concepts, and outcomes can be found in planning and other subject and topic related areas on this site.

Whether you create your own or use concepts and outcomes from curriculums or standards documents, it is essential that you have a very good understanding of what is desirable for students to know (concept, relationship, generalization) and do (outcome, skills) at their specific level in order to create and implement assessment.

What is authentic assessment?

Authentic assessment is when the task students perform to demonstrate a behavior or skill, or to create an artifact, is a meaningful, often real-world, application of knowledge, skills, and dispositions. The closer the task is to what people do in the world as architects, machinists, doctors, mechanics, construction workers, designers, business people, politicians, parents, and citizens, the more authentic the assessment.

An example of an authentic assessment in mathematics for algebra and pattern recognition.
Compared to the same task as a teacher-directed or workbook kind of assessment.

When do teachers assess students' attainment of outcomes?

If assessment is ongoing and continuous, then the question "When should we assess?" is answered with "always."

However, even if assessment is ongoing and continuous, the timing of an assessment has meaning for both the assessor and the person being assessed, including with respect to self-assessment.

The time element can be separated into four categories, which are helpful to use when making decisions to facilitate learning.

Four different times for assessment - diagnostic, formative, summative, and generative

While it's possible that any assessment task, activity, or question might fit in any or all of these four categories, here are some important reasons to consider each of the four.

1 - Diagnostic.

The major characteristic to associate with diagnostic assessment is that it is preliminary.

Its purpose is to probe into what is known before facilitating instruction. It usually precedes learning activities, but doesn't have to. It can come at any time during a lesson. For example, if during a lesson a question arises that depends on background information, the teacher can ask a diagnostic question to check the students' level of understanding of that background information. After that diagnostic question, the teacher can decide if the students are ready to proceed or if the background information needs to be developed before continuing with the day's planned activity. In short, it is a diagnosis of students' readiness.

2 - Formative.

This type of assessment is used to check the students' progress toward learning. It can happen at any time during a lesson and is usually understood as such.

3 - Summative.

This kind of assessment is usually associated with the time immediately after facilitating learning. However, what is that time frame? Is it at the end of a five minute mini-teach where summative assessment is to summarize what was learned in the five minutes? Or is it a time frame of an hour, day, week, month, or year?

One could argue that it is only summative if you are inclined to think the students understand the concepts and can perform the outcomes; otherwise it could be considered formative. Either way, it is usually considered the last assessment before the teacher moves to another topic. It could be the first summary check, or a question to double or triple check, and of course the assessment results will be within the range of acceptable or above.

4 - Generative.

This assessment inquires into the students' ability to apply, use, adapt, alter, or join ideas that have been taught. Its purpose is to see how well students understand what they learned.

It tries to determine whether the information has become strong enough to be usable beyond the examples with which students are familiar, or examples similar to what was presented during instruction. Can they use it in ways that were not presented and demonstrate a variety of application, analysis, and synthesis with the information?

Again, any one of these kinds can come at any time in the lesson and only makes sense with respect to the purpose of the teacher within a sequence of facilitating learning. When planning, a teacher can anticipate all four types of assessment that will be used throughout a sequence for each concept within the sequence. The planning will prepare the teacher to interact with students and be ready to facilitate their learning in real time in a way that is individualized for each student.

Steps for Creating Criteria Referenced Assessment

  1. Identify a big idea or a general description of what students are to know that will be assessed.
  2. Identify facts, concepts, generalizations, skills, processes, and/or attitudes needed to understand what is characterized by the big idea or other description of what students are to know.
  3. Identify problems, questions, tasks, or activities that students can perform that could provide usable information to infer students' understanding of the concepts or generalizations; their attitudes related to the topics, concepts, and/or generalizations; their skill or ability to perform necessary or identified skills; and their abilities to use inquiry and selected investigative practices and processes.
  4. Describe what students can be expected to communicate and/or do if they are to successfully complete the identified problems, questions, tasks, and/or activities.
  5. Select one or more problems, questions, tasks, or activities that can be used to initiate students' performance of the selected task and that will provide assessment data to infer their understanding, attitude, and/or skills in the identified areas.
  6. Create the assessment problems, questions, tasks, or activities, as they will be presented to students.
  7. Create and write all administration guidelines that are necessary to engage students in the assessment task so that they might be successful in completing it.
  8. Review the needs for all students that will be assessed to identify all accommodations for special needs and provide for those accommodations.
  9. Identify and describe what kinds of responses students might have for each item for different levels of understanding or skills that might be seen in an artifact created by a student or viewed when performed by a student.
  10. Order the different descriptions from low to high levels and select or modify them to create a scoring guide or rubric.
  11. Describe how the scoring guide or rubric will be checked for consistency, validity, and reliability.
  12. Describe how to check the assessment items for bias so that no one will be offended or unfairly penalized.
  13. Pilot the assessment task/tool to see how it works with students.

Norm-Referenced tests

Norm-Referenced tests are standardized based on a representative group.

Student performance is compared to a standardized group with such statements as:

  1. Your child scored at the 50th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that your child scored as well as or better than about 50% of the students in the norm group, and about 50% of the students scored higher.
  2. Your child scored at the 99th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that your child scored as well as or better than about 99% of the students in the norm group, and about 1% of the students scored higher.
  3. Your child scored at the 75th percentile on the Iowa Test of Basic Skills. That can be interpreted to mean that your child scored as well as or better than about 75% of the students in the norm group, and about 25% of the students scored higher.
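The percentile statements above can be illustrated with a small calculation. The sketch below compares one score to a norm group and reports the share of the group that scored lower, the same, and higher, along with a percentile rank. It assumes the midpoint convention (count everyone below plus half of those with the same score); test publishers may use other conventions, so the exact number on a real score report can differ. The function name and the norm group of 100 scores are hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Compare one score to a norm group.

    Returns the percent of the group scoring lower, the same, and higher,
    plus a percentile rank using the midpoint convention (an assumption).
    """
    if not norm_scores:
        raise ValueError("The norm group must contain at least one score.")
    n = len(norm_scores)
    below = sum(s < score for s in norm_scores)
    same = sum(s == score for s in norm_scores)
    above = n - below - same
    return {
        "percent_lower": 100 * below / n,
        "percent_same": 100 * same / n,
        "percent_higher": 100 * above / n,
        "percentile_rank": 100 * (below + 0.5 * same) / n,
    }


# Hypothetical norm group: 100 students with scores 1 through 100.
norm_group = list(range(1, 101))

for label, value in percentile_rank(75, norm_group).items():
    print(f"{label}: {value}")
```

With this made-up norm group, a score of 75 has 74% of the group below it, 1% at the same score, and 25% above it.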


Examples of norm-referenced tests:

  1. Iowa Test of Basic Skills (ITBS)
  2. California Achievement Test (CAT)
  3. American College Testing (ACT)
  4. Metropolitan Reading Readiness Test (MRT)
  5. Cognitive Abilities Test (CogAT)


Advantages of norm-referenced tests:

  1. Provide information about the achievement of individual students or groups of students
  2. Identify possible ways to improve school curriculums or programs
  3. Can be purchased, administered, and scored inexpensively
  4. Supplement other assessment methods to clarify the larger picture of student performance
  5. Use an objective scoring procedure


Disadvantages of norm-referenced tests:

  1. People too often misuse tests to categorize and label students in ways that can cause damage
  2. People frequently misuse test scores to make improper comparisons between schools, districts, classes
  3. Fails to promote individual student learning
  4. Is a poor predictor of individual student performance
  5. Usually mismatches with the content of a school’s curriculum
  6. Can be used to dictate and restrict curriculum
  7. People too often assume that test scores are infallible
  8. Too often people develop an over reliance on this one type of assessment
  9. Results are based on a normal distribution (bell-shaped curve)
  10. Measures students against other students
  11. Sorts students into winners and losers
  12. Does not test for what students know in a manner that can be used to facilitate learning.

Accommodation information

Accommodation introduction and examples

Relationship of assessing growth and achievement

The assessment and accountability movement has caused people to realize that reporting and relying on achievement or proficiency alone to rate teachers gives an incomplete picture. Students can come into a grade above or below grade level and make little or very good progress, and that progress or lack of progress will not be represented in an achievement or proficiency score alone. This makes it impossible to determine a teacher's success or failure from an end-of-year achievement or proficiency score. Therefore, data must be collected to determine both achievement/proficiency and rate of achievement or growth.

Reporting and displaying data for both growth and achievement is one way schools, government, and other stakeholders believe they can achieve a more accurate view of what is being achieved by students and teachers. For example, scores for achievement could be reported in terms of a percent for proficiency, and growth reported in terms of an average growth percentile for individual students or groups of students. These scores would then be plotted on a graph with 0-100% on each axis, where the fiftieth percentile divides each axis in half. This creates a representation, like the one below, with four quadrants, in one of which each pair of combined scores will fall.

[Chart: growth and proficiency]
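The four-quadrant idea described above can be sketched as a small function that takes a proficiency percent and a growth percentile and names the quadrant in which the pair falls. The function name, the quadrant labels, and the choice to count a score exactly at the dividing line as "higher" are assumptions made for this illustration; a school or state reporting system may define them differently.

```python
def growth_proficiency_quadrant(percent_proficient, growth_percentile, cutoff=50.0):
    """Name the quadrant for a (proficiency, growth) pair on 0-100 axes.

    Scores at or above the cutoff are treated as 'higher' (an assumption).
    """
    for value in (percent_proficient, growth_percentile, cutoff):
        if not 0 <= value <= 100:
            raise ValueError("All values must be between 0 and 100.")
    high_achievement = percent_proficient >= cutoff
    high_growth = growth_percentile >= cutoff
    if high_achievement and high_growth:
        return "higher achievement, higher growth"
    if high_achievement:
        return "higher achievement, lower growth"
    if high_growth:
        return "lower achievement, higher growth"
    return "lower achievement, lower growth"


# Hypothetical classrooms: (percent proficient, average growth percentile).
classrooms = {"Room A": (80, 35), "Room B": (40, 70), "Room C": (65, 60)}

for name, (proficiency, growth) in classrooms.items():
    print(name, "->", growth_proficiency_quadrant(proficiency, growth))
```

In this made-up data, Room B has lower achievement but higher growth, which is the kind of progress an achievement or proficiency score alone would not show.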


Assorted assessment and evaluation links

Practice assessment scenarios and role play for preprofessional and professional educators:


Dr. Robert Sweetland's notes