Thursday, August 2, 2012

Evaluating Training – What Is It All About?

Theoretical Basis – Psychometrics

A recent work experience has reminded me again that sometimes we lose track of the big picture and get caught up in the process; in this case, the Kirkpatrick Model for evaluating training programs.

Given the almost cult status of Donald Kirkpatrick’s Four-Level Model (1) for evaluating training programs, I’m going to assume you have some familiarity with the model and won’t review it here. The recent challenge has to do with the Level-Three evaluation, which evaluates behavior.

As I began thinking about the topic of evaluations and trying to frame the big picture, I decided to develop a concept map to help give me some focus. I’ve attached a cleaned-up version so you can’t see how messy my brain really is. As the various components and the associations between them took form, the genesis, the source, the Big Daddy of Reality that provides the key to conducting evaluations successfully hit me. Here it is…

The why and how of conducting evaluations in the realm of training must be linked to why we implement training or choose specific training methods. Sound simplistic? It might be. The goal of evaluation is to determine the worth or effectiveness of whatever is being evaluated.


Evaluations can be divided into two major types: summative, which is used to determine whether a goal has been met, and formative, a feedback mechanism focused on improving training results. This post is going to focus on the formative type.

Two Major Categories of Evaluations

The major methods of formative evaluations that provide actionable information include:

·       Student feedback. Also known as smiley sheets, student reaction sheets, or Level-One evaluations. This is a measurement of customer satisfaction: what learners liked or didn’t like (not to be confused with what they learned).

·       Interim assessments. A variety of activities that spot-check how the learners are progressing during the learning process. These include homework, quizzes, worksheets, questioning, one-minute papers, concept mapping, problem-solving observations…

·       Test item analysis. Using performance data on test questions.  The data can indicate areas of learning weakness, poor test item construction, poor instructional design or ineffective implementation.

·       Application assessment. A determination of transfer-of-training: the extent to which learners were able to apply what they learned in the work environment.
Major Types of Formative Evaluations
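To make the test item analysis idea concrete, here is a minimal sketch in Python. It is my own illustration, not part of any particular testing tool. It assumes each learner's answers are recorded as 1 (correct) or 0 (incorrect), and it computes two common statistics per question: a difficulty index (the share of learners who answered correctly) and a simple upper-versus-lower group discrimination index. The function name, the sample scores and the 27% group size are all assumptions chosen for the example.

```python
from statistics import mean


def item_analysis(responses, group_fraction=0.27):
    """Return difficulty and discrimination for each test item.

    responses: one list per learner, holding 1 (correct) or 0 (incorrect)
    for every item. group_fraction is an assumed cut for the top and
    bottom scoring groups; 27% is a common convention, not a rule.
    """
    n_items = len(responses[0])
    # Rank learners by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(responses) * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        # Difficulty: share of all learners who got the item right.
        difficulty = mean(r[i] for r in responses)
        # Discrimination: how much better the top group did than the bottom.
        discrimination = mean(r[i] for r in upper) - mean(r[i] for r in lower)
        results.append(
            {"item": i + 1, "difficulty": difficulty, "discrimination": discrimination}
        )
    return results


# Hypothetical results: five learners, six questions.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]

report = item_analysis(scores)
for row in report:
    print(row)
```

In this made-up data, the last question is answered correctly only by the lowest-scoring learner, so its discrimination comes out negative. That is the kind of signal that might point to a poorly constructed item rather than a learning weakness, and it is worth a closer look before drawing conclusions about the instruction.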

What’s the Hype About?

Why all the excitement about conducting Level-Three? In my opinion, it stems from the misconception that the information gathered in Level-Three is more important or valuable than that from the preceding levels. Or maybe it is the professional bragging rights of being able to say we conducted some arbitrary number of Level-Three evaluations: our general notion that bigger is better, run amok. Whatever the motivation, the problem is force-fitting a process that doesn’t fit reality and yields little benefit.

Should the Focus be on Why Behavior Changed?

Somehow the connection between learning (Level-Two) and behavior (Level-Three) got lost. In designed instruction, the content is based on the learning objectives derived from the tasks performed on the job. Hmmm…tasks…behavior. If we succeeded in the design and correctly tested for achievement of the learning objectives (behaviors), we already know the answer to a Level-Three: that behavior has changed (or not) as a result of the training. If you followed Kirkpatrick’s tenet that you should allow time (a vague requirement) for changes in behavior to take place, none of us would be testing immediately after training. Maybe he is really suggesting a check for retention?
Level 2 Evaluations Validate Desired Behaviors

An Opportunity for Improvement

I think Mr. Kirkpatrick started out with the right goal but chose the wrong methodology. He suggests the focus should be on “How much transfer of knowledge, skills and attitudes occurs?” and then offers a classic Posttest-Only Control Group design, which will tell you whether the training caused the change in behavior but not the extent or effectiveness of the transfer of training. Great if that is what you want to know. I suggest a simple pretest would reveal whether potential students already have the desired knowledge, skills and attitudes (KSAs). Comparing it to the posttest (Level-Two) would then tell you whether training was the causal factor.
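As a rough illustration of the pretest and posttest comparison, the Python sketch below summarizes paired scores for a class. It is only a sketch under stated assumptions: scores are percentages, the same learners took both tests in the same order, and the 80% mastery cut-off is a hypothetical value I picked for the example. The function name and the data are made up.

```python
from statistics import mean

MASTERY = 80  # hypothetical mastery score (percent); set to your own standard


def pre_post_summary(pretest, posttest, mastery=MASTERY):
    """Compare paired pretest and posttest scores for the same learners."""
    if len(pretest) != len(posttest):
        raise ValueError("Each learner needs both a pretest and a posttest score")

    pairs = list(zip(pretest, posttest))
    return {
        # Learners who already had the desired KSAs before training.
        "already_proficient": sum(pre >= mastery for pre, _ in pairs),
        # Learners who were below mastery before training and at or above it after.
        "reached_mastery": sum(pre < mastery <= post for pre, post in pairs),
        # Average change in score from pretest to posttest.
        "mean_gain": mean(post - pre for pre, post in pairs),
    }


# Hypothetical scores for four learners.
pretest = [35, 50, 85, 40]
posttest = [80, 90, 90, 70]

summary = pre_post_summary(pretest, posttest)
print(summary)
```

A learner who shows up as already proficient on the pretest is one whose posttest result cannot be credited to the training, which is the point of giving the pretest in the first place. For everyone else, the gain is the evidence you would weigh when judging whether the training made the difference.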

I submit the evaluation should provide “actionable” data that can be used to determine:

·       Is transfer of training occurring? (Are learning outcomes being performed on the job?)

·       The validity of the analysis (Are we teaching the correct behaviors?)

·       The effectiveness of training (How well was the learner prepared to support field activities?)

·       The efficiency of the training (Are the processes or methods the most efficient?)

Since behavior change has already been verified via Level-Two, I submit it is a lavish use of resources to conduct Level-Three evaluations using the method described by Kirkpatrick for performance-based training programs. There is another evaluation that can provide valuable information by focusing on the degree of success of transfer-of-training: an application assessment.

In an application assessment (transfer-of-training), information is gathered from the learners and the learners’ supervisors or managers via observation, surveys and interviews. If transfer is not happening, find out why.

There are many non-training barriers in the way of transfer including (2):
·       Lack of reinforcement on the job

·       Interference from immediate (work) environment

·       Nonsupportive organizational culture

·       Pressure from peers to resist change

It is important to remove these barriers, although I suggest this is more of a leadership or management function.

Other businesses and government agencies have recognized the benefits of a properly focused evaluation and have developed survey instruments. Take a look at the survey instruments developed by the U.S. Coast Guard in the attached article. The Coast Guard survey example offers an interesting Likert-type scale for gathering information about “training benefit” that could be used as a template.

There are other related questions that may be of concern, like retention and proficiency, that deal with spacing learning events over time and opportunities for practice, but we’ll save those for another post.


Next time you’re asked to perform a Kirkpatrick Level-Three evaluation, consider whether the information generated is “actionable” and whether the return on the resource investment will give you real value. And consider offering an alternative.

Best Regards,



1.   Kirkpatrick, Donald L. (1996). Evaluating Training Programs: The Four Levels. San Francisco, CA: Berrett-Koehler Publishers.

2.   Broad, Mary L. & Newstrom, John W. (1992). Transfer of Training: Action-Packed Strategies to Ensure High Payoff from Training Investments. Reading, MA: Addison-Wesley.