Thursday, November 8, 2007

Context in Language Software Evaluation

Language teachers are frequently called on by their school or department to evaluate tutorial CALL software for students to use in class or on their own in a self-access lab. Such evaluations are seldom carried out by someone with all the relevant skills and experience to conduct a pedagogically sound evaluation, and even when they are, the evaluator may not apply a principled approach. Typically, an evaluation takes the form of a superficial sight-seeing trip through randomly selected parts of a program until one tires of the experience and a feeling from the gut urges "use it" or "forget about it."

What is tutorial CALL?
We're talking here about pedantic, tutorial software that offers explicit language instruction, not generic computing or CMC tools that might be used in a language learning environment. Evaluations may be for
  • software
  • websites
  • courseware
Some common criticisms of tutorial CALL

"Students don't learn a language with a computer program."

True, but only the likes of Rosetta Stone traffic in the kind of absurd marketing claims that software alone can teach a language. Moreover, students won't learn to communicate in a language with only a textbook either, yet we still use textbooks to provide structure.
Teachers experienced in using technology in the classroom know that tutorial CALL programs supplement classroom activities; they don't replace them. And they only accomplish that if well chosen.

"The problem with the software is that you don't know if students are using it."

Again, this is only true if we choose the wrong software. If we looked broadly at "educational software," we would see that the vast majority of titles are designed for the retail market (individual rather than institutional use). The design criteria there seem limited in scope to flashy graphic features that reproduce well in printed ads, rather than the more bothersome design assumptions involving language teaching methodology, what Hubbard calls "teacher fit" (approach), or the theory that describes ideal conditions for instructed SLA and the construction of tasks that provide those conditions (Chapelle).

Retail programs usually do not concern themselves with providing mechanisms for accountability. Institutional settings may require the software to report student time on task as well as scoring, whether through e-mail notification, logs accessible by the teacher, or some kind of built-in drop box. More elaborate programs used in schools integrate these kinds of LMS features.
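To make the accountability point concrete, here is a minimal sketch of the kind of session record such a program might keep. The record fields, the `SessionRecord` name, and the `append_to_teacher_log` function are all invented for illustration; they stand in for a real program's built-in e-mail notification, teacher-accessible log, or drop box.

```python
import csv
import time
from dataclasses import dataclass, asdict

# Hypothetical record of one student session. Real tutorial CALL
# programs with LMS features track something comparable.
@dataclass
class SessionRecord:
    student_id: str
    lesson: str
    seconds_on_task: int
    score_pct: float
    finished: bool
    timestamp: float

def append_to_teacher_log(record: SessionRecord, path: str = "teacher_log.csv") -> None:
    """Append one session to a CSV file the teacher can open later,
    a stand-in for 'logs accessible by the teacher'."""
    row = asdict(record)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:  # new file: write the header row first
            writer.writeheader()
        writer.writerow(row)

# Example: one learner finishes a vocabulary lesson after 21 minutes.
append_to_teacher_log(SessionRecord(
    student_id="s042", lesson="Unit 3: Food vocabulary",
    seconds_on_task=1260, score_pct=85.0,
    finished=True, timestamp=time.time(),
))
```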

Purposes of evaluations
  • to make a purchase or implementation decision on software, a process that effectively ends when this practical outcome is reached (a decision-driven evaluation);
  • to give design feedback in the developmental stages of software (a formative evaluation);
  • for research motivated by a hypothesis or open-ended question;
  • for a published software review (a summative evaluation).

General problems with evaluations
  • Evaluators use different criteria.
  • Evaluators are informed by different interests, knowledge, and experience.
  • Evaluations lack consistency across reviews.
  • Evaluations lack inter-rater reliability.
The dilemma in most evaluation situations is that while only a local decision can take the specific learning environment and population into account, only an evaluation based on a principled approach (see below), conducted by someone grounded in language teaching methodology, instructional design, and of course content expertise, can be valid and consistent. Teachers selecting their own software tend to evaluate subjectively, based on their own teaching and learning experience, computer literacy, and personal preferences.

Types of evaluation
If we look specifically at summative evaluations of language learning software, we find the following common approaches:
  • Checklists
  • Guides
  • Surveys
  • Principled frameworks
Checklists, the most common approach, offer a set of questions, usually with binary or fill-in responses. They are simple to follow and may raise awareness among teachers inexperienced in CALL of the wide range of factors to be considered. They are more meaningful if questions elicit commentary.

Criticisms of checklists abound:
  • Terms are not defined, are used inconsistently, or are open to varied interpretation.
  • Elements are not weighted, so some items exert disproportionate influence (see the sketch after this list).
  • Their simplicity belies the background knowledge and experience needed to respond accurately and appropriately.
  • Questions are little more than lists of features to look for.
  • They focus on technology more than on teaching and learning (language learning potential).
  • They lack reliability and validity.
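To see the weighting problem in practice, consider the sketch below. The checklist items and weights are invented for illustration, not drawn from any real instrument; the point is only that identical yes/no answers can paint very different pictures once pedagogical items count for more than cosmetic ones.

```python
# Hypothetical checklist responses: True means "yes".
responses = {
    "attractive graphics":       True,
    "runs on lab machines":      True,
    "gives meaningful feedback": False,
    "fits course methodology":   False,
}

# Unweighted checklist: every item counts the same.
unweighted = sum(responses.values()) / len(responses)

# Weighted: pedagogical items dominate (the weights are arbitrary).
weights = {
    "attractive graphics":       1,
    "runs on lab machines":      2,
    "gives meaningful feedback": 4,
    "fits course methodology":   4,
}
weighted = (sum(weights[item] for item, yes in responses.items() if yes)
            / sum(weights.values()))

print(f"unweighted score: {unweighted:.0%}")  # 50% -- looks borderline acceptable
print(f"weighted score:   {weighted:.0%}")    # 27% -- the pedagogy fails
```

Even this toy version shows why an unweighted checklist can flatter software whose strengths are cosmetic.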
For a checklist example, see the Software Evaluation Guidelines by the National Center for ESL Literacy Education (2003). This checklist addresses technical and pedagogical issues but not methodology specifically, which is a common omission. Among the questions:

"Do the individual program lessons fit within the time constraints of class or lab sessions so that a learner can finish a lesson in one sitting?"

This question seems to assume some validity in using tutor-type software during class, at the expense of human instruction and more meaningful, authentic student-student or student-teacher interaction. For the most part, aside from introducing the functionality of tutorial CALL software to students, class is not the place to work on these programs. The tutor mode of computer use implies the absence of a teacher and the presence of a virtual teacher, if you will.

Guides are what I would describe as a hybrid between a checklist and more discursive prompts for thinking through, if not formally evaluating, the pedagogical value and instructional design efficiency of software. I created such a guide ten years ago, A Guide for Evaluating Language Learning Software.

Surveys assess student or teacher response to software or courseware after a considerable period of use, such as a semester.

Principled frameworks* represent an organizing scheme to characterize relationships between elements of language teaching and learning and computer use. Among the best known and most often referred to in the field are

  • Philip Hubbard's framework:
    • methodology driven
    • emphasizes need for evaluator to understand LL approach taken in design and fit to instructional approach
    • Non-hierarchical model**
      • Teacher fit (approach): assumptions about the nature of LL in light of what’s possible with computer-aided instruction.
      • Learner fit (design): realization of approach: syllabus, tasks, activities, language difficulty, skill focus, roles of teacher and learner materials.
      • Operational description (procedure): the form the approach and design take in the program: layout, activity type, feedback.

  • Carol Chapelle's framework
    • theory-based, task-oriented
    • driven by interactionist position***
    • focused on design and structure of LL task
    • evaluation can be judgmental at initial selection, based on how well suited the software appears to be, or empirical, based on data from actual student use


  • CALICO Journal Software Review (Jack Burston)
    • descriptive not prescriptive
    • discursive not intuitive
    • software requirements (must meet the first two plus some combination of the last three)
      1. Pedagogical validity
      2. Curriculum adaptability
      3. Efficiency
      4. Effectiveness
      5. Pedagogical innovation

      Four categories, based on Hubbard’s framework, form the journal’s evaluation template:

      1. Technical features
      2. Activities (procedure)
      3. Teacher fit (approach)
      4. Learner fit (design)


      The CALICO software evaluation template thus presents a consistent qualitative measuring device.


*As described by Levy and Stockwell in CALL Dimensions: Options and issues in computer-assisted language learning, pp. 59–64.
**Based on the Richards and Rodgers (1982, 1986) model of Approach, Design, and Procedure.
***Language is a rule-governed cultural activity learned in interaction with others; environmental factors are more dominant in language acquisition, as opposed to the innate abilities of the nativist position.


