CHAPTER 2: LITERATURE REVIEW
2.1 BASIC CONCEPTS OF TESTING
According to Brown (1994, p. 252), "A test, in plain or ordinary words, is a method of measuring a person's ability or knowledge in a given area." Moore (1992, p. 138) proposes that evaluation is an essential tool for teachers because it gives them feedback on what the students have learned and indicates what should be done next in the learning process. Evaluation helps us to understand students better: their abilities, interests, attitudes, and needs, in order to teach and motivate them more effectively. However, Brown (1994, p. 373) also stresses that tests are seen by learners as dark clouds hanging over their heads, upsetting them with thunderous anxiety as they anticipate the lightning bolts of questions they do not know and, worst of all, a flood of disappointment if they do not make the grade. Read (1983, p. 3) shares this view, saying that a language test is a sample of linguistic performance or a demonstration of language proficiency. In other words, a test is not simply a set of items that can be objectively marked; it can also involve a subjective evaluation of spoken and written performance with the assistance of a checklist, a rating scale, or a set of performance criteria. Nga (1992, p. 2) also confirms that tests commonly refer to a set of items or questions designed to be presented to one or more students under specified conditions. Harrison (1986, p. 1) observes that a test may be a natural extension of classroom work, providing teachers and students with useful information that can serve as a basis for improvement, or a necessary but unpleasant imposition from outside the classroom. In short, a test is a useful tool for measuring learners' ability in a given situation, especially in the classroom.
2.2 TYPES OF TESTS
2.2.1 Proficiency Tests
According to Hughes (1990:9), “Proficiency tests are designed to measure people’s ability
in a language regardless of any training they may have had in that language.” That is to say
the content of a proficiency test is not based on the content or objectives of any language
course test takers may have followed. It is rather based on a specification of what they
have to be able to do in the language to meet the requirement of their future aims.
Other test specialists, such as Carroll and Hall (1985), Harrison (1986), and Henning (1987), share the view that a proficiency test helps both teachers and learners to know whether the learners are able to follow a particular course or whether they have to take some pre-departure training first. Well-known proficiency tests such as the TOEFL and IELTS are used to test students' proficiency for study in English-speaking countries. In Vietnam, proficiency tests are offered at different levels, namely A, B, and C, for workers, engineers, teachers, architects, etc.
2.2.2 Achievement Tests
As has been mentioned above, not many teachers are interested in proficiency tests, since these are not based on any particular course book. Hughes (1990, p. 10) states: "In contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives." Achievement tests are usually administered after a course to the group of learners who took it. Sharing Hughes's view of achievement tests, Brown (1994, p. 259) suggests: "An achievement test is related directly to classroom lessons, units or even total curriculum." Achievement tests, in his opinion, "are limited to a particular material covered in a curriculum within a particular time frame." Another useful comment on achievement tests, offered by Finocchiaro and Sako (1983, p. 15), is that achievement or attainment tests are widely employed in language teaching institutions: "They are used to measure the amount or degree of control of discrete language and cultural items and of integrated language skills acquired by the students within a specific period of instruction in a specific course." In his book, Harrison (1983, p. 7) notes that "an achievement test looks back over a longer period of learning than the diagnostic test, for example, a year's work, or even a variety of different courses." He also points out that achievement tests are intended to show the standard which the students have reached in relation to other students at the same level.
There are two kinds of achievement tests: final achievement tests and progress
achievement tests.
Final achievement tests are those administered at the end of a course of study. They may
be written and administered by ministries of education, official examining boards, or by
members of teaching institutions. Clearly, the content of these tests must be related to the
courses with which they are concerned, but the nature of this relationship is still a matter of
disagreement amongst language testers.
According to some testing experts, the content of a final achievement test should be based
directly on a detailed course syllabus or on the books and other material used. This has
been referred to as the syllabus-content approach. This approach has an obvious appeal, since the test contains only what it is thought the students have actually encountered, and can thus be considered, in this respect at least, a fair test. The disadvantage of this approach is that if
the syllabus is badly designed, or the books and other materials are badly chosen, then the
results of a test can be very misleading. Successful performance on the test may not truly
indicate successful achievement of course objectives.
The alternative approach is to base the test content directly on the objectives of the course, which has a number of advantages. Firstly, it compels course designers to be explicit about objectives. Secondly, test takers show how far they have achieved those objectives. This in turn puts pressure on those responsible for the syllabus and for the selection of books and materials to ensure that these are consistent with the course objectives. Tests based on course objectives work against the perpetuation of poor teaching practice, something which a course-content-based test, almost as if part of a conspiracy, fails to do. It is the author's belief that test content based on course objectives is much preferable: it provides more accurate information about individual and group achievement, and it is likely to promote a more beneficial backwash effect on teaching.
Progress achievement tests, as the name suggests, are intended to measure the progress that learners are making. Since 'progress' means progress towards the achievement of course objectives, these tests, too, should be related to objectives, and they should show a clear progression towards the final achievement test based on course objectives. If the syllabus and teaching methods are appropriate to these objectives, progress tests based on short-term objectives will fit well with what has been taught. If not, there will be pressure to create a better fit. If it is the syllabus that is at fault, it is the tester's responsibility to make clear that it is there, not in the tests, that change is needed.
In addition to more formal achievement tests, which require careful preparation, teachers may feel free to devise their own informal checks on students' progress to keep learners on their toes. Since such tests do not form part of formal assessment procedures, their construction and scoring need not be directed purely towards the intermediate objectives on which the more formal progress achievement tests are based; they can instead reflect the particular 'route' that an individual teacher is taking towards the achievement of objectives.
2.2.3 Diagnostic Tests
According to Hughes (1990:13), “Diagnostic tests are used to identify students’ strengths
and weaknesses. They are intended primarily to ascertain what further teaching is
necessary”. Brown (1994:259) proposes, “A diagnostic test is designed to diagnose a
particular aspect of a particular language." Harrison (1983) remarks that this kind of test is used at the end of a unit in the course book or after a lesson designed to teach one particular point. With such a test it is reasonably straightforward to find out which skills the learners apply well or badly. However, it is not so easy to obtain a detailed analysis of a learner's command of grammatical structures. To be sure of this, we would need a number of examples of the choice the student made between two structures in every context that we thought significantly different and important enough to warrant obtaining information about. Tests of this kind still require a tremendous amount of work to produce. Whether or not they become generally available will depend on the willingness of individuals to write them and of publishers to distribute them.
2.2.4 Placement Tests
According to Hughes (1990, p. 14), "Placement tests are intended to provide information which will help to place students at the stage of the teaching programme most appropriate to their abilities. Typically, they are used to assign students to classes at different levels." In other words, we use placement tests to place pupils into classes according to their ability, so that they can start a course at approximately the same level as the other students in the group.
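The placement decision described here amounts to mapping a score onto a level band. A minimal sketch follows; the score cutoffs and level names are invented for illustration and are not taken from any cited source:

```python
def place_student(score):
    """Assign a class level from a placement-test score (hypothetical cutoffs)."""
    cutoffs = [(80, "Advanced"), (60, "Intermediate"), (0, "Elementary")]
    for minimum, level in cutoffs:
        if score >= minimum:
            return level

# A student scoring 72 would join the Intermediate group,
# starting the course alongside others of roughly the same ability.
print(place_student(72))  # Intermediate
```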
2.2.5 Progress Tests
A progress test is designed to measure the extent to which the students have mastered the
material taught in the classroom. It is based on the language programme which the students
have been following, and it is just as much an assessment of the teacher's own work as of the students' learning. Results obtained from progress tests enable the teacher to become more familiar with the work of each student and with the progress of the class in general. Such tests also aim at stimulating learning and reinforcing what has been taught. Good performances may act as a means of encouraging the students, and even poor performances may act as an incentive to work harder.
According to Baker (1989, p. 103), the frequent use of progress tests, as a goad to encourage application on the part of the learners, can also in theory serve as a basis for decisions on course content, learner placement, and future course design. He also concludes that the results of a progress test can indicate which parts of the course content have not been mastered by a number of students and thus need remedial action. Moreover, a properly written progress test, sampling correctly from the course content, can be a pointer for learners to which parts of the course need more attention, and for course designers to which parts of the course have not been effective. Meanwhile, Khoa (1999, p. 13) states: "A progress test is an 'on-the-way' achievement test, which is linked to the specific content of a particular set of teaching materials or a particular course of instruction."
Progress tests are prepared by a teacher and given at the end of a chapter, a course, or a term. They may also be regarded as similar in nature to achievement tests but narrower and much more specific in scope. These tests help teachers to judge the degree of success of their own teaching and to identify the weaknesses of the learners. The use of progress tests is gaining ground in many universities and colleges in Vietnam nowadays. They are part of what is generally known as 'continuous assessment', a process of assessment which takes into consideration the results scored by students on their progress tests.
2.2.6 Direct versus Indirect Tests
It is pointed out by Hughes (1990:15) that direct testing requires the candidate to perform
precisely the skills that we wish to measure. If we want to know how well the candidate
can write compositions, we ask them to write compositions. If we want to know how well
they pronounce words, we ask them to speak. The tasks, and the texts which are used,
should be as authentic as possible, although in practice the tasks can never be completely authentic.
Nevertheless, the effort is made to make them as realistic as possible. Direct testing is easier to design when it is intended to measure the productive skills of speaking and writing, since the very acts of speaking and writing provide us with information about the candidate's ability. With listening and reading it is necessary to get candidates not only to listen or read but also to demonstrate that they have done this successfully. Hughes also indicates several attractions of direct testing. Firstly, if teachers want to assess a particular ability, it is relatively straightforward to create the conditions that will elicit the behaviour on which judgments are to be based. Secondly, in his opinion, at least in the case of the productive skills, the assessment and interpretation of students' performance are quite straightforward. Thirdly, there is likely to be a helpful backwash effect, since practice for the test involves practising the skills that we want to encourage.
By contrast, indirect testing tries to measure the abilities that 'underlie' the skills in which we are interested (Hughes, 1990, p. 15). One section of the TOEFL is considered an indirect measure of writing ability: the candidate has to identify which of several underlined elements is erroneous or inappropriate in formal standard English. Another example of indirect testing is Lado's (1961) proposed method of testing pronunciation ability by a paper-and-pencil test in which the candidate has to identify pairs of words which rhyme with each other. The main problem with indirect tests is that the relationship between performance on the test and performance of the skill in which we are usually interested tends to be rather weak in strength and uncertain in nature. We do not know enough about the component parts of composition writing to predict composition writing ability accurately from scores on tests that measure the abilities which we believe underlie it. We may construct tests of grammar, vocabulary, discourse markers, handwriting, and punctuation; still, we will not be able to predict scores on compositions accurately, even if we ensure the reliability of the composition scores by taking many samples.
2.2.7 Discrete Point versus Integrative Testing
According to Hughes (1990, p. 16), discrete-point testing refers to the testing of one element at a time, item by item: the test involves a series of items, each testing a particular grammatical structure. On the contrary, integrative testing requires the candidate to combine many language elements in the completion of a task, such as writing a composition, taking notes while listening to a lecture, taking a dictation, or completing a
cloze passage. Henning (1987) shares with Hughes the idea that discrete-point tests will usually be indirect, while integrative tests will tend to be direct, although some integrative testing methods, such as the cloze procedure, are indirect. Similarly, he stresses that the distinction between discrete-point and integrative tests originated with Carroll (1961). Discrete-point tests are designed to measure knowledge or performance in a very restricted area of the target language; integrative tests, on the other hand, are said to tap a greater variety of language abilities. Henning (1987) offers examples of integrative tests such as cloze, dictation, oral interview, and oral imitation tasks.
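The cloze procedure mentioned above, in its fixed-ratio form, simply deletes every nth word of a passage and asks the candidate to restore the missing words. A minimal sketch follows; the deletion ratio of 7 is one common convention, and the sample passage is arbitrary:

```python
def make_cloze(text, n=7):
    """Fixed-ratio cloze: blank out every nth word, keeping the answers."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):  # indices of every nth word
        answers.append(words[i])
        words[i] = "____"
    return " ".join(words), answers

passage, answers = make_cloze(
    "A language test is a sample of linguistic performance "
    "or a demonstration of language proficiency in use"
)
# Words 7 and 14 ("of", "language") become blanks for the candidate to restore.
```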
2.2.8 Norm – Referenced versus Criterion – Referenced Testing
Imagine that a reading test is administered to an individual student. When we ask how the student performed on the test, two kinds of answers may be given. The first kind would be that the student obtained a score that placed him or her in the top ten percent of candidates who have taken that test, or in the bottom five percent; or that he or she did better than sixty percent of those who took it. Hughes (1990, p. 17) defines: "A test which is designed to give this kind of information is said to be norm-referenced." According to Henning (1987), a norm-referenced test must have been administered to a large sample of people. For the purposes of language testing, and of testing in general, norm-referenced tests have both strengths and weaknesses. Positively, comparison can easily be made with the performance or achievement of a large population of students. Negatively, norm-referenced tests are usually valid only with the population on which they have been normed.
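Norm-referenced statements of the kind above ("in the top ten percent", "did better than sixty percent") are percentile ranks computed against the norming sample. A small sketch, with invented scores purely for illustration:

```python
def percentile_rank(score, norm_sample):
    """Percentage of the norming sample that the given score exceeds."""
    below = sum(1 for s in norm_sample if s < score)
    return 100.0 * below / len(norm_sample)

cohort = [52, 60, 61, 65, 70, 74, 78, 81, 88, 95]  # hypothetical norming scores
print(percentile_rank(81, cohort))  # 70.0: better than 70% of the cohort
```

Note that the same raw score yields a different percentile against a different cohort, which is exactly why such tests are valid only for the population on which they were normed.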
The second kind of answer, by contrast, concerns criterion-referenced tests, whose purpose is to classify people according to whether or not they are able to perform some task or set of tasks satisfactorily. Criterion-referenced tests are not without their share of weaknesses: their objectives are often too limited and restrictive (Henning, 1987, p. 7), and the test must match the teaching objectives closely. Nevertheless, in the field of language measurement, criterion-referenced tests possess two positive virtues: they are helpful in clarifying objectives, and they motivate students by setting standards in terms of what they can do.
2.2.9 Objective Testing versus Subjective Testing
The difference between objective and subjective testing lies in the scoring. If no judgment is required on the part of the scorer, the scoring is objective; a multiple-choice test, with the correct responses unambiguously identified, is a case in point. If judgment is called for, the scoring is said to be subjective. There are different degrees of subjectivity in testing: the impressionistic scoring of a composition may be considered more subjective than the scoring of short answers in response to questions on a reading task. In Oller's (1979) view, many tests, such as cloze tests, "lie somewhere between subjectivity and objectivity". As a result, many testers seek objectivity in scoring not only for the sake of objectivity itself, but also for the greater reliability it brings.
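Objective scoring in the sense used here, where no scorer judgment is involved, reduces to mechanical key-matching; the answer key and responses below are invented for illustration:

```python
def score_objectively(answer_key, responses):
    """Objective scoring: each item is simply right or wrong, no judgment."""
    return sum(1 for item, correct in answer_key.items()
               if responses.get(item) == correct)

key = {1: "b", 2: "d", 3: "a", 4: "c"}      # hypothetical answer key
student = {1: "b", 2: "d", 3: "c", 4: "c"}  # hypothetical responses
print(score_objectively(key, student))      # 3: item 3 was answered wrongly
```

Because any two scorers running this procedure must obtain the same total, objective scoring delivers the reliability that subjective, impression-based marking cannot guarantee.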
2.2.10 Communicative Language Testing
In recent years, in parallel with the development of communicative language teaching (CLT), communicative language testing has been the focus of a great deal of research on language testing. Discussions have centered on the desirability of measuring the ability to take part in acts of communication. In essence, it is assumed that the main function of language is to enable people to communicate with each other in society; as a result, testing language ability amounts to testing communicative ability, including reading and listening, the two receptive skills necessary for communication as a two-way process (Khoa, 1999). Communicative language testing may embrace a number of testing approaches, such as direct versus indirect testing, objective versus subjective testing, and so on. Based on the theory that language ability is a complex and multifaceted construct, Bachman (1991, p. 678) proposes the following characteristics of communicative tests: "First, such tests
tests create an “information gap," requiring test takers to process complementary
information through the use of multiple sources of input. Test takers, for example, might
be required to perform a writing task that is based on input from both a short recorded
lecture and a reading passage on the same topic. A second characteristic is that of task
dependency, with tasks in one section of the test building upon the content of earlier
sections, including the test taker's answers to those sections. Third, communicative tests
can be characterized by their integration of test tasks and content within a given domain of
discourse. Finally, communicative tests attempt to measure a much broader range of
language abilities including knowledge of cohesion, functions, and sociolinguistic
appropriateness, than did earlier tests, which tended to focus on the formal aspects of language: grammar, vocabulary, and pronunciation."
2.3 CHARACTERISTICS OF A GOOD TEST
In order to design a good test, teachers have to take into consideration various factors, such as the purpose of the test, the content of the syllabus, the students' background, and so on. In addition to these factors, test characteristics play a very important role in constructing a good test. According to a number of leading scholars in testing, such as Valette (1977), Harrison (1983), Weir (1990), Carroll and Hall (1985), and Brown (1994), all good tests have four main characteristics: validity, reliability, practicality, and discrimination.
2.3.1 Validity
2.3.1.1 Construct validity
Construct validity is defined by Anastasi (1982, p. 144) as "the extent to which the test may be said to measure a theoretical construct or trait. Each construct is developed to explain and organize observed response consistencies. It derives from establishing inter-relationships among behavioral measures, focusing on a broader, more enduring and more abstract kind of behavioral description. Construct validation requires the gradual accumulation of information from a variety of sources. Any data throwing light on the nature of the trait under consideration and the conditions affecting its development and manifestations are grist for this validity mill."
Construct validity is viewed from a purely statistical perspective in much of the recent American literature (Bachman and Palmer, 1981). It is seen principally as a matter of the a posteriori statistical validation of whether a test has measured a construct that has a reality independent of other constructs.
2.3.1.2 Content validity
The more a test simulates the dimensions of observable performance and accords with what is known about that performance, the more likely it is to have content and construct validity. For Kelly (1978, p. 8), content validity seems "an almost completely overlapping concept" with construct validity, and for Moller (1982, p. 68) the distinction between construct and content validity is not clear-cut in language proficiency testing. Anastasi (1982, p. 131) defines
content validity as “essentially the systematic examination of the test content to determine
whether it covers a representative sample of the behavior domain to be measured.” She
offers a set of useful guidelines for establishing content validity:
- The behavior domain to be tested must be systematically analyzed to make certain that all major aspects are covered by the test items, in the correct proportions;
- The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared;
- Content validity depends on the relevance of the individual test responses, rather than on the apparent relevance of item content.
2.3.1.3 Face validity
Anastasi (1982, p. 136) points out that face validity is not validity in the technical sense; it refers not to what the test actually measures, but to what it appears superficially to measure to those who take it, the administrative personnel who decide on its use, and other technically untrained observers. Fundamentally, the question of face validity concerns rapport and public relations. Lado
(1961), Davies (1968), Ingram (1977), Palmer (1981), and Bachman and Palmer (1981)
have all discounted the value of face validity. If a test does not have face validity though, it
may not be acceptable to the students taking it, or the teachers using it. If the students do
not accept it as valid, their adverse reaction to it may mean that they do not perform in a
way that truly reflects their ability. Anastasi (1982, p. 136) takes a similar line: "Certainly if test content appears irrelevant, inappropriate, silly or childish, the result will be poor cooperation, regardless of the actual validity of the test. Especially in adult testing, it is not sufficient for a test to be objectively valid. It also needs face validity to function effectively in practical situations."
2.3.1.4 Backwash validity
Language teachers operating in a communicative framework normally attempt to equip students with skills that are judged relevant to present or future needs, and tests are often designed to reflect these. The closer the relationship between the test and the teaching that precedes it, the more the test is likely to enhance construct validity. A suitable criterion for judging communicative tests in the future might well be the degree to which they satisfy students, teachers, and future users of test results, as judged by some systematic attempt to gather data on the perceived validity of the test. If the first stage, with