New registrations are now closed for the 2009 IATEFL site. The forum content is for reference only.

Cardiff Online

Objectivity and Reliability

17 replies
silvia_purpuri
Joined: 2009-03-09

How can testing be efficient, objective, valid and reliable?

Diana
Joined: 2009-03-10

Dear Silvia,

Just testing the validity and reliability of your posting - is there a word "objectiveness" as well as the more usual "objectivity"? I'm not trying to be funny, I really don't know!

Diana

silvia_purpuri
Joined: 2009-03-09

Dear Diana,

Thank you for posting this message. The word "objectiveness" does exist (just checked), but "objectivity" is probably the better choice, thanks. I'll edit the whole topic as soon as you have read this post, so that the discussion can be restarted. Please confirm!

Best,

Silvia.

Olwyn Alexander
Joined: 2009-03-09

Dear Silvia,

I don't think testing can be all those things at once. For example, validity and reliability have an inverse relation: when one goes up, the other usually goes down. So the tester has to work out what is most important and design for that, while trying to strike a balance with the other features.

Classroom assessment in EAP can be more authentic and valid than global language exams, for example, because it can make use of students' background knowledge and test more complex language performance, whereas the global exams try to exclude background knowledge and minimise complexity in the performance.

Have you tried designing tests for your students? Which aspect did you focus on primarily?

Olwyn

Diana
Joined: 2009-03-10

Dear Silvia,

Sorry, I've only just seen this - my own forum (ELT-Man) is hotting up...

As far as reliability is concerned, this word for me has the connotation of something being the same over a long period of time without having to change it too frequently. One can just rely on it without any soul-searching.

Validity is a more abstract concept for me, more a theoretical search for an absolute truth. In my own case, I would be satisfied if my tests were "reliable": this would mean I could use them several times and the results would be meaningful. Whether they gave a "valid" picture of the student's performance - well, I would hope to attain this, but only approximately.

Diana

silvia_purpuri
Joined: 2009-03-09

Dear Olwyn,

Thanks for your contribution, and thank you for raising the question.

May I ask you to explain why you think validity and reliability have an inverse relation?

Dear all,

If you had to define the words "validity" and "reliability", how would you define them?

Olwyn Alexander
Joined: 2009-03-09

Dear Silvia,

I actually thought quite hard about this when I was researching assessment for EAP for a handbook for teachers. I thought it might be worth copying here the part of the chapter that dealt with this (sorry it's quite long):

Validity concerns the evidence that is provided to justify interpretations of test performance and is derived from the test construct, i.e. the underlying assumptions about language and skills on which measurements for a particular test are based. It is important to be sure that the test measures what it claims to measure and that it is appropriate to measure this for the particular test purpose.

Reliability involves the consistency of measurement in different situations and is influenced by factors such as the setting, the different forms of a test and the different raters who score the test. It is important to be sure that students are not disadvantaged by taking one form of the test rather than another or being rated by different people.

Validity and reliability are often discussed together in the literature on testing because changes in one tend to affect the other for a particular test purpose. Their relationship can be illustrated simply using the example of a driving test carried out on a computer simulator rather than on the road. The design of the computer simulation for the test is based on a construct of driving which describes measurable performance on which to base test tasks. The construct could be very narrow, resulting in a limited set of tasks, e.g., each driver sees a road moving past him on the computer screen and by operating the controls he is able to stay on the correct side of the road and turn a corner or stop when so instructed. If each driver saw the same road with the same sequence of corners and stops, this performance could be measured very reliably by simply counting the number of times each driver drifted to the wrong side or failed to turn a corner or stop. However, this construct under-represents the complexity of authentic driving performance and reduces validity. These test results could not be used with confidence to make predictions about a driver’s safety on the real road.

More complexity could be added to the computer simulation to widen the construct and make it a better representation of the authentic driving situation by including random pedestrians, traffic controls or other potential hazards. Validity would be improved because predictions on the basis of the test results could be made with more confidence. However, this new complexity would make it more difficult to ensure that if the same driver was tested on two different days, he would achieve the same result. Drivers with different skills profiles would perform better on different parts of the test. Thus the reliability of the test would be reduced. The more closely the construct matches the complexity of the authentic situation the more confidence there can be in predicting a driver’s safety on the road but the less reliable the measurement is likely to be. This is also the case for language testing.
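As a rough illustration of "the same driver tested on two different days", one common way to put a number on that consistency is a test-retest correlation between the two sittings. The sketch below is only that: the error counts are invented for the example, the names `pearson_r`, `narrow_day1` and so on are hypothetical, and the figures are not taken from the handbook or from any real test.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Invented error counts for the same five drivers on two different days.
# Narrow construct: same road, same corners and stops each time.
narrow_day1 = [2, 5, 1, 7, 4]
narrow_day2 = [2, 6, 1, 7, 3]
# Wider construct: random pedestrians and hazards change between sittings.
complex_day1 = [3, 6, 2, 8, 5]
complex_day2 = [4, 5, 5, 6, 3]

narrow_r = pearson_r(narrow_day1, narrow_day2)
complex_r = pearson_r(complex_day1, complex_day2)

print(f"narrow simulation, day 1 vs day 2: r = {narrow_r:.2f}")
print(f"complex simulation, day 1 vs day 2: r = {complex_r:.2f}")
```

With these made-up numbers the narrow simulation gives the higher correlation. That shows how the claim could be checked with real data; it is not evidence for the claim itself.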

Olwyn

silvia_purpuri
Joined: 2009-03-09

Dear Olwyn,

Thanks for sharing your experience with us once again. It's all very interesting and clear. Could you tell me where I can buy your book online? Is it EAP Essentials: a teacher's guide to principles and practice, published by Garnet Education: Reading, UK? I would love to read it!

Silvia

Olwyn Alexander
Joined: 2009-03-09

Hi Silvia,

Yes, that's the one. It's available on Amazon.com (or Amazon.co.uk, and presumably on other local Amazon sites). It can probably be ordered through your local textbook store as well.

I hope you enjoy reading it.

Olwyn

Pete MacKichan
Joined: 2009-03-11

Hi,

Silvia asked "How can testing be efficient, objective, valid and reliable?" Put simply, I don't think it can :-)

I think that the need to have testing instruments that are efficient and reliable - in other words, that can be completed relatively quickly and cheaply - means that validity is a pretty tall order.

In the field of EAP, how do we create a test that is reliable and yet can be used by students from different subject areas? It seems that the two common approaches - "familiar" content and "obscure" content - undermine the validity of the test for both the test takers (who can't see the connection between the test and their linguistic needs/goals) and the test users (who can't see the connection between the language skills tested and the language skills needed). Is there an alternative to these approaches?

How do we make the tasks valid? In real life, students' reading is connected to a productive task - for example, an essay or a presentation. How can we build in some kind of validity without creating some form of double jeopardy in which a reading test becomes a test of writing?

And (getting carried away) how do we test reading comprehension? Is it actually possible to test receptive skill

Cheers,

Pete

Olwyn Alexander
Joined: 2009-03-09

Hi Pete,

You say "How can we build in some kind of validity without creating some form of double jeopardy in which a reading test becomes a test of writing?" but I think that is exactly what we should do, so that we can come closer to the authentic situation, where skills are integrated and students get ideas from the texts they read and refashion them to answer different questions.

It is possible to test reading, critical thinking, note-taking and writing with one text.

Olwyn

Kevin Westbrook
Joined: 2009-03-11

I agree, Olwyn. In EAP, for example, reading is often done as the prelude to writing, and it could even be said that without reflecting these interrelationships it is very difficult to make a test valid. However, Pete is right that the test has to be designed in such a way that the student can still produce something that can be assessed, even with this interdependence between the two tasks. My wife is currently trying to design an entrance test for her university, and the number of issues is horrendous.

Kevin

Pete MacKichan
Joined: 2009-03-11

Hi,

Don't get me wrong, I'm a fan of integrated testing; if we are looking at proficiency testing I don't really think anything else works.

It is strange that integrated testing has fallen into such disfavour - why is this? I have just been looking at test materials from the Institute of Linguists Diploma in English for International Communication - a high level examination that is no longer available. In this test, candidates are given texts which they have to use as the basis for a presentation and follow-up report, a debate and written communication - there is no discrete testing of receptive skills. This kind of approach to language testing seems to me to have a lot of validity - and I can't actually see why it should not also be reliable given appropriate standardisation and monitoring of test markers.

Certainly this kind of testing is more expensive to run, but are there any other arguments against it?

Pete

Olwyn Alexander
Joined: 2009-03-09

Hi Pete,

My understanding of the new look TOEFL (and I hasten to add I haven't taught to it) is that it has an integrated reading/listening/summary part. Students read a text and then listen to a brief talk which puts a different point of view. They then summarise what they've heard. So this is integration of a kind.

IELTS doesn't do this (although I think they used to do something like it) because it is too hard to ensure reliability in a large-scale test.

However, those of us lucky enough to work in centres where IELTS/TOEFL are not the exit exams, and where we design our own coursework and exams, have the luxury of adopting a more integrated approach to assessment. This might involve several linked submissions over a semester (for example). It also allows the student to work in their specialist subject area, which is more appropriate and motivating for them.

Olwyn

andyb
Joined: 2009-03-10

Hi. I'm not sure that this kind of integration is good. Integrated testing of this sort fell out of favour (I think) because the test gave you an integrated result, i.e. it shows that you can read a bit, listen a bit and summarise a bit, all mashed together at level X. What does that actually mean? This is why more transparent testing and bands came about, which describe people performing a task rather than correlating performance with some norm-referenced criteria.

Plus, of course, there is the need for fresh starts and coping with affect: a single topic might disadvantage some people who either know or care too little or too much about it. I suppose with a tight EAP brief you might get away with it, but even so, how many different types of engineering or plant biology specialisms are there?

Kevin Westbrook
Joined: 2009-03-11

Hi andyb,

You wrote:

"the result of the test gave you an integrated result."

Only if you design it that way. There is nothing stopping the test designer allocating marks to each section and requiring, for example, an overall pass mark together with a minimum mark in each section.
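A minimal sketch of that kind of marking scheme, to make the idea concrete. Everything in it is assumed for illustration: the function name `passes`, the section names and all the thresholds are hypothetical and do not come from any real exam.

```python
def passes(section_marks, overall_pass_mark, section_minimums):
    """Return True only if the total meets the overall pass mark
    AND every section meets its own minimum mark."""
    missing = set(section_minimums) - set(section_marks)
    if missing:
        raise ValueError(f"no marks recorded for: {sorted(missing)}")
    total = sum(section_marks.values())
    meets_overall = total >= overall_pass_mark
    meets_each_section = all(
        section_marks[name] >= minimum
        for name, minimum in section_minimums.items()
    )
    return meets_overall and meets_each_section


# Hypothetical integrated test: one source text, three marked sections.
minimums = {"reading": 10, "note-taking": 8, "writing": 12}

# Strong overall, but the writing section falls below its minimum.
uneven = {"reading": 24, "note-taking": 20, "writing": 9}
# Lower total, but every section clears its minimum.
balanced = {"reading": 16, "note-taking": 14, "writing": 15}

for label, marks in (("uneven", uneven), ("balanced", balanced)):
    print(label, marks, "pass" if passes(marks, 45, minimums) else "fail")
```

Because the section marks are kept separately, the result can still be reported as a profile rather than as a single mashed-together figure.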

The problem with discrete testing (I'm not sure what is more transparent about it) is that it only tells me how good the person is at that test. In practice, nobody exists in a purely "reading" environment, for example, and it is actually close to impossible to test only one thing anyway. In an EAP environment, it is actually more useful to me to know that they can do all those things "a bit" than to know they passed a test that actually doesn't give me any information on their ability to cope with study (IELTS certainly doesn't, for example).

That doesn't mean that there aren't areas where the type of testing you favour would be superior, or where the kind of integrated testing I describe would be inappropriate. You really need to consider the goal of the test first.

Regards,

Kevin

Pete MacKichan
Joined: 2009-03-11

Sure it gives you an integrated result and sure there are the problems that you mention.

Language seems to me a pretty integrated thing - I do find it hard to think of reading as a discrete skill. However, more problematic for me is that this more 'transparent' approach is probably just as murky as any of the previous methods used; testing reading as a discrete skill is all very well, but how do you actually do it? Certainly tests of reading do test something, but do they test understanding? It seems to me that the only way that you can determine whether someone has understood a text is by asking them to do something with its content - everything else is just testing how well people are able to perform superficial text-based examination tasks.

Pete

fazira Kakzhanova
Joined: 2009-03-16

Hi All,

If the main aim of teaching foreign languages is to develop communicative competence, what are tests for? If a student correctly identifies "have written" as the Perfect aspect, can I say that he/she knows the language? Language is for expressing ideas and exchanging opinions and judgements. Can skills tests really determine the level of communicative competence?

What criteria and principles underlie elicitation techniques?

 

 

 

 

 
