This volume resulted from a call for papers to “... explore the state of the art of software quality assurance, with particular emphasis on testing to measure quality.” It is my belief that software testing as a discipline is ripe for theoretical breakthroughs. Researchers are considering the right questions, and there are promising new approaches and exciting new results. It seems that new understanding of the testing process can lead to practical tools and techniques that revolutionize software development. I don’t believe that testing will become easier or cheaper; rather, it will be more rational in the sense that expending effort will more dependably lead to better software. In this introductory essay I provide a personal view of testing, testing research, and their roles in software quality assurance.
Keywords: Defect Detection, Test Point, Random Testing, Software Testing, Software Quality
© Kluwer Academic Publishers 1997
Authors and Affiliations
- Center for Software Quality Research, Department of Computer Science, Portland State University, Portland, USA
“There is a huge value in learning with instant feedback,” Dr. Agarwal said. “Students are telling us they learn much better with instant feedback.”
But skeptics say the automated system is no match for live teachers. One longtime critic, Les Perelman, has drawn national attention several times for putting together nonsense essays that have fooled software grading programs into giving high marks. He has also been highly critical of studies that purport to show that the software compares well to human graders.
“My first and greatest objection to the research is that they did not have any valid statistical test comparing the software directly to human graders,” said Mr. Perelman, a retired director of writing and a current researcher at M.I.T.
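Such a comparison is mechanically simple once machine and human scores exist for the same essays; the dispute is over whether published studies actually ran one. As an illustration only — the scores below are made up, and a serious study would also report agreement rather than just systematic bias — here is one such direct test, a paired t-test on machine-minus-human score differences, in plain Python:

```python
from math import sqrt

def paired_t(x, y):
    """t statistic for paired samples: a value near 0 suggests no
    systematic difference between the two sets of scores."""
    d = [a - b for a, b in zip(x, y)]        # per-essay score differences
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    return mean / sqrt(var / n) if var else 0.0

# Hypothetical scores for the same five essays from both graders.
machine = [78, 85, 62, 90, 71]
human = [75, 88, 60, 92, 70]
print(round(paired_t(machine, human), 2))  # → 0.17 (no evident bias)
```

A small |t| only says the machine is not systematically harsher or more lenient than the human; it says nothing about whether the two agree essay by essay, which is Perelman's deeper objection.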
He is among a group of educators who last month began circulating a petition opposing automated assessment software. The group, which calls itself Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment, has collected nearly 2,000 signatures, including some from academic luminaries.
“Let’s face the realities of automatic essay scoring,” the group’s statement reads in part. “Computers cannot ‘read.’ They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.”
But EdX expects its software to be adopted widely by schools and universities. EdX offers free online classes from universities including Harvard and M.I.T.; this fall, it will add classes from Wellesley, Georgetown and others. In all, 12 universities participate in EdX, which offers certificates for course completion and has said that it plans to continue to expand next year, including adding international schools.
The EdX assessment tool requires human teachers, or graders, to first grade 100 essays or essay questions. The system then uses a variety of machine-learning techniques to train itself to be able to grade any number of essays or answers automatically and almost instantaneously.
The software will assign a grade depending on the scoring system created by the teacher, whether it is a letter grade or numerical rank. It will also provide general feedback, like telling a student whether an answer was on topic or not.
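The article gives no details of EdX's model, so as a toy illustration of the overall workflow — humans grade a seed set, the machine learns from it, then grades new submissions and flags off-topic answers — here is a nearest-neighbor sketch in pure Python. Everything in it (the bag-of-words similarity, the four-essay corpus, the grades) is invented for illustration; EdX's actual techniques are not described here.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lowercased word-count vector for an essay."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(essays_with_grades):
    """'Training' here just stores the human-graded seed set."""
    return [(bag_of_words(e), g) for e, g in essays_with_grades]

def predict(model, essay, k=3):
    """Grade = mean grade of the k most similar training essays."""
    vec = bag_of_words(essay)
    ranked = sorted(model, key=lambda m: cosine(vec, m[0]), reverse=True)
    top = ranked[:k]
    return sum(g for _, g in top) / len(top)

def feedback(model, essay, threshold=0.3):
    """Crude on-topic check: is the essay similar to any graded one?"""
    vec = bag_of_words(essay)
    best = max(cosine(vec, m[0]) for m in model)
    return "on topic" if best >= threshold else "possibly off topic"

# Hypothetical seed set (EdX would start from ~100 human-graded essays).
graded = [
    ("a clear argument with strong evidence and examples", 90),
    ("clear argument supported by evidence and examples", 85),
    ("off topic rambling with no evidence", 40),
    ("rambling and off topic", 35),
]
model = train(graded)
print(predict(model, "clear argument with evidence and examples", k=2))  # → 87.5
```

The returned value is numeric, so a teacher-defined scoring scheme (letter bands, pass/fail cutoffs) can be layered on top of it; real systems use far richer features and models than word overlap.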
Dr. Agarwal said he believed that the software was nearing the capability of human grading.
“This is machine learning and there is a long way to go, but it’s good enough and the upside is huge,” he said. “We found that the quality of the grading is similar to the variation you find from instructor to instructor.”
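Claims like "similar to the variation you find from instructor to instructor" are usually checked with an agreement statistic computed both machine-vs-human and human-vs-human. One metric widely used in automated-essay-scoring research — though the article does not say what EdX measured — is quadratic weighted kappa; a minimal pure-Python version, with hypothetical score arrays:

```python
def quadratic_weighted_kappa(a, b, min_r, max_r):
    """Agreement between two raters' integer scores: 1.0 = perfect,
    0.0 = chance level, with large disagreements penalized more."""
    n = max_r - min_r + 1
    O = [[0] * n for _ in range(n)]            # observed score pairs
    for x, y in zip(a, b):
        O[x - min_r][y - min_r] += 1
    hist_a = [sum(row) for row in O]           # rater A's score histogram
    hist_b = [sum(O[i][j] for i in range(n)) for j in range(n)]
    total = len(a)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2    # quadratic penalty
            num += w * O[i][j]
            den += w * hist_a[i] * hist_b[j] / total  # expected by chance
    return 1.0 - num / den

human = [4, 3, 5, 2, 4, 3, 5, 1]               # hypothetical 1-5 scores
machine = [4, 3, 4, 2, 5, 3, 5, 2]
print(quadratic_weighted_kappa(human, machine, 1, 5))  # → 0.875
```

Comparing this figure against the kappa between two human graders on the same essays is one concrete way to test Dr. Agarwal's "similar to instructor-to-instructor variation" claim.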
EdX is not the first to use automated assessment technology, which dates to early mainframe computers in the 1960s. There is now a range of companies offering commercial programs to grade written test answers, and four states are using some form of the technology in secondary schools; a fifth has experimented with it. In some cases the software is used as a "second reader," to check the reliability of the human graders.
But the growing influence of the EdX consortium to set standards is likely to give the technology a boost. On Tuesday, Stanford announced that it would work with EdX to develop a joint educational system that will incorporate the automated assessment technology.
Two start-ups, Coursera and Udacity, recently founded by Stanford faculty members to create “massive open online courses,” or MOOCs, are also committed to automated assessment systems because of the value of instant feedback.
“It allows students to get immediate feedback on their work, so that learning turns into a game, with students naturally gravitating toward resubmitting the work until they get it right,” said Daphne Koller, a computer scientist and a founder of Coursera.
Last year the Hewlett Foundation, a grant-making organization set up by one of the Hewlett-Packard founders and his wife, sponsored two $100,000 prizes aimed at improving software that grades essays and short answers. More than 150 teams entered each category. A winner of one of the Hewlett contests, Vik Paruchuri, was hired by EdX to help design its assessment software.
“One of our focuses is to help kids learn how to think critically,” said Victor Vuchic, a program officer at the Hewlett Foundation. “It’s probably impossible to do that with multiple-choice tests. The challenge is that this requires human graders, and so they cost a lot more and they take a lot more time.”
Mark D. Shermis, a professor at the University of Akron, supervised the Hewlett Foundation's contest on automated essay scoring and wrote a paper about the experiment. In his view, the technology — though imperfect — has a place in educational settings.
With increasingly large classes, it is impossible for most teachers to give students meaningful feedback on writing assignments, he said. Plus, he noted, critics of the technology have tended to come from the nation’s best universities, where the level of pedagogy is much better than at most schools.
“Often they come from very prestigious institutions where, in fact, they do a much better job of providing feedback than a machine ever could,” Dr. Shermis said. “There seems to be a lack of appreciation of what is actually going on in the real world.”