
PG writers take IntelliMetric software for a test drive

Sunday, December 16, 2001

By Eleanor Chute, Post-Gazette Education Writer

As professional writers, we thought we might be in a reasonable position to test IntelliMetric, the artificial intelligence system being tried by the Pennsylvania Department of Education to grade state writing tests.


This is the essay submitted for computerized grading by staff writer Eleanor Chute. She tried to write at a high school grade level. The program gave her 4 of 6 possible points:

Reaching a conclusion on whether people are more respectful of one another's differences is a bit like looking at a glass of water and determining whether it is half empty or half full. Yes, some people are more respectful some of the time. No, some people are less respectful some of the time.

It would be easy to see the glass as half empty in light of the bombings in New York and Washington, D.C., on Sept. 11, 2001. The perpetrators were intolerant of religious differences and lacked regard for innocent human lives. The assailants turned their Islamic faith into a weapon of hate. After the bombings, some Americans showed that they, too, were intolerant of others as beatings of Arabs were reported throughout the nation. Some people were suspicious of anyone who looked different from themselves.

But the glass also could be viewed as half full. After the bombings, courageous firefighters, police officers and ordinary citizens rushed to help. People helped the handicapped or injured down long flights of stairs at risk to themselves. Rescuers died when the twin towers collapsed. People throughout America helped retrieve bodies or sent money for relief to the survivors.

While Sept. 11, 2001, was just one day, it provided a litmus test for how people treat and respect one another. The events showed how the best and worst of people can come to the top. Laws, of course, try to inhibit the worst of human behavior. However, the events of Sept. 11, 2001, revealed what was in people's hearts, not just what they were forbidden legally to do.

The events highlighted the difficulty of painting all people with one brush. If this writer were to argue that people are more respectful of one another's differences, then this writer wouldn't be recognizing the very differences that are to be respected.

Differences exist, and sometimes those differences aren't pretty. However, the respect of these differences enables each person to determine for himself or herself whether the glass is half empty or half full.


A company official gave us a password so we could test writing based on the sample topic on the company's Internet site.

Here's the topic:

"Everywhere we see clear and positive signs that people are becoming more respectful of one another's differences.

"Proponents of this position argue that there are many signs around us that point to our growing acceptance of one another's differences. Opponents of this position argue that there is little evidence that we have become more accepting of one another's differences.

"Write an essay for a classroom instructor in which you take a position on whether individuals are becoming more respectful of one another's differences. Be sure to defend your position with logical arguments and relevant examples."

I tried to write at a high school level, but I have to confess that the machine gave me only 4 of 6 points. Maybe I should have taken the test more seriously. Maybe I needed more inspiration. Maybe I was having a bad test day.

I followed the rules: Use 300 to 600 words. Don't do research. Use multiple paragraphs.

The directions said the essay would be "assessed on your ability to express, organize and support opinions and ideas rather than the position you express."

I submitted the essay to the computer, and in seconds, it gave this evaluation:

"An adequate essay with an organizational pattern that is evident and shows reasonable development of ideas. Word choice is adequate. The writer shows an adequate command of the language, with some errors in usage and/or mechanics."

Usage and mechanics? The report didn't say what the problems were, and grammar experts here couldn't find any problems. I made some changes to try to see if the machine had grammar preferences, adding and removing hyphens from "half full" and "half empty" and taking "but" off as the first word of a sentence.

It still didn't change my grade.

Larry Bosley, vice president for territory for Vantage Learning, which makes IntelliMetric, said that everyone getting a 4 received the same comments because such problems are characteristic of any 4 paper.

Human graders, he noted, use the same prepared remarks when evaluating essays.

Bosley, who read my essay, said he thought a human grader would have given it the same score and standard remarks. "There was not a lot of real strong development through it," he said, adding encouragingly, "I'm sure if you would have sat and worked on it, you would have gotten a 6."

Five colleagues also wrote essays on respect, each with multiple paragraphs, and all five were deemed "unscorable" by IntelliMetric on the first grading. Most were told that a section of the essay "appears to be out of context."

When they were submitted again without paragraph indentations, four were scorable and one still wasn't.

Three of the essays were serious, scoring two 6's and one 5. One obviously fictionalized account scored a 4. The unscorable one -- with or without paragraphs -- was an unconventional takeoff on Rodney Dangerfield.

In the real world, the unscorable essays would be forwarded to a human to grade. Unscorable essays aren't necessarily worse; they are just different from the essays the machine was trained to score, Bosley said.

The machine finds about 3 percent to 7 percent of the essays too unusual to grade, said Scott Elliott, Vantage Learning chief operating officer.

The IntelliMetric Web site also states that it can catch 95 percent of "inauthentic responses."

But one of the Post-Gazette essays was a fictionalized account of a "Diversity Club" at a school, and the machine scored it even though it contained an account of a chess game between the "jocks and the nerds" that concluded, "Everyone believed that the brainy students would win the chess game, and that the athletic students then would beat them up. Of course, that is exactly what happened, but no one was seriously hurt this time."

To think, I could have written that and still received a 4.
