Personal Data for the Taking

A class at Johns Hopkins was able to build detailed dossiers on Baltimore citizens using only public databases.
David Scull for The New York Times

By TOM ZELLER Jr.

Published: May 18, 2005

Senator Ted Stevens wanted to know just how much the Internet had turned private lives into open books. So the senator, a Republican from Alaska and the chairman of the Senate Commerce Committee, instructed his staff to steal his identity.

"I regret to say they were successful," the senator reported at a hearing he held last week on data theft.

His staff, Mr. Stevens reported, had come back not just with digital breadcrumbs on the senator, but also with insights on his daughter's rental property and some of the comings and goings of his son, a student in California. "For $65 they were told they could get my Social Security number," he said.

That would not surprise 41 graduate students in a computer security course at Johns Hopkins University. With less money than that, they became mini-data-brokers themselves over the last semester.

They proved what privacy advocates have been saying for years and what Senator Stevens recently learned: all it takes to obtain reams of personal data is Internet access, a few dollars and some spare time.

Working with a strict requirement to use only legal, public sources of information, groups of three to four students set out to vacuum up not just tidbits on citizens of Baltimore, but whole databases: death records, property tax information, campaign donations, occupational license registries. They then cleaned and linked the databases they had collected, making it possible to enter a single name and generate multiple layers of information on individuals. Each group could spend no more than $50.
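The linking step described above can be illustrated with a minimal sketch. It is not the students' actual code: the source labels, field names and sample rows are invented for illustration, and matching on a normalized name alone is an assumption made for simplicity. A name-only match will also merge records belonging to different people who happen to share a name.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase and collapse whitespace so spelling variants share one key."""
    return " ".join(name.lower().split())


def link_records(sources):
    """Build an index: normalized name -> {source label: [records]}."""
    index = defaultdict(lambda: defaultdict(list))
    for label, records in sources.items():
        for record in records:
            index[normalize(record["name"])][label].append(record)
    return index


def lookup(index, name):
    """Return every record filed under one name, grouped by source."""
    return dict(index.get(normalize(name), {}))


# Invented sample rows standing in for separately collected public databases.
sources = {
    "licenses": [{"name": "Jane Q. Public", "license": "architect"}],
    "property": [{"name": "jane q. public", "sale_price": 150000}],
    "donations": [
        {"name": "Jane Q. Public", "amount": 100},
        {"name": "JANE  Q. PUBLIC", "amount": 250},
    ],
}

index = link_records(sources)
profile = lookup(index, "Jane Q. Public")
```

In this sketch a single call to `lookup` returns the license, property and donation rows together. Whether the two donation rows really belong to the same person cannot be determined from the name alone.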

Although big data brokers can buy the databases they crave - from local governments as well as credit agencies, retail outlets and other sources that students would not have access to - the exercise replicated, on a small scale, the methods of such companies.

They include ChoicePoint and LexisNexis, which have been called before Congress to explain, after thieves stole consumer data from their troves, just what it is they do and whether government oversight is in order. And as concerns over data security mount, inherent conflicts between convenient access to public records and a desire for personal privacy are beginning to show.

The Johns Hopkins project was conceived by Aviel D. Rubin, a professor of computer science and the technical director of the Information Security Institute at the university. He has used his graduate courses before to expose weaknesses in electronic voting technology and other aspects of a society that is increasingly dependent on - and at the mercy of - digital technology. "My expectations were that they would be able to find a lot of information, and in fact they did," he said.

Several groups managed to gather well over a million records, with hundreds of thousands of individuals represented in each database.

"Imagine what they could do if they had money and unlimited time," Dr. Rubin said.

In some instances, students visited local government offices and filed Freedom of Information Act requests for the data - or simply "asked nicely" - sometimes receiving whole databases on a compact disc. In other cases, they wrote special computer scripts, which they used to pick up whole databases from online sources like Maryland's registry of occupational licenses (barbers, architects, plumbers) or from free commercial address databases like Verizon's SuperPages, an online yellow pages directory.

Dr. Rubin said he was pleasantly surprised that his students turned up fewer Social Security numbers than he expected, although he wondered if even the benign tidbits - property details, occupations, political parties - when combined on a single individual, would be troubling to some.

David Albright is one such individual. In a single query, one student group's master database turned up his precise address, his phone number, his occupation (his architect's license expires in November), the name of his wife, their birth dates, the price he and his wife paid for their 2,200-square-foot brick home in 1990, his party registration and the elections he has voted in since 1978.

The query also highlighted the hazards of data aggregation: a gubernatorial campaign donation from 2002 was not made by him, Mr. Albright said, but apparently by another David Albright in Baltimore.

"It's hard to fully digest," Mr. Albright said when contacted by a reporter with these details. Mr. Albright thought that while the individual bits of information weren't that "creepy," their easy aggregation was troubling. "What would be disturbing is if by having all this information consolidated, it made stealing an identity easier," he said. "That would be a concern."

Like any other American, Mr. Albright deposited these tidbits in various databases as he conducted routine transactions - voting, buying a house, donating money to a campaign - and they became public records. As more of those records are made available on the Internet (a Government Accountability Office study last November estimated that as many as 28 percent of county governments now make public records available online), anyone with Internet access, anywhere in the world, can dig them up.

"I think what this professor and students have done is a powerful object lesson in just how much information there is to be found about most of us online," said Beth Givens, director of the Privacy Rights Clearinghouse in San Diego, "and how difficult it is, how impossible it is, to control what's done with our information."

Journalists, private investigators, law enforcement officials and others who gather background information on individuals for a living tend to view as a boon the migration of public records databases to the Internet, as well as the combination of those records at one-stop shops like ChoicePoint and LexisNexis, which is owned by Reed Elsevier.

But some privacy advocates are arguing that ease of access has a downside, too. Social Security numbers, they say, remain easy to come by, particularly in the thousands of public documents now being scanned and made available online. Social Security numbers present a particular threat because they are the primary identifiers that let thieves open credit lines, apply for loans or otherwise pose as another person.

Betty Ostergren, a former insurance claims supervisor in Virginia, has become an expert in digging up scanned documents and other information from local government Web sites around the country.

"I don't want these records on the Internet," said Ms. Ostergren, whose Web site, the Virginia Watchdog ( www.opcva.com/watchdog ), documents her efforts, complete with defiant instructions on how to find sensitive information on public officials. "I hate to do it," she said, "but I'm trying to get my point across."

That includes the Social Security numbers and signatures of the director of central intelligence, Porter J. Goss, and his wife. They can be found in records made available on a county court Web site in Florida.

David Bloys, a private investigator in Texas, is equally concerned. He has helped draft a bill now before the Texas Legislature that would prohibit the bulk transfer and display over the Internet of documents filed with local government.

There are real dangers involved, Mr. Bloys said, when such information "migrates from practical obscurity inside the four walls of the courthouse to widespread dissemination, aggregation and export across the world via the Internet." However convenient online access has made things for legitimate users, the information is equally convenient for "stalkers, terrorists and identity thieves," he said.

The bill, introduced in Austin by Representative Carl Isett, a Republican, was unanimously approved by the State Affairs Committee on May 3, but did not make the deadline for a House vote. A spokesman said Mr. Isett was seeking to amend another bill with language from his proposal.

And just two weeks ago in Alaska, the American Civil Liberties Union - a strong advocate of openness and access to public documents - took up the cause of Maryjane Hinman, a nurse who had lobbied unsuccessfully to have her home address removed from the state's online registry of occupational licenses.

"We feel that open access to public records is key to a free society," said Jason Brandeis, the A.C.L.U. lawyer handling the suit, which seeks to bar Alaska from disseminating contact information for licensed nurses. "But a balance needs to be struck between the public interest in open access to government information, and the need to protect individual privacy."

Whether such a balance can ever be achieved when so much information is already available is an open question. And some people are troubled by recent trends against access.

"I have no problem with an individual who faces unusual threats from publication of her identity or identifying details being able under the law to seek special exception from openness," said Rebecca Daugherty, the director of the Freedom of Information Service Center for the Reporters Committee for Freedom of the Press in Virginia. "But the secrecy should be the exception not the rule."

Several Johns Hopkins students came to a similar conclusion. Despite their surprise at the number of records they could amass and combine, many still felt that the benefits of openness outweighed the risks.

"If some citizen is concerned about dead people remaining registered to vote, he can simply obtain the database of deaths and the voter registration database and cross-correlate," said 21-year-old Joshua Mason, whose group discovered 1,500 dead people listed as active registered voters. Fifty of those dead people somehow voted in the last election.
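The cross-correlation Mr. Mason describes amounts to finding the entries that appear in both databases. The sketch below is a hypothetical illustration, not his group's method: it assumes each record carries a `name` and a `birth_date` field, both invented here, and treats a match on both as the same person. Real records would need more careful matching.

```python
def person_key(record):
    """Match key: normalized name plus birth date (an assumed pair of fields)."""
    name = " ".join(record["name"].lower().split())
    return (name, record["birth_date"])


def deceased_on_rolls(deaths, voters):
    """Return voter records whose key also appears in the death records."""
    dead_keys = {person_key(d) for d in deaths}
    return [v for v in voters if person_key(v) in dead_keys]


# Invented sample rows for illustration only.
deaths = [{"name": "John Doe", "birth_date": "1920-01-01"}]
voters = [
    {"name": "john  doe", "birth_date": "1920-01-01", "status": "active"},
    {"name": "John Doe", "birth_date": "1950-05-05", "status": "active"},
    {"name": "Mary Roe", "birth_date": "1920-01-01", "status": "active"},
]

flagged = deceased_on_rolls(deaths, voters)
```

Adding the birth date to the key keeps the second John Doe, born in 1950, from being flagged. That is one simple guard against confusing two people who share a name.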

"The problem is, we don't know what we want," Dr. Rubin said, referring to the competing social interests in openness and privacy.

"It is clear that there are strong negative consequences to being able to collect and correlate all this information on people," he said, "but it is also possible that the consequences to personal freedom would be worse if it were outlawed."