! Original post on The Washington Post, behind a paywall !
Hackers breached the Internet Archive, whose outsize cultural importance belies a small budget and lean infrastructure.
By Daniel Wu
October 18, 2024 at 6:00 a.m. EDT
There are few organizations dedicated to the gargantuan task of preserving the vast, ever-shifting record of human activity that is the internet. The largest such record belongs to a nonprofit based in an old church in San Francisco that operates on a smaller annual budget than the D.C. Public Library.
It is currently under siege.
Hackers struck the Internet Archive last week, leaking the information of millions of users and defacing its website with a message taunting the nonprofit for running on a shoestring budget. To prevent further leaks, the Internet Archive’s team took the site, including its popular Wayback Machine, offline. It’s the first time in its almost 30-year history that it has suffered an outage of longer than a few hours, founder Brewster Kahle told The Washington Post. Most of the site remains offline a week later.
The cyberattack kicked off a frenzied race to restore access to the Internet Archive and the more than 900 billion webpages it preserves on the Wayback Machine, its archival service. It was also a rude awakening. To Kahle, that hackers would set their sights on a free repository of digital history, seemingly without an agenda or a ransom, is hard to imagine.
“I don’t know,” Kahle said. “Why kick the cat?”
The attack drew allusions online to the burning of the Library of Alexandria, the sprawling repository of knowledge in ancient Egypt that writers of the time claim Julius Caesar accidentally torched. It’s a dramatic comparison, but most agree that the Internet Archive has played a foundational role in the upkeep of online history. Other web archival services exist, but the Internet Archive, which was founded in 1996, maintains the largest and oldest archive of the internet.
If you’ve ever had to search for an old or defunct website, you’ve probably been directed to the Internet Archive or its Wayback Machine. The organization archives websites cited by editors on Wikipedia. Attorneys plumb the Wayback Machine for evidence to use in court. The Internet Archive was among several groups that preserved deleted tweets by former president Donald Trump, it wrote in 2017.
Kahle and his team see the mission of the Internet Archive as a noble one — to build a “library of everything” and ensure records are kept in an online environment where websites change and disappear by the day.
“We’re all dreamers,” said Chris Freeland, the Internet Archive’s director of library services. “We believe in the mission of the Internet Archive, and we believe in the promise of the internet.”
But the site has, at times, courted controversy. The Internet Archive faces lawsuits from book publishers and music labels brought in 2020 and 2023 for digitizing copyrighted books and music, which the organization has argued should be permissible for noncommercial, archival purposes. Kahle said the hundreds of millions of dollars in penalties from the lawsuits could sink the Internet Archive.
Those lawsuits are ongoing. Now, the Internet Archive has also had to turn its attention to fending off cyberattacks. In May, the Internet Archive was hit with a distributed denial-of-service (DDoS) attack, a fairly common type of internet warfare that involves flooding a target site with fake traffic. The archive experienced intermittent outages as a result. Kahle said it was the first time the site had been targeted in its history.
Last week, the DDoS attacks resumed. But things escalated quickly. On Oct. 9, in a separate, more critical security breach, hackers inserted a message on the Internet Archive’s main page bragging they had stolen information from 31 million of its users. Have I Been Pwned, a service that checks for leaked emails and passwords online, confirmed that it received a database of email addresses and passwords and verified that they were stolen from the Internet Archive, cybersecurity news site BleepingComputer reported.
Scott Helme, a cybersecurity researcher, told The Post that if hackers compromised the Internet Archive to the extent that they were able to deface the website, they could have done much worse.
“With that level of access, genuinely, they could have done anything,” Helme said. “They could have put inappropriate materials. If they were politically motivated, they could have used the platform to make statements … they could have used the website to distribute malware.”
It was a five-alarm fire for Kahle, who quickly decided to take the site offline. It was chilling, he said, to read the hackers’ message on his website: “Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach?”
“That’s been heard loud and clear,” Kahle said. “They’re not wrong.”
Kahle and his team have spent the week since racing to identify and fix the vulnerabilities that left the Internet Archive open to attack. The organization has “industry standard” security systems, Kahle said, but he added that, until this year, the group had largely stayed out of the crosshairs of cybercriminals. Kahle said he’d opted not to prioritize additional investments in cybersecurity out of the Internet Archive’s limited budget of around $20 million to $30 million a year.
The group is also puzzling over why it came under attack. The Internet Archive’s preserved data was not compromised during the hack, Kahle said, and the team hasn’t faced a ransom demand.
A hacking group on X claimed responsibility for the DDoS attacks. But no one has reliably claimed the defacement and data breach that forced the Internet Archive to sequester itself, said Helme, the cybersecurity researcher. He added that the hackers’ decision to alert the Internet Archive of their intrusion and send the stolen data to Have I Been Pwned, the monitoring service, could imply they didn’t have further intentions with it.
“It could have just been someone flexing their muscles,” Helme said.
If the Internet Archive was a victim of circumstance, its staff — and a large contingent of supporters online — are angry that hackers chose the nonprofit as a target. Users on X noted the hack’s proximity to the U.S. presidential election and compared it to “pulling off a bank heist at a public library.”
“Why hack in?” Kahle said. “So that you can go and, I don’t know, read a book?”
The Internet Archive is not the only library service to have suffered a hacking attack in the past year. Cyberattacks halted the operations of the Seattle Public Library in May and the Calgary Public Library last week, the Seattle Times and CBC reported. The British Library is still reeling from a debilitating cyberattack last October that left some archives and school learning resources unavailable for almost a year.
“We’re facing these same threats,” Freeland, the Internet Archive director of library services, said. “We are all the same library system under the same attacks.”
The Internet Archive and its Wayback Machine service were offline for several days, during which the organization’s vast catalogue of webpages and other archives, including music, books, software and imagery, was inaccessible. The organization restored a read-only version of the Wayback Machine, Kahle said Monday on X, but is still working to bring the rest of the organization’s archives back online.
“People want access to the past,” Kahle said. “And our job is to help deliver it and … to be always there.”
Helme said the episode demonstrates the vulnerability of nonprofit services like the Internet Archive — and of the larger ecosystem of information online that depends on them.
“Perhaps they’ll find some more funding now that all of these headlines have happened,” Helme said. “And people suddenly realize how bad it would be if they were gone.”
Easier to rewrite history if TWBM is down
That could be a reason. Especially now, so close to the election in the US… Or they inserted malware somewhere, without knowing. Who knows…
It’s up. It’s most of the rest of the archive that’s down.
Why is Chevy Chase in the thumbnail?
For anyone who wants to support their efforts, they have an active PayPal.