Well Google search method was just leaked… Wonder if this picked that up before they pulled it.
Can you tell me more about this? Or just a link would be amazing.
I’m worried about what this could mean about further SEO enshittification.
Thank you!
Thanks for the assist!
A quick search indicates that they’ve archived ~100PB of data.
Now I’m trying to come up with a way to archive the internet archive in a peer-to-peer/federated fashion while maintaining fidelity as much as possible…
Torrent?
ia already serves all their uploads as torrents
That wouldn’t distribute the load of storing it though. Anyone on the torrent would need to set aside 100PB of storage for it, which is clearly never going to happen.
You’d want a federated (or otherwise distributed) storage scheme where thousands of people could each contribute a smaller portion of storage, while also being accessible to any federated client. 100,000 clients each contributing 1TB of storage would be enough to get you one copy of the full data set with no redundancy. Ideally you’d have more than that so that a single node going down doesn’t mean permanent data loss.
That wouldn’t distribute the load of storing it though. Anyone on the torrent would need to set aside 100PB of storage for it, which is clearly never going to happen.
Torrents are designed for incomplete storage of data. You can store and verify just a few chunks without any problem.
You’d want a federated (or otherwise distributed) storage scheme where thousands of people could each contribute a smaller portion of storage, while also being accessible to any federated client.
Torrents. You may not have the entirety of the data, but you can request what you need from the swarm. The only limitation is that you need to know which chunk the data you need is in.
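To make the "which chunk" point concrete, here is a minimal sketch of mapping a byte range to piece indices. It assumes a BitTorrent v1-style layout, where the content is treated as one concatenated byte stream cut into fixed-size pieces. The function name and the tiny piece size are made up for illustration.

```python
def pieces_for_range(offset, length, piece_length):
    """Return the piece indices that cover `length` bytes starting at `offset`.

    Assumes fixed-size pieces over one contiguous byte stream (v1-style layout).
    """
    if length <= 0:
        return range(0)  # nothing requested, so no pieces are needed
    first = offset // piece_length
    last = (offset + length - 1) // piece_length
    return range(first, last + 1)


# Hypothetical example: 4-byte pieces, asking for 2 bytes starting at offset 3.
needed = list(pieces_for_range(offset=3, length=2, piece_length=4))
```

A client that only wants one item would request just those pieces from the swarm and ignore the rest.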
Ideally you’d have more than that so that a single node going down doesn’t mean permanent data loss.
True.
True. Until you responded I actually completely forgot that you can selectively download torrents. Would be nice to not have to manually manage that at the user level though.
Some kind of bespoke torrent client that managed it under the hood could probably work without having to invent your own peer-to-peer protocol for it. I wonder how long it would take to compute the torrent hash values for 100PB of data? :D
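As a back-of-the-envelope answer to the hashing question, here is a rough sketch. The 1 GB/s per-worker hashing rate is an assumed figure, not a measurement, and the estimate assumes pieces can be hashed independently so the work divides evenly across workers.

```python
PB = 10**15  # decimal petabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes
SECONDS_PER_DAY = 86_400


def hashing_days(dataset_bytes, bytes_per_second_per_worker, workers=1):
    """Estimate wall-clock days to hash a dataset, ignoring disk and network limits."""
    total_rate = bytes_per_second_per_worker * workers
    return dataset_bytes / total_rate / SECONDS_PER_DAY


# Assumed throughput: 1 GB/s per worker (hypothetical).
single_machine = hashing_days(100 * PB, 1 * GB)
thousand_workers = hashing_days(100 * PB, 1 * GB, workers=1000)
```

Under those assumptions a single machine would need roughly 1,157 days, while a thousand workers would finish in a bit over a day. In practice, reading the data off disk would probably be the bottleneck rather than the hash itself.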
Not sure you’d be able to find 100k people to host a 1TB server though. Plus, redundancy would be better anyway since it would provide more download avenues in case some node is slow or has gone down.
Yes, it’s a big ask, because it’s a lot of data. Any distributed solution will require either a large number of people or a huge commitment of storage capacity. Both 100,000 people and 1TB per node is a lot to ask for, but that’s basically the minimum viable level for that much data. Ten million people each committing 50GB would be great, and offer sufficient redundancy that you could lose 80% of the nodes before losing data, but that’s not a realistic number to expect to participate.
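The numbers above can be checked with a quick sketch. It uses decimal units (1 PB = 1,000 TB) and the ~100PB figure from earlier in the thread. Note that the 80% figure is a best case: it assumes the surviving nodes still happen to hold every piece of the data between them, which simple random replication would not guarantee.

```python
PB = 10**15
TB = 10**12
GB = 10**9


def total_copies(nodes, bytes_per_node, dataset_bytes):
    """How many full copies of the dataset the pooled storage could hold."""
    return nodes * bytes_per_node / dataset_bytes


def max_loss_fraction(copies):
    """Best-case fraction of nodes that can vanish while one full copy's worth of space remains."""
    if copies < 1:
        return 0.0
    return 1 - 1 / copies


dataset = 100 * PB
minimum = total_copies(100_000, 1 * TB, dataset)        # 1.0 copy, no redundancy
generous = total_copies(10_000_000, 50 * GB, dataset)   # 5.0 copies
headroom = max_loss_fraction(generous)                  # about 0.8, i.e. 80% of nodes
```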
It’d be a lot more complicated than that, I think, if one wanted to effectively be able to address it like a file system, as well as holistically verify the integrity of the data and prevent unintentional and unwanted tampering
as well as holistically verify the integrity of the data and prevent unintentional and unwanted tampering
Torrents. Their hashes are derived from hashes of chunks. Just verify chunks.
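A minimal sketch of what "just verify chunks" looks like, assuming BitTorrent v1-style pieces hashed with SHA-1. The helper names and the tiny piece size are invented for the example; real torrents use much larger pieces.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Split data into fixed-size pieces and return the SHA-1 digest of each piece."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index, piece, expected_hashes):
    """Check one piece against its published hash without needing any other piece."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


# The publisher computes the hash list once; here with a toy 4-byte piece size.
published = piece_hashes(b"abcdefghij", piece_length=4)

# A node holding only piece 1 can still verify it on its own.
ok = verify_piece(1, b"efgh", published)
tampered = verify_piece(1, b"efgX", published)
```

Since each piece is checked independently, a node storing a small slice of the archive can prove its slice is intact, and a downloader can reject a corrupted piece from one peer and fetch it from another.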
if one wanted to effectively be able to address it like a file system
Sick. TIL!
Block chain
Overkill chain
That’s what IPFS is for. It’s ideal for that kind of stuff
Can DDOS attacks actually erase/corrupt stored data though? There’s no way they’re running all of this on a single server, with hundreds of PBs’ worth of storage, right?
DDOS attacks block connection to the servers, they don’t actually harm the data itself. You could probably overload a server to the point of it shutting down, which might affect data in transit, but data at rest usually wouldn’t be harmed in any way; unless through some freak accident a server crash would render a drive unusable. But even then, servers are usually fully redundant, and have RAID systems in place that mirror the data, so kind of a dual redundancy. Plus actual backups on top of that; though with that amount of data they might have a priority system in place and not everything is fully backed up.
No. It affects availability. Not integrity or confidentiality.
Not technically by itself as far as I know
From what I’ve learned, it is possible for a DDoS attack to overload a system that has a vulnerability and cause a reset or fault. At that point, it’s possible to inject code and initiate a breach or takeover.
I can’t find the documentation on it so… Take it with a grain of salt. I thought I learned about it in college. Unsure.
That last sentence though…
- **“The cyberattacks share the timeline with the legal battle Internet Archive is facing from US book publishers, claiming copyright infringement and seeking combined damages of hundreds of millions of dollars from all libraries.”**
i wonder why print is dead
How are print books dead?
https://www.statista.com/chart/24709/e-book-and-printed-book-penetration/
And that’s only units. In terms of revenue, ebooks are still pocket change in comparison.
i wasn’t speaking in comparison to ebooks. ebooks suck in every way imaginable.
What other long-form text format has beaten print books?
why are you coming up with these categories? “print is dead” doesn’t mean “because there’s print 2.0 now”
- radio is dead
- excuse me, but internet radio is nothing compared to am stations
- yeah, obviously people who don’t listen to radio don’t want to listen to radio with extra steps
- what other forms of radio have beaten radio?
what are you even
I am trying to understand the argument behind your statement. I mean, there are more books being published than ever and there are more readers than ever. So, I fail to see how books are dead. That’s why I am asking these questions.
The argument is that no one reads books anymore. Most media consumed today is in modern video and audio formats like YouTube and podcasts. You shouldn’t compare paper books to ebooks, you should compare them to views on YouTube.
Go offline for a couple of days until they lose interest in DDoSing? Would that work?
That just means the DDOSer is taking Internet Archive down without any further work required.
True. That’s not something you want. They could use that downtime for extensive maintenance to roll out a more robust system (they are probably even working on that already in the background). For the end user, I thought, it doesn’t really make a difference whether the site is down because of a DDoS or because of maintenance.
Foreign government, moneyed interests, or domestic dipshits, taking all bets.
I’m taking China and/or Russia for $10,000 Alex.
Domestic roscomnadzor paid by China orchestrated by USA. Or paid by USA and orchestrated by China. Either one.
Cloudflare
Someone facing an enormous lawsuit who realized their tweets / claims were accessible and needs to buy time for their legal team.
-me a day or two ago
just 56 for Burkina Faso
who was trying to sue it out of existence recently? them.
Wasn’t that Pearson or some other shitty “educational” book publisher?
Corporate espionage is so brutal that state operatives run and hide when they learn who it is. Even law enforcement avoids them.
Barnes & Noble going rouge.
Really? I thought they were more of a chartreuse myself…
But why a reddish kind of powder for your cheeks/lips, specifically?
To feign embarrassment.
Rogue*
Loooool
Warner Bros for 10k please
if you have a spare corner in your server, host the archive warrior and help them out.
Is that the ArchiveTeam tool or something different? I can spare a VM for them.
Let’s fediverse archive.org!
yes! it’s the archive team warrior.
It’s archive team, not archive.org
To contribute: http://warrior.archiveteam.org/
Help? https://wiki.archiveteam.org/
Background on the project: https://netzpolitik.org/2023/archive-team-shutdowns-dont-stop-during-the-weekends/
wth, no docker?..
Alternatively, you may run the projects using the Docker warrior instance without the VM appliance. For further info, see our GitHub repository for Readme instructions. If you have any issues or feedback, chat on #warrior on hackint.
Hold my anchor, I’m going in.
Can we federate the internet archive…?
Sure thing, got room for 100PB?
Collectively we probably do
I could spare some hundreds of Gigs but I don’t really have the bandwidth to support it, personally.
Spooling up 10x VM, I have 50 terabyte of ammo at 10gbit. Give me the one-liner install and run.
Maybe temporarily switch to a different address? And leave fake addresses to catch the ddos. Then just keep changing addresses using an IPFS system to front-end the new address?
There’s no way to do this and let visitors know what the new addresses are, without also giving the new addresses to the attackers.
IPFS is a real solution though
Lol, no, the Blockchain has never been a “real solution”, and it never will be.
How is anyone still on the Web3 hype train?
IPFS is not built on a blockchain
Yeah, it’s just a modern peer-to-peer content distribution network
Describing a high intensity DDOS attack on one of the world’s most important resources as simply “mean” is unironically one of the funniest things I’ve read this year.
Hope they get some support soon.
Can someone ELI5 why it’s hard to track down these dipshits? Even if it’s a distributed attack, picking a single IP, doing a lookup for the domain name, and checking with the registrar might actually reveal their identity, right? Of course, I’m guessing law enforcement needs to be involved to force registrars to give up that info if it’s not publicly available? Are there laws that say a DDoS is illegal?
DDoS attacks are performed by botnets. What is a botnet? Well, you know about viruses etc, right? Your PC gets infected and it becomes a part of the botnet. Now police do the investigation, they look up IPs and they see YOUR IP and come to YOUR house. See what the problem is?
And, frankly, your PC doesn’t even have to be infected to become a part of an attack. There are plenty of hacked web sites which still look like nothing has changed, but contain hidden JavaScript code that will force your browser to flood the victim. Again, the police will only find YOU.
There is no domain name associated with the IPs.
Most importantly, DDoS attacks usually use infected devices (PCs, mobile phones, smart fridges, shady browser addons etc…) to get so many IP addresses and devices/locations and attack from everywhere at once.
most ddos use private PCs controlled through a botnet
Terrible.
The Internet Archive needs to be distributed somehow. We can’t have a single point of failure like this or we’ve learned nothing since Alexandria.
I’ve got several terabytes just lying around that I’d happily devote to ancient copies of web pages.
This is why we need more websites to adopt secure client side scripting.
JavaScript may or may not be it, but the web needs to be reachable/archivable. It should also have attribution, but that’s a tangent.
As of January 2024, archive.org claims to have over 99 Petabytes of data stored.
dweb.archive.org loads for me
We might need something like a portal site for IPFS.
FBI? CIA? Or just some shit company pissed? Taking all bets.
Donated
Internet Archive is also being sued by the US book publishing and US recording industries associations, which are claiming copyright infringement and demanding combined damages of hundreds of millions of dollars and diminished services from all libraries.
“If our patrons around the globe think this latest situation is upsetting, then they should be very worried about what the publishing and recording industries have in mind,” added Kahle. “I think they are trying to destroy this library entirely and hobble all libraries everywhere. But just as we’re resisting the DDoS attack, we appreciate all the support in pushing back on this unjust litigation against our library and others.”
What the Internet Archive is doing seems to me to be a pretty textbook case of fair use.
The claim that the publishing and recording industries are somehow harmed by a site that can only make copies of content that was made freely available and isn’t being resold is ludicrously stupid.
The problem is that the litigation was entirely “just”, as far as the legal system goes. It’s an open-and-shut case and everyone saw it coming. The Internet Archive basically stood in front of a train and dared it to turn, and now they’re crying the victim. Doesn’t exactly entice me to send them donations to cover their lawyers and executives right now.
They really need to admit “okay, so that was a dumb idea, and ultimately not related to archiving the Internet anyway. We’re not going to do that again.”
Note that I’m not saying the publishers are “good guys” here, I hate the existing copyright system and would love to see it contested. Just not by Internet Archive. Let someone else whose purpose is fighting those fights take it on and stick to preserving those precious archives out of harm’s way.
Sure would be nice if these companies could be scared off like this, like Target and Pride Month
They really need to admit “okay, so that was a dumb idea, and ultimately not related to archiving the Internet anyway. We’re not going to do that again.”
It literally archives internet pages and files. What do you think the internet archive does if it doesn’t do that?
The lawsuit was about them distributing unauthorized copies of books. Not archiving, and not internet pages or files.
And that was exactly the problem.
Your calling files, book documents to be specific, books, doesn’t change that IA is storing files, ebooks to be specific, nor that the ruling shall affect all Libraries, which includes the Internet Archive to be specific. And the actual issue, is that the publishers refuse to offer ebooks to Libraries as they assume it’ll cost sales when in fact the folks using the Library are there as they are not going to go buy one.
doesn’t change that IA is storing files, ebooks to be specific,
Emphasis added. Storing files is not the problem. Nobody cared when they were just scanning and storing them. The problem arose when they started giving out copies. And worse, giving out copies without restriction - libraries “lend” ebooks by using DRM systems to try to ensure that only a specific number of copies are out “in circulation” at any given time, and so the big publishers have turned a blind eye to that.
Internet Archive basically turned themselves into an ebook Pirate Bay, giving out as many copies as were asked for with no limits.
Again, I don’t agree with current copyright laws, I think the big publishers are gigantic heaps of slime and should be burned to the ground. The problem here is that it’s not Internet Archive that should be fighting this fight.
Emphasis added. Storing files is not the problem. Nobody cared when they were just scanning and storing them. The problem arose when they started giving out copies. And worse, giving out copies without restriction - libraries “lend” ebooks by using DRM systems to try to ensure that only a specific number of copies are out “in circulation” at any given time, and so the big publishers have turned a blind eye to that.
But libraries do not do that to limit access… (I think, unless there is some kind of copyright law making it necessary to restrict access). Don’t they do that because they have a limited number of book copies that they need to maintain to meet the book lending demands in their area? Seems to me like they are just trying to maximise people’s access to books given the constraints. Any digital library can obviously do this much faster.
Library, look it up. And the publishers always hated Libraries.
Unlimited copies, look it up. Internet Archive’s “emergency library” broke the customary limits that other libraries stick to in order to keep publishers off their backs - they were giving out as many copies of a book at once as people were requesting, rather than keeping a limited number “in circulation.”
It really was basically just a piracy site all of a sudden. It’s absolutely no surprise at all that the publishers came down on them like a ton of bricks.
I hate the existing copyright system and would love to see it contested.
My brother in Christ, they’re literally contesting it
Did you read literally the next sentences I wrote after that one? Here they are:
Just not by Internet Archive. Let someone else whose purpose is fighting those fights take it on and stick to preserving those precious archives out of harm’s way.
The Internet Archive is like someone carrying around a precious baby. The baby is an irreplaceable archive of historical data being preserved for posterity. I do not want them to go and fight with a bear, even if the bear is awful and needs to be fought. I want them to run away from the bear to protect the baby, while someone else fights the bear. Someone better equipped for bear-fighting, and who won’t get that precious cargo destroyed in the process of fighting it.
Who else is better equipped? In my view it would solely depend on the lawyers that internet archive hires, and money plays a big factor in that.
Also, internet archive is going through the route process of how legislation gets overturned or upheld. Just because you perceive them as unworthy to bear the challenge doesn’t make that true, and as a result your commitment to not support them because they aren’t the one true chosen is ill-informed.
What makes the internet archive well-equipped for that? They have money from donations? Donations that were more than likely intended for preserving the archive, and not facilitating book piracy in an obviously illegal way that now requires them to piss those donations away in legal fees?
Who else is better equipped?
The EFF, for example. Fighting lawsuits for the sake of internet freedom is their reason for being. Sci-hub, for ebooks more specifically. Or Library Genesis. Those are organizations specifically devoted to fighting against excessive copyright restrictions on books.
Just because you perceive them as unworthy to bear the challenge
You’re not understanding what I’m saying here. I don’t think Internet Archive is unworthy to bear the challenge. I think they’re not well suited to it, and when they inevitably lose the lawsuits they’ve jumped head-first into they’re risking damage to other causes that are very important and unrelated to this particular fight.
Who else is better equipped? In my view it would solely depend on the lawyers that internet archive hires, and money plays a big factor in that.
The EFF. This kind of thing is why they exist.
The Archive making themselves an easier target was a huge misstep IMO. All it takes is one overreaching judge telling them they need to purge all copyrighted data (a common judgment in lawsuits like this) and the world becomes a worse place.
Realistically, they could just move their servers abroad to a country with less problematic copyright rules and wind up their US operations. It would make no difference to the end user, unless ISPs are also ordered to block access. And even then it’d only be a VPN away.
The risk of total data loss is not zero, but it’s also not the likely outcome.
Well said. Within the existing framework of copyright law, the emergency open library thing that got them sued seems obviously illegal, despite it being a good thing. What’s good and what’s legal don’t always line up.
The Internet Archive’s work is too important. The library portion (that does controlled digital lending of published books) is nice, but I wouldn’t be too hurt if it goes down. Regular public libraries can fill a lot of that role. But the archive itself is incredible, and losing that would be a huge shame.
It probably wouldn’t help their current lawsuit, at this point. Maybe right at the beginning, before it went to court and they could negotiate a bit in search of a reasonable settlement, but at this point they’ve already lost it hard.
What it would do is reassure me that they’re not going to do something dumb like this in the future, which would make me more willing to donate money to them knowing it’ll go to actual internet archiving activities instead of being thrown into big publishers’ pockets as part of more lawsuit settlements.
It’s an open-and-shut case and everyone saw it coming.
And yet whoever’s doing this evidently doesn’t expect to succeed via legal means.
This subthread switched specifically to the topic of their pending lawsuits, it’s not about the DDoS. I doubt the publishers are behind this DDoS because they’re already easily winning in the courts, there’s absolutely no need for them to risk blowing their case and getting countersued this way.
This subthread switched specifically to the topic of their pending lawsuits
Because Internet Archive implied a potential connection. And given the large-institution scale of the attack and the lack of motivation for any other actors on that scale, it seems like the most plausible explanation.
Then the Internet Archive is being an idiot and risking a lawsuit. Again. They’ve already been raked over the coals for copyright violation, I guess they want to add libel to the list as well?
The Internet Archive has plenty of enemies, many of whom don’t have an easy legal arsenal to throw at them like those big publishers did. The publishers have been playing smart so far and have won already through legal means, it makes no sense for them to suddenly turn stupid and launch this DDoS.
Man… These clowns are getting out of line.
I guess we gonna need to torrent harder. Stop feeding the parasite. If you want to support the artist pay them directly and torrent everything.
These clowns’ owners think they own you and the entirety of human knowledge.
Offshore server farms running on cargoships connecting thru starlink
Peasants gonna need to get rich for this Op
You are simply wrong from the get-go. This is the only way it’ll ever get addressed, and it is 100% within the stated purpose of the Internet Archive. The dumb part isn’t on the side of preservation efforts. There isn’t a separate issue, nor is there a separate copyright: the publishers are the same, with the exact same unsustainable arguments, regardless of web page, code, or ebook.
You are making the same mistake made upon a lot of patents, assuming “but on a computer” is somehow transformative.
Okay well they’re being sued for millions in damages? If they just agree to those damages they put themselves at higher risk of losing other court cases and the money to run the site.
They’re only at risk when they take risky behaviours. Simply archiving the Internet, like they’ve been doing for years, is not what they got sued over.
If they’re going to keep doing the same thing they got sued over then they’re going to keep losing court cases, because obviously they are. The definition of insanity is doing the same thing and expecting a different result. They should stop doing that.
Just not by Internet Archive.
Why not them in particular?
I explained why not in the sentence directly following the one that you quoted. Here it is again:
Let someone else whose purpose is fighting those fights take it on and stick to preserving those precious archives out of harm’s way.
To explain in more detail: The Internet Archive is custodian to an irreplaceable archive of Internet history and raw data. If they go and get themselves destroyed at the hands of book publishers fighting lawsuits over ebook piracy, that archive is at risk of being destroyed along with them. Or being sold off at whatever going-out-of-business sale they have, perhaps even to those very giant publishers that destroyed them.
That is why not them in particular. Let someone who isn’t carrying around that precious archive go and get into fights like this.
That does make sense. They do have “more to lose” in that sense.
Then we fuck the train up
Could these publishers try to set up these court cases to position it in front of the US Supreme Court?
Do they have any idea who’s perpetrating the attack?
As stated in the Internet Archive Blog post:
The source of the attack is unknown.
If I go into conspiracy mode I would say record labels (they tend to have small peepees when it comes to, well, everything) or some DICKtator country that doesn’t like archived text of some sort.