How big is the database?
Books can’t be that big, but I’m guessing the selection is simply huge?
The selection is literally all books that can be found on the internet.
So how big is that?
According to their own figures, the total dataset size excluding duplicates is over 900 TB.
Shit, my Synology has more than that… alas, it is full of movie “archives”.
You run a petabyte Synology at home?
Sure, that’s a bit more than $65,000 per year with Backblaze.
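For anyone who wants to redo that back-of-the-envelope math, here is a rough sketch. The flat rate of $6 per TB per month is an assumption on my part, not a quoted price, so swap in whatever the provider actually charges. The function name is made up for illustration.

```python
def yearly_cost_usd(size_tb: float, usd_per_tb_month: float) -> float:
    """Yearly storage cost for a flat per-TB monthly rate (no egress, no discounts)."""
    return size_tb * usd_per_tb_month * 12


# ASSUMPTION: flat $6 per TB per month; check current pricing before relying on it.
ASSUMED_RATE = 6.0

cost_900_tb = yearly_cost_usd(900, ASSUMED_RATE)   # the ~900 TB dataset
cost_1_pb = yearly_cost_usd(1000, ASSUMED_RATE)    # a full petabyte

print(f"900 TB: ${cost_900_tb:,.0f} per year")
print(f"1 PB:   ${cost_1_pb:,.0f} per year")
```

Under that assumed rate, 900 TB comes out to $64,800 per year and a full petabyte to $72,000, so the ballpark holds either way.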
For anyone wanting to contribute but on a smaller and more feasible scale, you can help distribute their database using torrents.
I know there was a lot of user resistance to the torrent scheme the last time this came up. I’d be willing to seed 200-500 GB, but minimum torrent archive sizes of 1.5 TB and larger really limit the number of people willing to give up that storage, and they also defeat a lot of the resiliency of torrents, given how bloody long it takes to get a complete copy. I know that 1.5 TB takes a massive chunk out of my already pretty full NAS, and I passed on seeding the first time for that reason.
It feels like they didn’t really subdivide the database as much as they should have…
There are plenty of small torrents. Use the torrent generator and tell the script how much space you have, and it will give you the “best” (least seeded) torrents whose combined size fits the space you give it. It doesn’t have to be big; even a few GB is enough for some of the smaller torrents.
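To make the idea concrete, here is a minimal sketch of how that kind of selection could work. This is not the actual generator script; it is a simple greedy pick I am assuming for illustration, and the `Torrent` class, `pick_torrents` function, and the sample catalog are all made up.

```python
from dataclasses import dataclass


@dataclass
class Torrent:
    name: str
    size_gb: float
    seeders: int


def pick_torrents(torrents: list[Torrent], budget_gb: float) -> list[Torrent]:
    """Greedily pick the least-seeded torrents that still fit in budget_gb."""
    chosen = []
    remaining = budget_gb
    # Fewest seeders first; among ties, try the smaller torrent first.
    for t in sorted(torrents, key=lambda t: (t.seeders, t.size_gb)):
        if t.size_gb <= remaining:
            chosen.append(t)
            remaining -= t.size_gb
    return chosen


# Hypothetical catalog, sizes in GB.
catalog = [
    Torrent("old-monster", 1500, 2),
    Torrent("mid-a", 300, 3),
    Torrent("small-a", 40, 10),
    Torrent("mid-b", 250, 4),
    Torrent("small-b", 5, 1),
]

picked = pick_torrents(catalog, budget_gb=400)
for t in picked:
    print(f"{t.name}: {t.size_gb} GB, {t.seeders} seeders")
```

With 400 GB to spare, this picks `small-b`, `mid-a`, and `small-a` (345 GB in total). Anything larger than the free space, like the 1.5 TB entry, gets skipped no matter how few seeders it has.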
Almost all the small torrents that I see pop up are already relatively well seeded (~10 seeders) though, which reinforces the fact that A. the torrents most desperately needing seeders are the older, largest ones and B. large torrents don’t attract seeders because of unreasonable space requirements.
Admittedly, newer torrents seem to be split into pieces of 300 GB or less, which is good, but there are still a lot of monster torrents in that list.