00:00:08<appledash>JAA: There are not, but I've been running a script for the past ~week scraping URLs... The methodology was: start at /, add every URL on the page to a list, and then repeat for every URL on the list, making sure to only track unique URLs... let's see...
00:00:36<appledash>530,000 known URLs, 318,000 not yet fetched, 219041 of the total list being video URLs
00:02:43<appledash>I lamented to my friend that I wished I had access to Warrior for this and he was like "you idiot why don't you ask them"
00:02:54<appledash>and it had 100% not occurred to me to tell anyone but one friend
00:03:32tzt (tzt) joins
00:03:52<appledash>Right now the script is in something of a steadily declining state - that is, for every new page it fetches, it on average finds less than 1 new URL it hasn't seen yet.
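The crawl appledash describes (start at /, collect every URL on each page, repeat for each unique URL) can be sketched roughly as below. This is a minimal illustration, not appledash's actual script: `crawl`, `fetch_links`, and the toy `SITE` mapping are hypothetical names, and the in-memory dictionary stands in for real HTTP fetches and link extraction. It also records how many previously unseen URLs each fetched page contributes, which is the figure behind the "less than 1 new URL per page" observation.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, fetch_links: Callable[[str], Iterable[str]]):
    """Breadth-first crawl that tracks only unique URLs.

    fetch_links is a hypothetical callable returning the URLs found on a page.
    Returns (seen, new_per_page), where new_per_page[i] is how many
    previously unseen URLs the i-th fetched page contributed.
    """
    seen = {start}
    queue = deque([start])
    new_per_page = []
    while queue:
        url = queue.popleft()
        new = 0
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)      # remember it so it is only queued once
                queue.append(link)  # schedule it for fetching later
                new += 1
        new_per_page.append(new)
    return seen, new_per_page


# Toy in-memory "site" standing in for real HTTP fetches (assumption).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b", "/c"],
    "/b": ["/c"],
    "/c": [],
}

seen, new_per_page = crawl("/", lambda url: SITE.get(url, []))
print(len(seen), new_per_page)
```

Each fetch removes one URL from the queue and adds as many as it newly discovers, so once the average of `new_per_page` stays below 1 the queue of unfetched URLs shrinks and the crawl eventually finishes.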
00:25:04Arcorann (Arcorann) joins
00:49:04HP_Archivist quits [Ping timeout: 252 seconds]
00:51:32<xxia|m>big NSFW warning for VidLii. there was gore on the front page when I visited.
00:52:44pie_[bnc] quits []
00:52:57pie_ joins
00:53:12<appledash>^ I apologize if my initial warning was insufficient
01:17:40tbc1887 (tbc1887) joins
01:25:33Guest50 quits [Ping timeout: 265 seconds]
01:34:55Hackerpcs quits [Quit: Hackerpcs]
01:37:44Hackerpcs (Hackerpcs) joins
01:54:53Guest50 joins
02:02:02Hans5958 (Hans5958) joins
02:03:50pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
02:07:53pabs (pabs) joins
02:25:17<Hans5958>What projects are available to work on? Imgur is being capped by the tracker, I'm afraid of having problems with Reddit, and I'm already doing Telegram and URLTeam.
02:27:27<@JAA>Enjin and DPReview are also in progress. URLs has fat asterisks but is also running. The other projects have either low activity or are not running at all.
02:27:50dumbgoy__ joins
02:29:50Ivan226 quits [Ping timeout: 265 seconds]
02:29:51<Hans5958>I can't get DPReview for some reason, and Enjin is 403 here and there (except on GCP but this is home network wise)
02:30:07<Hans5958>But thanks for the suggestions!
02:31:44dumbgoy_ quits [Ping timeout: 252 seconds]
02:32:02<Hans5958>And URLs... not gonna risk it
02:51:16BlueMaxima quits [Read error: Connection reset by peer]
03:06:52Guest50 quits [Client Quit]
03:21:43umgr036 quits [Remote host closed the connection]
03:21:58umgr036 joins
03:24:02wickedplayer494 quits [Remote host closed the connection]
03:26:08Island joins
03:52:25tbc1887 quits [Read error: Connection reset by peer]
04:00:00treora quits [Client Quit]
04:01:22treora joins
04:13:04TastyWiener95 (TastyWiener95) joins
04:17:29Inhonion (TastyWiener95) joins
04:17:46TastyWiener95 quits [Excess Flood]
04:21:59Inhonion is now known as TastyWiener95
05:04:01TastyWiener95 quits [Ping timeout: 265 seconds]
05:13:18Hans5958 quits [Remote host closed the connection]
05:20:54Hans5958 (Hans5958) joins
05:29:22nicolas17 quits [Client Quit]
05:34:30Hans5958 leaves
05:36:25Hans5958 (Hans5958) joins
05:38:21Hans5958 quits [Client Quit]
05:38:37Hans5958 (Hans5958) joins
05:53:11Hans5958 quits [Client Quit]
06:14:34dvd_ quits [Remote host closed the connection]
06:32:02datechnoman quits [Quit: The Lounge - https://thelounge.chat]
06:33:37datechnoman (datechnoman) joins
06:48:26user_ joins
06:50:50umgr036 quits [Ping timeout: 265 seconds]
06:52:23hitgrr8 joins
07:04:04lexikiq quits [Client Quit]
08:03:41Ivan226 joins
08:06:24Island quits [Read error: Connection reset by peer]
08:10:24Ketchup901 quits [Remote host closed the connection]
08:10:39Ketchup901 (Ketchup901) joins
08:14:47wickedplayer494 joins
08:41:59user_ quits [Remote host closed the connection]
08:42:44umgr036 joins
09:39:15s-crypt22 is now known as s-crypt
09:47:48spirit quits [Client Quit]
09:48:12tbc1887 (tbc1887) joins
09:55:28Ivan226 quits [Ping timeout: 265 seconds]
09:55:30Ivan22637 joins
09:55:55Ivan226 joins
10:00:18Ivan22637 quits [Ping timeout: 265 seconds]
10:00:40qwertyasdfuiopghjkl quits [Client Quit]
10:02:17umgr036 quits [Remote host closed the connection]
10:05:55umgr036 joins
10:06:44umgr036 quits [Remote host closed the connection]
10:06:58umgr036 joins
10:22:56Megaweapon (Megaweapon) joins
10:44:19tbc1887 quits [Read error: Connection reset by peer]
11:01:57umgr036 quits [Remote host closed the connection]
11:04:26icedice joins
11:08:03Barto quits [Remote host closed the connection]
11:08:28Barto (Barto) joins
11:16:56Barto quits [Client Quit]
11:21:46Barto (Barto) joins
12:15:48adamus1red quits [Quit: SigTerm]
12:18:03adamus1red (adamus1red) joins
13:15:19Guest50 joins
13:24:17Guest50 quits [Read error: Connection reset by peer]
14:25:05<@Sanqui>if anybody wants to sponsor an archivebot pipeline, I could archive more forums -> more imgur links
14:26:08Arcorann quits [Ping timeout: 265 seconds]
14:31:07qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
14:37:33Guest50 joins
14:52:50Guest50 quits [Read error: Connection reset by peer]
15:53:26za3k quits [Remote host closed the connection]
16:13:42<icedice>Sanqui: Maybe ask IncogNET, they sponsor a bunch of projects with servers and seem pretty chill in general
16:13:53<icedice>Hmm
16:14:04<icedice>Seems like it's all privacy projects though: https://incognet.io/privacyprojects
16:14:11<icedice>Still, doesn't hurt to ask
16:14:42za3k joins
16:17:51za3k quits [Remote host closed the connection]
16:18:14<icedice>They're pretty pro-freedom of information anyway
16:19:14za3k joins
16:32:31dumbgoy joins
16:36:32dumbgoy__ quits [Ping timeout: 252 seconds]
16:47:54HugsNotDrugs quits [Ping timeout: 252 seconds]
16:48:11HugsNotDrugs joins
17:32:22<Maakuth|m>Sanqui: how much money are we talking about?
17:40:46<joepie91|m>icedice: is there something I'm missing, or are these torrents entirely devoid of trackers?
17:41:54<icedice>oof
17:42:10<icedice>Someone I know grabbed a bunch of the magazines when the site was still up
17:42:18<icedice>So those torrents should work, at least
17:42:49<joepie91|m>two of them are finding peers, veeeeery slow ones
17:43:05<joepie91|m>oh one of them has found 1 (one) decent peer
17:43:26<joepie91|m>this is concerning :/
17:43:37<icedice>If we want to be pro-active we should probably archive https://randomhoohaas.flyingomelette.com/ai/scans/
17:43:40<pokechu22>Sanqui: If I run a forum via archivebot, is there anything special I need to do for the imgur links to be found by you?
17:43:49<icedice>It has some of the sister magazines to one of the magazines that was DMCA'd
17:44:33Guest50 joins
17:46:20<icedice>joepie91|m: I'll ask the person who grabbed some stuff from there if they're seeding
17:46:29<icedice>And if not, if they can reseed it
17:46:48<@Sanqui>pokechu22: best tell me to run the forum, and I will handle it from there
17:46:49<joepie91|m>👍️
17:47:17<@Sanqui>Maakuth|m: how much money can get us another AB pipeline?
17:48:21<pokechu22>Sanqui: Well, I recently ran forums.dolphin-emu.org, and I also ran https://forum.cyberscore.me.uk/
17:49:19<pokechu22>I'm curious if you're extracting imgur links from all AB jobs, or just the ones you're running yourself
17:49:37<@Sanqui>pokechu22: I'll add them to the list
17:49:53<@Sanqui>all jobs is unrealistic, we will probably be able to process some 100s
17:51:57<Exorcism|m><Sanqui> "pokechu22: best tell me to run..." <- Why not also forum.redump.org? 🤔
17:52:15<icedice>Seems like they had a YouTube channel which is gone as well: https://www.youtube.com/@forestillusion
17:52:24<icedice>Those damn Nintendo ninja lawyers
17:53:26<pokechu22>forum.redump.org is a bit of a special case since most of it is inaccessible unless you're signed in
17:53:47<pokechu22>and archivebot doesn't let you sign in
17:53:54<Exorcism|m>true...
17:54:12<@Sanqui>login walls are a pain yeah
17:54:28<@Sanqui>somebody can extract the imgur links; it's not going to be me
17:54:44<icedice>Can you AutoHotKey it?
17:55:05<@Sanqui>there are probably better methods
17:55:26<icedice>There was a guy who made an AutoHotKey script when Mixtape.moe was going down and put it on GitHub Gist
17:55:39<schwarzkatz|m>autohotkey? that sounds like a terrible idea
17:55:43<pokechu22>a search for `imgur` gives me 711 posts with imgur links/embedded images on forum.redump.org
17:55:47<icedice>Can't remember his name, but it's bound to be in the logs from the days before the shutdown
17:55:47<pokechu22>I can probably do a bookmarklet
17:58:17<pokechu22>similarly 200 results for imgur on forum.no-intro.org, I can do that too later today
17:59:01<icedice>Can you do ResetEra while you're at it?
17:59:10<icedice>That site has search results behind a login as well
18:00:08<icedice>https://www.resetera.com/
18:00:14<icedice>It's a pretty big gaming forum
18:01:09<pokechu22>I don't have an account on there. But if it's a forum that has most content public then doing it via archivebot would probably be better since that would also save the content; forum.redump.org and forum.no-intro.org are just special cases because most of the content requires you to be signed in
18:03:38<@Sanqui>not sure if it has a cf wall or not
18:04:56<Maakuth|m>Sanqui: yeah. how much for an extra pipeline
18:18:35<icedice>pokechu22: True
18:19:11<icedice>Pretty cool site, has soundtracks in FLAC format for a bunch of anime and games: https://www.sittingonclouds.net/
18:30:26nicolas17 joins
19:05:46Guest50 quits [Ping timeout: 252 seconds]
19:10:42<Ryz>Sanqui, https://www.resetera.com/ is behind cf
19:34:17nepeat quits [Quit: ZNC - https://znc.in]
19:37:33nepeat (nepeat) joins
19:39:17lexikiq joins
19:41:07<icedice>https://forums.tomshardware.com/ hasn't been archived and has Imgur links, so that one might be worth a look once a slot opens up
19:41:25<icedice>I assume pokecommunity.com is still on the wait list, right?
19:41:44<icedice>Or did it run and finish while I was offline?
19:45:15<@Sanqui>Ryz: some cloudflare sites are ok with concurrency 2 delay 25 though
19:45:18<@Sanqui>while others wall instantly
19:45:55<lexikiq>Should probably ask in or check the topic in #imgone, icedice :P
19:49:03<appledash>Anyone around have any thoughts on VidLii, which I posted about last night? TL;DR site was acquired at some point, is now entirely unmoderated and overrun with "bad" content (NSFW/gore/nazi warning potentially if you visit the site,) and it's throwing 503 errors every other request.
19:50:02<icedice>I figured this was the place for archival jobs for other sites
19:50:06<icedice>But you might be right
19:50:11<lexikiq>Fair
19:53:32<pokechu22>For forums here or #archivebot would make sense; I only joined #imgone earlier today so maybe it's also being coordinated there though
20:09:11<icedice>Things are moving pretty quickly in there though
20:09:48<icedice>So I figured it was better to put it here where it's easily visible
20:10:32<icedice><appledash> Anyone around have any thoughts on VidLii, which I posted about last night? TL;DR site was acquired at some point, is now entirely unmoderated and overrun with "bad" content (NSFW/gore/nazi warning potentially if you visit the site,) and it's throwing 503 errors every other request.
20:11:18<icedice>Sooner or later it will mess with whatever monetization method they have in place (if there even is one) and/or their hosting provider or domain registrar
20:11:25<nicolas17>I don't think it's worth archiving if it's already trashed like that
20:12:41nicolas17 quits [Client Quit]
20:17:51Pingerfowder quits [Quit: ZNC - https://znc.in]
20:18:00Ivan226 quits [Ping timeout: 265 seconds]
20:18:13<icedice>Seems like VidLii is hosted by Terrahost
20:18:23<icedice>So it's not going to get taken down, at least
20:18:42Pingerfowder (Pingerfowder) joins
20:25:18Guest50 joins
20:26:47icedice quits [Client Quit]
20:28:03icedice joins
20:28:25icedice quits [Changing host]
20:28:25icedice (icedice) joins
20:32:07Guest50 quits [Ping timeout: 252 seconds]
20:45:56Ivan226 joins
20:51:38Island joins
20:54:50<appledash>A dilution of good content with bad content doesn't mean the good content isn't there - and one can easily do filtering based on video titles and tags
20:55:27<@JAA>And at the very least, it's worth archiving the site without the videos themselves.
21:15:37<icedice>A few years ago we archived hate videos from YouTube when YouTube were about to clamp down on those
21:15:47<icedice>And those were 100% shit
21:16:08<icedice>VidLii probably has some gold nuggets under that layer of shit
21:16:57<icedice>joepie91|m: They're seeding, but their PC isn't always on
21:33:26sonick quits [Client Quit]
21:38:40AK (AK) joins
21:49:56that_lurker (that_lurker) joins
22:11:49hitgrr8 quits [Client Quit]
22:16:40nicolas17 joins
22:41:19<qwertyasdfuiopghjkl>You could also try to figure out when moderation disappeared and prioritize content posted before that date
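The two ideas raised above (filtering on video titles and tags, and prioritizing content posted before moderation disappeared) could be combined along these lines. This is only a sketch under stated assumptions: the `Video` fields, the `prioritize` helper, the blocked-word list, the sample entries, and the cutoff date are all hypothetical placeholders, since the real date moderation stopped and the site's actual metadata format are not known here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Video:
    # Hypothetical metadata record; real field names may differ.
    url: str
    title: str
    uploaded: date
    tags: list = field(default_factory=list)


def prioritize(videos, cutoff, blocked_words):
    """Drop videos whose title or tags contain a blocked word, then sort
    so that uploads before `cutoff` come first, oldest first."""
    def blocked(v):
        text = " ".join([v.title, *v.tags]).lower()
        return any(word in text for word in blocked_words)

    kept = [v for v in videos if not blocked(v)]
    # False sorts before True, so pre-cutoff uploads lead the list.
    return sorted(kept, key=lambda v: (v.uploaded >= cutoff, v.uploaded))


# Made-up sample data and placeholder cutoff, for illustration only.
videos = [
    Video("video-1", "Let's play", date(2021, 5, 1), ["gaming"]),
    Video("video-2", "spam upload", date(2023, 2, 1), ["spam"]),
    Video("video-3", "Old vlog", date(2019, 3, 10)),
    Video("video-4", "New vlog", date(2023, 4, 1), ["vlog"]),
]
ordered = prioritize(videos, cutoff=date(2022, 1, 1), blocked_words={"spam"})
print([v.url for v in ordered])
```

Plain substring matching is crude and will produce both false positives and misses; it is shown only because it is the simplest form of the title/tag filter mentioned in the channel.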
22:46:38h3ndr1k quits [Quit: ]
22:47:03h3ndr1k (h3ndr1k) joins
23:09:21myself quits [Read error: Connection reset by peer]
23:09:30myself (myself) joins
23:14:48Island quits [Read error: Connection reset by peer]
23:25:10icedice quits [Client Quit]
23:27:50nicolas17 quits [Client Quit]
23:33:08Guest50 joins
23:35:01icedice joins
23:35:53icedice quits [Changing host]
23:35:53icedice (icedice) joins
23:43:05BlueMaxima joins
23:52:02icedice quits [Client Quit]
23:55:27Guest50 quits [Client Quit]
23:55:58Guest50 joins
23:57:03systwi_ quits [Quit: systwi_]
23:57:06nothere quits [Quit: Leaving]
23:57:26Ivan226 quits [Ping timeout: 265 seconds]
23:59:10icedice joins