| 00:00:08 | <appledash> | JAA: There are not, but I've been running a script for the past ~week scraping URLs... The methodology was: start at /, add every URL on the page to a list, and then repeat for every URL on the list, making sure to only track unique URLs... let's see... |
| 00:00:36 | <appledash> | 530,000 known URLs, 318,000 not yet fetched, 219041 of the total list being video URLs |
| 00:02:43 | <appledash> | I lamented to my friend that I wished I had access to Warrior for this and he was like "you idiot why don't you ask them" |
| 00:02:54 | <appledash> | and it had 100% not occurred to me to tell anyone but one friend |
| 00:03:32 | | tzt (tzt) joins |
| 00:03:52 | <appledash> | Right now the script is in something of a steadily declining state - that is, for every new page it fetches, it on average finds less than 1 new URL it hasn't seen yet. |
| 00:25:04 | | Arcorann (Arcorann) joins |
| 00:49:04 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 00:51:32 | <xxia|m> | big NSFW warning for VidLii. there was gore on the front page when I visited. |
| 00:52:44 | | pie_[bnc] quits [] |
| 00:52:57 | | pie_ joins |
| 00:53:12 | <appledash> | ^ I apologize if my initial warning was insufficient |
| 01:17:40 | | tbc1887 (tbc1887) joins |
| 01:25:33 | | Guest50 quits [Ping timeout: 265 seconds] |
| 01:34:55 | | Hackerpcs quits [Quit: Hackerpcs] |
| 01:37:44 | | Hackerpcs (Hackerpcs) joins |
| 01:54:53 | | Guest50 joins |
| 02:02:02 | | Hans5958 (Hans5958) joins |
| 02:03:50 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
| 02:07:53 | | pabs (pabs) joins |
| 02:25:17 | <Hans5958> | What projects are available to work on? Imgur is being capped by the tracker, I'm afraid of having problems with Reddit, and I'm already doing Telegram and URLTeam. |
| 02:27:27 | <@JAA> | Enjin and DPReview are also in progress. URLs has fat asterisks but is also running. The other projects have either low activity or are not running at all. |
| 02:27:50 | | dumbgoy__ joins |
| 02:29:50 | | Ivan226 quits [Ping timeout: 265 seconds] |
| 02:29:51 | <Hans5958> | I can't get DPReview for some reason, and Enjin is 403 here and there (except on GCP, but this is on my home network) |
| 02:30:07 | <Hans5958> | But thanks for the suggestions! |
| 02:31:44 | | dumbgoy_ quits [Ping timeout: 252 seconds] |
| 02:32:02 | <Hans5958> | And URLs... not gonna risk it |
| 02:51:16 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 03:06:52 | | Guest50 quits [Client Quit] |
| 03:21:43 | | umgr036 quits [Remote host closed the connection] |
| 03:21:58 | | umgr036 joins |
| 03:24:02 | | wickedplayer494 quits [Remote host closed the connection] |
| 03:26:08 | | Island joins |
| 03:52:25 | | tbc1887 quits [Read error: Connection reset by peer] |
| 04:00:00 | | treora quits [Client Quit] |
| 04:01:22 | | treora joins |
| 04:13:04 | | TastyWiener95 (TastyWiener95) joins |
| 04:17:29 | | Inhonion (TastyWiener95) joins |
| 04:17:46 | | TastyWiener95 quits [Excess Flood] |
| 04:21:59 | | Inhonion is now known as TastyWiener95 |
| 05:04:01 | | TastyWiener95 quits [Ping timeout: 265 seconds] |
| 05:13:18 | | Hans5958 quits [Remote host closed the connection] |
| 05:20:54 | | Hans5958 (Hans5958) joins |
| 05:29:22 | | nicolas17 quits [Client Quit] |
| 05:34:30 | | Hans5958 leaves |
| 05:36:25 | | Hans5958 (Hans5958) joins |
| 05:38:21 | | Hans5958 quits [Client Quit] |
| 05:38:37 | | Hans5958 (Hans5958) joins |
| 05:53:11 | | Hans5958 quits [Client Quit] |
| 06:14:34 | | dvd_ quits [Remote host closed the connection] |
| 06:32:02 | | datechnoman quits [Quit: The Lounge - https://thelounge.chat] |
| 06:33:37 | | datechnoman (datechnoman) joins |
| 06:48:26 | | user_ joins |
| 06:50:50 | | umgr036 quits [Ping timeout: 265 seconds] |
| 06:52:23 | | hitgrr8 joins |
| 07:04:04 | | lexikiq quits [Client Quit] |
| 08:03:41 | | Ivan226 joins |
| 08:06:24 | | Island quits [Read error: Connection reset by peer] |
| 08:10:24 | | Ketchup901 quits [Remote host closed the connection] |
| 08:10:39 | | Ketchup901 (Ketchup901) joins |
| 08:14:47 | | wickedplayer494 joins |
| 08:14:56 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 08:41:59 | | user_ quits [Remote host closed the connection] |
| 08:42:44 | | umgr036 joins |
| 09:39:15 | | s-crypt22 is now known as s-crypt |
| 09:47:48 | | spirit quits [Client Quit] |
| 09:48:12 | | tbc1887 (tbc1887) joins |
| 09:55:28 | | Ivan226 quits [Ping timeout: 265 seconds] |
| 09:55:30 | | Ivan22637 joins |
| 09:55:55 | | Ivan226 joins |
| 10:00:18 | | Ivan22637 quits [Ping timeout: 265 seconds] |
| 10:00:40 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 10:02:17 | | umgr036 quits [Remote host closed the connection] |
| 10:05:55 | | umgr036 joins |
| 10:06:44 | | umgr036 quits [Remote host closed the connection] |
| 10:06:58 | | umgr036 joins |
| 10:22:56 | | Megaweapon (Megaweapon) joins |
| 10:44:19 | | tbc1887 quits [Read error: Connection reset by peer] |
| 11:01:57 | | umgr036 quits [Remote host closed the connection] |
| 11:04:26 | | icedice joins |
| 11:08:03 | | Barto quits [Remote host closed the connection] |
| 11:08:28 | | Barto (Barto) joins |
| 11:16:56 | | Barto quits [Client Quit] |
| 11:21:46 | | Barto (Barto) joins |
| 12:15:48 | | adamus1red quits [Quit: SigTerm] |
| 12:18:03 | | adamus1red (adamus1red) joins |
| 13:15:19 | | Guest50 joins |
| 13:24:17 | | Guest50 quits [Read error: Connection reset by peer] |
| 14:25:05 | <@Sanqui> | if anybody wants to sponsor an archivebot pipeline, I could archive more forums -> more imgur links |
| 14:26:08 | | Arcorann quits [Ping timeout: 265 seconds] |
| 14:31:07 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 14:37:33 | | Guest50 joins |
| 14:52:50 | | Guest50 quits [Read error: Connection reset by peer] |
| 15:53:26 | | za3k quits [Remote host closed the connection] |
| 16:13:42 | <icedice> | Sanqui: Maybe ask IncogNET, they sponsor a bunch of projects with servers and seem pretty chill in general |
| 16:13:53 | <icedice> | Hmm |
| 16:14:04 | <icedice> | Seems like it's all privacy projects though: https://incognet.io/privacyprojects |
| 16:14:11 | <icedice> | Still, doesn't hurt to ask |
| 16:14:42 | | za3k joins |
| 16:17:51 | | za3k quits [Remote host closed the connection] |
| 16:18:14 | <icedice> | They're pretty pro-freedom of information anyway |
| 16:19:14 | | za3k joins |
| 16:32:31 | | dumbgoy joins |
| 16:36:32 | | dumbgoy__ quits [Ping timeout: 252 seconds] |
| 16:47:54 | | HugsNotDrugs quits [Ping timeout: 252 seconds] |
| 16:48:11 | | HugsNotDrugs joins |
| 17:32:22 | <Maakuth|m> | Sanqui: how much money are we talking about? |
| 17:40:46 | <joepie91|m> | icedice: is there something I'm missing, or are these torrents entirely devoid of trackers? |
| 17:41:54 | <icedice> | oof |
| 17:42:10 | <icedice> | Someone I know grabbed a bunch of the magazines when the site was still up |
| 17:42:18 | <icedice> | So those torrents should work, at least |
| 17:42:49 | <joepie91|m> | two of them are finding peers, veeeeery slow ones |
| 17:43:05 | <joepie91|m> | oh one of them has found 1 (one) decent peer |
| 17:43:26 | <joepie91|m> | this is concerning :/ |
| 17:43:37 | <icedice> | If we want to be pro-active we should probably archive https://randomhoohaas.flyingomelette.com/ai/scans/ |
| 17:43:40 | <pokechu22> | Sanqui: If I run a forum via archivebot, is there anything special I need to do for the imgur links to be found by you? |
| 17:43:49 | <icedice> | It has some of the sister magazines to one of the magazines that was DMCA'd |
| 17:44:33 | | Guest50 joins |
| 17:46:20 | <icedice> | joepie91|m: I'll ask the person who grabbed some stuff from there if they're seeding |
| 17:46:29 | <icedice> | And if not, if they can reseed it |
| 17:46:48 | <@Sanqui> | pokechu22: best tell me to run the forum, and I will handle it from there |
| 17:46:49 | <joepie91|m> | 👍️ |
| 17:47:17 | <@Sanqui> | Maakuth|m: how much money can get us another AB pipeline? |
| 17:48:21 | <pokechu22> | Sanqui: Well, I recently ran forums.dolphin-emu.org, and I also ran https://forum.cyberscore.me.uk/ |
| 17:49:19 | <pokechu22> | I'm curious if you're extracting imgur links from all AB jobs, or just the ones you're running yourself |
| 17:49:37 | <@Sanqui> | pokechu22: I'll add them to the list |
| 17:49:53 | <@Sanqui> | all jobs is unrealistic, we will probably be able to process some 100s |
| 17:51:57 | <Exorcism|m> | <Sanqui> "pokechu22: best tell me to run..." <- Why not also forum.redump.org? 🤔 |
| 17:52:15 | <icedice> | Seems like they had a YouTube channel which is gone as well: https://www.youtube.com/@forestillusion |
| 17:52:24 | <icedice> | Those damn Nintendo ninja lawyers |
| 17:53:26 | <pokechu22> | forum.redump.org is a bit of a special case since most of it is inaccessible unless you're signed in |
| 17:53:47 | <pokechu22> | and archivebot doesn't let you sign in |
| 17:53:54 | <Exorcism|m> | true... |
| 17:54:12 | <@Sanqui> | login walls are a pain yeah |
| 17:54:28 | <@Sanqui> | somebody can extract the imgur links; it's not going to be me |
| 17:54:44 | <icedice> | Can you AutoHotKey it? |
| 17:55:05 | <@Sanqui> | there are probably better methods |
| 17:55:26 | <icedice> | There was a guy who made an AutoHotKey script when Mixtape.moe was going down and put it on GitHub Gist |
| 17:55:39 | <schwarzkatz|m> | autohotkey? that sounds like a terrible idea |
| 17:55:43 | <pokechu22> | a search for `imgur` gives me 711 posts with imgur links/embedded images on forum.redump.org |
| 17:55:47 | <icedice> | Can't remember his name, but it's bound to be in the logs from the days before the shutdown |
| 17:55:47 | <pokechu22> | I can probably do a bookmarklet |
| 17:58:17 | <pokechu22> | similarly 200 results for imgur on forum.no-intro.org, I can do that too later today |
| 17:59:01 | <icedice> | Can you do ResetEra while you're at it? |
| 17:59:10 | <icedice> | That site has search results behind a login as well |
| 18:00:08 | <icedice> | https://www.resetera.com/ |
| 18:00:14 | <icedice> | It's a pretty big gaming forum |
| 18:01:09 | <pokechu22> | I don't have an account on there. But if it's a forum that has most content public then doing it via archivebot would probably be better since that would also save the content; forum.redump.org and forum.no-intro.org are just special cases because most of the content requires you to be signed in |
| 18:03:38 | <@Sanqui> | not sure if it has a cf wall or not |
| 18:04:56 | <Maakuth|m> | Sanqui: yeah. how much for an extra pipeline |
| 18:18:35 | <icedice> | pokechu22: True |
| 18:19:11 | <icedice> | Pretty cool site, has soundtracks in FLAC format for a bunch of anime and games: https://www.sittingonclouds.net/ |
| 18:30:26 | | nicolas17 joins |
| 19:05:46 | | Guest50 quits [Ping timeout: 252 seconds] |
| 19:10:42 | <Ryz> | Sanqui, https://www.resetera.com/ is behind cf |
| 19:34:17 | | nepeat quits [Quit: ZNC - https://znc.in] |
| 19:37:33 | | nepeat (nepeat) joins |
| 19:39:17 | | lexikiq joins |
| 19:41:07 | <icedice> | https://forums.tomshardware.com/ hasn't been archived and has Imgur links, so that one might be worth a look once a slot opens up |
| 19:41:25 | <icedice> | I assume pokecommunity.com is still on the wait list, right? |
| 19:41:44 | <icedice> | Or did it run and finish while I was offline? |
| 19:45:15 | <@Sanqui> | Ryz: some cloudflare sites are ok with concurrency 2 delay 25 though |
| 19:45:18 | <@Sanqui> | while others wall instantly |
| 19:45:55 | <lexikiq> | Should probably ask in or check the topic in #imgone, icedice :P |
| 19:49:03 | <appledash> | Anyone around have any thoughts on VidLii, which I posted about last night? TL;DR site was acquired at some point, is now entirely unmoderated and overrun with "bad" content (NSFW/gore/nazi warning potentially if you visit the site,) and it's throwing 503 errors every other request. |
| 19:50:02 | <icedice> | I figured this was the place for archival jobs of other sites |
| 19:50:06 | <icedice> | But you might be right |
| 19:50:11 | <lexikiq> | Fair |
| 19:53:32 | <pokechu22> | For forums here or #archivebot would make sense; I only joined #imgone earlier today so maybe it's also being coordinated there though |
| 20:09:11 | <icedice> | Things are moving pretty quickly in there though |
| 20:09:48 | <icedice> | So I figured it was better to put it here where it's easily visible |
| 20:10:32 | <icedice> | <appledash> Anyone around have any thoughts on VidLii, which I posted about last night? TL;DR site was acquired at some point, is now entirely unmoderated and overrun with "bad" content (NSFW/gore/nazi warning potentially if you visit the site,) and it's throwing 503 errors every other request. |
| 20:11:18 | <icedice> | Sooner or later it will mess with whatever monetization method they have in place (if there even is one) and/or their hosting provider or domain registrar |
| 20:11:25 | <nicolas17> | I don't think it's worth archiving if it's already trashed like that |
| 20:12:41 | | nicolas17 quits [Client Quit] |
| 20:17:51 | | Pingerfowder quits [Quit: ZNC - https://znc.in] |
| 20:18:00 | | Ivan226 quits [Ping timeout: 265 seconds] |
| 20:18:13 | <icedice> | Seems like VidLii is hosted by Terrahost |
| 20:18:23 | <icedice> | So it's not going to get taken down, at least |
| 20:18:42 | | Pingerfowder (Pingerfowder) joins |
| 20:25:18 | | Guest50 joins |
| 20:26:47 | | icedice quits [Client Quit] |
| 20:28:03 | | icedice joins |
| 20:28:25 | | icedice is now authenticated as icedice |
| 20:28:25 | | icedice quits [Changing host] |
| 20:28:25 | | icedice (icedice) joins |
| 20:32:07 | | Guest50 quits [Ping timeout: 252 seconds] |
| 20:45:56 | | Ivan226 joins |
| 20:51:38 | | Island joins |
| 20:54:50 | <appledash> | A dilution of good content with bad content doesn't mean the good content isn't there - and one can easily do filtering based on video titles and tags |
| 20:55:27 | <@JAA> | And at the very least, it's worth archiving the site without the videos themselves. |
| 21:15:37 | <icedice> | A few years ago we archived hate videos from YouTube when YouTube were about to clamp down on those |
| 21:15:47 | <icedice> | And those were 100% shit |
| 21:16:08 | <icedice> | VidLii probably has some gold nuggets under that layer of shit |
| 21:16:57 | <icedice> | joepie91|m: They're seeding, but their PC isn't always on |
| 21:33:26 | | sonick quits [Client Quit] |
| 21:38:40 | | AK (AK) joins |
| 21:49:56 | | that_lurker (that_lurker) joins |
| 22:11:49 | | hitgrr8 quits [Client Quit] |
| 22:16:40 | | nicolas17 joins |
| 22:41:19 | <qwertyasdfuiopghjkl> | You could also try to figure out when moderation disappeared and prioritize content posted before that date |
| 22:46:38 | | h3ndr1k quits [Quit: ] |
| 22:47:03 | | h3ndr1k (h3ndr1k) joins |
| 23:09:21 | | myself quits [Read error: Connection reset by peer] |
| 23:09:30 | | myself (myself) joins |
| 23:14:48 | | Island quits [Read error: Connection reset by peer] |
| 23:25:10 | | icedice quits [Client Quit] |
| 23:27:50 | | nicolas17 quits [Client Quit] |
| 23:33:08 | | Guest50 joins |
| 23:35:01 | | icedice joins |
| 23:35:53 | | icedice is now authenticated as icedice |
| 23:35:53 | | icedice quits [Changing host] |
| 23:35:53 | | icedice (icedice) joins |
| 23:43:05 | | BlueMaxima joins |
| 23:52:02 | | icedice quits [Client Quit] |
| 23:55:27 | | Guest50 quits [Client Quit] |
| 23:55:58 | | Guest50 joins |
| 23:57:03 | | systwi_ quits [Quit: systwi_] |
| 23:57:06 | | nothere quits [Quit: Leaving] |
| 23:57:26 | | Ivan226 quits [Ping timeout: 265 seconds] |
| 23:59:10 | | icedice joins |
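
The crawl appledash describes at the top of the log (start at /, collect every URL on the page, repeat for each URL on the list, track only unique URLs) could be sketched roughly as below. This is not appledash's actual script: `crawl`, `get_links`, `max_pages`, and the toy `SITE` mapping are all hypothetical names made up for illustration, and a real version would fetch pages over HTTP and parse links out of the HTML. The per-page count of new URLs is the figure behind the "less than 1 new URL per fetched page" remark.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 1000):
    """Breadth-first crawl that tracks unique URLs only.

    get_links is a hypothetical callable returning the URLs found on a page;
    a real one would fetch the page over HTTP and parse its links.
    """
    seen = {start}          # every URL discovered so far
    queue = deque([start])  # discovered but not yet fetched
    fetched = 0
    new_per_page = []       # number of previously unseen URLs found on each fetched page

    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        new = 0
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
                new += 1
        new_per_page.append(new)

    return seen, len(queue), new_per_page


# Toy in-memory "site" standing in for real HTTP fetches (assumption for the example).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b", "/c"],
    "/b": ["/a"],
    "/c": [],
}

seen, unfetched, new_per_page = crawl("/", lambda u: SITE.get(u, []))
print(len(seen), "known URLs,", unfetched, "not yet fetched")
print("new URLs per fetched page:", new_per_page)
```

When the recent average of `new_per_page` stays below 1, the queue shrinks faster than it grows, which is one way to tell the crawl is converging on the full URL list.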