00:03:44<pabs>what non-crawling URL enumeration mechanisms are there other than CDX and search engines?
00:19:46<icedice>Is something going on with Tumblr or is it just standard archiving going on in ArchiveBot with no special circumstances behind it?
00:19:57<icedice>I'm seeing a lot of Tumblr blogs there
00:20:28<hlgs|m>i'm doing a backup with some help
00:20:42<hlgs|m>tumblr's been deleting blogs and i want to save what i care about and was going to save later this summer
00:20:42<@JAA>We have a channel for Tumblr, and the reason's been mentioned there: some ToS change in November.
00:20:59<hlgs|m>(what's the channel for tumblr?)
00:21:16<icedice>Please be dumblr
00:21:36<@JAA>#tumbledown (see wiki, where each major project has a page mentioning the channel)
00:22:01<icedice>I was about to say #tumbledown, just remembered it lol
00:22:11pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
00:24:31AmAnd0A quits [Read error: Connection reset by peer]
00:24:47AmAnd0A joins
00:24:56pabs (pabs) joins
00:27:25tbc1887 (tbc1887) joins
00:31:35Icyelut (Icyelut) joins
00:43:37Icyelut|2 (Icyelut) joins
00:43:59JohnnyJ joins
00:46:59Icyelut quits [Ping timeout: 252 seconds]
00:50:40AmAnd0A quits [Ping timeout: 252 seconds]
00:50:47<icedice>Does anyone here have time to archive https://forum.doom9.org/, https://forum.videohelp.com/, https://www.digitalfaq.com/forum/, and https://audiosex.pro/ to get whatever Imgur links remain there?
00:59:54Arcorann (Arcorann) joins
01:00:28tzz888 joins
01:12:37Lambro quits [Read error: Connection reset by peer]
01:20:00AmAnd0A joins
02:04:33dumbgoy_ joins
02:07:40dumbgoy quits [Ping timeout: 252 seconds]
02:16:25tzz888 quits [Remote host closed the connection]
02:37:18DopefishJustin joins
02:54:20Ruthalas5 (Ruthalas) joins
03:33:59marto_ quits [Quit: Ping timeout (120 seconds)]
03:34:06marto_ (marto_) joins
03:55:42icedice quits [Client Quit]
04:00:13<h2ibot>Thezt edited ISP Hosting (+180, ZoomInternet offline): https://wiki.archiveteam.org/?diff=49843&oldid=49839
04:32:25icedice (icedice) joins
04:41:00decky_e quits [Remote host closed the connection]
04:54:42<tech234a>Worth keeping an eye on Nintendo emulators; there was a DMCA against Dolphin’s Steam release: https://dolphin-emu.org/blog/2023/05/27/dolphin-steam-indefinitely-postponed/
05:02:19decky_e joins
05:18:40<icedice>XDA Forums is another one worth archiving and scraping for Imgur links
05:18:52<icedice>Nintendo being Nintendo
05:23:27<icedice>They poked the hornet's nest by trying to get listed on Steam
05:23:51<icedice>Their software is legal, but Nintendo doesn't give a shit about legality
05:25:06<nicolas17>the takedown also makes no sense
05:25:36<nicolas17>you can send a takedown for a "coming soon" page because you think the software that *will* be published in the future there is a copyright violation?
06:34:10Minkafighter quits [Client Quit]
06:34:45Minkafighter joins
06:56:01dumbgoy_ quits [Read error: Connection reset by peer]
07:56:34Island quits [Read error: Connection reset by peer]
07:59:06BlueMaxima quits [Read error: Connection reset by peer]
08:36:58umgr036 joins
08:37:46umgr036 quits [Remote host closed the connection]
08:37:59umgr036 joins
08:41:45umgr036 quits [Remote host closed the connection]
08:51:27decky_e quits [Read error: Connection reset by peer]
09:00:46c3manu (c3manu) joins
09:38:24Minkafighter quits [Client Quit]
09:39:16manu|m joins
09:40:17Minkafighter joins
11:21:33Chris50109 (Chris5010) joins
11:23:20Chris5010 quits [Ping timeout: 252 seconds]
11:23:20Chris50109 is now known as Chris5010
12:04:14icedice quits [Client Quit]
12:21:05icedice (icedice) joins
12:44:01icedice quits [Client Quit]
12:44:13BigBrain_ (bigbrain) joins
12:47:31BigBrain quits [Ping timeout: 245 seconds]
12:49:27icedice (icedice) joins
12:50:47geezabiscuit quits [Ping timeout: 252 seconds]
13:03:38<masterX244>buttflare 521 stole me a few pages at planetminecraft crawl and due to how their shitty pagination works the missing data is at the end of the pagination and there is no way to skip to there
13:10:24<icedice><nicolas17> the takedown also makes no sense
13:10:24<icedice><nicolas17> you can send a takedown for a "coming soon" page because you think the software that *will* be
13:10:31<icedice>They're a Japanese company
13:10:41<icedice>Copyright law tends to be whatever they want it to be
13:10:50<icedice>And if it's not they don't give a shit
13:12:21<icedice>What are the devs going to do? Spend hundreds of thousands of dollars trying to sue them in court which Nintendo can easily drag out to a year-long court battle in order to bleed them dry?
13:12:45<icedice>This is the same company that DMCAs Let's Play videos on YouTube
13:13:52<FireFly>it was apparently not exactly a DMCA takedown; see https://mastodon.delroth.net/@delroth/110440308907131051
13:14:29<FireFly>but basically seems to just have been between Valve & Nintendo
13:14:34<FireFly>but yes, Ninty being Ninty
13:58:26Arcorann quits [Ping timeout: 252 seconds]
14:37:45umgr036 joins
14:38:44umgr036 quits [Remote host closed the connection]
14:38:59umgr036 joins
14:49:56geezabiscuit (geezabiscuit) joins
15:25:06user__ joins
15:27:40umgr036 quits [Ping timeout: 265 seconds]
15:38:03umgr036 joins
15:40:43user__ quits [Ping timeout: 265 seconds]
16:29:19user__ joins
16:32:55umgr036 quits [Ping timeout: 265 seconds]
17:15:37Dango360 (Dango360) joins
17:16:53_Dango360 (Dango360) joins
17:16:59Dango360_ quits [Ping timeout: 252 seconds]
17:20:46Dango360 quits [Ping timeout: 265 seconds]
17:21:56_Dango360 quits [Ping timeout: 252 seconds]
17:22:18Dango360 (Dango360) joins
17:27:38user__ quits [Ping timeout: 252 seconds]
17:28:17user_ joins
17:42:46G4te_Keep3r3492 joins
18:06:54user__ joins
18:08:41dumbgoy joins
18:10:10user_ quits [Ping timeout: 252 seconds]
18:12:50yts98 leaves
18:12:53yts98 joins
18:45:50katocala quits [Ping timeout: 265 seconds]
18:46:44katocala joins
18:50:21c3manu quits [Client Quit]
18:51:38katocala quits [Ping timeout: 265 seconds]
18:52:24katocala joins
18:54:59c3manu (c3manu) joins
19:11:55tzt quits [Remote host closed the connection]
19:12:18tzt (tzt) joins
19:18:42AmAnd0A quits [Ping timeout: 265 seconds]
19:18:47AmAnd0A joins
19:19:24AmAnd0A quits [Read error: Connection reset by peer]
19:20:26AmAnd0A joins
19:28:16Island joins
19:34:22c3manu quits [Client Quit]
19:40:22HP_Archivist (HP_Archivist) joins
19:45:52tzt quits [Ping timeout: 252 seconds]
19:51:04icedice quits [Client Quit]
19:53:59wickedplayer494 quits [Ping timeout: 265 seconds]
19:58:01tzt (tzt) joins
19:58:42wickedplayer494 joins
20:21:10icedice (icedice) joins
20:27:17AmAnd0A quits [Ping timeout: 252 seconds]
20:28:13AmAnd0A joins
20:29:42AmAnd0A quits [Read error: Connection reset by peer]
20:30:05AmAnd0A joins
20:31:36TheTechRobo quits [Remote host closed the connection]
20:32:12TheTechRobo (TheTechRobo) joins
20:49:08dumbgoy_ joins
20:52:14dumbgoy quits [Ping timeout: 252 seconds]
20:58:45tzt quits [Ping timeout: 265 seconds]
20:58:50icedice quits [Client Quit]
20:59:54tzt (tzt) joins
21:27:29icedice (icedice) joins
21:29:21<icedice>JAA: Seems like The PokéCommunity archival job is not as "almost done" as I thought. Do you mind doing another Imgur batch from there once you have time?
22:06:46<joepie91|m>on the topic of Nintendo, https://torrentfreak.com/nintendos-war-with-1fichier-is-not-over-but-could-be-for-0-00-230419/ is a fascinating read
22:06:55AmAnd0A quits [Read error: Connection reset by peer]
22:07:28AmAnd0A joins
22:26:42<icedice>Doesn't surprise me that Nintendo didn't take them up on their offer
22:27:50<icedice>Nintendo wants it their way no matter what
22:36:01<joepie91|m>I'm not convinced Nintendo will win this in appeals
22:36:15<joepie91|m>there's some very interesting stuff going on there with the reasoning of the court
22:36:53<joepie91|m>the argument is essentially "everybody knows Nintendo, you should've known", but... that is not consistent with the expectation of "equal rule of law for all" that AFAIK the EU sets as a hard requirement for membership
22:37:16<joepie91|m>if 1fichier takes this to the EU, it could become a very tasty case
22:44:21<icedice>Yeah, but countries can ignore EU law once they're in
22:44:47<icedice>They might lose EU grants over it, but other than that there's nothing stopping them
22:46:03<icedice>For example anti-LGBT zones in Poland, pretty much everything going on in Hungary, or how countries like Sweden chose to ignore the repeal of the Data Retention Directive
22:47:10<icedice>And since France is #2 in the EU after Germany, nothing will happen to them
22:47:24<icedice>Germany needs their approval to rule the EU
23:10:09<andrew>Wow, grab-site can be pretty CPU intensive
23:10:15<andrew>I presume this is a Python moment?
23:17:22<@JAA>andrew: Is this a large crawl that has been retrieving stuff from numerous hosts?
23:23:37<andrew>JAA: it's a pretty large crawl intended to only crawl a couple hosts
23:23:59<andrew>it started veering off course and crawling some other stuff though, so I added a nice regex to the ignores list
23:24:27<@JAA>Hmm, typically, slowdowns are from the cookie jar, which is horrible once it accumulates cookies from a couple thousand hosts.
23:24:46<andrew>its CPU usage seems to be high when it's parsing through some large pages
23:24:56<andrew>with lots of links
23:25:36<@JAA>I know that grab-site does some things differently than AB in that area but am not familiar with the details, so can't comment on that.
23:25:53<@JAA>HTML parsing is a significant factor overall though.
23:25:58<andrew>is the HTML parsing done in Python?
23:26:26<@JAA>I don't know what grab-site does. AB uses libxml2. wpull defaults to html5lib, which is pure Python and *SLOW*.
23:26:42<andrew>well that explains a lot :P
23:27:05<@JAA>But IIRC grab-site uses something else that's supposed to be faster.
23:27:34<@JAA>Yeah, ludios_wpull uses html5-parser.
23:28:51<@JAA>Another thing that can matter is the DB insertion, but that only comes into play when you go to ... hundreds of millions of URLs?