| 00:03:44 | <pabs> | what non-crawling URL enumeration mechanisms are there other than CDX and search engines? |
| 00:19:46 | <icedice> | Is something going on with Tumblr or is it just standard archiving going on in ArchiveBot with no special circumstances behind it? |
| 00:19:57 | <icedice> | I'm seeing a lot of Tumblr blogs there |
| 00:20:28 | <hlgs|m> | i'm doing a backup with some help |
| 00:20:42 | <hlgs|m> | tumblr's been deleting blogs and i want to save what i care about and was going to save later this summer |
| 00:20:42 | <@JAA> | We have a channel for Tumblr, and the reason's been mentioned there: some ToS change in November. |
| 00:20:59 | <hlgs|m> | (what's the channel for tumblr?) |
| 00:21:16 | <icedice> | Please be dumblr |
| 00:21:36 | <@JAA> | #tumbledown (see wiki, where each major project has a page mentioning the channel) |
| 00:22:01 | <icedice> | I was about to say #tumbledown, just remembered it lol |
| 00:22:11 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
| 00:24:31 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 00:24:47 | | AmAnd0A joins |
| 00:24:56 | | pabs (pabs) joins |
| 00:27:25 | | tbc1887 (tbc1887) joins |
| 00:31:35 | | Icyelut (Icyelut) joins |
| 00:43:37 | | Icyelut|2 (Icyelut) joins |
| 00:43:59 | | JohnnyJ joins |
| 00:46:59 | | Icyelut quits [Ping timeout: 252 seconds] |
| 00:50:40 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 00:50:47 | <icedice> | Does anyone here have time to archive https://forum.doom9.org/, https://forum.videohelp.com/, https://www.digitalfaq.com/forum/, and https://audiosex.pro/ to get whatever Imgur links remain there? |
| 00:59:54 | | Arcorann (Arcorann) joins |
| 01:00:28 | | tzz888 joins |
| 01:12:37 | | Lambro quits [Read error: Connection reset by peer] |
| 01:20:00 | | AmAnd0A joins |
| 02:04:33 | | dumbgoy_ joins |
| 02:07:40 | | dumbgoy quits [Ping timeout: 252 seconds] |
| 02:16:25 | | tzz888 quits [Remote host closed the connection] |
| 02:37:18 | | DopefishJustin joins |
| 02:37:18 | | DopefishJustin is now authenticated as DopefishJustin |
| 02:54:20 | | Ruthalas5 (Ruthalas) joins |
| 03:33:59 | | marto_ quits [Quit: Ping timeout (120 seconds)] |
| 03:34:06 | | marto_ (marto_) joins |
| 03:55:42 | | icedice quits [Client Quit] |
| 04:00:13 | <h2ibot> | Thezt edited ISP Hosting (+180, ZoomInternet offline): https://wiki.archiveteam.org/?diff=49843&oldid=49839 |
| 04:32:25 | | icedice (icedice) joins |
| 04:41:00 | | decky_e quits [Remote host closed the connection] |
| 04:54:42 | <tech234a> | Worth keeping an eye on Nintendo emulators; there was a DMCA against Dolphin’s Steam release: https://dolphin-emu.org/blog/2023/05/27/dolphin-steam-indefinitely-postponed/ |
| 05:02:19 | | decky_e joins |
| 05:18:40 | <icedice> | XDA Forums is another one worth archiving and scraping for Imgur links |
| 05:18:52 | <icedice> | Nintendo being Nintendo |
| 05:23:27 | <icedice> | They poked the hornet's nest by trying to get listed on Steam |
| 05:23:51 | <icedice> | Their software is legal, but Nintendo doesn't give a shit about legality |
| 05:25:06 | <nicolas17> | the takedown also makes no sense |
| 05:25:36 | <nicolas17> | you can send a takedown for a "coming soon" page because you think the software that *will* be published in the future there is a copyright violation? |
| 06:34:10 | | Minkafighter quits [Client Quit] |
| 06:34:45 | | Minkafighter joins |
| 06:56:01 | | dumbgoy_ quits [Read error: Connection reset by peer] |
| 07:56:34 | | Island quits [Read error: Connection reset by peer] |
| 07:59:06 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 08:36:58 | | umgr036 joins |
| 08:37:46 | | umgr036 quits [Remote host closed the connection] |
| 08:37:59 | | umgr036 joins |
| 08:41:45 | | umgr036 quits [Remote host closed the connection] |
| 08:51:27 | | decky_e quits [Read error: Connection reset by peer] |
| 09:00:46 | | c3manu (c3manu) joins |
| 09:38:24 | | Minkafighter quits [Client Quit] |
| 09:39:16 | | manu|m joins |
| 09:40:17 | | Minkafighter joins |
| 11:21:33 | | Chris50109 (Chris5010) joins |
| 11:23:20 | | Chris5010 quits [Ping timeout: 252 seconds] |
| 11:23:20 | | Chris50109 is now known as Chris5010 |
| 12:04:14 | | icedice quits [Client Quit] |
| 12:21:05 | | icedice (icedice) joins |
| 12:44:01 | | icedice quits [Client Quit] |
| 12:44:13 | | BigBrain_ (bigbrain) joins |
| 12:47:31 | | BigBrain quits [Ping timeout: 245 seconds] |
| 12:49:27 | | icedice (icedice) joins |
| 12:50:47 | | geezabiscuit quits [Ping timeout: 252 seconds] |
| 13:03:38 | <masterX244> | buttflare 521 stole a few pages from me in the planetminecraft crawl, and due to how their shitty pagination works, the missing data is at the end of the pagination and there is no way to skip to there |
| 13:10:24 | <icedice> | <nicolas17> the takedown also makes no sense |
| 13:10:24 | <icedice> | <nicolas17> you can send a takedown for a "coming soon" page because you think the software that *will* be |
| 13:10:31 | <icedice> | They're a Japanese company |
| 13:10:41 | <icedice> | Copyright law tends to be whatever they want it to be |
| 13:10:50 | <icedice> | And if it's not they don't give a shit |
| 13:12:21 | <icedice> | What are the devs going to do? Spend hundreds of thousands of dollars trying to sue them in court, which Nintendo can easily drag out into a year-long court battle in order to bleed them dry? |
| 13:12:45 | <icedice> | This is the same company that DMCAs let's play videos on YouTube |
| 13:13:52 | <FireFly> | it was apparently not exactly a DMCA takedown; see https://mastodon.delroth.net/@delroth/110440308907131051 |
| 13:14:29 | <FireFly> | but basically seems to just have been between Valve & Nintendo |
| 13:14:34 | <FireFly> | but yes, Ninty being Ninty |
| 13:58:26 | | Arcorann quits [Ping timeout: 252 seconds] |
| 14:37:45 | | umgr036 joins |
| 14:38:44 | | umgr036 quits [Remote host closed the connection] |
| 14:38:59 | | umgr036 joins |
| 14:49:56 | | geezabiscuit (geezabiscuit) joins |
| 15:25:06 | | user__ joins |
| 15:27:40 | | umgr036 quits [Ping timeout: 265 seconds] |
| 15:38:03 | | umgr036 joins |
| 15:40:43 | | user__ quits [Ping timeout: 265 seconds] |
| 16:29:19 | | user__ joins |
| 16:32:55 | | umgr036 quits [Ping timeout: 265 seconds] |
| 17:15:37 | | Dango360 (Dango360) joins |
| 17:16:53 | | _Dango360 (Dango360) joins |
| 17:16:59 | | Dango360_ quits [Ping timeout: 252 seconds] |
| 17:20:46 | | Dango360 quits [Ping timeout: 265 seconds] |
| 17:21:56 | | _Dango360 quits [Ping timeout: 252 seconds] |
| 17:22:18 | | Dango360 (Dango360) joins |
| 17:27:38 | | user__ quits [Ping timeout: 252 seconds] |
| 17:28:17 | | user_ joins |
| 17:42:46 | | G4te_Keep3r3492 joins |
| 18:06:54 | | user__ joins |
| 18:08:41 | | dumbgoy joins |
| 18:10:10 | | user_ quits [Ping timeout: 252 seconds] |
| 18:12:50 | | yts98 leaves |
| 18:12:53 | | yts98 joins |
| 18:45:50 | | katocala quits [Ping timeout: 265 seconds] |
| 18:46:44 | | katocala joins |
| 18:50:21 | | c3manu quits [Client Quit] |
| 18:51:38 | | katocala quits [Ping timeout: 265 seconds] |
| 18:52:24 | | katocala joins |
| 18:54:59 | | c3manu (c3manu) joins |
| 19:11:55 | | tzt quits [Remote host closed the connection] |
| 19:12:18 | | tzt (tzt) joins |
| 19:18:42 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 19:18:47 | | AmAnd0A joins |
| 19:19:24 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 19:20:26 | | AmAnd0A joins |
| 19:28:16 | | Island joins |
| 19:34:22 | | c3manu quits [Client Quit] |
| 19:40:22 | | HP_Archivist (HP_Archivist) joins |
| 19:45:52 | | tzt quits [Ping timeout: 252 seconds] |
| 19:51:04 | | icedice quits [Client Quit] |
| 19:53:59 | | wickedplayer494 quits [Ping timeout: 265 seconds] |
| 19:57:07 | | katocala is now authenticated as katocala |
| 19:58:01 | | tzt (tzt) joins |
| 19:58:42 | | wickedplayer494 joins |
| 20:01:01 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 20:21:10 | | icedice (icedice) joins |
| 20:27:17 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 20:28:13 | | AmAnd0A joins |
| 20:29:42 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 20:30:05 | | AmAnd0A joins |
| 20:31:36 | | TheTechRobo quits [Remote host closed the connection] |
| 20:32:12 | | TheTechRobo (TheTechRobo) joins |
| 20:49:08 | | dumbgoy_ joins |
| 20:52:14 | | dumbgoy quits [Ping timeout: 252 seconds] |
| 20:58:45 | | tzt quits [Ping timeout: 265 seconds] |
| 20:58:50 | | icedice quits [Client Quit] |
| 20:59:54 | | tzt (tzt) joins |
| 21:27:29 | | icedice (icedice) joins |
| 21:29:21 | <icedice> | JAA: Seems like The PokéCommunity archiving job is not as "almost done" as I thought. Do you mind doing another Imgur batch from there once you have time? |
| 22:06:46 | <joepie91|m> | on the topic of Nintendo, https://torrentfreak.com/nintendos-war-with-1fichier-is-not-over-but-could-be-for-0-00-230419/ is a fascinating read |
| 22:06:55 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 22:07:28 | | AmAnd0A joins |
| 22:26:42 | <icedice> | Doesn't surprise me that Nintendo didn't take them up on their offer |
| 22:27:50 | <icedice> | Nintendo wants it their way no matter what |
| 22:36:01 | <joepie91|m> | I'm not convinced Nintendo will win this in appeals |
| 22:36:15 | <joepie91|m> | there's some very interesting stuff going on there with the reasoning of the court |
| 22:36:53 | <joepie91|m> | the argument is essentially "everybody knows Nintendo, you should've known", but... that is not consistent with the expectation of "equal rule of law for all" that AFAIK the EU sets as a hard requirement for membership |
| 22:37:16 | <joepie91|m> | if 1fichier takes this to the EU, it could become a very tasty case |
| 22:44:21 | <icedice> | Yeah, but countries can ignore EU law once they're in |
| 22:44:47 | <icedice> | They might lose EU grants over it, but other than that there's nothing stopping them |
| 22:46:03 | <icedice> | For example anti-LGBT zones in Poland, pretty much everything going on in Hungary, or how countries like Sweden chose to ignore the repeal of the Data Retention Directive |
| 22:47:10 | <icedice> | And since France is #2 in the EU after Germany, nothing will happen to them |
| 22:47:24 | <icedice> | Germany needs their approval to rule the EU |
| 23:10:09 | <andrew> | Wow, grab-site can be pretty CPU intensive |
| 23:10:15 | <andrew> | I presume this is a Python moment? |
| 23:17:22 | <@JAA> | andrew: Is this a large crawl that has been retrieving stuff from numerous hosts? |
| 23:23:37 | <andrew> | JAA: it's a pretty large crawl intended to only crawl a couple hosts |
| 23:23:59 | <andrew> | it started veering off course and crawling some other stuff though, so I added a nice regex to the ignores list |
| 23:24:27 | <@JAA> | Hmm, typically, slowdowns are from the cookie jar, which is horrible once it accumulates cookies from a couple thousand hosts. |
| 23:24:46 | <andrew> | its CPU usage seems to be high when it's parsing through some large pages |
| 23:24:56 | <andrew> | with lots of links |
| 23:25:36 | <@JAA> | I know that grab-site does some things differently than AB in that area but am not familiar with the details, so can't comment on that. |
| 23:25:53 | <@JAA> | HTML parsing is a significant factor overall though. |
| 23:25:58 | <andrew> | is the HTML parsing done in Python? |
| 23:26:26 | <@JAA> | I don't know what grab-site does. AB uses libxml2. wpull defaults to html5lib, which is pure Python and *SLOW*. |
| 23:26:42 | <andrew> | well that explains a lot :P |
| 23:27:05 | <@JAA> | But IIRC grab-site uses something else that's supposed to be faster. |
| 23:27:34 | <@JAA> | Yeah, ludios_wpull uses html5-parser. |
| 23:28:51 | <@JAA> | Another thing that can matter is the DB insertion, but that only comes into play when you go to ... hundreds of millions of URLs? |
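
For the crawl that "started veering off course" above, an ignore regex along these lines could keep it on the intended hosts. This is only a minimal sketch, assuming the ignore patterns are Python regular expressions searched against the full URL; the host names, `ALLOWED_HOSTS`, `IGNORE_OFF_HOST`, and `is_ignored` are hypothetical and do not come from the log or from grab-site itself.

```python
import re

# Hypothetical hosts the crawl is meant to stay on (not taken from the log).
ALLOWED_HOSTS = ("example.org", "forum.example.org")

# Build one alternation of the allowed hosts, escaped so "." is literal.
_hosts = "|".join(re.escape(host) for host in ALLOWED_HOSTS)

# Matches any http(s) URL whose host is NOT one of the allowed hosts.
# The negative lookahead requires the host to be followed by ":", "/",
# or end of string, so "example.org.evil.com" does not pass as allowed.
IGNORE_OFF_HOST = re.compile(r"^https?://(?!(?:" + _hosts + r")(?:[:/]|$))")


def is_ignored(url: str) -> bool:
    """Return True if the URL would be skipped by the off-host pattern."""
    return IGNORE_OFF_HOST.search(url) is not None
```

The pattern string itself (`IGNORE_OFF_HOST.pattern`) is what would go into an ignores list. Testing it against a handful of URLs from the crawl log before adding it is cheap compared with letting a large crawl wander onto thousands of hosts, which is also when the cookie jar slowdown mentioned above starts to matter.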