00:10:54Megame quits [Quit: Leaving]
00:10:59<nicolas17>brute-forcing manual IDs on crestron
00:11:11<nicolas17>found 102 so far
00:25:38<h2ibot>HadeanEon edited Deaths in 2022 (-448, BOT - Updating page: {{saved}} (216),…): https://wiki.archiveteam.org/?diff=55675&oldid=55458
00:25:39<h2ibot>HadeanEon edited Deaths in 2022/list (-235, BOT - Updating list): https://wiki.archiveteam.org/?diff=55676&oldid=55459
00:58:27nine quits [Quit: See ya!]
00:58:39nine joins
00:58:39nine quits [Changing host]
00:58:39nine (nine) joins
01:00:44<h2ibot>HadeanEon edited Deaths in 2023 (+1236, BOT - Updating page: {{saved}} (181),…): https://wiki.archiveteam.org/?diff=55677&oldid=55499
01:00:45<h2ibot>HadeanEon edited Deaths in 2023/list (+99, BOT - Updating list): https://wiki.archiveteam.org/?diff=55678&oldid=55500
01:14:07dabs quits [Read error: Connection reset by peer]
01:32:53<h2ibot>HadeanEon edited Deaths in 2024 (+917, BOT - Updating page: {{saved}} (219),…): https://wiki.archiveteam.org/?diff=55679&oldid=55503
01:32:54<h2ibot>HadeanEon edited Deaths in 2024/list (+88, BOT - Updating list): https://wiki.archiveteam.org/?diff=55680&oldid=55504
01:39:01nicolas17 quits [Ping timeout: 260 seconds]
01:46:55<h2ibot>HadeanEon edited Deaths in 2025 (+1377, BOT - Updating page: {{saved}} (126),…): https://wiki.archiveteam.org/?diff=55681&oldid=55628
01:46:56<h2ibot>HadeanEon edited Deaths in 2025/list (+117, BOT - Updating list): https://wiki.archiveteam.org/?diff=55682&oldid=55629
02:02:03camrod63 quits [Ping timeout: 258 seconds]
02:19:38<pokechu22>https://github.com/e621ng/e621ng/blob/f2e60832f2d15db0ac3982ca5546bb4aec15563c/app/logical/storage_manager.rb#L192-L203 and https://github.com/e621ng/e621ng/blob/f2e60832f2d15db0ac3982ca5546bb4aec15563c/app/logical/storage_manager.rb#L132-L153 seem to be the main logic on e621 for getting a filename, and we have those columns from the DB export. Based on
02:19:40<pokechu22>https://e926.net/posts/4423159 + https://e926.net/posts/58 + https://github.com/e621ng/e621ng/blob/f2e60832f2d15db0ac3982ca5546bb4aec15563c/config/danbooru_default_config.rb#L119-L125 large images don't actually exist so there's only preview (thumbnails), scaled (reduced size), original, and maybe crop, and the only protected files are deleted ones which we have the MD5 for
02:19:42<pokechu22>but can't actually do anything with.
02:29:03<pokechu22>more explicit about protected == deleted: https://github.com/e621ng/e621ng/blob/f2e60832f2d15db0ac3982ca5546bb4aec15563c/app/models/post.rb#L1446-L1448
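A minimal sketch of the filename logic discussed above, assuming (not confirmed in this log) that files are sharded by the first two pairs of hex digits of the MD5, that previews and scaled copies are JPEGs under `preview/` and `sample/` prefixes, and that `https://static1.e621.net/data` is the base URL. All three are assumptions to check against the linked `storage_manager.rb`; `e621_file_paths` is a hypothetical helper name.

```python
def e621_file_paths(md5: str, file_ext: str,
                    base: str = "https://static1.e621.net/data") -> dict[str, str]:
    """Build candidate URLs for one post from DB-export columns.

    ASSUMPTION: layout is <base>/<prefix>/<md5[0:2]>/<md5[2:4]>/<md5>.<ext>;
    the base URL and the "preview"/"sample" prefixes are guesses, not
    taken from the log.
    """
    md5 = md5.lower()
    if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
        raise ValueError("expected a 32-character hex MD5")
    shard = f"{md5[0:2]}/{md5[2:4]}"
    return {
        # Full-resolution file keeps its original extension.
        "original": f"{base}/{shard}/{md5}.{file_ext}",
        # Thumbnails and reduced-size copies are assumed to be JPEG.
        "preview": f"{base}/preview/{shard}/{md5}.jpg",
        "scaled": f"{base}/sample/{shard}/{md5}.jpg",
    }


paths = e621_file_paths("0123456789abcdef0123456789abcdef", "png")
print(paths["original"])
```

Deleted ("protected") posts are left out on purpose: as noted above, the MD5 is known but the files cannot be fetched.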
03:01:35etnguyen03 (etnguyen03) joins
03:02:46ericgallager quits [Quit: This computer has gone to sleep]
03:21:58<pabs>https://socalpython.org/in-memoriam-michael/
03:26:37Naruyoko5 joins
03:30:26Naruyoko quits [Ping timeout: 260 seconds]
03:35:12ericgallager joins
03:51:05<BlankEclair>https://archive.fart.website/archivebot/viewer/?q=archive.fart.website
03:53:36etnguyen03 quits [Remote host closed the connection]
03:55:50Naruyoko joins
03:58:58Naruyoko5 quits [Ping timeout: 258 seconds]
04:10:57DogsRNice quits [Read error: Connection reset by peer]
04:28:46cm quits [Ping timeout: 260 seconds]
04:35:18ericgallager quits [Client Quit]
05:10:43sec^nd quits [Remote host closed the connection]
05:26:29Naruyoko5 joins
05:30:36Naruyoko quits [Ping timeout: 260 seconds]
05:33:56IceCodeNew|m uploaded an image: (425KiB) < https://matrix.hackint.org/_irc/v1/media/download/AWLO2uWKJzAAqKEHHTEcjxUeA812sQhh9o14ELivS3Uq0vVX-2o57z3UoljTYswAPz9FnXHyZ1qCAafcvbda_tlCfgSafQHAAG1hdHJpeC5vcmcvUm1ydnRkZHpMQklaeFFYWUhyUnhBT3ph >
05:33:58<IceCodeNew|m>Hi there,... (full message at <https://matrix.hackint.org/_irc/v1/media/download/AXMIsyfX9lxnUiVAs9F8s-uT9fEACuvt-4Ozm6xlydkFpx9Vt4UjEGxnBjvQqK-FNuRDgquEeErQ9UTKK87VtyVCfgSafVUgAGhhY2tpbnQub3JnL01Ccm5RRXNiTHBsdFFXckF4b3l4cm5kdg>)
05:34:16IceCodeNew|m uploaded an image: (789KiB) < https://matrix.hackint.org/_irc/v1/media/download/AVChK0Rfq3vBWIQOuDlagGkkCaxcmGG4IMp8R0Ul1a-adLWE9GI0e9wjk4nxh68XetFHbWwzrCQ4LGQ_W1N2aCFCfgSagdaQAG1hdHJpeC5vcmcvb2JvSUZKT29Wdnh6TFR0d0p2UGlJbnpC >
05:41:55<pokechu22>https://dictionary.goo.ne.jp/robots.txt links to https://dictionary.goo.ne.jp/sitemaps/sitemap.xml which seems to have lists of all entries
05:43:02<pokechu22>Do you have a page which says it's shutting down? I know https://blog.goo.ne.jp/ is shutting down
05:43:58<IceCodeNew|m>pokechu22: https://help.goo.ne.jp/help/article/2889/
05:44:32<pokechu22>Thanks
05:44:52IceCodeNew|m uploaded an image: (329KiB) < https://matrix.hackint.org/_irc/v1/media/download/ATdoVJoe5L-OXA85w0OpJaiL9I2gwdwsLWznVNsMuTEH6JixSE1G0qTXG-VAp5tFbsUeQmg2xN1JfY56EeSIajRCfgSbHQXgAG1hdHJpeC5vcmcvV3hJQ3ZIdEpPemZnbnNUR1pqb3lQUERB >
05:46:48<pokechu22>1550863 pages in the sitemaps
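A page count like the one above can be obtained by walking the sitemap index and counting `<loc>` entries. The sketch below shows only the parsing step on an inline sample; the sample URLs are placeholders, `extract_locs` is a hypothetical helper, and it is an assumption that the goo.ne.jp sitemaps use the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace (assumed to apply to the site in question).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL in a sitemap index or a urlset document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall(".//sm:loc", SITEMAP_NS)
            if loc.text]


# Placeholder data standing in for a downloaded sitemap index.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.invalid/sitemaps/part1.xml</loc></sitemap>
  <sitemap><loc>https://example.invalid/sitemaps/part2.xml</loc></sitemap>
</sitemapindex>"""

child_sitemaps = extract_locs(SAMPLE_INDEX)
print(len(child_sitemaps), "child sitemaps")
```

Running the same function over each child sitemap and summing the lengths would give the total number of pages.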
05:51:15ericgallager joins
05:51:33<yzqzss>IceCodeNew: are there geo restrictions on certain dictionary.goo.ne.jp pages?
05:53:34IceCodeNew|m sent a code block: https://matrix.hackint.org/_irc/v1/media/download/AWOmV8l7neuakW3ujZ_sxfkyIicX89KR79bPM83XPzwVnh_OCk_mlae0Q1ZvoNkOJB1DAunJaSdxLSrUuyQVP0xCfgSbnHnAAGhhY2tpbnQub3JnL1VqVUROenJrQUp0WURLR2hzdHRvaUlMQg
05:53:44<pokechu22>I'm able to load at least some from the US, and the archivebot job for https://blog.goo.ne.jp/ seems to have been running fine too
05:53:52<IceCodeNew|m>yzqzss: does not look like there are geo restrictions
05:54:26<yzqzss>that's great :)
05:54:39IceCodeNew|m sent a code block: https://matrix.hackint.org/_irc/v1/media/download/AfAqkTskyafklwmFiXYRHPV6EDyoYkwszK-l9COlQhVwqG9ul0xBHsx7ZhZL7cUmYzc6UbFMEKRa_sw_TqMhoJtCfgSbrIQgAGhhY2tpbnQub3JnL3Z4Q1pNeVNHUGxHakN0aW1oS2ljY2tWaQ
05:55:00<IceCodeNew|m>they are doing geo restrictions on the service shutting down page ;-)
05:55:20<IceCodeNew|m>* they are doing geo restrictions on the page of service shutting down notification ;-)
05:57:11<pokechu22>I've queued an archivebot job, which should show up at http://archivebot.com/?initialFilter=goo.ne.jp in a bit in addition to the blog.goo.ne.jp job. It might be a while before it starts though
05:58:45<pokechu22>1,550,863 pages is fairly large but it should be doable in under a month
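The "under a month" estimate can be sanity-checked with simple arithmetic. This sketch assumes one request per page and a steady fetch rate, and ignores retries, redirects and page requisites; the function names are hypothetical.

```python
PAGES = 1_550_863          # page count from the sitemaps, as stated above
SECONDS_PER_DAY = 86_400


def required_rate(pages: int, days: float) -> float:
    """Average pages per second needed to finish within `days`."""
    return pages / (days * SECONDS_PER_DAY)


def days_to_fetch(pages: int, pages_per_second: float) -> float:
    """Days needed at a constant fetch rate."""
    return pages / pages_per_second / SECONDS_PER_DAY


print(f"{required_rate(PAGES, 30):.2f} pages/s needed for 30 days")
print(f"{days_to_fetch(PAGES, 1.0):.1f} days at 1 page/s")
```

Under these assumptions a 30-day deadline needs roughly 0.6 pages per second on average, and a steady 1 page per second would finish in just under 18 days.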
06:13:18sec^nd (second) joins
06:20:53PredatorIWD25 joins
06:42:59tek_dmn (tek_dmn) joins
06:51:24<pokechu22>hmm, after fetching a few pages, it seems to be redirecting after every request, which doesn't happen to me
06:51:45<pokechu22>I've stopped the job for now to look into why it's doing that (which will probably end up happening tomorrow)
07:01:41<h2ibot>Pokechu22 edited Mailing Lists (+24, https://lists.hefn.org Sympa, not sure if there…): https://wiki.archiveteam.org/?diff=55683&oldid=55561
07:20:18cm joins
07:53:33camrod636 (camrod) joins
08:10:17Dada joins
08:12:28ericgallager quits [Client Quit]
08:26:37Island quits [Read error: Connection reset by peer]
08:31:14ericgallager joins
09:43:44ericgallager quits [Client Quit]
09:51:58funderscore is now known as f_|DSR
09:58:26<hlgs|m>hmm. i've got a weird URL that's had snapshots consistently delayed in showing up on the wayback machine, and now it says it's somehow been captured 10155 times today (my first time saving it today) https://www.pixiv.net/en/artworks/11956385
09:58:47<hlgs|m>(been saving it multiple times cause it's hit-or-miss whether it'll save the actual page contents or just a blank page)
09:58:51<hlgs|m>anyone got any idea what might be happening?
09:59:21<hlgs|m>oh weird, same thing for this one https://www.pixiv.net/en/artworks/92939091, hasn't been delayed but also says it's been captured over 10k times
09:59:37<hlgs|m>OH my bad, it's the host, not the page. interesting
09:59:45<hlgs|m>guess i'll try tomorrow. no clue why that first one keeps getting delayed though
10:30:21Deewiant quits [Remote host closed the connection]
10:31:18<h2ibot>Hans5958 edited Voice of America (+10, Grab repo is private): https://wiki.archiveteam.org/?diff=55684&oldid=55187
10:31:22Deewiant (Deewiant) joins
10:33:02pedantic-darwin quits [Ping timeout: 258 seconds]
10:33:26pedantic-darwin joins
10:44:05<c3manu>hlgs|m: https://web.archive.org/web/20250516161204/https://www.pixiv.net/en/artworks/92939091 is telling me there are "6 captures"
10:44:31<c3manu>and it seems to be loading a while. let's see whether the blank page will disappear eventually
10:44:34<hlgs|m>should definitely be more, it's missing at least 3 that were delayed, no idea when they'll show up
10:44:51<hlgs|m>(sorry, that's the other one, that one has all of them)
10:45:08<hlgs|m>it'll just be blank. it either saves a blank page, a page with an error, or the actual page, so i've just been trying until it gets it
10:45:30<hlgs|m>but i guess i can't today cause the host has been saved too much
10:46:09<hlgs|m>i've managed to get the other few dozen like this, these are just the last two. just a bit inconvenient, but i'll get them eventually
10:49:22<c3manu>hlgs|m: so which one is *not* working you say? just so i can try
10:50:24<hlgs|m>both are currently saying the host has been captured over 10k times today, so i can't save them again to hopefully grab the actual page, but this one https://www.pixiv.net/en/artworks/11956385 is also missing some captures i've done over the past couple of days that said they were delayed in registering and still haven't shown up
10:50:45<c3manu>ooh, "the host". that's different than the individual URL. right
10:50:51Mateon1 quits [Ping timeout: 260 seconds]
10:51:43<c3manu>yeah i'm getting the same message trying to archive it
10:51:49<hlgs|m>yeah, took me a second to realise
10:51:59<hlgs|m>nods. will try tomorrow i guess
10:53:06<c3manu>hlgs|m: https://www.pixiv.net/en/artworks/92939091 itself does not work for me as well (meaning outside of the WBM)
10:53:09<c3manu>"R-18 works cannot be displayed. You need a pixiv account to view this work. It won't be displayed to users under 18."
10:54:00<hlgs|m>ah, i think it's got a couple mature images. that's fine though, i've got the images themselves saved (and the urls, which use the artwork ID), i'm trying to get the page so there'll be an archive of who the artist was
10:54:35<c3manu>that's a good idea
10:55:06<hlgs|m>i should go back and save the artist's pages themselves later actually, so it'll be easier to find the urls of their works
10:55:17<c3manu>but it would also be cool if we had another way to grab the unrated ones
10:56:12<hlgs|m>nods. i can see everything logged in, but afaik the only way to grab a page like that would have my account info in the corner, so not an option i'm too fond of hah
10:57:09<c3manu>hlgs|m: and you would have to somehow feed it your session cookie or something. neither #archivebot nor SPN can do that, i think
10:57:34<hlgs|m>iirc the method i'd heard of is uploading a WACZ(?) file grabbed by other means, or something similar
10:58:03<hlgs|m>c3manu: (unrelated, but how long do archivebot uploads usually take? went back to check if that imgbox batch from last week is on the WBM now, but nothing yet)
10:58:10<c3manu>we have an option that sometimes can work around aggressive buttflare configurations (until it doesn't, i guess), but i don't think either repeatedly feeding it small lists of URLs or grabbing all of pixiv.net that way would be a good option ^^"
10:58:28<hlgs|m>lol
10:58:39Mateon1 joins
10:58:40<c3manu>shouldn't take longer than a few days.
10:59:32<hlgs|m>i've been doing the former over the past couple of days cause i'm stubborn, has worked at least, but it does mean resaving every hour up to 5 times lol
10:59:33<hlgs|m>c3manu: hmm, i think it's been about 5 now
11:00:02Bleo182600722719623455 quits [Quit: The Lounge - https://thelounge.chat]
11:02:46Bleo182600722719623455 joins
11:03:19<c3manu>imgbox batch?
11:04:25<c3manu>i remember some pixiv, deviantart and artstation ones: https://transfer.archivete.am/inline/RLSX5/saveimage.txt
11:05:37Mateon1 quits [Ping timeout: 258 seconds]
11:07:54<hlgs|m>the pixiv and artstation are good, deviantart links still haven't saved and there was also https://transfer.archivete.am/inline/anOv3/imgbox.txt
11:09:52<c3manu>lol
11:10:00<c3manu>that one's on me
11:10:08<hlgs|m>ah, whoops
11:10:23<hlgs|m>any idea why the deviantart ones still aren't up?
11:11:13<c3manu>i just !ao'd the list, not actually grabbing its contents with !ao <
11:12:29<c3manu>hlgs|m: the ones from the imgbox list aren't because doofus me forgot to actually grab the list's contents
11:12:35<c3manu>the other ones should be there: https://web.archive.org/web/20250512153612/https://www.deviantart.com/revenantanime/art/whimsicott-384395716
11:12:51<hlgs|m>oh whoops, i keep forgetting ^^;
11:13:31<c3manu>all good, i obviously know the feeling ;)
11:13:34<hlgs|m>oh, my bad, those ones did save! i was thinking of the ones in the imgbox list, forgot i'd stuck them on there (didn't want to have a file with like 4 links)
11:14:20<hlgs|m>thanks for being so helpful!! it really means a lot, makes this whole process a lot less stressful
11:15:15Mateon1 joins
11:16:13<c3manu>glad we can be of help :)
11:16:36<c3manu>the list https://transfer.archivete.am/anOv3/imgbox.txt just finished, and i can't see any errors in the logs. so they should be there in a few days from now ^^
11:16:37<hlgs|m>y'all are heroes
11:16:37<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/anOv3/imgbox.txt
11:17:44<hlgs|m>thank you!
11:18:22<c3manu>funny. my heroes are the madmen (and madlasses) hosting and managing that whole archive :D
11:18:34<c3manu>you're welcome
11:23:22<hlgs|m>them too, of course!
11:32:55MrMcNuggets (MrMcNuggets) joins
11:54:03grill (grill) joins
11:56:54icedice quits [Quit: Leaving]
12:04:20ericgallager joins
12:19:56ducky quits [Read error: Connection reset by peer]
12:21:29ducky (ducky) joins
13:02:58IDK (IDK) joins
13:04:56MrMcNuggets quits [Read error: Connection reset by peer]
13:10:57etnguyen03 (etnguyen03) joins
13:12:08MrMcNuggets (MrMcNuggets) joins
13:16:25lunik1 quits [Quit: :x]
13:16:56lunik1 joins
13:19:03<qwertyasdfuiopghjkl2>c3manu: You *can* actually specify a cookie for Save Page Now if you use the SPN2 API: https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit . For pixiv, there's an alternate frontend https://pixivfe-docs.pages.dev/instance-list/ which apparently doesn't require login for images, though that seems a bit broken when I try the 92939091 one.
13:23:49<h2ibot>Exorcism edited Discourse/archived (+92): https://wiki.archiveteam.org/?diff=55685&oldid=55166
13:26:31sec^nd quits [Remote host closed the connection]
13:26:54sec^nd (second) joins
13:32:43ericgallager quits [Client Quit]
13:41:38ericgallager joins
13:45:11etnguyen03 quits [Client Quit]
13:55:14<BlankEclair>btw, re pixiv, the web mobile api gives image urls for r-18 works without authentication
13:59:08Wohlstand (Wohlstand) joins
14:06:16grill quits [Ping timeout: 260 seconds]
14:18:31beardicus quits [Ping timeout: 260 seconds]
14:30:29<c3manu>oh, nice. ^cc hlgs|m
14:55:00Larsenv quits [Remote host closed the connection]
14:55:27Larsenv (Larsenv) joins
14:58:32ericgallager quits [Client Quit]
15:07:31ducky quits [Ping timeout: 260 seconds]
15:10:53<hlgs|m>ohh interesting! thanks for the info!
15:12:17<hlgs|m>hmm. weird behaviour with some tumblr urls now. trying to save three images, but it says they've already been saved once today? not sure how since the post is deleted, and the WBM says they haven't been archived... https://41.media.tumblr.com/fd5eab5d1a55764b30f54af5f202881d/tumblr_nhvcuqcEvJ1rs5fmzo2_1280.png https://36.media.tumblr.com/cc6202477ee66585498166922cc26091/tumblr_nhvcuqcEvJ1rs5fmzo3_1280.png
15:12:17<hlgs|m>https://41.media.tumblr.com/d87bc2125a48e3927380e63b3da6ab36/tumblr_nhvcuqcEvJ1rs5fmzo1_1280.png
15:12:42ducky (ducky) joins
15:15:54pabs quits [Read error: Connection reset by peer]
15:16:27pabs (pabs) joins
15:24:39etnguyen03 (etnguyen03) joins
15:27:06ericgallager joins
15:36:46etnguyen03 quits [Client Quit]
15:40:57VerifiedJ quits [Remote host closed the connection]
15:41:31VerifiedJ (VerifiedJ) joins
15:44:35Mateon2 joins
15:46:59Mateon1 quits [Ping timeout: 258 seconds]
15:46:59Mateon2 is now known as Mateon1
15:59:33Wohlstand quits [Client Quit]
16:07:58FiTheArchiver joins
16:09:51FiTheArchiver1 joins
16:14:01FiTheArchiver quits [Ping timeout: 260 seconds]
16:28:54nine quits [Quit: See ya!]
16:29:06nine joins
16:29:06nine quits [Changing host]
16:29:06nine (nine) joins
16:39:49FiTheArchiver1 quits [Client Quit]
16:45:00ericgallager quits [Client Quit]
16:53:57beardicus (beardicus) joins
17:17:37Mateon2 joins
17:19:45Mateon1 quits [Ping timeout: 258 seconds]
17:19:45Mateon2 is now known as Mateon1
17:20:48grill (grill) joins
17:22:07etnguyen03 (etnguyen03) joins
17:56:41@rewby quits [Ping timeout: 260 seconds]
18:10:32<h2ibot>Exorcism edited Discourse/archived (+101): https://wiki.archiveteam.org/?diff=55686&oldid=55685
18:25:51camrod636 quits [Ping timeout: 260 seconds]
18:29:58BornOn420 quits [Remote host closed the connection]
18:30:40BornOn420 (BornOn420) joins
18:38:48etnguyen03 quits [Client Quit]
18:54:28<pokechu22>Snivy, chrismrtn: I've generated URL lists for e621 and am running one in #archivebot now. These are only for the full-resolution images (the reduced resolution ones are also possible, but they seem like a lower priority). No login is needed for any images, except for deleted ones; the blacklist login check only applies to the frontend.
18:55:13<pokechu22>However, I don't see post comments in any of the things on https://e621.net/db_export/ - post descriptions *are* there, but comments don't seem to be. We'd need to scrape the frontend for those.
18:58:11tzt quits [Read error: Connection reset by peer]
18:58:54tzt (tzt) joins
19:02:41Megame (Megame) joins
19:04:47grill quits [Ping timeout: 258 seconds]
19:07:15DogsRNice joins
19:11:22itachi1706 quits [Quit: Bye :P]
19:17:45itachi1706 (itachi1706) joins
19:43:30nine quits [Ping timeout: 258 seconds]
19:44:36cm quits [Ping timeout: 260 seconds]
19:47:27Matthww quits [Quit: The Lounge - https://thelounge.chat]
19:47:34cm joins
19:56:44etnguyen03 (etnguyen03) joins
20:03:16cancername (cancername) joins
20:03:49nine joins
20:03:49nine quits [Changing host]
20:03:49nine (nine) joins
20:04:46Matthww joins
20:13:03rewby (rewby) joins
20:13:03@ChanServ sets mode: +o rewby
20:33:31ericgallager joins
20:41:46Riku_V quits [Ping timeout: 260 seconds]
20:42:06Riku_V (riku) joins
20:47:08Riku_V quits [Ping timeout: 258 seconds]
20:57:47Riku_V (riku) joins
21:02:28Riku_V quits [Ping timeout: 258 seconds]
21:04:58Riku_V (riku) joins
21:14:41etnguyen03 quits [Client Quit]
21:15:00etnguyen03 (etnguyen03) joins
21:24:46etnguyen03 quits [Client Quit]
21:40:14ericgallager quits [Client Quit]
21:42:24Dada quits [Remote host closed the connection]
22:05:46f_|DSR quits [Remote host closed the connection]
22:05:49f_|DSR (funderscore) joins
22:11:06dabs joins
22:16:38f_|DSR quits [Remote host closed the connection]
22:16:41f_|DSR (funderscore) joins
22:17:37BornOn420 quits [Ping timeout: 240 seconds]
22:31:11BornOn420 (BornOn420) joins
22:33:08Jens quits []
22:33:38Jens (JensRex) joins
22:44:09Doranwen quits [Quit: bbl]
22:44:31systwi_ quits [Quit: systwi_]
22:48:56nothere quits [Ping timeout: 260 seconds]
22:58:12Island joins
23:10:44nothere_ joins
23:37:53ericgallager joins
23:53:03camrod636 (camrod) joins
23:58:32hexa quits [Quit: WeeChat 4.4.3]