00:25:01etnguyen03 quits [Client Quit]
00:25:27beardicus (beardicus) joins
00:29:58beardicus quits [Ping timeout: 260 seconds]
00:48:54beardicus (beardicus) joins
00:49:54trix quits [Quit: WeeChat 4.4.3]
00:53:26beardicus quits [Ping timeout: 250 seconds]
01:06:46etnguyen03 (etnguyen03) joins
01:16:38Blueacid quits [Ping timeout: 260 seconds]
01:17:45beardicus (beardicus) joins
01:18:14Blueacid joins
01:47:02<TheTechRobo>Blueacid: Funny coincidence: Just saw in my GitHub feed that someone at IA started a project that does exactly what you asked about :-) https://github.com/internetarchive/doppelganger
01:47:50lennier2 joins
01:51:03lennier2_ quits [Ping timeout: 260 seconds]
01:54:19@JAA waves at corentin.
01:54:23<@JAA>Oh, not here currently.
01:55:11<TheTechRobo>corentin++
01:55:12<eggdrop>[karma] 'corentin' now has 2 karma!
02:02:43nicolas17 quits [Ping timeout: 260 seconds]
02:12:53wanderingperson joins
02:13:59th3z0l4_ quits [Remote host closed the connection]
02:16:45wanderingperson leaves
02:17:28th3z0l4 joins
02:21:57SootBector quits [Remote host closed the connection]
02:22:32SootBector (SootBector) joins
02:27:13beardicus quits [Ping timeout: 260 seconds]
02:32:49beardicus (beardicus) joins
02:55:12nicolas17 joins
03:01:59etnguyen03 quits [Client Quit]
03:14:26etnguyen03 (etnguyen03) joins
03:32:24etnguyen03 quits [Remote host closed the connection]
04:00:57beastbg8_ joins
04:01:12trix (trix) joins
04:05:13beastbg8 quits [Ping timeout: 260 seconds]
04:42:24moth_ quits [Remote host closed the connection]
04:46:34beardicus quits [Ping timeout: 250 seconds]
04:46:58beardicus (beardicus) joins
04:47:00Webuser080263 joins
04:52:32<Webuser080263>Hi, I mentioned the Criticker forums shutting down yesterday, and saw someone put it in Archivebot, thank you! I wanted to mention since I saw it looks like it's gotten stuck on 403s trying to download URLs starting with /film and /tv and maybe also /game when it gets to them, it's just the /forum that is closing. I expect the rest of the site to
04:52:33<Webuser080263>stick around, it's under active development, so if it would help, the job could ignore ^https?://\w+\.criticker\.com/film/.*, ^https?://\w+\.criticker\.com/tv/.*, and ^https?://\w+\.criticker\.com/game/.* if those are all that are 403ing.
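A minimal sketch of how the three ignore patterns suggested above behave, assuming they are applied as ordinary Python regular expressions searched against the full URL. The helper name `is_ignored` and the sample URLs are hypothetical and only for illustration; this is not ArchiveBot's own code.

```python
import re

# Ignore patterns copied verbatim from the message above.
IGNORE_PATTERNS = [
    r"^https?://\w+\.criticker\.com/film/.*",
    r"^https?://\w+\.criticker\.com/tv/.*",
    r"^https?://\w+\.criticker\.com/game/.*",
]
COMPILED = [re.compile(p) for p in IGNORE_PATTERNS]


def is_ignored(url: str) -> bool:
    """Return True if any ignore pattern matches the URL (hypothetical helper)."""
    return any(p.search(url) for p in COMPILED)


# Hypothetical sample URLs, not taken from the log.
for url in [
    "https://www.criticker.com/film/Example/",
    "https://www.criticker.com/tv/Example/",
    "https://www.criticker.com/forum/",
    "https://criticker.com/film/Example/",
]:
    print(url, is_ignored(url))
```

Note that `\w+\.` requires a subdomain label, so under these assumptions a bare `criticker.com/film/...` URL would not be matched by the patterns as written.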
04:59:34Webuser080263 quits [Client Quit]
05:01:28<eggdrop>[remind] OrIdow6: machine
05:05:10moth_ joins
05:10:22Island quits [Read error: Connection reset by peer]
05:28:35BlueMaxima quits [Read error: Connection reset by peer]
05:51:34beardicus quits [Ping timeout: 250 seconds]
05:58:53<@arkiver>OrIdow6: if you have access to the right machine, it would be good for us to still archive niconico shunga
06:01:06<@arkiver>JAA: is it easily possible to simply set up a little archivebot node on a machine with the right IP, or route requests through it, for the niconico website?
06:02:38<@JAA>arkiver: Setting it up as a special pipeline that only processes things specifically queued to it would be easy. Routing traffic through it would be messier, I think.
06:04:37<@arkiver>maybe that first thing can be done with the machine with Japanese IP?
06:05:00<@arkiver>i wonder how the Docker situation is nowadays for AB :P
06:05:23beardicus (beardicus) joins
06:05:24<@arkiver>it would be nice if someone can simply spin up a Docker image, give you the relevant information, and you register it with AB for single use
06:05:32<@arkiver>like for a specific project, until taken down again
06:08:46<h2ibot>PaulWise edited ArchiveBot/Ignore (+173, add pinterest ignore from pokechu22): https://wiki.archiveteam.org/?diff=54271&oldid=54169
06:08:50BornOn420 quits [Remote host closed the connection]
06:09:18BornOn420 (BornOn420) joins
06:15:53beardicus quits [Ping timeout: 260 seconds]
06:25:34<@JAA>Sure, I'll happily set that up.
06:36:01SootBector quits [Remote host closed the connection]
06:39:35Webuser855579 joins
06:40:41<Webuser855579>Apologies if this is redundant but it looks like this is shutting down soon: https://www.reddit.com/r/DataHoarder/comments/1i73xq6/blogtalkradiocom_is_shutting_down_on_jan_31st_2025/
06:42:26<@arkiver>JAA: do you mean we have something Docker-ish that is easy to use?
06:44:30beardicus (beardicus) joins
06:48:32<@JAA>arkiver: No, the container situation is unchanged: there are two unofficial hacky setups that kind of work but have quirks.
06:51:37SootBector (SootBector) joins
06:51:46<@JAA>Webuser855579: Thanks!
06:51:58<@JAA>Can't find an official public announcement, but apparently they sent out an email: https://podnews.net/article/blogtalkradio-customer-email
06:52:12<@JAA>Also, they apparently removed a bunch of content a few years ago: https://help.blogtalkradio.com/en/articles/3513987-what-happened-to-my-older-episodes-on-blogtalkradio
07:00:41Webuser855579 quits [Client Quit]
07:02:24<@OrIdow6>arkiver: So we're going with the AB thing or should I still go through with the original plan?
07:02:40<@OrIdow6>Still would need to generate a list of all the relevant images
07:02:52<@OrIdow6>And probably do some postprocessing to get the big thumbnails
07:03:38<@OrIdow6>Thank you JAA and TheTechRobo
07:07:55<@JAA>If it can be grabbed completely with a couple AB jobs like that, it sounds like the path of least resistance.
07:08:57<@OrIdow6>I might play around with qwarc anyway at some point so that wasn't lost
07:09:18<@OrIdow6>Feels like a good starting point to play around with crawl control schemes too
07:10:18<@JAA>I did write a PoC crawler with qwarc at one point. Not at all usable generically, but it did work.
07:11:02<@JAA>But yeah, I was going to say, I don't want to stop anyone from using qwarc. :-)
07:19:01<@OrIdow6>As long as it can happen soon I think it's the path of least resistance yet
07:19:35<@OrIdow6>Depending on site capacity might need to do some initial list generation in a separate script, I didn't see any magic url prefix for shunga as opposed to the whole site
07:20:00<@OrIdow6>"Depending on capacity" since if it supports it may as well do the whole site
07:22:48ducky quits [Ping timeout: 260 seconds]
07:23:13<@OrIdow6>I do see in the #niconino (should we move there BTW?) logs that they gave per-ip 429s on the video site back in 2021
07:23:40<@JAA>Yeah, let's move things there.
07:24:17<@JAA>I found a way to enumerate all episodes on BlogTalkRadio. The IDs go to over 12 million.
07:24:55ducky (ducky) joins
07:29:48ducky quits [Ping timeout: 260 seconds]
07:31:19AlsoHP_Archivist joins
07:31:19HP_Archivist quits [Read error: Connection reset by peer]
07:31:59ducky (ducky) joins
07:32:56<@JAA>An AB job is running now anyway as a fallback.
07:34:41<@JAA>There are sitemaps, which AB didn't like for some reason.
07:35:52<@JAA>Oh yeah, they're malformed, and wpull requires an `<?xml`.
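A minimal sketch of the kind of check described above: testing whether a sitemap body begins with an `<?xml` declaration. This is an illustration only, not wpull's actual implementation; the function name `has_xml_declaration` and the choice to skip a UTF-8 BOM and leading whitespace are assumptions.

```python
def has_xml_declaration(data: bytes) -> bool:
    """Return True if the document starts with an XML declaration.

    Assumption: a UTF-8 byte order mark and leading whitespace are skipped
    before checking; a stricter parser might not tolerate them.
    """
    stripped = data.lstrip(b"\xef\xbb\xbf \t\r\n")
    return stripped.startswith(b"<?xml")


# Hypothetical sitemap bodies, not taken from the site discussed above.
well_formed = b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
missing_declaration = b"<urlset></urlset>"

print(has_xml_declaration(well_formed))
print(has_xml_declaration(missing_declaration))
```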
07:38:17lennier2_ joins
07:39:43ducky quits [Ping timeout: 260 seconds]
07:41:03lennier2 quits [Ping timeout: 260 seconds]
07:44:55<@JAA>As expected, it's discovering a bunch of stuff anyway.
07:47:42beardicus quits [Ping timeout: 250 seconds]
07:48:49beardicus (beardicus) joins
07:51:32ducky (ducky) joins
07:51:33qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds]
07:55:38beardicus quits [Ping timeout: 260 seconds]
08:01:18ducky quits [Ping timeout: 260 seconds]
08:03:18<@arkiver>OrIdow6: did you mention previously it would be like a few 100 GB?
08:06:22qwertyasdfuiopghjkl2 joins
08:06:53qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
08:07:38beardicus (beardicus) joins
08:11:58beardicus quits [Ping timeout: 250 seconds]
08:24:11BornOn420 quits [Remote host closed the connection]
08:24:27BornOn420 (BornOn420) joins
08:28:16loug8318142 joins
08:30:36<katia>arkiver: I did some work on dockerish archivebot for my own personal use but it’s all undocumented. It works well ime, a control plane archivebot can be run w one command with all its dependencies being happy. The pipeline is also container’d. I have a PR in archivebot for it I should pick up again and document / finish
08:32:01<@arkiver>that is pretty nice!
08:40:34<szczot3k>katia: archivebot on k8s when?
08:49:28<katia>szczot3k, it runs fine on k8s
08:49:51<katia>i was tempted to do an archivebot in kubernetes workshop at 38c3
08:52:04lennier2_ quits [Read error: Connection reset by peer]
08:52:46lennier2_ joins
08:55:21<szczot3k>archivenetes
08:55:37<szczot3k>or kubearchive
09:02:46beardicus (beardicus) joins
09:07:23beardicus quits [Ping timeout: 260 seconds]
09:30:43magmaus3 quits [Ping timeout: 260 seconds]
09:43:47beardicus (beardicus) joins
09:45:34magmaus3 (magmaus3) joins
09:48:10beardicus quits [Ping timeout: 250 seconds]
09:54:15magmaus3 quits [Read error: Connection reset by peer]
09:56:07magmaus3 (magmaus3) joins
10:02:50beardicus (beardicus) joins
10:06:09magmaus3 quits [Read error: Connection reset by peer]
10:06:30magmaus3 (magmaus3) joins
10:07:28beardicus quits [Ping timeout: 260 seconds]
10:26:10beardicus (beardicus) joins
10:29:18ducky (ducky) joins
10:36:16beardicus quits [Ping timeout: 250 seconds]
10:45:45<tzt>Japanese radio station Super! A&G+ is ceasing broadcasting on 2025-03-31 https://www.joqr.co.jp/qr/article/143923/
10:56:57beardicus (beardicus) joins
10:59:45qwertyasdfuiopghjkl2 (qwertyasdfuiopghjkl2) joins
10:59:47qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
11:00:29qwertyasdfuiopghjkl2 (qwertyasdfuiopghjkl2) joins
11:01:43beardicus quits [Ping timeout: 260 seconds]
11:17:25beardicus (beardicus) joins
11:21:46beardicus quits [Ping timeout: 250 seconds]
11:24:28magmaus3 quits [Ping timeout: 260 seconds]
11:29:24magmaus3 (magmaus3) joins
11:30:31magmaus3 quits [Remote host closed the connection]
11:30:51magmaus3 (magmaus3) joins
11:42:59beardicus (beardicus) joins
11:51:35<@arkiver>thanks tzt
11:52:12<@arkiver>tzt: is this the one at https://www.joqr.co.jp/ag/ ?
11:54:42benjins3 quits [Ping timeout: 250 seconds]
12:00:02Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
12:02:55Bleo18260072271962345 joins
12:03:10th3z0l4 quits [Read error: Connection reset by peer]
12:04:12th3z0l4 joins
12:11:01benjins3 joins
12:20:16beardicus quits [Ping timeout: 250 seconds]
12:33:33Miori quits [Quit: Ping timeout (120 seconds)]
12:33:42Miori joins
12:46:32SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:48:05SkilledAlpaca418962 joins
12:59:38NF885 (NF885) joins
13:07:00NF885 quits [Client Quit]
13:10:02beardicus (beardicus) joins
13:15:18qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds]
13:31:04qwertyasdfuiopghjkl2 joins
13:31:32qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:32:19qwertyasdfuiopghjkl2 joins
13:32:47qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:34:13qwertyasdfuiopghjkl2 joins
13:34:41qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:35:31qwertyasdfuiopghjkl2 joins
13:35:59qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:37:31qwertyasdfuiopghjkl2 joins
13:38:00qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:39:04qwertyasdfuiopghjkl2 joins
13:39:33qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:39:57qwertyasdfuiopghjkl2 joins
13:40:26qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:41:01qwertyasdfuiopghjkl2 joins
13:41:30qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:42:36qwertyasdfuiopghjkl2 joins
13:43:05qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:43:16mls quits [Quit: leaving]
13:43:36qwertyasdfuiopghjkl2 joins
13:44:05qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:44:46qwertyasdfuiopghjkl2 joins
13:45:15qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:46:19qwertyasdfuiopghjkl2 joins
13:46:48qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:47:31qwertyasdfuiopghjkl2 joins
13:48:00qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:49:51qwertyasdfuiopghjkl2 joins
13:50:20qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:51:16qwertyasdfuiopghjkl2 joins
13:51:45qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:53:12qwertyasdfuiopghjkl2 joins
13:53:41qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:54:50qwertyasdfuiopghjkl2 joins
13:55:19qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:55:52qwertyasdfuiopghjkl2 joins
13:56:21qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:58:01qwertyasdfuiopghjkl2 joins
13:58:30qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:59:05qwertyasdfuiopghjkl2 joins
13:59:34qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
13:59:56qwertyasdfuiopghjkl2 joins
14:00:25qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
14:01:16qwertyasdfuiopghjkl2 joins
14:01:46qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
14:02:52qwertyasdfuiopghjkl2 joins
14:03:21qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
14:03:38qwertyasdfuiopghjkl2 joins
14:04:06qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
14:05:11qwertyasdfuiopghjkl2 joins
14:05:39qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
14:06:46qwertyasdfuiopghjkl2 joins
14:07:14qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
14:08:01qwertyasdfuiopghjkl2 joins
14:08:29qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
14:08:49qwertyasdfuiopghjkl2 joins
14:11:00ljcool2006_ joins
14:12:05TastyWiener95 quits [Quit: So long, farewell, auf wiedersehen, good night]
14:13:48ljcool2006 quits [Ping timeout: 250 seconds]
14:14:58kansei (kansei) joins
14:18:31ljcool2006__ joins
14:18:31ljcool2006_ quits [Read error: Connection reset by peer]
14:27:00ljcool2006_ joins
14:27:00ljcool2006__ quits [Read error: Connection reset by peer]
14:46:19<nicolas17>what happened with the plan to use a ticket system for things shutting down?
15:31:12<@arkiver>i believe there was some consensus on maybe using a repository on gitea
15:31:20DopefishJustin quits [Remote host closed the connection]
15:31:20<@arkiver>but there was no plan decided for a ticketing system
15:33:43DopefishJustin joins
15:36:28Cronfox quits [Ping timeout: 260 seconds]
15:38:42Cronfox (Cronfox) joins
15:55:01<szczot3k>https://jira.archiveteam.org/
16:23:48`
16:32:16FireFly eyes `
16:33:01<@arkiver>katia? :P
16:33:28<katia>arkiver!
16:33:41<@arkiver>hi :)
16:48:05<FireFly>typical katia behaviour
16:50:12<katia>h
16:50:19<szczot3k>FireFly++
16:50:19<eggdrop>[karma] 'FireFly' now has 1 karma!
17:20:42sec^nd quits [Remote host closed the connection]
17:21:03sec^nd (second) joins
17:39:38adamus1red quits [Quit: SigTerm]
17:42:11adamus1red (adamus1red) joins
17:57:55holbrooke joins
18:26:27Webuser097414 joins
18:27:12Webuser097414 quits [Client Quit]
18:39:15Webuser070235 joins
18:39:38Webuser070235 quits [Client Quit]
19:22:13Webuser848889 joins
19:22:14Webuser848889 quits [Client Quit]
19:32:15<h2ibot>Nulldata edited Deathwatch (+322, /* 2025 */ Added RateBeer): https://wiki.archiveteam.org/?diff=54272&oldid=54241
20:20:16Radzig2 joins
20:22:53Radzig quits [Ping timeout: 260 seconds]
20:22:54Radzig2 is now known as Radzig
20:28:51mls (mls) joins
20:44:48BlueMaxima joins
21:18:01AlsoHP_Archivist quits [Quit: Leaving]
21:18:17HP_Archivist (HP_Archivist) joins
21:31:43beardicus quits [Ping timeout: 260 seconds]
21:40:17Larsenv quits [Read error: Connection reset by peer]
21:40:52Larsenv (Larsenv) joins
21:41:25JayEmbee quits [Quit: WeeChat 2.3]
21:41:50Larsenv quits [Client Quit]
21:42:10beardicus (beardicus) joins
21:42:57loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
21:42:59Larsenv (Larsenv) joins
21:43:34loug8318142 joins
21:46:53beardicus quits [Ping timeout: 260 seconds]
21:58:43G4te_Keep3r34924156 joins
22:03:24sec^nd quits [Remote host closed the connection]
22:03:43sec^nd (second) joins
22:07:55icedice (icedice) joins
22:10:23SootBector quits [Ping timeout: 276 seconds]
22:11:38SootBector (SootBector) joins
22:24:13i_have_n0_idea quits [Quit: Ping timeout (120 seconds)]
22:24:30i_have_n0_idea (i_have_n0_idea) joins
22:30:21<icedice>Very impressive Sailor Moon media archive that was partially archived in 2017, but the HDD ArchiveBot used for it ran out of storage space after about 800 GB iirc: https://missdream.org/
22:45:34etnguyen03 (etnguyen03) joins
23:10:53loug8318142 quits [Client Quit]
23:17:18lennier2_ quits [Ping timeout: 260 seconds]
23:22:22lennier2_ joins
23:47:08HP_Archivist quits [Client Quit]
23:57:17etnguyen03 quits [Client Quit]