00:08:47kuroger quits [Quit: ZNC 1.9.1 - https://znc.in]
00:11:04nyakase quits [Remote host closed the connection]
00:11:13kuroger (kuroger) joins
00:12:43nyakase (nyakase) joins
00:23:46kuroger quits [Client Quit]
00:27:13kuroger (kuroger) joins
00:34:01egallager joins
00:34:01etnguyen03 quits [Client Quit]
00:47:45FiTheArchiver quits [Read error: Connection reset by peer]
00:58:20lennier2 quits [Read error: Connection reset by peer]
00:58:36lennier2 joins
01:09:33Bleo18260072271962345 quits [Quit: Ping timeout (120 seconds)]
01:09:47Bleo18260072271962345 joins
01:13:48adamus1red quits [Quit: SigTerm]
01:15:18etnguyen03 (etnguyen03) joins
01:17:33adamus1red (adamus1red) joins
01:20:33BornOn420 quits [Remote host closed the connection]
01:21:09BornOn420 (BornOn420) joins
01:30:50Ryz quits [Read error: Connection reset by peer]
01:31:40Ryz (Ryz) joins
01:40:51@Fusl quits [Ping timeout: 260 seconds]
01:53:31Fusl (Fusl) joins
01:53:31@ChanServ sets mode: +o Fusl
02:12:57notarobot1 joins
02:17:08gust quits [Read error: Connection reset by peer]
02:19:55etnguyen03 quits [Client Quit]
02:33:06etnguyen03 (etnguyen03) joins
02:38:34<Gareth48>pokechu22 any way around the issue? Can archivebot be reconfigured to go depth first or can we possibly tweak its methodology to grab images?
02:38:51<Gareth48>Since I agree getting the images off this site is fairly important
02:39:12<pokechu22>There isn't any good way of doing that with archivebot. Not sure if JAA can do it with pull or something
02:40:09<Gareth48>I guess it's good I'm attempting my own scrape, though I'd much rather add what I have to a Wayback Machine compatible one than have a huge dump on my computer in an unusable format
02:40:59<Gareth48>Thanks TheTechRobo I'll take a look into Matrix since I'm going to need the chat history visible
02:48:02etnguyen03 quits [Remote host closed the connection]
02:51:34gareth48|m joins
02:51:46<gareth48|m>Okay this should be from my Matrix clien
02:51:57<gareth48|m>s/clien/client/
02:52:59<gareth48|m>Okay note to self: not all features are going to be supported.
03:00:03<gareth48|m>In theory if you all ping me and I'm offline I'll see it now, going to test that real quick
03:00:48<Gareth48>gareth48|m Hello from the real IRC client
03:01:24<gareth48|m>That works, awesome! Okay now anyone can ping me and I'll see it overnight. Thanks for the recommendation TheTechRobo
03:16:10Webuser457589 joins
03:16:25Webuser457589 quits [Client Quit]
03:44:00kuroger quits [Quit: ZNC 1.9.1 - https://znc.in]
03:49:57kuroger (kuroger) joins
04:10:46Island quits [Read error: Connection reset by peer]
04:13:50Naruyoko5 joins
04:17:46Naruyoko quits [Ping timeout: 260 seconds]
05:27:34<@arkiver>projects are suffering currently due to a target not functioning correctly
05:27:55Naruyoko joins
05:31:16Naruyoko5 quits [Ping timeout: 260 seconds]
05:32:10Naruyoko5 joins
05:34:56Naruyoko quits [Ping timeout: 250 seconds]
05:40:31ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
05:40:55ThetaDev joins
05:47:56BlueMaxima quits [Ping timeout: 250 seconds]
06:11:47egallager quits [Quit: This computer has gone to sleep]
06:37:06<datechnoman>Thanks for the update
06:37:16<datechnoman>Was thinking it was IA. Not them for once :P
06:46:46Naruyoko joins
06:49:26Naruyoko5 quits [Ping timeout: 260 seconds]
07:03:59loug83181422 joins
07:16:55loug83181422 quits [Read error: Connection reset by peer]
07:17:20loug83181422 joins
07:17:28Gareth48 quits [Quit: Ooops, wrong browser tab.]
07:29:01Naruyoko5 joins
07:33:11Naruyoko quits [Ping timeout: 260 seconds]
07:48:21Shyy4 quits [Ping timeout: 260 seconds]
08:16:44Ketchup901 quits [Remote host closed the connection]
08:17:18Ketchup901 (Ketchup901) joins
08:23:43<@JAA>pokechu22, gareth48|m: Hmm, unless there's heavy rate limiting or similar, we should be able to do the <10k item pages quickly enough to not have images expire, maybe? Or do it in a couple chunks. That won't help with the shop and category listings etc., but at least it'd cover the critical part.
09:20:55hexa- quits [Ping timeout: 272 seconds]
09:36:37hexa- (hexa-) joins
09:48:47hexa- quits [Ping timeout: 272 seconds]
10:11:39FiTheArchiver joins
10:15:10hexa (hexa-) joins
10:47:06dendory3 joins
10:49:06dendory quits [Ping timeout: 250 seconds]
10:49:06dendory3 is now known as dendory
10:58:53Webuser959856 joins
10:59:50Webuser959856 quits [Client Quit]
11:00:04Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
11:02:54Bleo18260072271962345 joins
11:03:31BornOn420 quits [Ping timeout: 276 seconds]
11:13:46BornOn420 (BornOn420) joins
11:16:36@imer quits [Ping timeout: 260 seconds]
11:31:44imer (imer) joins
11:31:44@ChanServ sets mode: +o imer
11:34:20@imer quits [Killed (NickServ (GHOST command used by imer6))]
11:34:32imer (imer) joins
11:34:32@ChanServ sets mode: +o imer
11:58:37egallager joins
13:16:19snel joins
13:54:02ikki quits [Quit: Going offline, see ya! (www.adiirc.com)]
13:55:37<h2ibot>Liuxinyu970226 edited Deathwatch (+696, /* 2019 */): https://wiki.archiveteam.org/?diff=55001&oldid=54975
13:58:28<@arkiver>bzc6p is handling indafoto
14:00:36<gareth48|m>JAA: There isn't a huge rate limit that I know of; however, the tokens expire 10 minutes after they are generated. If ArchiveBot is breadth first, then by the time it gets to 90% of the images it'll already be too late to download them. It would need to prioritize the images on a per-page basis as it goes, versus creating a giant cache of pages to dump and iterating on that (which is how I've been told it currently works). If you look at the https://store.vket.com/en job you'll notice ~50% or so of all requests are failing and it's basically all images. If ArchiveBot can do something like that in any way, that'll be the strategy.
14:00:36<gareth48|m>Another issue that might be relevant is that most of the carousel images on the shop page are added to the DOM after the site loads (unpacked from a weird data block). This tripped my own web scraper up until I modified it to extract the tokens from the block itself, before the images were properly initialized. Not sure if the way you all are downloading sites will have the same weakness but I figured I'd mention it. Let me know your thoughts!
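The ordering problem described in the messages above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: requests run one at a time at a fixed cost, a page's image tokens are issued the moment the page is fetched, and every page has the same number of images. The function and parameter names are hypothetical and do not describe how ArchiveBot is actually implemented.

```python
def images_fetched_in_time(num_pages, images_per_page, seconds_per_request,
                           token_lifetime_s, per_page_first):
    """Count image requests that complete before their page's token expires.

    per_page_first=True  -> fetch a page, then its images, then the next page.
    per_page_first=False -> fetch every page first, then every image.
    """
    clock = 0.0
    fetched_ok = 0
    if per_page_first:
        for _ in range(num_pages):
            clock += seconds_per_request      # fetch the page; token issued now
            issued = clock
            for _ in range(images_per_page):
                clock += seconds_per_request  # fetch one image
                if clock - issued <= token_lifetime_s:
                    fetched_ok += 1
    else:
        issued_at = []
        for _ in range(num_pages):
            clock += seconds_per_request      # fetch all pages up front
            issued_at.append(clock)
        for issued in issued_at:
            for _ in range(images_per_page):
                clock += seconds_per_request  # images only start afterwards
                if clock - issued <= token_lifetime_s:
                    fetched_ok += 1
    return fetched_ok
```

In this model, once fetching all the pages takes longer than the token lifetime, the pages-first ordering loses every image while the per-page ordering loses none. A real crawl is concurrent and less regular, so the numbers are illustrative only.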
14:06:38<@JAA>gareth48|m: Yeah, my point is that grabbing a few thousand pages should be feasible within 10 minutes, so the images wouldn't be expired by the time it finishes with those. Prioritisation isn't currently possible with AB.
14:07:30<@JAA>This would be a separate !ao < job specifically for the item pages and their page requisites only.
14:07:57<@JAA>Well, or jobs, depending on whether we can safely do the 10k quickly enough or need to split it up.
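The timing argument in the messages above (finish a batch of item pages before the 10-minute tokens expire, or split the list into chunks) can be sketched in Python. The token lifetime comes from the discussion; the request rate, the safety margin, and the function names are hypothetical and not part of ArchiveBot.

```python
import math

TOKEN_LIFETIME_S = 10 * 60  # tokens expire 10 minutes after generation (from the chat)


def max_pages_per_chunk(pages_per_second, safety_margin=0.5):
    """Largest chunk whose pages fit inside the budgeted part of the token lifetime.

    safety_margin is a hypothetical knob: only this fraction of the lifetime
    is spent on pages, leaving the rest for their page requisites (images).
    """
    if pages_per_second <= 0 or not 0 < safety_margin <= 1:
        raise ValueError("rate must be positive and margin must be in (0, 1]")
    return max(1, math.floor(pages_per_second * TOKEN_LIFETIME_S * safety_margin))


def split_into_chunks(urls, chunk_size):
    """Split a URL list into consecutive chunks of at most chunk_size entries."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]
```

Each chunk could then be submitted as its own list job. Whether a single job is enough depends on the rate the site actually tolerates, which the chat leaves open.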
14:08:50<@JAA>I can take a look at this several hours from now (unless pokechu22 does it earlier).
14:10:54<gareth48|m>JAA: Okay that makes sense, sounds like the way to go, thanks for breaking it down for me. I agree, if you focused just on archiving a few sets of pages, i.e. the gallery pages for default tags and all 8000 product pages, that would get probably 90% of the way there. Appreciate y'all looking into this. I'll keep monitoring the jobs as they go and let you know what is and isn't working. Feel free to ping me when the jobs start, I have Element watching this channel so I'll see it.
14:11:08<gareth48|m>* way there in 10 minutes. Appreciate
14:16:32VoynichCR (VoynichCR) joins
14:17:02SootBector quits [Remote host closed the connection]
14:17:25SootBector (SootBector) joins
14:21:29<@arkiver>i don't think arzon.jp is going to finish in time
14:21:40<@arkiver>JAA: it's sequential IDs mostly, is it something for qwarc perhaps?
14:21:44<@arkiver>else i'll set a project up for it
14:26:42<h2ibot>VoynichCr created WikiBot (+21, Redirected page to [[Wikibot]]): https://wiki.archiveteam.org/?title=WikiBot
15:04:29ducky quits [Ping timeout: 260 seconds]
15:05:32ducky (ducky) joins
15:14:19DopefishJustin quits [Remote host closed the connection]
15:23:27<nulldata>arkiver - did you see my message here a few days ago regarding indafoto.hu? We may need a project for it too as the AB job isn't going to finish in time at its current rate.
15:24:29snel quits [Client Quit]
15:24:48<nulldata>Oh never mind - according to the wiki bzc6p is grabbing them
15:25:14DopefishJustin joins
15:25:55<@arkiver>nulldata: yeah bzc6p is working on archiving it
15:26:08<@arkiver>i'm in contact with them, will check in tomorrow with them and see if we do need a project for it
15:26:27<nulldata>Thanks! :)
15:26:51VoynichCR quits [Client Quit]
15:27:31<@arkiver>!remindme 10h indafoto bzc6p
15:27:32<eggdrop>[remind] ok, i'll remind you at 2025-03-21T01:27:32Z
15:37:51<nyuuzyou>nuum.ru (ex wasd.tv) shuts down on June 1 - https://www.rbc.ru/technology_and_media/20/03/2025/67daa7c39a79472d3feee484
16:45:23sparky14925 (sparky1492) joins
16:47:26kuroger quits [Quit: ZNC 1.9.1 - https://znc.in]
16:48:46sparky1492 quits [Ping timeout: 250 seconds]
16:48:47sparky14925 is now known as sparky1492
16:55:55kuroger (kuroger) joins
16:56:14kansei quits [Quit: ZNC 1.9.1 - https://znc.in]
17:08:41kansei (kansei) joins
17:13:35sparky14922 (sparky1492) joins
17:13:43NeonGlitch (NeonGlitch) joins
17:17:41sparky1492 quits [Ping timeout: 260 seconds]
17:17:41sparky14922 is now known as sparky1492
18:06:53ATWF_notcarl joins
18:24:43VoynichCR (VoynichCR) joins
18:49:00Island joins
18:49:57FiTheArchiver quits [Quit: Leaving]
19:18:01nyakase quits [Remote host closed the connection]
19:20:28nyakase (nyakase) joins
19:25:12BennyOtt quits [Ping timeout: 250 seconds]
19:27:08BennyOtt (BennyOtt) joins
19:42:00Jake quits [Quit: Leaving for a bit!]
19:45:14Webuser161207 quits [Quit: Ooops, wrong browser tab.]
19:45:56sec^nd quits [Remote host closed the connection]
19:46:10sec^nd (second) joins
19:50:46Webuser603791 joins
19:51:20<pokechu22>VoynichCR: IIRC pabs ran a bunch of tuxfamily stuff in #archivebot
19:52:03<VoynichCR>nice
19:57:26SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
20:06:30SkilledAlpaca418962 joins
20:23:03nyakase quits [Remote host closed the connection]
20:27:09nyakase (nyakase) joins
20:27:34nyakase quits [Remote host closed the connection]
20:30:28nyakase (nyakase) joins
20:34:54<h2ibot>Tech234a edited Deathwatch (+352, /* 2025 */ Add Chrome Web Store Manifest V2): https://wiki.archiveteam.org/?diff=55004&oldid=55001
20:36:54<h2ibot>Tech234a edited Chrome Web Store (+114, Minor updates to timeline): https://wiki.archiveteam.org/?diff=55005&oldid=52369
20:36:55<steering>VoynichCR: https://lwn.net/Articles/1004988/
20:37:37<steering>looks like a bunch was done in jan-feb around it
20:39:12<tech234a>re Chrome Web Store: as a note https://chrome-stats.com/manifest-v3-migration indicates that about 1/3 of extensions on the Chrome Web Store still haven't been migrated to Manifest V3
20:39:13<steering>(AB and down-the-tube mostly)
20:42:20ThreeHM quits [Ping timeout: 250 seconds]
20:44:12ThreeHM (ThreeHeadedMonkey) joins
20:52:50SkilledAlpaca418962 quits [Client Quit]
21:01:36SkilledAlpaca418962 joins
21:05:28VoynichCR quits [Client Quit]
21:06:30<@JAA>arkiver: I can take a look at Arzon tomorrow. You say 'mostly' sequential; what does that mean? Is there anything important to grab beyond /item_${id}.html + the images?
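For the sequential-ID case, building the URL list is straightforward. A minimal Python sketch follows, assuming the `/item_${id}.html` pattern from the message above. The `https` scheme and the ID bounds are assumptions, and gaps in the ID space (the 'mostly' part) are not handled.

```python
BASE_URL = "https://arzon.jp"  # scheme is an assumption; the chat only names the host


def item_urls(first_id, last_id):
    """Yield item page URLs for every ID in the inclusive range."""
    for item_id in range(first_id, last_id + 1):
        yield f"{BASE_URL}/item_{item_id}.html"
```

The real ID range would have to be discovered first, and image URLs would still need to be extracted from each fetched page.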
21:32:42gust joins
21:42:02gust quits [Remote host closed the connection]
21:42:21gust joins
21:50:42chrismeller quits [Quit: chrismeller]
21:51:14chrismeller (chrismeller) joins
22:06:26tek_dmn quits [Ping timeout: 260 seconds]
22:12:58NeonGlitch quits [Quit: My Mac Mini has gone to sleep. ZZZzzz…]
22:16:31tek_dmn (tek_dmn) joins
22:22:31etnguyen03 (etnguyen03) joins
22:23:35<joepie91|m>errrr. wget-at is probably not supposed to be segfaulting...?
22:23:39<joepie91|m>(in the context of the warrior docker container)
22:23:51<joepie91|m>or dumping core at least
22:24:22<@JAA>That's not very typical, I'd like to make that point.
22:24:53<@JAA>There have been reports of it before, but it's not supposed to happen, yeah.
22:26:57<joepie91|m>the stacktrace is incredibly unhelpful but it is constantly coredumping on at least one VPS of mine
22:27:03<joepie91|m>#0 0x00007f8e8f6b7ebc n/a (/usr/lib/x86_64-linux-gnu/libc.so.6 + 0x8aebc)
22:27:05<joepie91|m>that's the only line in the stacktrace
22:35:16<@JAA>Hmm
22:36:02<@JAA>Well, if it's reproducible, that's a start at least.
22:36:39ATinySpaceMarine quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
22:37:28ATinySpaceMarine joins
22:40:44<@JAA>Could be worth a debugging session (without communication with the real tracker etc., of course).
22:40:47<@JAA>Cc arkiver
22:58:26lennier2_ joins
23:01:00lennier2 quits [Ping timeout: 250 seconds]
23:13:30etnguyen03 quits [Client Quit]
23:48:16etnguyen03 (etnguyen03) joins