00:00:06HP_Archivist quits [Ping timeout: 240 seconds]
00:01:52HP_Archivist (HP_Archivist) joins
00:02:49<jodizzle>TheTechRobo: I think I did something like that once or twice a while back, but I unfortunately don't remember the details. The results were also ambiguous.
00:03:10<TheTechRobo>"ambiguous"...?
00:03:18<TheTechRobo>What do you mean?
00:04:48<jodizzle>Like I wasn't sure whether everything properly resumed, I was just doing some messy grabs. Related to the discussion on this ticket (which I think you've seen): https://github.com/ArchiveTeam/grab-site/issues/58
00:05:20<jodizzle>I think my approach probably involved using grab-site with `--which-wpull-command` and trying to fiddle with the wpull command that popped out as best I could
00:09:57Arcorann (Arcorann) joins
00:11:37<TheTechRobo>What's the argument for resuming a crawl with wpull?
00:20:05<ivan>grab-site is actually a wpull launcher
00:22:25<ivan>well, it's a ludios_wpull launcher heh
00:22:38shufflertoxin quits [Remote host closed the connection]
00:26:02<TheTechRobo>ivan: I meant, what's the _option_ to resume a crawl ^^"
00:31:04HP_Archivist quits [Ping timeout: 265 seconds]
00:37:17<ivan>I think wpull might not need a special option for that beyond all the arguments originally used
00:37:28<ivan>but my memory is hazy so make a backup
00:38:52<@JAA>Note that if cookies are involved, things might get messy on the resumption.
00:42:00shufflertoxin (shufflertoxin) joins
00:45:48HP_Archivist (HP_Archivist) joins
00:46:04HP_Archivist quits [Remote host closed the connection]
00:47:16HP_Archivist (HP_Archivist) joins
00:47:21driib798942 (driib) joins
00:47:34HP_Archivist quits [Remote host closed the connection]
00:48:46HP_Archivist (HP_Archivist) joins
00:49:04HP_Archivist quits [Remote host closed the connection]
00:50:16HP_Archivist (HP_Archivist) joins
00:50:34HP_Archivist quits [Remote host closed the connection]
00:50:46driib79894 quits [Ping timeout: 240 seconds]
00:50:46driib798942 is now known as driib79894
00:51:47HP_Archivist (HP_Archivist) joins
00:52:04HP_Archivist quits [Remote host closed the connection]
00:53:16HP_Archivist (HP_Archivist) joins
00:53:34HP_Archivist quits [Remote host closed the connection]
00:54:47HP_Archivist (HP_Archivist) joins
00:55:04HP_Archivist quits [Remote host closed the connection]
00:56:16HP_Archivist (HP_Archivist) joins
01:00:47HP_Archivist quits [Ping timeout: 252 seconds]
01:21:25HP_Archivist (HP_Archivist) joins
01:31:05MisterCheezeCake joins
01:52:05sec^nd quits [Remote host closed the connection]
01:52:07scowlee quits [Ping timeout: 252 seconds]
01:53:17sec^nd (second) joins
01:53:20scowlee (scowlee) joins
02:03:27<@OrIdow6>Might think about writing MyAnimeFigures as a warrior project
02:03:46rm3 joins
02:04:27<@OrIdow6>First mention of it and I already got the name wrong, fantastic
02:06:32<@OrIdow6>I see the domain expires in June
02:07:49<thuban>are they closing? or have they just... not renewed it yet?
02:08:40<@OrIdow6>According to the post Arcorann linked in #at, site experiencing errors and the owner has disappeared
02:08:52<@OrIdow6>Issues, I should say
02:09:19<thuban>i don't idle in #at and i always forget that stuff happens there v_v
02:09:44<thuban>i should configure weechat to merge it into this buffer...
02:38:16<TheTechRobo>ivan: In that case, what prevents grab-site from just not creating the new control files etc if the destination path already has those files?
02:39:09HP_Archivist quits [Ping timeout: 265 seconds]
02:47:08<ivan>TheTechRobo: ludios_wpull doesn't create control files, grab-site main.py does
02:47:30<TheTechRobo>I realise that, hence why it would fall onto grab-site rather than wpull
02:48:46<ivan>oh, yeah, grab-site could avoid creating control files if it were fixed and stuff, but it wouldn't solve the resumption problem because of the other issues mentioned in the github issue
02:49:42<TheTechRobo>Right, and I understand, but it's not often that cookies actually matter enough, at least for the stuff I'm archiving
02:49:53<TheTechRobo>unless you were referring to other issues
02:49:54<ivan>there's the other issue of the resume taking forever
02:50:03<TheTechRobo>Right, forgot about that one
02:50:55<ivan>it would be better to write a new archiving system that used a lot of headless chrome to run singlefile in many tabs and deduplicate the images and use custom zstd dictionaries to compress the rest
02:51:16<TheTechRobo>Isn't that just brozzler but with more features?
02:51:25<TheTechRobo>(and multiple tabs?)
02:51:48<@JAA>(And no WARCs)
02:52:02<@JAA>:-(
02:52:04<ivan>we could get some HARs lol
02:52:09<TheTechRobo>HARs?
02:52:10@JAA slaps ivan around a bit with a large trout
02:52:29<@JAA>In any case, putting warcprox in front of a browser is easy.
02:53:38<TheTechRobo>Still, I'd rather take a while to resume the crawl than to have to start over
02:53:52<TheTechRobo>Yes, I could use the wpull commands, but are most people going to want to do that?
02:54:48<ivan>https://en.wikipedia.org/wiki/HAR_(file_format) is HAR
02:55:11<TheTechRobo>Why not just use WARC?
02:55:30<ivan>something in Chromium has code to dump HARs, but not WARCs :)
02:55:32<TheTechRobo>...Or, I guess browsers don't have the metadata still, for the most part
02:55:35<TheTechRobo>yeah
02:56:28<ivan>I wouldn't block a totally correctly written PR that enables resumption in grab-site
02:58:30<TheTechRobo>would it just be not trying to overwrite the control files and stuff?
02:58:51<ivan>something like that. perhaps an --argument to use an existing directory
02:59:10<TheTechRobo>Well, that's already done with --path, no? :-)
02:59:11<ivan>wpull_hooks.py also has a few things that need to be fixed as they open in "w" instead of "a"
02:59:24<ivan>TheTechRobo: sure, I just can't remember how grab-site works
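A minimal sketch of the "w" versus "a" point ivan raises above, assuming a plain-text file that should survive a resumed crawl. The function name `record_event` and its `resume` flag are hypothetical illustrations, not taken from wpull_hooks.py:

```python
def record_event(path, line, resume=False):
    """Write one line to a log-like file.

    Hypothetical helper: with resume=True the file is opened in append
    mode ("a"), so lines from the earlier run are kept. With resume=False
    it is opened in write mode ("w"), which truncates any existing file.
    """
    mode = "a" if resume else "w"
    with open(path, mode, encoding="utf-8") as f:
        f.write(line + "\n")
```

Opening in "w" on a resumed run would silently discard whatever the first run had written, which is the behaviour that would need changing.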
03:01:38<@JAA>wpull also overwrites the log file IIRC. So if the previous run crashed and didn't write the log file to the WARC, you'd lose that.
03:04:23<@JAA>I.e. you'd want to change the log filename to something that doesn't exist yet to avoid that.
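One way to follow JAA's suggestion of picking a log filename that does not exist yet, sketched under assumptions: the helper `next_free_log_path`, the default stem "wpull", and the numbering scheme are all hypothetical and are not part of grab-site or wpull.

```python
import os


def next_free_log_path(directory, stem="wpull", ext=".log"):
    """Return a path in `directory` that does not exist yet.

    Tries stem+ext first, then stem.1+ext, stem.2+ext, and so on, so a
    resumed run never overwrites the log left by a previous run.
    (Hypothetical helper; the naming scheme is an assumption.)
    """
    candidate = os.path.join(directory, stem + ext)
    n = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}.{n}{ext}")
        n += 1
    return candidate
```

The returned path could then be passed to whatever option sets the log file for the resumed run. Note that the existence check and the later file creation are not atomic, so this sketch assumes only one process writes to the directory.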
03:07:04AnotherIki quits [Read error: Connection reset by peer]
03:39:08MisterCheezeCake quits [Remote host closed the connection]
03:42:09sonick quits [Client Quit]
04:26:07Stiletto quits [Ping timeout: 252 seconds]
04:33:49march_happy quits [Ping timeout: 252 seconds]
04:35:04march_happy (march_happy) joins
04:46:06march_happy quits [Ping timeout: 240 seconds]
04:47:40march_happy (march_happy) joins
04:50:13shufflertoxin quits [Remote host closed the connection]
05:01:06march_happy quits [Ping timeout: 240 seconds]
05:43:36march_happy (march_happy) joins
06:31:26lennier1 quits [Ping timeout: 240 seconds]
06:32:43lennier1 (lennier1) joins
07:01:43sonick (sonick) joins
07:14:20shufflertoxin (shufflertoxin) joins
07:37:06MuHaMeDY joins
07:38:43MuHaMeDY quits [Remote host closed the connection]
08:09:39tbc1887 (tbc1887) joins
08:46:06tbc1887 quits [Read error: Connection reset by peer]
09:14:59<Sanqui>Does anybody know why ArchiveBot would fail to grab the PDFs at e.g. https://score.cz/1.803.aktuality.clanek-score-25 ?
09:15:41<Sanqui>well, I guess I do understand that they are served with JS...
09:22:10<Sanqui>Fetching all of them with https://sanqui.net/etc/archiveteam/score-pdf-urls.txt now :)
10:37:41<@OrIdow6>So I tried to post a Reddit comment on that post about the Anime thing, either I've been shadowbanned or it needs approval
10:38:02<@OrIdow6>Will check in a day or so, I have some questions about it
10:57:19qwertyasdfuiopghjkl quits [Remote host closed the connection]
11:17:31march_happy quits [Ping timeout: 252 seconds]
11:18:15march_happy (march_happy) joins
11:50:26march_happy quits [Ping timeout: 240 seconds]
11:57:31march_happy (march_happy) joins
12:09:43spirit quits [Quit: Leaving]
13:27:08shufflertoxin quits [Remote host closed the connection]
13:43:44Arcorann quits [Ping timeout: 265 seconds]
13:51:01HP_Archivist (HP_Archivist) joins
13:51:17HP_Archivist quits [Remote host closed the connection]
13:52:29HP_Archivist (HP_Archivist) joins
13:52:47HP_Archivist quits [Remote host closed the connection]
13:53:59HP_Archivist (HP_Archivist) joins
13:54:17HP_Archivist quits [Remote host closed the connection]
13:55:29HP_Archivist (HP_Archivist) joins
13:55:47HP_Archivist quits [Remote host closed the connection]
13:56:15HP_Archivist (HP_Archivist) joins
14:29:19vukky quits [Remote host closed the connection]
14:29:31mgrytbak0 joins
14:29:45vukky (Vukky) joins
14:30:22mgrytbak quits [Read error: Connection reset by peer]
14:30:22mgrytbak0 is now known as mgrytbak
14:33:30Megame (Megame) joins
14:33:59Megame quits [Remote host closed the connection]
14:35:07Megame (Megame) joins
14:35:29Megame quits [Remote host closed the connection]
14:36:55Megame (Megame) joins
15:20:26HP_Archivist quits [Ping timeout: 240 seconds]
15:33:49HP_Archivist (HP_Archivist) joins
15:34:05HP_Archivist quits [Remote host closed the connection]
16:46:08HP_Archivist (HP_Archivist) joins
17:09:03<protondonor>on the Hobby Scuffles thread? you may need to make a post on this week's thread because that thread is now last week's thread
17:09:14<protondonor>OrIdow6 ^
17:30:48LeGoupil joins
18:02:24LeGoupil quits [Client Quit]
18:22:26HP_Archivist quits [Ping timeout: 240 seconds]
18:34:50MisterCheezeCake joins
18:34:52MisterCheezeCake quits [Remote host closed the connection]
19:18:05Iki joins
19:55:55wyatt8750 joins
19:57:05wyatt8740 quits [Ping timeout: 252 seconds]
20:16:26superkuh_ quits [Ping timeout: 240 seconds]
20:18:01superkuh joins
20:29:07Stiletto joins
21:33:52hackbug (hackbug) joins
21:34:24xit78 joins
21:43:21xit78 quits [Read error: Connection reset by peer]
21:49:01qwertyasdfuiopghjkl joins
22:06:20Arcorann (Arcorann) joins
22:33:59<@OrIdow6>Thank you protondonor, I'll just send a PM
22:37:39<@OrIdow6>Looks like I can't do that from a throwaway
22:37:57<@OrIdow6>Can someone please message ThrowawayMFC2022 and tell them to either come to IRC or to reddit.com/r/archiveteam?
22:51:24IDK quits [Quit: Connection closed for inactivity]
22:53:41Megame quits [Client Quit]
22:57:00BlueMaxima joins
23:03:38HackMii quits [Remote host closed the connection]
23:04:53HackMii (hacktheplanet) joins
23:24:06march_happy quits [Ping timeout: 240 seconds]
23:24:54march_happy (march_happy) joins
23:35:24BlueMaxima quits [Read error: Connection reset by peer]
23:35:59BlueMaxima joins
23:36:38ave quits [Client Quit]
23:37:39nepeat quits [Quit: ZNC 1.8.2 - https://znc.in]
23:38:06nepeat (nepeat) joins
23:40:01ave (ave) joins
23:40:01Eighty quits [Ping timeout: 252 seconds]
23:40:59Eighty joins
23:40:59Eighty quits [Changing host]
23:40:59Eighty (Eighty) joins
23:56:38fuzzy8021 quits [Read error: Connection reset by peer]
23:57:02ave quits [Client Quit]
23:57:10fuzzy8021 (fuzzy8021) joins
23:59:39ave (ave) joins