| 00:00:06 | | HP_Archivist quits [Ping timeout: 240 seconds] |
| 00:01:52 | | HP_Archivist (HP_Archivist) joins |
| 00:02:49 | <jodizzle> | TheTechRobo: I think I did something like that once or twice a while back, but I unfortunately don't remember the details. The results were also ambiguous. |
| 00:03:10 | <TheTechRobo> | "ambiguous"...? |
| 00:03:18 | <TheTechRobo> | What do you mean? |
| 00:04:48 | <jodizzle> | Like I wasn't sure whether everything properly resumed, I was just doing some messy grabs. Related to the discussion on this ticket (which I think you've seen): https://github.com/ArchiveTeam/grab-site/issues/58 |
| 00:05:20 | <jodizzle> | I think my approach probably involved using grab-site with `--which-wpull-command` and trying to fiddle with the wpull command that popped out as best I could |
| 00:09:57 | | Arcorann (Arcorann) joins |
| 00:11:37 | <TheTechRobo> | What's the argument for resuming a crawl with wpull? |
| 00:20:05 | <ivan> | grab-site is actually a wpull launcher |
| 00:22:25 | <ivan> | well, it's a ludios_wpull launcher heh |
| 00:22:38 | | shufflertoxin quits [Remote host closed the connection] |
| 00:26:02 | <TheTechRobo> | ivan: I meant, what's the _option_ to resume a crawl ^^" |
| 00:31:04 | | HP_Archivist quits [Ping timeout: 265 seconds] |
| 00:37:17 | <ivan> | I think wpull might not need a special option for that beyond all the arguments originally used |
| 00:37:28 | <ivan> | but my memory is hazy so make a backup |
| 00:38:52 | <@JAA> | Note that if cookies are involved, things might get messy on the resumption. |
| 00:42:00 | | shufflertoxin (shufflertoxin) joins |
| 00:45:48 | | HP_Archivist (HP_Archivist) joins |
| 00:46:04 | | HP_Archivist quits [Remote host closed the connection] |
| 00:47:16 | | HP_Archivist (HP_Archivist) joins |
| 00:47:21 | | driib798942 (driib) joins |
| 00:47:34 | | HP_Archivist quits [Remote host closed the connection] |
| 00:48:46 | | HP_Archivist (HP_Archivist) joins |
| 00:49:04 | | HP_Archivist quits [Remote host closed the connection] |
| 00:50:16 | | HP_Archivist (HP_Archivist) joins |
| 00:50:34 | | HP_Archivist quits [Remote host closed the connection] |
| 00:50:46 | | driib79894 quits [Ping timeout: 240 seconds] |
| 00:50:46 | | driib798942 is now known as driib79894 |
| 00:51:47 | | HP_Archivist (HP_Archivist) joins |
| 00:52:04 | | HP_Archivist quits [Remote host closed the connection] |
| 00:53:16 | | HP_Archivist (HP_Archivist) joins |
| 00:53:34 | | HP_Archivist quits [Remote host closed the connection] |
| 00:54:47 | | HP_Archivist (HP_Archivist) joins |
| 00:55:04 | | HP_Archivist quits [Remote host closed the connection] |
| 00:56:16 | | HP_Archivist (HP_Archivist) joins |
| 01:00:47 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 01:21:25 | | HP_Archivist (HP_Archivist) joins |
| 01:31:05 | | MisterCheezeCake joins |
| 01:52:05 | | sec^nd quits [Remote host closed the connection] |
| 01:52:07 | | scowlee quits [Ping timeout: 252 seconds] |
| 01:53:17 | | sec^nd (second) joins |
| 01:53:20 | | scowlee (scowlee) joins |
| 02:03:27 | <@OrIdow6> | Might think about writing MyAnimeFigures as a warrior project |
| 02:03:46 | | rm3 joins |
| 02:04:27 | <@OrIdow6> | First mention of it and I already got the name wrong, fantastic |
| 02:06:32 | <@OrIdow6> | I see the domain expires in June |
| 02:07:49 | <thuban> | are they closing? or have they just... not renewed it yet? |
| 02:08:40 | <@OrIdow6> | According to the post Arcorann linked in #at, site experiencing errors and the owner has disappeared |
| 02:08:52 | <@OrIdow6> | Issues, I should say |
| 02:09:19 | <thuban> | i don't idle in #at and i always forget that stuff happens there v_v |
| 02:09:44 | <thuban> | i should configure weechat to merge it into this buffer... |
| 02:38:16 | <TheTechRobo> | ivan: In that case, what prevents grab-site from just not creating the new control files etc if the destination path already has those files? |
| 02:39:09 | | HP_Archivist quits [Ping timeout: 265 seconds] |
| 02:47:08 | <ivan> | TheTechRobo: ludios_wpull doesn't create control files, grab-site main.py does |
| 02:47:30 | <TheTechRobo> | I realise that, hence why it would fall onto grab-site rather than wpull |
| 02:48:46 | <ivan> | oh, yeah, grab-site could avoid creating control files if it were fixed and stuff, but it wouldn't solve the resumption problem because of the other issues mentioned in the github issue |
| 02:49:42 | <TheTechRobo> | Right, and I understand, but it's not often that cookies actually matter enough, at least for the stuff I'm archiving |
| 02:49:53 | <TheTechRobo> | unless you were referring to other issues |
| 02:49:54 | <ivan> | there's the other issue of the resume taking forever |
| 02:50:03 | <TheTechRobo> | Right, forgot about that one |
| 02:50:55 | <ivan> | it would be better to write a new archiving system that used a lot of headless chrome to run singlefile in many tabs and deduplicate the images and use custom zstd dictionaries to compress the rest |
| 02:51:16 | <TheTechRobo> | Isn't that just brozzler but with more features? |
| 02:51:25 | <TheTechRobo> | (and multiple tabs?) |
| 02:51:48 | <@JAA> | (And no WARCs) |
| 02:52:02 | <@JAA> | :-( |
| 02:52:04 | <ivan> | we could get some HARs lol |
| 02:52:09 | <TheTechRobo> | HARs? |
| 02:52:10 | | @JAA slaps ivan around a bit with a large trout |
| 02:52:29 | <@JAA> | In any case, putting warcprox in front of a browser is easy. |
| 02:53:38 | <TheTechRobo> | Still, I'd rather take a while to resume the crawl than to have to start over |
| 02:53:52 | <TheTechRobo> | Yes, I could use the wpull commands, but are most people going to want to do that? |
| 02:54:48 | <ivan> | https://en.wikipedia.org/wiki/HAR_(file_format) is HAR |
| 02:55:11 | <TheTechRobo> | Why not just use WARC? |
| 02:55:30 | <ivan> | something in Chromium has code to dump HARs, but not WARCs :) |
| 02:55:32 | <TheTechRobo> | ...Or, I guess browsers don't have the metadata still, for the most part |
| 02:55:35 | <TheTechRobo> | yeah |
| 02:56:28 | <ivan> | I wouldn't block a totally correctly written PR that enables resumption in grab-site |
| 02:58:30 | <TheTechRobo> | would it just be not trying to overwrite the control files and stuff? |
| 02:58:51 | <ivan> | something like that. perhaps an --argument to use an existing directory |
| 02:59:10 | <TheTechRobo> | Well, that's already done with --path, no? :-) |
| 02:59:11 | <ivan> | wpull_hooks.py also has a few things that need to be fixed as they open in "w" instead of "a" |
| 02:59:24 | <ivan> | TheTechRobo: sure, I just can't remember how grab-site works |
| 03:01:38 | <@JAA> | wpull also overwrites the log file IIRC. So if the previous run crashed and didn't write the log file to the WARC, you'd lose that. |
| 03:04:23 | <@JAA> | I.e. you'd want to change the log filename to something that doesn't exist yet to avoid that. |
| 03:07:04 | | AnotherIki quits [Read error: Connection reset by peer] |
| 03:39:08 | | MisterCheezeCake quits [Remote host closed the connection] |
| 03:42:09 | | sonick quits [Client Quit] |
| 04:26:07 | | Stiletto quits [Ping timeout: 252 seconds] |
| 04:33:49 | | march_happy quits [Ping timeout: 252 seconds] |
| 04:35:04 | | march_happy (march_happy) joins |
| 04:46:06 | | march_happy quits [Ping timeout: 240 seconds] |
| 04:47:40 | | march_happy (march_happy) joins |
| 04:50:13 | | shufflertoxin quits [Remote host closed the connection] |
| 05:01:06 | | march_happy quits [Ping timeout: 240 seconds] |
| 05:43:36 | | march_happy (march_happy) joins |
| 06:31:26 | | lennier1 quits [Ping timeout: 240 seconds] |
| 06:32:43 | | lennier1 (lennier1) joins |
| 07:01:43 | | sonick (sonick) joins |
| 07:14:20 | | shufflertoxin (shufflertoxin) joins |
| 07:37:06 | | MuHaMeDY joins |
| 07:38:43 | | MuHaMeDY quits [Remote host closed the connection] |
| 08:09:39 | | tbc1887 (tbc1887) joins |
| 08:46:06 | | tbc1887 quits [Read error: Connection reset by peer] |
| 09:14:59 | <Sanqui> | anybody knows why ArchiveBot would fail to grab the PDFs at e.g. https://score.cz/1.803.aktuality.clanek-score-25 ? |
| 09:15:41 | <Sanqui> | well, I guess I do understand that they are served with JS... |
| 09:22:10 | <Sanqui> | Fetching all of them with https://sanqui.net/etc/archiveteam/score-pdf-urls.txt now :) |
| 10:37:41 | <@OrIdow6> | So I tried to post a Reddit comment on that post about the Anime thing, either I've been shadowbanned or it needs approval |
| 10:38:02 | <@OrIdow6> | Will check in a day or so, I have some questions about it |
| 10:57:19 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 11:17:31 | | march_happy quits [Ping timeout: 252 seconds] |
| 11:18:15 | | march_happy (march_happy) joins |
| 11:50:26 | | march_happy quits [Ping timeout: 240 seconds] |
| 11:57:31 | | march_happy (march_happy) joins |
| 12:09:43 | | spirit quits [Quit: Leaving] |
| 13:27:08 | | shufflertoxin quits [Remote host closed the connection] |
| 13:43:44 | | Arcorann quits [Ping timeout: 265 seconds] |
| 13:51:01 | | HP_Archivist (HP_Archivist) joins |
| 13:51:17 | | HP_Archivist quits [Remote host closed the connection] |
| 13:52:29 | | HP_Archivist (HP_Archivist) joins |
| 13:52:47 | | HP_Archivist quits [Remote host closed the connection] |
| 13:53:59 | | HP_Archivist (HP_Archivist) joins |
| 13:54:17 | | HP_Archivist quits [Remote host closed the connection] |
| 13:55:29 | | HP_Archivist (HP_Archivist) joins |
| 13:55:47 | | HP_Archivist quits [Remote host closed the connection] |
| 13:56:15 | | HP_Archivist (HP_Archivist) joins |
| 14:29:19 | | vukky quits [Remote host closed the connection] |
| 14:29:31 | | mgrytbak0 joins |
| 14:29:45 | | vukky (Vukky) joins |
| 14:30:22 | | mgrytbak quits [Read error: Connection reset by peer] |
| 14:30:22 | | mgrytbak0 is now known as mgrytbak |
| 14:33:30 | | Megame (Megame) joins |
| 14:33:59 | | Megame quits [Remote host closed the connection] |
| 14:35:07 | | Megame (Megame) joins |
| 14:35:29 | | Megame quits [Remote host closed the connection] |
| 14:36:55 | | Megame (Megame) joins |
| 15:20:26 | | HP_Archivist quits [Ping timeout: 240 seconds] |
| 15:33:49 | | HP_Archivist (HP_Archivist) joins |
| 15:34:05 | | HP_Archivist quits [Remote host closed the connection] |
| 16:46:08 | | HP_Archivist (HP_Archivist) joins |
| 17:01:04 | | pcr is now authenticated as pcr |
| 17:09:03 | <protondonor> | on the Hobby Scuffles thread? you may need to make a post on this week's thread because that thread is now last week's thread |
| 17:09:14 | <protondonor> | OrIdow6 ^ |
| 17:30:48 | | LeGoupil joins |
| 18:02:24 | | LeGoupil quits [Client Quit] |
| 18:22:26 | | HP_Archivist quits [Ping timeout: 240 seconds] |
| 18:34:50 | | MisterCheezeCake joins |
| 18:34:52 | | MisterCheezeCake quits [Remote host closed the connection] |
| 19:18:05 | | Iki joins |
| 19:55:55 | | wyatt8750 joins |
| 19:57:05 | | wyatt8740 quits [Ping timeout: 252 seconds] |
| 20:16:26 | | superkuh_ quits [Ping timeout: 240 seconds] |
| 20:18:01 | | superkuh joins |
| 20:29:07 | | Stiletto joins |
| 21:33:52 | | hackbug (hackbug) joins |
| 21:34:24 | | xit78 joins |
| 21:43:21 | | xit78 quits [Read error: Connection reset by peer] |
| 21:49:01 | | qwertyasdfuiopghjkl joins |
| 22:06:20 | | Arcorann (Arcorann) joins |
| 22:33:59 | <@OrIdow6> | Thank you protondonor, I'll just send a PM |
| 22:37:39 | <@OrIdow6> | Looks like I can't do that from a throwaway |
| 22:37:57 | <@OrIdow6> | Can someone please message ThrowawayMFC2022 and tell them to either come to IRC or to reddit.com/r/archiveteam? |
| 22:51:24 | | IDK quits [Quit: Connection closed for inactivity] |
| 22:53:41 | | Megame quits [Client Quit] |
| 22:57:00 | | BlueMaxima joins |
| 23:03:38 | | HackMii quits [Remote host closed the connection] |
| 23:04:53 | | HackMii (hacktheplanet) joins |
| 23:24:06 | | march_happy quits [Ping timeout: 240 seconds] |
| 23:24:54 | | march_happy (march_happy) joins |
| 23:35:24 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 23:35:59 | | BlueMaxima joins |
| 23:36:38 | | ave quits [Client Quit] |
| 23:37:39 | | nepeat quits [Quit: ZNC 1.8.2 - https://znc.in] |
| 23:38:06 | | nepeat (nepeat) joins |
| 23:40:01 | | ave (ave) joins |
| 23:40:01 | | Eighty quits [Ping timeout: 252 seconds] |
| 23:40:59 | | Eighty joins |
| 23:40:59 | | Eighty is now authenticated as Eighty |
| 23:40:59 | | Eighty quits [Changing host] |
| 23:40:59 | | Eighty (Eighty) joins |
| 23:56:38 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 23:57:02 | | ave quits [Client Quit] |
| 23:57:10 | | fuzzy8021 (fuzzy8021) joins |
| 23:59:39 | | ave (ave) joins |
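
The 02:59 to 03:04 exchange raises two file-handling points for resuming a crawl: files opened in "w" mode are truncated on restart, and a log file from a crashed run can be overwritten unless the new run logs to a name that does not exist yet. The sketch below illustrates both ideas using only the Python standard library. The helper names `open_for_resume` and `unused_log_name` are hypothetical and are not part of grab-site or wpull; this is not a patch for either project, only an illustration of the general technique.

```python
import os


def open_for_resume(path):
    """Open a file in append mode ("a") so existing content is kept.

    Opening with "w" would truncate the file, which is the behaviour
    described as needing a fix before a crawl directory can be reused.
    """
    return open(path, "a", encoding="utf-8")


def unused_log_name(base):
    """Return a log filename that does not exist yet.

    Hypothetical naming scheme: try `base`, then `base.1`, `base.2`, ...
    so that a log left behind by a crashed run is never overwritten.
    """
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists(f"{base}.{n}"):
        n += 1
    return f"{base}.{n}"
```

On a resumed run, one might call `unused_log_name` on the previous log path and pass the result as the new log filename, while opening any per-crawl control files with `open_for_resume` instead of mode "w".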