| 00:04:57 | | yawkat (yawkat) joins |
| 00:11:16 | | yawkat quits [Ping timeout: 265 seconds] |
| 00:13:47 | | yawkat (yawkat) joins |
| 00:18:18 | | Arcorann (Arcorann) joins |
| 00:33:37 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 01:24:54 | | lennier1 quits [Client Quit] |
| 01:25:34 | | lennier1 (lennier1) joins |
| 01:27:06 | <lennier1> | Instagram used to be archivable through socialbot. But it's been quite a while since it's worked. |
| 01:27:44 | <lennier1> | There are programs to mass download Instagram photos, but they need an account. |
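(As a hedged illustration of the account-based downloaders lennier1 mentions: the log names no specific program, but a sketch with the third-party instaloader tool might look like this; the tool choice, username, and profile name are all assumptions.)

```sh
# Sketch only: instaloader is one example of an account-based Instagram
# downloader (assumed tool, not named in the log).
# YOUR_USERNAME and some_profile are placeholders.
instaloader --login=YOUR_USERNAME --fast-update some_profile
```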
| 01:37:16 | | CoolCanuck quits [Remote host closed the connection] |
| 02:27:34 | | fangfufu quits [Read error: Connection reset by peer] |
| 02:54:26 | | fangfufu joins |
| 02:54:36 | | fangfufu is now authenticated as fangfufu |
| 03:12:49 | | lennier1 quits [Client Quit] |
| 03:13:14 | | lennier1 (lennier1) joins |
| 03:47:47 | | Megame (Megame) joins |
| 04:54:54 | <@JAA> | PhantomJS in wpull/AB was horribly broken many years ago already, so yeah, I removed it from AB. Instagram used to be scrapable with socialbot. Later, we had some AB pipelines specifically for individual Instagram pages (mostly profiles), but they ended up getting banned as well. |
| 05:40:23 | | Nulo quits [Read error: Connection reset by peer] |
| 05:49:13 | | Nulo joins |
| 06:48:55 | <Ryz> | Is there a way to archive https://mega.nz/file/gWhVGJAa#oAy9pRQtaN551kaJSm40AdkuBqhtrCoFKQ2YSCv4tYs ? Came from https://warcraftadventures.wordpress.com/2022/07/31/final-version-1-0-released/ - in case of a DMCA takedown by Blizzard Entertainment/Activision |
| 07:14:19 | | Megame quits [Client Quit] |
| 07:51:17 | | eroc1990 quits [Quit: The Lounge - https://thelounge.chat] |
| 08:11:02 | <mind_combatant> | so, for what it's worth, ironically, the wiki page for URLTeam ( https://wiki.archiveteam.org/index.php?title=URLTeam ) has at least two of its references (numbers 3 and 4) linking to dead pages that i had to use the wayback machine to actually see. there are probably others like that throughout the wiki; worth re-linking to a working wayback snapshot or some other archived copy, either instead of or in addition to the originals. |
| 08:20:31 | | eroc1990 (eroc1990) joins |
| 08:20:35 | <mind_combatant> | reference number 2 doesn't even seem to have a working copy on the wayback machine, so that's cool. |
| 08:26:59 | <mind_combatant> | oh, wait, never mind, 2 does still exist. it's just that bit.ly's blog redirected me to an address that doesn't exist and never did, and that was the url i had put into the wayback machine. |
| 08:28:39 | | mutantmonkey quits [Remote host closed the connection] |
| 08:28:57 | | mutantmonkey (mutantmonkey) joins |
| 08:45:43 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 08:46:23 | | Minkafighter joins |
| 10:00:00 | | Dragnog joins |
| 10:07:13 | | yawkat quits [Ping timeout: 265 seconds] |
| 11:34:24 | | yawkat (yawkat) joins |
| 11:41:46 | | Barto quits [Quit: WeeChat 3.6] |
| 11:43:33 | | Barto (Barto) joins |
| 11:56:51 | | mutantmonkey quits [Remote host closed the connection] |
| 11:57:15 | | mutantmonkey (mutantmonkey) joins |
| 12:10:57 | | nimaje quits [Ping timeout: 265 seconds] |
| 13:19:50 | | mutantmonkey quits [Remote host closed the connection] |
| 13:20:12 | | mutantmonkey (mutantmonkey) joins |
| 13:39:16 | | sec^nd quits [Ping timeout: 240 seconds] |
| 13:46:13 | | sec^nd (second) joins |
| 13:47:11 | | Nulo quits [Read error: Connection reset by peer] |
| 13:48:27 | | Nulo joins |
| 14:40:48 | | nimaje joins |
| 14:43:31 | | nimaje quits [Client Quit] |
| 14:45:11 | | nimaje joins |
| 14:50:35 | | nimaje quits [Client Quit] |
| 14:52:25 | | nimaje joins |
| 15:22:57 | | Arcorann quits [Ping timeout: 265 seconds] |
| 15:39:43 | | Dragnog quits [Client Quit] |
| 15:59:05 | | nimaje quits [Ping timeout: 265 seconds] |
| 16:00:16 | | nimaje joins |
| 16:00:28 | | nimaje quits [Client Quit] |
| 16:02:17 | | nimaje joins |
| 16:09:09 | | nimaje quits [Client Quit] |
| 16:10:45 | | nimaje joins |
| 16:39:13 | | mutantmonkey quits [Remote host closed the connection] |
| 16:39:38 | | mutantmonkey (mutantmonkey) joins |
| 16:42:13 | | wyatt8740 joins |
| 16:52:30 | | benjinsmith joins |
| 16:54:28 | | benjins quits [Ping timeout: 240 seconds] |
| 17:09:42 | <systwi_> | Ryz: MEGA file preservation can, currently, only be done manually. |
| 17:10:13 | <@JAA> | Ryz: There's no specific tooling for MEGA so far. Yet another thing I've been meaning to look into for a while. So apart from some browser-based thing (Brozzler or another MITM-proxied headless browser with the required scripting), the best thing we can do is download it and throw it into an IA item. |
| 17:10:36 | <@JAA> | Ninja'd... |
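(A minimal sketch of the manual workflow systwi_ and JAA describe: download the MEGA link, then upload it into an IA item. The megatools `megadl` client and the internetarchive `ia` CLI are assumed tool choices, not named in the log; the item identifier, filename, and metadata are placeholders.)

```sh
# Download the public MEGA link (megatools is an assumed tool choice)
megadl 'https://mega.nz/file/gWhVGJAa#oAy9pRQtaN551kaJSm40AdkuBqhtrCoFKQ2YSCv4tYs'

# Upload the result into an Internet Archive item via the internetarchive
# CLI; identifier, filename, and metadata below are placeholders
ia upload warcraft-adventures-fan-release WarcraftAdventures.zip \
    --metadata="mediatype:software" \
    --metadata="title:Warcraft Adventures fan release 1.0"
```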
| 17:17:46 | | mutantmonkey quits [Ping timeout: 240 seconds] |
| 17:18:03 | | benjinss joins |
| 17:18:25 | | benjinss is now known as benjins |
| 17:18:29 | | benjins is now authenticated as benjins |
| 17:21:15 | | benjinsmith quits [Ping timeout: 265 seconds] |
| 17:30:52 | | nimaje quits [Ping timeout: 240 seconds] |
| 17:32:18 | | mutantmonkey (mutantmonkey) joins |
| 17:35:52 | | nimaje joins |
| 17:47:58 | | benjinsmith joins |
| 17:50:15 | | benjins quits [Ping timeout: 265 seconds] |
| 17:53:41 | | benjinsmith is now known as benjins |
| 17:53:41 | | benjins is now authenticated as benjins |
| 17:58:36 | | pabs quits [Read error: Connection reset by peer] |
| 17:58:37 | | qwertyasdfuiopghjkl joins |
| 17:59:37 | | pabs (pabs) joins |
| 18:02:54 | | sec^nd quits [Remote host closed the connection] |
| 18:03:50 | | sec^nd (second) joins |
| 19:23:15 | | Megame (Megame) joins |
| 19:41:40 | | lennier1 quits [Ping timeout: 240 seconds] |
| 19:43:46 | | lennier1 (lennier1) joins |
| 19:53:19 | | sec^nd quits [Remote host closed the connection] |
| 19:53:19 | | mutantmonkey quits [Remote host closed the connection] |
| 19:53:50 | | sec^nd (second) joins |
| 19:54:18 | | mutantmonkey (mutantmonkey) joins |
| 19:55:44 | | mutantmonkey quits [Remote host closed the connection] |
| 19:56:04 | | mutantmonkey (mutantmonkey) joins |
| 20:33:10 | | Megame quits [Client Quit] |
| 20:39:17 | <h2ibot> | JustAnotherArchivist moved Gitlab to GitLab (Capitalisation fix): https://wiki.archiveteam.org/?title=GitLab |
| 20:39:18 | <h2ibot> | JustAnotherArchivist edited GitLab (-17, Capitalisation fix): https://wiki.archiveteam.org/?diff=48787&oldid=48786 |
| 20:53:55 | | tzt quits [Ping timeout: 265 seconds] |
| 20:54:04 | | nimaje quits [Ping timeout: 240 seconds] |
| 20:57:21 | | nimaje joins |
| 21:28:32 | | IDK (IDK) joins |
| 22:14:14 | | sec^nd quits [Remote host closed the connection] |
| 22:14:44 | | sec^nd (second) joins |
| 22:38:25 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 22:39:44 | <Jake> | I might just be an idiot right now, but is there not a way to use curl to glob numerical ranges together sequentially? Currently have `curl "https://example.com/[7600-8000]/[7600-8000].jpg"`, looking to get the same number in each one, but it doesn't seem to work like that. Output: `https://example.com/7600/7608.jpg` |
| 22:40:46 | <@JAA> | You mean you want /7600/7600.jpg, /7601/7601.jpg, etc. for a total of 401 URLs? |
| 22:41:09 | <Jake> | Yup. |
| 22:41:17 | <Jake> | Sorry, I probably didn't explain it very well. |
| 22:41:50 | <nimaje> | I don't think there is a way for that, but libera/#curl probably knows more |
| 22:43:00 | <@JAA> | Yeah, I don't think so either. I'd probably do it with `seq|awk` or similar. |
| 22:44:10 | <Jake> | yeah. :( Thanks! |
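(A minimal sketch of the `seq|awk` approach JAA suggests, reusing Jake's example.com pattern: awk prints the same number into both path segments, producing all 401 URLs in one pass.)

```sh
# Generate /7600/7600.jpg ... /8000/8000.jpg, one URL per line
seq 7600 8000 | awk '{ printf "https://example.com/%d/%d.jpg\n", $1, $1 }' > urls.txt
```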
| 22:59:37 | <h2ibot> | Themadprogramer edited Discourse (+48, Added Hugo Community): https://wiki.archiveteam.org/?diff=48788&oldid=48774 |
| 22:59:38 | <h2ibot> | ThreeHeadedMonkey edited Deathwatch (+312, Added MapKnitter and SpectralWorkbench): https://wiki.archiveteam.org/?diff=48789&oldid=48783 |
| 23:00:38 | <h2ibot> | KevinArchivesThings edited WikiTeam (+160, Added WARC search for editthis.info wikis): https://wiki.archiveteam.org/?diff=48790&oldid=48703 |
| 23:02:40 | <adamus1red> | Jake: Wouldn't a bash for loop do the trick? |
| 23:02:54 | <adamus1red> | for i in $(seq 7600 8000); do curl -O "https://example.com/$i/$i.jpg"; done |
| 23:03:12 | <@JAA> | That would create a new (process and) connection for each request, which slows things down significantly. |
| 23:03:22 | <Jake> | ^ |
| 23:03:38 | <Jake> | It's what I was doing before, but it is _extremely_ slow :( |
| 23:04:05 | | march_happy (march_happy) joins |
| 23:04:17 | <@JAA> | Not sure if it could perhaps be done with shell pattern expansion, but then you might run into argument list length limits. |
| 23:05:54 | <adamus1red> | Jake: use the loop to generate the list of URLs then xargs to run multiple requests per curl process and run multiple instances? |
| 23:06:21 | <Jake> | I think that's the current best plan! |
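(A sketch of adamus1red's plan: generate the URL list, then let xargs split it across a few curl processes so connections are reused within each batch. The batch size and parallelism are arbitrary; `--remote-name-all` makes curl save every URL under its remote filename.)

```sh
# -n 100: pass 100 URLs per curl invocation (connection reuse within each batch)
# -P 4:   run up to four curl processes in parallel
seq 7600 8000 \
    | awk '{ printf "https://example.com/%d/%d.jpg\n", $1, $1 }' \
    | xargs -n 100 -P 4 curl -s --remote-name-all
```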
| 23:07:21 | <@JAA> | `seq|awk` is probably a few orders of magnitude faster than a shell loop, but for a few hundred numbers, that won't matter (or might even be faster due to the lack of subprocesses). |
| 23:08:05 | <thuban> | Jake: curl can take a list of 'configurations' with -K |
| 23:08:58 | <Jake> | Yup, I don't think we can get the URLs to be generated in curl though? |
| 23:09:31 | <thuban> | this is kind of a pain in the ass, because you need to prepend 'url=' to everything and duplicate (most of) any other configuration you would do, but it does mean you can do seq|awk|curl and let curl's native connection reuse (and parallelization) handle the whole batch |
| 23:10:30 | <thuban> | not _in curl_, afaik, no |
| 23:11:09 | <Jake> | ah. I see |
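(A sketch of thuban's `-K` approach: awk prepends `url =` to each address, plus an `output =` line so each download gets a distinct filename, and `curl -K -` reads the whole batch as a config from stdin with curl's native connection reuse; `-Z`/`--parallel` requires curl 7.66 or newer.)

```sh
# Emit a url/output config pair per number and hand the batch to one curl
seq 7600 8000 \
    | awk '{ printf "url = \"https://example.com/%d/%d.jpg\"\noutput = \"%d.jpg\"\n", $1, $1, $1 }' \
    | curl -sZ -K -
```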
| 23:59:33 | | qwertyasdfuiopghjkl joins |