00:04:57yawkat (yawkat) joins
00:11:16yawkat quits [Ping timeout: 265 seconds]
00:13:47yawkat (yawkat) joins
00:18:18Arcorann (Arcorann) joins
00:33:37qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
01:24:54lennier1 quits [Client Quit]
01:25:34lennier1 (lennier1) joins
01:27:06<lennier1>Instagram used to be archivable through socialbot. But it's been quite a while since it's worked.
01:27:44<lennier1>There are programs to mass download Instagram photos, but they need an account.
01:37:16CoolCanuck quits [Remote host closed the connection]
02:27:34fangfufu quits [Read error: Connection reset by peer]
02:54:26fangfufu joins
03:12:49lennier1 quits [Client Quit]
03:13:14lennier1 (lennier1) joins
03:47:47Megame (Megame) joins
04:54:54<@JAA>PhantomJS in wpull/AB was horribly broken many years ago already, so yeah, I removed it from AB. Instagram used to be scrapable with socialbot. Later, we had some AB pipelines specifically for individual Instagram pages (mostly profiles), but they ended up getting banned as well.
05:40:23Nulo quits [Read error: Connection reset by peer]
05:49:13Nulo joins
06:48:55<Ryz>Is there a way to archive https://mega.nz/file/gWhVGJAa#oAy9pRQtaN551kaJSm40AdkuBqhtrCoFKQ2YSCv4tYs ? Came from https://warcraftadventures.wordpress.com/2022/07/31/final-version-1-0-released/ - in case of a DMCA takedown by Blizzard Entertainment/Activision
07:14:19Megame quits [Client Quit]
07:51:17eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
08:11:02<mind_combatant>so, for what it's worth, ironically, the wiki page for URLTeam ( https://wiki.archiveteam.org/index.php?title=URLTeam ) has at least two of its references (numbers 3 and 4) linking to dead pages that i had to use the wayback machine to actually see. there are probably various others throughout the wiki, probably worth re-linking to a working wayback snapshot or some other archived copy, either instead of or in addition to the originals.
08:20:31eroc1990 (eroc1990) joins
08:20:35<mind_combatant>reference number 2 doesn't even seem to have a working copy on the wayback machine, so that's cool.
08:26:59<mind_combatant>oh, wait, never mind, 2 does still exist, it's just that bit.ly's blog redirected me to an address that doesn't exist and never existed before i got the url and put it into the wayback machine.
08:28:39mutantmonkey quits [Remote host closed the connection]
08:28:57mutantmonkey (mutantmonkey) joins
08:45:43Minkafighter quits [Quit: The Lounge - https://thelounge.chat]
08:46:23Minkafighter joins
10:00:00Dragnog joins
10:07:13yawkat quits [Ping timeout: 265 seconds]
11:34:24yawkat (yawkat) joins
11:41:46Barto quits [Quit: WeeChat 3.6]
11:43:33Barto (Barto) joins
11:56:51mutantmonkey quits [Remote host closed the connection]
11:57:15mutantmonkey (mutantmonkey) joins
12:10:57nimaje quits [Ping timeout: 265 seconds]
13:19:50mutantmonkey quits [Remote host closed the connection]
13:20:12mutantmonkey (mutantmonkey) joins
13:39:16sec^nd quits [Ping timeout: 240 seconds]
13:46:13sec^nd (second) joins
13:47:11Nulo quits [Read error: Connection reset by peer]
13:48:27Nulo joins
14:40:48nimaje joins
14:43:31nimaje quits [Client Quit]
14:45:11nimaje joins
14:50:35nimaje quits [Client Quit]
14:52:25nimaje joins
15:22:57Arcorann quits [Ping timeout: 265 seconds]
15:39:43Dragnog quits [Client Quit]
15:59:05nimaje quits [Ping timeout: 265 seconds]
16:00:16nimaje joins
16:00:28nimaje quits [Client Quit]
16:02:17nimaje joins
16:09:09nimaje quits [Client Quit]
16:10:45nimaje joins
16:39:13mutantmonkey quits [Remote host closed the connection]
16:39:38mutantmonkey (mutantmonkey) joins
16:42:13wyatt8740 joins
16:52:30benjinsmith joins
16:54:28benjins quits [Ping timeout: 240 seconds]
17:09:42<systwi_>Ryz: MEGA file preservation can currently only be done manually.
17:10:13<@JAA>Ryz: There's no specific tooling for MEGA so far. Yet another thing I've been meaning to look into for a while. So apart from some browser-based thing (Brozzler or another MITM-proxied headless browser with the required scripting), the best thing we can do is download it and throw it into an IA item.
17:10:36<@JAA>Ninja'd...
17:17:46mutantmonkey quits [Ping timeout: 240 seconds]
17:18:03benjinss joins
17:18:25benjinss is now known as benjins
17:21:15benjinsmith quits [Ping timeout: 265 seconds]
17:30:52nimaje quits [Ping timeout: 240 seconds]
17:32:18mutantmonkey (mutantmonkey) joins
17:35:52nimaje joins
17:47:58benjinsmith joins
17:50:15benjins quits [Ping timeout: 265 seconds]
17:53:41benjinsmith is now known as benjins
17:58:36pabs quits [Read error: Connection reset by peer]
17:58:37qwertyasdfuiopghjkl joins
17:59:37pabs (pabs) joins
18:02:54sec^nd quits [Remote host closed the connection]
18:03:50sec^nd (second) joins
19:23:15Megame (Megame) joins
19:41:40lennier1 quits [Ping timeout: 240 seconds]
19:43:46lennier1 (lennier1) joins
19:53:19sec^nd quits [Remote host closed the connection]
19:53:19mutantmonkey quits [Remote host closed the connection]
19:53:50sec^nd (second) joins
19:54:18mutantmonkey (mutantmonkey) joins
19:55:44mutantmonkey quits [Remote host closed the connection]
19:56:04mutantmonkey (mutantmonkey) joins
20:33:10Megame quits [Client Quit]
20:39:17<h2ibot>JustAnotherArchivist moved Gitlab to GitLab (Capitalisation fix): https://wiki.archiveteam.org/?title=GitLab
20:39:18<h2ibot>JustAnotherArchivist edited GitLab (-17, Capitalisation fix): https://wiki.archiveteam.org/?diff=48787&oldid=48786
20:53:55tzt quits [Ping timeout: 265 seconds]
20:54:04nimaje quits [Ping timeout: 240 seconds]
20:57:21nimaje joins
21:28:32IDK (IDK) joins
22:14:14sec^nd quits [Remote host closed the connection]
22:14:44sec^nd (second) joins
22:38:25qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:39:44<Jake>I might just be an idiot right now, but is there not a way to use curl to glob numerical ranges together sequentially? Currently I have "curl "https://example.com/[7600-8000]/[7600-8000].jpg"", looking to get the same number in each one, but it doesn't seem to work like that. Output: "https://example.com/7600/7608.jpg"
22:40:46<@JAA>You mean you want /7600/7600.jpg, /7601/7601.jpg, etc. for a total of 401 URLs?
22:41:09<Jake>Yup.
22:41:17<Jake>Sorry, I probably didn't explain it very well.
22:41:50<nimaje>I don't think there is a way for that, but libera/#curl probably knows more
22:43:00<@JAA>Yeah, I don't think so either. I'd probably do it with `seq|awk` or similar.
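A minimal sketch of the `seq|awk` approach JAA mentions, using the example.com host and 7600-8000 range from Jake's message (both placeholders for the real site). It only generates the URL list; the list still has to be fed to a downloader:

```shell
# Generate one URL per number n in 7600..8000, substituting n into
# both path segments so the two numbers always match.
seq 7600 8000 | awk '{ printf "https://example.com/%d/%d.jpg\n", $1, $1 }'
```

This prints 401 lines, from https://example.com/7600/7600.jpg through https://example.com/8000/8000.jpg.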
22:44:10<Jake>yeah. :( Thanks!
22:59:37<h2ibot>Themadprogramer edited Discourse (+48, Added Hugo Community): https://wiki.archiveteam.org/?diff=48788&oldid=48774
22:59:38<h2ibot>ThreeHeadedMonkey edited Deathwatch (+312, Added MapKnitter and SpectralWorkbench): https://wiki.archiveteam.org/?diff=48789&oldid=48783
23:00:38<h2ibot>KevinArchivesThings edited WikiTeam (+160, Added WARC search for editthis.info wikis): https://wiki.archiveteam.org/?diff=48790&oldid=48703
23:02:40<adamus1red>Jake: Wouldn't a bash for loop do the trick?
23:02:54<adamus1red>for i in $(seq 7600 8000); do curl -s -O "https://example.com/$i/$i.jpg"; done
23:03:12<@JAA>That would create a new (process and) connection for each request, which slows things down significantly.
23:03:22<Jake>^
23:03:38<Jake>It's what I was doing before, but it is _extremely_ slow :(
23:04:05march_happy (march_happy) joins
23:04:17<@JAA>Not sure if it could perhaps be done with shell pattern expansion, but then you might run into argument list length limits.
23:05:54<adamus1red>Jake: use the loop to generate the list of URLs then xargs to run multiple requests per curl process and run multiple instances?
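A hedged sketch of adamus1red's suggestion: generate the list once, then let xargs pack many URLs into each curl invocation (so curl can reuse connections) and run a few curl processes in parallel. The batch size, parallelism, and host are illustrative assumptions, not values tested against the real site:

```shell
# Build the URL list, then hand curl 50 URLs at a time across up to
# 4 parallel curl processes; --remote-name-all saves each file under
# its remote name (7600.jpg, 7601.jpg, ...).
seq 7600 8000 \
  | awk '{ printf "https://example.com/%d/%d.jpg\n", $1, $1 }' \
  | xargs -n 50 -P 4 curl -s --remote-name-all
```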
23:06:21<Jake>I think that's the current best plan!
23:07:21<@JAA>`seq|awk` is probably a few orders of magnitude faster than a shell loop, but for a few hundred numbers, that won't matter (or might even be faster due to the lack of subprocesses).
23:08:05<thuban>Jake: curl can take a list of 'configurations' with -K
23:08:58<Jake>Yup, I don't think we can get the URLs to be generated in curl though?
23:09:31<thuban>this is kind of a pain in the ass, because you need to prepend 'url=' to everything and duplicate (most of) any other configuration you would do, but it does mean you can do seq|awk|curl and let curl's native connection reuse (and parallelization) handle the whole batch
23:10:30<thuban>not _in curl_, afaik, no
23:11:09<Jake>ah. I see
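Putting thuban's `-K` idea together, a sketch of the `seq|awk|curl` pipeline: each URL becomes a "url=" config line (plus an assumed "output=" name so the files don't go to stdout), and the whole batch goes to a single curl process reading its config from stdin. `-Z`/`--parallel` requires curl 7.66 or newer:

```shell
# Emit a curl config fragment per number, then feed the whole batch
# to one curl process via -K - (read config from stdin), letting
# curl handle connection reuse and parallel transfers natively.
seq 7600 8000 \
  | awk '{ printf "url=https://example.com/%d/%d.jpg\noutput=%d.jpg\n", $1, $1, $1 }' \
  | curl -s -Z -K -
```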
23:59:33qwertyasdfuiopghjkl joins