00:04:57yawkat (yawkat) joins
00:11:16yawkat quits [Ping timeout: 265 seconds]
00:13:47yawkat (yawkat) joins
00:18:18Arcorann (Arcorann) joins
00:33:37qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
01:24:54lennier1 quits [Client Quit]
01:25:34lennier1 (lennier1) joins
01:27:06<lennier1>Instagram used to be archivable through socialbot. But it's been quite a while since it's worked.
01:27:44<lennier1>There are programs to mass download Instagram photos, but they need an account.
01:37:16CoolCanuck quits [Remote host closed the connection]
02:27:34fangfufu quits [Read error: Connection reset by peer]
02:54:26fangfufu joins
03:12:49lennier1 quits [Client Quit]
03:13:14lennier1 (lennier1) joins
03:47:47Megame (Megame) joins
04:54:54<@JAA>PhantomJS in wpull/AB was horribly broken many years ago already, so yeah, I removed it from AB. Instagram used to be scrapable with socialbot. Later, we had some AB pipelines specifically for individual Instagram pages (mostly profiles), but they ended up getting banned as well.
05:40:23Nulo quits [Read error: Connection reset by peer]
05:49:13Nulo joins
06:48:55<Ryz>Is there a way to archive https://mega.nz/file/gWhVGJAa#oAy9pRQtaN551kaJSm40AdkuBqhtrCoFKQ2YSCv4tYs ? Came from https://warcraftadventures.wordpress.com/2022/07/31/final-version-1-0-released/ - in case of a DMCA takedown by Blizzard Entertainment/Activision
07:14:19Megame quits [Client Quit]
07:51:17eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
08:11:02<mind_combatant>so, for what it's worth, ironically, the wiki page for URLTeam ( https://wiki.archiveteam.org/index.php?title=URLTeam ) has at least two of its references (numbers 3 and 4) linking to dead pages that i had to use the wayback machine to actually see. there are probably various others throughout the wiki, probably worth re-linking to a working wayback snapshot or some other archived copy, either instead of or in addition to the originals.
08:20:31eroc1990 (eroc1990) joins
08:20:35<mind_combatant>reference number 2 doesn't even seem to have a working copy on the wayback machine, so that's cool.
08:26:59<mind_combatant>oh, wait, never mind, 2 does still exist, it's just that bit.ly's blog redirected me to an address that doesn't exist and never existed before i got the url and put it into the wayback machine.
08:28:39mutantmonkey quits [Remote host closed the connection]
08:28:57mutantmonkey (mutantmonkey) joins
08:45:43Minkafighter quits [Quit: The Lounge - https://thelounge.chat]
08:46:23Minkafighter joins
10:00:00Dragnog joins
10:07:13yawkat quits [Ping timeout: 265 seconds]
11:34:24yawkat (yawkat) joins
11:41:46Barto quits [Quit: WeeChat 3.6]
11:43:33Barto (Barto) joins
11:56:51mutantmonkey quits [Remote host closed the connection]
11:57:15mutantmonkey (mutantmonkey) joins
12:10:57nimaje quits [Ping timeout: 265 seconds]
13:19:50mutantmonkey quits [Remote host closed the connection]
13:20:12mutantmonkey (mutantmonkey) joins
13:39:16sec^nd quits [Ping timeout: 240 seconds]
13:46:13sec^nd (second) joins
13:47:11Nulo quits [Read error: Connection reset by peer]
13:48:27Nulo joins
14:40:48nimaje joins
14:43:31nimaje quits [Client Quit]
14:45:11nimaje joins
14:50:35nimaje quits [Client Quit]
14:52:25nimaje joins
15:22:57Arcorann quits [Ping timeout: 265 seconds]
15:39:43Dragnog quits [Client Quit]
15:59:05nimaje quits [Ping timeout: 265 seconds]
16:00:16nimaje joins
16:00:28nimaje quits [Client Quit]
16:02:17nimaje joins
16:09:09nimaje quits [Client Quit]
16:10:45nimaje joins
16:39:13mutantmonkey quits [Remote host closed the connection]
16:39:38mutantmonkey (mutantmonkey) joins
16:42:13wyatt8740 joins
16:52:30benjinsmith joins
16:54:28benjins quits [Ping timeout: 240 seconds]
17:09:42<systwi_>Ryz: MEGA file preservation can currently only be done manually.
17:10:13<@JAA>Ryz: There's no specific tooling for MEGA so far. Yet another thing I've been meaning to look into for a while. So apart from some browser-based thing (Brozzler or another MITM-proxied headless browser with the required scripting), the best thing we can do is download it and throw it into an IA item.
17:10:36<@JAA>Ninja'd...
17:17:46mutantmonkey quits [Ping timeout: 240 seconds]
17:18:03benjinss joins
17:18:25benjinss is now known as benjins
17:21:15benjinsmith quits [Ping timeout: 265 seconds]
17:30:52nimaje quits [Ping timeout: 240 seconds]
17:32:18mutantmonkey (mutantmonkey) joins
17:35:52nimaje joins
17:47:58benjinsmith joins
17:50:15benjins quits [Ping timeout: 265 seconds]
17:53:41benjinsmith is now known as benjins
17:58:36pabs quits [Read error: Connection reset by peer]
17:58:37qwertyasdfuiopghjkl joins
17:59:37pabs (pabs) joins
18:02:54sec^nd quits [Remote host closed the connection]
18:03:50sec^nd (second) joins
19:23:15Megame (Megame) joins
19:41:40lennier1 quits [Ping timeout: 240 seconds]
19:43:46lennier1 (lennier1) joins
19:53:19sec^nd quits [Remote host closed the connection]
19:53:19mutantmonkey quits [Remote host closed the connection]
19:53:50sec^nd (second) joins
19:54:18mutantmonkey (mutantmonkey) joins
19:55:44mutantmonkey quits [Remote host closed the connection]
19:56:04mutantmonkey (mutantmonkey) joins
20:33:10Megame quits [Client Quit]
20:39:17<h2ibot>JustAnotherArchivist moved Gitlab to GitLab (Capitalisation fix): https://wiki.archiveteam.org/?title=GitLab
20:39:18<h2ibot>JustAnotherArchivist edited GitLab (-17, Capitalisation fix): https://wiki.archiveteam.org/?diff=48787&oldid=48786
20:53:55tzt quits [Ping timeout: 265 seconds]
20:54:04nimaje quits [Ping timeout: 240 seconds]
20:57:21nimaje joins
21:28:32IDK (IDK) joins
22:14:14sec^nd quits [Remote host closed the connection]
22:14:44sec^nd (second) joins
22:38:25qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:39:44<Jake>I might just be an idiot right now, but is there not a way to use curl to glob numerical ranges together sequentially? Currently I have "curl "https://example.com/[7600-8000]/[7600-8000].jpg"", looking to get the same number in each one, but it doesn't seem to work like that. Output: "https://example.com/7600/7608.jpg"
22:40:46<@JAA>You mean you want /7600/7600.jpg, /7601/7601.jpg, etc. for a total of 401 URLs?
22:41:09<Jake>Yup.
22:41:17<Jake>Sorry, I probably didn't explain it very well.
22:41:50<nimaje>I don't think there is a way for that, but libera/#curl probably knows more
22:43:00<@JAA>Yeah, I don't think so either. I'd probably do it with `seq|awk` or similar.
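A minimal sketch of the `seq|awk` approach JAA mentions, using the example.com host and 7600-8000 range from Jake's message (both placeholders for the real site). It only generates the URL list; the list still has to be fed to a downloader:

```shell
# Generate one URL per number n in 7600..8000, substituting n into
# both path segments so the two numbers always match.
seq 7600 8000 | awk '{ printf "https://example.com/%d/%d.jpg\n", $1, $1 }'
```

This prints 401 lines, from https://example.com/7600/7600.jpg through https://example.com/8000/8000.jpg.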
22:44:10<Jake>yeah. :( Thanks!
22:59:37<h2ibot>Themadprogramer edited Discourse (+48, Added Hugo Community): https://wiki.archiveteam.org/?diff=48788&oldid=48774
22:59:38<h2ibot>ThreeHeadedMonkey edited Deathwatch (+312, Added MapKnitter and SpectralWorkbench): https://wiki.archiveteam.org/?diff=48789&oldid=48783
23:00:38<h2ibot>KevinArchivesThings edited WikiTeam (+160, Added WARC search for editthis.info wikis): https://wiki.archiveteam.org/?diff=48790&oldid=48703
23:02:40<adamus1red>Jake: Wouldn't a bash for loop do the trick?
23:02:54<adamus1red>for i in $(seq 7600 8000); do curl -s -O "https://example.com/$i/$i.jpg"; done
23:03:12<@JAA>That would create a new (process and) connection for each request, which slows things down significantly.
23:03:22<Jake>^
23:03:38<Jake>It's what I was doing before, but it is _extremely_ slow :(
23:04:05march_happy (march_happy) joins
23:04:17<@JAA>Not sure if it could perhaps be done with shell pattern expansion, but then you might run into argument list length limits.
23:05:54<adamus1red>Jake: use the loop to generate the list of URLs then xargs to run multiple requests per curl process and run multiple instances?
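A hedged sketch of adamus1red's suggestion: generate the list once, then let xargs pack many URLs into each curl invocation (so curl can reuse connections) and run a few curl processes in parallel. The batch size, parallelism, and host are illustrative assumptions, not values tested against the real site:

```shell
# Build the URL list, then hand curl 50 URLs at a time across up to
# 4 parallel curl processes; --remote-name-all saves each file under
# its remote name (7600.jpg, 7601.jpg, ...).
seq 7600 8000 \
  | awk '{ printf "https://example.com/%d/%d.jpg\n", $1, $1 }' \
  | xargs -n 50 -P 4 curl -s --remote-name-all
```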
23:06:21<Jake>I think that's the current best plan!
23:07:21<@JAA>`seq|awk` is probably a few orders of magnitude faster than a shell loop, but for a few hundred numbers, that won't matter (or might even be faster due to the lack of subprocesses).
23:08:05<thuban>Jake: curl can take a list of 'configurations' with -K
23:08:58<Jake>Yup, I don't think we can get the URLs to be generated in curl though?
23:09:31<thuban>this is kind of a pain in the ass, because you need to prepend 'url=' to everything and duplicate (most of) any other configuration you would do, but it does mean you can do seq|awk|curl and let curl's native connection reuse (and parallelization) handle the whole batch
23:10:30<thuban>not _in curl_, afaik, no
23:11:09<Jake>ah. I see
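Putting thuban's `-K` idea together, a sketch of the `seq|awk|curl` pipeline: each URL becomes a "url=" config line (plus an assumed "output=" name so the files don't go to stdout), and the whole batch goes to a single curl process reading its config from stdin. `-Z`/`--parallel` requires curl 7.66 or newer:

```shell
# Emit a curl config fragment per number, then feed the whole batch
# to one curl process via -K - (read config from stdin), letting
# curl handle connection reuse and parallel transfers natively.
seq 7600 8000 \
  | awk '{ printf "url=https://example.com/%d/%d.jpg\noutput=%d.jpg\n", $1, $1, $1 }' \
  | curl -s -Z -K -
```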
23:59:33qwertyasdfuiopghjkl joins