#archiveteam-bs log for 2023-05-17


00:13:46		BigBrain quits [Ping timeout: 245 seconds]
00:15:40		BigBrain (bigbrain) joins
00:26:48	<tech234a>	https://blog.google/technology/safety-security/updating-our-inactive-account-policies/
00:36:15	<pabs>	JAA: the opensource.com job has completed, can you do the grey area cookie for downloads thing?
00:38:39	<pabs>	the cookie btw, so you don't need to search IRC logs: STYXKEY_Drupal_visitor_gatedemail=osdc-gated-content
00:41:46		lennier1 quits [Ping timeout: 252 seconds]
00:44:11		lennier1 (lennier1) joins
00:51:53	<h2ibot>	Jscott edited Current Projects (+0, /* Warrior-based projects */): https://wiki.archiveteam.org/?diff=49788&oldid=49736
01:06:05	<fireonlive>	oops
01:08:50	<SketchCow>	!!!!!!!!!!!!!!!!!!!
01:08:53	<SketchCow>	I was cast to -bs!
01:08:54	<SketchCow>	Me!
01:09:31	<fireonlive>	nah, me
01:09:35	<fireonlive>	:p
01:10:03	<fireonlive>	luckily for y'all i should be gone in a few days!
01:10:13	<Terbium>	we are all confined into the prison known as "-bs" together :P
01:10:25	<@JAA>	pabs: Ack, setting the extraction up now, though the last data hasn't been uploaded yet it seems.
01:12:21		test___ quits [Remote host closed the connection]
01:19:17	<pabs>	great, thanks!
01:21:24		nostalgebraist joins
01:59:13	<wickedplayer494>	Ah fuck, I really did do 2013 instead of 2023 on that DPReview bullet
01:59:15	<wickedplayer494>	Good catch, Jason!
02:06:56	<hlgs|m>	huh. getting a "save page now browser crashed on [url]" error for several urls
02:07:10	<hlgs|m>	they're all links to files
02:07:33		tbc1887 quits [Read error: Connection reset by peer]
02:07:39	<pabs>	AB !ao < them instead?
02:08:31		AlsoTheTechRobo is now known as TheTechRobo
02:08:36	<@JAA>	#internetarchive for IA/WBM/SPN stuff.
02:08:45	<hlgs|m>	i could hmm. this was both with the spn scripts and manually
02:08:48	<hlgs|m>	oh thanks!
02:09:05	<hlgs|m>	i don't actually know what channels there are <.<
02:10:16	<@JAA>	Most are documented somewhere on our wiki, but yeah, there isn't a simple list.
02:12:56	<hlgs|m>	(oh never mind, the urls have been saved, the error was just throwing me for a loop)
02:13:01	<hlgs|m>	(and throwing the spn scripts into a loop)
02:13:08	<hlgs|m>	(ignore me)
02:33:49		nostalgebraist quits [Client Quit]
02:40:58		xkey quits [Client Quit]
02:41:21		xkey (xkey) joins
02:47:58		decky_e joins
02:50:26		BigBrain quits [Ping timeout: 245 seconds]
03:27:12		decky_e quits [Read error: Connection reset by peer]
03:59:13	<fireonlive>	thanks ja a
03:59:46		jackt1365|m joins
04:00:00		aGerman quits [Client Quit]
04:03:08		aGerman (aGerman) joins
04:08:22		decky_e joins
04:49:58		Island quits [Read error: Connection reset by peer]
04:51:24		BigBrain (bigbrain) joins
05:04:24	<@JAA>	pabs: I'm using the presence of a 'gated-form' as the indicator. And there is indeed content outside of /downloads/ and the book which is behind a wall, e.g. https://opensource.com/content/cheat-sheet-gimp
05:05:18	<pabs>	ah, thanks for checking, that approach sounds good
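
The gate check described above can be sketched in Python. This is only an illustration: the cookie string is the one pabs quoted earlier in the log, but the plain substring test for 'gated-form' and both function names are assumptions, since the log does not say how the indicator is actually matched.

```python
import urllib.request

# Cookie quoted earlier in the channel; it is assumed to unlock the gated pages.
GATE_COOKIE = "STYXKEY_Drupal_visitor_gatedemail=osdc-gated-content"


def is_gated(html):
    """Return True if the page still shows the gate.

    Assumption: the literal marker 'gated-form' appears in gated pages.
    """
    return "gated-form" in html


def build_request(url):
    """Prepare (but do not send) a request that carries the gate cookie."""
    return urllib.request.Request(url, headers={"Cookie": GATE_COOKIE})
```

A page fetched with `build_request(...)` that still satisfies `is_gated(...)` would suggest the cookie did not lift the wall for that URL.
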
05:12:26		dumbgoy quits [Ping timeout: 265 seconds]
05:57:44	<h2ibot>	Nemo bis edited Google (+510, /* Vital Signs */ YouTube, Google Docs and…): https://wiki.archiveteam.org/?diff=49789&oldid=46262
06:17:09		Justin[home] is now known as DopefishJustin
06:33:09		Minkafighter quits [Quit: The Lounge - https://thelounge.chat]
06:33:44		Minkafighter joins
07:17:20		that_lurker quits [Client Quit]
07:18:29		that_lurker (that_lurker) joins
07:19:56		BlueMaxima quits [Client Quit]
07:28:11		Arcorann (Arcorann) joins
07:32:52		birdjj quits [Client Quit]
07:46:30		parfait (kdqep) joins
07:46:59		birdjj joins
07:55:19		LeGoupil joins
07:57:18		birdjj1 joins
07:59:44		birdjj quits [Ping timeout: 252 seconds]
07:59:45		birdjj1 is now known as birdjj
07:59:59		fullpwnmedia joins
08:02:58		TastyWiener95 quits [Client Quit]
08:04:37		TastyWiener95 (TastyWiener95) joins
08:23:03		decky_e quits [Read error: Connection reset by peer]
08:48:12		birdjj quits [Client Quit]
08:48:33		birdjj joins
09:02:27		birdjj quits [Read error: Connection reset by peer]
09:02:31		birdjj1 joins
09:05:14		birdjj1 quits [Read error: Connection reset by peer]
09:05:16		birdjj joins
09:32:58		birdjj quits [Read error: Connection reset by peer]
09:33:03		Megame (Megame) joins
09:33:15		birdjj joins
09:50:49		sonick quits [Client Quit]
10:07:31		vukky joins
10:26:33		threedeeitguy quits [Quit: The Lounge - https://thelounge.chat]
10:51:00		Megame quits [Client Quit]
10:55:47		threedeeitguy joins
11:36:26		threedeeitguy quits [Ping timeout: 252 seconds]
11:54:20		vukky quits [Client Quit]
11:54:25		vukky joins
11:55:30		threedeeitguy joins
12:00:05		vukky is now authenticated as vukky
12:00:09		vukky quits [Client Quit]
12:00:14		vukky (vukky) joins
12:02:49		vukky quits [Changing host]
12:02:49		vukky (vukky) joins
12:10:21		nighthnh099_ joins
12:13:35	<nighthnh099_>	mentioned this in the archivebot channel, I found a gacha game that will shut down tomorrow, I did some https intercepting essentially to get the urls which I saved in #archivebot, the list is incomplete and also I'm not sure what to do about the apis the game uses
12:14:16	<nighthnh099_>	the game is Tales of Asteria (jp.co.bandainamcogames.NBGI0197) by the way
12:14:41	<nighthnh099_>	needs a japanese region google account to download, but I think it will work fine if installed with an apk
12:25:46	<pabs>	maybe upload the APK to an archive.org item?
12:28:56	<nighthnh099_>	I was going to do that but my problem would be the game apis and the rest of the game's data
12:29:27	<nighthnh099_>	for the rest of the data, maybe I should just upload the downloaded data since I'm having a hard time archiving the raw data from the server it downloads from
12:29:53	<nighthnh099_>	for the game apis, I have a pcap file with some responses
12:44:38	<pabs>	do the APIs require logins? if not, we could archive the URLs in that pcap
12:44:49	<pabs>	if they do, then upload the pcap I guess
13:01:46		Raya joins
13:03:48		threedeeitguy7 joins
13:04:51	<Raya>	Hello! Hope this is the right place for this. I'm having a hard time using the warrior today. Archiveteam's choice defaults to imgur, but their tasks all give "server returned bad response" and get stuck in loops. I can try installing other projects, but no other tasks will start, and I can't shut the warrior down. I have to force-shutdown with "stop immediately" to get it to run any other tasks
13:05:08		kiwec joins
13:05:28		kiwec leaves
13:07:11	<nighthnh099_>	pabs: the urls don't need logins but they do need parameters; I already archived the ones I am aware of in that txt file earlier
13:07:51	<nighthnh099_>	my concern is the stuff that I don't have, since there doesn't seem to be an active effort to archive that game in its own community
13:09:01	<masterx244|m>	Raya: #imgone and the 403s are expected due to some imgur fuckery
13:12:25		icedice (icedice) joins
13:12:58	<Raya>	ty! Thought this was a more generic issue, will check in there. Also there's some projects that fail to load and the warrior doesn't do anything about them, should I report that in the specific channels or is it useless/redundant/they know?
13:17:10	<icedice>	Sanqui Sanqui|m JAA Could you take a look at the ArchiveBot archivation job for Bulbagarden Forums when you have time? iirc it got some error and aborted itself: https://archive.fart.website/archivebot/viewer/job/ckr2m
13:17:54	<icedice>	It'd be nice to know if it has to be run again before Imgur has wiped everything
13:19:39	<masterx244|m>	<Raya> "ty! Thought this was a more..." <- Some projects fail in the warrior due to outdated code and only work when running with their dedicated docker image. that's known
13:20:29		dumbgoy joins
13:34:37		Arcorann quits [Ping timeout: 265 seconds]
13:35:38		sonick (sonick) joins
13:39:24		todb joins
13:50:35		parfait_ joins
13:52:31		Pingerfowder quits [Quit: ZNC - https://znc.in]
13:52:39		Pingerfowder (Pingerfowder) joins
13:52:42		hyvac joins
13:53:46		parfait quits [Ping timeout: 252 seconds]
13:55:31	<todb>	Hello AT! In my flailing around looking for archive solutions to battle CVE linkrot, I came across your website. I could sure use some help, guidance, tips, and criticism on my quixotic effort to have useful archives of security vulnerability information that spans all sorts of websites, is fragile, and is rotting away from under us. Background and so far: https://github.com/todb/junkdrawer/blob/main/cve-kev-refs/README.md
13:56:45	<todb>	I figure if anyone has already solved most of the first-steps problems I'm running into, it'd be you. Thanks in advance.
14:04:09		hitgrr8 joins
14:05:19		threedeeitguy quits [Client Quit]
14:05:19		threedeeitguy7 is now known as threedeeitguy
14:15:58	<pabs>	todb: not sure but I think for "Automating archiving new CVE ID references", the URLs project (channel #//) is the best option for this. basically it regularly downloads things and passes the URLs found therein to volunteers who download those URLs. you would need a page that contains the new URLs to be archived
14:16:29	<pabs>	usually this is used for news and link posting sites, by grabbing time-based index pages
14:16:55	<pabs>	https://wiki.archiveteam.org/index.php/URLs https://tracker.archiveteam.org/urls/
14:17:34	<pabs>	this is the github repo with all the URL sources https://github.com/ArchiveTeam/urls-sources
14:19:55	<todb>	pabs ah thanks for the pointer I'll check it out! Yeah I have two overall goals -- archive everything new, and also archive what's still around in the extant CVE refs.
14:21:37		threedeeitguy is now authenticated as threedeeitguy
14:23:35	<pabs>	how many links are we talking in the second category? presumably those are just the link itself and any page resources? ie no outlinks or subdirectories?
14:24:37	<pabs>	hopefully the second category is also not stuff loaded by JS, because ArchiveBot doesn't support that, and AB would be a reasonable way to do it if the list isn't too big
14:24:59	<pabs>	if it is big then it will have to go through the URLs project I am guessing
14:25:08		anarcat (anarcat) joins
14:26:10	<todb>	pabs: so there are about 215k CVE IDs in the world today. My wild guess is that there are maybe 3 references on average for each, but not all of them are unique. So.... 600k links? No subdirs or anything, they're all endpoints, and they're links to things like mailing list archives, specific advisories, blogs, tweets, etc
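
For reference, the back-of-the-envelope figure above works out as follows. Both inputs are todb's guesses from the message, not measured values, and the product is only an upper bound because duplicate references are not removed.

```python
cve_ids = 215_000   # approximate number of CVE IDs, per the message above
refs_per_id = 3     # guessed average number of references per CVE ID

# Upper bound before de-duplication; the unique count will be lower.
upper_bound = cve_ids * refs_per_id
print(upper_bound)  # 645000
```
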
14:26:20		katocala quits [Remote host closed the connection]
14:26:47	<todb>	There is /loads/ of JS in there. Tweets, for example. Very heavyweight
14:27:49	<anarcat>	that's not so big for archivebot :)
14:27:53	<pabs>	that makes it slightly more complicated. there is snscrape for twitter stuff, but I'm not sure it does individual tweets, usually we do entire accounts (except there is a 3200-recent-tweets limit right now)
14:28:10	<anarcat>	tweets?
14:28:19	<anarcat>	i mean archivebot should be able to deal with one tweet, iirc
14:28:32	<pabs>	the only thing that can do JS properly these days is SPN2 I thought
14:28:37	<anarcat>	ah
14:28:41	<todb>	The most interesting CVE ID refs that are tweets also are whole twitter threads
14:29:35	<todb>	Twitter is the reason why I started worrying about all this -- a huge swath of infosec twitter left or got banned in October and thus, all their vuln intel disappeared.
14:29:50	<pabs>	ouch.
14:30:03	<anarcat>	ouch indeed
14:30:04	<pabs>	anarcat suggested elsewhere that starting a wiki page about this would be good. then we can map out all the potential issues
14:30:05	<todb>	https://github.com/todb/junkdrawer/tree/main/cve-twitter-refs
14:30:27	<anarcat>	todb: yeah pabs shared that link elsewhere already, but that's not writeable by us :)
14:30:41	<anarcat>	i mean if you want archiveteam people to jump in there, you need to jump in the tools too ;)
14:30:43	<todb>	pabs: i am 100% on board with getting help through the AT wiki :)
14:30:47	<anarcat>	although i think you first need to request an account
14:31:04	<anarcat>	i forgot how that works, i got the edit bit at some point and promptly forgot :p
14:31:52	<pabs>	ok, snscrape does support tweets+threads https://github.com/JustAnotherArchivist/snscrape
14:32:01	<pabs>	(not sure if that is working right now though)
14:32:09	<todb>	i shall poke around for howto get an AT wiki username tysm
14:32:39	<anarcat>	i can just make a page i guess
14:33:30	<anarcat>	actually, just go to https://wiki.archiveteam.org/index.php?title=Special:CreateAccount&returnto=Main+Page
14:35:11	<pabs>	todb: to start with, can you make a text file, one per line, of all the refs (minus twitter). then do this: curl --upload-file cve-refs https://transfer.archivete.am/cve-refs.txt
14:35:17	<todb>	anarcat: sweet thanks done. Also, that is a wild captcha :)
14:35:31	<pabs>	then we can run ArchiveBot over them: http://archiveteam.org/index.php?title=ArchiveBot
14:35:31	<anarcat>	isn't it :)
14:35:36	<anarcat>	yeah
14:35:55	<anarcat>	that's a great start
14:35:56	<pabs>	this AB will give us a baseline without the tweets and without JS stuff, but we can go from there
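
A minimal sketch of preparing the two lists discussed above (everything minus Twitter for ArchiveBot, and the Twitter links on their own). The function name and the set of Twitter hostnames are assumptions made for illustration; the log does not say how todb actually produced the files.

```python
from urllib.parse import urlsplit

# Hostnames treated as "Twitter" here; this set is an assumption.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}


def split_refs(urls):
    """De-duplicate URLs (keeping first-seen order) and split off Twitter links.

    Returns a pair (other_urls, twitter_urls).
    """
    seen = set()
    other, twitter = [], []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip blank lines and repeats
        seen.add(url)
        host = urlsplit(url).hostname or ""
        if host in TWITTER_HOSTS:
            twitter.append(url)
        else:
            other.append(url)
    return other, twitter
```

Writing `other` to a file with one URL per line gives something in the shape pabs asked for, which can then be uploaded with the `curl --upload-file` command quoted above.
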
14:36:07	<todb>	roger that will do
14:36:07	<anarcat>	i think it will fetch some of the twitter stuff, personally
14:36:10	<anarcat>	but maybe i got that wrong
14:36:23	<anarcat>	i think the thing with snscrape is that it recurses through tweets
14:36:26	<anarcat>	so it gets the threading right
14:36:33	<anarcat>	and replies and so on
14:36:41	<anarcat>	but AB should be able to get one tweet, no?
14:37:04	<pabs>	there is no content for me in the browser when JS is off
14:37:59	<pabs>	hmm, ISTR there being a u-a trick for changing that but can't remember the details :(
14:40:47		pabs peruses https://wiki.archiveteam.org/index.php?title=Twitter
14:40:53	<anarcat>	pabs: yeah, but if AB pulls all the bits and shoves them in the wayback machine, and then you have JS on when you browse wayback, it should work right?
14:41:11	<anarcat>	i mean in any case i think it's a good idea to start with a cve-refs.txt and shove that in archivebot
14:41:23	<anarcat>	then we can trim that down to social media and shove that in snscrape
14:41:44	<anarcat>	and of course at some point JAA will be awake and will correct all the bullshit i said and set us up straight again :)
14:42:12	<pabs>	but AB would need to run JS to pull the tweet content though
14:43:45	<pabs>	because it simply isn't in the HTML and isn't linked to by it
14:43:59	<anarcat>	ah right
14:44:00	<anarcat>	makes sense
14:44:03	<anarcat>	stupid web
14:44:39	<pabs>	indeed, gopher ftw :)
14:44:50	<anarcat>	ha
14:44:55	<anarcat>	i'm fine with plain HTML
14:45:07	<anarcat>	but this is getting -ot :p
15:01:25		Guest7273 joins
15:31:16	<@JAA>	pabs: Tweet scraping works fine.
15:31:37	<@JAA>	anarcat: AB hasn't been able to grab tweets properly for a good while now.
15:34:35		Island joins
15:35:20		LeGoupil quits [Client Quit]
15:39:21		hyvac quits [Remote host closed the connection]
15:39:42		hyvac joins
15:40:19		nostalgebraist joins
15:40:25		nostalgebraist quits [Client Quit]
15:53:49		hyvac quits [Ping timeout: 265 seconds]
15:54:49	<@JAA>	icedice: Just ran the log through wpull2-log-extract-errors, the only significant errors were the pagination of profile posts on one huge profile, which are too slow and exceed AB's time limit.
15:55:01	<@JAA>	Cc Sanqui ^
16:04:27		parfait_ quits [Ping timeout: 265 seconds]
16:09:16		test___ (decky_e) joins
16:13:50		Billy549 quits [Ping timeout: 252 seconds]
16:15:50		test___ quits [Ping timeout: 252 seconds]
16:24:40		cascode joins
16:25:11		lflare quits [Ping timeout: 252 seconds]
16:25:37	<todb>	pabs: thanks again for your help; https://transfer.archivete.am/P6uNh/cve-refs.txt is up now (all minus twitter links). I'll read up on https://wiki.archiveteam.org/index.php?title=ArchiveBot to learn how to track progress and run a node and all that. I'm super new to all this.
16:32:55	<todb>	(I also extracted all the Twitter links and threw them up at https://transfer.archivete.am/EZdNi/cve-twitter-refs.txt , there's only 405 of them but maybe half or so are already gone or are useless.)
16:37:15	<icedice>	<JAA> icedice: Just ran the log through wpull2-log-extract-errors, the only significant errors were the pagination of profile posts on one huge profile, which are too slow and exceed AB's time limit.
16:37:22	<icedice>	Ah, that's not a big issue then
16:37:31	<icedice>	Pretty much nobody reads profile posts
16:37:43	<icedice>	Threads and images are what's important
16:37:56	<icedice>	I thought the archivation job aborted itself or something lol
16:40:15		icedice quits [Client Quit]
16:58:57		Guest7273 quits [Client Quit]
17:11:42	<h2ibot>	JustAnotherArchivist edited ArchiveBot (-175, /* Volunteer to run a Pipeline */ Make it…): https://wiki.archiveteam.org/?diff=49790&oldid=49131
17:13:42	<h2ibot>	Hans5958 edited URLTeam/Dead (+2125, Checking round on 2023-05-16): https://wiki.archiveteam.org/?diff=49791&oldid=49186
17:13:43	<h2ibot>	Hans5958 edited URLTeam (-2636, Checking round on 2023-05-16, put example on…): https://wiki.archiveteam.org/?diff=49792&oldid=49785
17:14:12		icedice (icedice) joins
17:14:22	<icedice>	JAA: Could you check that Serebii Forums finished successfully without any major errors? I missed the end of that archivation job: https://archive.fart.website/archivebot/viewer/job/2c1vq
17:14:42	<h2ibot>	Vukky edited Deathwatch (+26, link to PTCGO section of website instead of the…): https://wiki.archiveteam.org/?diff=49795&oldid=49779
17:14:47	<icedice>	That's the last big Pokémon forum I requested archivation for
17:16:39	<icedice>	I'll be back later
17:16:41	<todb>	Alright created https://wiki.archiveteam.org/index.php?title=ArchiveBot/CVE&modqueued=1 per advice here. Hope I'm doing it right.
17:16:43		icedice quits [Client Quit]
17:19:23	<@JAA>	I don't think this belongs on a subpage of ArchiveBot, but it can be moved later.
17:19:43	<h2ibot>	Todb created ArchiveBot/CVE (+2765, Kick off a CVE reference project.): https://wiki.archiveteam.org/?title=ArchiveBot/CVE
17:20:13	<todb>	JAA: yeah I'm super noob and not sure how organization on the wiki works yet
17:37:00	<nicolas17>	there's some random video uploaded to the dynabook / tb2b ftp lol
17:38:08	<nicolas17>	(fullpwnmedia said the ftp server has write access)
17:38:34	<nicolas17>	er fullpwndotnet
17:49:32		icedice (icedice) joins
17:56:05		nighthnh099_ quits [Client Quit]
18:00:51	<h2ibot>	JAABot edited URLTeam/Dead (+0): https://wiki.archiveteam.org/?diff=49797&oldid=49791
18:01:33	<@JAA>	TIL my bot does that.
18:12:44		rr9 quits [Client Quit]
18:13:13		rr (rr) joins
18:15:59		rr quits [Client Quit]
18:16:45		rr (rr) joins
18:19:40		HP_Archivist (HP_Archivist) joins
18:40:39		hyvac joins
18:43:56		cascode quits [Read error: Connection reset by peer]
18:44:03		cascode joins
18:44:25		cascode quits [Read error: Connection reset by peer]
18:44:42		cascode joins
18:55:39		katocala joins
18:56:02		katocala is now authenticated as katocala
19:09:50		cascode quits [Ping timeout: 252 seconds]
19:10:01		hyvac quits [Remote host closed the connection]
19:10:09		hyvac joins
19:10:50		cascode joins
19:32:52	<Raya>	Hey - I've been working on reddit all day. Then tried switching to imgur, got a "project did not install correctly" message. Tried rebooting a couple times. Now, most projects give me the same message. For example, reddit returns this:
19:32:54	<Raya>	2023-05-17 19:32:28,257 - seesaw.warrior - ERROR - Project failed to install: Cloning into '/home/warrior/projects/reddit'...
19:32:54	<Raya>	fatal: unable to access 'https://github.com/ArchiveTeam/reddit-grab/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
19:32:54	<Raya>	git returned 128
19:32:56	<Raya>	2023-05-17 19:32:28,261 - seesaw.warrior - DEBUG - Result of the install process: False
19:32:58	<Raya>	2023-05-17 19:32:28,262 - seesaw.warrior - WARNING - Project reddit did not install correctly and we're ignoring this problem.
19:33:51	<Raya>	Do you have a clue what this could be? Few projects seem to be working, and the ones I worked on until 20 mins ago suddenly don't anymore
19:33:54	<nicolas17>	I heard github is having problems
19:34:33	<Raya>	Oooh that'd explain it
19:35:59		cascode quits [Read error: Connection reset by peer]
19:36:16		cascode joins
19:43:25	<fireonlive>	current advice is to just retry until it grabs
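
The "retry until it grabs" advice can be sketched as a small retry loop. This is not how the warrior itself installs projects; `retry` and `clone_once` are hypothetical helpers, and the attempt count and delay are arbitrary illustrative defaults.

```python
import subprocess
import time


def retry(action, attempts=5, delay=10.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    Returns the 1-based attempt number that succeeded, or None.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before trying again
    return None


def clone_once(repo_url, dest):
    """Run a single `git clone`; True if git exits with status 0."""
    return subprocess.run(["git", "clone", repo_url, dest]).returncode == 0
```

For example, `retry(lambda: clone_once("https://github.com/ArchiveTeam/reddit-grab/", "reddit-grab"))` would keep trying the clone from the error message above, pausing between attempts, until GitHub answers or five attempts have failed.
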
19:54:59	<threedeeitguy>	My warriors broke so I just switched to docker, no more issues.
20:00:22		parfait (kdqep) joins
20:07:25	<@JAA>	The container images don't require contacting GitHub, so that makes sense.
20:07:39	<@JAA>	Well, the project images don't, the warrior image does I think.
20:08:31	<@JAA>	icedice: Seven random thread pages failed on forums.serebii.net, i.e. it can be considered complete as well.
20:09:33	<@JAA>	And all but one of those are indeed broken on the server side, returning 500.
20:13:52		nimaje1 is now known as nimaje
20:22:02		parfait quits [Read error: Connection reset by peer]
20:45:46		umgr036 joins
20:46:36		umgr036 quits [Remote host closed the connection]
20:46:49		umgr036 joins
20:46:57		umgr036 quits [Client Quit]
20:54:56		cascode quits [Ping timeout: 265 seconds]
20:54:59		cascode joins
21:01:59		lexikiq joins
21:15:52		HP_Archivist quits [Client Quit]
21:34:18		hitgrr8 quits [Client Quit]
21:36:53		test___ (decky_e) joins
21:36:53		test___ is now known as decky_e
21:37:57		cascode quits [Read error: Connection reset by peer]
21:38:10		cascode joins
21:42:17	<icedice>	<JAA> And all but one of those are indeed broken on the server side, returning 500.
21:42:22	<icedice>	Nothing to do about that then
21:43:01	<icedice>	Good that it got archived and Imgur didn't 429 on Bulbagarden Forums or Serebii Forums
21:43:22		decky_e leaves
21:47:04	<masterX244>	i ignore imgur straight away on my scrapes due to that
21:47:27		decky_e (decky_e) joins
21:48:04	<@JAA>	icedice: I only checked onsite stuff for errors.
21:48:41	<icedice>	Ah
21:49:15	<icedice>	Could you check a few of the Imgur links from both sites when you have time?
21:51:20		cascode quits [Ping timeout: 252 seconds]
21:51:43		cascode joins
21:51:46	<@JAA>	Not sure when I have time for that, maybe Friday.
21:51:54	<icedice>	Ok
21:51:56	<icedice>	Thanks
21:52:18	<icedice>	Could you check what hosting provider they were archived via?
21:52:32	<icedice>	If it's OVH or Hetzner we pretty much already know the answer
21:52:59	<icedice>	Though the sites did get archived pretty early on
21:53:02	<icedice>	So who knows
21:53:19		cascode quits [Read error: Connection reset by peer]
21:53:35		cascode joins
21:53:59	<@JAA>	Most likely one of those, yes.
21:55:24	<@JAA>	I think they were with --no-offsite-links anyway, so any links to images/albums/whatever would be missing anyway.
22:03:32	<andrew>	is there any web crawler (like wget -m or httrack) that supports concurrency and WARC writing?
22:03:42	<andrew>	or is it better to just run httrack behind a WARC-writing proxy or something
22:03:59		decky_e quits [Ping timeout: 252 seconds]
22:04:22		decky_e (decky_e) joins
22:06:48	<andrew>	oh maybe I can use grab-site :D
22:10:01	<masterX244>	yeah, grab-site is the way to go for warcs as a normal user
22:10:27	<andrew>	ideally I'd like to also have convenient access through the filesystem with wget's link rewriting
22:10:40	<andrew>	but I suppose I can just replay the WARC and point wget at it
22:12:10		lflare (lflare) joins
22:19:57	<andrew>	and 5 yaks later I'm rebuilding ffmpeg from source
22:21:34		hyvac quits [Client Quit]
22:27:40		MrTumnus joins
22:38:22	<icedice>	<JAA> I think they were with --no-offsite-links anyway, so any links to images/albums/whatever would be missing anyway.
22:38:26	<icedice>	What the shit
22:38:50	<icedice>	Getting Imgur links archived at the same time was the reason that they were being archived now
22:39:14	<pokechu22>	The plan is to extract them from the WARC, not the log, to my understanding
22:39:20	<icedice>	I guess the links could be extracted from the WARC and archived separately
22:39:26	<icedice>	Yeah, true
22:39:28	<pokechu22>	and then throw them into #imgone, yeah
22:39:37	<pokechu22>	rather than trying to make archivebot run slowly to not get banned by imgur
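
A rough sketch of the link-extraction step described above, applied to HTML that has already been read out of a WARC record. Reading the records themselves (for instance with a library such as warcio) is left out, and the regular expression is an assumption about what Imgur URLs look like, not the pattern the #imgone project actually uses.

```python
import re

# Assumed URL shape: optional subdomain, imgur.com, then a path.
IMGUR_RE = re.compile(
    r"https?://(?:[a-z0-9-]+\.)?imgur\.com/[A-Za-z0-9/_.\-]+",
    re.IGNORECASE,
)


def extract_imgur_links(html):
    """Return unique Imgur URLs found in html, in first-seen order."""
    seen = {}
    for match in IMGUR_RE.finditer(html):
        # Drop a trailing full stop picked up from surrounding prose.
        seen.setdefault(match.group(0).rstrip("."), None)
    return list(seen)
```

The resulting list could then be written out one URL per line and handed over separately, which avoids slowing the original crawl down to stay under Imgur's rate limits.
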
22:39:57	<icedice>	Assuming Imgur doesn't wipe everything by then
22:40:19	<icedice>	Any idea if they've started deleting stuff yet?
22:42:04	<fireonlive>	some signs of deletion yes but very slowly (as the catalogue is massive)
22:42:42	<icedice>	Yeah, I thought so
22:43:16	<icedice>	Noticed some three day old upload had gotten deleted, so I figured that was probably by Imgur
22:44:31	<icedice>	Them not locking down uploads to registered accounts when announcing the purge is just another level of assholeishness
22:44:32	<fireonlive>	none of my self-uploaded canaries have died yet, and from a chunk of a 'worked recently' list only a few are gone (but it can be hard to tell why)
22:45:03	<fireonlive>	yeah; there's a lot of newly uploaded stuff by people who have no idea that's just going to go away it seems
22:45:19	<icedice>	Like they're actively encouraging people to get their shit deleted at this point
22:45:56	<icedice>	True, it's impossible to know for sure if it's Imgur or the uploader that deleted it
22:46:38	<icedice>	Have any NSFW subreddits banned Imgur links yet?
22:46:44	<icedice>	They really should tbh
22:47:14	<icedice>	Future links, not the ones already there
22:48:56	<fireonlive>	not sure. some of the ones I monitor still have imgur links trickling in but I haven't checked for rule updates
22:58:27		nicolas17 quits [Client Quit]
23:00:37		Raya quits [Client Quit]
23:14:26		eroc1990 quits [Client Quit]
23:20:12		BlueMaxima joins
23:31:59		wyatt8740 quits [Ping timeout: 252 seconds]
23:32:29		wyatt8740 joins
23:58:36		wyatt8740 quits [Ping timeout: 265 seconds]
23:58:51		wyatt8740 joins
