#archiveteam-bs log for 2025-11-24


00:00:47		DopefishJustin quits [Remote host closed the connection]
00:08:16		DopefishJustin joins
00:08:16		DopefishJustin is now authenticated as DopefishJustin
00:11:43		Wohlstand quits [Client Quit]
00:12:48		etnguyen03 (etnguyen03) joins
00:37:20		etnguyen03 quits [Client Quit]
00:59:17		beastbg8__ quits [Read error: Connection reset by peer]
01:06:39	<klea>	https://wiki.archiveteam.org/index.php/Hacker_News links to https://github.com/HackerNews/API, maybe implement this api?
01:06:54	<klea>	i don't know what the limits to query the api are however
01:07:06		astrinaut leaves [][]
01:09:13		etnguyen03 (etnguyen03) joins
01:09:37	<klea>	the repo says: "There is currently no rate limit."
01:10:07	<klea>	but given maxitem returns 46029159 i'm not sure about that
01:10:47		DogsRNice_ joins
01:14:21		DogsRNice quits [Ping timeout: 272 seconds]
01:17:19	<that_lurker>	The urls project is currently fetching https://news.ycombinator.com/newest and https://news.ycombinator.com/newcomments on random intervals
01:22:36		sg72 quits [Remote host closed the connection]
01:23:45		sg72 joins
01:27:09		etnguyen03 quits [Client Quit]
01:30:45		etnguyen03 (etnguyen03) joins
01:38:54		BennyOtt quits [Ping timeout: 256 seconds]
01:45:46	<klea>	that_lurker: afaik fetching the direct api data from https://hacker-news.firebaseio.com/v0/item/$ID.json seems like a better idea?
01:45:52	<klea>	but yeah that's nice :3
01:46:37	<klea>	s/:3// # shouldn't add on public discussion, not professional enough
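
Editor's sketch of the approach klea describes above: walking item IDs against the Firebase endpoint. The item URL pattern is the one quoted in the log. The `maxitem.json` endpoint, the helper names (`item_url`, `id_batches`, `fetch_json`, `fetch_max_item`) and the batch size are assumptions for illustration, not anything the channel agreed on. The fetch helpers are defined but not called here, so the block runs offline.

```python
import json
import urllib.request
from typing import Iterator

# Base URL taken from the item endpoint quoted in the log.
API_BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Build the JSON URL for one item ID (hypothetical helper)."""
    return f"{API_BASE}/item/{item_id}.json"


def id_batches(max_item: int, batch_size: int) -> Iterator[range]:
    """Split the ID space 1..max_item into consecutive ranges.

    Each range could be handed to a separate worker; the batch size is
    an arbitrary choice, not a documented limit.
    """
    for start in range(1, max_item + 1, batch_size):
        yield range(start, min(start + batch_size, max_item + 1))


def fetch_json(url: str, timeout: float = 30.0):
    """Fetch and decode one JSON document (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_max_item() -> int:
    """Ask the API for the current highest item ID (assumed endpoint)."""
    return int(fetch_json(f"{API_BASE}/maxitem.json"))


if __name__ == "__main__":
    # Offline demonstration using the maxitem value mentioned in the log.
    first_batch = next(id_batches(46029159, 1000))
    print(item_url(first_batch[0]))
    print(len(first_batch))
```

Even with no documented rate limit, a real run would likely want retries, a polite delay, and WARC output rather than bare JSON fetches.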
01:57:18		Webuser854399 joins
01:58:45		Webuser854399 quits [Client Quit]
02:03:19	<klea>	--
02:03:50	<klea>	https://wiki.archiveteam.org/index.php/Dev/Tracker <- i just noticed the official Archive Team tracker is crossed off on that page, i wonder why the official AT tracker can't be published fully
02:05:03	<klea>	"The first line allows spawning maximum of 2 processes. The second line restarts Passenger after 10,000 requests to free memory caused by memory leaks. "
02:06:39	<BlankEclair>	https://wiki.archiveteam.org/index.php/Tracker#History: "Sometime in the late 2010s the open-source tracker was gradually replaced with the proprietary one"
02:08:07	<BlankEclair>	oh nvm, disregard me, i thought you were asking why it's not the official AT tracker ^^;
02:12:12	<nicolas17>	BlankEclair: maybe you can comment on the professionalism of using :3 in this channel (see above)
02:12:28	<BlankEclair>	i was tempted to interject, but i opted not to
02:12:33	<BlankEclair>	but since you prompted...
02:12:37	<nicolas17>	I think we need more :3 here
02:12:39	<nicolas17>	not less
02:12:39	<BlankEclair>	why nyot use :3?
02:12:52	<BlankEclair>	i once reported a security vulnyability entirely in UwUspeak
02:15:34		jinn6 quits [Quit: WeeChat 4.7.1]
02:15:50		jinn6 joins
02:54:30	<nulldata>	BlankEclair - https://www.youtube.com/watch?v=QXUSvSUsx80
03:09:09	<NatTheCat>	lol nicolas17 checks out.... how scummy
03:09:36	<NatTheCat>	and yes, very true. more :3 is never a bad thing
03:39:29		etnguyen03 quits [Client Quit]
03:44:34		etnguyen03 (etnguyen03) joins
04:05:44		etnguyen03 quits [Remote host closed the connection]
04:13:08		lennier2_ joins
04:16:07		lennier2 quits [Ping timeout: 272 seconds]
04:45:19		gosc joins
04:45:51	<gosc>	I wonder if there's a quicker way to get a large amount of webpages saved at the same time without having to ask here?
04:46:40	<gosc>	there used to be google sheets for wayback machine but they've since made it so that it would only run after like 2 days or something
04:47:54		Island quits [Read error: Connection reset by peer]
05:01:13		beastbg8 (beastbg8) joins
05:02:30		sec^nd quits [Remote host closed the connection]
05:02:59		sec^nd (second) joins
05:14:14		arch quits [Ping timeout: 256 seconds]
05:17:10		arch (arch) joins
05:59:03	<pabs>	gosc: the SPN email API still works
05:59:25	<pabs>	also asking for AB !ao < here works
06:19:22		DogsRNice_ quits [Read error: Connection reset by peer]
06:20:14		driib97 quits [Quit: Ping timeout (120 seconds)]
06:38:37		unknownsrc quits [Ping timeout: 272 seconds]
07:00:53		unknownsrc (unknownsrc) joins
07:24:39		Webuser760579 joins
07:25:42		Webuser760579 quits [Client Quit]
07:26:45		mcint quits [Ping timeout: 272 seconds]
07:27:03		mcint (mcint) joins
07:50:26		BennyOtt (BennyOtt) joins
09:04:17		valdikss quits [Ping timeout: 272 seconds]
09:05:32		valdikss joins
09:09:13		valdikss quits [Client Quit]
09:10:04		valdikss joins
09:32:44		Wohlstand (Wohlstand) joins
09:36:15		choochaa quits [Remote host closed the connection]
09:36:37		choochaa (choochaa) joins
09:38:36		HackMii quits [Remote host closed the connection]
09:38:54		HackMii (hacktheplanet) joins
10:02:40		skyrocket quits [Ping timeout: 256 seconds]
10:03:24		skyrocket joins
10:06:52		Afanasiy joins
10:07:12		nathang2184 quits [Ping timeout: 256 seconds]
10:08:20		Afanasiy quits [Client Quit]
10:08:40		Webuser327504 joins
10:08:59		Webuser327504 quits [Client Quit]
10:23:17		nathang2184 joins
10:34:46		cyanbox quits [Read error: Connection reset by peer]
11:13:07	<cruller>	arkiver: I asked KCN Kyoto about kinet-tv.ne.jp. They said the sites will be deleted.
11:13:18	<cruller>	Fortunately, the Google search results for site:http://www.kinet-tv.ne.jp return 1,320 results, indicating there are very few pages. Therefore, I'll create a page list (referencing https://wiki.archiveteam.org/index.php/Site_exploration).
11:16:05		evergreen5 quits [Quit: Bye]
11:16:41		evergreen5 joins
11:24:33		justaguy is now known as mystique_altrosky
11:47:50		Commander001 joins
12:00:03		Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:45		Bleo182600722719623455222 joins
12:21:48		Wohlstand quits [Client Quit]
12:22:04		Wohlstand (Wohlstand) joins
12:39:09		ymgve_ joins
12:43:25		ymgve quits [Ping timeout: 272 seconds]
13:04:04		colla is now authenticated as colla
13:14:59		Commander001 quits [Remote host closed the connection]
13:19:16		Commander001 joins
13:22:54		mystique_altrosky is now authenticated as mystique_altrosky
14:54:31		ThreeHM quits [Ping timeout: 272 seconds]
14:55:58		ThreeHM (ThreeHeadedMonkey) joins
15:02:19		gosc_1 joins
15:05:55		gosc quits [Ping timeout: 272 seconds]
16:13:41		aninternettroll quits [Ping timeout: 272 seconds]
16:17:05		aninternettroll (aninternettroll) joins
16:17:38		sg72 quits [Remote host closed the connection]
16:18:46		sg72 joins
17:00:00	<klea>	--
17:00:02	<klea>	I didn't know how big #archivebot's request count was but i made this and it helped me see how many reqs AB makes: websocat ws://archivebot.com:4568/ | jq -r '.job_data = {u: (.started_by//null), c: (.started_in//null), n: (.note//null), url: (.url//null), id: .ident} | "Queried \(.url) for \(.job_data.id) req by \(.job_data.u) in \(.job_data.c) for url(s) \(.job_url) with
17:00:04	<klea>	note: \(.job_data.note)"'
17:13:51		ThetaDev quits [Ping timeout: 272 seconds]
17:14:02		ThetaDev joins
17:20:56		Webuser852275 joins
17:22:42		Webuser852275 quits [Client Quit]
18:26:17		Cornelius quits [Quit: Cornelius]
18:27:08		Cornelius (Cornelius) joins
18:52:11	<Thibaultmol>	Q: are there backups of 3D print models from websites like printables? (Besides the thingiverse collection on archive.org itself, not sure how complete that even is)
18:55:11	<justauser|m>	archiveteam_thingiverse should be complete as of 2015.
18:56:46	<justauser|m>	archiveteam_googlepoly, remix3d.com_20191220000000, some WARCs in archiveteam_chromebot,
18:57:20	<justauser|m>	archiveteam_claraio, archiveteam_tinkercad_*...
18:59:14	<pokechu22>	I believe katia has been looking into that. I tried to do an archivebot job on their behalf but it ended up not working well because of rate-limits on the main site leading to fake 404s on valid URLs (even at a 4 second delay), but that was as a normal (mostly) recursive job of the frontend pages as opposed to just the models themselves
19:00:18	<katia>	Printables requires some hundreds of thousands of API requests for getting direct links
19:02:59		Cornelius quits [Client Quit]
19:03:54		Cornelius (Cornelius) joins
19:05:55	<katia>	But yes I’ve done printables in the past
19:06:12	<katia>	Via archivebot
19:06:33	<katia>	I got all models and PDFs for everything at the time
19:06:58	<katia>	Should do another run at some point
19:14:41		jspiros quits []
19:28:20		andrewnyr quits [Quit: Ping timeout (120 seconds)]
19:28:46		andrewnyr joins
19:38:08		gosc_1 quits [Quit: Leaving]
19:40:39		Cuphead2527480 (Cuphead2527480) joins
19:45:33		SootBector quits [Remote host closed the connection]
19:46:40		SootBector (SootBector) joins
20:00:21		cyanbox joins
20:22:19		kdy quits [Remote host closed the connection]
20:30:47		kdy (kdy) joins
20:32:45		DogsRNice joins
20:38:08		that_lurker quits [Remote host closed the connection]
20:40:03		jspiros (jspiros) joins
20:43:35		that_lurker (that_lurker) joins
21:08:45		MrMcNuggets quits [Quit: WeeChat 4.3.2]
21:11:36		MrMcNuggets (MrMcNuggets) joins
21:25:26		HP_Archivist quits [Quit: Leaving]
21:29:38		cmlow joins
21:30:35		TastyWiener95 quits [Quit: So long, farewell, auf wiedersehen, good night]
22:00:24		Cuphead2527480 quits [Client Quit]
22:04:20		HP_Archivist (HP_Archivist) joins
22:10:27	<Guest>	klea: the api doesn't have a rate limit. if you have enough concurrent downloads you can download the entire thing in a few hours (i believe between 30-50GB uncompressed).
22:18:03	<Guest>	was there anything happening to HN?
22:35:04		etnguyen03 (etnguyen03) joins
22:37:44		Island joins
23:04:13	<hexagonwin>	ah crap, even my browsertrix crashed. if nobody's interested guess i should try developing something..
23:06:18	<@JAA>	Browsertrix writes bad WARCs anyway doesn't it?
23:08:02	<hexagonwin>	idk, but it's still much better than doing nothing
23:09:10	<hexagonwin>	my prev message here 23h ago jic you missed it https://termbin.com/o7mq
23:09:37	<@JAA>	Yeah, I saw. Haven't had time to look into it myself.
23:12:20	<hexagonwin>	archivebot still at 327GB sadly (vs my now dead crawler 424GB)
23:20:05		Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
23:22:04		nomadgeek (nomadgeek) joins
23:28:49		nine quits [Quit: See ya!]
23:29:02		nine joins
23:29:02		nine is now authenticated as nine
23:29:02		nine quits [Changing host]
23:29:02		nine (nine) joins
23:38:48		superkuh_ joins
23:39:40	<hexagonwin>	not a good script, but this seems to work well https://termbin.com/gu15
23:40:24	<hexagonwin>	is there any way to have wget get multiple URLs in one run with different headers?
23:42:05		superkuh quits [Ping timeout: 272 seconds]
23:51:36		Guest58 joins
23:52:16		HugsNotDrugs` quits [Ping timeout: 256 seconds]
23:52:39		HugsNotDrugs joins
