#archiveteam-bs log for 2021-07-29

00:00:48		dm4v quits [Read error: Connection reset by peer]
00:03:09		dm4v joins
00:03:11		dm4v is now authenticated as dm4v
00:03:12		dm4v quits [Changing host]
00:03:12		dm4v (dm4v) joins
00:15:28		leo60228 (leo60228) joins
00:19:56		leo60228 quits [Ping timeout: 258 seconds]
01:01:15		dm4v quits [Read error: Connection reset by peer]
01:02:31		dm4v joins
01:02:33		dm4v is now authenticated as dm4v
01:02:33		dm4v quits [Changing host]
01:02:33		dm4v (dm4v) joins
01:39:56		Wayward (wayward) joins
02:17:13		Krownest (Krownest) joins
02:20:18		Krownest1 quits [Ping timeout: 258 seconds]
02:20:58		Jake0 (Jake) joins
02:21:27		britmob quits [Ping timeout: 258 seconds]
02:21:28		Jake quits [Ping timeout: 250 seconds]
02:21:28		Jake0 is now known as Jake
02:53:41		HP_Archivist (HP_Archivist) joins
03:12:52		AlsoHP_Archivist joins
03:16:16		HP_Archivist quits [Ping timeout: 258 seconds]
03:31:56		superkuh_ quits [Remote host closed the connection]
03:32:11		superkuh_ joins
03:33:57		wizards quits [Remote host closed the connection]
03:44:42	<@JAA>	I looked a bit into Windows Community. Ugly. Comments are loaded via GET XHR, pagination works via a custom 'token' header. Obviously definitely won't work in the WBM.
03:45:47		OrIdow6 (OrIdow6) joins
03:45:47		@ChanServ sets mode: +o OrIdow6
03:46:08	<@JAA>	But I'll try to get a copy of all conversations later.
03:46:52	<@JAA>	(Assuming it doesn't vanish in the meantime. They haven't given a concrete deadline.)
03:49:26		OrIdow6^2 (OrIdow6) joins
03:49:26		@ChanServ sets mode: +o OrIdow6^2
03:50:21		qw3rty_ joins
03:52:02		@OrIdow6 quits [Ping timeout: 250 seconds]
03:54:13		qw3rty__ quits [Ping timeout: 258 seconds]
03:59:36		@OrIdow6^2 is now known as @OrIdow6
04:17:51		nertzy quits [Read error: Connection reset by peer]
04:18:53		nertzy joins
04:18:54		nertzy is now authenticated as nertzy
04:52:50		AlsoHP_Archivist quits [Client Quit]
04:53:07		HP_Archivist (HP_Archivist) joins
05:11:51		Video joins
05:14:44		Megame (Megame) joins
05:46:32		HP_Archivist quits [Ping timeout: 258 seconds]
06:38:52		HP_Archivist (HP_Archivist) joins
06:39:30	<@OrIdow6>	Video: You can obviously share it here or on the wiki
06:40:30	<@OrIdow6>	Keep in mind that, unless it's not possible, ArchiveTeam likes to do full capture to WARCs instead of just scraping - so if you know of an instance of this forum software that is going down, a different script would likely be made for ArchiveTeam use
06:44:06	<@OrIdow6>	Especially if it's an API as opposed to the regular interface
06:44:43	<@OrIdow6>	But knowledge of structure, and components that discover content as opposed to saving it, are important
06:54:40	<Video>	OrIdow6: does the fact that flarum forums tend to have that "endless scrolling" for threads/posts change anything
07:00:49		fuzzy8021 quits [Read error: Connection reset by peer]
07:01:15		fuzzy8021 (fuzzy8021) joins
07:07:24	<@OrIdow6>	Video: Depends how it's implemented
07:07:50	<@OrIdow6>	E.g. GET with offset is fine; POST with JS-generated session key thing is not
07:09:58	<@OrIdow6>	"Fine" in the sense that WARCs will both capture and play back well
07:11:17	<Video>	for the API: when querying a thread, the api stores a list of ids for each post in the JSON response, which you have to throw into a separate request (i believe it was like /api/posts) and specify each post id in one continuous string
07:20:12	<@OrIdow6>	Could work, sounds like it depends on the /api/posts requests - if it's POST it won't play back (rather, it will, but just in a broken way); or if it's GET and the order is randomized or it has a timestamp; but basically if it's JUST a transformed version of another response it would play back
07:20:20	<@OrIdow6>	Deterministically
07:21:22	<@OrIdow6>	I won't discourage you from writing your own script - even on "no chance" things ArchiveTeam (and especially me, due to what I've ended up sort of specializing in) likes to use WARC because there's already a bunch of infrastructure in place for handling it
07:22:00	<@OrIdow6>	That's to say, don't think the ArchiveTeam way is necessarily the only adequate way
07:22:36	<@OrIdow6>	And obviously even when you're working with an API there's a benefit to capturing the response headers etc
07:26:02		FalconK quits [Quit: WeeChat 3.2]
07:26:44		FalconK (FalconK) joins
07:38:28		Krownest quits [Ping timeout: 258 seconds]
07:57:18		nertzy quits [Ping timeout: 250 seconds]
08:00:18	<Video>	OrIdow6: apologies for the delay - flarum's api does do things in GET requests. you can see an example of an API response at https://discuss.flarum.org/api/discussions/27852
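
The two-step lookup Video describes (read the post ids from a discussion response, then request them together in one GET) can be sketched in Python. This is only an illustration under stated assumptions: the JSON layout (`data.relationships.posts.data`) and the `filter[id]` query parameter are assumed rather than confirmed in the chat, and `posts_url` is a hypothetical helper name. Because the URL is derived purely from the earlier response, with no timestamp or random ordering, it is the kind of GET request OrIdow6 says should play back deterministically.

```python
def posts_url(base_url: str, discussion: dict) -> str:
    """Build one GET URL requesting every post listed in a discussion response."""
    # Assumed layout: data.relationships.posts.data is a list of
    # {"type": "posts", "id": "..."} objects.
    refs = discussion["data"]["relationships"]["posts"]["data"]
    # Keep the ids in response order so the same input always gives the same URL.
    ids = [str(ref["id"]) for ref in refs]
    # Assumed parameter name; the chat only says the ids are sent
    # as "one continuous string".
    return f"{base_url.rstrip('/')}/api/posts?filter[id]={','.join(ids)}"
```
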
08:07:51		qwertyasdfuiopghjkl joins
08:25:02		HP_Archivist quits [Ping timeout: 250 seconds]
08:37:48	<@OrIdow6>	Video: Trying it in a normal page (not API), seems fine
08:38:07	<@OrIdow6>	But anyhow, is there a specific site running this software going down? Or this is just general?
08:38:20	<Video>	this is just general afaik
08:38:50	<Video>	in a community i'm a part of, the previous owner completely nuked its forum, which ran this software, and there was no archive of it
08:43:09		jamesatjaminit (jamesatjaminit) joins
08:45:25	<@OrIdow6>	Oh
08:50:15	<Video>	i wrote the script to ensure this could never happen again
08:51:40	<@OrIdow6>	Nothing to prevent you from running it yourself, or getting other people here to help run it, but as an overall project that makes it low-priority
08:51:59	<@OrIdow6>	Someday hopefully we will have a project to scrape all forums on the web at #msgbored
08:52:47	<@OrIdow6>	Also, looks like the site works without Javascript, so if you want to get an instance into the Wayback Machine, albeit in an older-looking form (and perhaps with reduced functionality), #archivebot should work
08:54:12	<@OrIdow6>	Anyway always possible someone (including me, who knows) might want to work on it anyway
08:54:25	<@OrIdow6>	But overall low-priority
09:00:16		jamesatjaminit quits [Client Quit]
09:41:18		Krownest (Krownest) joins
09:47:18		mutantm0nkey (mutantmonkey) joins
09:49:57		mutantmnky quits [Ping timeout: 258 seconds]
09:55:54		qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
10:49:46		Stilett0 quits [Ping timeout: 250 seconds]
11:16:30		Dj-Wawa quits [Quit: Dj-Wawa]
11:16:55		Dj-Wawa joins
11:16:55		Dj-Wawa is now authenticated as Dj-Wawa
11:25:31		Dj-Wawa quits [Client Quit]
11:26:12		Dj-Wawa joins
11:26:12		Dj-Wawa is now authenticated as Dj-Wawa
11:54:40	<luckcolors>	hey people. I'm trying to archive a login-walled google site i have access to
11:54:49	<luckcolors>	i'm using grab-site
11:55:27	<luckcolors>	I've made a netscape cookies jar with my session cookies and i've fed it to wpull via the flag
11:55:42	<luckcolors>	and it is using the cookies i can see it in the warc recording
11:56:21	<luckcolors>	but google still thinks i'm not logged in as on the first request it immediately redirects to the login page
11:56:58	<luckcolors>	any gotchas?
11:57:10	<luckcolors>	i've already tried changing the User-agent
12:09:27	<ArchivalEfforts>	luckcolors How did you create the cookie file? I recently had issues when the extension I used prefixed some cookies with "#HttpOnly_".
12:09:30	<ArchivalEfforts>	After I removed that it worked.
12:13:13	<luckcolors>	I actually switched between 3 different extensions
12:13:55	<luckcolors>	since this one is broken with the "containers" feature https://addons.mozilla.org/it/firefox/addon/cookies-txt/
12:14:46	<luckcolors>	this one creates a .txt file which for some reason wpull can't parse properly
12:14:47	<luckcolors>	https://addons.mozilla.org/it/firefox/addon/cookie-quick-manager/
12:16:15	<luckcolors>	Then i've used this one which seems to work: https://addons.mozilla.org/it/firefox/addon/export-cookies-txt/
12:16:38	<luckcolors>	But yeah now that i look at it it does add the "#HttpOnly_"
12:16:48	<luckcolors>	I'll try removing them and see if it works
12:27:06	<luckcolors>	ArchivalEfforts: thank you man. you saved me from going insane
12:27:07	<luckcolors>	XD
12:27:16	<luckcolors>	it's actually working
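
The fix ArchivalEfforts describes (removing the "#HttpOnly_" prefix that some export extensions put in front of cookie lines) can be sketched in Python. This is a minimal sketch, not part of grab-site or wpull: `strip_httponly_prefix` is a hypothetical helper, and it assumes the file is a Netscape-format cookies.txt in which a parser would otherwise treat every line starting with "#" as a comment and skip the cookie.

```python
HTTPONLY_PREFIX = "#HttpOnly_"


def strip_httponly_prefix(cookies_txt: str) -> str:
    """Return cookies.txt content with the "#HttpOnly_" line prefix removed."""
    cleaned = []
    for line in cookies_txt.splitlines():
        # Only the special prefix is stripped; ordinary "#" comment lines,
        # such as the file header, are left untouched.
        if line.startswith(HTTPONLY_PREFIX):
            line = line[len(HTTPONLY_PREFIX):]
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

Reading the exported file, passing its text through this function, and writing the result to a new file gives a jar in which the affected cookies are ordinary entries.
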
12:27:43	<luckcolors>	now to limit it so it doesn't actually try to change account
12:28:29	<luckcolors>	how can i regex /?authuser=1
12:28:44	<ArchivalEfforts>	Glad I could help, took me a while to figure that out when I ran into it
12:29:27	<ArchivalEfforts>	Sorry, can't help with regex
12:29:39	<luckcolors>	no worries
12:33:53		BlueMaxima joins
12:43:11		wizards joins
12:45:08	<@OrIdow6>	I think the only char you need to escape there is the ?
12:48:13	<luckcolors>	i'm just using this as regex
12:48:14	<luckcolors>	authuser=\d
12:48:18	<luckcolors>	should suffice
12:48:44	<luckcolors>	i kinda forgot to only ignore that particular page EG.
12:48:46	<luckcolors>	^https?://sites\.google\.com/site
12:48:57	<luckcolors>	i don't want to append the trailing / right?
12:49:12	<luckcolors>	else it will not fetch any path longer than that
12:49:56	<luckcolors>	like this will ignore anything under /site ^https?://sites\.google\.com/site/
12:52:31	<luckcolors>	ok no this is not how exact matches are made
12:52:51		luckcolors opens dusty regex book
12:59:03	<luckcolors>	ended up using ^https?://sites\.google\.com/site$ which i think is not ideal since it won't ignore potential url params
12:59:08	<luckcolors>	but works so
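
The two ignore patterns discussed above can be checked quickly in Python. This is a sketch under assumptions: it assumes the ignore patterns are Python-style regular expressions searched against the full URL, the optional `(?:\?.*)?` group is one possible way to also cover URL parameters on the exact /site page, and `is_ignored` and the example paths are hypothetical.

```python
import re

# Matches an authuser parameter anywhere in the query string.
AUTHUSER = re.compile(r"[?&]authuser=\d+")

# Matches exactly /site, with or without a query string, but not /site/...
SITE_ROOT = re.compile(r"^https?://sites\.google\.com/site(?:\?.*)?$")


def is_ignored(url: str) -> bool:
    """Return True if either ignore pattern matches the URL."""
    return bool(AUTHUSER.search(url) or SITE_ROOT.search(url))
```

With these patterns, https://sites.google.com/site and the same URL with a query string are ignored, while longer paths under /site/ are still fetched unless they carry an authuser parameter.
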
13:21:07		rewby quits [Remote host closed the connection]
13:21:22		rewby (rewby) joins
13:34:12		Video quits [Ping timeout: 258 seconds]
14:15:59		sdomi quits [Ping timeout: 258 seconds]
14:17:23		britmob256 joins
14:20:00		britmob joins
16:05:37		Arcorann quits [Ping timeout: 258 seconds]
16:09:26		CookMePlox joins
16:12:28	<CookMePlox>	Hey folks! Does anyone have suggestions for getting Internet Archive to respond to an email request? I've convinced the owner of runescape.com to ask for their site to get un-blacklisted from the wayback machine, but they sent something to info@archive.org and haven't heard back in 6 months
16:13:46		bradp is now authenticated as bradp
16:15:01	<h2ibot>	Anput uploaded File:My folder.png: https://wiki.archiveteam.org/?title=File%3AMy%20folder.png
16:34:49		lennier1 quits [Client Quit]
16:45:06		Krownest quits [Ping timeout: 258 seconds]
16:51:52	<thuban>	http://www.rrrrthats5rs.com/ :<
17:00:24		Krownest (Krownest) joins
17:10:03		Matthww8 quits [Quit: Ping timeout (120 seconds)]
17:10:26		Matthww8 joins
17:13:28		CookMePlox quits [Remote host closed the connection]
17:18:06		BlueMaxima quits [Read error: Connection reset by peer]
17:20:25	<@HCross>	arkiver: - see above from CookMePlox who isn't here any more
17:51:36		HP_Archivist (HP_Archivist) joins
17:54:17		lennier1 (lennier1) joins
17:58:26		Stiletto joins
18:02:40		grafck quits [Ping timeout: 250 seconds]
18:13:37		Daloader joins
18:44:38		Daloader quits [Client Quit]
18:58:08	<@arkiver>	HCross: checking
18:58:17	<@arkiver>	ouch
18:58:35	<@arkiver>	HCross: thanks, will ping internally
19:19:12		driib quits [Client Quit]
19:19:27		driib (driib) joins
19:22:08		Minkafighter2 quits [Quit: The Lounge - https://thelounge.chat]
19:22:39		Minkafighter2 joins
19:48:56		JackeithWelley quits [Remote host closed the connection]
20:09:45		jonboy3452 joins
20:09:45		Jonboy3451 quits [Read error: Connection reset by peer]
20:35:38		gazorpazorp quits [Ping timeout: 250 seconds]
20:42:00		HP_Archivist quits [Client Quit]
20:56:08		average_student joins
21:01:48		HP_Archivist (HP_Archivist) joins
21:22:53		Megame quits [Client Quit]
21:30:46		gazorpazorp joins
21:36:28		Matthww88 joins
21:37:12		Matthww8 quits [Ping timeout: 258 seconds]
21:37:12		Matthww88 is now known as Matthww8
21:46:12		Video joins
22:02:29		leo60228 (leo60228) joins
22:12:28		leo60228 quits [Ping timeout: 258 seconds]
22:19:50		qwertyasdfuiopghjkl joins
22:23:47		leo60228 (leo60228) joins
22:27:53	<Stiletto>	<@JAA> Per the notice on https://3dwarehouse.sketchup.com/: 'Heads up! After July 27, SketchUp 2017 models will no longer be available for download on 3D Warehouse unless it was originally uploaded in that format.'
22:28:19	<Stiletto>	not sure if anyone was actually looking more into this, but it seems they have extended the deadline to August 11.
22:53:53		qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:57:32		qwertyasdfuiopghjkl joins
23:23:51	<Jake>	(I was a little bit, but the API was VERY slow, like 3 minutes per request slow...)
23:23:59	<Jake>	Glad to see they added some more time, I'll take a look again tonight.
23:42:27		lennier1 quits [Client Quit]
23:42:51		lennier1 (lennier1) joins
23:43:36		qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]