00:08:51 | | mr_sarge (sarge) joins |
00:10:50 | | sarge quits [Ping timeout: 240 seconds] |
00:20:10 | | Doran quits [Remote host closed the connection] |
00:20:39 | | BlueMaxima quits [Read error: Connection reset by peer] |
00:20:47 | | Doranwen (Doranwen) joins |
00:20:48 | | BlueMaxima joins |
00:22:01 | | Doranwen quits [Remote host closed the connection] |
00:22:31 | | Doranwen (Doranwen) joins |
00:23:42 | | Doranwen quits [Remote host closed the connection] |
00:25:13 | | Doranwen (Doranwen) joins |
00:44:13 | | eroc19907 (eroc1990) joins |
00:44:50 | | eroc1990 quits [Ping timeout: 240 seconds] |
00:53:50 | | sdomi quits [Ping timeout: 240 seconds] |
00:56:21 | | sdomi (sdomi) joins |
01:02:57 | | thuban quits [Ping timeout: 272 seconds] |
01:33:46 | | Inti83 joins |
01:33:47 | <eggdrop> | [tell] Inti83: [2023-12-03T19:51:39Z] <JAA> https://hackint.logs.kiska.pw/archiveteam-bs/20231203#c393229 |
01:34:07 | <Inti83> | Thanks XD was just reading logs |
01:36:24 | <Inti83> | thanks - we did see that cont.ar and cine.ar have cloudflare, someone here said they may have a local contact in Argentina Cabase IXP |
01:37:00 | <Inti83> | We are doing all we can with grab-site - thanks for the testing tool, it helps!
01:38:08 | <Inti83> | Also, we were wondering if there is anything that can help so that grab-site doesn't start from the beginning when it fails - is there some flag to start from where it left off? We couldn't find anything in the docs
01:38:58 | <@JAA> | That is a *long*-standing wishlist entry: https://github.com/ArchiveTeam/grab-site/issues/58 |
01:39:04 | <@JAA> | So, no. |
01:40:12 | | Inti83 quits [Remote host closed the connection] |
01:42:50 | | eroc19907 quits [Ping timeout: 240 seconds] |
01:43:38 | | eroc1990 (eroc1990) joins |
02:05:50 | | kitonthe1et quits [Ping timeout: 240 seconds] |
02:08:13 | | kitonthe1et joins |
02:16:25 | | kitonthe1et quits [Ping timeout: 272 seconds] |
02:36:33 | | kitonthenet joins |
02:41:45 | | kitonthenet quits [Ping timeout: 272 seconds] |
02:41:51 | | Megame quits [Client Quit] |
03:01:03 | | simon8162 quits [Quit: ZNC 1.8.2 - https://znc.in] |
03:21:54 | | kitonthenet joins |
03:23:17 | | Irenes quits [Client Quit] |
03:29:02 | | Wohlstand quits [Client Quit] |
03:30:17 | | Irenes (ireneista) joins |
03:30:31 | | kitonthenet quits [Ping timeout: 272 seconds] |
03:33:25 | | icedice quits [Client Quit] |
03:34:10 | | simon816 (simon816) joins |
03:38:48 | | katocala quits [Remote host closed the connection] |
03:40:38 | | katocala joins |
03:40:38 | | katocala is now authenticated as katocala |
03:47:18 | | katocala quits [Remote host closed the connection] |
03:52:30 | <pabs> | nicolas17: #gitgud and archive.softwareheritage.org for GitHub repos :) (and #codearchiver for other git repos) |
03:54:14 | <h2ibot> | Inti83 edited Argentina (-4, /* Guidelines for Adding Websites */): https://wiki.archiveteam.org/?diff=51252&oldid=51242 |
03:56:20 | | Earendil7 quits [Ping timeout: 240 seconds] |
03:57:38 | | katocala joins |
03:57:38 | | katocala is now authenticated as katocala |
04:26:43 | | Earendil7 (Earendil7) joins |
04:37:10 | | kitonthe2et joins |
04:38:32 | | thuban joins |
04:41:20 | | kitonthe2et quits [Ping timeout: 240 seconds] |
04:44:06 | <Naruyoko> | https://twitter.com/kogekidogso This account is now inactive. It looks like it was saved with ArchiveBot when I brought this up last week, but the associated querie.me account might not be.
04:44:07 | <eggdrop> | nitter: https://nitter.net/kogekidogso |
04:45:13 | <Naruyoko> | (And peing) |
04:45:21 | <@JAA> | Yes, it was run through ArchiveBot, but that was only a very superficial crawl. I'll rerun it as soon as there's space. |
04:45:36 | <Naruyoko> | Thank you |
04:45:51 | <@JAA> | Querie.me looks scripty. |
04:47:07 | <@JAA> | I can't seem to get to any account page there...? Or is it just a matter of following the links in their tweets? |
04:48:39 | <Naruyoko> | You can get a list of their answers as "recent answers" here: https://querie.me/user/r1OYTzyfrTY0Fn4ZIBXI4nEJPs63/recent |
04:48:51 | <Naruyoko> | However, they use an infinite scrolling thing
04:49:09 | <@JAA> | Yeah, that page is entirely useless without JavaScript, so archiving it is going to be difficult. |
04:51:47 | <nicolas17> | hm |
04:51:57 | <Naruyoko> | I'm now seeing if I can load the list by holding down arrow |
04:52:44 | <@JAA> | I'm trying to do some curl magic. |
04:52:52 | <nicolas17> | JAA: suppose I write code to archive a querie.me page, by parsing the JS crap if needed to figure out what URLs to recurse into |
04:53:00 | | parfait (kdqep) joins |
04:53:01 | <@JAA> | They only load 5 answers per request by default, but you can do far more. |
04:53:26 | <@JAA> | 1000 is slow but works. :-) |
04:53:34 | <nicolas17> | we're not mass-archiving the entire site so this is not a DPoS project, just for one-off pages |
04:53:54 | <nicolas17> | how should I write that code? would a wget-at lua script be appropriate anyway? |
04:53:54 | <@JAA> | No extra URLs need to be fetched for the individual answers, it seems. |
04:54:14 | <@JAA> | So I'll just do the user page API crap and then throw the answer page URLs into AB. |
04:54:51 | <Naruyoko> | I see, loading 1000 at once is much more efficient than me scrolling down endlessly |
04:55:06 | <nicolas17> | ah hm I guess it could be a script in any technology, that produces a URL list for archivebot |
04:58:23 | <@JAA> | There are over 2000 answers, so yes. :-) |
04:59:23 | <@JAA> | The very technologically advanced extraction: |
04:59:31 | <@JAA> | `function querie { pp="$1"; curl "https://querie.me/api/qas?kind=recent&count=1000&userId=r1OYTzyfrTY0Fn4ZIBXI4nEJPs63${pp}" -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/120.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Referer: https://querie.me/user/r1OYTzyfrTY0Fn4ZIBXI4nEJPs63/recent' --compressed -s | tee |
04:59:37 | <@JAA> | "/tmp/querie-json-${pp}" | jq -r '.[] | .id' >"/tmp/querie-ids-${pp}"; ls -al /tmp/querie-{json,ids}-"${pp}"; pp="&startAfterId=$(tail -n 1 "/tmp/querie-ids-${pp}")"; printf '%q\n' "$pp"; }` |
05:00:10 | <@JAA> | No loop because I wasn't sure how it'd behave at the end. |
05:00:22 | <@JAA> | (It returns an empty array then.) |
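For reference, the pagination logic in JAA's shell function can be sketched as a loop: keep requesting pages and feed the last returned `id` back as `startAfterId` until the API hands back an empty array. The endpoint and parameter names come from the one-liner above; the page fetcher is injected so the sketch doesn't depend on the live site, and `fetch_all_answers` is a hypothetical name, not part of JAA's actual script.

```python
def fetch_all_answers(fetch_page):
    """Cursor-paginate until the API returns an empty array.

    fetch_page(start_after_id) must return the decoded JSON array for
    https://querie.me/api/qas?kind=recent&count=1000&userId=<uid>
    (plus &startAfterId=<id> when start_after_id is not None).
    """
    answers = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        if not page:                 # empty array signals the end of the list
            break
        answers.extend(page)
        cursor = page[-1]["id"]      # last id on this page becomes the cursor
    return answers
```

With a mock fetcher standing in for `curl`, the loop terminates exactly when the API behaves as observed (empty array at the end).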
05:05:39 | <Naruyoko> | It looks like you'll get an empty array as the response at the end, from observing another user
05:06:12 | <@arkiver> | there must be some prize we can award JAA for that one-liner |
05:06:32 | <@JAA> | arkiver: This isn't even my final form. |
05:06:42 | <@arkiver> | oh no |
05:08:53 | <@JAA> | This is the one-liner I'm most proud of so far: `try-except` in a single line in Python: https://web.archive.org/web/20230311201616/https://bpa.st/657RU |
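The paste itself isn't quoted in the log, so as a hedged illustration only: two well-known ways to collapse exception handling onto a single line in Python, almost certainly simpler than whatever the linked one-liner does. `safe_int` and `div_or_default` are made-up example names.

```python
from contextlib import suppress

def safe_int(s):
    # contextlib.suppress turns a try/except/pass into a single line.
    result = None
    with suppress(ValueError): result = int(s)
    return result

def div_or_default(a, b):
    # exec() smuggles a full try/except block into one statement; the walrus
    # expression builds the locals dict in-place so results can be read back.
    exec("try:\n    r = a / b\nexcept ZeroDivisionError:\n    r = None",
         None, (ns := {"a": a, "b": b}))
    return ns["r"]
```

The `suppress` form only covers try/except-pass; the `exec` form is the general (and uglier) escape hatch, since the grammar otherwise forces `try` and `except` onto separate lines.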
05:08:56 | <Naruyoko> | Meanwhile, peing.net simply has a page number (3/page) |
05:09:39 | <@arkiver> | i think we now have a bot for queuing for all long term channels except #shreddit (which by default gets everything) |
05:10:12 | <@arkiver> | JAA: well... congrats i guess :P |
05:10:32 | <@arkiver> | Naruyoko: what is up with peing.net? |
05:10:49 | <Naruyoko> | The person has an account there too |
05:13:19 | <Naruyoko> | I don't know how well the individual pages save, since it's excluded |
05:13:39 | <@JAA> | arkiver: I *think* you can rewrite any Python code as a single line. Pattern matching and exception groups are hard, but it should be possible. I have this idea of writing a tool to do the conversion. Maybe when I'm retired or something. :-P |
05:14:02 | <@arkiver> | JAA: conversion to one-liners? |
05:14:04 | <@arkiver> | or from |
05:14:12 | <@JAA> | To |
05:14:15 | <@arkiver> | oh no |
05:14:17 | <@JAA> | :-) |
05:14:24 | <@arkiver> | thisisfine |
05:14:34 | <project10> | no god, please, no... |
05:14:42 | <@JAA> | We'll be able to ship our pipeline.py as a single line! Imagine the savings from not having to store all those LF characters! |
05:14:49 | <@arkiver> | 79 chars please! |
05:15:04 | <@JAA> | The 80s called, they want their monitor back. :-P |
05:15:08 | <@arkiver> | i feel an old discussion coming up |
05:15:13 | <@JAA> | :-) |
05:15:37 | | @arkiver and JAA have fundamental differences when it comes to Python line length |
05:15:46 | <@JAA> | Nah, it's obviously just for fun to prove that it's possible. The Python grammar very much requires separate lines in several places. `try-except` is one of them. |
05:15:54 | <@JAA> | Hence that complicated one-liner to still do it. |
05:16:03 | <@arkiver> | maybe we'll make JAA into one line |
05:16:12 | <@arkiver> | one dimensional JAA |
05:16:16 | <@arkiver> | no more 3 dimensions |
05:16:31 | <@arkiver> | just for fun :) |
05:16:48 | <@JAA> | Like your Python code is basically one-dimensional because it has no width? :-) |
05:17:14 | <@arkiver> | like my Python code is basically one-dimensional because it has no width, exactly! |
05:17:53 | <@JAA> | No depth to it either I guess. :-P |
05:18:56 | <@arkiver> | yes |
05:19:04 | <@arkiver> | nice clean python code without personality |
05:24:29 | <@JAA> | Here's the API data from Querie for that account as JSONL, because I had it anyway: https://transfer.archivete.am/whGJs/querie.me_user_r1OYTzyfrTY0Fn4ZIBXI4nEJPs63.jsonl.zst |
05:24:57 | | kitonthe2et joins |
05:25:28 | <@JAA> | Produced by concatenating the querie-json-* files in the right order + `jq -c '.[]'` |
05:25:43 | <@JAA> | (Yes, this could be done better, and I would if I had to do this more than once.) |
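That post-processing step (concatenating the per-page JSON array files in order and flattening them with `jq -c '.[]'`) could equivalently be sketched in Python; `arrays_to_jsonl` is a hypothetical name, not something from JAA's setup.

```python
import json

def arrays_to_jsonl(json_texts):
    """Flatten several JSON-array documents (in order) into JSONL,
    one compact JSON object per line - the rough equivalent of
    concatenating the files and piping them through `jq -c '.[]'`."""
    lines = []
    for text in json_texts:
        for obj in json.loads(text):
            lines.append(json.dumps(obj, separators=(",", ":")))
    return "\n".join(lines)
```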
05:27:13 | <@JAA> | The job for those 2003 answers is running now. |
05:29:20 | | kitonthe2et quits [Ping timeout: 240 seconds] |
05:38:47 | | eroc1990 quits [Client Quit] |
05:46:43 | | eroc1990 (eroc1990) joins |
05:46:55 | | Earendil7 quits [Client Quit] |
05:47:57 | | benjins quits [Ping timeout: 272 seconds] |
06:04:05 | | Arcorann (Arcorann) joins |
06:24:13 | | Island quits [Read error: Connection reset by peer] |
06:42:24 | <Naruyoko> | https://transfer.archivete.am/STyXI/peing.net_18kogekisoccer.txt |
06:42:25 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/STyXI/peing.net_18kogekisoccer.txt |
06:53:16 | | DogsRNice quits [Read error: Connection reset by peer] |
07:03:07 | <@JAA> | Naruyoko: Thanks, done. |
07:05:20 | | aninternettroll quits [Ping timeout: 240 seconds] |
07:16:04 | | aninternettroll (aninternettroll) joins |
08:00:22 | | nfriedly quits [Remote host closed the connection] |
08:05:50 | | Barto quits [Ping timeout: 240 seconds] |
08:25:36 | | c3manu (c3manu) joins |
08:44:20 | | lennier2 quits [Ping timeout: 240 seconds] |
08:45:53 | | Arcorann quits [Ping timeout: 265 seconds] |
08:47:27 | | Arcorann (Arcorann) joins |
09:18:57 | | nconturbines joins |
09:19:27 | | nconturbines leaves |
09:30:30 | | decky_e joins |
09:33:20 | | decky quits [Ping timeout: 240 seconds] |
10:00:02 | | Bleo1826 quits [Client Quit] |
10:01:21 | | Bleo1826 joins |
10:04:18 | | benjins joins |
10:48:50 | | parfait quits [Ping timeout: 240 seconds] |
11:27:04 | | BlueMaxima quits [Read error: Connection reset by peer] |
11:51:54 | | nfriedly joins |
11:53:17 | | Wohlstand (Wohlstand) joins |
12:13:32 | | Wohlstand quits [Client Quit] |
12:33:32 | | Arcorann quits [Ping timeout: 265 seconds] |
12:45:14 | | Wohlstand (Wohlstand) joins |
12:56:57 | | katia quits [Quit: o.o] |
12:59:14 | | katia (katia) joins |
12:59:24 | | redbees quits [Quit: ZNC 1.7.5+deb4 - https://znc.in] |
12:59:32 | | redbees joins |
12:59:33 | | Hecz quits [Remote host closed the connection] |
12:59:33 | | Sluggs joins |
13:00:41 | | Sluggs_ quits [Read error: Connection reset by peer] |
13:02:16 | | Hecz joins |
13:02:17 | | Hecz is now authenticated as Hecz |
13:02:17 | | Hecz quits [Changing host] |
13:02:17 | | Hecz (Hecz) joins |
13:22:20 | | Megame (Megame) joins |
13:56:51 | <qwertyasdfuiopghjkl> | http://fileformats.archiveteam.org/ is down |
14:01:32 | | Megame1_ (Megame) joins |
14:02:16 | | Megame1_ quits [Remote host closed the connection] |
14:02:32 | | Megame1_ (Megame) joins |
14:02:43 | | kitonthe1et joins |
14:04:50 | | Megame quits [Ping timeout: 240 seconds] |
14:05:30 | | Megame1_ is now known as Megame |
14:07:39 | | kitonthe1et quits [Ping timeout: 272 seconds] |
14:37:39 | <@arkiver> | thanks qwertyasdfuiopghjkl , i reported it to someone who may be able to fix it |
14:43:36 | | kitonthe2et joins |
14:53:26 | | mindstrut joins |
14:54:31 | | kitonthe2et quits [Ping timeout: 272 seconds] |
15:05:17 | | Perk quits [Ping timeout: 272 seconds] |
15:07:51 | | Perk joins |
15:08:30 | <project10> | perhaps we could add it to #nodeping? |
15:17:19 | | Megame quits [Ping timeout: 272 seconds] |
15:19:01 | | Megame (Megame) joins |
15:20:22 | | kitonthe1et joins |
15:27:20 | | kitonthe1et quits [Ping timeout: 240 seconds] |
15:31:48 | | aninternettroll_ (aninternettroll) joins |
15:31:48 | | aninternettroll quits [Read error: Connection reset by peer] |
15:31:58 | | aninternettroll_ is now known as aninternettroll |
15:32:42 | <h2ibot> | Megame edited Deathwatch (+130, http://www.baanboard.com/ - Dec 31): https://wiki.archiveteam.org/?diff=51253&oldid=51232 |
15:44:54 | | lennier2 joins |
15:48:47 | | kitonthe2et joins |
15:51:59 | | angenieux (angenieux) joins |
15:55:20 | | kitonthe2et quits [Ping timeout: 240 seconds] |
16:03:54 | | Island joins |
16:11:18 | | Megame quits [Client Quit] |
17:15:00 | <Pedrosso> | https://www.fz.se/ supposedly existed since 1996, might it be a good idea to have a proactive grab? |
17:52:12 | | useretail quits [Quit: Leaving] |
17:52:31 | | useretail joins |
18:00:07 | <TheTechRobo> | digitize.archiveteam.org is also (still) down |
18:01:04 | <@JAA> | digitize.archiveteam.org is permanently down, and its contents were integrated into the main wiki years ago, from what I've been told. |
18:22:34 | | Barto (Barto) joins |
18:22:53 | | sknebel quits [Ping timeout: 272 seconds] |
18:24:01 | | sknebel (sknebel) joins |
18:37:27 | | Barto quits [Ping timeout: 272 seconds] |
19:01:57 | | e853962747e3759 joins |
19:03:03 | <e853962747e3759> | Hi, can anyone please teach me how to add a webpage to the wayback machine crawls? At least the main website every 6 hours. If the linked pages from it could be auto-archived also, even better . maybe a check for each page if anything significant has changed instead of making too many duplicates |
19:06:04 | <that_lurker> | why do you want it be crawled so often |
19:10:37 | <nicolas17> | yesterday savepagenow had a global backlog of like 18 hours lol |
19:10:45 | | Barto (Barto) joins |
19:14:05 | <@JAA> | e853962747e3759: Were you here a couple days ago already? |
19:14:42 | | e853962747e3759 quits [Ping timeout: 265 seconds] |
19:14:50 | <that_lurker> | god dammit you spooked them |
19:14:57 | <that_lurker> | :P |
19:15:07 | <@JAA> | lol |
19:15:29 | <that_lurker> | I was so ready to give them my "instructions.jpg" :P
19:18:03 | <fireonlive> | ;d |
19:18:30 | <fireonlive> | webchat needs like a 'btw if you leave this in a background tab you'll disconnect' |
19:18:42 | <fireonlive> | gone are the times of tabs just having fun 24/7 |
19:19:50 | | Barto quits [Ping timeout: 240 seconds] |
19:37:22 | | Megame (Megame) joins |
19:54:08 | | Barto (Barto) joins |
19:54:11 | | bladem (bladem) joins |
20:03:34 | | Island quits [Read error: Connection reset by peer] |
20:03:54 | | Island joins |
20:04:17 | | Megame quits [Client Quit] |
20:20:51 | | AlsoHP_Archivist joins |
20:25:07 | | HP_Archivist quits [Ping timeout: 272 seconds] |
20:29:19 | | kitonthenet joins |
20:33:50 | | kitonthenet quits [Ping timeout: 240 seconds] |
20:37:28 | | e853962747e3759 joins |
20:38:34 | <e853962747e3759> | Hi, can anyone please teach me how to add a webpage to the wayback machine crawls? At least the main website every 6 hours. If the linked pages from it could be auto-archived also, even better . maybe a check for each page if anything significant has changed instead of making too many duplicates |
20:38:46 | <@JAA> | e853962747e3759: Were you here a couple days ago already? |
20:41:05 | <e853962747e3759> | so is this some sort of faux intellectual elitist thing? I am not worthy of the knowledge to be able to archive a website?
20:42:15 | <nicolas17> | just trying not to answer the same question a dozen times to people who will ignore them and ask again |
20:45:18 | <Pedrosso> | faux intellectual elitest...+ |
20:45:20 | <Pedrosso> | ?*' |
20:45:36 | <Pedrosso> | What's that even supposed to mean? |
20:45:53 | <e853962747e3759> | the last comment i saw here started with "yesterday save page now..." Can someone please copy paste the explanation if there was one
20:45:53 | <e853962747e3759> | just 2 comments, no explanation |
20:45:54 | | e853962747e3759 quits [Client Quit] |
20:46:07 | <immibis> | it means "i'm a troll" as if the cat-on-keyboard nickname didn't give it away |
20:46:24 | <Pedrosso> | I don't understand how but, I'll take your word for it |
20:47:07 | <nicolas17> | https://hackint.logs.kiska.pw/%2F%2F/20231202 probably? |
20:47:13 | <Pedrosso> | they left |
20:47:29 | <Pedrosso> | oh |
20:47:32 | <Pedrosso> | Yeah probably |
20:47:45 | <Pedrosso> | I thought you sent a link to the explanation they were asking for
20:47:58 | <nicolas17> | I don't think we have anything that can poll every 6 hours |
20:48:39 | <Pedrosso> | They may be 'trolling' but it's a good idea to at least grab it once, right? |
20:48:59 | <Pedrosso> | "It is an important and significant news aggregator" |
20:50:59 | <@JAA> | Adding it to #// makes the most sense. |
20:51:27 | <Pedrosso> | does #// grab entire sites? I thought those were just an equivalent to !ao |
20:51:29 | <fireonlive> | it has a news sources thing iirc |
20:52:04 | <nicolas17> | JAA: oh btw, how should I handle new/updated support.apple.com articles? |
20:52:28 | <@JAA> | It does not, but their question was to grab the homepage regularly plus links on it. Which is exactly what #// already does for lots of news sites and other things. |
20:52:55 | <@JAA> | nicolas17: AB !ao? |
20:53:19 | <Pedrosso> | Ahhh |
20:53:26 | <nicolas17> | I don't need something to grab 8000 pages periodically because I'm already doing it, but I can give a list of the changes I did find |
20:54:24 | <@JAA> | If you already grab them, grab them as WARCs and upload that? That also creates a direct record that the unchanged pages did indeed not change. |
20:54:35 | <@JAA> | Rather than changes being missed, for example. |
20:55:05 | <nicolas17> | hrmmm grabbing them as WARC would need significant changes :P I have a git repo of file content alone atm |
20:55:49 | | kitonthe1et joins |
20:59:11 | <fireonlive> | ah i see |
21:00:26 | <nicolas17> | oh right I'm even mangling the data I store (apparently <link rel="alternate"> tags linking to other languages get regularly shuffled so I strip them out to get readable diffs) |
21:00:44 | | e853962747e3759 joins |
21:01:09 | <e853962747e3759> | Is this a joke? I thought the internet archive organization is a normal organization that works with volunteers and archivists to archive the internet |
21:01:31 | <@JAA> | You disconnected... |
21:01:38 | <@JAA> | Also, we are not the Internet Archive. |
21:03:05 | | e853962747e3759 quits [Remote host closed the connection] |
21:03:06 | <@JAA> | And yes, we do try to work with everyone, but if you keep disconnecting, it's hard to communicate. |
21:03:12 | <nicolas17> | lol |
21:03:15 | <@JAA> | Case in point... |
21:03:27 | | e7269535e6632 joins |
21:03:56 | <nicolas17> | hard to answer your questions if you keep disconnecting |
21:04:48 | <e7269535e6632> | How do i prevent it from disconnecting? I am also having significant problems with the chat box here. Did I miss any comments or explanations? |
21:05:27 | <@JAA> | Keep the webchat tab in the foreground or move it to a separate window. |
21:06:29 | <@JAA> | https://hackint.logs.kiska.pw/archiveteam-bs/20231204#c393345 |
21:07:35 | <nicolas17> | JAA: "grab them as WARCs and upload that?" that won't appear on WBM will it? |
21:08:25 | <@JAA> | nicolas17: We can make that happen. |
21:08:38 | <nicolas17> | qwarc might work for this... |
21:11:21 | | kitonthe1et quits [Ping timeout: 272 seconds] |
21:12:01 | <that_lurker> | e7269535e6632: In case you did not yet disconnect: why do you want the site to be archived so often, and what is the site? Also, connecting through IRC would be better if you are having trouble with the webirc client
21:15:31 | | BlueMaxima joins |
21:21:49 | | e7269535e6632 quits [Ping timeout: 265 seconds] |
21:22:13 | <that_lurker> | (╯°□°)╯︵ ┻━┻ |
21:25:21 | <@JAA> | lol |
21:29:22 | <fireonlive> | christ lol |
21:31:50 | <Pedrosso> | !tell e7269535e6632 "can anyone please teach me how to add a webpage to the wayback machine crawls?" To archive a site on your own, which is what it sounds like you're asking for, use https://github.com/ArchiveTeam/grab-site and upload to IA through an item. It won't show up on the Wayback Machine, but it will be saved, which is really the point.
21:31:50 | <Pedrosso> | To have it in the Wayback Machine, it'd have to be queued to AT's ArchiveBot, one of their other projects. I'm not sure what other ways there are.
21:31:51 | <eggdrop> | [tell] ok, I'll tell e7269535e6632 when they join next |
21:32:20 | <nicolas17> | inb4 they join with a different nickname next time |
21:32:26 | <Pedrosso> | Haha |
21:32:32 | <fireonlive> | !tell e7269535e6632 <Pedrosso> To have it in the Wayback Machine, it'd have to be queued to AT's ArchiveBot, one of their other projects. I'm not sure what other ways there are.
21:32:32 | <eggdrop> | [tell] ok, I'll tell e7269535e6632 when they join next |
21:32:37 | <fireonlive> | (cut in two) |
21:32:38 | <Pedrosso> | Thank |
21:32:41 | <fireonlive> | :) |
21:32:42 | <Pedrosso> | I bet they may, but they didn't last time |
21:32:48 | <Pedrosso> | (about joining with a different nick) |
21:33:17 | <fireonlive> | they may have been a7427a63 from the 2nd but unknown |
21:33:30 | <that_lurker> | They could also hopefully be reading the logs and see that, if they are in fact having issues with the webchat |
21:33:42 | <nicolas17> | was e853962747e3759 earlier today |
21:34:05 | <that_lurker> | and most likely <a7427a63 as well |
21:51:15 | <@JAA> | Maybe they at least got the log link and are reading there. If so, hi. :-) |
21:56:25 | <fireonlive> | supppppppp |
22:00:37 | | ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
22:00:45 | | ThetaDev joins |
22:09:21 | | Wohlstand quits [Client Quit] |
22:18:34 | <nicolas17> | https://argenteam.net/ this website provides crowdsourced subtitles in Spanish, mainly for uhh questionably-obtained movies
22:19:04 | <nicolas17> | you can find subtitles by the torrent infohash so you know it syncs properly with the exact video you have |
22:19:08 | <nicolas17> | it's shutting down at the end of the year |
22:19:34 | <nicolas17> | they said they will soon publish a torrent with all 100k subtitles they have done |
22:20:31 | <fireonlive> | oh nice |
22:21:19 | <nicolas17> | there's also a forum with 127475 threads but it seems to be login-walled, so that could be complicated to archive |
22:25:36 | <nicolas17> | hm some forums are open https://foro.argenteam.net/viewforum.php?f=11 |
22:29:04 | <immibis> | i would think that subtitles work no matter how you got a copy of a movie, so there's no need to call them questionably-obtained |
22:29:41 | <nicolas17> | immibis: well, the website actually has magnet: links to the video the subtitles were made for >.> |
22:30:07 | <nicolas17> | so I'm sure many people use it primarily as a torrent search index too |
22:30:23 | <Pedrosso> | Would having someone sign up for a throwaway account and giving you the cookie for archival be something that'd work? For the login walls, I mean
22:30:51 | <nicolas17> | Pedrosso: I signed up and I can see all forums normally |
22:31:23 | <nicolas17> | I'm just not sure if that can be used for archival |
22:31:23 | <Pedrosso> | Can the archivebot use custom cookies? |
22:31:34 | <@JAA> | No |
22:31:45 | <nicolas17> | and well, my username shows up on every page :P |
22:32:04 | <@JAA> | Things archived with accounts also can't go into the WBM generally speaking. |
22:35:03 | <Pedrosso> | I see |
22:35:20 | <Pedrosso> | Then how are such things generally saved? |
22:37:14 | <@JAA> | I've done some with wpull and cookies. The WARCs are somewhere, either in IA just for download and local playback or still sitting in my pile of stuff to upload. |
22:38:35 | <Pedrosso> | https://web.archive.org/web/20150416181917/https://www.furaffinity.net/view/1/ seems to have a user "~smaugit" |
22:39:12 | <Pedrosso> | "generally speaking" so it was just a special case? |
22:40:18 | <nicolas17> | forums (viewforum.php?f=) 4, 11, 35, 46, 55, 64 are publicly accessible |
22:40:36 | <fireonlive> | https://www.furaffinity.net/user/smaugit/ < people have nice things to say about the account haha |
22:40:43 | <Pedrosso> | yep, they sure do |
22:42:59 | <nicolas17> | forums 1, 4, 11, 14, 27, 35, 46, 55, 63, 64, 66, 67, 73 are accessible on a brand new account |
22:44:04 | <nicolas17> | of the remaining IDs, when logged in some return "forum doesn't exist" and others return "you're not authorized to see this forum" (probably private stuff for trusted translators, moderators, etc) |
22:45:11 | <h2ibot> | FireonLive edited Issuu (+115, move to partially saved to now, can be changed…): https://wiki.archiveteam.org/?diff=51254&oldid=50096 |
22:47:07 | <Pedrosso> | (Also, since the last one was in 2015, another proactive grab of furaffinity might be warranted, maybe?) |
22:47:38 | <@JAA> | Pedrosso: I can only think of one or two cases where such archives went into the WBM. For SPUF, Valve people gave us an account to continue archiving past the shutdown deadline, allowing us to cover everything. And I think there was another one that I can't remember right now. |
22:49:10 | <nicolas17> | I could make a more anonymous account :P |
22:49:18 | <nicolas17> | but anyway |
22:49:26 | <nicolas17> | there's a few public forums |
22:49:38 | <nicolas17> | and there's the main site to deal with |
22:55:33 | | c3manu quits [Remote host closed the connection] |
22:55:51 | | BornOn420 quits [Ping timeout: 272 seconds] |
23:06:36 | | BornOn420 (BornOn420) joins |
23:08:52 | <nicolas17> | oh fun, the pages are not deterministic |
23:09:07 | <nicolas17> | "Codec info = AVC Baseline@L4.2 | V_MPEG4/ISO/AVC" |
23:09:23 | <nicolas17> | gets turned into <a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="56143725333a3f3833161a627864"> and the cfemail field changes across requests |
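What's happening there is Cloudflare's stock email obfuscation: it saw the `@` in `Baseline@L4.2` and rewrote it as an "email address". The encoding scheme is well known - the first hex byte is an XOR key applied to the rest - so the mangling can be reversed deterministically:

```python
def decode_cfemail(cfemail):
    """Decode Cloudflare's data-cfemail attribute: the first byte of the
    hex string is an XOR key applied to every remaining byte."""
    data = bytes.fromhex(cfemail)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")

# The value quoted in the log decodes back to the codec string:
# decode_cfemail("56143725333a3f3833161a627864") -> "Baseline@L4.2"
```

The non-determinism comes from Cloudflare picking a fresh random key per request; the decoded payload is stable even though the attribute value changes.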
23:16:15 | | aninternettroll quits [Read error: Connection reset by peer] |
23:16:31 | | aninternettroll (aninternettroll) joins |
23:17:12 | <nicolas17> | anyway I'm doing a simplistic wget of all movie IDs now |
23:17:30 | <nicolas17> | because they have a <meta name="og:url"> with the canonical URL |
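A minimal sketch of pulling that canonical URL out of a fetched page, assuming (per the log) the tag is written as `<meta name="og:url" content="...">`; the helper name and the sample URL in the usage below are made up:

```python
import re

def extract_og_url(html):
    """Return the content of the <meta name="og:url"> tag, or None.
    (og:url is normally declared with property=, but this sketch follows
    the name= form mentioned in the log.)"""
    m = re.search(
        r'<meta\s+name=["\']og:url["\']\s+content=["\']([^"\']+)["\']',
        html, re.IGNORECASE)
    return m.group(1) if m else None
```

A real crawler would want a proper HTML parser, but for enumerating IDs and deduplicating against the canonical URL a regex like this is usually enough.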
23:20:00 | | DogsRNice joins |
23:25:31 | <fireonlive> | -+rss- YouTuber who intentionally crashed plane is sentenced to 6 months in prison: https://twitter.com/bnonews/status/1731748816250974335 https://news.ycombinator.com/item?id=38523704 |
23:25:31 | <eggdrop> | nitter: https://nitter.net/bnonews/status/1731748816250974335 |
23:26:15 | | BornOn420 quits [Ping timeout: 272 seconds] |
23:37:05 | | BornOn420 (BornOn420) joins |
23:43:59 | | BornOn420 quits [Ping timeout: 272 seconds] |
23:55:24 | | BornOn420 (BornOn420) joins |
23:57:10 | <nicolas17> | should take me 30 minutes to get all IDs |
23:58:12 | | Island quits [Read error: Connection reset by peer] |