00:08:51 | | mr_sarge (sarge) joins |
00:10:50 | | sarge quits [Ping timeout: 240 seconds] |
00:20:10 | | Doran quits [Remote host closed the connection] |
00:20:39 | | BlueMaxima quits [Read error: Connection reset by peer] |
00:20:47 | | Doranwen (Doranwen) joins |
00:20:48 | | BlueMaxima joins |
00:22:01 | | Doranwen quits [Remote host closed the connection] |
00:22:31 | | Doranwen (Doranwen) joins |
00:23:42 | | Doranwen quits [Remote host closed the connection] |
00:25:13 | | Doranwen (Doranwen) joins |
00:44:13 | | eroc19907 (eroc1990) joins |
00:44:50 | | eroc1990 quits [Ping timeout: 240 seconds] |
00:53:50 | | sdomi quits [Ping timeout: 240 seconds] |
00:56:21 | | sdomi (sdomi) joins |
01:02:57 | | thuban quits [Ping timeout: 272 seconds] |
01:33:46 | | Inti83 joins |
01:33:47 | <eggdrop> | [tell] Inti83: [2023-12-03T19:51:39Z] <JAA> https://hackint.logs.kiska.pw/archiveteam-bs/20231203#c393229 |
01:34:07 | <Inti83> | Thanks XD was just reading logs |
01:36:24 | <Inti83> | thanks - we did see that cont.ar and cine.ar have cloudflare, someone here said they may have a local contact in Argentina Cabase IXP |
01:37:00 | <Inti83> | We are doing all we can with grab-site - thanks for the testing tool, it helps!
01:38:08 | <Inti83> | Also, we were wondering if there is anything that can help so that grab-site doesn't start from the beginning when it fails - is there some flag to start from where it left off? We couldn't find anything in the docs
01:38:58 | <@JAA> | That is a *long*-standing wishlist entry: https://github.com/ArchiveTeam/grab-site/issues/58 |
01:39:04 | <@JAA> | So, no. |
01:40:12 | | Inti83 quits [Remote host closed the connection] |
01:42:50 | | eroc19907 quits [Ping timeout: 240 seconds] |
01:43:38 | | eroc1990 (eroc1990) joins |
02:05:50 | | kitonthe1et quits [Ping timeout: 240 seconds] |
02:08:13 | | kitonthe1et joins |
02:16:25 | | kitonthe1et quits [Ping timeout: 272 seconds] |
02:36:33 | | kitonthenet joins |
02:41:45 | | kitonthenet quits [Ping timeout: 272 seconds] |
02:41:51 | | Megame quits [Client Quit] |
03:01:03 | | simon8162 quits [Quit: ZNC 1.8.2 - https://znc.in] |
03:21:54 | | kitonthenet joins |
03:23:17 | | Irenes quits [Client Quit] |
03:29:02 | | Wohlstand quits [Client Quit] |
03:30:17 | | Irenes (ireneista) joins |
03:30:31 | | kitonthenet quits [Ping timeout: 272 seconds] |
03:33:25 | | icedice quits [Client Quit] |
03:34:10 | | simon816 (simon816) joins |
03:38:48 | | katocala quits [Remote host closed the connection] |
03:40:38 | | katocala joins |
03:40:38 | | katocala is now authenticated as katocala |
03:47:18 | | katocala quits [Remote host closed the connection] |
03:52:30 | <pabs> | nicolas17: #gitgud and archive.softwareheritage.org for GitHub repos :) (and #codearchiver for other git repos) |
03:54:14 | <h2ibot> | Inti83 edited Argentina (-4, /* Guidelines for Adding Websites */): https://wiki.archiveteam.org/?diff=51252&oldid=51242 |
03:56:20 | | Earendil7 quits [Ping timeout: 240 seconds] |
03:57:38 | | katocala joins |
03:57:38 | | katocala is now authenticated as katocala |
04:26:43 | | Earendil7 (Earendil7) joins |
04:37:10 | | kitonthe2et joins |
04:38:32 | | thuban joins |
04:41:20 | | kitonthe2et quits [Ping timeout: 240 seconds] |
04:44:06 | <Naruyoko> | https://twitter.com/kogekidogso This account is now inactive. It looks like it was saved with ArchiveBot when I brought this up last week, but the associated querie.me account might not be.
04:44:07 | <eggdrop> | nitter: https://nitter.net/kogekidogso |
04:45:13 | <Naruyoko> | (And peing) |
04:45:21 | <@JAA> | Yes, it was run through ArchiveBot, but that was only a very superficial crawl. I'll rerun it as soon as there's space. |
04:45:36 | <Naruyoko> | Thank you |
04:45:51 | <@JAA> | Querie.me looks scripty. |
04:47:07 | <@JAA> | I can't seem to get to any account page there...? Or is it just a matter of following the links in their tweets? |
04:48:39 | <Naruyoko> | You can get a list of their answers as "recent answers" here: https://querie.me/user/r1OYTzyfrTY0Fn4ZIBXI4nEJPs63/recent |
04:48:51 | <Naruyoko> | However, they use an infinite scrolling thing
04:49:09 | <@JAA> | Yeah, that page is entirely useless without JavaScript, so archiving it is going to be difficult. |
04:51:47 | <nicolas17> | hm |
04:51:57 | <Naruyoko> | I'm now seeing if I can load the list by holding down arrow |
04:52:44 | <@JAA> | I'm trying to do some curl magic. |
04:52:52 | <nicolas17> | JAA: suppose I write code to archive a querie.me page, by parsing the JS crap if needed to figure out what URLs to recurse into |
04:53:00 | | parfait (kdqep) joins |
04:53:01 | <@JAA> | They only load 5 answers per request by default, but you can do far more. |
04:53:26 | <@JAA> | 1000 is slow but works. :-) |
04:53:34 | <nicolas17> | we're not mass-archiving the entire site so this is not a DPoS project, just for one-off pages |
04:53:54 | <nicolas17> | how should I write that code? would a wget-at lua script be appropriate anyway? |
04:53:54 | <@JAA> | No extra URLs need to be fetched for the individual answers, it seems. |
04:54:14 | <@JAA> | So I'll just do the user page API crap and then throw the answer page URLs into AB. |
04:54:51 | <Naruyoko> | I see, loading 1000 at once is much more efficient than me scrolling down endlessly |
04:55:06 | <nicolas17> | ah hm I guess it could be a script in any technology, that produces a URL list for archivebot |
04:58:23 | <@JAA> | There are over 2000 answers, so yes. :-) |
04:59:23 | <@JAA> | The very technologically advanced extraction: |
04:59:31 | <@JAA> | `function querie { pp="$1"; curl "https://querie.me/api/qas?kind=recent&count=1000&userId=r1OYTzyfrTY0Fn4ZIBXI4nEJPs63${pp}" -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/120.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Referer: https://querie.me/user/r1OYTzyfrTY0Fn4ZIBXI4nEJPs63/recent' --compressed -s | tee |
04:59:37 | <@JAA> | "/tmp/querie-json-${pp}" | jq -r '.[] | .id' >"/tmp/querie-ids-${pp}"; ls -al /tmp/querie-{json,ids}-"${pp}"; pp="&startAfterId=$(tail -n 1 "/tmp/querie-ids-${pp}")"; printf '%q\n' "$pp"; }` |
05:00:10 | <@JAA> | No loop because I wasn't sure how it'd behave at the end. |
05:00:22 | <@JAA> | (It returns an empty array then.) |
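For reference, the pagination logic in JAA's shell function can be sketched as a loop: keep requesting pages and feed the last returned `id` back as `startAfterId` until the API hands back an empty array. The endpoint and parameter names come from the one-liner above; the page fetcher is injected so the sketch doesn't depend on the live site, and `fetch_all_answers` is a hypothetical name, not part of JAA's actual script.

```python
def fetch_all_answers(fetch_page):
    """Cursor-paginate until the API returns an empty array.

    fetch_page(start_after_id) must return the decoded JSON array for
    https://querie.me/api/qas?kind=recent&count=1000&userId=<uid>
    (plus &startAfterId=<id> when start_after_id is not None).
    """
    answers = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        if not page:                 # empty array signals the end of the list
            break
        answers.extend(page)
        cursor = page[-1]["id"]      # last id on this page becomes the cursor
    return answers
```

With a mock fetcher standing in for `curl`, the loop terminates exactly when the API behaves as observed (empty array at the end).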
05:05:39 | <Naruyoko> | It looks like you'll get an empty array as the response at the end, from observing another user
05:06:12 | <@arkiver> | there must be some prize we can award JAA for that one-liner |
05:06:32 | <@JAA> | arkiver: This isn't even my final form. |
05:06:42 | <@arkiver> | oh no |
05:08:53 | <@JAA> | This is the one-liner I'm most proud of so far: `try-except` in a single line in Python: https://web.archive.org/web/20230311201616/https://bpa.st/657RU |
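The paste itself isn't quoted in the log, so as a hedged illustration only: two well-known ways to collapse exception handling onto a single line in Python, almost certainly simpler than whatever the linked one-liner does. `safe_int` and `div_or_default` are made-up example names.

```python
from contextlib import suppress

def safe_int(s):
    # contextlib.suppress turns a try/except/pass into a single line.
    result = None
    with suppress(ValueError): result = int(s)
    return result

def div_or_default(a, b):
    # exec() smuggles a full try/except block into one statement; the walrus
    # expression builds the locals dict in-place so results can be read back.
    exec("try:\n    r = a / b\nexcept ZeroDivisionError:\n    r = None",
         None, (ns := {"a": a, "b": b}))
    return ns["r"]
```

The `suppress` form only covers try/except-pass; the `exec` form is the general (and uglier) escape hatch, since the grammar otherwise forces `try` and `except` onto separate lines.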
05:08:56 | <Naruyoko> | Meanwhile, peing.net simply has a page number (3/page) |
05:09:39 | <@arkiver> | i think we now have a bot for queuing for all long term channels except #shreddit (which by default gets everything) |
05:10:12 | <@arkiver> | JAA: well... congrats i guess :P |
05:10:32 | <@arkiver> | Naruyoko: what is up with peing.net? |
05:10:49 | <Naruyoko> | The person has an account there too |
05:13:19 | <Naruyoko> | I don't know how well the individual pages save, since it's excluded |
05:13:39 | <@JAA> | arkiver: I *think* you can rewrite any Python code as a single line. Pattern matching and exception groups are hard, but it should be possible. I have this idea of writing a tool to do the conversion. Maybe when I'm retired or something. :-P |
05:14:02 | <@arkiver> | JAA: conversion to one-liners? |
05:14:04 | <@arkiver> | or from |
05:14:12 | <@JAA> | To |
05:14:15 | <@arkiver> | oh no |
05:14:17 | <@JAA> | :-) |
05:14:24 | <@arkiver> | thisisfine |
05:14:34 | <project10> | no god, please, no... |
05:14:42 | <@JAA> | We'll be able to ship our pipeline.py as a single line! Imagine the savings from not having to store all those LF characters! |
05:14:49 | <@arkiver> | 79 chars please! |
05:15:04 | <@JAA> | The 80s called, they want their monitor back. :-P |
05:15:08 | <@arkiver> | i feel an old discussion coming up |
05:15:13 | <@JAA> | :-) |
05:15:37 | | @arkiver and JAA have fundamental differences when it comes to Python line length |
05:15:46 | <@JAA> | Nah, it's obviously just for fun to prove that it's possible. The Python grammar very much requires separate lines in several places. `try-except` is one of them. |
05:15:54 | <@JAA> | Hence that complicated one-liner to still do it. |
05:16:03 | <@arkiver> | maybe we'll make JAA into one line |
05:16:12 | <@arkiver> | one dimensional JAA |
05:16:16 | <@arkiver> | no more 3 dimensions |
05:16:31 | <@arkiver> | just for fun :) |
05:16:48 | <@JAA> | Like your Python code is basically one-dimensional because it has no width? :-) |
05:17:14 | <@arkiver> | like my Python code is basically one-dimensional because it has no width, exactly! |
05:17:53 | <@JAA> | No depth to it either I guess. :-P |
05:18:56 | <@arkiver> | yes |
05:19:04 | <@arkiver> | nice clean python code without personality |
05:24:29 | <@JAA> | Here's the API data from Querie for that account as JSONL, because I had it anyway: https://transfer.archivete.am/whGJs/querie.me_user_r1OYTzyfrTY0Fn4ZIBXI4nEJPs63.jsonl.zst |
05:24:57 | | kitonthe2et joins |
05:25:28 | <@JAA> | Produced by concatenating the querie-json-* files in the right order + `jq -c '.[]'` |
05:25:43 | <@JAA> | (Yes, this could be done better, and I would if I had to do this more than once.) |
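That post-processing step (concatenating the per-page JSON array files in order and flattening them with `jq -c '.[]'`) could equivalently be sketched in Python; `arrays_to_jsonl` is a hypothetical name, not something from JAA's setup.

```python
import json

def arrays_to_jsonl(json_texts):
    """Flatten several JSON-array documents (in order) into JSONL,
    one compact JSON object per line - the rough equivalent of
    concatenating the files and piping them through `jq -c '.[]'`."""
    lines = []
    for text in json_texts:
        for obj in json.loads(text):
            lines.append(json.dumps(obj, separators=(",", ":")))
    return "\n".join(lines)
```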
05:27:13 | <@JAA> | The job for those 2003 answers is running now. |
05:29:20 | | kitonthe2et quits [Ping timeout: 240 seconds] |
05:38:47 | | eroc1990 quits [Client Quit] |
05:46:43 | | eroc1990 (eroc1990) joins |
05:46:55 | | Earendil7 quits [Client Quit] |
05:47:57 | | benjins quits [Ping timeout: 272 seconds] |
06:04:05 | | Arcorann (Arcorann) joins |
06:24:13 | | Island quits [Read error: Connection reset by peer] |
06:42:24 | <Naruyoko> | https://transfer.archivete.am/STyXI/peing.net_18kogekisoccer.txt |
06:42:25 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/STyXI/peing.net_18kogekisoccer.txt |
06:53:16 | | DogsRNice quits [Read error: Connection reset by peer] |
07:03:07 | <@JAA> | Naruyoko: Thanks, done. |
07:05:20 | | aninternettroll quits [Ping timeout: 240 seconds] |
07:16:04 | | aninternettroll (aninternettroll) joins |
08:00:22 | | nfriedly quits [Remote host closed the connection] |
08:05:50 | | Barto quits [Ping timeout: 240 seconds] |
08:25:36 | | c3manu (c3manu) joins |
08:44:20 | | lennier2 quits [Ping timeout: 240 seconds] |
08:45:53 | | Arcorann quits [Ping timeout: 265 seconds] |
08:47:27 | | Arcorann (Arcorann) joins |
09:18:57 | | nconturbines joins |
09:19:27 | | nconturbines leaves |
09:30:30 | | decky_e joins |
09:33:20 | | decky quits [Ping timeout: 240 seconds] |
10:00:02 | | Bleo1826 quits [Client Quit] |
10:01:21 | | Bleo1826 joins |
10:04:18 | | benjins joins |
10:48:50 | | parfait quits [Ping timeout: 240 seconds] |
11:27:04 | | BlueMaxima quits [Read error: Connection reset by peer] |
11:51:54 | | nfriedly joins |
11:53:17 | | Wohlstand (Wohlstand) joins |
12:13:32 | | Wohlstand quits [Client Quit] |
12:33:32 | | Arcorann quits [Ping timeout: 265 seconds] |
12:45:14 | | Wohlstand (Wohlstand) joins |
12:56:57 | | katia quits [Quit: o.o] |
12:59:14 | | katia (katia) joins |
12:59:24 | | redbees quits [Quit: ZNC 1.7.5+deb4 - https://znc.in] |
12:59:32 | | redbees joins |
12:59:33 | | Hecz quits [Remote host closed the connection] |
12:59:33 | | Sluggs joins |
13:00:41 | | Sluggs_ quits [Read error: Connection reset by peer] |
13:02:16 | | Hecz joins |
13:02:17 | | Hecz is now authenticated as Hecz |
13:02:17 | | Hecz quits [Changing host] |
13:02:17 | | Hecz (Hecz) joins |
13:22:20 | | Megame (Megame) joins |
13:56:51 | <qwertyasdfuiopghjkl> | http://fileformats.archiveteam.org/ is down |
14:01:32 | | Megame1_ (Megame) joins |
14:02:16 | | Megame1_ quits [Remote host closed the connection] |
14:02:32 | | Megame1_ (Megame) joins |
14:02:43 | | kitonthe1et joins |
14:04:50 | | Megame quits [Ping timeout: 240 seconds] |
14:05:30 | | Megame1_ is now known as Megame |
14:07:39 | | kitonthe1et quits [Ping timeout: 272 seconds] |
14:37:39 | <@arkiver> | thanks qwertyasdfuiopghjkl , i reported it to someone who may be able to fix it |
14:43:36 | | kitonthe2et joins |
14:53:26 | | mindstrut joins |
14:54:31 | | kitonthe2et quits [Ping timeout: 272 seconds] |
15:05:17 | | Perk quits [Ping timeout: 272 seconds] |
15:07:51 | | Perk joins |
15:08:30 | <project10> | perhaps we could add it to #nodeping? |
15:17:19 | | Megame quits [Ping timeout: 272 seconds] |
15:19:01 | | Megame (Megame) joins |
15:20:22 | | kitonthe1et joins |
15:27:20 | | kitonthe1et quits [Ping timeout: 240 seconds] |
15:31:48 | | aninternettroll_ (aninternettroll) joins |
15:31:48 | | aninternettroll quits [Read error: Connection reset by peer] |
15:31:58 | | aninternettroll_ is now known as aninternettroll |
15:32:42 | <h2ibot> | Megame edited Deathwatch (+130, http://www.baanboard.com/ - Dec 31): https://wiki.archiveteam.org/?diff=51253&oldid=51232 |
15:44:54 | | lennier2 joins |
15:48:47 | | kitonthe2et joins |
15:51:59 | | angenieux (angenieux) joins |
15:55:20 | | kitonthe2et quits [Ping timeout: 240 seconds] |
16:03:54 | | Island joins |
16:11:18 | | Megame quits [Client Quit] |
17:15:00 | <Pedrosso> | https://www.fz.se/ supposedly existed since 1996, might it be a good idea to have a proactive grab? |
17:52:12 | | useretail quits [Quit: Leaving] |
17:52:31 | | useretail joins |
18:00:07 | <TheTechRobo> | digitize.archiveteam.org is also (still) down |
18:01:04 | <@JAA> | digitize.archiveteam.org is permanently down, and its contents were integrated into the main wiki years ago, from what I've been told. |
18:22:34 | | Barto (Barto) joins |
18:22:53 | | sknebel quits [Ping timeout: 272 seconds] |
18:24:01 | | sknebel (sknebel) joins |
18:37:27 | | Barto quits [Ping timeout: 272 seconds] |
19:01:57 | | e853962747e3759 joins |
19:03:03 | <e853962747e3759> | Hi, can anyone please teach me how to add a webpage to the wayback machine crawls? At least the main website every 6 hours. If the linked pages from it could be auto-archived also, even better . maybe a check for each page if anything significant has changed instead of making too many duplicates |
19:06:04 | <that_lurker> | why do you want it be crawled so often |
19:10:37 | <nicolas17> | yesterday savepagenow had a global backlog of like 18 hours lol |
19:10:45 | | Barto (Barto) joins |
19:14:05 | <@JAA> | e853962747e3759: Were you here a couple days ago already? |
19:14:42 | | e853962747e3759 quits [Ping timeout: 265 seconds] |
19:14:50 | <that_lurker> | god dammit you spooked them |
19:14:57 | <that_lurker> | :P |
19:15:07 | <@JAA> | lol |
19:15:29 | <that_lurker> | I was so ready to give them my "instructions.jpg" :P
19:18:03 | <fireonlive> | ;d |
19:18:30 | <fireonlive> | webchat needs like a 'btw if you leave this in a background tab you'll disconnect' |
19:18:42 | <fireonlive> | gone are the times of tabs just having fun 24/7 |
19:19:50 | | Barto quits [Ping timeout: 240 seconds] |
19:37:22 | | Megame (Megame) joins |
19:54:08 | | Barto (Barto) joins |
19:54:11 | | bladem (bladem) joins |
20:03:34 | | Island quits [Read error: Connection reset by peer] |
20:03:54 | | Island joins |
20:04:17 | | Megame quits [Client Quit] |
20:20:51 | | AlsoHP_Archivist joins |
20:25:07 | | HP_Archivist quits [Ping timeout: 272 seconds] |
20:29:19 | | kitonthenet joins |
20:33:50 | | kitonthenet quits [Ping timeout: 240 seconds] |
20:37:28 | | e853962747e3759 joins |
20:38:34 | <e853962747e3759> | Hi, can anyone please teach me how to add a webpage to the wayback machine crawls? At least the main website every 6 hours. If the linked pages from it could be auto-archived also, even better . maybe a check for each page if anything significant has changed instead of making too many duplicates |
20:38:46 | <@JAA> | e853962747e3759: Were you here a couple days ago already? |
20:41:05 | <e853962747e3759> | so is this some sort of faux intellectual elitist thing? I am not worthy of the knowledge to be able to archive a website?
20:42:15 | <nicolas17> | just trying not to answer the same question a dozen times to people who will ignore them and ask again |
20:45:18 | <Pedrosso> | faux intellectual elitest...+ |
20:45:20 | <Pedrosso> | ?*' |
20:45:36 | <Pedrosso> | What's that even supposed to mean? |
20:45:53 | <e853962747e3759> | the last comment i saw here started with "yesterday save page now..." Can someone please copy paste the explanation if there was one
20:45:53 | <e853962747e3759> | just 2 comments, no explanation |
20:45:54 | | e853962747e3759 quits [Client Quit] |
20:46:07 | <immibis> | it means "i'm a troll" as if the cat-on-keyboard nickname didn't give it away |
20:46:24 | <Pedrosso> | I don't understand how but, I'll take your word for it |
20:47:07 | <nicolas17> | https://hackint.logs.kiska.pw/%2F%2F/20231202 probably? |
20:47:13 | <Pedrosso> | they left |
20:47:29 | <Pedrosso> | oh |
20:47:32 | <Pedrosso> | Yeah probably |
20:47:45 | <Pedrosso> | I thought you sent a link to the explanation they were asking for
20:47:58 | <nicolas17> | I don't think we have anything that can poll every 6 hours |
20:48:39 | <Pedrosso> | They may be 'trolling' but it's a good idea to at least grab it once, right? |
20:48:59 | <Pedrosso> | "It is an important and significant news aggregator" |
20:50:59 | <@JAA> | Adding it to #// makes the most sense. |
20:51:27 | <Pedrosso> | does #// grab entire sites? I thought those were just an equivalent to !ao |
20:51:29 | <fireonlive> | it has a news sources thing iirc |
20:52:04 | <nicolas17> | JAA: oh btw, how should I handle new/updated support.apple.com articles? |
20:52:28 | <@JAA> | It does not, but their question was to grab the homepage regularly plus links on it. Which is exactly what #// already does for lots of news sites and other things. |
20:52:55 | <@JAA> | nicolas17: AB !ao? |
20:53:19 | <Pedrosso> | Ahhh |
20:53:26 | <nicolas17> | I don't need something to grab 8000 pages periodically because I'm already doing it, but I can give a list of the changes I did find |
20:54:24 | <@JAA> | If you already grab them, grab them as WARCs and upload that? That also creates a direct record that the unchanged pages did indeed not change. |
20:54:35 | <@JAA> | Rather than changes being missed, for example. |
20:55:05 | <nicolas17> | hrmmm grabbing them as WARC would need significant changes :P I have a git repo of file content alone atm |
20:55:49 | | kitonthe1et joins |
20:59:11 | <fireonlive> | ah i see |
21:00:26 | <nicolas17> | oh right I'm even mangling the data I store (apparently <link rel="alternate"> tags linking to other languages get regularly shuffled so I strip them out to get readable diffs) |
21:00:44 | | e853962747e3759 joins |
21:01:09 | <e853962747e3759> | Is this a joke? I thought the internet archive organization is a normal organization that works with volunteers and archivists to archive the internet |
21:01:31 | <@JAA> | You disconnected... |
21:01:38 | <@JAA> | Also, we are not the Internet Archive. |
21:03:05 | | e853962747e3759 quits [Remote host closed the connection] |
21:03:06 | <@JAA> | And yes, we do try to work with everyone, but if you keep disconnecting, it's hard to communicate. |
21:03:12 | <nicolas17> | lol |
21:03:15 | <@JAA> | Case in point... |
21:03:27 | | e7269535e6632 joins |
21:03:56 | <nicolas17> | hard to answer your questions if you keep disconnecting |
21:04:48 | <e7269535e6632> | How do i prevent it from disconnecting? I am also having significant problems with the chat box here. Did I miss any comments or explanations? |
21:05:27 | <@JAA> | Keep the webchat tab in the foreground or move it to a separate window. |
21:06:29 | <@JAA> | https://hackint.logs.kiska.pw/archiveteam-bs/20231204#c393345 |
21:07:35 | <nicolas17> | JAA: "grab them as WARCs and upload that?" that won't appear on WBM will it? |
21:08:25 | <@JAA> | nicolas17: We can make that happen. |
21:08:38 | <nicolas17> | qwarc might work for this... |
21:11:21 | | kitonthe1et quits [Ping timeout: 272 seconds] |
21:12:01 | <that_lurker> | e7269535e6632: In case you did not yet disconnect: why do you want the site to be archived so often, and what is the site? Also, connecting through IRC would be better if you are having trouble with the webirc client
21:15:31 | | BlueMaxima joins |
21:21:49 | | e7269535e6632 quits [Ping timeout: 265 seconds] |
21:22:13 | <that_lurker> | (╯°□°)╯︵ ┻━┻ |
21:25:21 | <@JAA> | lol |
21:29:22 | <fireonlive> | christ lol |
21:31:50 | <Pedrosso> | !tell e7269535e6632 "can anyone please teach me how to add a webpage to the wayback machine crawls?" To archive a site on your own, which is what it sounds like you're asking for, use https://github.com/ArchiveTeam/grab-site and upload to IA through an item. It won't show up on the Wayback Machine, but it will be saved, which is really the point.
21:31:50 | <Pedrosso> | To have it in the Wayback Machine, it'd have to be queued to AT's ArchiveBot, one of their other projects. I'm not sure what other ways there are.
21:31:51 | <eggdrop> | [tell] ok, I'll tell e7269535e6632 when they join next |
21:32:20 | <nicolas17> | inb4 they join with a different nickname next time |
21:32:26 | <Pedrosso> | Haha |
21:32:32 | <fireonlive> | !tell e7269535e6632 <Pedrosso> To have it in the Wayback Machine, it'd have to be queued to AT's ArchiveBot, one of their other projects. I'm not sure what other ways there are.
21:32:32 | <eggdrop> | [tell] ok, I'll tell e7269535e6632 when they join next |
21:32:37 | <fireonlive> | (cut in two) |
21:32:38 | <Pedrosso> | Thank |
21:32:41 | <fireonlive> | :) |
21:32:42 | <Pedrosso> | I bet they may, but they didn't last time |
21:32:48 | <Pedrosso> | (about joining with a different nick) |
21:33:17 | <fireonlive> | they may have been a7427a63 from the 2nd but unknown |
21:33:30 | <that_lurker> | They could also hopefully be reading the logs and see that, if they are in fact having issues with the webchat |
21:33:42 | <nicolas17> | was e853962747e3759 earlier today |
21:34:05 | <that_lurker> | and most likely <a7427a63 as well |
21:51:15 | <@JAA> | Maybe they at least got the log link and are reading there. If so, hi. :-) |
21:56:25 | <fireonlive> | supppppppp |
22:00:37 | | ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
22:00:45 | | ThetaDev joins |
22:09:21 | | Wohlstand quits [Client Quit] |
22:18:34 | <nicolas17> | https://argenteam.net/ this website provides crowdsourced subtitles in Spanish, mainly for uhh questionably-obtained movies
22:19:04 | <nicolas17> | you can find subtitles by the torrent infohash so you know it syncs properly with the exact video you have |
22:19:08 | <nicolas17> | it's shutting down at the end of the year |
22:19:34 | <nicolas17> | they said they will soon publish a torrent with all 100k subtitles they have done |
22:20:31 | <fireonlive> | oh nice |
22:21:19 | <nicolas17> | there's also a forum with 127475 threads but it seems to be login-walled, so that could be complicated to archive |
22:25:36 | <nicolas17> | hm some forums are open https://foro.argenteam.net/viewforum.php?f=11 |
22:29:04 | <immibis> | i would think that subtitles work no matter how you got a copy of a movie, so there's no need to call them questionably-obtained |
22:29:41 | <nicolas17> | immibis: well, the website actually has magnet: links to the video the subtitles were made for >.> |
22:30:07 | <nicolas17> | so I'm sure many people use it primarily as a torrent search index too |
22:30:23 | <Pedrosso> | Would having someone sign up for a throwaway account and giving you the cookie for archival be something that'd work? For the login walls, I mean
22:30:51 | <nicolas17> | Pedrosso: I signed up and I can see all forums normally |
22:31:23 | <nicolas17> | I'm just not sure if that can be used for archival |
22:31:23 | <Pedrosso> | Can the archivebot use custom cookies? |
22:31:34 | <@JAA> | No |
22:31:45 | <nicolas17> | and well, my username shows up on every page :P |
22:32:04 | <@JAA> | Things archived with accounts also can't go into the WBM generally speaking. |
22:35:03 | <Pedrosso> | I see |
22:35:20 | <Pedrosso> | Then how are such things generally saved? |
22:37:14 | <@JAA> | I've done some with wpull and cookies. The WARCs are somewhere, either in IA just for download and local playback or still sitting in my pile of stuff to upload. |
22:38:35 | <Pedrosso> | https://web.archive.org/web/20150416181917/https://www.furaffinity.net/view/1/ seems to have a user "~smaugit" |
22:39:12 | <Pedrosso> | "generally speaking" so it was just a special case? |
22:40:18 | <nicolas17> | forums (viewforum.php?f=) 4, 11, 35, 46, 55, 64 are publicly accessible |
22:40:36 | <fireonlive> | https://www.furaffinity.net/user/smaugit/ < people have nice things to say about the account haha |
22:40:43 | <Pedrosso> | yep, they sure do |
22:42:59 | <nicolas17> | forums 1, 4, 11, 14, 27, 35, 46, 55, 63, 64, 66, 67, 73 are accessible on a brand new account |
22:44:04 | <nicolas17> | of the remaining IDs, when logged in some return "forum doesn't exist" and others return "you're not authorized to see this forum" (probably private stuff for trusted translators, moderators, etc) |
22:45:11 | <h2ibot> | FireonLive edited Issuu (+115, move to partially saved to now, can be changed…): https://wiki.archiveteam.org/?diff=51254&oldid=50096 |
22:47:07 | <Pedrosso> | (Also, since the last one was in 2015, another proactive grab of furaffinity might be warranted, maybe?) |
22:47:38 | <@JAA> | Pedrosso: I can only think of one or two cases where such archives went into the WBM. For SPUF, Valve people gave us an account to continue archiving past the shutdown deadline, allowing us to cover everything. And I think there was another one that I can't remember right now. |
22:49:10 | <nicolas17> | I could make a more anonymous account :P |
22:49:18 | <nicolas17> | but anyway |
22:49:26 | <nicolas17> | there's a few public forums |
22:49:38 | <nicolas17> | and there's the main site to deal with |
22:55:33 | | c3manu quits [Remote host closed the connection] |
22:55:51 | | BornOn420 quits [Ping timeout: 272 seconds] |
23:06:36 | | BornOn420 (BornOn420) joins |
23:08:52 | <nicolas17> | oh fun, the pages are not deterministic |
23:09:07 | <nicolas17> | "Codec info = AVC Baseline@L4.2 | V_MPEG4/ISO/AVC" |
23:09:23 | <nicolas17> | gets turned into <a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="56143725333a3f3833161a627864"> and the cfemail field changes across requests |
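What's happening there is Cloudflare's stock email obfuscation: it saw the `@` in `Baseline@L4.2` and rewrote it as an "email address". The encoding scheme is well known - the first hex byte is an XOR key applied to the rest - so the mangling can be reversed deterministically:

```python
def decode_cfemail(cfemail):
    """Decode Cloudflare's data-cfemail attribute: the first byte of the
    hex string is an XOR key applied to every remaining byte."""
    data = bytes.fromhex(cfemail)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")

# The value quoted in the log decodes back to the codec string:
# decode_cfemail("56143725333a3f3833161a627864") -> "Baseline@L4.2"
```

The non-determinism comes from Cloudflare picking a fresh random key per request; the decoded payload is stable even though the attribute value changes.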
23:16:15 | | aninternettroll quits [Read error: Connection reset by peer] |
23:16:31 | | aninternettroll (aninternettroll) joins |
23:17:12 | <nicolas17> | anyway I'm doing a simplistic wget of all movie IDs now |
23:17:30 | <nicolas17> | because they have a <meta name="og:url"> with the canonical URL |
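A minimal sketch of pulling that canonical URL out of a fetched page, assuming (per the log) the tag is written as `<meta name="og:url" content="...">`; the helper name and the sample URL in the usage below are made up:

```python
import re

def extract_og_url(html):
    """Return the content of the <meta name="og:url"> tag, or None.
    (og:url is normally declared with property=, but this sketch follows
    the name= form mentioned in the log.)"""
    m = re.search(
        r'<meta\s+name=["\']og:url["\']\s+content=["\']([^"\']+)["\']',
        html, re.IGNORECASE)
    return m.group(1) if m else None
```

A real crawler would want a proper HTML parser, but for enumerating IDs and deduplicating against the canonical URL a regex like this is usually enough.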
23:20:00 | | DogsRNice joins |
23:25:31 | <fireonlive> | -+rss- YouTuber who intentionally crashed plane is sentenced to 6 months in prison: https://twitter.com/bnonews/status/1731748816250974335 https://news.ycombinator.com/item?id=38523704 |
23:25:31 | <eggdrop> | nitter: https://nitter.net/bnonews/status/1731748816250974335 |
23:26:15 | | BornOn420 quits [Ping timeout: 272 seconds] |
23:37:05 | | BornOn420 (BornOn420) joins |
23:43:59 | | BornOn420 quits [Ping timeout: 272 seconds] |
23:55:24 | | BornOn420 (BornOn420) joins |
23:57:10 | <nicolas17> | should take me 30 minutes to get all IDs |
23:58:12 | | Island quits [Read error: Connection reset by peer] |