00:02:10 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
00:13:24 | | SootBector quits [Remote host closed the connection] |
00:13:45 | | SootBector (SootBector) joins |
00:23:49 | <pabs> | xkey: re AB ignores, see also https://wiki.archiveteam.org/index.php/ArchiveBot/Ignore |
00:24:58 | <pabs> | tzt: sounds like something to add to https://wiki.archiveteam.org/index.php/Deathwatch |
01:16:19 | <@JAA> | Re Alf's question, I've been thinking about this recently: it would be good to have a simple prominent thing on the wiki homepage as a very low entry barrier for how to tell us about a shuttering site (operators or users), contact us in case we cause issues, etc. |
01:17:37 | | szczot3k3 (szczot3k) joins |
01:17:56 | <@JAA> | There's a lot of stuff linked from the homepage that probably explains it more or less, but it's a lot of stuff to go through for a first-time visitor. |
01:19:30 | <@JAA> | The first question in the FAQ is kind of that, but the link to the FAQ is not exactly prominent. |
01:21:13 | | szczot3k quits [Ping timeout: 260 seconds] |
01:21:13 | | szczot3k3 is now known as szczot3k |
01:23:54 | <@JAA> | !tell Alf There's no FAQ entry, but if you tell us what the site is, we'll make sure it gets archived. Including quirks would be helpful if applicable, e.g. if there's a rate limit we should obey or not easily discoverable parts of the site. |
01:23:54 | <eggdrop> | [tell] ok, I'll tell Alf when they join next |
01:47:28 | <h2ibot> | Thezt edited Deathwatch (+255, Add Star.ne.jp shutdown): https://wiki.archiveteam.org/?diff=54145&oldid=54127 |
01:53:26 | | graham9 joins |
02:09:48 | | nicolas17 quits [Quit: Konversation terminated!] |
02:10:01 | | nicolas17 joins |
02:13:48 | <pabs> | yeah, I encountered the need for such a low-entry-barrier FAQ, when I posted a HN thread about AT: https://news.ycombinator.com/item?id=42447579 |
02:16:30 | <pabs> | TheTechRobo++ |
02:16:31 | <eggdrop> | [karma] 'TheTechRobo' now has 9 karma! |
02:16:41 | <pabs> | (for the #jseater topic change) |
02:16:55 | <TheTechRobo> | narc :P |
02:17:00 | <pabs> | :) |
02:24:02 | | nicolas17 quits [Client Quit] |
02:24:15 | | nicolas17 joins |
03:44:27 | | Wohlstand quits [Ping timeout: 252 seconds] |
03:48:35 | | Webuser851025 joins |
03:50:06 | | Webuser851025 quits [Changing host] |
03:50:06 | | Webuser851025 joins |
03:52:33 | | Webuser851025 quits [Client Quit] |
03:58:07 | | pixel leaves [Error from remote client] |
04:01:12 | <@OrIdow6> | JAA: Agreed, we need a giant green button saying "report a site shutting down" |
04:01:17 | | BlueMaxima quits [Read error: Connection reset by peer] |
04:34:41 | | Hans5958 leaves |
04:40:33 | | cm quits [Ping timeout: 252 seconds] |
04:41:27 | | ljcool2006 joins |
04:44:16 | | cm joins |
04:44:25 | <ljcool2006> | >Livestream is shutting down in January 2025. Keep streaming on Vimeo |
04:44:46 | <ljcool2006> | livestream doesn't seem to have an article on the wiki yet |
04:51:41 | <@JAA> | Acquired in 2017, lots of stuff already redirects to Vimeo, including the announcement of the acquisition, which didn't set a timeline: https://web.archive.org/web/20230701084346/https://livestream.com/blog/livestream-vimeo-acquisition |
04:51:57 | <@JAA> | The announcement's just a banner on the page. |
04:52:41 | <@JAA> | Considering it's live streaming, there's probably not very much to archive. |
04:56:03 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+300, /* 2025 */ Add Livestream): https://wiki.archiveteam.org/?diff=54146&oldid=54145 |
04:58:04 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+2, /* 2025 */ Fix ref): https://wiki.archiveteam.org/?diff=54147&oldid=54146 |
05:02:05 | <h2ibot> | Wireball edited List of websites excluded from the Wayback Machine (+23, The School Bus Conversion Network (e.g. forums)…): https://wiki.archiveteam.org/?diff=54148&oldid=54081 |
05:05:07 | | katocala quits [Ping timeout: 252 seconds] |
05:05:16 | | BornOn420 quits [Remote host closed the connection] |
05:05:25 | | katocala joins |
05:05:54 | | BornOn420 (BornOn420) joins |
05:07:41 | | DogsRNice quits [Read error: Connection reset by peer] |
05:16:07 | | katocala quits [Ping timeout: 252 seconds] |
05:17:05 | | katocala joins |
05:21:04 | <@arkiver> | i'll look into livestream.com ! |
05:31:54 | <@arkiver> | JAA: there's quite some data still on livestream.com , they host some old streams |
05:39:05 | <@arkiver> | does anyone have an idea for a channel for "Vimeo Livestream" or "livestream.com"? |
05:39:08 | | th3z0l4 joins |
05:39:38 | | th3z0l4_ quits [Ping timeout: 260 seconds] |
05:41:19 | <tzt> | deadtrickle? because it's no longer a stream |
06:00:15 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=54149&oldid=54148 |
06:45:38 | <Stagnant_> | https://forum.bradleysmoker.com - SMF forum with over 22 years worth of posts will "discontinue in beginning of 2025". Can someone add it to archivebot? Doesn't seem to be rate limited. |
07:02:55 | <@OrIdow6> | ^ is there a writeup anywhere of AB tricks to get around session IDs? |
07:07:13 | <@arkiver> | JAA: ^ |
07:07:18 | <@arkiver> | tzt: fine with me :) |
07:07:24 | <@arkiver> | #deadtrickle for livestream.com |
07:07:51 | <pokechu22> | OrIdow6: Run it as !a https://forum.bradleysmoker.com/?archiveteam so that the first page load gets session IDs, and then it can later load https://forum.bradleysmoker.com/ and https://forum.bradleysmoker.com/index.php with the cookie set to avoid session IDs on those. And then just hope that the session IDs don't expire in the middle of the job and cause issues that way |
07:10:11 | <@OrIdow6> | pokechu22: Huh, do you then ignore the URLs with the session IDs so it doesn't do a parallel crawl of them? |
07:10:18 | <@OrIdow6> | Testing it in a browser |
07:11:12 | <pokechu22> | IIRC wpull does some stuff to strip session IDs from URLs (which also means that the URLs saved in the WARC for the first page won't directly work, as those ones will have session IDs but they'll be requested without). But even if the URL has a session ID in it, as long as the cookie's been set later page loads won't have session IDs |
07:12:08 | <@OrIdow6> | ah |
07:12:43 | <pokechu22> | huh, that's weird; I'm also not seeing session IDs when using curl |
07:13:16 | <@OrIdow6> | Thx pokechu, started the job for it |
07:14:10 | <pokechu22> | Looks like they don't do that session ID thing when using curl's UA but do with (current) firefox UAs. Interesting, though probably not relevant in this case since we have the ?archiveteam workaround |
07:15:37 | <@OrIdow6> | Huh |
07:15:58 | <@OrIdow6> | Stagnant_: Running |
07:16:08 | <Stagnant_> | i'm not seeing any session ids on my firefox browser. i do see them with chrome though |
07:16:14 | <Stagnant_> | Great, thanks ;) |
07:17:05 | <pokechu22> | They show up only on the very first time you load the site (easiest thing to try is to load the page in private browsing); afterwards they don't show up |
07:17:27 | <Stagnant_> | Oh I see |
08:32:32 | | i_have_n0_idea9 quits [Quit: The Lounge - https://thelounge.chat] |
08:38:51 | | i_have_n0_idea9 (i_have_n0_idea) joins |
08:49:41 | <eggdrop> | [remind] OrIdow6: my little pony |
08:50:57 | | opl joins |
08:58:50 | | loug8318142 joins |
09:15:13 | | nulldata quits [Quit: So long and thanks for all the fish!] |
09:15:17 | <xkey> | pabs: thanks for the link! |
09:16:08 | | nulldata (nulldata) joins |
09:25:20 | <opl> | hello! forwarding this from someone more involved in the fighting game community: "there's a big japanese arcade called a-cho that is shutting down soon and they have a very long running youtube channel with thousands and thousands of videos of matches and tournaments and whatnot. Anyway it turns out they are going to be deleting their youtube channel <https://x.com/Chickzama/status/1875045024507019668> <https://x.com/chibax7jp/status/1875084159624114489>" |
09:25:21 | <eggdrop> | nitter: https://xcancel.com/Chickzama/status/1875045024507019668 |
09:25:24 | <opl> | according to the tweets, the arcade shuts down on 2025-01-31, and the youtube channels are to be deleted on 2025-02-28 |
09:25:29 | <opl> | the youtube channels in question are <https://www.youtube.com/channel/UCCfnriDcUslGMUMX4Ctkyjg> (at GAMEacho/a-cho GAME) and <https://www.youtube.com/channel/UCkXtcsyQ6g8coNrclPvt29w> (at zero3japan/a-cho battle movie). i believe they're both eligible to get queued up for archival, but i'm not familiar enough with the process to attempt that myself |
09:29:24 | <@OrIdow6> | !remindme 1d my little pony |
09:29:26 | <eggdrop> | [remind] ok, i'll remind you at 2025-01-04T09:29:24Z |
09:41:42 | | Island quits [Read error: Connection reset by peer] |
09:44:29 | <opl> | i'm also noticing some of the tournament results pages aren't archived by IA. roots: http://www.a-cho.com/ac/res_2018.html (contains links to previous years at the bottom) http://www.a-cho.com/ac/res_2019.html http://www.a-cho.com/ac/res_2020.html (pages for just 2019 and 2020?) |
10:00:59 | <h2ibot> | JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=54150&oldid=53879 |
10:04:38 | <Flashfire42> | Hey um why is ArchiveTeam's Choice youtube atm? It saves the IPs of warriors in the url and some people may not want that without opting in? |
10:05:33 | <Flashfire42> | cc JAA arkiver ? |
10:06:56 | <@arkiver> | youtube has introduced new blocking, i wanted to get the queue down a bit |
10:27:43 | <joepie91|m> | arkiver: note that this is likely getting the IPs blocked of people running the warrior |
10:43:59 | | sec^nd quits [Remote host closed the connection] |
10:47:39 | | pixel (pixel) joins |
11:04:27 | | Gadelhas562873 quits [Ping timeout: 252 seconds] |
11:09:54 | | Gadelhas562873 joins |
11:11:25 | | pseudorizer quits [Ping timeout: 252 seconds] |
11:12:57 | | pseudorizer (pseudorizer) joins |
11:18:31 | <@arkiver> | joepie91|m: Flashfire42: i've moved it back to telegram |
11:19:13 | <@arkiver> | on blocking - the number of items being handed out was pretty stable, no obvious sign of blocking (otherwise we would see the number of requested items going down) |
11:23:38 | <@arkiver> | and thanks joepie91|m - i did not consider the point enough in switching over. will keep it like it is now (so no youtube default) |
11:37:20 | | sec^nd (second) joins |
11:38:19 | | Webuser716927 joins |
11:38:43 | | Webuser716927 quits [Client Quit] |
11:51:50 | <joepie91|m> | 👍️ |
11:52:07 | <joepie91|m> | as for the blocking, it seems to work in waves, it doesn't seem to be a fully automated/rolling process |
11:52:23 | <joepie91|m> | people seem to have gotten blocked for using yt-dlp in the past even though they've stopped using it for example |
11:52:36 | <joepie91|m> | so presumably there's some kind of batch data crunching going on that spits out IPs to block every once in a while |
11:59:21 | <h2ibot> | Bzc6p edited Internet Archive (+106, /* Wayback Machine Save Page Now */ mention…): https://wiki.archiveteam.org/?diff=54151&oldid=53670 |
12:00:03 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:00:21 | <h2ibot> | JAABot edited CurrentWarriorProject (+2): https://wiki.archiveteam.org/?diff=54152&oldid=54150 |
12:02:54 | | Bleo182600722719623 joins |
12:04:43 | <@OrIdow6> | https://x.com/acho_kyoto/status/1871138196291256535 a-cho good shutdown notice |
12:04:43 | <eggdrop> | nitter: https://xcancel.com/acho_kyoto/status/1871138196291256535 |
12:09:23 | <h2ibot> | Bzc6p edited Cafeblog.hu (+189, /* Site reconnaissance */ more info): https://wiki.archiveteam.org/?diff=54153&oldid=54132 |
12:12:13 | <@OrIdow6> | opl: Not sure about the Youtube channel but I'm running at least one more ArchiveBot crawl for a-cho, because it seems their site is half-broken and the last one didn't discover those pages |
12:13:51 | | lflare quits [Quit: Bye] |
12:14:12 | <@OrIdow6> | !remindme 2w see if the a-cho job got http://www.a-cho.com/ac/res_2019.html and http://www.a-cho.com/ac/res_2020.html |
12:14:13 | <eggdrop> | [remind] ok, i'll remind you at 2025-01-17T12:14:12Z |
12:14:14 | | lflare (lflare) joins |
12:18:17 | <opl> | thanks, OrIdow6. any idea what should be done with the youtube content? i know downthetube exists, but it seems the queue is rather long so i'd be worried about the videos never making it to the front before deletion |
12:19:24 | <h2ibot> | Bzc6p edited Blogger.hu (+301, add discovery numbers, update status): https://wiki.archiveteam.org/?diff=54154&oldid=54131 |
12:21:19 | <@OrIdow6> | opl: So with ArchiveTeam we do have video archival systems, most notably #down-the-tube. However, videos are HUGE, especially when people want to archive them in bulk, and as a result there are criteria for what gets included that I'm not familiar with, so if someone who is wants to comment they can |
12:21:25 | <h2ibot> | Bzc6p edited Blogger.hu (+0, /* Archiving */ fix crash date): https://wiki.archiveteam.org/?diff=54155&oldid=54154 |
12:21:45 | <opl> | i'm sure some people will attempt their own archive jobs, but i imagine the results of those would become nearly impossible to find afterwards. there's also the problem of there being days of recordings there, so i imagine most people making an attempt won't have enough storage space for it all |
12:21:46 | <@OrIdow6> | For your own purposes, a yt-dlp wrapper would work; or yt-dlp itself, but that requires using the command line |
12:22:43 | <@OrIdow6> | If the fighting game community is organized enough best thing to do would be to have it host them itself I think |
12:23:09 | <@OrIdow6> | But yeah $$$ |
12:23:11 | | Webuser981789 joins |
12:23:28 | | Webuser981789 quits [Client Quit] |
12:24:25 | <h2ibot> | Bzc6p edited Kepfeltoltes.eu (+24, /* Archiving */ add 2024 numbers): https://wiki.archiveteam.org/?diff=54156&oldid=53345 |
12:24:26 | <opl> | yeah, i'm familiar with yt-dlp. i think i'm mostly trying to figure out here if anyone has figured out how to solve the problem of link rot in situations like this. after all, it doesn't matter if you have a copy of everything if no one can discover it |
12:25:22 | <opl> | i'm not actually part of that scene, but i do hate to see history gone |
12:25:38 | <@OrIdow6> | Pretty common sentiment here :D |
12:26:03 | <@OrIdow6> | But sadly I don't know of any such universal thing, tho I'm not familiar with Youtube archival specifically |
12:26:46 | <@OrIdow6> | If something like IPFS (and funnily enough we had a brief conversation about that here a few days ago) wasn't hugely inefficient and impossible to use that might've been a good solution |
12:27:29 | <@OrIdow6> | It's too bad web.archive.org doesn't let anyone say "hey, I'm hosting this!" and have it redirect to their site or whatever, but there are a bunch of obvious issues with that |
12:28:19 | <@OrIdow6> | But uh, I do encourage you to stick in the channel and see if you get any other comments about Youtube specifically |
12:28:55 | <opl> | yeah, indexing dead content is ultimately a massive pain that i've spent many nights pondering |
12:29:41 | <opl> | ipfs... i tried using it once, and it seemed like a solid idea. never used it seriously enough to discover the issues. might need to read through that conversation out of curiosity |
12:29:57 | <@OrIdow6> | It uses a gigantic amount of RAM mostly |
12:31:53 | <opl> | and i'll definitely be sticking around. would asking in down-the-tube specifically have been better, or would it just be more likely to get lost there? not super familiar with the conventions here |
12:32:50 | <opl> | plus i'm noticing the project channels aren't saved in the current log archives, so i'm hesitant to even ask there in case someone else comes with the same problem later |
12:36:27 | <h2ibot> | Bzc6p edited EOldal (+36, add company navbox): https://wiki.archiveteam.org/?diff=54157&oldid=49818 |
12:37:04 | <@OrIdow6> | opl: Worth a shot asking there, I'd say mention that they're being deleted; it's all the same people, but some channels fill up with bots talking and some fill up with irrelevant-to-them conversations and no one reads everything |
12:37:56 | <@OrIdow6> | You've asked here already for logging purposes |
12:39:16 | <opl> | ok, will do. i didn't want to ask in two different channels at the same time since i wasn't sure if it's frowned upon >.> |
12:40:09 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:41:30 | | SkilledAlpaca418962 joins |
13:03:10 | | IDK (IDK) joins |
13:22:40 | | NF885 (NF885) joins |
13:24:20 | <NF885> | looks like the Vine Archive site at https://vine.co has died |
13:26:37 | <NF885> | e.g. https://vine.co/twitter shows a broken loading gif |
13:29:15 | | NF885 quits [Client Quit] |
13:46:21 | | Shjosan quits [Read error: Connection reset by peer] |
13:47:10 | | Shjosan (Shjosan) joins |
13:47:43 | | Wohlstand (Wohlstand) joins |
13:50:15 | | Shjosan quits [Client Quit] |
13:52:00 | | Shjosan (Shjosan) joins |
14:59:07 | | graham9 quits [Quit: The Lounge - https://thelounge.chat] |
15:14:48 | | Wohlstand quits [Ping timeout: 260 seconds] |
15:55:12 | | Webuser171702 joins |
15:55:54 | | Webuser171702 quits [Client Quit] |
16:53:58 | | qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds] |
16:55:32 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
16:55:50 | | loug8318142 joins |
17:07:56 | | qwertyasdfuiopghjkl2 joins |
17:07:56 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
17:08:30 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
17:10:09 | | qwertyasdfuiopghjkl2 joins |
17:10:09 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
17:10:43 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
17:11:18 | | qwertyasdfuiopghjkl2 joins |
17:11:18 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
17:11:52 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
17:13:14 | | qwertyasdfuiopghjkl2 joins |
17:13:14 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
17:13:48 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
17:15:20 | | qwertyasdfuiopghjkl2 joins |
17:15:20 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
17:15:54 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
17:16:11 | | qwertyasdfuiopghjkl2 joins |
17:16:11 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
17:16:45 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
17:20:45 | | i_have_n0_idea9 quits [Quit: The Lounge - https://thelounge.chat] |
17:25:00 | | i_have_n0_idea9 (i_have_n0_idea) joins |
17:26:17 | | i_have_n0_idea9 quits [Client Quit] |
17:26:45 | | i_have_n0_idea9 (i_have_n0_idea) joins |
17:29:34 | | graham9 joins |
17:30:22 | | qwertyasdfuiopghjkl2 joins |
17:30:22 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
17:30:57 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
18:23:50 | | Webuser330602 joins |
18:23:56 | | Webuser330602 quits [Client Quit] |
19:08:55 | | Wohlstand (Wohlstand) joins |
19:41:59 | | hackbug quits [Remote host closed the connection] |
19:43:07 | | graham9 quits [Client Quit] |
20:01:54 | | abirkill quits [Quit: Let us prepare to grapple with the ineffable itself, and see if we may not eff it after all.] |
21:01:54 | | hackbug (hackbug) joins |
21:25:28 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
22:18:49 | | graham9 joins |