00:00:02 | <eggdrop> | [remind] thuban: re-run youtube channels for https://wiki.archiveteam.org/index.php/Political_parties/Georgia |
00:03:10 | | duck joins |
00:03:26 | | duck quits [Client Quit] |
00:08:19 | <nicolas17> | Apple developer documentation seems hard to archive... I'll make a wiki page about it |
00:11:42 | | DebutanteDebbie joins |
00:12:54 | <DebutanteDebbie> | Hi, I'm just wondering: I did an edit on List of websites excluded from the Wayback Machine and it says it's waiting for a moderator to approve. How long will it take for my edit to be approved? |
00:14:32 | <pabs> | it's usually reviewed regularly, when the moderators have time. probably a day or a few |
00:15:17 | <@arkiver> | i'll have a look |
00:15:47 | <DebutanteDebbie> | pabs thank you. |
00:15:47 | <DebutanteDebbie> | also, arkiver thank you. Appreciate it. |
00:17:20 | <h2ibot> | Debutantedebbie edited List of websites excluded from the Wayback Machine (+143, Added sites that I know of which are excluded…): https://wiki.archiveteam.org/?diff=53647&oldid=53552 |
00:17:30 | <DebutanteDebbie> | Thank you appreciate your help. |
00:17:36 | <pabs> | thanks for the edit :) |
00:17:37 | <@arkiver> | done |
00:18:20 | <h2ibot> | Skodwarde32 edited MSN Messenger (-5): https://wiki.archiveteam.org/?diff=53648&oldid=37334 |
00:18:33 | <DebutanteDebbie> | pabs My pleasure. |
00:18:45 | <DebutanteDebbie> | arkiver again thank you for taking care of this. |
00:18:50 | <@arkiver> | :) |
00:19:43 | | DebutanteDebbie quits [Client Quit] |
00:21:49 | | person1234 joins |
00:23:13 | <thuban> | hey arkiver, what's the status of #down-the-tube? (i had the impression it wasn't directly backed by ia storage and shouldn't be affected by the outage, but maybe i am misremembering) |
00:24:42 | <@arkiver> | thuban: it would go into temporary storage as well, like #telegrab and #// |
00:24:50 | <@arkiver> | but it's currently broken unfortunately, i need to make a fix |
00:24:55 | <Vokun> | Definitely misremembering. Maybe you're thinking of #youtubearchive:hackint.org |
00:25:06 | <@arkiver> | ah |
00:25:07 | <@arkiver> | right |
00:25:15 | <@arkiver> | err #youtubearchive is like a work in progress |
00:25:31 | <person1234> | could https://cokemachineglow.com/ and https://www.originalsoundversion.com/ be archived? The former is defunct and the latter appears to be, and both are used as sources on Wikipedia |
00:26:11 | <thuban> | Vokun: right, thanks |
00:27:09 | <pokechu22> | person1234: looking into those now |
00:28:18 | <Vokun> | a 'work in progress' that would take like 110% of IA's yearly income to back up lol |
00:28:35 | <thuban> | arkiver: i'd like to re-run the georgian parliamentary channels before the election on the 26th, so if we could do those (manually if the bot won't be up) that would be good :) |
00:29:15 | <thuban> | probably not too much new there since i ran them a couple of months ago |
00:29:41 | | etnguyen03 quits [Client Quit] |
00:29:49 | | person1234 quits [Client Quit] |
00:30:34 | <pokechu22> | last pages of those are http://www.originalsoundversion.com/blog/page/435/ and http://cokemachineglow.com/?pg=654 and neither has a sitemap it seems, but should be fine otherwise |
00:51:09 | | DogsRNice joins |
01:00:32 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=53649&oldid=53647 |
01:21:22 | | etnguyen03 (etnguyen03) joins |
01:23:58 | | wyatt8750 joins |
01:24:24 | | wyatt8740 quits [Ping timeout: 258 seconds] |
01:28:45 | | pabs quits [Ping timeout: 260 seconds] |
01:30:24 | | wyatt8750 quits [Read error: Connection reset by peer] |
01:30:40 | | wyatt8740 joins |
01:33:08 | | pixel leaves [Error from remote client] |
01:35:24 | | pabs (pabs) joins |
01:50:42 | <h2ibot> | Switchnode edited Current Projects (-44, add veoh; move curiouscat to recent; remove mildom): https://wiki.archiveteam.org/?diff=53650&oldid=53562 |
01:52:57 | | icedice quits [Quit: Leaving] |
02:07:43 | | etnguyen03 quits [Client Quit] |
02:11:39 | | etnguyen03 (etnguyen03) joins |
02:12:45 | <h2ibot> | Switchnode edited Veoh (+261, update status; add infobox details): https://wiki.archiveteam.org/?diff=53651&oldid=53619 |
02:57:03 | | etnguyen03 quits [Remote host closed the connection] |
03:04:41 | | JaffaCakes118 quits [Remote host closed the connection] |
03:20:33 | <pabs> | driib: sounds like flickr has some moderate deletion danger "Flickr themselves started purging any public images over a certain amount on free accounts." https://news.ycombinator.com/item?id=41941117 |
03:26:09 | | pabs quits [Remote host closed the connection] |
03:26:55 | | pabs (pabs) joins |
03:30:42 | <@JAA> | DigitalDragons: Yeah, there's a reason why irclogs has never been officially announced anywhere... :-) |
03:32:56 | <DigitalDragons> | Fair enough :p |
03:52:22 | <eggdrop> | [remind] nicolas17: is IA back? upload zimbardo.com warc |
03:53:41 | <nicolas17> | unfortunately, |
03:53:50 | <nicolas17> | !remindme 1w is IA back? upload zimbardo.com warc |
03:53:51 | <eggdrop> | [remind] ok, i'll remind you at 2024-11-01T03:53:50Z |
04:27:04 | | cm quits [Ping timeout: 255 seconds] |
04:30:03 | | cm joins |
05:52:15 | | DogsRNice quits [Read error: Connection reset by peer] |
06:20:41 | | pixel (pixel) joins |
06:31:57 | | loug8318142 joins |
06:51:28 | | tek_dmn quits [Quit: ZNC - https://znc.in] |
07:05:01 | | Unholy236192464537713 quits [Remote host closed the connection] |
07:05:49 | | Unholy236192464537713 (Unholy2361) joins |
07:22:55 | | pixel leaves |
08:29:12 | | BPCZ quits [Quit: eh???] |
08:32:29 | | BPCZ (BPCZ) joins |
08:57:58 | | qwertyasdfuiopghjkl quits [Ping timeout: 255 seconds] |
09:15:14 | | nulldata quits [Quit: So long and thanks for all the fish!] |
09:16:08 | | nulldata (nulldata) joins |
09:29:17 | | vix5110_ joins |
09:44:15 | | beastbg8 quits [Read error: Connection reset by peer] |
09:46:44 | | beastbg8 (beastbg8) joins |
09:59:45 | | driib quits [Quit: The Lounge - https://thelounge.chat] |
10:00:02 | | driib (driib) joins |
10:05:46 | | PredatorIWD2 quits [Read error: Connection reset by peer] |
10:11:16 | | PredatorIWD2 joins |
10:15:17 | | sralracer joins |
10:15:43 | | sralracer is now authenticated as sralracer |
10:25:53 | | Xanthos joins |
10:25:53 | | Xanthon quits [Read error: Connection reset by peer] |
10:26:00 | | Xanthos is now known as Xanthon |
10:26:01 | | Xanthon is now authenticated as Xanthon |
10:26:01 | | Xanthon quits [Changing host] |
10:26:01 | | Xanthon (Xanthon) joins |
11:00:02 | | Bleo18260072271962 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:45 | | Bleo18260072271962 joins |
11:19:39 | | Wohlstand (Wohlstand) joins |
11:42:31 | | SkilledAlpaca418 quits [Quit: SkilledAlpaca418] |
11:44:11 | | SkilledAlpaca418 joins |
11:46:27 | <h2ibot> | OrIdow6 edited Deathwatch (+317, /* 2024 */ Niconico): https://wiki.archiveteam.org/?diff=53652&oldid=53620 |
12:08:45 | | BornOn420 quits [Ping timeout: 240 seconds] |
12:33:53 | | useretail joins |
12:38:59 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
12:54:49 | | ThreeHM (ThreeHeadedMonkey) joins |
13:29:51 | | khaoohs joins |
13:31:03 | | khaoohs quits [Client Quit] |
13:31:40 | | khaoohs joins |
13:48:03 | | xarph quits [Ping timeout: 258 seconds] |
13:49:14 | | xarph joins |
14:30:53 | | MrMcNuggets (MrMcNuggets) joins |
14:35:28 | | seacow quits [Ping timeout: 255 seconds] |
14:35:48 | | Wohlstand quits [Remote host closed the connection] |
15:06:09 | | anon2024 joins |
15:06:18 | <anon2024> | Hello, this is important |
15:07:27 | <anon2024> | Character.AI is undergoing a massive copyright purge. The recent incident in Florida was the last straw after previous copyright complaints. Many bots got delisted and deleted due to copyright issues and being unlicensed |
15:08:28 | <anon2024> | Would be good to archivebot it for now until we get a possible warrior |
15:09:37 | <that_lurker> | can the indicidual AI's be downloaded or would you just want to grab the info of them? |
15:09:57 | <that_lurker> | s/indicidual/individual |
15:13:16 | | anon2024 quits [Ping timeout: 255 seconds] |
15:17:37 | | lflare quits [Quit: Bye] |
15:18:03 | | lflare (lflare) joins |
15:19:50 | | anon2024 joins |
15:20:32 | <anon2024> | https://archive.ph/6UXkE - this tiktok contains various complaints of the removal |
15:21:09 | <anon2024> | and in the news, the company said that they will remove unlicensed bots |
15:21:26 | <ymgve> | it's not like the actual AI engine can be archived |
15:21:35 | <ymgve> | at best you get the character descriptions/intro texts |
15:22:15 | | anon2024 quits [Client Quit] |
15:23:17 | | imer quits [Quit: Oh no] |
15:23:52 | | imer (imer) joins |
15:27:07 | | Xanthos joins |
15:28:45 | | Xanthon quits [Ping timeout: 260 seconds] |
15:28:45 | | Xanthos is now known as Xanthon |
15:28:47 | | Xanthon is now authenticated as Xanthon |
15:28:47 | | Xanthon quits [Changing host] |
15:28:47 | | Xanthon (Xanthon) joins |
16:08:44 | | Wohlstand (Wohlstand) joins |
16:17:55 | <lennier2_> | I don't know that there's even enough public data on character.ai to need a project, but grabbing the descriptions and intros sounds reasonable to me. |
16:20:52 | <pabs> | PSA: if you are monitoring the ArchiveBot websocket, katia's AB websocket repeater project is useful to let you download the stream once but monitor it from several clients at once https://github.com/iakat/archivebot-dashboard-repeater |
16:20:53 | <pabs> | I am running it like this: ws-repeater/ ; UPSTREAM=ws://archivebot.archivingyoursh.it/stream gunicorn app:app -b 127.0.0.1:4568 --worker-class uvicorn.workers.UvicornWorker : curl -s ws://localhost:4568/stream |
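The repeater idea above (download the ArchiveBot stream once, monitor it from several clients) is a fan-out pattern. Below is a minimal in-process sketch of that pattern using asyncio queues. It is not how archivebot-dashboard-repeater is implemented; the `Broadcaster` class and `demo` function are hypothetical names, and a plain list of strings stands in for the real upstream websocket.

```python
import asyncio
from typing import List


class Broadcaster:
    """Fan one upstream message stream out to any number of subscribers.

    Hypothetical helper for illustration only. Each subscriber gets its own
    unbounded queue, so publishing never blocks on a slow client.
    """

    def __init__(self) -> None:
        self._queues: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._queues.remove(queue)

    def publish(self, message: str) -> None:
        # Called once per upstream message; every subscriber receives a copy.
        for queue in self._queues:
            queue.put_nowait(message)


async def demo(upstream: List[str], n_clients: int) -> List[List[str]]:
    """Feed `upstream` through one Broadcaster to `n_clients` readers."""
    hub = Broadcaster()
    queues = [hub.subscribe() for _ in range(n_clients)]
    # Stands in for reading the real upstream stream exactly once.
    for message in upstream:
        hub.publish(message)
    received = []
    for queue in queues:
        msgs = []
        while not queue.empty():
            msgs.append(queue.get_nowait())
        received.append(msgs)
    return received
```

Unbounded queues keep the sketch simple but let memory grow if a client stops reading; a real repeater would likely cap or drop per client.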
16:21:07 | <pabs> | katia++ |
16:21:07 | <eggdrop> | [karma] 'katia' now has 55 karma! |
16:29:42 | <that_lurker> | pabs: Out of curiosity what or how many clients do you monitor the stream on? |
16:30:08 | <katia> | 13 |
16:30:19 | <katia> | ask me how i know |
16:30:26 | <katia> | (pabs told me) |
16:30:26 | <pabs> | 13 right now yeah, Ryz has some ideas for more :) |
16:32:09 | <pabs> | code forges, wikis, smolnet job/urls, onion urls, blogspot/blogger, flickr 403s, imgur, pastebin.com, mediafire, SWF files, webrings, mailman2 |
16:32:39 | <pabs> | my monitoring is basically `curl -s | jq` with a bunch of horrifying regexes |
16:32:47 | <that_lurker> | I still have anomaly detection for (my) jobs in my todo list. |
16:34:16 | <pabs> | collecting them is relatively easy, (semi-)manually filtering the results and followup archiving can be time-consuming |
16:35:56 | <pabs> | oh, and one monitoring a specific job that auto-generates ignores |
16:39:55 | | collat joins |
16:42:12 | | Fiduro4830139107 quits [Quit: The Lounge - https://thelounge.chat] |
16:42:22 | | Fiduro4830139107 joins |
16:45:45 | | pixel (pixel) joins |
16:50:33 | | BornOn420 (BornOn420) joins |
16:58:32 | <Ryz> | Heh, hmm, trying to think |
16:59:07 | <Ryz> | pabs, maybe mp4s? Obviously with some ignores on stuff like New York Times and other common links that are hit on? |
16:59:58 | <Ryz> | Could be expanded to a group of video files I think; one much more obscure one that used to be common is .flv |
17:00:23 | <pabs> | are they interesting in some way? |
17:00:41 | <Ryz> | https://www.lifewire.com/what-is-an-flv-file-2621348 - they are related to Adobe Flash, tis be a 'Flash Video' file |
17:01:04 | <pabs> | yeah, I can see flv, but I mean video in general. mp4s must be very common |
17:06:55 | <Ryz> | Much of the video content, with some exceptions, is gated by YouTube, Facebook, and Instagram through JS and obfuscated links, since the URLs are signed or the videos themselves are cut into many, many pieces, which means most of what's left comes from those who can afford to host videos; the curious thing about finding videos |
17:06:56 | <Ryz> | is whether we can find more of what websites bother to host themselves, as hints on where to find more of it (since there's content that's only accessible through video, which isn't easily searchable, unlike, say, text) |
17:07:16 | <Ryz> | This can also be used to determine which video files to add to ArchiveBot's badvideos ignoreset in the future, if it gets updated |
17:07:55 | <Ryz> | I don't see Gamespot or IGN .mp4s as much, or at all anymore; they've probably made them harder to access in some way |
17:08:01 | <Ryz> | pabs ^ |
17:08:32 | <pokechu22> | I still see (usually short) videos in wp-content fairly often, which I think are generally not that interesting |
17:14:41 | <pabs> | well, if you can work up a match regex and an ignore regex, I can run it |
17:14:55 | | BornOn420 quits [Client Quit] |
17:15:50 | <Ryz> | pabs, another one could be YouTube, but only for thumbnail metadata: ArchiveBot tends to get them when the URLs return 200s, since we can't really grab videos at all and are only able to get the other limited metadata, but that's before they start giving 429s, after which it won't be able to archive the thumbnails; it would be helpful to have it print out |
17:15:50 | <Ryz> | thumbnail metadata and such |
17:16:11 | | MOSTech6502 joins |
17:16:20 | <Ryz> | There are a lot of obscure YouTube videos that flow by, especially from ArchiveBot archiving forums and such |
17:18:50 | <pabs> | sounds ok. match+ignore regex needed though. and if the HTTP response code needs checking, mention which regex combines with which code |
17:21:12 | <MOSTech6502> | Hey, I have a Fileplanet question. Found a public URL with some links to things hosted formerly on Fileplanet, I want to get one of them. Tried using the file search interface here (https://wiki.archiveteam.org/index.php/Fileplanet), it's not listed |
17:21:41 | <MOSTech6502> | is this still relevant? > The "ftp2" files (another ~300k files at a total of ~1.2TB) cannot be shared publically since there are private files mixed in, we saved them to IA anyways so maybe in the future we can sort them out. If you are looking for files from Fileplanet that are not included in the public archives, contact User:Schbirid with archived URLs that prove their previous availability to the public, e.g. via archived |
17:21:41 | <MOSTech6502> | fileplanet.com pages. |
17:24:30 | <Ryz> | pabs, for videos, you can use this for reference: https://github.com/ArchiveTeam/ArchiveBot/blob/master/db/ignore_patterns/badvideos.json |
17:25:00 | <Ryz> | Probably gonna need some guidance on making matches and ignores; is it just giving you something similar to ArchiveBot ignores? |
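To illustrate what a "match regex + ignore regex (+ optional HTTP response code)" filter over the stream could look like, here is a minimal sketch. It assumes newline-delimited JSON messages carrying `url` and `response_code` keys; the real ArchiveBot stream schema may differ. The patterns `MATCH` and `IGNORE` and the function `filter_entries` are hypothetical examples (video extensions minus a noisy host), not pabs's actual setup.

```python
import json
import re
from typing import Iterable, Iterator, Optional, Set

# Hypothetical example patterns: video files by extension, minus one noisy host.
MATCH = re.compile(r"\.(?:mp4|flv|webm|mov)(?:\?|$)", re.IGNORECASE)
IGNORE = re.compile(r"^https?://(?:[^/]+\.)?nytimes\.com/", re.IGNORECASE)


def filter_entries(
    lines: Iterable[str],
    match: re.Pattern,
    ignore: re.Pattern,
    codes: Optional[Set[int]] = None,
) -> Iterator[str]:
    """Yield URLs that match `match`, don't match `ignore`, and (if `codes`
    is given) whose response code is in `codes`.

    Assumes one JSON object per line with "url" and "response_code" keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON frames
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        if not isinstance(url, str):
            continue
        if not match.search(url) or ignore.search(url):
            continue
        if codes is not None and entry.get("response_code") not in codes:
            continue
        yield url
```

Usage, reading from a local repeater: pipe `curl -s ws://localhost:4568/stream` into a script that loops `for url in filter_entries(sys.stdin, MATCH, IGNORE, {200}): print(url)`. Pairing each match regex with its own code set keeps the "which regex combines with which code" question explicit.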
18:16:05 | <asie> | https://about.ask.fm/closure-notice-the-platform-to-be-deactivated-december-1-2024/ |
18:16:35 | <asie> | the Q&A social network is shutting down in slightly over a month |
18:17:02 | <asie> | notable in the Polish manga community as all our publishers standardized on using it to answer public fan inquiries for some reason, but also had way more users than that |
18:19:08 | <asie> | it seems they anticipated AT though as they blocked all user pages for app-only access |
18:26:00 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
18:27:30 | | collat quits [Ping timeout: 258 seconds] |
18:36:44 | <h2ibot> | Magmaus3 edited Deathwatch (+201, /* 2024 */ ask.fm shutdown): https://wiki.archiveteam.org/?diff=53653&oldid=53652 |
18:43:16 | | damanoh joins |
18:44:32 | | damanoh quits [Client Quit] |
18:58:37 | <magmaus3> | i'll also make a stub wiki page rq |
19:00:48 | <h2ibot> | Magmaus3 created ASK.fm (+474, ASKfm page (stub)): https://wiki.archiveteam.org/?title=ASK.fm |
19:29:17 | <@arkiver> | thanks asie |
19:32:58 | | BornOn420 (BornOn420) joins |
19:34:53 | <h2ibot> | DigitalDragon edited ASK.fm (+0, ASKfm is not CuriousCat): https://wiki.archiveteam.org/?diff=53655&oldid=53654 |
19:37:22 | <qwertyasdfuiopghjkl> | asie: The "app-only" thing seems to luckily just be a popup that can easily be removed with inspect element to access the rest of the page, and I remember the popup already being a thing a while ago. |
19:39:52 | | collat joins |
19:53:45 | | loug8318142 quits [Ping timeout: 258 seconds] |
20:27:21 | | tek_dmn (tek_dmn) joins |
20:27:37 | | Unholy236192464537713 quits [Quit: Ping timeout (120 seconds)] |
20:27:49 | | Unholy236192464537713 (Unholy2361) joins |
20:30:08 | | Dango360_ quits [Quit: Leaving] |
20:30:21 | | Dango360 (Dango360) joins |
20:31:17 | <@arkiver> | alright let's make a channel for ask.fm |
20:31:19 | <@arkiver> | any ideas? |
20:32:06 | <Flashfire42> | #asktiveteam |
20:32:27 | <@arkiver> | not sure about that one... |
20:32:35 | <Flashfire42> | #askformoretime |
20:33:05 | <Flashfire42> | or just #askformore |
20:35:39 | <lennier2_> | #dontaskfm |
20:36:15 | <@arkiver> | that one is nice |
20:36:21 | <@arkiver> | will keep the competition open a little longer :P |
20:36:48 | <DigitalDragons> | #casketfm |
20:36:50 | | mls (mls) joins |
20:36:59 | <imer> | I WAS JUST ABOUT TO WRITE THAT |
20:37:01 | <imer> | nice |
20:37:11 | <imer> | dontaskfm is best so far imo |
20:37:11 | <DigitalDragons> | >:D |
20:37:43 | <Flashfire42> | well I am squatting them all for the minute. I will give perms to whoever needs them when they get chosen when I finish work in about 6-7 hours |
20:37:53 | <@HCross> | Ask FM was heavily used in cyberbullying in schools (at least in the UK) |
20:38:13 | <Flashfire42> | in oz as well |
20:38:37 | <@arkiver> | that sucks :/ |
20:40:56 | <flashfire42|m> | So was curious cat tho. I’d just keep in mind potential for exclusions of individual pages in the future and grab what we can for posterity |
20:42:50 | | MOSTech6502 quits [Read error: Connection reset by peer] |
20:43:36 | <@arkiver> | looks like sequential IDs |
20:44:09 | <@arkiver> | let's do #dontaskfm , it's too perfect |
20:45:45 | <lennier2_> | Yay! :) |
20:46:04 | <@arkiver> | thanks lennier2_ :) |
20:46:44 | <imer> | lennier2_++ |
20:46:44 | <eggdrop> | [karma] 'lennier2_' now has 1 karma! |
20:49:11 | | etnguyen03 (etnguyen03) joins |
20:52:11 | <magmaus3> | "ASKfm is not CuriousCat" ← oops 🥴 |
20:52:51 | <lennier2_> | Wild that every Q&A platform seems to be shutting down, though. |
20:53:15 | <magmaus3> | yeah |
20:55:22 | | tek_dmn- (tek_dmn) joins |
20:56:35 | | tek_dmn quits [Ping timeout: 260 seconds] |
21:00:32 | | Unholy236192464537713 quits [Client Quit] |
21:00:49 | | Unholy236192464537713 (Unholy2361) joins |
21:05:08 | <h2ibot> | Magmaus3 edited ASK.fm (+9, add irc channel): https://wiki.archiveteam.org/?diff=53656&oldid=53655 |
21:09:59 | | vix5110_ quits [Client Quit] |
21:12:16 | | useretail quits [Quit: Leaving] |
21:12:48 | | MOSTech6502 joins |
21:17:44 | | etnguyen03 quits [Client Quit] |
21:19:14 | | etnguyen03 (etnguyen03) joins |
21:35:04 | | MOSTech6502 quits [Read error: Connection reset by peer] |
21:35:23 | | MOSTech6502 joins |
21:39:18 | | ThreeHM quits [Remote host closed the connection] |
21:41:13 | | ThreeHM (ThreeHeadedMonkey) joins |
21:41:15 | <h2ibot> | JustAnotherArchivist edited Deathwatch (-12, /* 2024 */ Link to ASK.fm page): https://wiki.archiveteam.org/?diff=53657&oldid=53653 |
21:53:04 | | BlueMaxima joins |
22:15:02 | | loug8318142 joins |
22:19:22 | | Wohlstand quits [Quit: Wohlstand] |
22:56:27 | <h2ibot> | Manu edited Webring/fediring.net (+307, /* Add more archived pages */): https://wiki.archiveteam.org/?diff=53658&oldid=53645 |
23:00:35 | | MOSTech6502 quits [Remote host closed the connection] |
23:00:55 | | MOSTech6502 joins |
23:20:42 | <c3manu> | https://www.reuters.com/world/middle-east/explosions-heard-irans-capital-tehran-nearby-karaj-semi-official-iranian-media-2024-10-25/ |
23:28:07 | | etnguyen03 quits [Client Quit] |
23:39:38 | | etnguyen03 (etnguyen03) joins |
23:39:55 | | MOSTech6502 quits [Ping timeout: 260 seconds] |
23:39:55 | | wickedplayer494 quits [Ping timeout: 258 seconds] |
23:40:13 | | wickedplayer494 joins |
23:45:58 | | Juesto (Juest) joins |
23:47:59 | | Juest quits [Ping timeout: 258 seconds] |
23:47:59 | | Juesto is now known as Juest |
23:57:03 | | Juesto (Juest) joins |
23:57:57 | | Juest quits [Ping timeout: 258 seconds] |
23:57:57 | | Juesto is now known as Juest |