00:00:02<eggdrop>[remind] thuban: re-run youtube channels for https://wiki.archiveteam.org/index.php/Political_parties/Georgia
00:03:10duck joins
00:03:26duck quits [Client Quit]
00:08:19<nicolas17>Apple developer documentation seems hard to archive... I'll make a wiki page about it
00:11:42DebutanteDebbie joins
00:12:54<DebutanteDebbie>Hi, I'm just wondering: I made an edit to List of websites excluded from the Wayback Machine and it says it's waiting for a moderator to approve. How long will it take for my edit to be approved?
00:14:32<pabs>it's usually reviewed regularly, when the moderators have time. probably a day or a few
00:15:17<@arkiver>i'll have a look
00:15:47<DebutanteDebbie>pabs thank you.
00:15:47<DebutanteDebbie>also, arkiver thank you. Appreciate it.
00:17:20<h2ibot>Debutantedebbie edited List of websites excluded from the Wayback Machine (+143, Added sites that I know of which are excluded…): https://wiki.archiveteam.org/?diff=53647&oldid=53552
00:17:30<DebutanteDebbie>Thank you appreciate your help.
00:17:36<pabs>thanks for the edit :)
00:17:37<@arkiver>done
00:18:20<h2ibot>Skodwarde32 edited MSN Messenger (-5): https://wiki.archiveteam.org/?diff=53648&oldid=37334
00:18:33<DebutanteDebbie>pabs My pleasure.
00:18:45<DebutanteDebbie>arkiver again thank you for taking care of this.
00:18:50<@arkiver>:)
00:19:43DebutanteDebbie quits [Client Quit]
00:21:49person1234 joins
00:23:13<thuban>hey arkiver, what's the status of #down-the-tube? (i had the impression it wasn't directly backed by ia storage and shouldn't be affected by the outage, but maybe i am misremembering)
00:24:42<@arkiver>thuban: it would go into temporary storage as well, like #telegrab and #//
00:24:50<@arkiver>but it's currently broken unfortunately, i need to make a fix
00:24:55<Vokun>Definitely misremembering. Maybe you're thinking of #youtubearchive:hackint.org
00:25:06<@arkiver>ah
00:25:07<@arkiver>right
00:25:15<@arkiver>err #youtubearchive is like a work in progress
00:25:31<person1234>could https://cokemachineglow.com/ and https://www.originalsoundversion.com/ be archived? The former is defunct and the latter appears to be, and both are used as sources on Wikipedia
00:26:11<thuban>Vokun: right, thanks
00:27:09<pokechu22>person1234: looking into those now
00:28:18<Vokun>a 'work in progress' that would take like 110% of IA's yearly income to back up lol
00:28:35<thuban>arkiver: i'd like to re-run the georgian parliamentary channels before the election on the 26th, so if we could do those (manually if the bot won't be up) that would be good :)
00:29:15<thuban>probably not too much new there since i ran them a couple of months ago
00:29:41etnguyen03 quits [Client Quit]
00:29:49person1234 quits [Client Quit]
00:30:34<pokechu22>last pages of those are http://www.originalsoundversion.com/blog/page/435/ and http://cokemachineglow.com/?pg=654 and neither has a sitemap it seems, but should be fine otherwise
00:51:09DogsRNice joins
01:00:32<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=53649&oldid=53647
01:21:22etnguyen03 (etnguyen03) joins
01:23:58wyatt8750 joins
01:24:24wyatt8740 quits [Ping timeout: 258 seconds]
01:28:45pabs quits [Ping timeout: 260 seconds]
01:30:24wyatt8750 quits [Read error: Connection reset by peer]
01:30:40wyatt8740 joins
01:33:08pixel leaves [Error from remote client]
01:35:24pabs (pabs) joins
01:50:42<h2ibot>Switchnode edited Current Projects (-44, add veoh; move curiouscat to recent; remove mildom): https://wiki.archiveteam.org/?diff=53650&oldid=53562
01:52:57icedice quits [Quit: Leaving]
02:07:43etnguyen03 quits [Client Quit]
02:11:39etnguyen03 (etnguyen03) joins
02:12:45<h2ibot>Switchnode edited Veoh (+261, update status; add infobox details): https://wiki.archiveteam.org/?diff=53651&oldid=53619
02:57:03etnguyen03 quits [Remote host closed the connection]
03:04:41JaffaCakes118 quits [Remote host closed the connection]
03:20:33<pabs>driib: sounds like flickr has some moderate deletion danger "Flickr themselves started purging any public images over a certain amount on free accounts." https://news.ycombinator.com/item?id=41941117
03:26:09pabs quits [Remote host closed the connection]
03:26:55pabs (pabs) joins
03:30:42<@JAA>DigitalDragons: Yeah, there's a reason why irclogs has never been officially announced anywhere... :-)
03:32:56<DigitalDragons>Fair enough :p
03:52:22<eggdrop>[remind] nicolas17: is IA back? upload zimbardo.com warc
03:53:41<nicolas17>unfortunately,
03:53:50<nicolas17>!remindme 1w is IA back? upload zimbardo.com warc
03:53:51<eggdrop>[remind] ok, i'll remind you at 2024-11-01T03:53:50Z
04:27:04cm quits [Ping timeout: 255 seconds]
04:30:03cm joins
05:52:15DogsRNice quits [Read error: Connection reset by peer]
06:20:41pixel (pixel) joins
06:31:57loug8318142 joins
06:51:28tek_dmn quits [Quit: ZNC - https://znc.in]
07:05:01Unholy236192464537713 quits [Remote host closed the connection]
07:05:49Unholy236192464537713 (Unholy2361) joins
07:22:55pixel leaves
08:29:12BPCZ quits [Quit: eh???]
08:32:29BPCZ (BPCZ) joins
08:57:58qwertyasdfuiopghjkl quits [Ping timeout: 255 seconds]
09:15:14nulldata quits [Quit: So long and thanks for all the fish!]
09:16:08nulldata (nulldata) joins
09:29:17vix5110_ joins
09:44:15beastbg8 quits [Read error: Connection reset by peer]
09:46:44beastbg8 (beastbg8) joins
09:59:45driib quits [Quit: The Lounge - https://thelounge.chat]
10:00:02driib (driib) joins
10:05:46PredatorIWD2 quits [Read error: Connection reset by peer]
10:11:16PredatorIWD2 joins
10:15:17sralracer joins
10:25:53Xanthos joins
10:25:53Xanthon quits [Read error: Connection reset by peer]
10:26:00Xanthos is now known as Xanthon
10:26:01Xanthon quits [Changing host]
10:26:01Xanthon (Xanthon) joins
11:00:02Bleo18260072271962 quits [Quit: The Lounge - https://thelounge.chat]
11:02:45Bleo18260072271962 joins
11:19:39Wohlstand (Wohlstand) joins
11:42:31SkilledAlpaca418 quits [Quit: SkilledAlpaca418]
11:44:11SkilledAlpaca418 joins
11:46:27<h2ibot>OrIdow6 edited Deathwatch (+317, /* 2024 */ Niconico): https://wiki.archiveteam.org/?diff=53652&oldid=53620
12:08:45BornOn420 quits [Ping timeout: 240 seconds]
12:33:53useretail joins
12:38:59qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
12:54:49ThreeHM (ThreeHeadedMonkey) joins
13:29:51khaoohs joins
13:31:03khaoohs quits [Client Quit]
13:31:40khaoohs joins
13:48:03xarph quits [Ping timeout: 258 seconds]
13:49:14xarph joins
14:30:53MrMcNuggets (MrMcNuggets) joins
14:35:28seacow quits [Ping timeout: 255 seconds]
14:35:48Wohlstand quits [Remote host closed the connection]
15:06:09anon2024 joins
15:06:18<anon2024>Hello, this is important
15:07:27<anon2024>Character.AI is undergoing a massive copyright purge. The recent incident in Florida was the last straw after previous copyright complaints. Many bots got delisted and deleted due to copyright issues and being unlicensed
15:08:28<anon2024>Would be good to archivebot it for now until we get a possible warrior
15:09:37<that_lurker>can the indicidual AI's be downloaded or would you just want to grab the info of them?
15:09:57<that_lurker>s/indicidual/individual
15:13:16anon2024 quits [Ping timeout: 255 seconds]
15:17:37lflare quits [Quit: Bye]
15:18:03lflare (lflare) joins
15:19:50anon2024 joins
15:20:32<anon2024>https://archive.ph/6UXkE - this tiktok contains various complaints of the removal
15:21:09<anon2024>and in the news, the company said that they will remove unlicensed bots
15:21:26<ymgve>it's not like the actual AI engine can be archived
15:21:35<ymgve>at best you get the character descriptions/intro texts
15:22:15anon2024 quits [Client Quit]
15:23:17imer quits [Quit: Oh no]
15:23:52imer (imer) joins
15:27:07Xanthos joins
15:28:45Xanthon quits [Ping timeout: 260 seconds]
15:28:45Xanthos is now known as Xanthon
15:28:47Xanthon quits [Changing host]
15:28:47Xanthon (Xanthon) joins
16:08:44Wohlstand (Wohlstand) joins
16:17:55<lennier2_>I don't know that there's even enough public data on character.ai to need a project, but grabbing the descriptions and intros sounds reasonable to me.
16:20:52<pabs>PSA: if you are monitoring the ArchiveBot websocket, katia's AB websocket repeater project is useful to let you download the stream once but monitor it from several clients at once https://github.com/iakat/archivebot-dashboard-repeater
16:20:53<pabs>I am running it like this: ws-repeater/ ; UPSTREAM=ws://archivebot.archivingyoursh.it/stream gunicorn app:app -b 127.0.0.1:4568 --worker-class uvicorn.workers.UvicornWorker : curl -s ws://localhost:4568/stream
16:21:07<pabs>katia++
16:21:07<eggdrop>[karma] 'katia' now has 55 karma!
16:29:42<that_lurker>pabs: Out of curiosity what or how many clients do you monitor the stream on?
16:30:08<katia>13
16:30:19<katia>ask me how i know
16:30:26<katia>(pabs told me)
16:30:26<pabs>13 right now yeah, Ryz has some ideas for more :)
16:32:09<pabs>code forges, wikis, smolnet job/urls, onion urls, blogspot/blogger, flickr 403s, imgur, pastebin.com, mediafire, SWF files, webrings, mailman2
16:32:39<pabs>my monitoring is basically `curl -s | jq` with a bunch of horrifying regexes
16:32:47<that_lurker>I still have anomaly detection for (my) jobs in my todo list.
16:34:16<pabs>collecting them is relatively easy, (semi-)manually filtering the results and followup archiving can be time-consuming
16:35:56<pabs>oh, and one monitoring a specific job that auto-generates ignores
16:39:55collat joins
16:42:12Fiduro4830139107 quits [Quit: The Lounge - https://thelounge.chat]
16:42:22Fiduro4830139107 joins
16:45:45pixel (pixel) joins
16:50:33BornOn420 (BornOn420) joins
16:58:32<Ryz>Heh, hmm, trying to think
16:59:07<Ryz>pabs, maybe mp4s? Obviously with some ignores on stuff like New York Times and other common links that are hit on?
16:59:58<Ryz>Could be expanded to a group of video files I think; one much more obscure one that used to be common is .flv
17:00:23<pabs>are they interesting in some way?
17:00:41<Ryz>https://www.lifewire.com/what-is-an-flv-file-2621348 - they are related to Adobe Flash, tis be a 'Flash Video' file
17:01:04<pabs>yeah, I can see flv, but I mean video in general. mp4s must be very common
17:06:55<Ryz>Much of the video content with some exceptions is gated by YouTube, Facebook, Instagram, through JS and obfuscating the links of those things since they are signatured or have things that cut the videos themselves into many many pieces, which means much of the leftovers are those who can afford to host videos; the curious thing for finding videos
17:06:56<Ryz>is if we are able to find more of it that websites bother to host themselves to give hints on where to find more of it (since there's content that's only accessible through video which isn't easily searchable within it unlike say text)
17:07:16<Ryz>This can also be used to determine what video files to ignore in the future for ignoreset badvideos for ArchiveBot if it'll be updated
17:07:55<Ryz>I don't think I see Gamespot or IGN .mp4s as much or anymore, probably they might've secured them from being accessible easily in some way
17:08:01<Ryz>pabs ^
17:08:32<pokechu22>I still see (usually short) videos in wp-content fairly often, which I think are generally not that interesting
17:14:41<pabs>well, if you can work up a match regex and an ignore regex, I can run it
17:14:55BornOn420 quits [Client Quit]
17:15:50<Ryz>pabs, another one could be YouTube, but only for thumbnail metadata; because ArchiveBot tends to get them when the URLs are 200s to them, since we can't really grab videos at all and are only able to get the other limited metadata, but it's before they give 429s, and won't be able to archive the thumbnails; would be helpful to have it print out
17:15:50<Ryz>thumbnail metadata and such
17:16:11MOSTech6502 joins
17:16:20<Ryz>There are a lot of obscure YouTube videos that flow by, especially from ArchiveBot archiving forums and such
17:18:50<pabs>sounds ok. match+ignore regex needed though. and if the HTTP response code needs checking, mention which regex combines with which code
17:21:12<MOSTech6502>Hey, I have a Fileplanet question. Found a public URL with some links to things hosted formerly on Fileplanet, I want to get one of them. Tried using the file search interface here (https://wiki.archiveteam.org/index.php/Fileplanet), it's not listed
17:21:41<MOSTech6502>is this still relevant? > The "ftp2" files (another ~300k files at a total of ~1.2TB) cannot be shared publically since there are private files mixed in, we saved them to IA anyways so maybe in the future we can sort them out. If you are looking for files from Fileplanet that are not included in the public archives, contact User:Schbirid with archived URLs that prove their previous availability to the public, e.g. via archived
17:21:41<MOSTech6502>fileplanet.com pages.
17:24:30<Ryz>pabs, for videos, you can use this for reference: https://github.com/ArchiveTeam/ArchiveBot/blob/master/db/ignore_patterns/badvideos.json
17:25:00<Ryz>Probably gonna need some guidance on making matches and ignores, it's just giving you the stuff a bit similar to ignores in ArchiveBot?
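[editor's note] The "match regex + ignore regex, optionally combined with an HTTP response code" filter that pabs asks for above could look roughly like the sketch below. It is only an illustration, not pabs's actual monitoring setup (described above as `curl -s | jq` plus regexes). It assumes each stream message is one JSON object per line with a `url` string and a numeric `response_code`; both field names are assumptions here, and the two patterns and the function name `wanted` are hypothetical examples based on the video discussion.

```python
import json
import re

# Hypothetical example patterns, for illustration only.
# MATCH: URLs whose path ends in .mp4 or .flv (optionally followed by a query string).
MATCH = re.compile(r"\.(?:mp4|flv)(?:\?|$)", re.IGNORECASE)
# IGNORE: a common high-volume host and WordPress upload directories.
IGNORE = re.compile(r"^https?://[^/]*\bnytimes\.com/|/wp-content/", re.IGNORECASE)
# Only report URLs that were fetched with one of these HTTP status codes.
STATUS_CODES = {200}


def wanted(line, match=MATCH, ignore=IGNORE, codes=STATUS_CODES):
    """Return the URL if the JSON log line passes all filters, else None.

    Assumes the message has "url" and "response_code" fields (an assumption
    about the stream format). Pass codes=None to skip the status check.
    """
    try:
        msg = json.loads(line)
    except ValueError:
        return None  # not JSON, skip
    if not isinstance(msg, dict):
        return None
    url = msg.get("url")
    if not isinstance(url, str):
        return None
    if codes is not None and msg.get("response_code") not in codes:
        return None
    if not match.search(url) or ignore.search(url):
        return None
    return url
```

[editor's note] Feeding it is a matter of looping over the stream's lines (for example from standard input) and printing every non-None result. Keeping the ignore pattern separate from the match pattern mirrors how ArchiveBot ignore sets are maintained, so entries can be added without touching the match.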
18:16:05<asie>https://about.ask.fm/closure-notice-the-platform-to-be-deactivated-december-1-2024/
18:16:35<asie>the Q&A social network is shutting down in slightly over a month
18:17:02<asie>notable in the Polish manga community as all our publishers standardized on using it to answer public fan inquiries for some reason, but also had way more users than that
18:19:08<asie>it seems they anticipated AT though as they blocked all user pages for app-only access
18:26:00MrMcNuggets quits [Quit: WeeChat 4.3.2]
18:27:30collat quits [Ping timeout: 258 seconds]
18:36:44<h2ibot>Magmaus3 edited Deathwatch (+201, /* 2024 */ ask.fm shutdown): https://wiki.archiveteam.org/?diff=53653&oldid=53652
18:43:16damanoh joins
18:44:32damanoh quits [Client Quit]
18:58:37<magmaus3>i'll also make a stub wiki page rq
19:00:48<h2ibot>Magmaus3 created ASK.fm (+474, ASKfm page (stub)): https://wiki.archiveteam.org/?title=ASK.fm
19:29:17<@arkiver>thanks asie
19:32:58BornOn420 (BornOn420) joins
19:34:53<h2ibot>DigitalDragon edited ASK.fm (+0, ASKfm is not CuriousCat): https://wiki.archiveteam.org/?diff=53655&oldid=53654
19:37:22<qwertyasdfuiopghjkl>asie: The "app-only" thing seems to luckily just be a popup that can easily be removed with inspect element to access the rest of the page, and I remember the popup already being a thing a while ago.
19:39:52collat joins
19:53:45loug8318142 quits [Ping timeout: 258 seconds]
20:27:21tek_dmn (tek_dmn) joins
20:27:37Unholy236192464537713 quits [Quit: Ping timeout (120 seconds)]
20:27:49Unholy236192464537713 (Unholy2361) joins
20:30:08Dango360_ quits [Quit: Leaving]
20:30:21Dango360 (Dango360) joins
20:31:17<@arkiver>alright let's make a channel for ask.fm
20:31:19<@arkiver>any ideas?
20:32:06<Flashfire42>#asktiveteam
20:32:27<@arkiver>not sure about that one...
20:32:35<Flashfire42>#askformoretime
20:33:05<Flashfire42>or just #askformore
20:35:39<lennier2_>#dontaskfm
20:36:15<@arkiver>that one is nice
20:36:21<@arkiver>will keep the competition open a little longer :P
20:36:48<DigitalDragons>#casketfm
20:36:50mls (mls) joins
20:36:59<imer>I WAS JUST ABOUT TO WRITE THAT
20:37:01<imer>nice
20:37:11<imer>dontaskfm is best so far imo
20:37:11<DigitalDragons>>:D
20:37:43<Flashfire42>well I am squatting them all for the minute. I will give perms to whoever needs them when they get chosen when I finish work in about 6-7 hours
20:37:53<@HCross>Ask FM was heavily used in cyberbullying in schools (at least in the UK)
20:38:13<Flashfire42>in oz as well
20:38:37<@arkiver>that sucks :/
20:40:56<flashfire42|m>So was curious cat tho. I’d just keep in mind potential for exclusions of individual pages in the future and grab what we can for posterity
20:42:50MOSTech6502 quits [Read error: Connection reset by peer]
20:43:36<@arkiver>looks like sequential IDs
20:44:09<@arkiver>let's do #dontaskfm , it's too perfect
20:45:45<lennier2_>Yay! :)
20:46:04<@arkiver>thanks lennier2_ :)
20:46:44<imer>lennier2_++
20:46:44<eggdrop>[karma] 'lennier2_' now has 1 karma!
20:49:11etnguyen03 (etnguyen03) joins
20:52:11<magmaus3>"ASKfm is not CuriousCat" ← oops 🥴
20:52:51<lennier2_>Wild that every Q&A platform seems to be shutting down, though.
20:53:15<magmaus3>yeah
20:55:22tek_dmn- (tek_dmn) joins
20:56:35tek_dmn quits [Ping timeout: 260 seconds]
21:00:32Unholy236192464537713 quits [Client Quit]
21:00:49Unholy236192464537713 (Unholy2361) joins
21:05:08<h2ibot>Magmaus3 edited ASK.fm (+9, add irc channel): https://wiki.archiveteam.org/?diff=53656&oldid=53655
21:09:59vix5110_ quits [Client Quit]
21:12:16useretail quits [Quit: Leaving]
21:12:48MOSTech6502 joins
21:17:44etnguyen03 quits [Client Quit]
21:19:14etnguyen03 (etnguyen03) joins
21:35:04MOSTech6502 quits [Read error: Connection reset by peer]
21:35:23MOSTech6502 joins
21:39:18ThreeHM quits [Remote host closed the connection]
21:41:13ThreeHM (ThreeHeadedMonkey) joins
21:41:15<h2ibot>JustAnotherArchivist edited Deathwatch (-12, /* 2024 */ Link to ASK.fm page): https://wiki.archiveteam.org/?diff=53657&oldid=53653
21:53:04BlueMaxima joins
22:15:02loug8318142 joins
22:19:22Wohlstand quits [Quit: Wohlstand]
22:56:27<h2ibot>Manu edited Webring/fediring.net (+307, /* Add more archived pages */): https://wiki.archiveteam.org/?diff=53658&oldid=53645
23:00:35MOSTech6502 quits [Remote host closed the connection]
23:00:55MOSTech6502 joins
23:20:42<c3manu>https://www.reuters.com/world/middle-east/explosions-heard-irans-capital-tehran-nearby-karaj-semi-official-iranian-media-2024-10-25/
23:28:07etnguyen03 quits [Client Quit]
23:39:38etnguyen03 (etnguyen03) joins
23:39:55MOSTech6502 quits [Ping timeout: 260 seconds]
23:39:55wickedplayer494 quits [Ping timeout: 258 seconds]
23:40:13wickedplayer494 joins
23:45:58Juesto (Juest) joins
23:47:59Juest quits [Ping timeout: 258 seconds]
23:47:59Juesto is now known as Juest
23:57:03Juesto (Juest) joins
23:57:57Juest quits [Ping timeout: 258 seconds]
23:57:57Juesto is now known as Juest