00:00:50<pabs>datechnoman: I think we need your URL stash for *.yay.boo URLs
00:04:52MrMcNuggets quits [Quit: WeeChat 4.3.2]
00:07:42MrMcNuggets (MrMcNuggets) joins
00:23:07<datechnoman>pabs - I'll give it a crack and see what I can find
00:33:40ducky quits [Ping timeout: 260 seconds]
00:36:22gosc joins
00:36:23<eggdrop>[tell] gosc: [2025-11-27T18:43:17Z] <arkiver> thanks for the info earlier
00:46:42ducky (ducky) joins
00:56:32ducky quits [Ping timeout: 260 seconds]
00:57:54ducky (ducky) joins
01:10:04ducky quits [Ping timeout: 260 seconds]
01:10:17<klea>anti-bot tech in no particular order: https://codeberg.org/xfnw/yap https://git.gammaspectra.live/git/go-away https://github.com/TecharoHQ/anubis (CloudFlare #mitm) (Fastly i believe too? #mitm?)
01:11:50ducky (ducky) joins
01:12:05<klea>is there some kind of wiki page from where to make more wiki pages (like a template or smth)?
01:12:29<klea>i mean more for things like projects or documenting stuff that exists
01:13:15<@JAA>The project pages use a template for the infobox, but in this case, it'd probably just be a normal page.
01:13:34<klea>so just steal some existing project's infobox template
01:13:44<@JAA>If it's for a project, yes.
01:15:01<klea>JAA: archiving the entirety of "RTVE Play", ie, Spain's radio-television platform, would be a project?
01:15:30<@JAA>Sure
01:17:35<klea>JAA: is there some list of all the options the infobox thingy can have?
01:17:44<@JAA>klea: Yes, on the template page.
01:17:59<@JAA>https://wiki.archiveteam.org/index.php/Template:Infobox_project
01:18:10<klea>thanks
01:21:50<klea>would it be better to make a project to archive RTVE and under it make one for the RTVE play part or better make two, i'm not too sure, since the play area is dedicated more to audiovisual (video-only?) content, when RTVE also has a radio thingy, live tv, and news and i suppose the last two aren't interesting to archive?
01:22:36<klea>tho podcasts are under rtve.es/play/radio, the page: https://www.rtve.es/play/radio/podcasts/ doesn't brand it as part of RTVE Play
01:26:55<pokechu22>klea: I believe RTVE is one of the things that cuphead2527480/CuppyMan has been working on, but I'm not sure of the details
01:27:15<klea>JAA: i submitted RTVE Play for creation, it's on the modqueue
01:27:42<klea>pokechu22: can you give me some way of contacting them?
01:28:28<pokechu22>They sometimes appear in #archivebot, but I don't have any other good way to contact them
01:28:35<klea>thanks
01:29:09<klea>also, rtve is like an entire news organization, so not small
01:30:06<pokechu22>Yeah, I think it was specifically RTVE video, but I'm not 100% sure
01:30:20<klea>pokechu22: does AB allow downloading big videos?
01:30:57<klea>like if you give it 'https://www.rtve.es/play/videos/this-is-philosophy/popper-kuhn/6813791/' what would it do, would it be able to later be replayed, or we have to download it using yt-dlp (which i know works because i can use mpv to view it)
01:31:57<pokechu22>It can download large files by URL, though it doesn't parse HLS so you need to manually convert m3u8 to a list of URLs. I don't think !ao https://www.rtve.es/play/videos/this-is-philosophy/popper-kuhn/6813791/ would directly work
01:32:21<nicolas17>many videos are geo-restricted to spain
01:32:58<klea>nicolas17: oh :(
01:33:11<klea>is there some way i could help by offering some of my bandwidth?
01:34:09<klea>tho i guess maybe then i should just start downloading them myself and uploading to archive.org on my own, and ask IA directly to make a category?
01:34:32<klea>pokechu22: so i'd have to give a url like https://rtvehlsvodlote7.rtve.es/mediavodv2/resources/TE_STHIPHI/mp4/4/8/1676656887884.mp4/1676656887884-audio=192837-video=4500000.m3u8?idasset=6813791&hls_client_manifest_version=3 for it to start getting all the files or would i have to parse the m3u8 myself first?
01:35:11<nicolas17>they seem to have like 3 different video distribution systems :P
01:35:33<klea>lol, nicolas17 could you document them on https://wiki.archiveteam.org/wiki/RTVE_Play once it's approved?
01:35:44ducky quits [Ping timeout: 260 seconds]
01:35:44<pokechu22>It needs to be parsed into a list of .ts files. I *think* cuphead2527480 has already been working on this so you'd want to ask them (!tell cuphead2527480 <msg> in #archivebot should ping both of you when they next join)
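The conversion pokechu22 describes (turning an m3u8 into a plain list of segment URLs) could look roughly like the sketch below. It assumes a simple HLS media playlist in which every line that is not blank and does not start with `#` is a segment URI; the function name `segment_urls` and the example playlist are hypothetical and are not taken from RTVE or ArchiveBot.

```python
from __future__ import annotations

from urllib.parse import urljoin


def segment_urls(playlist_url: str, playlist_text: str) -> list[str]:
    """Return absolute URLs for every URI line in an HLS media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' are tags or comments, not URIs.
        if not line or line.startswith("#"):
            continue
        # Relative URIs are resolved against the playlist's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls


# Hypothetical playlist content, for illustration only.
EXAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment-1.ts
#EXTINF:10.0,
https://cdn.example.org/other/segment-2.ts
#EXT-X-ENDLIST
"""

EXAMPLE_URLS = segment_urls(
    "https://example.org/video/stream.m3u8?x=1", EXAMPLE_PLAYLIST
)
```

This only collects URI lines. URIs carried inside tag attributes (for example `#EXT-X-KEY` or `#EXT-X-MAP`) and any query parameters the server might require on segment requests are not handled, and a master playlist would yield variant playlist URLs rather than `.ts` files.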
01:35:56<klea>if you want i can give you my draft on transfer.archivete.am since i believe you don't have to wait for queues
01:36:37<nicolas17>like https://rtvedrmstaging.rtve.es/59/78/16177859/hls/alta/v_alta.m3u8?assetid=16177859
01:36:58<nicolas17>that seems to be a whole different system than the mediavodv2 thing?
01:37:25<nicolas17>and I think the master playlist at https://rtvedrmstaging.rtve.es/59/78/16177859/16177859.m3u8 *used* to work for me but now doesn't
01:37:29<h2ibot>Klea created RTVE Play (+394, Created the RTVE Play page): https://wiki.archiveteam.org/?title=RTVE%20Play
01:37:50<klea>yay, thanks JAA for confirming me
01:37:54<klea>testing?
01:38:13<klea>nicolas17: https://rtvedrmstaging.rtve.es/59/78/16177859/16177859.m3u8 seems to work for me
01:38:20<nicolas17>yeah, georestricted then
01:38:27<klea>yes
01:38:29<h2ibot>JustAnotherArchivist changed the user rights of User:Klea
01:38:30<klea>it says so even on headers
01:38:39<klea>h2ibot has delays?
01:39:29<klea>https://transfer.archivete.am/bTUqX/rtvestaginghwithdr
01:39:29<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/bTUqX/rtvestaginghwithdr
01:40:09<@JAA>h2ibot is just the messenger.
01:40:38<klea>https://rtvedrmstaging.rtve.es/api/geoblock/${asset_id}/es.json seems like an interesting url?
01:40:50<@JAA>The script that generates these messages is polling for new edits every minute, IIRC.
01:41:41<klea>oh cool this redirects: https://www.rtve.es/play/videos/_/_/6813791/
01:41:57<klea>so you can change from asset id to webplayer
01:42:13<nicolas17>also the url to those m3u8s is always like
01:42:30<nicolas17>89/67/123456789
01:42:39<nicolas17>last 4 digits are used in the directory path
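A minimal sketch of the path rule nicolas17 describes, inferred only from the two examples in this log (`89/67/123456789` and `59/78/16177859`): the first directory is the last two digits of the asset ID and the second is the two digits before those. The function name `staging_playlist_url` is hypothetical, and whether the pattern holds for every asset is an assumption.

```python
def staging_playlist_url(asset_id) -> str:
    """Build the master playlist URL on rtvedrmstaging.rtve.es for an asset ID."""
    digits = str(asset_id)
    # The rule needs at least four digits to fill both directory levels.
    if not digits.isdigit() or len(digits) < 4:
        raise ValueError(f"expected a numeric asset ID with 4+ digits: {asset_id!r}")
    last_two = digits[-2:]      # e.g. "59" for 16177859
    prev_two = digits[-4:-2]    # e.g. "78" for 16177859
    return (
        "https://rtvedrmstaging.rtve.es/"
        f"{last_two}/{prev_two}/{digits}/{digits}.m3u8"
    )
```

For asset 16177859 this reproduces the master playlist URL quoted earlier in the channel.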
01:42:43<klea>nicolas17: the video you gave me is no longer available: https://www.rtve.es/play/videos/grand-prix/temporada-2024-programa-1/16177859/
01:42:57<klea>i guess that's an example of an unavailable video?
01:43:12<klea>also, the url you gave me seemed to be slightly low quality (iirc, i didn't watch a big part of it)
01:43:33<nicolas17>well for me the webpage says "Este contenido tiene restringidos los derechos de emisión en su ubicación" because of geoblock
01:43:39<klea>oh
01:43:56<klea>aaa
01:44:01<klea>i want to provably save a page
01:44:25<nicolas17>as for quality yes, v_alta seems to be the lowest quality, see your paste :p
01:44:27<klea>ie, save the page from my IP, but in a way that AT does the ssl stuff on your side, so everyone can prove what i'm saying
01:44:39<nicolas17>and hdfull/v_hdfull is highest
01:44:59<klea>oh
01:46:37ducky (ducky) joins
01:46:38<klea>nicolas17: don't you see a "Disponible hasta: 07-07-2025" line?
01:46:44<nicolas17>yes that *too*
01:46:52<klea>then you're seeing it's no longer available
01:47:01<klea>that text means "available until:"
01:47:26<klea>and the date is in DD-MM-YYYY form (i'd need more examples to check but it's not common to have MM-DD-YYYY in EU)
01:56:32<h2ibot>Nicolas17v2 edited Adobe Aero (+291, update status): https://wiki.archiveteam.org/?diff=57904&oldid=57884
01:57:32<h2ibot>Nicolas17v2 edited Adobe Aero (-23, fix duplicate infobox parameters): https://wiki.archiveteam.org/?diff=57905&oldid=57904
01:58:20<klea>nicolas17: could you check if you can play this podcast or download it or if it's also (potentially?) geoblocked too? https://www.rtve.es/play/audios/el-vuelo-del-fenix/
01:58:36ducky quits [Ping timeout: 260 seconds]
01:59:15ducky (ducky) joins
01:59:51<gosc>arkiver, thinking of adding the https thing I mentioned to Mobile Phone Applications/https is this fine? since that guide applies to more than just what I was using it for
01:59:55<nicolas17>it plays
02:00:05<klea>yay!
02:00:24<nicolas17>also, I didn't check now, but in the past some videos played and some didn't
02:00:28<klea>is there an option thingy, with two options (iirc, i'm lazy to check again), with one of them saying "descargar"
02:00:33<klea>yeah that's possible
02:00:48<klea>since they source some original content, but also some content in collaboration with others (which may or may not be original)
02:01:24<klea>nicolas17: should i include the information asset api url in the wiki or better not?
02:01:36<klea>iirc, accessing this channel's logs on the website requires a password?
02:01:44<nicolas17>this channel doesn't
02:01:47<klea>oh
02:02:57<nicolas17>if we're seriously gonna save this it might justify a new channel, idk
02:03:03<klea>oh
02:03:15<klea>yeah i guess too lol
02:03:30<klea>is there some channel specific to archiving video?
02:07:12<@JAA>Technically, #videobot exists, but it's very dead and has been for many years.
02:07:32<klea>should we resurrect it or make smth else?
02:08:35<@JAA>Other than that, things have been happening in specific channels for particular sites, no overarching general thing.
02:08:45<klea>oh
02:08:55<@JAA>#down-the-tube for YouTube, #vinewhine for Vine, etc.
02:09:51Overlordz quits [Quit: Leaving]
02:11:34<h2ibot>Klea edited RTVE Play (+1021, Expand article by adding geoblock, and some…): https://wiki.archiveteam.org/?diff=57906&oldid=57903
02:12:15tzt quits [Read error: Connection reset by peer]
02:13:03tzt (tzt) joins
02:14:34<klea>i'm tempted to make h2ibot be unable to ping me on #archiveteam-bs, JAA is there some reason why I shouldn't do that?
02:17:16ducky quits [Ping timeout: 260 seconds]
02:17:58<nicolas17>does archivebot only follow links in HTML?
02:18:38ducky (ducky) joins
02:19:23<nicolas17>if I try to archive https://mesu.apple.com/assets/com_apple_MobileAsset_UARP_A2618/com_apple_MobileAsset_UARP_A2618.xml, will it do a futile request to https://updates.cdn-apple.com/2025/patches/082-64953/1CC0CA1E-ECF8-458F-B820-55020FE5E2BA/ or will it ignore that because it's not an <a href>?
02:19:56chrismeller8 quits [Quit: chrismeller8]
02:25:26<pokechu22>nicolas17: it *sorta* follows links in js/json, but the exact rules for it are unclear. It also follows links in sitemap XMLs, but I don't think that would apply here. If it extracted links, the ones I would expect to see are https://updates.cdn-apple.com/2025/patches/082-64953/1CC0CA1E-ECF8-458F-B820-55020FE5E2BA/ and
02:25:28<pokechu22>https://mesu.apple.com/assets/com_apple_MobileAsset_UARP_A2618/com_apple_MobileAsset_UARP_A2618/959670e0684014dbfa25f3ab80db4f3a78dc8781.zip with the latter relative link feeling more likely (and possibly occurring even on an !ao job); that's my expectation for .json though
02:25:42<pokechu22>given that both are junk, seems like trying it is the best way to find out :)
02:27:43chrismeller8 (chrismeller) joins
02:28:00ducky quits [Ping timeout: 260 seconds]
02:28:38ducky (ducky) joins
02:33:37<h2ibot>Klea edited List of websites excluded from the Wayback Machine (+57, Add pigtailsinpaint): https://wiki.archiveteam.org/?diff=57907&oldid=57901
02:34:37MrMcNuggets quits [Client Quit]
02:43:05chaoticbee quits [Remote host closed the connection]
02:46:21tzt quits [Ping timeout: 272 seconds]
02:49:10<@JAA>klea: ¯\_(ツ)_/¯
02:50:28<klea>i think i did that
02:53:58<klea>how should the wiki page listing anti-bot software be named?
02:56:17<gosc>JAA, made an account, can you remove the edit verification thing? thanks
02:56:24<gosc>it's calmevening
02:57:24<nicolas17>did you make edits already?
02:58:01<gosc>one
02:59:04<nicolas17>just edit and when JAA gets tired of having to approve each edit he'll change user permissions :P
02:59:21<gosc>alright
03:03:46<@JAA>Yep, that :-)
03:16:02ducky quits [Read error: Connection reset by peer]
03:18:57ducky (ducky) joins
03:24:21nathang2184 quits [Ping timeout: 272 seconds]
03:28:12ducky quits [Ping timeout: 260 seconds]
03:29:03ducky (ducky) joins
03:38:56ducky quits [Ping timeout: 260 seconds]
03:39:14ducky (ducky) joins
04:00:47<Ryz>Heya folks, I need help on extracting contents from https://dataportal.greatersudbury.ca/ - as I'm not too sure if there's anything more to grab than the AB job that I ran moments earlier; asking here since databases tend to be trickier to extract~
04:01:58<Ryz>Huh, there's also this, https://opendata.greatersudbury.ca/ - which I'm about to run in AB in a few moments
04:05:04<pokechu22>https://opendata.greatersudbury.ca looks like arcgis. I'll add it to my todo list
04:14:24ducky quits [Ping timeout: 260 seconds]
04:20:39chaoticbee (chaoticbee) joins
04:21:40ducky (ducky) joins
04:28:08Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ]
04:30:40Shjosan (Shjosan) joins
04:33:44chaoticbee quits [Remote host closed the connection]
04:35:02<pabs>klea: Obstacles [to archiving], we can include Cloudflare, other bot blockers, admin requests etc
04:35:30chaoticbee (chaoticbee) joins
05:00:36ducky quits [Ping timeout: 260 seconds]
05:01:11SootBector quits [Remote host closed the connection]
05:01:59ducky (ducky) joins
05:02:24SootBector (SootBector) joins
05:03:36chaoticbee quits [Remote host closed the connection]
05:04:54chaoticbee (chaoticbee) joins
05:08:26tzt (tzt) joins
05:21:43Dada joins
05:26:26sg72 quits [Remote host closed the connection]
05:27:35sg72 joins
05:55:30Wohlstand quits [Quit: Wohlstand]
06:12:55DogsRNice quits [Read error: Connection reset by peer]
06:14:48ducky quits [Ping timeout: 260 seconds]
06:15:22ducky (ducky) joins
06:36:15driib97 quits [Ping timeout: 272 seconds]
06:38:50midou quits [Ping timeout: 256 seconds]
06:44:15driib97 (driib) joins
06:44:29ericgallager quits [Ping timeout: 272 seconds]
06:48:54midou joins
06:49:38ericgallager joins
06:53:34midou quits [Ping timeout: 256 seconds]
06:59:12midou joins
07:06:01ericgallager quits [Ping timeout: 272 seconds]
07:07:51midou quits [Read error: Connection reset by peer]
07:07:55driib97 quits [Ping timeout: 272 seconds]
07:09:20midou joins
07:09:57driib97 (driib) joins
07:30:11driib97 quits [Client Quit]
07:30:31driib97 (driib) joins
07:35:07driib97 quits [Client Quit]
07:35:27driib97 (driib) joins
07:36:38midou quits [Ping timeout: 256 seconds]
07:40:36driib97 quits [Ping timeout: 256 seconds]
07:46:53midou joins
07:54:40ducky quits [Ping timeout: 260 seconds]
07:55:44ducky (ducky) joins
08:13:51nine quits [Quit: See ya!]
08:14:04nine joins
08:14:04nine quits [Changing host]
08:14:04nine (nine) joins
08:26:24ducky quits [Ping timeout: 260 seconds]
08:36:19ducky (ducky) joins
08:56:30<h2ibot>Calmevening edited Adobe Aero (+18): https://wiki.archiveteam.org/?diff=57908&oldid=57905
08:56:44ducky quits [Ping timeout: 260 seconds]
09:03:55nathang2184 joins
09:37:50nicolas17_ (nicolas17) joins
09:39:55nicolas17 quits [Ping timeout: 272 seconds]
10:01:03@imer quits [Quit: Ping timeout (120 seconds)]
10:01:31imer (imer) joins
10:01:31@ChanServ sets mode: +o imer
10:03:40@imer quits [Excess Flood]
10:04:10imer (imer) joins
10:04:10@ChanServ sets mode: +o imer
10:13:10iPwnedYourIOTSmartdog4 joins
10:13:29emphie quits [Ping timeout: 272 seconds]
10:14:31emphie joins
10:16:01iPwnedYourIOTSmartdog quits [Ping timeout: 272 seconds]
10:16:02iPwnedYourIOTSmartdog4 is now known as iPwnedYourIOTSmartdog
10:26:38trix quits [Ping timeout: 256 seconds]
10:27:03trix (trix) joins
10:28:41midou quits [Ping timeout: 272 seconds]
10:39:10midou joins
10:41:20pie_ quits []
10:41:28pie_ (pie_) joins
11:12:17ducky (ducky) joins
11:17:02gosc_1 joins
11:19:59gosc quits [Ping timeout: 272 seconds]
11:28:52ducky quits [Ping timeout: 260 seconds]
11:30:42ducky (ducky) joins
11:47:32ducky quits [Ping timeout: 260 seconds]
11:49:37ducky (ducky) joins
12:00:02Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:47Bleo182600722719623455222 joins
12:04:20ducky quits [Ping timeout: 260 seconds]
12:04:57nathang2184 quits [Ping timeout: 272 seconds]
12:06:45ducky (ducky) joins
12:12:58nathang2184 joins
12:18:48ducky quits [Ping timeout: 260 seconds]
12:20:48ducky (ducky) joins
12:30:28ducky quits [Ping timeout: 260 seconds]
12:32:15ducky (ducky) joins
13:12:01etnguyen03 (etnguyen03) joins
13:27:14etnguyen03 quits [Client Quit]
13:30:12ducky quits [Ping timeout: 260 seconds]
13:31:46nicolas17_ is now known as nicolas17
13:32:33ducky (ducky) joins
13:35:12etnguyen03 (etnguyen03) joins
14:03:11<justauser|m>TheTechRobo: 12-03.
14:05:08<klea>pabs: thx for the name
14:07:03<klea>pabs: https://wiki.archiveteam.org/index.php/Anubis exists :o:
14:10:55<justauser|m>There is also https://wiki.archiveteam.org/index.php/Dealing_with_Cloudflare by me.
14:12:05Matthww quits [Quit: Ping timeout (120 seconds)]
14:12:11<justauser|m>And a somewhat unspoken piece about AB+CF.
14:12:24Matthww joins
14:13:21<klea>i'm linking to both on the new page :)
14:18:55twiswist (twiswist) joins
14:22:00ducky quits [Ping timeout: 260 seconds]
14:28:21<h2ibot>Klea created Obstacles (+749, Create page, needs a lot of content): https://wiki.archiveteam.org/?title=Obstacles
14:32:40ducky (ducky) joins
14:37:52ducky quits [Ping timeout: 260 seconds]
14:39:45Wohlstand (Wohlstand) joins
14:43:40ducky (ducky) joins
14:44:56<klea>is there some easy (automatable) way to take a url like https://transfer.archivete.am/MAHfc/hacker-news-36000001.txt and get the IA id where the data's stored?
14:44:58<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/MAHfc/hacker-news-36000001.txt
14:45:23<h2ibot>Klea edited Hacker News (+147, add the fact AB archived part of the API): https://wiki.archiveteam.org/?diff=57910&oldid=55233