| 00:00:50 | <pabs> | datechnoman: I think we need your URL stash for *.yay.boo URLs |
| 00:04:52 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
| 00:07:42 | | MrMcNuggets (MrMcNuggets) joins |
| 00:23:07 | <datechnoman> | pabs - I'll give it a crack and see what I can find |
| 00:33:40 | | ducky quits [Ping timeout: 260 seconds] |
| 00:36:22 | | gosc joins |
| 00:36:23 | <eggdrop> | [tell] gosc: [2025-11-27T18:43:17Z] <arkiver> thanks for the info earlier |
| 00:46:42 | | ducky (ducky) joins |
| 00:56:32 | | ducky quits [Ping timeout: 260 seconds] |
| 00:57:54 | | ducky (ducky) joins |
| 01:10:04 | | ducky quits [Ping timeout: 260 seconds] |
| 01:10:17 | <klea> | anti-bot tech in no particular order: https://codeberg.org/xfnw/yap https://git.gammaspectra.live/git/go-away https://github.com/TecharoHQ/anubis (CloudFlare #mitm) (Fastly i believe too? #mitm?) |
| 01:11:50 | | ducky (ducky) joins |
| 01:12:05 | <klea> | is there some kind of wiki page from where to make more wiki pages (like a template or smth)? |
| 01:12:29 | <klea> | i mean more for things like projects or documenting stuff that exists |
| 01:13:15 | <@JAA> | The project pages use a template for the infobox, but in this case, it'd probably just be a normal page. |
| 01:13:34 | <klea> | so just steal some existing project's infobox template |
| 01:13:44 | <@JAA> | If it's for a project, yes. |
| 01:15:01 | <klea> | JAA: archiving the entirety of "RTVE Play", ie, Spain's radio-television platform, would be a project? |
| 01:15:30 | <@JAA> | Sure |
| 01:17:35 | <klea> | JAA: is there some list of all the options the infobox thingy can have? |
| 01:17:44 | <@JAA> | klea: Yes, on the template page. |
| 01:17:59 | <@JAA> | https://wiki.archiveteam.org/index.php/Template:Infobox_project |
| 01:18:10 | <klea> | thanks |
| 01:21:50 | <klea> | would it be better to make a project to archive RTVE and under it make one for the RTVE play part or better make two, i'm not too sure, since the play area is dedicated more to audiovisual (video-only?) content, when RTVE also has a radio thingy, live tv, and news and i suppose the last two aren't interesting to archive? |
| 01:22:36 | <klea> | tho podcasts are under rtve.es/play/radio, the page: https://www.rtve.es/play/radio/podcasts/ doesn't brand it as part of RTVE Play |
| 01:26:55 | <pokechu22> | klea: I believe RTVE is one of the things that cuphead2527480/CuppyMan has been working on, but I'm not sure of the details |
| 01:27:15 | <klea> | JAA: i submitted RTVE Play for creation, it's on the modqueue |
| 01:27:42 | <klea> | pokechu22: can you give me some way of contacting them? |
| 01:28:28 | <pokechu22> | They sometimes appear in #archivebot, but I don't have any other good way to contact them |
| 01:28:35 | <klea> | thanks |
| 01:29:09 | <klea> | also, rtve is like an entire news organization, so not small |
| 01:30:06 | <pokechu22> | Yeah, I think it was specifically RTVE video, but I'm not 100% sure |
| 01:30:20 | <klea> | pokechu22: does AB allow downloading big videos? |
| 01:30:57 | <klea> | like if you give it 'https://www.rtve.es/play/videos/this-is-philosophy/popper-kuhn/6813791/' what would it do, would it be able to later be replayed, or we have to download it using yt-dlp (which i know works because i can use mpv to view it) |
| 01:31:57 | <pokechu22> | It can download large files by URL, though it doesn't parse HLS so you need to manually convert m3u8 to a list of URLs. I don't think !ao https://www.rtve.es/play/videos/this-is-philosophy/popper-kuhn/6813791/ would directly work |
| 01:32:21 | <nicolas17> | many videos are geo-restricted to spain |
| 01:32:58 | <klea> | nicolas17: oh :( |
| 01:33:11 | <klea> | is there some way i could help by offering some of my bandwidth? |
| 01:34:09 | <klea> | tho i guess maybe then i should just start downloading them myself and uploading to archive.org on my own, and ask IA directly to make a category? |
| 01:34:32 | <klea> | pokechu22: so i'd have to give a url like https://rtvehlsvodlote7.rtve.es/mediavodv2/resources/TE_STHIPHI/mp4/4/8/1676656887884.mp4/1676656887884-audio=192837-video=4500000.m3u8?idasset=6813791&hls_client_manifest_version=3 for it to start getting all the files or would i have to parse the m3u8 myself first? |
| 01:35:11 | <nicolas17> | they seem to have like 3 different video distribution systems :P |
| 01:35:33 | <klea> | lol, nicolas17 could you document them on https://wiki.archiveteam.org/wiki/RTVE_Play once it's approved? |
| 01:35:44 | | ducky quits [Ping timeout: 260 seconds] |
| 01:35:44 | <pokechu22> | It needs to be parsed into a list of .ts files. I *think* cuphead2527480 has already been working on this so you'd want to ask them (!tell cuphead2527480 <msg> in #archivebot should ping both of you when they next join) |
| 01:35:56 | <klea> | if you want i can give you my draft on transfer.archivete.am since i believe you don't have to wait for queues |
| 01:36:37 | <nicolas17> | like https://rtvedrmstaging.rtve.es/59/78/16177859/hls/alta/v_alta.m3u8?assetid=16177859 |
| 01:36:58 | <nicolas17> | that seems to be a whole different system than the mediavodv2 thing? |
| 01:37:25 | <nicolas17> | and I think the master playlist at https://rtvedrmstaging.rtve.es/59/78/16177859/16177859.m3u8 *used* to work for me but now doesn't |
| 01:37:29 | <h2ibot> | Klea created RTVE Play (+394, Created the RTVE Play page): https://wiki.archiveteam.org/?title=RTVE%20Play |
| 01:37:50 | <klea> | yay, thanks JAA for confirming me |
| 01:37:54 | <klea> | testing? |
| 01:38:13 | <klea> | nicolas17: https://rtvedrmstaging.rtve.es/59/78/16177859/16177859.m3u8 seems to work for me |
| 01:38:20 | <nicolas17> | yeah, georestricted then |
| 01:38:27 | <klea> | yes |
| 01:38:29 | <h2ibot> | JustAnotherArchivist changed the user rights of User:Klea |
| 01:38:30 | <klea> | it says so even on headers |
| 01:38:39 | <klea> | h2ibot has delays? |
| 01:39:29 | <klea> | https://transfer.archivete.am/bTUqX/rtvestaginghwithdr |
| 01:39:29 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/bTUqX/rtvestaginghwithdr |
| 01:40:09 | <@JAA> | h2ibot is just the messenger. |
| 01:40:38 | <klea> | https://rtvedrmstaging.rtve.es/api/geoblock/${asset_id}/es.json seems like an interesting url? |
| 01:40:50 | <@JAA> | The script that generates these messages is polling for new edits every minute, IIRC. |
| 01:41:41 | <klea> | oh cool this redirects: https://www.rtve.es/play/videos/_/_/6813791/ |
| 01:41:57 | <klea> | so you can change from asset id to webplayer |
| 01:42:13 | <nicolas17> | also the url to those m3u8s is always like |
| 01:42:30 | <nicolas17> | 89/67/123456789 |
| 01:42:39 | <nicolas17> | last 4 digits are used in the directory path |
| 01:42:43 | <klea> | nicolas17: the video you gave me is no longer available: https://www.rtve.es/play/videos/grand-prix/temporada-2024-programa-1/16177859/ |
| 01:42:57 | <klea> | i guess that's an example of an unavailable video? |
| 01:43:12 | <klea> | also, the url you gave me seemed to be slightly low quality (iirc, i didn't watch a big part of it) |
| 01:43:33 | <nicolas17> | well for me the webpage says "Este contenido tiene restringidos los derechos de emisión en su ubicación" because of geoblock |
| 01:43:39 | <klea> | oh |
| 01:43:56 | <klea> | aaa |
| 01:44:01 | <klea> | i want to provably save a page |
| 01:44:25 | <nicolas17> | as for quality yes, v_alta seems to be the lowest quality, see your paste :p |
| 01:44:27 | <klea> | ie, save the page from my IP, but in a way that AT does the ssl stuff on your side, so everyone can prove what i'm saying |
| 01:44:39 | <nicolas17> | and hdfull/v_hdfull is highest |
| 01:44:59 | <klea> | oh |
| 01:46:37 | | ducky (ducky) joins |
| 01:46:38 | <klea> | nicolas17: don't you see a "Disponible hasta: 07-07-2025" line? |
| 01:46:44 | <nicolas17> | yes that *too* |
| 01:46:52 | <klea> | then you're seeing it's no longer available |
| 01:47:01 | <klea> | that text means "available until:" |
| 01:47:26 | <klea> | and the date is in DD-MM-YYYY form (i'd need more examples to check but it's not common to have MM-DD-YYYY in EU) |
| 01:56:32 | <h2ibot> | Nicolas17v2 edited Adobe Aero (+291, update status): https://wiki.archiveteam.org/?diff=57904&oldid=57884 |
| 01:57:32 | <h2ibot> | Nicolas17v2 edited Adobe Aero (-23, fix duplicate infobox parameters): https://wiki.archiveteam.org/?diff=57905&oldid=57904 |
| 01:58:20 | <klea> | nicolas17: could you check if you can play this podcast or download it or if it's also (potentially?) geoblocked too? https://www.rtve.es/play/audios/el-vuelo-del-fenix/ |
| 01:58:36 | | ducky quits [Ping timeout: 260 seconds] |
| 01:59:15 | | ducky (ducky) joins |
| 01:59:51 | <gosc> | arkiver, thinking of adding the https thing I mentioned to Mobile Phone Applications/https is this fine? since that guide applies to more than just what I was using it for |
| 01:59:55 | <nicolas17> | it plays |
| 02:00:05 | <klea> | yay! |
| 02:00:24 | <nicolas17> | also, I didn't check now, but in the past some videos played and some didn't |
| 02:00:28 | <klea> | is there an option thingy, with two options (iirc, i'm lazy to check again), with one of them saying "descargar" |
| 02:00:33 | <klea> | yeah that's possible |
| 02:00:48 | <klea> | since they source some original content, but also some content in collaboration with others (which may or may not be original) |
| 02:01:24 | <klea> | nicolas17: should i include the information asset api url in the wiki or better not? |
| 02:01:36 | <klea> | iirc, accessing this channel's logs on the website required a passwd? |
| 02:01:44 | <nicolas17> | this channel doesn't |
| 02:01:47 | <klea> | oh |
| 02:02:57 | <nicolas17> | if we're seriously gonna save this it might justify a new channel, idk |
| 02:03:03 | <klea> | oh |
| 02:03:15 | <klea> | yeah i guess too lol |
| 02:03:30 | <klea> | is there some channel specific to archiving video? |
| 02:07:12 | <@JAA> | Technically, #videobot exists, but it's very dead and has been for many years. |
| 02:07:32 | <klea> | should we resurrect it or make smth else? |
| 02:08:35 | <@JAA> | Other than that, things have been happening in specific channels for particular sites, no overarching general thing. |
| 02:08:45 | <klea> | oh |
| 02:08:55 | <@JAA> | #down-the-tube for YouTube, #vinewhine for Vine, etc. |
| 02:09:51 | | Overlordz quits [Quit: Leaving] |
| 02:11:34 | <h2ibot> | Klea edited RTVE Play (+1021, Expand article by adding geoblock, and some…): https://wiki.archiveteam.org/?diff=57906&oldid=57903 |
| 02:12:15 | | tzt quits [Read error: Connection reset by peer] |
| 02:13:03 | | tzt (tzt) joins |
| 02:14:34 | <klea> | i'm tempted to make h2ibot be unable to ping me on #archiveteam-bs, JAA is there some reason why I shouldn't do that? |
| 02:17:16 | | ducky quits [Ping timeout: 260 seconds] |
| 02:17:58 | <nicolas17> | does archivebot only follow links in HTML? |
| 02:18:38 | | ducky (ducky) joins |
| 02:19:23 | <nicolas17> | if I try to archive https://mesu.apple.com/assets/com_apple_MobileAsset_UARP_A2618/com_apple_MobileAsset_UARP_A2618.xml, will it do a futile request to https://updates.cdn-apple.com/2025/patches/082-64953/1CC0CA1E-ECF8-458F-B820-55020FE5E2BA/ or will it ignore that because it's not an <a href>? |
| 02:19:56 | | chrismeller8 quits [Quit: chrismeller8] |
| 02:25:26 | <pokechu22> | nicolas17: it *sorta* follows links in js/json, but the exact rules for it are unclear. It also follows links in sitemap XMLs, but I don't think that would apply here. If it extracted links, the ones I would expect to see are https://updates.cdn-apple.com/2025/patches/082-64953/1CC0CA1E-ECF8-458F-B820-55020FE5E2BA/ and |
| 02:25:28 | <pokechu22> | https://mesu.apple.com/assets/com_apple_MobileAsset_UARP_A2618/com_apple_MobileAsset_UARP_A2618/959670e0684014dbfa25f3ab80db4f3a78dc8781.zip with the latter relative link feeling more likely (and possibly occurring even on an !ao job); that's my expectation for .json though |
| 02:25:42 | <pokechu22> | given that both are junk, seems like trying it is the best way to find out :) |
| 02:27:43 | | chrismeller8 (chrismeller) joins |
| 02:28:00 | | ducky quits [Ping timeout: 260 seconds] |
| 02:28:38 | | ducky (ducky) joins |
| 02:33:37 | <h2ibot> | Klea edited List of websites excluded from the Wayback Machine (+57, Add pigtailsinpaint): https://wiki.archiveteam.org/?diff=57907&oldid=57901 |
| 02:34:37 | | MrMcNuggets quits [Client Quit] |
| 02:43:05 | | chaoticbee quits [Remote host closed the connection] |
| 02:46:21 | | tzt quits [Ping timeout: 272 seconds] |
| 02:49:10 | <@JAA> | klea: ¯\_(ツ)_/¯ |
| 02:50:28 | <klea> | i think i did that |
| 02:53:58 | <klea> | how should the wiki page listing anti-bot software be named? |
| 02:56:17 | <gosc> | JAA, made an account, can you remove the edit verification thing? thanks |
| 02:56:24 | <gosc> | it's calmevening |
| 02:57:24 | <nicolas17> | did you make edits already? |
| 02:58:01 | <gosc> | one |
| 02:59:04 | <nicolas17> | just edit and when JAA gets tired of having to approve each edit he'll change user permissions :P |
| 02:59:21 | <gosc> | alright |
| 03:03:46 | <@JAA> | Yep, that :-) |
| 03:16:02 | | ducky quits [Read error: Connection reset by peer] |
| 03:18:57 | | ducky (ducky) joins |
| 03:24:21 | | nathang2184 quits [Ping timeout: 272 seconds] |
| 03:28:12 | | ducky quits [Ping timeout: 260 seconds] |
| 03:29:03 | | ducky (ducky) joins |
| 03:38:56 | | ducky quits [Ping timeout: 260 seconds] |
| 03:39:14 | | ducky (ducky) joins |
| 04:00:47 | <Ryz> | Heya folks, I need help on extracting contents from https://dataportal.greatersudbury.ca/ - as I'm not too sure if there's anything more to grab than the AB job that I ran moments earlier; asking here since databases tend to be trickier to extract~ |
| 04:01:58 | <Ryz> | Huh, there's also this, https://opendata.greatersudbury.ca/ - which I'm about to run in AB in a few moments |
| 04:05:04 | <pokechu22> | https://opendata.greatersudbury.ca looks like arcgis. I'll add it to my todo list |
| 04:14:24 | | ducky quits [Ping timeout: 260 seconds] |
| 04:20:39 | | chaoticbee (chaoticbee) joins |
| 04:21:40 | | ducky (ducky) joins |
| 04:28:08 | | Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ] |
| 04:30:40 | | Shjosan (Shjosan) joins |
| 04:33:44 | | chaoticbee quits [Remote host closed the connection] |
| 04:35:02 | <pabs> | klea: Obstacles [to archiving], we can include Cloudflare, other bot blockers, admin requests etc |
| 04:35:30 | | chaoticbee (chaoticbee) joins |
| 05:00:36 | | ducky quits [Ping timeout: 260 seconds] |
| 05:01:11 | | SootBector quits [Remote host closed the connection] |
| 05:01:59 | | ducky (ducky) joins |
| 05:02:24 | | SootBector (SootBector) joins |
| 05:03:36 | | chaoticbee quits [Remote host closed the connection] |
| 05:04:54 | | chaoticbee (chaoticbee) joins |
| 05:08:26 | | tzt (tzt) joins |
| 05:21:43 | | Dada joins |
| 05:26:26 | | sg72 quits [Remote host closed the connection] |
| 05:27:35 | | sg72 joins |
| 05:55:30 | | Wohlstand quits [Quit: Wohlstand] |
| 06:12:55 | | DogsRNice quits [Read error: Connection reset by peer] |
| 06:14:48 | | ducky quits [Ping timeout: 260 seconds] |
| 06:15:22 | | ducky (ducky) joins |
| 06:36:15 | | driib97 quits [Ping timeout: 272 seconds] |
| 06:38:50 | | midou quits [Ping timeout: 256 seconds] |
| 06:44:15 | | driib97 (driib) joins |
| 06:44:29 | | ericgallager quits [Ping timeout: 272 seconds] |
| 06:48:54 | | midou joins |
| 06:49:38 | | ericgallager joins |
| 06:53:34 | | midou quits [Ping timeout: 256 seconds] |
| 06:59:12 | | midou joins |
| 07:06:01 | | ericgallager quits [Ping timeout: 272 seconds] |
| 07:07:51 | | midou quits [Read error: Connection reset by peer] |
| 07:07:55 | | driib97 quits [Ping timeout: 272 seconds] |
| 07:09:20 | | midou joins |
| 07:09:57 | | driib97 (driib) joins |
| 07:30:11 | | driib97 quits [Client Quit] |
| 07:30:31 | | driib97 (driib) joins |
| 07:35:07 | | driib97 quits [Client Quit] |
| 07:35:27 | | driib97 (driib) joins |
| 07:36:38 | | midou quits [Ping timeout: 256 seconds] |
| 07:40:36 | | driib97 quits [Ping timeout: 256 seconds] |
| 07:46:53 | | midou joins |
| 07:54:40 | | ducky quits [Ping timeout: 260 seconds] |
| 07:55:44 | | ducky (ducky) joins |
| 08:13:51 | | nine quits [Quit: See ya!] |
| 08:14:04 | | nine joins |
| 08:14:04 | | nine is now authenticated as nine |
| 08:14:04 | | nine quits [Changing host] |
| 08:14:04 | | nine (nine) joins |
| 08:26:24 | | ducky quits [Ping timeout: 260 seconds] |
| 08:36:19 | | ducky (ducky) joins |