00:02:02dm4v quits [Read error: Connection reset by peer]
00:02:10dm4v_ joins
00:02:36dm4v_ is now known as dm4v
00:02:36dm4v quits [Changing host]
00:02:36dm4v (dm4v) joins
00:03:30wyatt8740 quits [Ping timeout: 258 seconds]
00:03:59wyatt8740 joins
00:31:20Megame quits [Read error: Connection reset by peer]
00:31:38Megame joins
00:44:04lunik1 joins
01:02:37dm4v_ joins
01:02:42dm4v quits [Ping timeout: 250 seconds]
01:02:49dm4v_ is now known as dm4v
01:02:52dm4v quits [Changing host]
01:02:52dm4v (dm4v) joins
01:10:50wessel1512 quits [Read error: Connection reset by peer]
01:10:53wessel1512 joins
01:15:37wessel15126 joins
01:15:54wessel1512 quits [Read error: Connection reset by peer]
01:15:54wessel15126 is now known as wessel1512
01:16:54<jacobk>Hello, I'm interested in archiving the wordlists, images, and audios from wordplay.com. As far as I can tell, the website loads almost everything dynamically with a couple of javascript files, so view-source: does not get the information. I tried using a tool called phantomjs to render the page and save the modified DOM plus a png image, which works in most cases (but sometimes it stops before the page is fully loaded). I also
01:16:54<jacobk> called the API more directly with curl -i and saved the json response of each call. Neither of these methods get the images and audio though, just links to them (they're on a different domain). What would be the best way to download the linked images/audio, and also make sure that I redownload all of the pages that stopped before finishing loading (should return "not found" if the course/lesson actually doesn't exist, whereas
01:16:54<jacobk> it will say "loading" if it hasn't finished loading)?
01:45:13Megame quits [Client Quit]
02:17:26lennier1 (lennier1) joins
02:23:00<thuban>nuroten: grabbing _31 this week_ (https://podcast.rthk.hk/podcast/tv_thisweek2014_i.xml) now; will follow up with _open line open view_ (https://podcast.rthk.hk/podcast/radio1_openline_openview.xml). is that correct, and is the talk show available through the podcast site?
02:23:44nostalgebraist joins
02:24:24<thuban>also, thanks JAA for the monitor, and thanks nuroten for adding the twitter links
02:24:28<nuroten>thuban: thanks! yeah, that's the one, about ~1k videos
02:24:41<nuroten>sorry, not videos, audio I think
02:25:00<nuroten>31 is video
02:26:27<thuban>note to self, update script to handle filetype correctly...
02:32:44<thuban>ah, easier than i was expecting
02:34:43<nuroten>a chunk of the groups/items from parties onwards don't have twitter accounts, included where found
02:39:46<nuroten>I'm wondering whether to save certain renowned figures' twitter timelines as part of this whole thing ... Agnes Chow's facebook page is gone (unclear whether she removed it herself), but twitter is still up
02:39:54HP_Archivist (HP_Archivist) joins
02:42:42<abccc>nuroten I'd say saving twitter and fb timelines is important, lots of fb accounts have been disappearing lately (ex: Civic Party HK, the 2nd largest pro democracy party in HK).
02:44:10<nuroten>(a bit of background: she's an activist, arrested along with other key figures and served sentence for unauthorized assembly, recently released)
02:45:05<nuroten>abccc: that's unfortunate, about Civic Party HK's Facebook page
02:45:35<abccc>nuroten Agnes Chow's fb page was removed most likely due to the ongoing "endangering national security" case against her
02:47:34<nuroten>yeah ... but twitter account would be removed too? a lot of Japanese followers after all
02:48:53<nuroten>either way, yeah, it might be important. it does not bode well
02:50:33<abccc>nuroten yes because according to the HK government, having a Twitter account means you are "colluding with foreign forces". No joke, this was the reason they used to arrest Jimmy Lai, the owner of Apple Daily.
02:51:07<abccc>and of course "colluding with foreign forces" is a national security threat tantamount to treason.
02:55:40<@OrIdow6>EggplantN: 10.7 thousand
02:55:58<@OrIdow6>jacobk: Is the site at risk?
02:58:44<nuroten>abccc: yeah, sad times
03:00:34<@JAA>OrIdow6: https://wordplay.com/notice
03:01:05<@JAA>'After more than 10 years of operation, Worpdlay will be shutting down permanently at the end of this school year. The last day of operation will be July 1, 2021.'
03:01:06<@OrIdow6>Thanks JAA
03:01:18<@OrIdow6>2 days
03:01:43<@JAA>It's already the 30th in Europe, so possibly less than one.
03:03:12<@OrIdow6>I don't think the site is European
03:04:46<@OrIdow6>But in any case very close
03:05:30<@JAA>Uh yeah, probably not. I was thinking 'Spanish' as in the country, not the language, for some reason.
03:13:38Doranwen quits [Client Quit]
03:20:09HP_Archivist quits [Ping timeout: 258 seconds]
03:33:10<@OrIdow6>Trying to enumerate it now
03:33:18<@OrIdow6>I.e. trying different IDs or whatever they are
03:37:37<jacobk>I think the IDs for everything in wordplay are sequential, <9000 courses, <130000. I checked by just trying a bunch of IDs and creating a few courses/lessons myself.
03:37:58<jacobk>*<130000 lessons
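A minimal sketch of the enumeration jacobk describes, assuming lesson IDs are sequential below 130000 and that each lesson lives at a URL of the form seen later in this log (https://wordplay.com/lesson/100204). Starting at ID 1 is an assumption, and `lesson_urls` is a hypothetical helper, not part of the project's actual grab code.

```python
from typing import Iterator

# Upper bounds reported in the channel; treat them as rough estimates.
COURSE_ID_LIMIT = 9_000
LESSON_ID_LIMIT = 130_000


def lesson_urls(start: int = 1, stop: int = LESSON_ID_LIMIT) -> Iterator[str]:
    """Yield one candidate lesson URL per ID in [start, stop)."""
    for lesson_id in range(start, stop):
        yield f"https://wordplay.com/lesson/{lesson_id}"


if __name__ == "__main__":
    # Print the first three candidates as a quick sanity check.
    for url, _ in zip(lesson_urls(), range(3)):
        print(url)
```

IDs that do not exist simply return an error, so over-enumerating is cheap compared with missing a range.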
03:38:53<@OrIdow6>Oh, you created them?
03:39:46<jacobk>I created a few just for testing; most are made by teachers I think.
03:39:53<@OrIdow6>Are there any other units of the site besides courses and lessons?
03:40:29<jacobk>There's users and classes, but those aren't publicly accessible.
03:40:38<@OrIdow6>Oh
03:40:48<jacobk>There's words, but I don't think those can be individually requested.
03:41:56qw3rty__ joins
03:44:00<jacobk><91000 words
03:44:57<jacobk>or, more accurately, "tiles", which contain a pair of strings ("targetText" and "nativeText") that are supposed to mean the same thing.
03:45:50qw3rty_ quits [Ping timeout: 258 seconds]
03:58:03<@EggplantN>We’ve got 2 days?
04:01:01<jacobk>Yeah
04:01:33<jacobk>Unless Wordplay stays up after what they say the last day of operation will be.
04:01:54<@OrIdow6>Or less
04:02:04<@OrIdow6>I expect it to be small, shouldn't need backfeed
04:02:17<jacobk>Maybe they'll close the visible site but keep the API up for longer. And the images/audios are stored on Cloudfront, so those may last a little while longer as well.
04:02:32<jacobk>Not that either of those things would happen intentionally.
04:03:20<@OrIdow6>The latter is fairly common, due to the way this site is set up the former is not
04:03:48<@EggplantN>Oridow6 I can setup backfeed keys if needed
04:04:06<jacobk>I've already saved all of the "printable word lists" with phantomjs.
04:04:29<jacobk>The HTML doesn't look quite right, but the text is there, and I also had phantomjs render a png which usually looks fine.
04:04:44<@OrIdow6>EggplantN: Well, if it's easy, I may use it anyway
04:05:02<@OrIdow6>jacobk: We do things properly here
04:05:21<jacobk>I don't know what I'm doing
04:05:38<jacobk>What would the proper way be?
04:06:10<@JAA>We archive the raw HTTP requests/responses as WARCs so that they can be ingested into the Wayback Machine.
04:06:15<@OrIdow6>Or, more thoroughly
04:06:17<@OrIdow6>With warcs
04:07:17<jacobk>I tried using wget with --warc-file="wp", but it seemed to be missing a lot for some reason.
04:07:47<@OrIdow6>But in any case saving sites like this is the whole "point" of ArchiveTeam
04:09:46<@OrIdow6>So ArchiveTeam will do it for you
04:09:51DogsRNice quits [Read error: Connection reset by peer]
04:09:55<jacobk>Oh
04:10:30<@OrIdow6>But on JS... there are no easy solutions. The only sure way (when working with warcs) is to make the right requests and save those
04:11:37<@OrIdow6>There are various things that e.g. reserialize the finished page, put it into images, run a headless browser and capture all traffic, etc., but nothing works 100% of the time
04:18:17<jacobk>With Wordplay, it should be feasible to programmatically determine whether the page loaded properly or not, so a script that checks saved pages and resaves broken ones might make archival more reliable.
04:19:40<jacobk>If the DOM contains "Loading..." then it didn't finish, and if it says "not found" then the resource doesn't exist.
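The check jacobk outlines could be sketched roughly as follows. The two marker strings come from the messages above. The case-insensitive matching, the `PageState` names, and the `ids_to_retry` helper are assumptions made for illustration. A lesson whose own word list happens to contain "not found" would be misclassified, so treat this as a first-pass filter only.

```python
from enum import Enum


class PageState(Enum):
    LOADED = "loaded"
    LOADING = "loading"      # render stopped early; save again
    NOT_FOUND = "not found"  # course/lesson does not exist


def classify_page(dom_text: str) -> PageState:
    """Classify a saved DOM dump by the marker strings it contains."""
    text = dom_text.lower()
    if "loading..." in text:
        return PageState.LOADING
    if "not found" in text:
        return PageState.NOT_FOUND
    return PageState.LOADED


def ids_to_retry(saved: dict[int, str]) -> list[int]:
    """Return the IDs whose saved page never finished loading."""
    return sorted(
        page_id
        for page_id, dom in saved.items()
        if classify_page(dom) is PageState.LOADING
    )
```

Running `ids_to_retry` over the saved dumps after each pass gives the list to re-render, and the loop ends when it comes back empty.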
04:40:01AntiLiberal joins
04:42:08AntiLiberal quits [Remote host closed the connection]
04:42:20AntiLiberal joins
04:47:51AntiLiberal quits [Remote host closed the connection]
04:48:03AntiLiberal joins
04:59:54aaaaa quits [Remote host closed the connection]
05:00:54HP_Archivist (HP_Archivist) joins
05:10:52Doranwen (Doranwen) joins
05:59:28<@OrIdow6>Yes
06:00:52<@OrIdow6>Can someone tell me what this does? https://transfer.archivete.am/inline/142eSP/spanish_thing_function_2.js Pe is a function that does some string transformation
06:09:39<thuban>OrIdow6: it munges the article a bit
06:09:58<thuban>OrIdow6: specifically, if the string e begins with '(el)' or '(el/la)' it returns 'el' + t; '(los)' or '(los/las)', 'los' + t; '(la)', 'la' + t; '(las)', 'las' + t; otherwise, just t
06:11:15<thuban>(weird way to implement that)
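For readers without access to the linked file, here is a sketch that follows thuban's description literally. It is a Python re-expression, not the site's JavaScript: the name `prepend_article` is hypothetical, and whether the original inserts a space between the article and `t` is not stated in the channel, so none is added here.

```python
# (prefixes of e, article to put in front of t), as described in the channel.
ARTICLE_PREFIXES = (
    (("(el)", "(el/la)"), "el"),
    (("(los)", "(los/las)"), "los"),
    (("(la)",), "la"),
    (("(las)",), "las"),
)


def prepend_article(e: str, t: str) -> str:
    """Return article + t when e starts with a known marker, else t."""
    for prefixes, article in ARTICLE_PREFIXES:
        # str.startswith accepts a tuple of alternatives.
        if e.startswith(prefixes):
            return article + t
    return t
```

The markers cannot shadow one another, since each differs from the others within its first few characters, so the order of the table does not matter.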
06:13:35<@OrIdow6>Thank you thuban
06:13:53<thuban>yw
06:27:12vela quits [Quit: vela]
06:27:41vela (vela) joins
07:15:13BlueMaxima quits [Client Quit]
07:19:30pbm joins
07:36:59HP_Archivist quits [Ping timeout: 258 seconds]
08:00:00shoghicp quits [Ping timeout: 250 seconds]
08:00:16shoghicp (shoghicp) joins
08:35:18nuroten quits [Remote host closed the connection]
08:56:46Megame (Megame) joins
09:03:56HP_Archivist (HP_Archivist) joins
09:10:08HP_Archivist quits [Ping timeout: 258 seconds]
09:42:16shoghicp quits [Ping timeout: 250 seconds]
09:44:27shoghicp (shoghicp) joins
10:01:46shoghicp quits [Ping timeout: 250 seconds]
10:06:56shoghicp (shoghicp) joins
10:10:33<@OrIdow6>Alright wordplay should be ready in a bit
10:13:01<rewby>Cool. How are we doing this? AB, DPOS project, qwarc?
10:13:14<@OrIdow6>Warrior
10:13:19<@OrIdow6>I.e. DPOs
10:13:39<rewby>Aight. Let me know when you've got a channel or need a target
10:14:28<@OrIdow6>Ok
10:14:47<rewby>I'll do targets for this one.
10:36:20<@OrIdow6>arkiver: Can I get quick approval on https://github.com/OrIdow6/wordplay-grab ? Says it shuts down the 1st
10:38:13<thuban>will we be making this the warrior default?
10:39:20<Megame>"Kazakh human rights activist in need of data backup after Youtube channel was deleted, but reinstated back."
10:39:21<Megame>https://twitter.com/HumanKazakh/status/1409680883838185478
10:39:24<@OrIdow6>Needs a backfeed key BTW
10:39:45<@OrIdow6>thuban: Not my decision, if you're asking me
10:40:11<thuban>nope, general question/suggestion
10:40:25<@OrIdow6>Oh
10:53:50<rewby>EggplantN: ^ Can you get OrIdow6 a backfeed key
11:14:30<@EggplantN>Not for a couple hours.
11:14:39<@EggplantN>Fusl can or arkiver can sorry I’ve had to run out
11:19:47<@arkiver>OrIdow6: checking
11:19:59<@arkiver>let me handle the backfeed key
11:23:19<@arkiver>OrIdow6: code looks good, pretty nice improvement over the old code of wikidot
11:23:48<@arkiver>i didnt test it, but will assume that its complete if you say so
11:24:01<@arkiver>(and dont have time to test for another few hours - so we'll just start that project)
11:27:47pbm quits [Remote host closed the connection]
11:34:49<@arkiver>OrIdow6: you're admin on the wordplay tracker
11:35:05<@arkiver>if the project goes smooth, lets do it without a special channel
11:35:13<@arkiver>if more attention is needed, we'll create a channel
11:35:37<rewby>If you give me some vars I'll prepare a target
11:40:31aryashahnaughty joins
11:40:56aryashahnaughty quits [Remote host closed the connection]
11:54:01<@arkiver>rewby: thanks! its
11:54:10<@arkiver>archiveteam_wordplay_
11:54:12<@arkiver>wordplay_
11:54:12<nyany>lmk when that's up and i'll do a nyany
11:54:16<@arkiver>Archive Team Worldplay:
11:54:18<@arkiver>oops
11:54:23<@arkiver>last one is wrong
11:54:26<@arkiver>last one should be
11:54:32<@arkiver>Archive Team Wordplay:
11:54:37<@arkiver>nyany: its done
11:54:41<rewby>All right. What kind of speed are we expecting?
11:55:18<@arkiver>no idea
11:55:36<nyany>arkiver: brilliant.
11:56:23<rewby>Aight. Give me like 5 minutes and there'll be a target in the project
12:01:01<@EggplantN>Noice
12:02:14<@arkiver>started
12:02:17<@arkiver>all items have been queued
12:03:54<rewby>I'm setting the limit to 0 until I'm done with the target
12:04:09<@HCross>please leave it at 0
12:04:12<@HCross>until target is ready
12:14:34<rewby>We've kicked off
12:16:22<nyany>well if I could get a build to succeed I'd help
12:16:42<nyany>i cannot get docker to run properly and buster-backports refuses to work
12:22:59<nyany>HCross: might be worth noting that the pubkeys for buster-backports need to be imported now
12:24:05<AK>https://univis.univie.ac.at/ausschreibungsstellensuche/ look recognisable to anyone?
12:24:21<nyany>yeah, looks like a pretty 404
12:24:37<nyany>https://usercontent.irccloud-cdn.com/file/uQIJfdfW/image.png
12:24:38<AK>arkiver, the 404 you took out of #//, was it for that url above?
12:26:56<@HCross>OrIdow6: can we cut the exponential backoff?
12:27:05<AK>I got a lovely message from a Kevin at the university of Austria asking me to stop ddosing them lol https://share.aktheknight.co.uk/riJe0/bIyuFEku64.png/raw
12:27:12<AK>Apparently we're visiting that url a lot
12:27:18<AK>With unique params each time
12:27:31<AK>He'd like to know if we could tone it down, so they don't have to ban us
12:29:32<@arkiver>AK: can you forward that email to arkiver@protonmail.com? or else PM me the URL they linked in that email
12:29:48<@HCross>arkiver: OrIdow6 wordplay, we seem to be exponentially backing off on courses that don't exist
12:29:55<AK>It's via twitter, I'll send you all the info now
12:33:32<@arkiver>HCross: oof 12 max tires
12:33:33<@arkiver>tries
12:33:55<@arkiver>OrIdow6: please check the 400s
12:34:25<rewby>We've slowed down massively, I'm barely getting 10mbps inbound on average on the target.
12:34:59<@arkiver>HCross: should be better
12:43:44<nyany>nice
12:58:47<@HCross>arkiver: I never saw any backed in use
13:02:00<@HCross>arkiver: Can we have a multi-size of 2 please
13:02:34<@HCross>I think 4XX errors are cancelling the entire multi-item
13:06:56PlsNoJava quits [Quit: ttfn]
13:16:01univie-kd joins
13:17:50<univie-kd>Hi guys! I was told to mention this issue here: Some of your researchers are unintentionally DoSing one of our subdomains by crawling the same URL over and over again, sending thousands of requests, because the URL utilizes a uniq flow-id for every request :/
13:19:01<nyany>univie-kd: is this that subdomain: https://univis.univie.ac.at/ausschreibungsstellensuche/
13:19:19<univie-kd>yep, exactly
13:19:28<nyany>I believe this is being looked into, but AK arkiver
13:19:30<rewby>arkiver, AK: It looks like someone from that uni has showed up.
13:21:07<univie-kd>Yeah, I contacted one of your guys via twitter, because we could identify his IP and we don´t want to block the IPs on the firewall, as your project is definitely not harmful :) Just wanted to drop by here and spread the word, so you can fix this, if possible
13:21:15<rewby>univie-kd: We've just pinged the people who usually deal with these situations. They'll be around soon.
13:21:22<Jake>univie-kd: any idea on the useragent?
13:22:36<@arkiver>univie-kd: yes i've been pinged about it, thanks for joining
13:22:38<@arkiver>looking inot it
13:22:40<@arkiver>into*
13:23:05<univie-kd>Nope, I was only given a list of IPs and sent on the hunt :) I can ask the Sysadmins if they can tell me the user-agents
13:23:10<univie-kd>Thanks for looking into it!
13:24:11<@arkiver>univie-kd: useragent is a browser agent
13:24:16<@arkiver>how did you find it was us?
13:24:45<@arkiver>right i see the URLs
13:25:02<rewby>Maybe just put in an ignore or something?
13:25:09<univie-kd>tracked down the IPs and a few of them led to one of "your guys" :) via twitter contacted him and he said that he is doing stuff for your project
13:25:11<@arkiver>jsessionid and/or _flowexecutionkey are the problem here i guess
13:25:18<@arkiver>nice
13:25:23<@arkiver>yeah, putting in a block for this one
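One way to avoid this kind of loop, shown here only as a sketch and not as what #// actually deployed, is to strip per-request session tokens before deciding whether a URL has already been seen. The two parameter names come from arkiver's message. Treating `jsessionid` as a `;`-style path parameter and `_flowExecutionKey` as a query parameter is an assumption, and the example.org URL in the test is made up.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Lower-cased names of parameters that change on every request.
SESSION_PARAMS = {"jsessionid", "_flowexecutionkey"}


def strip_session_tokens(url: str) -> str:
    """Remove session tokens so equivalent URLs compare equal."""
    parts = urlsplit(url)
    # Servlet containers often append ";jsessionid=..." to the path.
    path = re.sub(r";jsessionid=[^/?#;]*", "", parts.path, flags=re.IGNORECASE)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )
```

A crawler that keys its seen-set on the stripped form would fetch such a page once instead of thousands of times.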
13:25:44<univie-kd>Thanks for reacting so quickly, really appreciate it
13:26:15<rewby>We're just here to preserve the internet, not to cause it to go down. ;)
13:26:48<thuban>was this via #//? what was queueing it?
13:28:07<univie-kd>Haha, yeah we just wanted to get in touch with you, because as I said, we don´t mind crawlers archiving our sites at all - would be a shame to block your IPs, as you don´t have any harmful intention!
13:28:43<@arkiver>thuban: yes this is #//
13:29:05<@arkiver>we indeed dont have harmful intentions :)
13:29:19<h3ndr1k>univie-kd: I think all of the projects contain a hint to this channel in case there are problems. That way you could have found us earlier. But it seems to have worked out anyway.
13:29:20<@arkiver>opposite - we're trying to preserve your content instead of getting it offline
13:29:40<h3ndr1k>univie-kd: *All of the projects User Agents.
13:29:51<rewby>I think urls might be using a chrome UA or something
13:29:58<@arkiver>yeah rewby is correct
13:30:12<AK>Oh Hi univie-kd o/ ~Alex
13:30:13<h3ndr1k>oh ok nevermind then
13:30:26<@arkiver>univie-kd: should be fixed, may take a few hours before you see no requests. the crawling is distributed over several machines, they all need to update
13:30:36<univie-kd>Thanks for the hint about the user agent. As I said, I was given only a list of IPs without further information. But it is good to know for the future - thank you very much!
13:30:59<univie-kd>Thanks for fixing it so fast - awesome community here, I am impressed
13:31:03<univie-kd>Hi Alex :)
13:31:41<Jake>thank you for alerting us to the problem!
13:31:56<@arkiver>AK: ^ :)
13:31:58<rewby>A conversation is always better than a block. Sadly, we often get blocked.
13:32:07<AK>Glad we got it all sorted :)
13:36:29<rewby>arkiver: Can you set multi-item on wordplay to something like 2 or even disable it entirely? Me and HCross think that some 4xx responses are cancelling whole multi-items instead of single items. And we've kinda ran out of to-do.
13:39:35<Jake>I believe it is, a ton of mine got cancelled for one 400
13:41:23<rewby>Yeah, we're kinda working around it by requeueing often, that causes some smaller items to be issued which get completed
13:41:29<rewby>So we're not at a complete standstill
13:41:43<rewby>But I'd rather we just turn the multi-item way down or off
13:42:20<@arkiver>rewby: yes
13:42:50<rewby>There's only a few items left and I'm not worried about small files
13:43:27<@arkiver>done
13:43:57<@arkiver>OrIdow6: in general, it's better to use a low multi item size (or disable it entirely) when there is not a large number of todo items
13:44:16<@arkiver>the 150k in this case is a good example of a small number of items, multi item size 1 is fine for those
13:46:27<rewby>arkiver: Thanks! This is going much better
13:47:49<@arkiver>good!
13:48:16<@HCross>arkiver: have you set it so that we mark 400s as done in a few minutes
13:48:19<@HCross>after that. we'll be done
13:48:24<@HCross>*can you not have you
13:49:46Nikos410 joins
13:50:42<rewby>arkiver: Me and HCross think that we're done with non 400-ing items. Nothing's being handed in anymore and on his cluster it looks like all of the urls are 400ing.
13:51:22<Jake>all mine are new items are 400s as well
13:52:36univie-kd leaves
14:04:59britmob joins
14:17:34britmob quits [Ping timeout: 258 seconds]
14:20:24<rewby>arkiver: Yeah, I think we're done with succeeding items. Do you think you can rig it to accept the 4xx errors like HCross suggested? I'd like to wrap the project up
14:21:08lennier1 quits [Client Quit]
14:23:46lennier1 (lennier1) joins
14:32:12nostalgebraist quits [Client Quit]
14:45:36britmob joins
14:53:50benjins quits [Ping timeout: 250 seconds]
15:01:07nostalgebraist joins
15:10:05benjins joins
15:20:03Arcorann__ quits [Ping timeout: 258 seconds]
15:47:05Nikos410 quits [Remote host closed the connection]
15:54:39nuroten joins
17:06:59somerando3 joins
17:08:05<somerando3>Does anyone have a list of ALL the facebook videos from https://www.facebook.com/hongkongfp/? I have a quick and dirty javascript that can grab all the live videos if I can scroll down through all of them, but the main videos page quickly gets gummed up and I haven't been able to scroll past ~1yr ago on the main video pages after several attempts.
17:08:06<somerando3>abccc was able to post a list like this for https://www.facebook.com/standnewshk/ several days ago.
17:08:32<somerando3>If anyone's interested, these are the lists I can provide. They're of live FB videos only (so missing some stuff). The lists have some metadata, which may be useful since no downloader I've tried has consistently gotten the FB metadata right: hkfp: https://www92.zippyshare.com/v/NoG8X74B/file.html; standnewshk:(probably incomplete, but less
17:08:32<somerando3>incomplete than my last list): https://www92.zippyshare.com/v/5VuglMD5/file.html
17:14:40<nuroten>thuban: just a heads-up, The Pulse has been axed https://podcast.rthk.hk/podcast/item.php?pid=205 co-host of Backchat, Stephen Vines (who used to host The Pulse), has resigned from the show. not sure how long it will continue to run https://podcast.rthk.hk/podcast/item.php?pid=177
17:16:00<nuroten>(that's https://podcast.rthk.hk/podcast/thepulse_i.xml and https://podcast.rthk.hk/podcast/backchat.xml)
17:26:25spirit joins
17:32:18<@JAA>somerando3's files rehosted: https://transfer.archivete.am/12eR5x/hkfp.fb.live.videos.2021-6-30.tsv https://transfer.archivete.am/oR83S/standnewshk.fb.live_videos.2.tsv
17:33:22<Jake>I tried to download a bunch of facebook videos from the reddit list, some crazy ratelimits after 60 videos
17:33:48<@JAA>somerando3: There is probably no complete list. Facebook's a PITA. Their scrolling stuff was already broken sometimes, but the rate limits just render archiving any significant amount of stuff virtually impossible.
18:11:43<@OrIdow6>https://github.com/ArchiveTeam/wordplay-grab/pull/1 should fix the Wordplay problems
18:14:44<@OrIdow6>Since I do have tracker admin, I have set the minimum
18:15:18<@OrIdow6>EggplantN HCross etc. ^
18:18:15<tech234a>https://twitter.com/minecraftearth/status/1410282240781828100
18:26:28<@OrIdow6>For the benefit of those of us that don't have Electron IRC clients or whatever it is...
18:26:30<@OrIdow6>" Today we say farewell to Minecraft Earth. We are so incredibly thankful for this wonderful community and all the memories we have built together."
19:03:27noteness_ quits [Remote host closed the connection]
19:03:45noteness (noteness) joins
19:08:20HP_Archivist (HP_Archivist) joins
19:12:30<@EggplantN>Done oridow6
19:13:03<@OrIdow6>Thank you
19:24:24<@OrIdow6>And it looks like it's done
19:25:12<@OrIdow6>Thank you everyone
19:25:26<@OrIdow6>Thank you for setting it up arkiver, I will keep the multiitem thing in mind
19:25:30<rewby>Cool. Can I clean up the targets?
19:26:55<@OrIdow6>Yes
19:26:57<Jake>good job!
19:28:14<rewby>I have received 17399830125 bytes of data from warriors.
19:28:19<rewby>Good job everyone
19:31:37<jacobk>Does that include all of the images and audio?
19:33:32<jacobk>(Based on wordplay.lua, it seems like it should have)
19:34:08<rewby>I'm not sure, someone'll have to double check the warc files
19:34:59<jacobk>Are those publicly downloadable yet?
19:35:05<rewby>Uh. Let me double check.
19:36:14<jacobk>I see something on archive.org uploaded at 12:45 today
19:36:17<rewby>I think our items are set to restricted by default... They'll show up in the wayback machine in a few days one way or another. But for direct download, that's something to ask arkiver.
19:36:36<rewby>Yeah, this is one of the megawarcs, https://archive.org/details/archiveteam_wordplay_20210630124018_80a396be
19:38:05<@OrIdow6>jacobk: Yes, otherwise it wouldn't be a proper archive
19:41:59<@OrIdow6>And I will say, that some resources 403, but this happens in the live version as well
19:43:05<rewby>Final upload in progress. https://s3.services.ams.aperture-laboratories.science/rewby/public/e27d35e6-bf8f-41f1-aec4-7a33a331c6f0/1625082175.3618443.png
19:44:28<rewby>And here's the other pack! https://archive.org/details/archiveteam_wordplay_20210630213237_4db6cf0c
19:44:46<rewby>If I'd known this wasn't gonna be that big I'd have upped my chunk size to like 25G
19:45:39<jacobk>OrIdow6: 403? I remember getting 400s, but not 403s. Do you have an example of a 403 page? Probably not an actual problem, just curious.
19:46:26<@OrIdow6>jacobk: Audio for (el) centro comunitario on https://wordplay.com/lesson/100204
19:46:59<@OrIdow6>403 is cloudfront.net's way of saying 404 in this case
19:47:22<rewby>Here's a link to the megawarc if you want to double check playback, https://ia601506.us.archive.org/9/items/archiveteam_wordplay_20210630213237_4db6cf0c/wordplay_20210630213237_4db6cf0c.megawarc.warc.gz
19:47:59<jacobk>Oh yeah, I forgot about cloudfront.
19:48:01Hackerpcs quits [Quit: Hackerpcs]
19:48:04<@arkiver>OrIdow6: wordplay done?
19:48:06<@arkiver>HCross: rewby: looks like OrIdow6 fixed it :)
19:48:21<rewby>arkiver: Yep, he did. I'm just cleaning down
19:48:34<@arkiver>alright, taking project off the tracker
19:49:00<@arkiver>off the front page that is
19:49:23Hackerpcs (Hackerpcs) joins
19:49:26<jacobk>rewby: That link says not available
19:49:29<rewby>Bye bye target-0f9b709f, you did well.
19:49:37<rewby>jacobk: I think the item got set to restricted automatically
19:49:54<rewby>arkiver: Any reason that the wordplay items are access restricted on archive.org?
19:50:32<@arkiver>rewby: they are?
19:50:38<@arkiver>checking
19:50:41<jacobk>I suppose it's possible that the teachers who created lists for students didn't intend for them to be public.
19:50:44<rewby>See my links from earleir
19:50:46<rewby>*earlier
19:50:53<jacobk>But, they should be just words
19:52:13<rewby>We're checking. :)
19:52:14<jacobk>Teachers can search for other teachers' lists. Wordplay documentation might suggest that sharing courses is optional, but I didn't see any such option when I logged in as a teacher.
19:52:35<@OrIdow6>jacobk: This is something different
19:52:52<@arkiver>rewby: it'll become public when i move it out of the inbox collection
19:53:18<rewby>arkiver: Ah okay. Fair enough
19:58:27<jacobk>Oh nvm, the "share a course" link says to copy the URL and send it to a colleague. So the courses are probably searchable by default.
20:10:19spirit quits [Client Quit]
20:15:21Mikesky joins
20:25:10abccc quits [Remote host closed the connection]
20:25:43abcde joins
20:34:12@dxrt quits [Client Quit]
20:35:41dxrt joins
20:35:43dxrt quits [Changing host]
20:35:43dxrt (dxrt) joins
20:35:43@ChanServ sets mode: +o dxrt
20:47:08DogsRNice (Webuser299) joins
20:50:29nostalgebraist quits [Ping timeout: 258 seconds]
20:55:51DogsRNice quits [Ping timeout: 258 seconds]
21:02:17abcde quits [Remote host closed the connection]
21:02:59DogsRNice (Webuser299) joins
21:15:17sec^nd quits [Remote host closed the connection]
21:20:15genofire quits [Quit: Gateway shutdown]
21:20:16pcr quits [Quit: Gateway shutdown]
21:28:35sec^nd (second) joins
21:54:37nertzy__ joins
21:57:11nertzy_ quits [Ping timeout: 258 seconds]
22:29:34Mikesky quits [Read error: Connection reset by peer]
22:51:00abcde joins
22:53:31abcde quits [Remote host closed the connection]
22:55:49abcdef joins
22:57:21hexa- quits [Quit: WeeChat 3.2]
23:00:00hexa- (hexa-) joins
23:08:22<somerando3>nuroten & thuban, I looked at some of those RTHK podcasts. It looks like the official RSS lists only include 1000 episodes, so for instance the 自由風自由PHONE RSS only goes back about 6 months. Actual audio is still available on their archive server, though. For instance this site has a much more complete list of 自由風自由PHONE (6139
23:08:23<somerando3>episodes, going back to Jan 2014): https://www.podchaser.com/podcasts/phone-174719/about
23:13:46<nuroten>somerando3: the rss go back to a year ... thanks, that's a very nice find!
23:17:02<nuroten>some like The Pulse do go back longer, but if everything is on the archive server then it would make sense to fetch from there
23:23:31<somerando3>It looks like podchaser has an API with a free tier, but if that doesn't work the naming convention for the files looks pretty regular, however it seems there might have been a format change from mp3 to m4a at some point.
23:27:25<somerando3>e.g. I found this old link on some website to https://archive.rthk.hk/mp3/radio/contentIndex/radio1/openline_openview/mp3/20180503_2.mp3
23:29:11<somerando3>this is a newer link: https://archive.rthk.hk/mp3/radio/contentIndex/radio1/openline_openview/m4a/20210615_4.m4a All the stuff I checked only seems to be in either one format or the other.
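The naming convention in the two links above can be sketched as a small URL builder. The path layout is read directly off those two examples. What the numeric suffix after the date stands for is not stated in the channel, so it is passed through as an opaque `part` value, and `candidate_urls` is a hypothetical helper name. Since each episode seems to exist in only one format, both candidates are returned for the caller to try.

```python
from datetime import date

BASE = "https://archive.rthk.hk/mp3/radio/contentIndex"


def candidate_urls(channel: str, programme: str, day: date, part: int) -> list[str]:
    """Return the mp3 and m4a candidate URLs for one episode file."""
    stamp = day.strftime("%Y%m%d")
    return [
        f"{BASE}/{channel}/{programme}/{ext}/{stamp}_{part}.{ext}"
        for ext in ("mp3", "m4a")
    ]


if __name__ == "__main__":
    for url in candidate_urls("radio1", "openline_openview", date(2018, 5, 3), 2):
        print(url)
```

Which `part` values exist for a given day would still have to be discovered by probing or from an episode list such as the podchaser one.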
23:33:18Megame quits [Client Quit]
23:49:58abcdef quits [Remote host closed the connection]