00:04:36etnguyen03 quits [Client Quit]
00:11:23arch quits [Remote host closed the connection]
00:11:35arch (arch) joins
00:32:43APOLLO03 joins
00:46:08notarobot17 quits [Ping timeout: 256 seconds]
01:04:16APOLLO03 quits [Ping timeout: 256 seconds]
01:08:06notarobot17 joins
01:18:46bug quits [Quit: Leaving]
01:24:39Wohlstand quits [Quit: Wohlstand]
01:28:23azalea_sh_ quits [Ping timeout: 272 seconds]
01:28:36azalea_sh_ (azalea_sh_) joins
01:39:47notarobot17 quits [Ping timeout: 272 seconds]
01:49:46AS54591 joins
01:53:52corentin0 joins
01:54:28corentin quits [Read error: Connection reset by peer]
01:54:28corentin0 is now known as corentin
02:13:13sec^nd quits [Remote host closed the connection]
02:21:14cyanbox joins
02:25:49etnguyen03 (etnguyen03) joins
02:27:31sec^nd (second) joins
03:04:31Webuser105657 joins
03:06:06<Webuser105657>Hi, I would like to request the archiving of an important trove of legal / legislative materials regarding the former British colony of Newfoundland, a significant portion of which seems to be not yet archived.
03:08:06<pokechu22>Webuser105657: We can probably do that (though saving full laws is often too big with how they present previous versions)
03:09:00<pokechu22>What's the site?
03:09:34<Webuser105657>The collection starts here:
03:09:34<Webuser105657>https://dai.mun.ca/digital/statutesnl/
03:09:34<Webuser105657>Many of those links there are to pages in the subdomain collections.mun.ca, eventually leading to PDF files in the dai.mun.ca subdomain, which are important documents that I think should be protected by archiving. (Only a portion have been archived at all.)
03:09:34<Webuser105657>Other links go to webpages and then PDF files in the subdomain assemblynl.inmagic.com and in the resource www.assembly.nl.ca/legislation (on which some links/PDFs have previously been archived and some have not).
03:10:04Mateon1 quits [Ping timeout: 256 seconds]
03:10:32Mateon1 joins
03:12:15cultpony quits [Ping timeout: 272 seconds]
03:13:39cultpony (cultpony) joins
03:14:10<pokechu22>https://collections.mun.ca/ is ContentDM, which I have tooling to save (but looking at https://collections.mun.ca/sitemap.xml it's also *really* big - each entry listed there is ~50k URLs, so that's ~4 million pages, each of which has several other subpages)
03:14:56<pokechu22>I can make a list for archivebot, but it'll probably take several months to actually finish
03:18:31<Webuser105657>My personal concern is mostly about the legal and legislative materials that are linked from the start page (https://dai.mun.ca/digital/statutesnl/), which is probably only a tiny fraction of the overall DAI.mun.ca materials.
03:18:49<Webuser105657>Longer-term, I'm sure that scholars and researchers who focus on other areas of Newfoundland history and culture would probably appreciate having the entire site archived, just in case. But I definitely understand how that could take months to be archived.
03:19:03gosc joins
03:23:04<pokechu22>I'll see if I can modify my script to save only the statutesnl collection on https://collections.mun.ca/
03:24:48<Webuser105657>Thank you. And thank you for what I'm sure is a tremendous amount of volunteer labour for all of the archiving efforts.
03:25:36<pokechu22>https://assemblynl.inmagic.com/Presto/content/AdvancedSearch.aspx?ctID=MDQ2Yzk1MjctMTgxNC00ZWRlLTk0NGUtMDg4NTc4MzgwMWVi looks like it's largely a POST-based search which can't be saved directly, but if they're linked from https://assembly.nl.ca/legislation/ or something like that then I can just save https://assembly.nl.ca/
03:26:57<pokechu22>oh, or actually it seems like all of the 1969 statutes are links to pages in https://www.assembly.nl.ca/ArchivedStatutes/SN1969.pdf
03:37:51<pokechu22>I started an archivebot job for https://transfer.archivete.am/inline/oi6cG/www.assembly.nl.ca_ArchivedStatutes.txt - I *think* the only valid files are from 1833-1970 but it's easy enough to save the whole range
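A minimal sketch of how a per-year URL list like the one mentioned above could be generated. It assumes every year follows the SN<year>.pdf pattern seen in the single 1969 example and uses the 1833-1970 range that was only tentatively described as valid. The function name is hypothetical, and the list actually submitted may cover a wider range or differ in other ways.

```python
def archived_statute_urls(first_year=1833, last_year=1970):
    """Build one candidate PDF URL per year, inclusive of both endpoints.

    Assumption: every year uses the same SN<year>.pdf naming as the
    1969 file linked in the chat. Years with no file would simply 404.
    """
    pattern = "https://www.assembly.nl.ca/ArchivedStatutes/SN{year}.pdf"
    return [pattern.format(year=year) for year in range(first_year, last_year + 1)]


if __name__ == "__main__":
    # Print one URL per line, the plain-text shape of a URL list file.
    for url in archived_statute_urls():
        print(url)
```

Requesting a year with no file costs only a failed fetch, which is consistent with the remark that saving the whole range is easy enough.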
03:40:17<pokechu22>I've started a second archivebot job for all of https://assembly.nl.ca/ / https://www.assembly.nl.ca/; you can watch that at http://archivebot.com/?initialFilter=assembly.nl.ca
03:44:55etnguyen03 quits [Client Quit]
03:50:10etnguyen03 (etnguyen03) joins
04:05:32etnguyen03 quits [Remote host closed the connection]
04:08:33<nulldata>A broadcast quality version of the CBS CECOT segment https://www.thereset.news/p/breaking-heres-the-60-minutes-segment
04:09:08<Webuser105657>@pokechu22 - do you have a sense of when the .mun.ca archiving might show up on the archivebot site? (I imagine that the modifying of the script could take some time. I was just trying to figure out when I should come back to the archivebot progress page to start looking for that.)
04:09:27<nicolas17>nulldata: wonder if it's this same version https://archive.org/details/60minutes-cecotsegment
04:11:05DogsRNice quits [Read error: Connection reset by peer]
04:11:48<nulldata>Seems like it - though the archive.org version includes the teaser at the start too
04:14:06adryd019 quits [Ping timeout: 256 seconds]
04:22:59<pokechu22>Webuser105657: I don't know yet, but should know in ~15 minutes
04:33:24<Webuser105657>thank you @pokechu22
04:34:17FoodNerd quits [Quit: Bye for now!]
04:36:55<pokechu22>Doesn't look too bad since https://collections.mun.ca/digital/collection/statutesnl/search is only 169 items (though each item is composed of multiple pages). I've got the script configured and am generating a URL list, but that might take around an hour. It will find links like https://dai.mun.ca/PDFs/statutesnl/StatutesofNewfoundland1833.pdf as well as allow navigation on that part of https://collections.mun.ca/ though
04:37:29FoodNerd joins
04:38:40<pokechu22>(there won't be anything on archivebot.com until I generate the list, as generating the list happens on my laptop)
04:46:40Webuser660499 joins
04:47:21Webuser660499 quits [Client Quit]
04:51:03Webuser465956 joins
04:51:52Webuser465956 quits [Client Quit]
04:58:00HP_Archivist quits [Quit: Leaving]
05:17:50<pokechu22>OK, it's going to be significantly longer than an hour it seems. There are 61716 pages under there (since each of the 169 items is a book with a ton of pages)
05:24:59nexussfan quits [Quit: Konversation terminated!]
05:36:47<Webuser105657>@pokechu22 ah, okay that makes sense.
06:27:10ArchivalEfforts quits [Quit: No Ping reply in 180 seconds.]
06:28:19ArchivalEfforts joins
06:29:48Aurora joins
06:35:20<Aurora>hi, I downloaded 11,299,548 videos and JSONs from the GIF-sharing platform Tenor. I was told you guys would be interested in that. It should be every post with a legacy ID in the JSON (every post before September 2024 or so), with the exception of 11 broken links
07:09:16Dango3608 (Dango360) joins
07:12:55Dango360 quits [Ping timeout: 272 seconds]
07:12:55Dango3608 is now known as Dango360
08:03:05Webuser599043 joins
08:04:13Webuser599043 quits [Client Quit]
08:39:10Hackerpcs quits [Quit: Hackerpcs]
08:48:22Shard79591 quits [Ping timeout: 256 seconds]
08:55:18Shard79591 (Shard) joins