00:27:34Darken quits [Remote host closed the connection]
00:27:57Darken (Darken) joins
00:50:10BlueMaxima quits [Ping timeout: 255 seconds]
01:03:29Wohlstand quits [Client Quit]
01:13:21BlueMaxima joins
01:44:34pixel leaves [Error from remote client]
01:48:52Darken quits [Remote host closed the connection]
01:49:14Darken (Darken) joins
01:56:05katia is now known as k
02:03:17<anarcat>https://location.services.mozilla.com/ to be sunset, announced on March 13th
02:03:47<nicolas17>rip
02:03:47blue_0000ff quits [Read error: Connection reset by peer]
02:04:13blue_0000ff joins
02:17:26abirkill (abirkill) joins
02:26:10<@JAA>Fuck software patents.
02:31:20<fireonlive>++
03:11:34w1kip3d1a joins
03:16:13w1kip3d1a quits [Client Quit]
03:38:26<HP_Archivist>pokechu22: I know you looked at https://www.libraw.org/ and adjusted it, and it finished earlier. (I had stepped away for a few hours...) Did you abort early or just adjust the crawl?
04:36:43grid joins
04:52:37abirkill quits [Client Quit]
05:02:11BlueMaxima quits [Client Quit]
05:19:01<pokechu22>HP_Archivist: I got rid of something that *should* have been junk (pagination URLs where there was a second, different, ignored, pagination param), but I'm not 100% sure if it was complete or not. Those URLs did make up most of the queue. I'll double-check.
05:22:28<HP_Archivist>pokechu22: Ah alright, yeah, I appreciate it. I recently learned that library is the basis for a variety of other software that processes camera RAW files. And the Libraw project is itself based on *another* project that ceased development in 2018, which I also want to make sure gets crawled properly
05:22:29<HP_Archivist>https://www.dechifro.org/dcraw/
05:51:35<pokechu22>HP_Archivist: I can confirm that it successfully requested all 125 pages from https://www.libraw.org/comments/recent?page=1 to https://www.libraw.org/comments/recent?page=125. The ignore I added was for stuff like https://www.libraw.org/comments/recent?destination=comments/recent%3Fpage%3D10&page=1 to
05:51:37<pokechu22>https://www.libraw.org/comments/recent?destination=comments/recent%3Fpage%3D10&page=125 which are the exact same as the actual page list, but with a second page number in the middle that does nothing (so instead of 125 requests for 125 pages, it'd be 15625 requests... which is just silly). The site's complete.
05:53:12<HP_Archivist>Hm, alright. Thank you for checking!
06:02:51Ruthalas59 quits [Ping timeout: 272 seconds]
06:06:09Ruthalas59 (Ruthalas) joins
06:38:17<h2ibot>Petchea created Piapro (+1301, Created page with "{{Infobox project | title =…): https://wiki.archiveteam.org/?title=Piapro
06:48:19<h2ibot>Petchea edited Piapro (+899): https://wiki.archiveteam.org/?diff=51888&oldid=51887
06:51:20<h2ibot>Petchea edited Piapro (+52): https://wiki.archiveteam.org/?diff=51889&oldid=51888
06:52:20<h2ibot>Petchea edited Piapro (-28): https://wiki.archiveteam.org/?diff=51890&oldid=51889
06:54:20<h2ibot>Petchea edited Piapro (+283): https://wiki.archiveteam.org/?diff=51891&oldid=51890
07:04:22<h2ibot>Petchea edited Piapro (+132, not just music): https://wiki.archiveteam.org/?diff=51892&oldid=51891
07:13:09lennier2_ quits [Ping timeout: 272 seconds]
07:17:58lennier2_ joins
08:02:48pixel (pixel) joins
08:10:06tguuy joins
08:10:42tguuy quits [Client Quit]
09:00:03Bleo182600 quits [Client Quit]
09:01:28Bleo182600 joins
10:29:21sec^nd quits [Remote host closed the connection]
10:29:52sec^nd (second) joins
10:34:39Guest92 quits [Ping timeout: 265 seconds]
11:25:20angenieux quits [Quit: The Lounge - https://thelounge.chat]
11:25:55angenieux (angenieux) joins
11:47:58icedice (icedice) joins
12:05:28linuxgemini4 (linuxgemini) joins
12:08:17linuxgemini quits [Ping timeout: 272 seconds]
12:08:17linuxgemini4 is now known as linuxgemini
12:19:34Darken2 (Darken) joins
12:23:37Darken quits [Ping timeout: 255 seconds]
12:23:42<imer>"On April 10th, 2024 the cell data downloads will be deleted and will no longer be available. " DELETED? (re mozilla location services)
12:27:21Darken2 quits [Read error: Connection reset by peer]
12:27:44Darken2 (Darken) joins
12:52:39<PredatorIWD>imer: I ran the downloads page https://location.services.mozilla.com/downloads through IA with Save outlinks on, and it actually got "First archive" on most links, but someone should still check it, since that can miss crawling some links from the page.
12:53:18<imer>it might not grab the larger downloads properly I think?
12:53:39<PredatorIWD>Is there any surefire way to save a page like this on IA other than the basic web.archive.org/save UI?
12:54:02<imer>archivebot, I'm sure someone will run it through
12:54:16<PredatorIWD>imer: I manually entered the 2 big downloads it missed as well, might have missed some smaller ones also
12:54:56<Barto>i've thrown https://location.services.mozilla.com/ into archivebot
12:55:10<imer>thanks!
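For PredatorIWD's question above about a surefire way to save a page other than the basic web.archive.org/save UI: the Wayback Machine's Save Page Now 2 endpoint can also be driven from a script. A minimal sketch, assuming IA S3-style credentials (placeholders below) and the publicly documented SPN2 parameters:

    # Minimal sketch: a scripted Save Page Now 2 request instead of the web UI.
    # Assumes S3-style keys from https://archive.org/account/s3.php (placeholders).
    import requests

    ACCESS_KEY = "YOUR_IA_ACCESS_KEY"  # placeholder
    SECRET_KEY = "YOUR_IA_SECRET_KEY"  # placeholder

    resp = requests.post(
        "https://web.archive.org/save",
        headers={
            "Accept": "application/json",
            "Authorization": f"LOW {ACCESS_KEY}:{SECRET_KEY}",
        },
        data={
            "url": "https://location.services.mozilla.com/downloads",
            "capture_outlinks": "1",  # also queue pages/files linked from the page
            "capture_all": "1",       # capture even if the server returns an error status
        },
        timeout=60,
    )
    print(resp.json())  # contains a job_id; poll /save/status/<job_id> for progress

That said, as imer notes, large downloads may exceed what SPN will fetch, so ArchiveBot remains the more reliable route for a site like this.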
13:19:13Arcorann quits [Ping timeout: 272 seconds]
13:46:59decky quits [Read error: Connection reset by peer]
13:47:25decky joins
13:47:50Bleo182600 quits [Client Quit]
13:48:08Bleo182600 joins
13:49:55Guest41 joins
13:55:14Guest41 quits [Client Quit]
14:14:01pabs quits [Remote host closed the connection]
14:18:07linuxgemini quits [Ping timeout: 272 seconds]
14:32:19f_ quits [Ping timeout: 255 seconds]
14:32:34f_ (funderscore) joins
14:55:08pabs (pabs) joins
14:59:44Wohlstand (Wohlstand) joins
15:01:03Darken (Darken) joins
15:04:16Darken2 quits [Ping timeout: 255 seconds]
15:51:33linuxgemini (linuxgemini) joins
15:58:59Darken quits [Read error: Connection reset by peer]
15:59:22Darken (Darken) joins
17:13:37wickedplayer494 quits [Remote host closed the connection]
17:16:36wickedplayer494 joins
17:20:11wickedplayer494 quits [Remote host closed the connection]
17:22:13wickedplayer494 joins
17:25:42wickedplayer494 quits [Remote host closed the connection]
17:26:21wickedplayer494 joins
17:28:37Darken2 (Darken) joins
17:31:01wickedplayer494 quits [Remote host closed the connection]
17:32:46Darken quits [Ping timeout: 255 seconds]
17:42:11lexikiq joins
17:48:49wickedplayer494 joins
17:56:52wickedplayer494 quits [Remote host closed the connection]
18:09:21wickedplayer494 joins
18:14:39Darken2 quits [Read error: Connection reset by peer]
18:15:00Darken2 (Darken) joins
18:36:47wickedplayer494 quits [Remote host closed the connection]
18:39:41DLoader quits [Ping timeout: 272 seconds]
18:40:34qinplusdmi joins
18:52:39DLoader (DLoader) joins
19:08:55h3ndr1k_ quits [Ping timeout: 265 seconds]
19:11:18h3ndr1k (h3ndr1k) joins
19:21:29qinplusdmi quits [Ping timeout: 265 seconds]
19:38:49wickedplayer494 joins
19:46:08leo60228 quits [Client Quit]
19:46:14jacksonchen666 (jacksonchen666) joins
19:52:20leo60228 (leo60228) joins
19:54:21leo60228 quits [Client Quit]
19:57:07leo60228 (leo60228) joins
20:04:06Wohlstand quits [Client Quit]
20:05:22linuxgemini quits [Client Quit]
20:21:04nicolas17 quits [Ping timeout: 255 seconds]
20:22:15leo60228 quits [Client Quit]
20:25:08leo60228 (leo60228) joins
20:25:48qwertyasdfuiopghjkl quits [Write error: Broken pipe]
20:25:59qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
20:32:07nicolas17 joins
20:36:12icedice quits [Client Quit]
20:46:19<eightthree>are there projects for repeatedly backing up job posting sites (and maybe marketplaces/classifieds like Gumtree)? Those tend to be deleted fairly quickly, and I'm not sure archive.org is anything close to thorough at keeping copies...
20:47:18<@JAA>If you have a good list of such sites/pages, we could throw them into #//'s thing.
20:56:15qwertyasdfuiopghjkl quits [Client Quit]
21:03:05qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
21:35:09BlueMaxima joins
22:07:53linuxgemini (linuxgemini) joins
22:12:43wickedplayer494 quits [Remote host closed the connection]
22:25:36atphoenix_ quits [Remote host closed the connection]
22:26:17atphoenix_ (atphoenix) joins
22:28:57archivist99 quits [Ping timeout: 272 seconds]
22:36:23atphoenix_ quits [Remote host closed the connection]
22:37:00atphoenix_ (atphoenix) joins
22:56:27blue_0000ff quits [Read error: Connection reset by peer]
22:56:58blue_0000ff joins
22:59:47<eightthree>JAA: they are often geographically limited. How much do I need to parse these to come up with a deduplicated, sorted list, also carving out the subsections of hosts/domains that are job-specific (i.e. the URL formats for jobs on Hacker News, LinkedIn, etc.)?: https://en.wikipedia.org/wiki/List_of_employment_websites
22:59:51<eightthree>https://en.wikipedia.org/wiki/.jobs
22:59:51<eightthree>https://github.com/lukasz-madon/awesome-remote-job
22:59:51<eightthree>https://github.com/hugo53/awesome-RemoteWork
22:59:51<eightthree>https://github.com/zenika-open-source/awesome-remote-work
22:59:51<eightthree>https://github.com/engineerapart/TheRemoteFreelancer
22:59:52<eightthree>https://github.com/remoteintech/remote-jobs (seems like a list of employer websites, not frequently posted and deleted content like job listings)
22:59:53<eightthree>https://github.com/lukasz-madon/awesome-remote-job?tab=readme-ov-file#job-boards-aggregators
22:59:54<eightthree>https://github.com/lukasz-madon/awesome-remote-job?tab=readme-ov-file#job-boards
23:00:55<@JAA>Ideally, you'd compile a list and create a PR against https://github.com/ArchiveTeam/urls-sources .
23:03:11<eightthree>ok, this won't be done by me anytime in the next week, but if someone wants to start PRing, I welcome it
23:04:46<eightthree>I was actually surprised this hasn't been done yet by ArchiveTeam; I was hoping to find out where the archives are! (currently job searching and a little in a hurry to find income)
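Compiling the deduplicated, sorted list that JAA asks for could start with something as simple as this sketch (hypothetical; check ArchiveTeam/urls-sources for the format a PR actually needs):

    # Hypothetical sketch: normalize, deduplicate, and sort candidate job-site
    # URLs before opening a PR against ArchiveTeam/urls-sources.
    import re

    def normalize(url: str) -> str:
        """Lowercase scheme and host and strip a trailing slash, for dedup purposes."""
        m = re.match(r"(https?)://([^/]+)(/.*)?$", url.strip(), re.IGNORECASE)
        if not m:
            return url.strip()
        scheme, host, path = m.group(1).lower(), m.group(2).lower(), m.group(3) or ""
        return f"{scheme}://{host}{path.rstrip('/')}"

    raw = [
        "https://www.linkedin.com/jobs/",
        "HTTPS://WWW.LINKEDIN.COM/jobs",
        "https://news.ycombinator.com/jobs",
    ]
    for u in sorted({normalize(u) for u in raw}):
        print(u)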
23:11:21Arcorann (Arcorann) joins
23:29:58<nicolas17>eightthree: why archive them? historical data analysis?
23:31:06<nicolas17>for an actual job search you'd only care about recent posts...
23:39:44atphoenix_ quits [Remote host closed the connection]
23:40:26atphoenix_ (atphoenix) joins
23:47:25pixel leaves
23:49:33<eightthree>nicolas17: so there's urgent searching, and then there's "how long do I wait, and how infrequently does a 'dream job' or 'rare desirable item for sale' show up on job sites/classifieds/marketplaces?". I was noticing a lot of "this post disappeared" type messages, getting confused and annoyed, and feeling like this was worth publicly archiving...
23:50:57<nicolas17>(what's a "dream job", I don't understand)
23:51:00<nicolas17>(half joking)