00:27:34 | | Darken quits [Remote host closed the connection] |
00:27:57 | | Darken (Darken) joins |
00:50:10 | | BlueMaxima quits [Ping timeout: 255 seconds] |
01:03:29 | | Wohlstand quits [Client Quit] |
01:13:21 | | BlueMaxima joins |
01:44:34 | | pixel leaves [Error from remote client] |
01:48:52 | | Darken quits [Remote host closed the connection] |
01:49:14 | | Darken (Darken) joins |
01:56:05 | | katia is now known as k |
02:03:17 | <anarcat> | https://location.services.mozilla.com/ to be sunset, announced on march 13th |
02:03:47 | <nicolas17> | rip |
02:03:47 | | blue_0000ff quits [Read error: Connection reset by peer] |
02:04:13 | | blue_0000ff joins |
02:17:26 | | abirkill (abirkill) joins |
02:26:10 | <@JAA> | Fuck software patents. |
02:31:20 | <fireonlive> | ++ |
03:11:34 | | w1kip3d1a joins |
03:16:13 | | w1kip3d1a quits [Client Quit] |
03:38:26 | <HP_Archivist> | pokechu22: I know you looked at https://www.libraw.org/ and adjusted it; it finished earlier. (I had stepped away for a few hours.) Did you abort early or just adjust the crawl?
04:36:43 | | grid joins |
04:52:37 | | abirkill quits [Client Quit] |
05:02:11 | | BlueMaxima quits [Client Quit] |
05:19:01 | <pokechu22> | HP_Archivist: I got rid of something that *should* have been junk (pagination URLs where there was a second, different, ignored, pagination param), but I'm not 100% sure if it was complete or not. Those URLs did make up most of the queue. I'll double-check. |
05:22:28 | <HP_Archivist> | pokechu22: Ah, alright, yeah, I appreciate it. I recently learned that library is the basis for a variety of other software that processes camera RAW files. And the Libraw project itself is based on *another* project that ceased in 2018, which I also want to make sure gets crawled properly
05:22:29 | <HP_Archivist> | https://www.dechifro.org/dcraw/ |
05:51:35 | <pokechu22> | HP_Archivist: I can confirm that it successfully requested all 125 pages from https://www.libraw.org/comments/recent?page=1 to https://www.libraw.org/comments/recent?page=125. The ignore I added was for stuff like https://www.libraw.org/comments/recent?destination=comments/recent%3Fpage%3D10&page=1 to |
05:51:37 | <pokechu22> | https://www.libraw.org/comments/recent?destination=comments/recent%3Fpage%3D10&page=125 which are the exact same as the actual page list, but with a second page number in the middle that does nothing (so instead of 125 requests for 125 pages, it'd be 15625 requests... which is just silly). The site's complete. |
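The duplicate-URL pattern pokechu22 describes can be sketched in a few lines. This is only an illustration of why those URLs are safe to skip (the actual ignore in archivebot would be a regex); `is_redundant_pagination` is a hypothetical helper, not part of any tool:

```python
from urllib.parse import urlparse, parse_qs

def is_redundant_pagination(url: str) -> bool:
    """True if the URL carries a 'destination' param alongside 'page',
    making it a duplicate of the plain ?page=N listing URL."""
    qs = parse_qs(urlparse(url).query)
    return "destination" in qs and "page" in qs

urls = [
    "https://www.libraw.org/comments/recent?page=1",
    "https://www.libraw.org/comments/recent?destination=comments/recent%3Fpage%3D10&page=1",
]
# Keep only the canonical pagination URLs; the 'destination' variants
# multiply 125 pages into 125*125 requests for identical content.
kept = [u for u in urls if not is_redundant_pagination(u)]
```

With the redundant variant dropped, the queue shrinks from O(pages²) back to O(pages), which matches the 125-vs-15625 figures above.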
05:53:12 | <HP_Archivist> | Hm, alright. Thank you for checking! |
06:02:51 | | Ruthalas59 quits [Ping timeout: 272 seconds] |
06:06:09 | | Ruthalas59 (Ruthalas) joins |
06:38:17 | <h2ibot> | Petchea created Piapro (+1301, Created page with "{{Infobox project | title =…): https://wiki.archiveteam.org/?title=Piapro |
06:48:19 | <h2ibot> | Petchea edited Piapro (+899): https://wiki.archiveteam.org/?diff=51888&oldid=51887 |
06:51:20 | <h2ibot> | Petchea edited Piapro (+52): https://wiki.archiveteam.org/?diff=51889&oldid=51888 |
06:52:20 | <h2ibot> | Petchea edited Piapro (-28): https://wiki.archiveteam.org/?diff=51890&oldid=51889 |
06:54:20 | <h2ibot> | Petchea edited Piapro (+283): https://wiki.archiveteam.org/?diff=51891&oldid=51890 |
07:04:22 | <h2ibot> | Petchea edited Piapro (+132, not just music): https://wiki.archiveteam.org/?diff=51892&oldid=51891 |
07:13:09 | | lennier2_ quits [Ping timeout: 272 seconds] |
07:17:58 | | lennier2_ joins |
08:02:48 | | pixel (pixel) joins |
08:10:06 | | tguuy joins |
08:10:42 | | tguuy quits [Client Quit] |
09:00:03 | | Bleo182600 quits [Client Quit] |
09:01:28 | | Bleo182600 joins |
10:29:21 | | sec^nd quits [Remote host closed the connection] |
10:29:52 | | sec^nd (second) joins |
10:34:39 | | Guest92 quits [Ping timeout: 265 seconds] |
11:25:20 | | angenieux quits [Quit: The Lounge - https://thelounge.chat] |
11:25:55 | | angenieux (angenieux) joins |
11:47:58 | | icedice (icedice) joins |
12:05:28 | | linuxgemini4 (linuxgemini) joins |
12:08:17 | | linuxgemini quits [Ping timeout: 272 seconds] |
12:08:17 | | linuxgemini4 is now known as linuxgemini |
12:19:34 | | Darken2 (Darken) joins |
12:23:37 | | Darken quits [Ping timeout: 255 seconds] |
12:23:42 | <imer> | "On April 10th, 2024 the cell data downloads will be deleted and will no longer be available. " DELETED? (re mozilla location services) |
12:27:21 | | Darken2 quits [Read error: Connection reset by peer] |
12:27:44 | | Darken2 (Darken) joins |
12:52:39 | <PredatorIWD> | imer: I ran the downloads page https://location.services.mozilla.com/downloads through IA with Save outlinks on, and it actually got "First archive" on most links, but someone should still check it, since that can miss some links on the page.
12:53:18 | <imer> | it might not grab the larger downloads properly I think? |
12:53:39 | <PredatorIWD> | Is there any surefire way to save a page like this on IA other than the basic web.archive.org/save UI? |
12:54:02 | <imer> | archivebot, I'm sure someone will run it through |
12:54:16 | <PredatorIWD> | imer: I manually entered the 2 big downloads it missed as well, might have missed some smaller ones also |
12:54:56 | <Barto> | i've thrown https://location.services.mozilla.com/ into archivebot |
12:55:10 | <imer> | thanks! |
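One way to catch the downloads that save-outlinks missed is to list the file links on the page yourself and submit any stragglers by hand. A minimal sketch using only the standard library; the HTML snippet here is a hypothetical stand-in for the real downloads page, not its actual markup:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values that look like downloadable data files."""
    EXTS = (".csv.gz", ".zip", ".tar.gz")

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.endswith(self.EXTS):
                self.links.append(href)

# Hypothetical stand-in for the downloads page HTML.
page = '<a href="/files/cell_towers.csv.gz">cells</a> <a href="/about">about</a>'
parser = LinkExtractor()
parser.feed(page)
```

Each URL in `parser.links` can then be checked against the Wayback Machine and submitted individually if missing, which is what PredatorIWD did manually for the two big downloads.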
13:19:13 | | Arcorann quits [Ping timeout: 272 seconds] |
13:46:59 | | decky quits [Read error: Connection reset by peer] |
13:47:25 | | decky joins |
13:47:50 | | Bleo182600 quits [Client Quit] |
13:48:08 | | Bleo182600 joins |
13:49:55 | | Guest41 joins |
13:55:14 | | Guest41 quits [Client Quit] |
14:14:01 | | pabs quits [Remote host closed the connection] |
14:18:07 | | linuxgemini quits [Ping timeout: 272 seconds] |
14:32:19 | | f_ quits [Ping timeout: 255 seconds] |
14:32:34 | | f_ (funderscore) joins |
14:55:08 | | pabs (pabs) joins |
14:59:44 | | Wohlstand (Wohlstand) joins |
15:01:03 | | Darken (Darken) joins |
15:04:16 | | Darken2 quits [Ping timeout: 255 seconds] |
15:51:33 | | linuxgemini (linuxgemini) joins |
15:58:59 | | Darken quits [Read error: Connection reset by peer] |
15:59:22 | | Darken (Darken) joins |
17:13:37 | | wickedplayer494 quits [Remote host closed the connection] |
17:16:36 | | wickedplayer494 joins |
17:16:59 | | wickedplayer494 is now authenticated as wickedplayer494 |
17:20:11 | | wickedplayer494 quits [Remote host closed the connection] |
17:22:13 | | wickedplayer494 joins |
17:25:42 | | wickedplayer494 quits [Remote host closed the connection] |
17:26:21 | | wickedplayer494 joins |
17:28:37 | | Darken2 (Darken) joins |
17:31:01 | | wickedplayer494 quits [Remote host closed the connection] |
17:32:46 | | Darken quits [Ping timeout: 255 seconds] |
17:42:11 | | lexikiq joins |
17:48:49 | | wickedplayer494 joins |
17:49:01 | | wickedplayer494 is now authenticated as wickedplayer494 |
17:56:52 | | wickedplayer494 quits [Remote host closed the connection] |
18:09:21 | | wickedplayer494 joins |
18:09:30 | | wickedplayer494 is now authenticated as wickedplayer494 |
18:14:39 | | Darken2 quits [Read error: Connection reset by peer] |
18:15:00 | | Darken2 (Darken) joins |
18:36:47 | | wickedplayer494 quits [Remote host closed the connection] |
18:39:41 | | DLoader quits [Ping timeout: 272 seconds] |
18:40:34 | | qinplusdmi joins |
18:52:39 | | DLoader (DLoader) joins |
19:08:55 | | h3ndr1k_ quits [Ping timeout: 265 seconds] |
19:11:18 | | h3ndr1k (h3ndr1k) joins |
19:21:29 | | qinplusdmi quits [Ping timeout: 265 seconds] |
19:38:49 | | wickedplayer494 joins |
19:39:04 | | wickedplayer494 is now authenticated as wickedplayer494 |
19:46:08 | | leo60228 quits [Client Quit] |
19:46:14 | | jacksonchen666 (jacksonchen666) joins |
19:52:20 | | leo60228 (leo60228) joins |
19:54:21 | | leo60228 quits [Client Quit] |
19:57:07 | | leo60228 (leo60228) joins |
20:04:06 | | Wohlstand quits [Client Quit] |
20:05:22 | | linuxgemini quits [Client Quit] |
20:21:04 | | nicolas17 quits [Ping timeout: 255 seconds] |
20:22:15 | | leo60228 quits [Client Quit] |
20:25:08 | | leo60228 (leo60228) joins |
20:25:48 | | qwertyasdfuiopghjkl quits [Write error: Broken pipe] |
20:25:59 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
20:32:07 | | nicolas17 joins |
20:36:12 | | icedice quits [Client Quit] |
20:46:19 | <eightthree> | are there projects for repeatedly backing up job posting sites (and maybe marketplaces/classifieds like gumtree) since those tend to be deleted fairly quickly, not sure if archive.org is anything close to thorough at keeping copies... |
20:47:18 | <@JAA> | If you have a good list of such sites/pages, we could throw them into #//'s thing. |
20:56:15 | | qwertyasdfuiopghjkl quits [Client Quit] |
21:03:05 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
21:35:09 | | BlueMaxima joins |
22:07:53 | | linuxgemini (linuxgemini) joins |
22:12:43 | | wickedplayer494 quits [Remote host closed the connection] |
22:25:36 | | atphoenix_ quits [Remote host closed the connection] |
22:26:17 | | atphoenix_ (atphoenix) joins |
22:28:57 | | archivist99 quits [Ping timeout: 272 seconds] |
22:36:23 | | atphoenix_ quits [Remote host closed the connection] |
22:37:00 | | atphoenix_ (atphoenix) joins |
22:56:27 | | blue_0000ff quits [Read error: Connection reset by peer] |
22:56:58 | | blue_0000ff joins |
22:59:47 | <eightthree> | JAA: they are often geographically limited; how much do I need to parse these and come up with a deduplicated, sorted list, also carving out the job-specific subsections of hosts/domains (i.e. the URL format for job postings on Hacker News, LinkedIn, etc.)? https://en.wikipedia.org/wiki/List_of_employment_websites
22:59:51 | <eightthree> | https://en.wikipedia.org/wiki/.jobs |
22:59:51 | <eightthree> | https://github.com/lukasz-madon/awesome-remote-job |
22:59:51 | <eightthree> | https://github.com/hugo53/awesome-RemoteWork |
22:59:51 | <eightthree> | https://github.com/zenika-open-source/awesome-remote-work |
22:59:51 | <eightthree> | https://github.com/engineerapart/TheRemoteFreelancer |
22:59:52 | <eightthree> | https://github.com/remoteintech/remote-jobs (seems like a list of employer websites, not likely frequent posted and deleted content like jobs) |
22:59:53 | <eightthree> | https://github.com/lukasz-madon/awesome-remote-job?tab=readme-ov-file#job-boards-aggregators |
22:59:54 | <eightthree> | https://github.com/lukasz-madon/awesome-remote-job?tab=readme-ov-file#job-boards |
23:00:55 | <@JAA> | Ideally, you'd compile a list and create a PR against https://github.com/ArchiveTeam/urls-sources . |
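Compiling such a list mostly comes down to normalizing and deduplicating URLs scraped from the Wikipedia and GitHub pages above. A minimal sketch of that step, under the assumption that lowercasing the host and trimming trailing slashes is enough normalization for a PR-ready list (`normalize` is a hypothetical helper):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Lowercase scheme and host and drop a trailing slash,
    so near-duplicate entries collapse into one."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

raw = [
    "https://Example-Jobs.com/",   # hypothetical entries scraped from the lists
    "https://example-jobs.com",
    "https://news.ycombinator.com/jobs",
]
deduped = sorted({normalize(u) for u in raw})
```

Running the scraped lists through something like this before opening the PR keeps the urls-sources entries sorted and free of duplicates that differ only in case or trailing slash.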
23:03:11 | <eightthree> | ok, this won't be done anytime in the next week by me, but if someone wants to start PRing, I welcome it
23:04:46 | <eightthree> | I was actually surprised this hasn't been done yet by ArchiveTeam; I was hoping to find where the archives are! (currently job searching and in a little hurry to find income)
23:11:21 | | Arcorann (Arcorann) joins |
23:29:58 | <nicolas17> | eightthree: why archive them? historical data analysis? |
23:31:06 | <nicolas17> | for actual job search you'd only care about recent posts... |
23:39:44 | | atphoenix_ quits [Remote host closed the connection] |
23:40:26 | | atphoenix_ (atphoenix) joins |
23:47:25 | | pixel leaves |
23:49:33 | <eightthree> | nicolas17: so there's urgent searching, and then there's "how long do I wait, and how infrequently does a 'dream job' or 'rare desirable item for sale' show up on job sites/classifieds/marketplaces?" I was noticing a lot of "this post disappeared" type messages, getting confused and annoyed, and feeling like this was worth publicly archiving...
23:50:57 | <nicolas17> | (what's a "dream job", I don't understand) |
23:51:00 | <nicolas17> | (half joking) |