00:00:36etnguyen03 quits [Client Quit]
00:14:03Island joins
00:31:32Chris5010 quits [Quit: Ping timeout (120 seconds)]
00:31:50Chris5010 (Chris5010) joins
00:33:46ljcool2006_ joins
00:36:05utulien joins
00:36:18ljcool2006 quits [Ping timeout: 250 seconds]
00:39:51etnguyen03 (etnguyen03) joins
00:55:44utulien quits [Client Quit]
01:03:59beardicus (beardicus) joins
01:08:22beardicus quits [Ping timeout: 250 seconds]
01:10:58nicolas17 quits [Ping timeout: 250 seconds]
01:11:46<LddPotato>Are the rsync targets limited per IP?
01:11:46<LddPotato>I have been doing some traffic shaping, using a tunnel which has multiple IPs to fetch data for projects, and doing the uploads straight over the connection instead of using the tunnel, resulting in all uploads coming from a single IP.
01:15:40nicolas17 joins
01:20:56utulien joins
01:30:39<@imer>LddPotato: no
01:33:01beardicus (beardicus) joins
01:48:08BornOn420 quits [Remote host closed the connection]
01:48:36BornOn420 (BornOn420) joins
01:49:06charlotte_ quits [Ping timeout: 250 seconds]
01:49:18beardicus quits [Ping timeout: 260 seconds]
02:00:08beardicus (beardicus) joins
02:07:55Wohlstand quits [Remote host closed the connection]
02:21:24StarletCharlotte joins
02:22:51Shyy quits [Quit: The Lounge - https://thelounge.chat]
02:30:36sec^nd quits [Remote host closed the connection]
02:30:37SootBector quits [Write error: Broken pipe]
02:30:58SootBector (SootBector) joins
02:30:58sec^nd (second) joins
02:34:48Shyy joins
02:41:38Shyy quits [Client Quit]
02:43:01Shyy joins
02:43:36Wohlstand (Wohlstand) joins
02:47:03utulien quits [Ping timeout: 260 seconds]
02:53:51StarletCharlotte quits [Remote host closed the connection]
02:54:05StarletCharlotte joins
02:55:51StarletCharlotte quits [Remote host closed the connection]
02:56:07StarletCharlotte joins
03:24:10etnguyen03 quits [Client Quit]
03:24:58qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds]
03:25:10tzt quits [Read error: Connection reset by peer]
03:25:46tzt (tzt) joins
03:35:11etnguyen03 (etnguyen03) joins
03:40:33qwertyasdfuiopghjkl2 joins
03:40:59qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
03:50:06etnguyen03 quits [Remote host closed the connection]
03:55:49LEHall joins
03:56:21<LEHall>Hello channel... is this the right place to ask about crawling a site?
03:57:15<@OrIdow6>Yes LEHall
03:58:24beastbg8_ joins
03:59:41<pabs>LEHall: which site? is it shutting down?
04:00:20<LEHall>Great, thank you OrIdow6 and @pabs!
04:01:19<LEHall>I'm working on an effort to preserve the history of immersive art, and the site I'd like to crawl is an auction site with props and things from the production of Sleep No More in New York City, which just shut down after 14 years: https://sleepnomoreauction.com/
04:01:53<LEHall>It might be a little tricky because each item pops up a little module with its description? I'm not sure. But would appreciate any tips in how to best capture it all
04:02:18beastbg8__ quits [Ping timeout: 260 seconds]
04:02:26<LEHall>The auction closes on 30 Jan also, so it's a little time sensitive in that way
04:03:35<pabs>running it in http://archivebot.com/ now
04:03:46<LEHall>Ahh, wonderful! Thank you so much
04:04:22<pabs>looks like it isn't going to work though since in a non-JS browser, you see only "You need to enable JavaScript to run this app."
04:04:53<@OrIdow6>We could make a list of URLs
04:05:05<@OrIdow6>Or try it in JSEater even though that doesn't go into the WBM yet?
04:05:09<pabs>we also have a WIP JS-enabled browser-based thing going in #jseater but it doesn't have recursive mode yet and is still WIP
04:06:07<pabs>so really the only option is start network devtools in your browser, click on every page and try to figure out all the API calls etc, and export that to a URL list, which we can then save
04:07:06<@OrIdow6>Looks like it uses a lot of WS
04:07:12<pabs>hmm, the site seems broken for me too
04:07:17<@OrIdow6>Works for me pabs
04:07:43<pabs>I get some CORS Failed when I click on Top Deals etc
04:07:55<@OrIdow6>WS and POST
04:07:56<@imer>are those post requests to fetch details? gross.
04:08:02<@imer>beat me to it lol
04:09:32<@OrIdow6>If we want to run something that does POST we can at least enumerate? all the IDs at https://auctionsoftware.net/mobileapi/getprodetails
04:09:52<@OrIdow6>Won't play back but at least that seems to be the most crucial data
04:10:03<@OrIdow6>... I think
04:11:03<pabs>cloudfront image bucket isn't enumerable :/
04:11:59<pabs>ugh the websocket is still streaming data but the page isn't changing...
04:12:32<pabs>oh wow, it's just sending the current server time over and over again
04:14:09<pabs>oh wow, the pagination data in json is HTML with JavaScript
04:20:23<pabs>LEHall: so yeah, this is going to be hard. I don't have time for it myself, hopefully others do
04:22:07<pokechu22>It looks like each one has an ID like https://sleepnomoreauction.com/search?product=939928 but yeah, doing that properly will be hard :/
04:22:34<pokechu22>(via the "copy to clipboard" icon on the item details)
04:23:51cow_2001 quits [Quit: ✡]
04:26:07<pabs>those pages POST to https://auctionsoftware.net/mobileapi/getprodetails
04:27:10<pokechu22>Yeah, but those ones are at least something that could be thrown into archive.today (though that's still a less optimal result)
04:29:39cow_2001 joins
04:29:58<LEHall>@pabs and everyone, Thanks for checking it out! I'll cross my fingers that someone can chip away at it. In the meantime I will try and at least grab pdfs of them all x_x lol
04:31:25HP_Archivist quits [Read error: Connection reset by peer]
04:31:41HP_Archivist (HP_Archivist) joins
04:35:22<LEHall>For future reference, if there are sites later on that I want to crawl, is popping in here and asking for an assist the best method?
04:36:21<pabs>yeah, especially anything that works without JS, since we can crawl that with ArchiveBot and it will end up in web.archive.org
04:37:39<LEHall>Rad, thank you so much!
04:37:46<LEHall>I'll be back :)
04:46:15lennier2 joins
04:48:04lennier2_ quits [Ping timeout: 250 seconds]
04:55:52beardicus quits [Ping timeout: 250 seconds]
04:57:18LEHall quits [Client Quit]
05:07:47beardicus (beardicus) joins
05:32:28qwertyasdfuiopghjkl2 (qwertyasdfuiopghjkl2) joins
05:44:50beardicus quits [Ping timeout: 250 seconds]
06:22:27<pokechu22>https://transfer.archivete.am/3F93O/auction.io_sleepnomoreauction.com_process.py - https://transfer.archivete.am/5ox5T/sleepnomoreauction.com_urls.txt. But archivebot can't do the POSTs itself. I've printed the relevant POST data with the URLs in that list but we'll need something else to record the data (which includes the description of every item). Images at least will be
06:22:28<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/3F93O/auction.io_sleepnomoreauction.com_process.py https://transfer.archivete.am/inline/5ox5T/sleepnomoreauction.com_urls.txt.
06:22:29<pokechu22>saved
06:38:26<pabs>https://www.omgubuntu.co.uk/2025/01/ubuntu-devs-matrix-switch
06:43:21<pokechu22>ok, no, that script didn't work quite right; I'm not downloading images properly
06:44:16<@JAA>So after https://www.dslreports.com/ went dark a little while ago, there is now finally a notice up saying it's 'on vacation'. Will keep an eye on whether it returns.
07:26:37nulldata quits [Quit: Ping timeout (120 seconds)]
07:27:40nulldata (nulldata) joins
07:41:22Wohlstand quits [Remote host closed the connection]
07:41:27Wohlstand1 (Wohlstand) joins
07:46:18Wohlstand1 quits [Ping timeout: 260 seconds]
08:22:51michaelblob joins
08:30:24Wohlstand (Wohlstand) joins
08:40:50Jake (Jake) joins
08:41:06michaelblob quits [Read error: Connection reset by peer]
09:15:22loug8318142 joins
09:37:43Wohlstand quits [Ping timeout: 260 seconds]
09:40:23PredatorIWD25 quits [Read error: Connection reset by peer]
09:42:13ducky quits [Ping timeout: 260 seconds]
09:45:24PredatorIWD25 joins
09:52:11ducky (ducky) joins
10:27:32Island quits [Read error: Connection reset by peer]
12:00:02Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
12:02:50Bleo18260072271962345 joins
12:30:58pabs quits [Ping timeout: 260 seconds]
12:34:11SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:34:42SkilledAlpaca418962 joins
12:43:00^ quits [Ping timeout: 250 seconds]
13:04:17BornOn420 quits [Remote host closed the connection]
13:04:45BornOn420 (BornOn420) joins
13:07:28^ (^) joins
13:13:23beardicus (beardicus) joins
13:18:13beardicus quits [Ping timeout: 260 seconds]
13:18:53beardicus (beardicus) joins
13:25:48beardicus quits [Ping timeout: 260 seconds]
13:27:48beardicus (beardicus) joins
13:31:47BornOn420_ (BornOn420) joins
13:32:02BornOn420 quits [Remote host closed the connection]
13:32:18pabs (pabs) joins
13:35:42nstrom joins
13:52:12StarletCharlotte quits [Remote host closed the connection]
13:54:16StarletCharlotte joins
14:05:28beardicus quits [Ping timeout: 260 seconds]
14:07:46BornOn420_ quits [Remote host closed the connection]
14:08:55BornOn420 (BornOn420) joins
14:25:19beardicus (beardicus) joins
14:28:12Dango360 (Dango360) joins
14:31:29Webuser342759 joins
14:32:27Webuser342759 quits [Client Quit]
14:37:14BornOn420 quits [Remote host closed the connection]
14:37:39BornOn420 (BornOn420) joins
15:14:51VerifiedJ quits [Remote host closed the connection]
15:15:35VerifiedJ (VerifiedJ) joins
15:33:07Dango360 quits [Read error: Connection reset by peer]
15:39:09Dango360 (Dango360) joins
15:48:11earl joins
15:56:18beardicus quits [Ping timeout: 260 seconds]
15:59:04beardicus (beardicus) joins
15:59:44linuxgemini quits [Ping timeout: 250 seconds]
16:00:31linuxgemini (linuxgemini) joins
16:11:13Wohlstand (Wohlstand) joins
16:11:37scurvy_duck joins
16:15:01Overlordz quits [Remote host closed the connection]
16:15:33beardicus quits [Ping timeout: 260 seconds]
16:21:03beardicus (beardicus) joins
17:35:28beardicus quits [Ping timeout: 260 seconds]
17:51:15beardicus (beardicus) joins
19:04:52Juest (Juest) joins
19:55:28nicolas17 quits [Ping timeout: 260 seconds]
19:59:24nicolas17 joins
20:50:05nstrom quits [Quit: Ooops, wrong browser tab.]
20:50:17pixel leaves [Error from remote client]
21:03:43beardicus quits [Ping timeout: 260 seconds]
21:05:17earl quits []
21:16:54Dango360 quits [Read error: Connection reset by peer]
21:17:33TastyWiener959 (TastyWiener95) joins
21:21:13Dango360 (Dango360) joins
21:21:49beardicus (beardicus) joins
21:42:43Island joins
21:43:44scurvy_duck quits [Quit: Leaving]
21:48:21TastyWiener959 quits [Client Quit]
21:49:32TastyWiener95 (TastyWiener95) joins
21:59:43beardicus quits [Ping timeout: 260 seconds]
22:05:35etnguyen03 (etnguyen03) joins
22:20:06beardicus (beardicus) joins
22:27:26BlueMaxima joins
22:30:03beardicus quits [Ping timeout: 260 seconds]
22:50:25onetruth quits [Read error: Connection reset by peer]
22:51:33entrox quits [Quit: Ping timeout (120 seconds)]
22:51:57onetruth joins
22:51:59lukash98 quits [Quit: Ping timeout (120 seconds)]
22:51:59TastyWiener95 quits [Client Quit]
22:53:24TastyWiener95 (TastyWiener95) joins
22:54:04entrox joins
22:55:53onetruth quits [Read error: Connection reset by peer]
22:57:19TastyWiener95 quits [Client Quit]
22:57:39leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in]
23:02:02leo60228 (leo60228) joins
23:02:38onetruth joins
23:04:22TastyWiener95 (TastyWiener95) joins
23:06:31katocala quits [Remote host closed the connection]
23:06:41Matthww quits [Quit: Ping timeout (120 seconds)]
23:06:46etnguyen03 quits [Client Quit]
23:07:02Matthww joins
23:24:39pedantic-darwin joins
23:48:24driib9 quits [Quit: The Lounge - https://thelounge.chat]
23:49:03driib9 (driib) joins