00:00:36 | | etnguyen03 quits [Client Quit] |
00:14:03 | | Island joins |
00:31:32 | | Chris5010 quits [Quit: Ping timeout (120 seconds)] |
00:31:50 | | Chris5010 (Chris5010) joins |
00:33:46 | | ljcool2006_ joins |
00:36:05 | | utulien joins |
00:36:18 | | ljcool2006 quits [Ping timeout: 250 seconds] |
00:39:51 | | etnguyen03 (etnguyen03) joins |
00:55:44 | | utulien quits [Client Quit] |
01:03:59 | | beardicus (beardicus) joins |
01:08:22 | | beardicus quits [Ping timeout: 250 seconds] |
01:10:58 | | nicolas17 quits [Ping timeout: 250 seconds] |
01:11:46 | <LddPotato> | Are the rsync targets limited per IP? |
01:11:46 | <LddPotato> | I have been doing some traffic shaping, using a tunnel which has multiple IPs to fetch data for projects, and doing the uploads straight over the connection instead of using the tunnel, resulting in all uploads coming from a single IP. |
01:15:40 | | nicolas17 joins |
01:15:44 | | nicolas17 is now authenticated as nicolas17 |
01:20:56 | | utulien joins |
01:30:39 | <@imer> | LddPotato: no |
01:33:01 | | beardicus (beardicus) joins |
01:48:08 | | BornOn420 quits [Remote host closed the connection] |
01:48:36 | | BornOn420 (BornOn420) joins |
01:49:06 | | charlotte_ quits [Ping timeout: 250 seconds] |
01:49:18 | | beardicus quits [Ping timeout: 260 seconds] |
02:00:08 | | beardicus (beardicus) joins |
02:07:55 | | Wohlstand quits [Remote host closed the connection] |
02:21:24 | | StarletCharlotte joins |
02:22:51 | | Shyy quits [Quit: The Lounge - https://thelounge.chat] |
02:30:36 | | sec^nd quits [Remote host closed the connection] |
02:30:37 | | SootBector quits [Write error: Broken pipe] |
02:30:58 | | SootBector (SootBector) joins |
02:30:58 | | sec^nd (second) joins |
02:34:48 | | Shyy joins |
02:41:38 | | Shyy quits [Client Quit] |
02:43:01 | | Shyy joins |
02:43:36 | | Wohlstand (Wohlstand) joins |
02:47:03 | | utulien quits [Ping timeout: 260 seconds] |
02:53:51 | | StarletCharlotte quits [Remote host closed the connection] |
02:54:05 | | StarletCharlotte joins |
02:55:51 | | StarletCharlotte quits [Remote host closed the connection] |
02:56:07 | | StarletCharlotte joins |
03:24:10 | | etnguyen03 quits [Client Quit] |
03:24:58 | | qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds] |
03:25:10 | | tzt quits [Read error: Connection reset by peer] |
03:25:46 | | tzt (tzt) joins |
03:35:11 | | etnguyen03 (etnguyen03) joins |
03:40:33 | | qwertyasdfuiopghjkl2 joins |
03:40:33 | | qwertyasdfuiopghjkl2 is now authenticated as qwertyasdfuiopghjkl2 |
03:40:59 | | qwertyasdfuiopghjkl2 quits [Max SendQ exceeded] |
03:50:06 | | etnguyen03 quits [Remote host closed the connection] |
03:55:49 | | LEHall joins |
03:56:21 | <LEHall> | Hello channel... is this the right place to ask about crawling a site? |
03:57:15 | <@OrIdow6> | Yes LEHall |
03:58:24 | | beastbg8_ joins |
03:59:41 | <pabs> | LEHall: which site? is it shutting down? |
04:00:20 | <LEHall> | Great, thank you OrIdow6 and @pabs! |
04:01:19 | <LEHall> | I'm working on an effort to preserve the history of immersive art, and the site I'd like to crawl is an auction site with props and things from the production of Sleep No More in New York City, which just shut down after 14 years: https://sleepnomoreauction.com/ |
04:01:53 | <LEHall> | It might be a little tricky because each item pops up a little module with its description? I'm not sure. But would appreciate any tips in how to best capture it all |
04:02:18 | | beastbg8__ quits [Ping timeout: 260 seconds] |
04:02:26 | <LEHall> | The auction closes on 30 Jan also, so it's a little time sensitive in that way |
04:03:35 | <pabs> | running it in http://archivebot.com/ now |
04:03:46 | <LEHall> | Ahh, wonderful! Thank you so much |
04:04:22 | <pabs> | looks like it isn't going to work though since in a non-JS browser, you see only "You need to enable JavaScript to run this app." |
04:04:53 | <@OrIdow6> | We could make a list of URLs |
04:05:05 | <@OrIdow6> | Or try it in JSEater even though that doesn't go into the WBM yet? |
04:05:09 | <pabs> | we also have a WIP JS-enabled browser-based thing going in #jseater but it doesn't have recursive mode yet and is still WIP |
04:06:07 | <pabs> | so really the only option is to start network devtools in your browser, click on every page and try to figure out all the API calls etc, and export that to a URL list, which we can then save |
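The devtools-to-URL-list step pabs describes could be sketched roughly as follows, assuming the browser session is exported as a HAR file (the standard "Save all as HAR" export in browser network tools). The function names, the tab-separated output layout, and the `example.com` sample entries are illustrative assumptions, not anything the channel used.

```python
import json


def load_har(path):
    """Parse a HAR file exported from the browser's network devtools."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def har_to_records(har):
    """Return unique (method, url, post_body) tuples in first-seen order."""
    seen = set()
    records = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        url = request.get("url")
        if not url:
            continue
        method = request.get("method", "GET")
        # postData is only present on requests that carried a body.
        body = (request.get("postData") or {}).get("text", "")
        key = (method, url, body)
        if key not in seen:
            seen.add(key)
            records.append(key)
    return records


def format_lines(records):
    """Plain URL for GET; URL, method and body (tab-separated) otherwise."""
    lines = []
    for method, url, body in records:
        lines.append(url if method == "GET" else f"{url}\t{method}\t{body}")
    return lines


# Hypothetical in-memory HAR, standing in for load_har("session.har").
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://example.com/"}},
            {
                "request": {
                    "method": "POST",
                    "url": "https://example.com/api/item",
                    "postData": {"text": '{"id": 1}'},
                }
            },
            {"request": {"method": "GET", "url": "https://example.com/"}},
        ]
    }
}

for line in format_lines(har_to_records(sample_har)):
    print(line)
```

GET lines from such a list can be fed to a crawler as-is; the POST lines only document what would need to be replayed by some other tool.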
04:07:06 | <@OrIdow6> | Looks like it uses a lot of WS |
04:07:12 | <pabs> | hmm, the site seems broken for me too |
04:07:17 | <@OrIdow6> | Works for me pabs |
04:07:43 | <pabs> | I get some CORS Failed when I click on Top Deals etc |
04:07:55 | <@OrIdow6> | WS and POST |
04:07:56 | <@imer> | are those post requests to fetch details? gross. |
04:08:02 | <@imer> | beat me to it lol |
04:09:32 | <@OrIdow6> | If we want to run something that does POST we can at least enumerate? all the IDs at https://auctionsoftware.net/mobileapi/getprodetails |
04:09:52 | <@OrIdow6> | Won't play back but at least that seems to be the most crucial data |
04:10:03 | <@OrIdow6> | ... I think |
04:11:03 | <pabs> | cloudfront image bucket isn't enumerable :/ |
04:11:59 | <pabs> | ugh the websocket is still streaming data but the page isn't changing... |
04:12:32 | <pabs> | oh wow, it's just sending the current server time over and over again |
04:14:09 | <pabs> | oh wow, the pagination data in json is HTML with JavaScript |
04:20:23 | <pabs> | LEHall: so yeah, this is going to be hard. I don't have time for it myself, hopefully others do |
04:22:07 | <pokechu22> | It looks like each one has an ID like https://sleepnomoreauction.com/search?product=939928 but yeah, doing that properly will be hard :/ |
04:22:34 | <pokechu22> | (via the "copy to clipboard" icon on the item details) |
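A minimal sketch of turning product IDs into the per-item URLs pokechu22 mentions, and back again. Only the `search?product=<id>` pattern comes from the log; the helper names are hypothetical, and where a full list of valid IDs would come from is left open here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# URL pattern taken from the log: https://sleepnomoreauction.com/search?product=939928
BASE = "https://sleepnomoreauction.com/search"


def product_url(product_id):
    """Build the shareable item URL for one product ID."""
    return f"{BASE}?{urlencode({'product': product_id})}"


def product_id_from_url(url):
    """Extract the numeric product ID from an item URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("product", [])
    if values and values[0].isdigit():
        return int(values[0])
    return None


def url_list(product_ids):
    """One URL per distinct ID, sorted so reruns give a stable list."""
    return [product_url(pid) for pid in sorted(set(product_ids))]


# The single ID seen in the log; any further IDs would have to be discovered.
for url in url_list([939928]):
    print(url)
```

The resulting list only covers the item pages themselves; the item details those pages fetch by POST are not captured by requesting these URLs.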
04:23:51 | | cow_2001 quits [Quit: ✡] |
04:26:07 | <pabs> | those pages POST to https://auctionsoftware.net/mobileapi/getprodetails |
04:27:10 | <pokechu22> | Yeah, but those ones are at least something that could be thrown into archive.today (though that's still a less optimal result) |
04:29:39 | | cow_2001 joins |
04:29:58 | <LEHall> | @pabs and everyone, Thanks for checking it out! I'll cross my fingers that someone can chip away at it. In the meantime I will try and at least grab pdfs of them all x_x lol |
04:31:25 | | HP_Archivist quits [Read error: Connection reset by peer] |
04:31:41 | | HP_Archivist (HP_Archivist) joins |
04:35:22 | <LEHall> | For future reference, if there are sites later on that I want to crawl, is popping in here and asking for an assist the best method? |
04:36:21 | <pabs> | yeah, especially anything that works without JS, since we can crawl that with ArchiveBot and it will end up in web.archive.org |
04:37:39 | <LEHall> | Rad, thank you so much! |
04:37:46 | <LEHall> | I'll be back :) |
04:46:15 | | lennier2 joins |
04:48:04 | | lennier2_ quits [Ping timeout: 250 seconds] |
04:55:52 | | beardicus quits [Ping timeout: 250 seconds] |
04:57:18 | | LEHall quits [Client Quit] |
05:07:47 | | beardicus (beardicus) joins |
05:32:28 | | qwertyasdfuiopghjkl2 (qwertyasdfuiopghjkl2) joins |
05:44:50 | | beardicus quits [Ping timeout: 250 seconds] |
06:22:27 | <pokechu22> | https://transfer.archivete.am/3F93O/auction.io_sleepnomoreauction.com_process.py - https://transfer.archivete.am/5ox5T/sleepnomoreauction.com_urls.txt. But archivebot can't do the POSTs itself. I've printed the relevant POST data with the URLs in that list but we'll need something else to record the data (which includes the description of every item). Images at least will be |
06:22:28 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/3F93O/auction.io_sleepnomoreauction.com_process.py https://transfer.archivete.am/inline/5ox5T/sleepnomoreauction.com_urls.txt. |
06:22:29 | <pokechu22> | saved |
06:38:26 | <pabs> | https://www.omgubuntu.co.uk/2025/01/ubuntu-devs-matrix-switch |
06:43:21 | <pokechu22> | ok, no, that script didn't work quite right; I'm not downloading images properly |
06:44:16 | <@JAA> | So after https://www.dslreports.com/ went dark a little while ago, there is now finally a notice up saying it's 'on vacation'. Will keep an eye on whether it returns. |
07:26:37 | | nulldata quits [Quit: Ping timeout (120 seconds)] |
07:27:40 | | nulldata (nulldata) joins |
07:41:22 | | Wohlstand quits [Remote host closed the connection] |
07:41:27 | | Wohlstand1 (Wohlstand) joins |
07:46:18 | | Wohlstand1 quits [Ping timeout: 260 seconds] |
08:22:51 | | michaelblob joins |
08:23:23 | | michaelblob is now authenticated as michaelblob |
08:30:24 | | Wohlstand (Wohlstand) joins |
08:40:50 | | Jake (Jake) joins |
08:41:06 | | michaelblob quits [Read error: Connection reset by peer] |
09:15:22 | | loug8318142 joins |
09:37:43 | | Wohlstand quits [Ping timeout: 260 seconds] |
09:40:23 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
09:42:13 | | ducky quits [Ping timeout: 260 seconds] |
09:45:24 | | PredatorIWD25 joins |
09:52:11 | | ducky (ducky) joins |
10:27:32 | | Island quits [Read error: Connection reset by peer] |
12:00:02 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:50 | | Bleo18260072271962345 joins |
12:30:58 | | pabs quits [Ping timeout: 260 seconds] |
12:34:11 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:34:42 | | SkilledAlpaca418962 joins |
12:43:00 | | ^ quits [Ping timeout: 250 seconds] |
13:04:17 | | BornOn420 quits [Remote host closed the connection] |
13:04:45 | | BornOn420 (BornOn420) joins |
13:07:28 | | ^ (^) joins |
13:13:23 | | beardicus (beardicus) joins |
13:18:13 | | beardicus quits [Ping timeout: 260 seconds] |
13:18:53 | | beardicus (beardicus) joins |
13:25:48 | | beardicus quits [Ping timeout: 260 seconds] |
13:27:48 | | beardicus (beardicus) joins |
13:31:47 | | BornOn420_ (BornOn420) joins |
13:32:02 | | BornOn420 quits [Remote host closed the connection] |
13:32:18 | | pabs (pabs) joins |
13:35:42 | | nstrom joins |
13:52:12 | | StarletCharlotte quits [Remote host closed the connection] |
13:54:16 | | StarletCharlotte joins |
14:05:28 | | beardicus quits [Ping timeout: 260 seconds] |
14:07:46 | | BornOn420_ quits [Remote host closed the connection] |
14:08:55 | | BornOn420 (BornOn420) joins |
14:25:19 | | beardicus (beardicus) joins |
14:28:12 | | Dango360 (Dango360) joins |
14:31:29 | | Webuser342759 joins |
14:32:27 | | Webuser342759 quits [Client Quit] |
14:37:14 | | BornOn420 quits [Remote host closed the connection] |
14:37:39 | | BornOn420 (BornOn420) joins |
15:14:51 | | VerifiedJ quits [Remote host closed the connection] |
15:15:35 | | VerifiedJ (VerifiedJ) joins |
15:33:07 | | Dango360 quits [Read error: Connection reset by peer] |
15:39:09 | | Dango360 (Dango360) joins |
15:48:11 | | earl joins |
15:56:18 | | beardicus quits [Ping timeout: 260 seconds] |
15:59:04 | | beardicus (beardicus) joins |
15:59:44 | | linuxgemini quits [Ping timeout: 250 seconds] |
16:00:31 | | linuxgemini (linuxgemini) joins |
16:11:13 | | Wohlstand (Wohlstand) joins |
16:11:37 | | scurvy_duck joins |
16:15:01 | | Overlordz quits [Remote host closed the connection] |
16:15:33 | | beardicus quits [Ping timeout: 260 seconds] |
16:21:03 | | beardicus (beardicus) joins |
17:35:28 | | beardicus quits [Ping timeout: 260 seconds] |
17:51:15 | | beardicus (beardicus) joins |
19:04:52 | | Juest (Juest) joins |
19:55:28 | | nicolas17 quits [Ping timeout: 260 seconds] |
19:59:24 | | nicolas17 joins |
20:50:05 | | nstrom quits [Quit: Ooops, wrong browser tab.] |
20:50:17 | | pixel leaves [Error from remote client] |
21:03:43 | | beardicus quits [Ping timeout: 260 seconds] |
21:05:17 | | earl quits [] |
21:16:54 | | Dango360 quits [Read error: Connection reset by peer] |
21:17:33 | | TastyWiener959 (TastyWiener95) joins |
21:21:13 | | Dango360 (Dango360) joins |
21:21:49 | | beardicus (beardicus) joins |
21:42:43 | | Island joins |
21:43:44 | | scurvy_duck quits [Quit: Leaving] |
21:48:21 | | TastyWiener959 quits [Client Quit] |
21:49:32 | | TastyWiener95 (TastyWiener95) joins |
21:59:43 | | beardicus quits [Ping timeout: 260 seconds] |
22:05:35 | | etnguyen03 (etnguyen03) joins |
22:20:06 | | beardicus (beardicus) joins |
22:27:26 | | BlueMaxima joins |
22:30:03 | | beardicus quits [Ping timeout: 260 seconds] |
22:50:25 | | onetruth quits [Read error: Connection reset by peer] |
22:51:33 | | entrox quits [Quit: Ping timeout (120 seconds)] |
22:51:57 | | onetruth joins |
22:51:59 | | lukash98 quits [Quit: Ping timeout (120 seconds)] |
22:51:59 | | TastyWiener95 quits [Client Quit] |
22:53:24 | | TastyWiener95 (TastyWiener95) joins |
22:54:04 | | entrox joins |
22:55:53 | | onetruth quits [Read error: Connection reset by peer] |
22:57:19 | | TastyWiener95 quits [Client Quit] |
22:57:39 | | leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in] |
23:02:02 | | leo60228 (leo60228) joins |
23:02:38 | | onetruth joins |
23:04:22 | | TastyWiener95 (TastyWiener95) joins |
23:06:31 | | katocala quits [Remote host closed the connection] |
23:06:41 | | Matthww quits [Quit: Ping timeout (120 seconds)] |
23:06:46 | | etnguyen03 quits [Client Quit] |
23:07:02 | | Matthww joins |
23:24:39 | | pedantic-darwin joins |
23:48:24 | | driib9 quits [Quit: The Lounge - https://thelounge.chat] |
23:49:03 | | driib9 (driib) joins |