| 00:03:29 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 00:04:17 | | AmAnd0A joins |
| 00:05:09 | | Void0 (Void0) joins |
| 00:05:56 | | DLoader_ joins |
| 00:07:10 | | Void0 quits [Client Quit] |
| 00:07:10 | | DLoader quits [Ping timeout: 252 seconds] |
| 00:07:17 | | DLoader_ is now known as DLoader |
| 00:12:07 | | Arcorann (Arcorann) joins |
| 00:14:58 | | fullpwnmedia joins |
| 00:15:35 | <fullpwnmedia> | Hey! Me again. Just checking up on the status of that Windows Update url list I sent. |
| 00:18:50 | <fullpwnmedia> | JAA did you chuck it into ArchiveBot? |
| 00:22:56 | | le0n_ quits [Ping timeout: 265 seconds] |
| 00:24:01 | | icedice (icedice) joins |
| 00:26:35 | | icedice quits [Client Quit] |
| 00:26:41 | | BigBrain quits [Ping timeout: 245 seconds] |
| 00:41:29 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 00:41:38 | | le0n (le0n) joins |
| 00:41:45 | | AmAnd0A joins |
| 00:55:47 | | fullpwnmedia quits [Remote host closed the connection] |
| 01:35:37 | | imer0 (imer) joins |
| 01:36:53 | | imer quits [Ping timeout: 265 seconds] |
| 01:36:53 | | imer0 is now known as imer |
| 01:53:24 | | systwi__ is now known as systwi |
| 01:55:57 | | fullpwn quits [Read error: Connection reset by peer] |
| 01:56:13 | | fullpwn joins |
| 02:07:00 | | xkey quits [Quit: xkey] |
| 02:12:31 | | BlueMaxima joins |
| 02:49:36 | | sec^nd quits [Ping timeout: 245 seconds] |
| 03:05:13 | <tomodachi94> | More useragents for ArchiveBot, yay!!! https://github.com/ArchiveTeam/ArchiveBot/pull/556 |
| 03:05:32 | | dumbgoy_ quits [Ping timeout: 252 seconds] |
| 03:22:35 | <@JAA> | fullpwnmedia: I didn't, I was going to keep this list in my stash for the software binaries project I've been meaning to launch. That requires software that doesn't exist yet, and there's no ETA for it. Does Microsoft typically get rid of these files? I thought they kept them for quite a long time. |
| 03:36:55 | | Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ] |
| 03:37:33 | | Shjosan (Shjosan) joins |
| 03:45:57 | | Sluggs quits [Excess Flood] |
| 03:46:21 | | Sluggs joins |
| 03:59:07 | | railen63 quits [Remote host closed the connection] |
| 04:00:00 | | aGerman quits [Quit: The Lounge - https://thelounge.chat] |
| 04:00:01 | | treora quits [Quit: blub blub.] |
| 04:01:31 | | treora joins |
| 04:03:00 | | aGerman (aGerman) joins |
| 04:04:30 | | Dango360 quits [Read error: Connection reset by peer] |
| 04:05:28 | | railen63 joins |
| 04:06:28 | | railen63 quits [Remote host closed the connection] |
| 04:06:42 | | Dango360 (Dango360) joins |
| 04:06:43 | | railen63 joins |
| 04:16:10 | | railen63 quits [Remote host closed the connection] |
| 04:19:13 | | railen63 joins |
| 04:20:10 | | railen63 quits [Remote host closed the connection] |
| 04:20:23 | | railen63 joins |
| 04:32:16 | | decky_e quits [Ping timeout: 252 seconds] |
| 04:32:36 | | decky_e (decky_e) joins |
| 04:42:00 | | decky_e quits [Ping timeout: 265 seconds] |
| 04:42:40 | | sonick (sonick) joins |
| 04:42:45 | | decky_e (decky_e) joins |
| 04:46:08 | <sonick> | It was announced in an email newsletter that https://technote.ipros.jp/ will end on June 15, 2023. |
| 04:46:44 | <sonick> | It seems large enough to run in AB. |
| 04:55:05 | | nicolas17 quits [Client Quit] |
| 04:55:24 | | nicolas17 joins |
| 05:23:18 | | xkey (xkey) joins |
| 05:57:24 | | systwi_ quits [Quit: systwi_] |
| 05:57:24 | | nothere quits [Quit: Leaving] |
| 05:58:47 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:07:20 | | Island quits [Read error: Connection reset by peer] |
| 06:08:33 | | systwi_ joins |
| 06:08:48 | | nothere joins |
| 06:17:49 | | BigBrain (bigbrain) joins |
| 07:06:16 | | bf_ joins |
| 07:36:23 | | Ivan226 joins |
| 08:38:17 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 09:27:32 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 09:27:48 | | Minkafighter joins |
| 09:40:21 | | bloom joins |
| 09:53:54 | | driib quits [Quit: The Lounge - https://thelounge.chat] |
| 09:54:17 | | driib (driib) joins |
| 09:57:00 | | bloom quits [Remote host closed the connection] |
| 10:05:21 | | thehedgeh0g quits [Remote host closed the connection] |
| 10:05:22 | | shreyasminocha quits [Remote host closed the connection] |
| 10:05:22 | | evan quits [Remote host closed the connection] |
| 10:06:31 | | evan joins |
| 10:06:34 | | thehedgeh0g (mrHedgehog0) joins |
| 10:06:34 | | shreyasminocha (shreyasminocha) joins |
| 10:07:09 | | AK quits [Quit: AK] |
| 10:13:20 | <that_lurker> | Would it be possible to do a crawl of http://porn.serverbear.com/ it's the blog of serverbear that closed in 2016 and the site is slowly deteriorating. If the crawl speed can be slowed down that would be best as load times are longish. |
| 10:13:35 | <that_lurker> | Has some cool stuff in it https://twitter.com/mikko_2013/status/1664534090324869120 |
| 10:14:37 | <that_lurker> | most photos of the site are at least "safe" on tumblr |
| 10:24:30 | | decky_e quits [Remote host closed the connection] |
| 10:28:29 | | icedice (icedice) joins |
| 10:29:16 | | icedice2 (icedice) joins |
| 10:52:43 | | AK (AK) joins |
| 11:01:18 | | JohnnyJ quits [Client Quit] |
| 11:01:59 | | JohnnyJ joins |
| 11:05:46 | | Ruthalas5 quits [Ping timeout: 265 seconds] |
| 11:09:27 | | Ruthalas5 (Ruthalas) joins |
| 11:33:34 | | CreaZyp154 joins |
| 11:37:25 | <CreaZyp154> | I've got a few ideas to avoid redirection loops: 1, save redirections directly instead of queuing them. 2, the warrior follows the redirect to see if there's a loop before queuing. 3, add a URL parameter to the API endpoint for a redirection history, maybe hashed so that long URLs don't cause issues |
| 11:39:43 | <CreaZyp154> | or maybe just a redirection count for the last one |
| 11:44:33 | | fuzzy8021 (fuzzy8021) joins |
| 11:54:36 | | icedice2 quits [Client Quit] |
| 11:54:56 | | icedice2 (icedice) joins |
| 12:00:41 | | Ruthalas5 quits [Ping timeout: 252 seconds] |
| 12:35:43 | | Ruthalas5 (Ruthalas) joins |
| 12:38:20 | | BigBrain quits [Remote host closed the connection] |
| 12:39:35 | | BigBrain (bigbrain) joins |
| 12:41:57 | | Ruthalas5 quits [Ping timeout: 265 seconds] |
| 12:47:35 | | sec^nd (second) joins |
| 12:59:33 | | geezabiscuit quits [Read error: Connection reset by peer] |
| 12:59:54 | | geezabiscuit (geezabiscuit) joins |
| 13:06:05 | | Ruthalas5 (Ruthalas) joins |
| 13:07:33 | | geezabiscuit quits [Read error: Connection reset by peer] |
| 13:07:42 | | geezabiscuit (geezabiscuit) joins |
| 13:13:18 | | geezabiscuit quits [Ping timeout: 252 seconds] |
| 13:13:33 | | railen63 quits [Remote host closed the connection] |
| 13:16:28 | | railen63 joins |
| 13:17:28 | | railen63 quits [Remote host closed the connection] |
| 13:17:42 | | railen63 joins |
| 13:26:02 | | geezabiscuit (geezabiscuit) joins |
| 13:34:59 | | katocala quits [Remote host closed the connection] |
| 14:00:58 | | HP_Archivist quits [Client Quit] |
| 14:08:52 | | hitgrr8 joins |
| 14:41:20 | | CreaZyp154 quits [Ping timeout: 265 seconds] |
| 15:00:50 | | lennier1 quits [Read error: Connection reset by peer] |
| 15:01:09 | | lennier1 (lennier1) joins |
| 15:02:27 | | HP_Archivist (HP_Archivist) joins |
| 15:25:27 | | HP_Archivist quits [Client Quit] |
| 15:34:51 | | za3k joins |
| 15:35:08 | | Island joins |
| 15:39:37 | | katocala joins |
| 15:41:17 | | katocala is now authenticated as katocala |
| 15:44:25 | | Arcorann quits [Remote host closed the connection] |
| 15:50:38 | | Arcorann (Arcorann) joins |
| 16:10:56 | | Arcorann quits [Read error: Connection reset by peer] |
| 16:19:44 | | dumbgoy_ joins |
| 16:21:32 | | hitgrr8 quits [Client Quit] |
| 16:56:02 | | Matthww1 quits [Ping timeout: 252 seconds] |
| 16:57:48 | | Matthww1 joins |
| 17:08:28 | <that_lurker> | Would it be possible to do a crawl of http://porn.serverbear.com/ it's a blog of serverbear that has some cool computer history and whatnot. serverbear closed in 2016 and the site is slowly deteriorating. If the crawl speed can be slowed down that would be best as load times are longish. |
| 17:09:14 | <that_lurker> | The site has some cool stuff in it like this https://twitter.com/mikko_2013/status/1664534090324869120 and at least most of the images on the site are on tumblr so they are "safe" |
| 17:19:21 | | hitgrr8 joins |
| 17:24:21 | <pokechu22> | that_lurker: It looks like it was saved back in 2016: https://archive.fart.website/archivebot/viewer/job/khrwx - is there new content since then? |
| 17:24:24 | | nicolas17 quits [Remote host closed the connection] |
| 17:24:43 | | nicolas17 joins |
| 17:25:17 | <that_lurker> | pokechu22: Most likely not. Thanks did not know it was archived |
| 17:26:16 | <pokechu22> | Looking at view-source:https://porn.serverbear.com/ it does seem like that's a tumblr-based blog, but it does seem a lot laggier compared to most of those (e.g. https://just-shower-thoughts.com) which is odd |
| 17:27:49 | <that_lurker> | most likely using some old code that causes javascript loops |
| 17:28:17 | <that_lurker> | though some pages refuse to load completely |
| 17:31:03 | | decky_e (decky_e) joins |
| 17:43:47 | <klg> | it is tumblr and the tumblr part seems to work normally to me, but they have a bunch of assets in their theme from outside of tumblr, like that blocking javascript from blog.serverbear.com which times out for me; but anyway no new posts since 2013 |
| 17:46:38 | | decky_e quits [Ping timeout: 252 seconds] |
| 17:47:11 | | decky_e (decky_e) joins |
| 17:59:06 | | flashfire42 quits [Read error: Connection reset by peer] |
| 17:59:06 | | s-crypt quits [Read error: Connection reset by peer] |
| 17:59:06 | | Ryz2 quits [Remote host closed the connection] |
| 17:59:06 | | kiska quits [Read error: Connection reset by peer] |
| 17:59:18 | | Ryz2 (Ryz) joins |
| 17:59:19 | | s-crypt (s-crypt) joins |
| 17:59:24 | | flashfire42 (flashfire42) joins |
| 18:00:37 | | kiska (kiska) joins |
| 18:12:52 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 18:14:08 | | decky_e quits [Ping timeout: 252 seconds] |
| 18:25:56 | | spirit quits [Client Quit] |
| 18:58:58 | <nicolas17> | JAA: I want to make POST requests and save the request and response in WARCs, qwarc only supports GET right? |
| 18:59:32 | <@JAA> | nicolas17: qwarc supports GET, POST, and HEAD. |
| 19:00:27 | <nicolas17> | oh I see, write_client_response is writing the request body too :) |
| 19:00:52 | <@JAA> | Remember to use 0.2.6 or higher, not the master branch. |
| 19:01:53 | <nicolas17> | the github repo is outdated btw |
| 19:02:19 | | decky_e (decky_e) joins |
| 19:03:43 | <@JAA> | Yep, the proper repo is linked there though. |
| 19:03:57 | <nicolas17> | why did you get rid of warcio? |
| 19:04:10 | <@JAA> | Because warcio is buggy and shouldn't be used for any WARC writing. |
| 19:04:37 | <nicolas17> | warcio's own docs sound... self-confident :P |
| 19:04:40 | <@JAA> | https://github.com/webrecorder/warcio/issues/created_by/JustAnotherArchivist |
| 19:09:14 | <fireonlive> | leave it to "Wario" to be evil |
| 19:09:20 | <fireonlive> | (or 'bad') |
| 19:10:46 | <@JAA> | The fact that lots of development happens at webrecorder but these bugs in the core library that corrupt data are getting ignored means that I can't recommend enough against using any webrecorder software for archival. It's fine for playback though. |
| 19:11:48 | <fireonlive> | are they a 'big company' or 'startup' or is it just sort of a group or people |
| 19:11:55 | <fireonlive> | s/or/of/ |
| 19:13:11 | <nicolas17> | sooo how do I use qwarc? :P |
| 19:13:20 | <@hook54321> | fireonlive: non-profit iirc |
| 19:13:44 | | decky_e quits [Ping timeout: 252 seconds] |
| 19:13:58 | <@JAA> | nicolas17: *two heavy breaths* Good luck! *hangs up* |
| 19:14:03 | <fireonlive> | ahh ok |
| 19:14:19 | <nicolas17> | this invalidates your "warcio does <undocumented thing>" issues! /s |
| 19:14:27 | <@JAA> | :-P |
| 19:14:28 | | decky_e (decky_e) joins |
| 19:14:31 | <fireonlive> | x3 |
| 19:15:20 | <fireonlive> | JAA reads RFCs recreationally; amazing |
| 19:15:34 | <fireonlive> | big thumbs up from me |
| 19:15:47 | <@hook54321> | one of the people involved with Rhizome's web archiving ethics conference was also angry that an ArchiveTeam member was archiving public social media |
| 19:15:47 | <@JAA> | Yeah, I care too much about standards and compliance, I guess. |
| 19:15:58 | <fireonlive> | they are fun! |
| 19:16:08 | <@JAA> | I also tried to implement an IRC client based on the RFCs. Yeah, that didn't go well... |
| 19:16:32 | <fireonlive> | haha |
| 19:16:40 | <fireonlive> | reminds me of trying to make my own 'services' |
| 19:16:41 | <nicolas17> | I started making a Wireshark dissector for Matter before realizing what "the spec is 900 pages" really means |
| 19:16:48 | <fireonlive> | linking up with unrealircd however long ago |
| 19:17:18 | <fireonlive> | was fun in either case though |
| 19:17:53 | <TheTechRobo> | hook54321: do you have a video link to that conference? |
| 19:20:10 | <nicolas17> | okay warcio it is then /s |
| 19:20:51 | <@hook54321> | TheTechRobo: https://invidious.snopyta.org/channel/UCxT4WqoDaO3B_Hvhr6rpB6Q |
| 19:21:01 | <@hook54321> | unsure if that's all of the talks or not |
| 19:21:35 | <@hook54321> | wait that's a different conference i think |
| 19:21:57 | | Megame (Megame) joins |
| 19:22:19 | <@JAA> | nicolas17: qwarc is self-documenting in that it copies the code into the meta WARC. You can find examples in my IA uploads. |
| 19:22:48 | <@JAA> | (There's also a mechanism to copy further dependencies beyond just the spec file itself.) |
| 19:23:43 | <fireonlive> | we don't need no documentation |
| 19:23:48 | <fireonlive> | we don't need no thought control |
| 19:24:01 | <fireonlive> | 🎶 |
| 19:24:26 | <@JAA> | :-) |
| 19:24:44 | | za3k quits [Remote host closed the connection] |
| 19:24:53 | <fireonlive> | :D |
| 19:29:08 | | za3k joins |
| 19:30:26 | | BigBrain quits [Ping timeout: 245 seconds] |
| 19:49:41 | | sonick quits [Client Quit] |
| 20:12:51 | | za3k quits [Remote host closed the connection] |
| 20:38:44 | <icedice2> | https://old.reddit.com/r/Piracy/comments/13yglih/stop_crying_wipe_your_tears_introducing_nqrarbg/?sort=top |
| 20:38:46 | <icedice2> | Based |
| 20:39:02 | | icedice2 quits [Client Quit] |
| 20:39:23 | | icedice2 (icedice) joins |
| 20:39:35 | | icedice2 quits [Remote host closed the connection] |
| 20:47:16 | <icedice> | Fuck, they have a Discord server |
| 20:47:24 | <icedice> | That's going to fuck them over eventually |
| 20:49:11 | | icedice quits [Client Quit] |
| 20:49:31 | <Terbium> | Yep... |
| 20:49:43 | <Terbium> | also they have cloudflare, that's going to be a pain to archive |
| 20:58:57 | | nickofni1 is now known as nickofnicks |
| 20:59:49 | | icedice (icedice) joins |
| 21:00:45 | | Unholy2361 quits [Quit: The Lounge - https://thelounge.chat] |
| 21:01:27 | | Unholy2361 (Unholy2361) joins |
| 21:04:05 | | Dango360 quits [Ping timeout: 252 seconds] |
| 21:04:22 | <masterX244> | s/cloudflare/buttflare/g |
| 21:04:36 | | Dango360 (Dango360) joins |
| 21:11:47 | <nicolas17> | how the heck does reddit allow this content? |
| 21:14:57 | | BigBrain (bigbrain) joins |
| 21:16:27 | <fireonlive> | links to torrent trackers? |
| 21:17:17 | <fireonlive> | interesting load balancing.. https://whatever -> https://s<n>.whatever |
| 21:18:06 | <fireonlive> | ..it's done in javascript |
| 21:18:53 | <fireonlive> | https://transfer.archivete.am/W0YBg/ngrarbg.txt |
| 21:18:57 | <fireonlive> | that's... a way |
| 21:19:36 | <nicolas17> | fireonlive: a whole r/Piracy subreddit, there *has* to be content in there that can't be justified with "well it's actually just a link to a search engine yadda yadda" |
| 21:19:51 | | lunik173 quits [Remote host closed the connection] |
| 21:20:00 | <fireonlive> | ahh |
| 21:20:06 | <fireonlive> | https://old.reddit.com/r/Piracy/comments/13yglih/stop_crying_wipe_your_tears_introducing_nqrarbg/jmnip44/?context=1 |
| 21:20:11 | <fireonlive> | oops |
| 21:27:44 | <fireonlive> | https://github.com/Not-Quite-RARBG/api/commits/main/search.php ; ah |
| 21:27:54 | | za3k joins |
| 21:27:58 | | spirit joins |
| 21:30:11 | <Terbium> | i don't think that fixes the problem... |
| 21:32:04 | <systwi_> | Glad to see RARBG is getting the love it deserves. :-) |
| 21:33:53 | <fireonlive> | Terbium: yeahhh......... |
| 21:34:25 | <fireonlive> | 'ok PDO..' 'oh like that?' |
| 21:34:27 | <fireonlive> | 'what?' |
| 21:37:02 | <joepie91|m> | what is this, 2012? |
| 21:37:17 | <fireonlive> | ikr |
| 21:37:23 | <fireonlive> | i was half expecting to see mysql_query |
| 21:38:16 | <joepie91|m> | so like, these people are on Discord, putting their code on Github, and it contains a 2010s-era SQLi |
| 21:38:21 | <fireonlive> | 's3' from their 'load balancing algorithm' just redirects back to apex (after about 20 years) |
| 21:38:30 | <fireonlive> | oh it also leaks their real server hostname, oops |
| 21:38:34 | <joepie91|m> | I can't help but get a "literal 13 year olds with no experience" vibe from this |
| 21:38:36 | <Terbium> | *Today on Code Review with AT*: We'll review how not to protect against SQL Injection in your PHP Code |
| 21:38:42 | <joepie91|m> | which is Bad News |
| 21:38:48 | <joepie91|m> | (also for them) |
| 21:39:21 | <fireonlive> | i don't want to say it in this logged chat |
| 21:39:23 | <fireonlive> | but curl -D - https://s3.nq-rarbg.to |
| 21:39:32 | <fireonlive> | and look at the returned 'host:' |
| 21:39:42 | <fireonlive> | make a cup of tea while it's loading |
| 21:40:09 | <fireonlive> | is that their actual behind-the-flare server? |
| 21:40:12 | <fireonlive> | hm |
| 21:42:18 | <fireonlive> | yeah it's some free like netfly/heroku thing |
| 21:42:21 | <fireonlive> | (with paid plans) |
| 21:42:58 | <Terbium> | why not run the site on Github pages at this point :P |
| 21:42:59 | <fireonlive> | (also why is host: in the response header?) |
| 21:42:59 | <masterX244> | odd, no host appearing for me |
| 21:44:52 | <fireonlive> | pm'd |
| 21:45:04 | <Terbium> | you're right, it's probably netlify |
| 21:45:31 | <fireonlive> | looks like now that it's 500ing instead of 302ing there's no host header |
| 21:48:02 | <fireonlive> | based on absolutely nothing looks like they're all using the same web host thing |
| 21:48:10 | <fireonlive> | well based on the x-nf-request-id header |
| 21:48:56 | <fireonlive> | unless all their servers so happen to be fronted by netlify :p |
| 21:49:05 | <fireonlive> | and then by cloudflare |
| 22:01:49 | <TheTechRobo> | I can't load their website _at all_. |
| 22:02:08 | <nicolas17> | JAA: qwarc requires aiohttp version exactly 2.3.10, and with that version I can't even "import aiohttp", nor understand the error >_< |
| 22:02:24 | <TheTechRobo> | Oh, s3 works. s2 and s1 seem to be borked. |
| 22:02:47 | <nicolas17> | JAA: https://paste.debian.net/1281844/ |
| 22:03:08 | <nicolas17> | why does it need such a specific version of aiohttp? |
| 22:03:20 | <@JAA> | nicolas17: Ah yeah, that. |
| 22:03:27 | <@JAA> | That's not caused by aiohttp. |
| 22:03:43 | <TheTechRobo> | Oh, great, they *already* shut down their Discord. |
| 22:03:50 | <@JAA> | The aiohttp requirement is because I'm monkeypatching internals to get access to the raw data streams. |
| 22:03:57 | <nicolas17> | ah seems I need to pin to an older async-timeout |
| 22:04:08 | <fireonlive> | s3 just redirects for me |
| 22:04:13 | <@JAA> | But that error is because you need async-timeout==3.0.1. |
| 22:04:23 | <TheTechRobo> | fireonlive: https://s3.nq-rarbg.to/ ? |
| 22:04:47 | <fireonlive> | ye, trying again though |
| 22:05:36 | <fireonlive> | ah, 500s now |
| 22:05:39 | <TheTechRobo> | Eventually s1 and s2 give me empty responses. |
| 22:05:44 | <TheTechRobo> | oh yep, 502 |
| 22:05:46 | <fireonlive> | wonder if someone sqli'd them |
| 22:06:14 | <TheTechRobo> | lol |
| 22:06:25 | <TheTechRobo> | either that or they're running this on a potato that rivals imgur's |
| 22:06:42 | <fireonlive> | anyone here can slide into my DM for the host leak if they wish |
| 22:07:59 | <nicolas17> | $ qwarc --version |
| 22:08:00 | <nicolas17> | qwarc 0.2.8 |
| 22:08:01 | <nicolas17> | yay |
| 22:12:23 | | hitgrr8 quits [Client Quit] |
| 22:16:14 | <nicolas17> | JAA: looking at soundcloud-tracks spec file https://paste.debian.net/1281845/ it seems generate() splits the range 0-679999999 into 10000-sized chunks, then _process splits them again into 200-sized chunks to send the requests? if you can request 200 IDs at a time, why not split into 200-sized chunks since the beginning? |
| 22:19:03 | <@JAA> | nicolas17: I don't remember whether it was an issue on that specific one, but lock contention becomes an issue at high throughput. |
| 22:19:30 | <@JAA> | Each item getting checked out or back into the DB requires a lock. |
| 22:20:13 | <nicolas17> | ah, I don't plan to use high concurrency so can I make process() just run a single fetch? |
| 22:20:40 | <@JAA> | Sure |
| 22:21:22 | <@JAA> | As a rough number, on my old potato that runs most of these crawls, the contention becomes an issue at a couple thousand items per minute. |
| 22:22:08 | <@JAA> | And I frequently do several hundred requests per second, so items with very few requests just aren't going to work there. |
| 22:24:07 | <nicolas17> | so if I understand this correctly, --concurrency changes the number of simultaneous async tasks in one process, if I want to use multiple processes I just run multiple instances with the same spec and db? |
| 22:25:53 | <@JAA> | Correct |
| 22:26:44 | <nicolas17> | but if I want to use multiple computers/IPs I'm on my own? :D |
| 22:27:20 | <@JAA> | Yeah, coordination only works between processes on the same machine. |
| 22:28:02 | <@JAA> | I have rough plans about that, but not sure when that will happen. |
| 22:38:34 | <nicolas17> | fuuuuuuck |
| 22:38:47 | <nicolas17> | aiohttp calls asyncio.Task.current_task |
| 22:39:04 | <nicolas17> | "This method is *deprecated* and will be removed in Python 3.9. Use the asyncio.current_task() function instead." |
| 22:39:25 | <fireonlive> | are you using greater than 3.9? :D |
| 22:39:33 | <nicolas17> | I'm on 3.9.2 |
| 22:39:33 | | fullpwn quits [Read error: Connection reset by peer] |
| 22:39:34 | <@JAA> | Oh yes, I run this under 3.6 or something ancient like that. Haven't had time to maintain it recently. |
| 22:40:12 | | fullpwn joins |
| 22:40:46 | <@JAA> | And it's an ancient aiohttp version, of course. |
| 22:40:59 | <fireonlive> | *pulls out worksfornow stamp* |
| 22:41:02 | <@JAA> | Migrating that to something more recent isn't exactly straightforward since there have been two new major versions since. |
| 22:41:55 | <nicolas17> | can be done slowly :P |
| 22:42:10 | <@JAA> | I've been meaning to either push the aiohttp maintainers to add proper access to the raw data streams or replace it with h11 + own asyncio code. |
| 22:42:21 | <@JAA> | But well, ENOTIME |
| 22:43:07 | <fireonlive> | we gotta clone JAA |
| 22:43:10 | <fireonlive> | we don't have the technology |
| 22:43:37 | <@JAA> | :-) |
| 22:49:08 | | fullpwn quits [Ping timeout: 252 seconds] |
| 22:58:48 | <nicolas17> | seems this works successfully, unless I pass "--concurrency 2", in which case it downloads everything it has to download and exits with asyncio.exceptions.CancelledError (?!) |
| 23:00:30 | <nicolas17> | that was python 3.8, gonna try 3.7... |
| 23:04:44 | <nicolas17> | 3.7 works better |
| 23:07:44 | <nicolas17> | JAA: aww this thing can't re-request stuff and deduplicate with revisit records right? |
| 23:14:05 | | HP_Archivist (HP_Archivist) joins |
| 23:27:30 | | railen63 quits [Remote host closed the connection] |
| 23:27:45 | | railen63 joins |
| 23:27:59 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 23:31:39 | | Lord_Nightmare (Lord_Nightmare) joins |
| 23:41:39 | | HP_Archivist quits [Client Quit] |
| 23:46:45 | | lunik173 joins |
| 23:58:16 | | Unholy2361 quits [Remote host closed the connection] |
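
The two-level chunking discussed between 22:16 and 22:22 (large work items, each split into smaller request batches) can be sketched in plain Python. This is only an illustration under stated assumptions, not qwarc's API and not the actual soundcloud-tracks spec file: `split_range`, `generate_items`, and `batches_for_item` are hypothetical names, and only the numbers (IDs 0 to 679999999, 10000 IDs per item, 200 IDs per request) come from the conversation.

```python
from typing import Iterator, Tuple

# Numbers taken from the conversation; everything else here is hypothetical.
LAST_ID = 679_999_999   # highest ID in the range 0-679999999
ITEM_SIZE = 10_000      # IDs per work item (one DB check-out/check-in each)
BATCH_SIZE = 200        # IDs per HTTP request


def split_range(start: int, stop: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) sub-ranges of at most `size` IDs covering start..stop."""
    for first in range(start, stop + 1, size):
        yield first, min(first + size - 1, stop)


def generate_items(last_id: int = LAST_ID, item_size: int = ITEM_SIZE) -> Iterator[Tuple[int, int]]:
    """Coarse split: each yielded range would be one item stored in the job database."""
    return split_range(0, last_id, item_size)


def batches_for_item(first: int, last: int, batch_size: int = BATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Fine split: the request-sized batches made while processing a single item."""
    return split_range(first, last, batch_size)


if __name__ == "__main__":
    item_count = sum(1 for _ in generate_items())
    first_item = next(iter(generate_items()))
    batches_per_item = len(list(batches_for_item(*first_item)))
    # Number of items if the range were split into request-sized chunks from the start.
    fine_grained_items = (LAST_ID + 1) // BATCH_SIZE

    print(f"items: {item_count}")
    print(f"requests per item: {batches_per_item}")
    print(f"items if split at {BATCH_SIZE} directly: {fine_grained_items}")
```

With these numbers there are 68,000 items of 50 requests each. Splitting into 200-sized items from the beginning would give 3,400,000 items, so 50 times as many locked check-outs and check-ins for the same number of requests, which is the contention JAA describes at high throughput.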