00:00:45 | | Matthww quits [Client Quit] |
00:03:21 | | Matthww joins |
00:03:56 | | Matthww quits [Remote host closed the connection] |
00:13:21 | | etnguyen03 (etnguyen03) joins |
00:18:03 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
00:26:53 | | etnguyen03 quits [Client Quit] |
00:38:10 | | HP_Archivist quits [Quit: Leaving] |
00:43:50 | | beardicus (beardicus) joins |
01:01:33 | | etnguyen03 (etnguyen03) joins |
01:15:08 | | beardicus quits [Ping timeout: 260 seconds] |
01:23:19 | | beardicus (beardicus) joins |
01:33:19 | | hackbug quits [Remote host closed the connection] |
01:34:12 | | hackbug (hackbug) joins |
01:48:39 | | nine quits [Quit: See ya!] |
01:48:52 | | nine joins |
01:48:52 | | nine is now authenticated as nine |
01:48:52 | | nine quits [Changing host] |
01:48:52 | | nine (nine) joins |
02:07:03 | | beardicus quits [Ping timeout: 260 seconds] |
02:22:56 | | beardicus (beardicus) joins |
02:30:48 | | sec^nd quits [Remote host closed the connection] |
02:30:52 | | SootBector quits [Remote host closed the connection] |
02:30:53 | | HP_Archivist (HP_Archivist) joins |
02:31:08 | | sec^nd (second) joins |
02:31:11 | | SootBector (SootBector) joins |
02:31:59 | | StarletCharlotte quits [Read error: Connection reset by peer] |
02:33:01 | <pokechu22> | Do we have any good way of recording a bunch of POST requests with known data and URLs into a WARC? I generated a list of those for https://sleepnomoreauction.com/ yesterday, but I'm only able to save the images with archivebot since everything else is POST (note that they also require an origin header, and possibly some other headers) |
02:42:08 | | etnguyen03 quits [Client Quit] |
02:49:44 | <TheTechRobo> | Should be theoretically easy with qwarc from my previous research, but /me doesn't currently have time to write a spec file |
02:50:43 | | pabs quits [Read error: Connection reset by peer] |
02:52:05 | <@OrIdow6> | Could also be done with wget-at |
02:52:17 | | graham9 joins |
02:52:25 | | pabs (pabs) joins |
02:53:06 | | etnguyen03 (etnguyen03) joins |
02:58:10 | <@JAA> | Yeah, easy with qwarc. |
02:59:18 | <pabs> | -feed/#hackernews- Society for Technical Communication to permanently close its doors https://www.stc.org/ https://news.ycombinator.com/item?id=42867324 |
03:01:47 | <h2ibot> | OrIdow6 edited Niconico (+432, /* Nico Nico Seiga */ Nico Nico Shunga has done…): https://wiki.archiveteam.org/?diff=54287&oldid=54264 |
03:03:16 | <TheTechRobo> | AttributeError: module 'asyncio' has no attribute 'coroutine'. Did you mean: 'coroutines'? |
03:03:16 | <TheTechRobo> | What version of Python should I use for qwarc? |
03:03:43 | <TheTechRobo> | (I'm on the latest commit in the 0.2 branch) |
03:04:38 | <@JAA> | Hmm yeah, that was removed in 3.11. |
03:04:58 | <@JAA> | FWIW, that isn't used in qwarc's code, so it'll be from a dependency. |
03:05:06 | <@JAA> | Probably the ancient aiohttp. |
03:05:09 | <TheTechRobo> | Ah, yeah, aiohttp |
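The `asyncio.coroutine` error above comes from generator-based coroutines, which older libraries declared with a decorator that Python 3.11 removed. The sketch below is only an illustration of that removal; it is not taken from qwarc or aiohttp, and `fetch` is a hypothetical name.

```python
import asyncio
import sys

# Old libraries declared coroutines like this:
#     @asyncio.coroutine
#     def fetch():
#         yield from asyncio.sleep(0)
# The decorator was deprecated in Python 3.8 and removed in 3.11,
# so importing such code on 3.11+ raises AttributeError.
HAS_LEGACY_DECORATOR = hasattr(asyncio, "coroutine")


async def fetch():
    # Native coroutine syntax, available since Python 3.5.
    await asyncio.sleep(0)
    return "done"


result = asyncio.run(fetch())
print(sys.version_info[:2], "asyncio.coroutine present:", HAS_LEGACY_DECORATOR)
```

Checking `hasattr(asyncio, "coroutine")` in the interpreter you plan to use is a quick way to tell whether a dependency written in the old style can import at all.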
03:06:26 | <@JAA> | The aiohttp code is ugly because it doesn't expose the raw HTTP traffic, so it hard-depends on that ancient version. |
03:06:56 | <@JAA> | I usually run my things under 3.6, but I know 3.8 works fine. Not sure about newer ones. |
03:06:56 | <TheTechRobo> | On 3.9 I get |
03:06:56 | <TheTechRobo> | class CeilTimeout(Timeout): |
03:06:57 | <TheTechRobo> | TypeError: function() argument 'code' must be code, not str |
03:07:01 | <TheTechRobo> | Also in aiohttp. |
03:07:12 | <@JAA> | Not in async-timeout? |
03:07:25 | <TheTechRobo> | /home/thetechrobo/qwarc/venv/lib/python3.9/site-packages/aiohttp/helpers.py |
03:07:36 | <@JAA> | Hmm yeah, I guess that's where the error happens. |
03:07:40 | <@JAA> | You need async-timeout==3.0.1. |
03:07:49 | <h2ibot> | OrIdow6 edited Web Roasting (+283, Explain what it is a bit): https://wiki.archiveteam.org/?diff=54288&oldid=30443 |
03:08:09 | <@JAA> | https://github.com/aio-libs/aiohttp/issues/6320 |
03:08:11 | | pie_ quits [] |
03:09:43 | <nicolas17> | let's build our own library |
03:09:54 | <nicolas17> | with h11, blackjack and hookers |
03:10:02 | <@JAA> | That's the plan, yes. |
03:10:52 | <TheTechRobo> | Can I make pyenv rebuild the sqlite part of python without removing and reinstalling the entire version? Turns out I didn't have the sqlite headers installed when I installed 3.9. |
03:11:36 | <@JAA> | This was never intended to be long-lived. Remember that qwarc in its current form is basically the code I wrote for one specific project years ago, repackaged into something somewhat reusable. |
03:11:56 | <@JAA> | TheTechRobo: As far as I know, no. |
03:12:40 | <@JAA> | qwarc also used to use warcio. I ripped that out in record time when I discovered its intentional data mangling. |
03:12:53 | <@JAA> | So now it's bespoke custom WARC-writing code. |
03:12:54 | <TheTechRobo> | I have wondered, are WARCs made by warcio still in the WBM? |
03:13:06 | <TheTechRobo> | Not just for qwarc, but also for other things |
03:13:34 | <@JAA> | Replacing that is at the top of my qwarc todo list, hence pywarc. |
03:13:54 | <@JAA> | I'm sure there's warcio data in the WBM, yeah. |
03:15:06 | <TheTechRobo> | Are the old qwarc grabs still in the WBM? |
03:15:16 | <@JAA> | I believe so. |
03:17:50 | <h2ibot> | TheTechRobo edited Qwarc (+248, Add dependency information): https://wiki.archiveteam.org/?diff=54289&oldid=53904 |
03:18:36 | <nicolas17> | optane10 is on fire |
03:27:42 | <nicolas17> | optane10 is consistently returning "max connections -1" on youtube, and "connection refused" on blogger |
03:36:53 | <h2ibot> | PaulWise edited ArchiveBot/Ignore (+30, better facebook/instagram ignore): https://wiki.archiveteam.org/?diff=54290&oldid=54271 |
03:37:12 | <@JAA> | That's been mentioned in the project channels, yes. |
03:39:54 | <h2ibot> | PaulWise edited ArchiveBot/Ignore (+193, add wordpress junk): https://wiki.archiveteam.org/?diff=54291&oldid=54290 |
03:39:55 | <h2ibot> | PaulWise edited ArchiveBot/Ignore (+2, ignore trailing / too): https://wiki.archiveteam.org/?diff=54292&oldid=54291 |
03:40:54 | <h2ibot> | PaulWise edited ArchiveBot/Ignore (+6, pinterest ignore other language subdomains too): https://wiki.archiveteam.org/?diff=54293&oldid=54292 |
03:43:04 | <TheTechRobo> | JAA: I assume in the generate(cls) function, whatever I queue has to be a string? |
03:43:39 | <@JAA> | TheTechRobo: Yes |
03:43:56 | <@JAA> | Also, ensure there are no dupes. |
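The two constraints mentioned here (queued values must be strings, and there must be no duplicates) can be enforced with a small generator wrapper. This is a sketch only: `unique_strings` is a hypothetical helper, not part of qwarc, and how it would be called from a spec file's `generate` is an assumption.

```python
def unique_strings(values):
    """Yield each value as a string, skipping any string already yielded."""
    seen = set()
    for value in values:
        text = str(value)
        if text not in seen:
            seen.add(text)
            yield text


# Mixed ints and strings collapse to unique strings, in first-seen order.
queued = list(unique_strings([1, "1", 2, "a", "a"]))
print(queued)
```

The set keeps every seen value in memory, which is fine for modest item lists but may need a different approach for very large ones.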
03:58:07 | | graham9 quits [Client Quit] |
03:58:40 | <TheTechRobo> | Does qwarc write to stdout? |
04:01:25 | | pixel (pixel) joins |
04:04:18 | | pixel leaves |
04:04:22 | | pixel (pixel) joins |
04:04:31 | <@JAA> | TheTechRobo: Only if your spec file does. |
04:05:03 | <@JAA> | qwarc on its own, no. |
04:08:42 | | Wohlstand quits [Quit: Wohlstand] |
04:10:06 | | etnguyen03 quits [Remote host closed the connection] |
04:11:12 | <@JAA> | (I do sometimes output things on FD 3 or similar for scripting around qwarc.) |
04:13:54 | | ljcool2006_ quits [Quit: Leaving] |
04:33:55 | <TheTechRobo> | AttributeError: type object '_asyncio.Task' has no attribute 'current_task' on Python 3.9 |
04:34:16 | <@JAA> | Welp |
04:35:35 | <@JAA> | Oh yeah, deprecated in 3.7, removed in 3.9. |
04:35:52 | <@JAA> | Again, not used in qwarc, so I bet it's aiohttp. |
04:36:23 | <TheTechRobo> | Yup |
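For reference, the removed classmethod `asyncio.Task.current_task()` has a module-level replacement, `asyncio.current_task()`, added in Python 3.7. The sketch below only demonstrates the standard-library change; it says nothing about how aiohttp itself would need to be patched.

```python
import asyncio
import sys


async def main():
    # Module-level replacement for the removed Task.current_task() classmethod.
    task = asyncio.current_task()
    # The classmethod still exists (deprecated) on 3.7 and 3.8, and is gone from 3.9.
    legacy_available = hasattr(asyncio.Task, "current_task")
    return task is not None, legacy_available


in_task, legacy_available = asyncio.run(main())
print(sys.version_info[:2], "Task.current_task present:", legacy_available)
```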
04:37:25 | <TheTechRobo> | You said pywarc will provide an API for HTTP requests/responses, right? I assume it'll also do weird things to aiohttp? |
04:39:55 | <TheTechRobo> | Er, this might be a stupid question, but is there a way to override qwarc's user agent? You can set one in `headers`, but then you'll have two. |
04:43:07 | <h2ibot> | TheTechRobo edited Qwarc (+52): https://wiki.archiveteam.org/?diff=54294&oldid=54289 |
04:45:03 | <@JAA> | No, pywarc won't use aiohttp. It'll probably be h11 with sync and async wrappers. |
04:46:25 | <@JAA> | Heh, there's been a todo comment in the code since 2019 about header overriding. |
04:47:43 | <TheTechRobo> | I'll take that as a no then. :-) |
04:47:44 | <@JAA> | The default headers are stored in the item's `headers` attribute. You can manipulate that from `__init__`, for example (*after* the `super().__init__` call). |
04:47:56 | <TheTechRobo> | spoke too soon |
04:48:11 | <@JAA> | E.g. `def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs); self.headers = []` |
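The one-liner above clears every default header. To replace only the user agent, the same pattern can filter the list instead. This is a minimal sketch with a stand-in base class: `BaseItem` is hypothetical and is not qwarc's item class, and storing headers as a list of `(name, value)` pairs is an assumption based on the `self.headers = []` example.

```python
class BaseItem:
    """Stand-in for the real item base class; NOT qwarc's implementation."""

    def __init__(self, item_value):
        self.item_value = item_value
        # Assumption: default headers are a list of (name, value) pairs.
        self.headers = [("User-Agent", "default-agent/0.0")]


class CustomAgentItem(BaseItem):
    def __init__(self, *args, **kwargs):
        # Let the base class set its defaults first, then adjust them.
        super().__init__(*args, **kwargs)
        # Drop any default User-Agent so only one is ever sent.
        self.headers = [
            (name, value)
            for name, value in self.headers
            if name.lower() != "user-agent"
        ]
        self.headers.append(("User-Agent", "example-agent/1.0"))


item = CustomAgentItem("example")
print(item.headers)
```

Filtering rather than assigning an empty list keeps any other defaults the base class may set.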
05:02:38 | | beardicus quits [Ping timeout: 260 seconds] |
05:08:52 | <TheTechRobo> | I like how I said I didn't have time to write a spec file, then proceeded to spend two hours writing my first one. |
05:08:56 | <TheTechRobo> | Procrastination is fun. lol |
05:11:28 | | beardicus (beardicus) joins |
05:14:16 | <TheTechRobo> | If it's useful to anyone, this pretty much just takes a list of URL + body data + HTTP verb, and requests it all. No retries, but there's a JSON log with the status code to stdout. https://transfer.archivete.am/inline/103umH/yoink.py |
05:21:41 | | BlueMaxima quits [Read error: Connection reset by peer] |
05:29:51 | <pokechu22> | I don't have the ability to upload WARCs that end up in WBM (though I guess for a warc of POSTs that's not relevant), but the data in question is all of the URLs with # in them in https://transfer.archivete.am/inline/TnzuJ/sleepnomoreauction.com_urls_2.txt and the headers from line 50 of https://transfer.archivete.am/twnvK/auction.io_sleepnomoreauction.com_process_2.py |
05:29:52 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/twnvK/auction.io_sleepnomoreauction.com_process_2.py |
05:34:08 | <TheTechRobo> | !remindme 8h do thing |
05:34:09 | <eggdrop> | [remind] ok, i'll remind you at 2025-01-30T13:34:08Z |
05:38:17 | | Webuser884331 joins |
05:38:49 | <Webuser884331> | question.... Yahoo Briefcase |
05:39:58 | | beardicus quits [Ping timeout: 260 seconds] |
05:40:13 | <@OrIdow6> | Webuser884331: We didn't get it, sorry |
05:40:33 | <@OrIdow6> | "...but the warning time was roughly 60 days, which is long by Yahoo standards but hardly ideal for a service up for nearly a decade" per https://wiki.archiveteam.org/index.php/Yahoo!_Briefcase |
05:43:15 | <Webuser884331> | what about someone's hotmail |
05:43:36 | <Webuser884331> | my mum died |
05:43:54 | <Webuser884331> | i want any photos she saved |
05:46:17 | <@JAA> | I think that was before AT even existed, although only barely. |
06:20:18 | | Webuser884331 quits [Client Quit] |
06:23:20 | <@JAA> | Actually, not quite. AT emerged in January 2009, domain registration on 2009-01-06. I thought it was a bit later that year. |
06:54:17 | | Dango360 quits [Read error: Connection reset by peer] |
07:55:09 | | pabs quits [Read error: Connection reset by peer] |
07:55:37 | | pabs (pabs) joins |
08:30:25 | | SootBector quits [Remote host closed the connection] |
08:30:42 | | SootBector (SootBector) joins |
08:40:08 | | Emitewiki joins |
08:40:24 | <Emitewiki> | Anything we can do about this, or is it outside our purview/already done? https://bsky.app/profile/bobpony.com/post/3lgvxot2kos2j |
08:41:04 | <pabs> | "Microsoft will be removing the downloads for old Windows Themes in the future." |
08:41:10 | <pabs> | https://support.microsoft.com/windows/windows-themes-94880287-6046-1d35-6d2f-35dee759701e |
08:42:17 | <pabs> | looks like it will work in AB |
08:42:51 | <Emitewiki> | Sweet. |
08:45:19 | <pabs> | seems to be working, but some of the themes are already 404, including from a browser |
08:45:46 | <Emitewiki> | 💀 |
08:46:19 | <pabs> | it likely can't save the Windows Store stuff, which is behind a weird link ms-windows-store://collection/?collectionid=WindowsThemes |
08:52:05 | <Emitewiki> | Dang. Any way for us to manually do some shenanigans to save it? |
08:54:19 | <pabs> | reading the page again, that part isn't in danger, just the direct links, which are being saved |
08:55:09 | <pabs> | should be on archive.org in a few days |
08:59:47 | <Emitewiki> | Ah, you're right. Cool cool, thanks |
08:59:48 | <Emitewiki> | ! |
09:31:14 | | Island quits [Read error: Connection reset by peer] |
09:56:35 | | scurvy_duck joins |
10:00:32 | <Emitewiki> | Anyone mind sending this through AB? The dev is starting to delist some of their games from stores, and this usually precedes a website shutdown, so I just want to be extra safe. https://www.catsoulstudios.com/ |
10:03:16 | <that_lurker> | sure |
10:03:47 | <Emitewiki> | Thanks. |
10:03:51 | <that_lurker> | is there a news article about that somewhere? |
10:04:25 | <Emitewiki> | It's an announcement on their Steam game notices. |
10:04:34 | <Emitewiki> | So, like, within the Steam interface itself. |
10:04:43 | <that_lurker> | aa ok |
10:06:46 | | Emitewiki quits [Client Quit] |
11:30:15 | | Webuser220882 joins |
11:30:26 | | Webuser220882 quits [Client Quit] |
11:34:13 | | PotatoProton01 joins |
11:51:00 | | PotatoProton01 quits [Client Quit] |
12:00:03 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:49 | | Bleo18260072271962345 joins |
12:13:29 | | lennier2_ joins |
12:16:03 | | lennier2 quits [Ping timeout: 260 seconds] |
12:16:46 | | icedice (icedice) joins |
12:24:59 | | pie_ (pie_) joins |
12:28:33 | | pie_ quits [Client Quit] |
12:30:14 | | pie_ (pie_) joins |
12:30:21 | | pie_ quits [Client Quit] |
12:30:33 | | pie_ (pie_) joins |
12:35:20 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:35:36 | | etnguyen03 (etnguyen03) joins |
12:35:52 | | SkilledAlpaca418962 joins |
13:07:02 | | beardicus (beardicus) joins |
13:17:35 | | iram quits [Quit: ~] |
13:18:05 | | iram joins |
13:26:24 | | Naruyoko5 joins |
13:27:36 | | Naruyoko quits [Ping timeout: 250 seconds] |
13:27:48 | | scurvy_duck quits [Ping timeout: 260 seconds] |
13:34:08 | <eggdrop> | [remind] TheTechRobo: do thing |
14:04:38 | | Wohlstand (Wohlstand) joins |
14:06:49 | <TheTechRobo> | pokechu22: So an example request might be a POST to https://auctionsoftware.net/mobileapi/fetchLocation with the body {"countryCode": 62} ? |
14:07:20 | <TheTechRobo> | Is there any link extraction needed or do they just have to be grabbed? |
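The example request described above can be written out as a request object to make its shape concrete. This sketch uses only the Python standard library, builds the request without sending it, and does not write a WARC. The `Origin` value is an assumption (the log only says an origin header is required), and the `Content-Type` header is likewise assumed.

```python
import json
import urllib.request

URL = "https://auctionsoftware.net/mobileapi/fetchLocation"
# Assumption: the log says an Origin header is required but not its value.
ORIGIN = "https://sleepnomoreauction.com"

body = json.dumps({"countryCode": 62}).encode("utf-8")

# Build (but do not send) the POST request.
req = urllib.request.Request(
    URL,
    data=body,
    headers={"Origin": ORIGIN, "Content-Type": "application/json"},
    method="POST",
)

print(req.get_method(), req.full_url)
print(req.data)
```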
14:10:42 | | katocala joins |
14:10:59 | | katocala is now authenticated as katocala |
14:25:20 | | hexa- quits [Quit: WeeChat 4.4.3] |
14:26:27 | | hexa- (hexa-) joins |
14:36:50 | | BornOn420 quits [Remote host closed the connection] |
14:37:23 | | BornOn420 (BornOn420) joins |