00:00:45Matthww quits [Client Quit]
00:03:21Matthww joins
00:03:56Matthww quits [Remote host closed the connection]
00:13:21etnguyen03 (etnguyen03) joins
00:18:03loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
00:26:53etnguyen03 quits [Client Quit]
00:38:10HP_Archivist quits [Quit: Leaving]
00:43:50beardicus (beardicus) joins
01:01:33etnguyen03 (etnguyen03) joins
01:15:08beardicus quits [Ping timeout: 260 seconds]
01:23:19beardicus (beardicus) joins
01:33:19hackbug quits [Remote host closed the connection]
01:34:12hackbug (hackbug) joins
01:48:39nine quits [Quit: See ya!]
01:48:52nine joins
01:48:52nine quits [Changing host]
01:48:52nine (nine) joins
02:07:03beardicus quits [Ping timeout: 260 seconds]
02:22:56beardicus (beardicus) joins
02:30:48sec^nd quits [Remote host closed the connection]
02:30:52SootBector quits [Remote host closed the connection]
02:30:53HP_Archivist (HP_Archivist) joins
02:31:08sec^nd (second) joins
02:31:11SootBector (SootBector) joins
02:31:59StarletCharlotte quits [Read error: Connection reset by peer]
02:33:01<pokechu22>Do we have any good way of recording a bunch of POST requests with known data and URLs into a WARC? I generated a list of those for https://sleepnomoreauction.com/ yesterday, but I'm only able to save the images with archivebot since everything else is POST (note that they also require an origin header, and possibly some other headers)
02:42:08etnguyen03 quits [Client Quit]
02:49:44<TheTechRobo>Should be theoretically easy with qwarc from my previous research, but /me doesn't currently have time to write a spec file
02:50:43pabs quits [Read error: Connection reset by peer]
02:52:05<@OrIdow6>Could also be done with wget-at
02:52:17graham9 joins
02:52:25pabs (pabs) joins
02:53:06etnguyen03 (etnguyen03) joins
02:58:10<@JAA>Yeah, easy with qwarc.
02:59:18<pabs>-feed/#hackernews- Society for Technical Communication to permanently close its doors https://www.stc.org/ https://news.ycombinator.com/item?id=42867324
03:01:47<h2ibot>OrIdow6 edited Niconico (+432, /* Nico Nico Seiga */ Nico Nico Shunga has done…): https://wiki.archiveteam.org/?diff=54287&oldid=54264
03:03:16<TheTechRobo>AttributeError: module 'asyncio' has no attribute 'coroutine'. Did you mean: 'coroutines'?
03:03:16<TheTechRobo>What version of Python should I use for qwarc?
03:03:43<TheTechRobo>(I'm on the latest commit in the 0.2 branch)
03:04:38<@JAA>Hmm yeah, that was removed in 3.11.
03:04:58<@JAA>FWIW, that isn't used in qwarc's code, so it'll be from a dependency.
03:05:06<@JAA>Probably the ancient aiohttp.
03:05:09<TheTechRobo>Ah, yeah, aiohttp
03:06:26<@JAA>The aiohttp code is ugly because it doesn't expose the raw HTTP traffic, so it hard-depends on that ancient version.
03:06:56<@JAA>I usually run my things under 3.6, but I know 3.8 works fine. Not sure about newer ones.
03:06:56<TheTechRobo>On 3.9 I get
03:06:56<TheTechRobo> class CeilTimeout(Timeout):
03:06:57<TheTechRobo>TypeError: function() argument 'code' must be code, not str
03:07:01<TheTechRobo>Also in aiohttp.
03:07:12<@JAA>Not in async-timeout?
03:07:25<TheTechRobo>/home/thetechrobo/qwarc/venv/lib/python3.9/site-packages/aiohttp/helpers.py
03:07:36<@JAA>Hmm yeah, I guess that's where the error happens.
03:07:40<@JAA>You need async-timeout==3.0.1.
03:07:49<h2ibot>OrIdow6 edited Web Roasting (+283, Explain what it is a bit): https://wiki.archiveteam.org/?diff=54288&oldid=30443
03:08:09<@JAA>https://github.com/aio-libs/aiohttp/issues/6320
03:08:11pie_ quits []
03:09:43<nicolas17>let's build our own library
03:09:54<nicolas17>with h11, blackjack and hookers
03:10:02<@JAA>That's the plan, yes.
03:10:52<TheTechRobo>Can I make pyenv rebuild the sqlite part of python without removing and reinstalling the entire version? Turns out I didn't have the sqlite headers installed when I installed 3.9.
03:11:36<@JAA>This was never intended to be long-lived. Remember that qwarc in its current form is basically the code I wrote for one specific project years ago, repackaged into something somewhat reusable.
03:11:56<@JAA>TheTechRobo: As far as I know, no.
03:12:40<@JAA>qwarc also used to use warcio. I ripped that out in record time when I discovered its intentional data mangling.
03:12:53<@JAA>So now it's bespoke custom WARC-writing code.
03:12:54<TheTechRobo>I have wondered, are WARCs made by warcio still in the WBM?
03:13:06<TheTechRobo>Not just for qwarc, but also for other things
03:13:34<@JAA>Replacing that is at the top of my qwarc todo list, hence pywarc.
03:13:54<@JAA>I'm sure there's warcio data in the WBM, yeah.
03:15:06<TheTechRobo>Are the old qwarc grabs still in the WBM?
03:15:16<@JAA>I believe so.
03:17:50<h2ibot>TheTechRobo edited Qwarc (+248, Add dependency information): https://wiki.archiveteam.org/?diff=54289&oldid=53904
03:18:36<nicolas17>optane10 is on fire
03:27:42<nicolas17>optane10 is consistently returning "max connections -1" on youtube, and "connection refused" on blogger
03:36:53<h2ibot>PaulWise edited ArchiveBot/Ignore (+30, better facebook/instagram ignore): https://wiki.archiveteam.org/?diff=54290&oldid=54271
03:37:12<@JAA>That's been mentioned in the project channels, yes.
03:39:54<h2ibot>PaulWise edited ArchiveBot/Ignore (+193, add wordpress junk): https://wiki.archiveteam.org/?diff=54291&oldid=54290
03:39:55<h2ibot>PaulWise edited ArchiveBot/Ignore (+2, ignore trailing / too): https://wiki.archiveteam.org/?diff=54292&oldid=54291
03:40:54<h2ibot>PaulWise edited ArchiveBot/Ignore (+6, pinterest ignore other language subdomains too): https://wiki.archiveteam.org/?diff=54293&oldid=54292
03:43:04<TheTechRobo>JAA: I assume in the generate(cls) function, whatever I queue has to be a string?
03:43:39<@JAA>TheTechRobo: Yes
03:43:56<@JAA>Also, ensure there are no dupes.
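A minimal sketch of preparing values for queueing under the two constraints mentioned above (strings only, no duplicates). The helper name `unique_strings` is hypothetical and not part of qwarc; a spec file's `generate` could yield from something like it.

```python
def unique_strings(values):
    """Yield each value as a string, skipping duplicates, in first-seen order."""
    seen = set()
    for value in values:
        text = str(value)
        if text not in seen:
            seen.add(text)
            yield text


# The integer 1 and the string "2" collapse with their later repeats.
queued = list(unique_strings([1, "2", 2, "a", 1, "a"]))
```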
03:58:07graham9 quits [Client Quit]
03:58:40<TheTechRobo>Does qwarc write to stdout?
04:01:25pixel (pixel) joins
04:04:18pixel leaves
04:04:22pixel (pixel) joins
04:04:31<@JAA>TheTechRobo: Only if your spec file does.
04:05:03<@JAA>qwarc on its own, no.
04:08:42Wohlstand quits [Quit: Wohlstand]
04:10:06etnguyen03 quits [Remote host closed the connection]
04:11:12<@JAA>(I do sometimes output things on FD 3 or similar for scripting around qwarc.)
04:13:54ljcool2006_ quits [Quit: Leaving]
04:33:55<TheTechRobo>AttributeError: type object '_asyncio.Task' has no attribute 'current_task' on Python 3.9
04:34:16<@JAA>Welp
04:35:35<@JAA>Oh yeah, deprecated in 3.7, removed in 3.9.
04:35:52<@JAA>Again, not used in qwarc, so I bet it's aiohttp.
04:36:23<TheTechRobo>Yup
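For reference, a generic asyncio sketch (not a patch for aiohttp or qwarc) of the module-level function that replaces the removed `Task.current_task()` classmethod on Python 3.7 and later:

```python
import asyncio


async def main():
    # asyncio.current_task() replaces the removed Task.current_task() classmethod.
    task = asyncio.current_task()
    return task is not None


has_task = asyncio.run(main())
```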
04:37:25<TheTechRobo>You said pywarc will provide an API for HTTP requests/responses, right? I assume it'll also do weird things to aiohttp?
04:39:55<TheTechRobo>Er, this might be a stupid question, but is there a way to override qwarc's user agent? You can set one in `headers`, but then you'll have two.
04:43:07<h2ibot>TheTechRobo edited Qwarc (+52): https://wiki.archiveteam.org/?diff=54294&oldid=54289
04:45:03<@JAA>No, pywarc won't use aiohttp. It'll probably be h11 with sync and async wrappers.
04:46:25<@JAA>Heh, there's been a todo comment in the code since 2019 about header overriding.
04:47:43<TheTechRobo>I'll take that as a no then. :-)
04:47:44<@JAA>The default headers are stored in the item's `headers` attribute. You can manipulate that from `__init__`, for example (*after* the `super().__init__` call).
04:47:56<TheTechRobo>spoke too soon
04:48:11<@JAA>E.g. `def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs); self.headers = []`
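A minimal sketch of the pattern described above, using a hypothetical stand-in base class (`BaseItem`) rather than qwarc's real item class, whose API is not shown in this log. The shape of the default headers and both User-Agent values are assumptions for illustration.

```python
class BaseItem:
    """Hypothetical stand-in for the item base class; not qwarc's actual API."""

    def __init__(self, item_value):
        self.item_value = item_value
        # Assumed shape: a list of (name, value) pairs holding the defaults.
        self.headers = [("User-Agent", "default-agent/0.0")]


class MyItem(BaseItem):
    def __init__(self, *args, **kwargs):
        # Let the base class set up its defaults first...
        super().__init__(*args, **kwargs)
        # ...then replace them, so only one User-Agent remains.
        self.headers = [("User-Agent", "my-custom-agent/1.0")]


item = MyItem("example")
user_agents = [value for name, value in item.headers if name.lower() == "user-agent"]
```

Assigning after the `super().__init__` call matters: done before it, the base class would overwrite the custom list with its defaults.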
05:02:38beardicus quits [Ping timeout: 260 seconds]
05:08:52<TheTechRobo>I like how I said I didn't have time to write a spec file, then proceeded to spend two hours writing my first one.
05:08:56<TheTechRobo>Procrastination is fun. lol
05:11:28beardicus (beardicus) joins
05:14:16<TheTechRobo>If it's useful to anyone, this pretty much just takes a list of URL + body data + HTTP verb, and requests it all. No retries, but there's a JSON log with the status code to stdout. https://transfer.archivete.am/inline/103umH/yoink.py
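A rough sketch of the input handling and logging described above, assuming a hypothetical list of (verb, URL, body) entries and an assumed `Origin` header. It is not the linked script: it only builds request objects with the standard library and formats a log line, and it neither sends anything nor writes a WARC.

```python
import json
import urllib.request

# Hypothetical input: one (verb, URL, body) entry per request.
ENTRIES = [
    ("POST", "https://example.com/api/fetch", b'{"id": 1}'),
    ("GET", "https://example.com/page", None),
]

# Assumed extra headers; the real site may require different ones.
EXTRA_HEADERS = {"Origin": "https://example.com"}


def build_request(verb, url, body):
    """Build a request object for one entry without sending it."""
    return urllib.request.Request(url, data=body, headers=EXTRA_HEADERS, method=verb)


def log_line(verb, url, status):
    """Format one JSON log line suitable for printing to stdout."""
    return json.dumps({"verb": verb, "url": url, "status": status})


requests = [build_request(*entry) for entry in ENTRIES]
```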
05:21:41BlueMaxima quits [Read error: Connection reset by peer]
05:29:51<pokechu22>I don't have the ability to upload WARCs that end up in WBM (though I guess for a WARC of POSTs that's not relevant), but the data in question is all of the URLs with # in them in https://transfer.archivete.am/inline/TnzuJ/sleepnomoreauction.com_urls_2.txt and the headers from line 50 of https://transfer.archivete.am/twnvK/auction.io_sleepnomoreauction.com_process_2.py
05:29:52<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/twnvK/auction.io_sleepnomoreauction.com_process_2.py
05:34:08<TheTechRobo>!remindme 8h do thing
05:34:09<eggdrop>[remind] ok, i'll remind you at 2025-01-30T13:34:08Z
05:38:17Webuser884331 joins
05:38:49<Webuser884331>question.... yahoo briefcase
05:39:58beardicus quits [Ping timeout: 260 seconds]
05:40:13<@OrIdow6>Webuser884331: We didn't get it, sorry
05:40:33<@OrIdow6>"...but the warning time was roughly 60 days, which is long by Yahoo standards but hardly ideal for a service up for nearly a decade" per https://wiki.archiveteam.org/index.php/Yahoo!_Briefcase
05:43:15<Webuser884331>what about someones hotmail
05:43:36<Webuser884331>my mum died
05:43:54<Webuser884331>i want any photos she saved
05:46:17<@JAA>I think that was before AT even existed, although only barely.
06:20:18Webuser884331 quits [Client Quit]
06:23:20<@JAA>Actually, not quite. AT emerged in January 2009, domain registration on 2009-01-06. I thought it was a bit later that year.
06:54:17Dango360 quits [Read error: Connection reset by peer]
07:55:09pabs quits [Read error: Connection reset by peer]
07:55:37pabs (pabs) joins
08:30:25SootBector quits [Remote host closed the connection]
08:30:42SootBector (SootBector) joins
08:40:08Emitewiki joins
08:40:24<Emitewiki>Anything we can do about this, or is it outside our purview/already done? https://bsky.app/profile/bobpony.com/post/3lgvxot2kos2j
08:41:04<pabs>"Microsoft will be removing the downloads for old Windows Themes in the future."
08:41:10<pabs>https://support.microsoft.com/windows/windows-themes-94880287-6046-1d35-6d2f-35dee759701e
08:42:17<pabs>looks like it will work in AB
08:42:51<Emitewiki>Sweet.
08:45:19<pabs>seems to be working, but some of the themes are already 404, including from a browser
08:45:46<Emitewiki>💀
08:46:19<pabs>it likely can't save the Windows Store stuff, which is behind a weird link ms-windows-store://collection/?collectionid=WindowsThemes
08:52:05<Emitewiki>Dang. Any way for us to manually do some shenanigans to save it?
08:54:19<pabs>reading the page again, that part isn't in danger, just the direct links, which are being saved
08:55:09<pabs>should be on archive.org in a few days
08:59:47<Emitewiki>Ah, you're right. Cool cool, thanks
08:59:48<Emitewiki>!
09:31:14Island quits [Read error: Connection reset by peer]
09:56:35scurvy_duck joins
10:00:32<Emitewiki>Anyone mind sending this through AB? The dev is starting to delist some of their games from stores, and this usually precedes a website shutdown, so I just want to be extra safe. https://www.catsoulstudios.com/
10:03:16<that_lurker>sure
10:03:47<Emitewiki>Thanks.
10:03:51<that_lurker>is there a news article about that somewhere?
10:04:25<Emitewiki>It's an announcement on their Steam game notices.
10:04:34<Emitewiki>So, like, within the Steam interface itself.
10:04:43<that_lurker>aa ok
10:06:46Emitewiki quits [Client Quit]
11:30:15Webuser220882 joins
11:30:26Webuser220882 quits [Client Quit]
11:34:13PotatoProton01 joins
11:51:00PotatoProton01 quits [Client Quit]
12:00:03Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
12:02:49Bleo18260072271962345 joins
12:13:29lennier2_ joins
12:16:03lennier2 quits [Ping timeout: 260 seconds]
12:16:46icedice (icedice) joins
12:24:59pie_ (pie_) joins
12:28:33pie_ quits [Client Quit]
12:30:14pie_ (pie_) joins
12:30:21pie_ quits [Client Quit]
12:30:33pie_ (pie_) joins
12:35:20SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:35:36etnguyen03 (etnguyen03) joins
12:35:52SkilledAlpaca418962 joins
13:07:02beardicus (beardicus) joins
13:17:35iram quits [Quit: ~]
13:18:05iram joins
13:26:24Naruyoko5 joins
13:27:36Naruyoko quits [Ping timeout: 250 seconds]
13:27:48scurvy_duck quits [Ping timeout: 260 seconds]
13:34:08<eggdrop>[remind] TheTechRobo: do thing
14:04:38Wohlstand (Wohlstand) joins
14:06:49<TheTechRobo>pokechu22: So an example request might be a POST to https://auctionsoftware.net/mobileapi/fetchLocation with the body {"countryCode": 62} ?
14:07:20<TheTechRobo>Is there any link extraction needed or do they just have to be grabbed?
14:10:42katocala joins
14:25:20hexa- quits [Quit: WeeChat 4.4.3]
14:26:27hexa- (hexa-) joins
14:36:50BornOn420 quits [Remote host closed the connection]
14:37:23BornOn420 (BornOn420) joins
14:55:21<anarcat>is this on someone's radar? https://www.reddit.com/r/DataHoarder/comments/1idm9ii/datagov_is_currently_being_scrubbed/
14:55:49<anarcat>i'm getting kind of exhausted at the "okay, this fascist government is in, and they're going to destroy the entire digital infrastructure of country foo, let's crawl"
14:59:23<kiska>Some 2.2k datasets have been removed, what they are, I don't know
15:00:06kansei quits [Quit: ZNC 1.9.1 - https://znc.in]
15:01:42kansei (kansei) joins
15:34:15holbrooke joins
15:35:46riteo (riteo) joins
15:46:34earl joins
16:17:04Wohlstand quits [Remote host closed the connection]
16:17:36Wohlstand (Wohlstand) joins
16:23:58katocala quits [Ping timeout: 260 seconds]
16:24:12katocala joins
16:25:30midou quits [Remote host closed the connection]
16:25:39midou joins
16:39:56loug8318142 joins
16:40:49Wohlstand quits [Client Quit]
16:53:53SootBector quits [Remote host closed the connection]
16:54:17SootBector (SootBector) joins
17:22:54katocala quits [Ping timeout: 250 seconds]
17:23:08katocala joins
17:26:30Wohlstand (Wohlstand) joins
17:33:18holbrooke quits [Client Quit]
17:49:07<pokechu22>TheTechRobo: I've already done all of the link extraction (that's what the other lines in the file are); they just need to be grabbed. (But also the additional headers are needed; you should get a JSON response, not an HTML response)
18:09:05scurvy_duck joins
18:13:59etnguyen03 quits [Quit: Konversation terminated!]
18:24:09<TheTechRobo>pokechu22: Any rate limiting?
18:24:27<pokechu22>I didn't run into any with my script
18:25:09<pokechu22>(but the script was at concurrency 1 effectively)
18:28:55etnguyen03 (etnguyen03) joins
18:32:10holbrooke joins
18:45:14holbrooke quits [Ping timeout: 250 seconds]
18:52:43icedice quits [Quit: Leaving]
18:56:52<@JAA>From #hackint: 14:02:38 < i> They say that https://data.gov/ is getting deleted as we speak, losing 1000 datasets a day.
19:07:46notarobot1 quits [Ping timeout: 250 seconds]
19:08:40<TheTechRobo>pokechu22: I think I got all of them downloaded to WARC. I don't have WBM permission either, but as you said, it's all POST, so kind of a moot point.
19:10:05<pokechu22>Thanks. It's probably worth doing it a second time near when the auctions finish (which I thought was tomorrow, but it looks like they extended it to Feb 2? or maybe I just confused myself)
19:10:22<pokechu22>!remindme 3d https://sleepnomoreauction.com/ auctions close shortly cc TheTechRobo
19:10:23<eggdrop>[remind] ok, i'll remind you at 2025-02-02T19:10:22Z
19:13:17<TheTechRobo>pokechu22: Will the URLs be the same the second time around?
19:13:35<TheTechRobo>+ POST data
19:13:56<pokechu22>They should. I'll re-run my script and make sure but unless they list new items (which seems unlikely given that they're closing) it shouldn't change
19:14:38<TheTechRobo>Ack
19:16:16<TheTechRobo>Waiting for book_op.php to decide I'm not a serial killer...
19:20:29beardicus quits [Remote host closed the connection]
19:20:30<TheTechRobo>Up at https://archive.org/details/warc-sleepnomoreauction-post-urls
19:26:22scurvy_duck quits [Client Quit]
19:26:22moth_ quits [Read error: Connection reset by peer]
19:26:49moth_ joins
21:03:16cascode joins
21:04:45earl quits []
21:09:12moth_ quits [Read error: Connection reset by peer]
21:13:24etnguyen03 quits [Client Quit]
22:03:31balrog_ is now known as balrog
22:07:30<h2ibot>TheTechRobo edited Qwarc (+1780, Add some basic documentation): https://wiki.archiveteam.org/?diff=54295&oldid=54294
22:07:31loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
22:07:56balrog quits [Quit: Bye]
22:08:14balrog (balrog) joins