00:52:49lunik11 quits [Quit: :x]
00:53:24lunik11 joins
01:05:07Exorcism quits [Quit: Ping timeout (120 seconds)]
01:05:07DigitalDragons quits [Quit: Ping timeout (120 seconds)]
01:05:36lukash98 quits [Quit: The Lounge - https://thelounge.chat]
01:05:39Exorcism (exorcism) joins
01:05:39DigitalDragons (DigitalDragons) joins
01:26:09<eggdrop>[remind] thuban: reverse-engineer https://buscadorresoluciones.ift.org.mx/
01:38:15s-crypt is now known as help
01:38:22help is now known as s-crypt
01:56:01hexa- quits [Quit: WeeChat 4.4.3]
01:57:37hexa- (hexa-) joins
02:22:29datechnoman (datechnoman) joins
02:43:42lukash98 joins
02:46:34lukash98 quits [Client Quit]
02:54:13lukash98 joins
03:01:29linuxgemini (linuxgemini) joins
03:16:18lflare quits [Quit: Ping timeout (120 seconds)]
03:16:40lflare (lflare) joins
03:31:43BlueMaxima quits [Read error: Connection reset by peer]
03:32:23Exorcism quits [Client Quit]
03:32:23DigitalDragons quits [Client Quit]
03:33:08Exorcism (exorcism) joins
03:33:19DigitalDragons (DigitalDragons) joins
03:34:23<pabs>hmm, I noticed that the AB viewer isn't listing archived URLs properly for recent jobs https://archive.fart.website/archivebot/viewer/domain/localwiki.org
03:34:59<pabs>202411250039529wri7 should not be blank, but https://localwiki.org/Users/MarshallBrain
03:38:43<@JAA>chfoo: ^
03:50:35AK quits [Ping timeout: 260 seconds]
03:51:06etnguyen03 quits [Remote host closed the connection]
04:01:27th3z0l4_ joins
04:03:25th3z0l4 quits [Ping timeout: 260 seconds]
04:11:13StarletCharlotte joins
04:25:00riteo quits [Ping timeout: 260 seconds]
04:49:04Wohlstand (Wohlstand) joins
05:08:11Exorcism quits [Client Quit]
05:08:11DigitalDragons quits [Client Quit]
05:09:31DigitalDragons (DigitalDragons) joins
05:09:41Exorcism (exorcism) joins
05:13:20Fusl (Fusl) joins
05:13:20@ChanServ sets mode: +o Fusl
06:01:45Island quits [Read error: Connection reset by peer]
06:02:04Island joins
06:41:30<thuban>ugh, i'm fucking stumped. buscadorresoluciones.ift.org.mx lets you search by date, so it should be possible to find every document, but the pdf downloads are some JavaServer Faces nightmare and i can't find a source url.
06:42:10maakuth leaves
06:42:29<thuban>i got out wireshark and captured the download process, but i can't so much as find the name of the file in the capture _even though the browser names it correctly_. everything from the server after the download request is just tls "Application Data"--it decrypts, but it looks like garbage and wireshark can't or won't parse it.
06:42:36<thuban>anyone got any bright ideas?
06:49:16<pokechu22>Is it https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition ?
06:49:40<thuban>s/it decrypts.*/_some traffic from the server decrypts_ but this doesn't/
06:49:51<thuban>i mean, i assume so, but i can't actually read the data.
06:50:32<pokechu22>Yeah, it is content-disposition
06:50:57<that_lurker>on chrome you could maybe get something out of chrome://net-export
06:51:23<pokechu22>(I got it to work by making a search in firefox, then alt -> file -> work offline, then clicking one of the results, which will open a tab saying you're offline, then I can open F12, disable offline mode, refresh, confirm that I'm fine with a duplicate POST, and it shows up in devtools)
06:52:51<@JAA>Ah, neat trick, thanks!
06:52:53riteo (riteo) joins
06:53:08<@JAA>And yeah, what a messy site.
06:53:38<@JAA>The giant 'Número de Resolución' <option> list looks fun, too.
06:56:29Webuser408847 joins
06:56:47Webuser408847 quits [Client Quit]
06:57:32Webuser404844 joins
06:57:48Webuser404844 quits [Client Quit]
06:58:11<thuban>so is there a url for the file?
06:58:49<pokechu22>No, it directly returns the file in the POST body
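
A minimal Python sketch of the point above: the file arrives in the POST response body, and its name is carried by the Content-Disposition header, so the filename can be read from that header value. The helper name and the sample header values are illustrative assumptions, not data captured from the site.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition header, if any."""
    # The standard library's MIME header parsing handles quoting and
    # RFC 2231 encoded parameters, so no hand-written regex is needed.
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


if __name__ == "__main__":
    # Hypothetical header value, for illustration only.
    print(filename_from_content_disposition('attachment; filename="example.pdf"'))
```

When saving the response body of such a POST, this value is what the browser uses to name the download.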
07:00:42<thuban>can you dump the post request? (the work offline thing is not working for me--i get a couple of posts but they're the wrong ones)
07:01:43<thuban>i wonder why wireshark wasn't decrypting properly :/
07:03:41<pokechu22>You need to open devtools on the tab for the PDF (that has the offline error)
07:04:55<thuban>oi_c_
07:05:11<thuban>thank you
07:05:14<thuban>"application/docx" lmao
07:07:36<thuban>hmmm, the payload seems to be missing
07:08:18<that_lurker>oh firefox also has logging at about:logging
07:09:30Unholy2361924645377131 quits [Ping timeout: 260 seconds]
07:09:35<thuban>ah, it's copyable, panel just doesn't show for some reason
07:13:06Exorcism quits [Client Quit]
07:13:06DigitalDragons quits [Client Quit]
07:14:33DigitalDragons (DigitalDragons) joins
07:14:33Exorcism (exorcism) joins
07:15:18<thuban>well, this whole thing is gross as hell. the downloads are dependent on the ViewState, so there's nothing we could meaningfully feed to archivebot anyway even if it did posts
07:16:47<thuban>i'm tempted to grab all the pdfs and just dump them into an ia item except that that still sounds like a huge pain in the ass. maybe with selenium
07:30:47<steering>thuban: looks like the ViewState encodes the search query and the formResults:doctosTable:<#>:j_idt87:0:j_idt88 determines the result number
07:35:41<steering>it doesn't look like it likes to do a real search unless you give it an existing ViewState with it though
07:41:55<thuban>steering: yeah, and the ViewState isn't replicable / expires / changes with pagination
07:44:23<steering>yeah, looks like you'd have to make a get to grab a ViewState, and then a post to transmute it into a search, and then a post to grab the doc
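
A rough Python sketch of the first step in that GET, POST, POST sequence: pulling the ViewState out of a fetched page. It assumes the site uses the standard JSF hidden field name `javax.faces.ViewState`; the sample HTML and the function name are made up for illustration. The two follow-up POSTs are not shown, since their full form fields are not given in this log.

```python
from html.parser import HTMLParser
from typing import Optional


class ViewStateParser(HTMLParser):
    """Collects the value of the first javax.faces.ViewState hidden input."""

    def __init__(self) -> None:
        super().__init__()
        self.view_state: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are also routed here by HTMLParser.
        if tag != "input" or self.view_state is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "javax.faces.ViewState":
            self.view_state = attributes.get("value")


def extract_view_state(html: str) -> Optional[str]:
    """Return the ViewState token found in the HTML, or None if absent."""
    parser = ViewStateParser()
    parser.feed(html)
    return parser.view_state


if __name__ == "__main__":
    # Hypothetical page fragment, not copied from the real site.
    sample = '<form><input type="hidden" name="javax.faces.ViewState" value="abc123:-456" /></form>'
    print(extract_view_state(sample))
```

Each POST in the chain would send the most recent token back and read a fresh one from the response, because the value changes between steps.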
07:44:41<steering>everyone loves javascript!
07:50:39<steering>you can allow popups and uhh... something like this :D document.getElementById('formResults:doctosTable_data').querySelectorAll('a').forEach((e) => e.click());
07:51:02<steering>(hooray for broken element IDs that can't be referenced normally!)
08:04:54pixel (pixel) joins
08:10:10qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds]
08:13:13Island quits [Read error: Connection reset by peer]
08:13:31Island joins
08:42:24qwertyasdfuiopghjkl2 (qwertyasdfuiopghjkl2) joins
09:14:28<@JAA>Sometime in July, openssl.org was redesigned and content split up across several domains. Although they've taken some care to set up redirects, something as simple as a blog post from 2021 is broken: https://www.openssl.org/blog/blog/2021/09/07/OpenSSL3.Final/ (it's here: https://openssl-library.org/post/2021-09-06-openssl3.final/ )
09:20:18<that_lurker>oof
09:20:48<@JAA>They also migrated their mailing lists to Google Groups. I wonder what else got wrecked.
09:21:40<@JAA>I'll go through that soon to figure out everything that needs to be archived.
09:28:22<szczot3k>https://crt.sh/?q=openssl.org I wonder if any of those domains point to the old website
09:32:03<@arkiver>urgh :/
09:37:45AK (AK) joins
09:42:29<StarletCharlotte>So I found this site named dnbshare that has a lot of drum & bass sets that can't be found anywhere else online, and regularly gets new uploads every day. Now, checking the latest section is fairly easy, but looking for older downloads is a lot harder because you can only rely on the search bar, so you have to try a bunch of shit in the search bar to try to scrape older files.
09:43:00<StarletCharlotte>And the search results only iterate two pages down, presumably to avoid overloading limited resources or something.
09:43:01<StarletCharlotte>https://dnbshare.com/download
09:43:09<StarletCharlotte>It makes archiving these rare sets very cumbersome though.
09:43:39<StarletCharlotte>I was told to ask about it here from #//. What's the best way to proceed?
09:46:58<that_lurker>nicolas17: AB jobs started for https://www.boincstats.com with /stats being ignored
10:02:53Island quits [Read error: Connection reset by peer]
10:07:25Wohlstand quits [Ping timeout: 260 seconds]
10:30:18le0n quits [Quit: see you later, alligator]
10:40:43le0n (le0n) joins
10:40:43yasomi quits [Quit: ZNC 1.9.1 - https://znc.in]
10:41:08yasomi (yasomi) joins
10:49:55Wohlstand (Wohlstand) joins
10:52:00le0n_ (le0n) joins
10:53:30le0n quits [Ping timeout: 260 seconds]
11:01:46<h2ibot>Manu uploaded File:Hedgedoc banner color horizontal.png (HedgeDoc logo / banner): https://wiki.archiveteam.org/?title=File%3AHedgedoc%20banner%20color%20horizontal.png
11:03:57BornOn420_ (BornOn420) joins
11:06:12sralracer (sralracer) joins
11:07:44BornOn420 quits [Ping timeout: 276 seconds]
11:12:48<h2ibot>Manu created HedgeDoc (+826, Creating HedgeDoc page despite many open TODOs): https://wiki.archiveteam.org/?title=HedgeDoc
11:14:49<h2ibot>Manu uploaded File:Hedgedoc banner color horizontal.png (Replaced it with transparent background version): https://wiki.archiveteam.org/?title=File%3AHedgedoc%20banner%20color%20horizontal.png
11:15:45mls quits [Quit: leaving]
11:28:14loug8318142 joins
12:00:01Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat]
12:02:45Bleo182600722719623 joins
12:32:05Wohlstand quits [Ping timeout: 260 seconds]
12:33:49BennyOtt quits [Quit: ZNC 1.9.1 - https://znc.in]
12:38:08BennyOtt (BennyOtt) joins
12:39:54BornOn420_ quits [Remote host closed the connection]
12:40:30BornOn420 (BornOn420) joins
12:40:45SkilledAlpaca41896 quits [Quit: SkilledAlpaca41896]
12:42:51SkilledAlpaca418962 joins
12:49:34etnguyen03 (etnguyen03) joins
13:20:30<@JAA>TIL 'National Records of Scotland Web Archive': https://webarchive.nrscotland.gov.uk/
13:27:29<pabs>thuban: I do this: go into devtools, disable cache, load the page, then save all requests as .har, open that in your text editor (or jless) and search for what you are looking for, then work backwards
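
The "save as .har and search it" workflow can also be scripted. A minimal Python sketch, assuming the usual HAR layout (a top-level `log` object with an `entries` list); the function name and the sample data are illustrative assumptions.

```python
import json
from typing import Any, Dict, Iterator


def find_entries(har: Dict[str, Any], needle: str) -> Iterator[Dict[str, Any]]:
    """Yield HAR entries whose serialized JSON contains needle (case-insensitive)."""
    needle = needle.lower()
    for entry in har.get("log", {}).get("entries", []):
        # Serializing the whole entry searches the URL, headers, and body text at once.
        # ensure_ascii=False keeps non-ASCII text searchable as written.
        if needle in json.dumps(entry, ensure_ascii=False).lower():
            yield entry


if __name__ == "__main__":
    # Hypothetical capture, for illustration only.
    har = {
        "log": {
            "entries": [
                {"request": {"method": "GET", "url": "https://example.org/a"},
                 "response": {"headers": []}},
                {"request": {"method": "POST", "url": "https://example.org/b"},
                 "response": {"headers": [
                     {"name": "Content-Disposition",
                      "value": 'attachment; filename="x.pdf"'}]}},
            ]
        }
    }
    for hit in find_entries(har, "content-disposition"):
        print(hit["request"]["method"], hit["request"]["url"])
```

From a matching entry, working backwards means looking at the requests recorded just before it to see where its parameters came from.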
13:28:39<pabs>JAA: ISTR I did some openssl archiving in the last months when the foundation was created
13:34:42<@JAA>pabs: That site opens the download in a forced new tab, so regular dev tools don't track the download request. Global dev tools used to be a thing, but I think that was removed a long while ago, or I couldn't find them anymore at least. Maybe related to the introduction of content processes etc.
13:35:02<pabs>oh, ugh
13:35:55<@JAA>Good to hear re openssl.org! I guess I'll see those soon. I only briefly searched for the mailing list, which was mostly covered (/pipermail/ only) a couple weeks before the migration by Exorcism.
14:08:58nulldata (nulldata) joins
14:17:51Commander001 quits [Remote host closed the connection]
14:20:15sludge quits [Remote host closed the connection]
14:20:27sludge joins
14:24:30Commander001 joins
15:03:10<yasomi>I noticed that someone in archiveteam publishes snapshots of the bluesky firehose in the internet archive but disables downloads on them, how long are the downloads disabled for? is it kinda like the US census data where the raw data isn't made public for a very long time, hopefully long enough that it's not personally identifiable in a way that matters anymore?
15:15:29<that_lurker>yasomi: Who was the uploader?
15:16:20<that_lurker>ahh nevermind
15:17:09<yasomi>archiveteam: https://archive.org/details/archiveteam_bluesky_20241010104307_319086dd
15:17:48<yasomi>i'm more curious than anything, not angry about the data being there, more trying to understand the archival parameters (assuming it's more for an anthropological snapshot in time for later construction of historical narratives)
15:20:14<yasomi>I'd also contact the curator in private about it, but there was no name listed and the archiveteam wiki has no details about it so I had to ask publicly
15:22:13<that_lurker>Most likely the data is made private to prevent AI training. It also could be available through the Wayback Machine. arkiver or JAA might know more on this.
15:30:01<imer>megawarc being hidden is pretty standard - I think most projects do that now?
15:37:53MrMcNuggets (MrMcNuggets) joins
15:44:35Commander001 quits [Ping timeout: 260 seconds]
15:45:07Commander001 joins
15:46:34Commander001 quits [Read error: Connection reset by peer]
15:46:47Commander001 joins
16:18:09Exorcism quits [Quit: Ping timeout (120 seconds)]
16:18:15DigitalDragons quits [Quit: Ping timeout (120 seconds)]
16:20:00Exorcism (exorcism) joins
16:20:00DigitalDragons (DigitalDragons) joins
16:21:22DigitalDragons quits [Excess Flood]
16:30:40katocala quits [Ping timeout: 260 seconds]
16:30:47katocala joins
16:42:20katocala quits [Ping timeout: 260 seconds]
16:42:32katocala joins
16:43:58ducky quits [Ping timeout: 260 seconds]
16:47:44ducky (ducky) joins
17:02:45loug8318142 quits [Ping timeout: 260 seconds]
17:10:05loug8318142 joins
17:16:19DigitalDragons (DigitalDragons) joins
17:16:56SootBector quits [Ping timeout: 276 seconds]
17:18:36SootBector (SootBector) joins
17:18:43<masterx244|m>stuff in the inbox is always restricted
17:24:57DigitalDragons quits [Client Quit]
17:24:57Exorcism quits [Client Quit]
17:25:17DigitalDragons (DigitalDragons) joins
17:25:20Exorcism (exorcism) joins
17:37:10etnguyen03 quits [Quit: Konversation terminated!]
17:44:54<kiska>masterx244|m: But there is stuff that is on a... need-to-know basis. The whole collection is noindex https://archive.org/details/archiveteam_bluesky
17:45:29<kiska>For example we had a twitter project starting from... a while ago and the whole collection is noindex https://archive.org/details/archiveteam_twitter
17:53:25etnguyen03 (etnguyen03) joins
17:55:41systwi_ quits [Quit: systwi_]
18:04:26imer quits [Quit: Oh no]
18:05:05imer (imer) joins
18:14:23ducky quits [Ping timeout: 260 seconds]
18:14:51ducky (ducky) joins
18:44:33loug8318142 quits [Client Quit]
18:46:27loug8318142 joins
19:35:55MrMcNuggets quits [Quit: WeeChat 4.3.2]
20:21:51LddPotato__ is now known as LddPotato
20:28:50<@OrIdow6>!remindme 3d thing
20:28:51<eggdrop>[remind] ok, i'll remind you at 2024-11-30T20:28:50Z
20:38:42mls (mls) joins
20:47:20matoro quits [Ping timeout: 260 seconds]
20:50:34matoro joins
20:53:00etnguyen03 quits [Client Quit]
21:10:03jasons (jasons) joins
21:21:17sralracer quits [Quit: Ooops, wrong browser tab.]
21:43:46Island joins
22:05:35jacksonchen666 (jacksonchen666) joins
22:09:19etnguyen03 (etnguyen03) joins
22:25:26<szczot3k>When checking on my containers from time to time I see logs like " 1,548,746,752 75% 505.26kB/s 0:16:35", which are moving really, really slowly. The last lines before that seem to be the tracker confirming items, nothing more. Is it something I should look into?
22:28:15StarletCharlotte quits [Ping timeout: 260 seconds]
22:30:09<nicolas17>szczot3k: you may have an unusually bad network route to the upload server
22:30:21<nicolas17>unless 500KB/s is your normal internet upload speed :p
22:31:25<szczot3k>It's not the case usually, most of the time it goes fine
22:31:52<szczot3k>It's a kimsufi dedi, so shouldn't be that bad
22:32:03<nicolas17>there are multiple upload servers
22:32:06<nicolas17>what project?
22:32:13<szczot3k>urls
22:32:50<imer>that is a pretty big item for urls huh
22:33:05<nicolas17>^
22:33:47<szczot3k>https://tracker.archiveteam.org/urls/#show-all my share of uploads to urls shows this discrepancy in the amount of uploaded data to items
22:47:43jacksonchen666 is now known as jacksonchen666_
22:47:51jacksonchen666_ is now known as jc666
22:50:35jacksonchen666 (jacksonchen666) joins
22:56:43jacksonchen666 quits [Remote host closed the connection]
22:56:46jacksonchen666 (jacksonchen666) joins
23:00:13SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
23:00:36Radzig quits [Remote host closed the connection]
23:05:20<thuban>JAA: ping re forum.pclab.pl? 3 days until deadline
23:07:37Radzig joins
23:11:42jc666 quits [Client Quit]
23:29:11SkilledAlpaca418962 joins
23:48:18etnguyen03 quits [Client Quit]