00:40:58 | | pie_ is now authenticated as pie_ |
00:52:49 | | lunik11 quits [Quit: :x] |
00:53:24 | | lunik11 joins |
01:05:07 | | Exorcism quits [Quit: Ping timeout (120 seconds)] |
01:05:07 | | DigitalDragons quits [Quit: Ping timeout (120 seconds)] |
01:05:36 | | lukash98 quits [Quit: The Lounge - https://thelounge.chat] |
01:05:39 | | Exorcism (exorcism) joins |
01:05:39 | | DigitalDragons (DigitalDragons) joins |
01:26:09 | <eggdrop> | [remind] thuban: reverse-engineer https://buscadorresoluciones.ift.org.mx/ |
01:38:15 | | s-crypt is now known as help |
01:38:22 | | help is now known as s-crypt |
01:56:01 | | hexa- quits [Quit: WeeChat 4.4.3] |
01:57:37 | | hexa- (hexa-) joins |
02:22:29 | | datechnoman (datechnoman) joins |
02:43:42 | | lukash98 joins |
02:46:34 | | lukash98 quits [Client Quit] |
02:54:13 | | lukash98 joins |
03:01:29 | | linuxgemini (linuxgemini) joins |
03:16:18 | | lflare quits [Quit: Ping timeout (120 seconds)] |
03:16:40 | | lflare (lflare) joins |
03:31:43 | | BlueMaxima quits [Read error: Connection reset by peer] |
03:32:23 | | Exorcism quits [Client Quit] |
03:32:23 | | DigitalDragons quits [Client Quit] |
03:33:08 | | Exorcism (exorcism) joins |
03:33:19 | | DigitalDragons (DigitalDragons) joins |
03:34:23 | <pabs> | hmm, I noticed that the AB viewer isn't listing archived URLs properly for recent jobs https://archive.fart.website/archivebot/viewer/domain/localwiki.org |
03:34:59 | <pabs> | job 202411250039529wri7 should not be blank; it should show https://localwiki.org/Users/MarshallBrain |
03:38:43 | <@JAA> | chfoo: ^ |
03:50:35 | | AK quits [Ping timeout: 260 seconds] |
03:51:06 | | etnguyen03 quits [Remote host closed the connection] |
04:01:27 | | th3z0l4_ joins |
04:03:25 | | th3z0l4 quits [Ping timeout: 260 seconds] |
04:11:13 | | StarletCharlotte joins |
04:25:00 | | riteo quits [Ping timeout: 260 seconds] |
04:49:04 | | Wohlstand (Wohlstand) joins |
05:08:11 | | Exorcism quits [Client Quit] |
05:08:11 | | DigitalDragons quits [Client Quit] |
05:09:31 | | DigitalDragons (DigitalDragons) joins |
05:09:41 | | Exorcism (exorcism) joins |
05:13:20 | | Fusl (Fusl) joins |
05:13:20 | | @ChanServ sets mode: +o Fusl |
06:01:45 | | Island quits [Read error: Connection reset by peer] |
06:02:04 | | Island joins |
06:41:30 | <thuban> | ugh, i'm fucking stumped. buscadorresoluciones.ift.org.mx lets you search by date, so it should be possible to find every document, but the pdf downloads are some JavaServer Faces nightmare and i can't find a source url. |
06:42:10 | | maakuth leaves |
06:42:29 | <thuban> | i got out wireshark and captured the download process, but i can't so much as find the name of the file in the capture _even though the browser names it correctly_. everything from the server after the download request is just tls "Application Data"--it decrypts, but it looks like garbage and wireshark can't or won't parse it. |
06:42:36 | <thuban> | anyone got any bright ideas? |
06:49:16 | <pokechu22> | Is it https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition ? |
06:49:40 | <thuban> | s/it decrypts.*/_some traffic from the server decrypts_ but this doesn't/ |
06:49:51 | <thuban> | i mean, i assume so, but i can't actually read the data. |
06:50:32 | <pokechu22> | Yeah, it is content-disposition |
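The point above is that the browser takes the download name from the Content-Disposition response header rather than from any URL. A minimal sketch of that lookup in Python, using only the standard library; the helper name and the sample header value are made up for illustration and are not taken from the site:

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, if any."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the "filename" parameter and returns None if absent.
    return msg.get_filename()


# Hypothetical header value, for illustration only.
print(filename_from_content_disposition('attachment; filename="example.pdf"'))
```

A response with no filename parameter (for example a bare `inline`) yields `None`, in which case a client has to fall back to some other naming rule.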
06:50:57 | <that_lurker> | on chrome you could maybe get something out of chrome://net-export |
06:51:23 | <pokechu22> | (I got it to work by making a search in firefox, then alt -> file -> work offline, then clicking one of the results, which will open a tab saying you're offline, then I can open F12, disable offline mode, refresh, confirm that I'm fine with a duplicate POST, and it shows up in devtools) |
06:52:51 | <@JAA> | Ah, neat trick, thanks! |
06:52:53 | | riteo (riteo) joins |
06:53:08 | <@JAA> | And yeah, what a messy site. |
06:53:38 | <@JAA> | The giant 'Número de Resolución' <option> list looks fun, too. |
06:56:29 | | Webuser408847 joins |
06:56:47 | | Webuser408847 quits [Client Quit] |
06:57:32 | | Webuser404844 joins |
06:57:48 | | Webuser404844 quits [Client Quit] |
06:58:11 | <thuban> | so is there a url for the file? |
06:58:49 | <pokechu22> | No, it directly returns the file in the body of the POST response |
07:00:42 | <thuban> | can you dump the post request? (the work offline thing is not working for me--i get a couple of posts but they're the wrong ones) |
07:01:43 | <thuban> | i wonder why wireshark wasn't decrypting properly :/ |
07:03:41 | <pokechu22> | You need to open devtools on the tab for the PDF (that has the offline error) |
07:04:55 | <thuban> | oi_c_ |
07:05:11 | <thuban> | thank you |
07:05:14 | <thuban> | "application/docx" lmao |
07:07:36 | <thuban> | hmmm, the payload seems to be missing |
07:08:18 | <that_lurker> | oh firefox also has logging at about:logging |
07:09:30 | | Unholy2361924645377131 quits [Ping timeout: 260 seconds] |
07:09:35 | <thuban> | ah, it's copyable, panel just doesn't show for some reason |
07:13:06 | | Exorcism quits [Client Quit] |
07:13:06 | | DigitalDragons quits [Client Quit] |
07:14:33 | | DigitalDragons (DigitalDragons) joins |
07:14:33 | | Exorcism (exorcism) joins |
07:15:18 | <thuban> | well, this whole thing is gross as hell. the downloads are dependent on the ViewState, so there's nothing we could meaningfully feed to archivebot anyway even if it did posts |
07:16:47 | <thuban> | i'm tempted to grab all the pdfs and just dump them into an ia item except that that still sounds like a huge pain in the ass. maybe with selenium |
07:30:47 | <steering> | thuban: looks like the ViewState encodes the search query and the formResults:doctosTable:<#>:j_idt87:0:j_idt88 determines the result number |
07:35:41 | <steering> | it doesn't look like it likes to do a real search unless you give it an existing ViewState with it though |
07:41:55 | <thuban> | steering: yeah, and the ViewState isn't replicable / expires / changes with pagination |
07:44:23 | <steering> | yeah, looks like you'd have to make a GET to grab a ViewState, and then a POST to transmute it into a search, and then a POST to grab the doc |
07:44:41 | <steering> | everyone loves javascript! |
07:50:39 | <steering> | you can allow popups and uhh... something like this :D document.getElementById('formResults:doctosTable_data').querySelectorAll('a').forEach((e) => e.click()); |
07:51:02 | <steering> | (hooray for broken element IDs that can't be referenced normally!) |
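The GET, then POST, then POST sequence described above could be sketched roughly as follows. This only covers the offline parts: pulling the ViewState out of a fetched page and building the body of the final download POST. `javax.faces.ViewState` is the standard JSF hidden field name and the link ID pattern is the one quoted in the discussion; the helper names and the exact set of form fields the server expects are assumptions and would need checking against a captured request.

```python
import re
from html import unescape
from urllib.parse import urlencode

# Standard JSF hidden field; its value is opaque and issued by the server.
VIEW_STATE_FIELD = "javax.faces.ViewState"


def extract_view_state(html: str) -> str:
    """Return the value of the first javax.faces.ViewState hidden input."""
    for tag in re.findall(r"<input\b[^>]*>", html, flags=re.I):
        if re.search(r'name="javax\.faces\.ViewState"', tag):
            match = re.search(r'value="([^"]*)"', tag)
            if match:
                return unescape(match.group(1))
    raise ValueError("no ViewState field found")


def download_form_body(view_state: str, row: int) -> str:
    """Build a urlencoded POST body for result number `row` (assumed field set)."""
    link_id = f"formResults:doctosTable:{row}:j_idt87:0:j_idt88"
    fields = {
        "formResults": "formResults",  # assumption: the form submits its own ID
        link_id: link_id,              # assumption: the clicked link is sent as a field
        VIEW_STATE_FIELD: view_state,
    }
    return urlencode(fields)
```

Because the ViewState expires and changes with pagination, each value would have to come from a fresh response in the same session; nothing here makes the requests replayable later.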
08:04:54 | | pixel (pixel) joins |
08:10:10 | | qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds] |
08:13:13 | | Island quits [Read error: Connection reset by peer] |
08:13:31 | | Island joins |
08:42:24 | | qwertyasdfuiopghjkl2 (qwertyasdfuiopghjkl2) joins |
09:14:28 | <@JAA> | Sometime in July, openssl.org was redesigned and content split up across several domains. Although they've taken some care to set up redirects, something as simple as a blog post from 2021 is broken: https://www.openssl.org/blog/blog/2021/09/07/OpenSSL3.Final/ (it's here: https://openssl-library.org/post/2021-09-06-openssl3.final/ ) |
09:20:18 | <that_lurker> | oof |
09:20:48 | <@JAA> | They also migrated their mailing lists to Google Groups. I wonder what else got wrecked. |
09:21:40 | <@JAA> | I'll go through that soon to figure out everything that needs to be archived. |
09:28:22 | <szczot3k> | https://crt.sh/?q=openssl.org I wonder if any of those domains point to the old website |
09:32:03 | <@arkiver> | urgh :/ |
09:37:45 | | AK (AK) joins |
09:42:29 | <StarletCharlotte> | So I found this site named dnbshare that has a lot of drum & bass sets that can't be found anywhere else online, and regularly gets new uploads every day. Now, checking the latest section is fairly easy, but looking for older downloads is a lot harder because you can only rely on the search bar, so you have to try a bunch of shit in the search bar to try to scrape older files. |
09:43:00 | <StarletCharlotte> | And the search results only iterate two pages down, presumably to avoid overloading limited resources or something. |
09:43:01 | <StarletCharlotte> | https://dnbshare.com/download |
09:43:09 | <StarletCharlotte> | It makes archiving these rare sets very cumbersome though. |
09:43:39 | <StarletCharlotte> | I was told to ask about it here from #//. What's the best way to proceed? |
09:46:58 | <that_lurker> | nicolas17: AB jobs started for https://www.boincstats.com with the /stats being ignored |
10:02:53 | | Island quits [Read error: Connection reset by peer] |
10:07:25 | | Wohlstand quits [Ping timeout: 260 seconds] |
10:30:18 | | le0n quits [Quit: see you later, alligator] |
10:40:43 | | le0n (le0n) joins |
10:40:43 | | yasomi quits [Quit: ZNC 1.9.1 - https://znc.in] |
10:41:08 | | yasomi (yasomi) joins |
10:49:55 | | Wohlstand (Wohlstand) joins |
10:52:00 | | le0n_ (le0n) joins |
10:53:30 | | le0n quits [Ping timeout: 260 seconds] |
11:01:46 | <h2ibot> | Manu uploaded File:Hedgedoc banner color horizontal.png (HedgeDoc logo / banner): https://wiki.archiveteam.org/?title=File%3AHedgedoc%20banner%20color%20horizontal.png |
11:03:57 | | BornOn420_ (BornOn420) joins |
11:06:12 | | sralracer (sralracer) joins |
11:07:44 | | BornOn420 quits [Ping timeout: 276 seconds] |
11:12:48 | <h2ibot> | Manu created HedgeDoc (+826, Creating HedgeDoc page despite many open TODOs): https://wiki.archiveteam.org/?title=HedgeDoc |
11:14:49 | <h2ibot> | Manu uploaded File:Hedgedoc banner color horizontal.png (Replaced it with transparent background version): https://wiki.archiveteam.org/?title=File%3AHedgedoc%20banner%20color%20horizontal.png |
11:15:45 | | mls quits [Quit: leaving] |
11:28:14 | | loug8318142 joins |
12:00:01 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:45 | | Bleo182600722719623 joins |
12:32:05 | | Wohlstand quits [Ping timeout: 260 seconds] |
12:33:49 | | BennyOtt quits [Quit: ZNC 1.9.1 - https://znc.in] |
12:38:08 | | BennyOtt (BennyOtt) joins |
12:39:54 | | BornOn420_ quits [Remote host closed the connection] |
12:40:30 | | BornOn420 (BornOn420) joins |
12:40:45 | | SkilledAlpaca41896 quits [Quit: SkilledAlpaca41896] |
12:42:51 | | SkilledAlpaca418962 joins |
12:49:34 | | etnguyen03 (etnguyen03) joins |
13:20:30 | <@JAA> | TIL 'National Records of Scotland Web Archive': https://webarchive.nrscotland.gov.uk/ |
13:27:29 | <pabs> | thuban: I do this: go into devtools, disable cache, load the page, then save all requests as .har, open that in your text editor (or jless) and search for what you are looking for, then work backwards |
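The "save as .har and search" step can also be scripted. A minimal sketch, assuming the usual HAR 1.2 layout (`log.entries[]` with `request` and `response` objects); the function name is made up, and it only looks at URLs, headers, and text bodies:

```python
import json


def find_in_har(har: dict, needle: str) -> list:
    """List 'METHOD URL' for every HAR entry that mentions `needle`."""
    hits = []
    for entry in har["log"]["entries"]:
        req, resp = entry["request"], entry["response"]
        haystack = [req["url"]]
        # Search header names as well as values, on both sides of the exchange.
        for headers in (req.get("headers", []), resp.get("headers", [])):
            haystack += [f'{h["name"]}: {h["value"]}' for h in headers]
        haystack.append(req.get("postData", {}).get("text", ""))
        haystack.append(resp.get("content", {}).get("text", ""))
        if any(needle in (text or "") for text in haystack):
            hits.append(f'{req["method"]} {req["url"]}')
    return hits


# Usage, with a hypothetical file name:
# with open("capture.har", encoding="utf-8") as fh:
#     print(find_in_har(json.load(fh), "Content-Disposition"))
```

Binary response bodies are typically stored base64-encoded in a HAR, so a plain substring search will not see inside them.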
13:28:39 | <pabs> | JAA: ISTR I did some openssl archiving in the last months when the foundation was created |
13:34:42 | <@JAA> | pabs: That site opens the download in a forced new tab, so regular dev tools don't track the download request. Global dev tools used to be a thing, but I think that was removed a long while ago, or I couldn't find them anymore at least. Maybe related to the introduction of content processes etc. |
13:35:02 | <pabs> | oh, ugh |
13:35:55 | <@JAA> | Good to hear re openssl.org! I guess I'll see those soon. I only briefly searched for the mailing list, which was mostly covered (/pipermail/ only) a couple weeks before the migration by Exorcism. |
14:08:58 | | nulldata (nulldata) joins |
14:17:51 | | Commander001 quits [Remote host closed the connection] |
14:20:15 | | sludge quits [Remote host closed the connection] |
14:20:27 | | sludge joins |
14:24:30 | | Commander001 joins |
15:03:10 | <yasomi> | I noticed that someone in archiveteam publishes snapshots of the bluesky firehose in the internet archive but disables downloads on them, how long are the downloads disabled for? is it kinda like the US census data where the raw data isn't made public for a very long time, hopefully long enough that it's not personally identifiable in a way that matters anymore? |
15:15:29 | <that_lurker> | yasomi: Who was the uploader? |
15:16:20 | <that_lurker> | ahh nevermind |
15:17:09 | <yasomi> | archiveteam: https://archive.org/details/archiveteam_bluesky_20241010104307_319086dd |
15:17:48 | <yasomi> | i'm more curious than anything, not angry about the data being there, more trying to understand the archival parameters (assuming it's more for an anthropological snapshot in time for later construction of historical narratives) |
15:20:14 | <yasomi> | I'd also contact the curator in private about it, but there was no name listed and the archiveteam wiki has no details about it so I had to ask publicly |
15:22:13 | <that_lurker> | Most likely the data is made private to prevent AI training. It also could be available through the Wayback Machine. arkiver or JAA might know more on this. |
15:30:01 | <imer> | megawarc being hidden is pretty standard - I think most projects do that now? |
15:37:53 | | MrMcNuggets (MrMcNuggets) joins |
15:44:35 | | Commander001 quits [Ping timeout: 260 seconds] |
15:45:07 | | Commander001 joins |
15:46:34 | | Commander001 quits [Read error: Connection reset by peer] |
15:46:47 | | Commander001 joins |
16:18:09 | | Exorcism quits [Quit: Ping timeout (120 seconds)] |
16:18:15 | | DigitalDragons quits [Quit: Ping timeout (120 seconds)] |
16:20:00 | | Exorcism (exorcism) joins |
16:20:00 | | DigitalDragons (DigitalDragons) joins |
16:21:22 | | DigitalDragons quits [Excess Flood] |
16:30:40 | | katocala quits [Ping timeout: 260 seconds] |
16:30:47 | | katocala joins |
16:42:20 | | katocala quits [Ping timeout: 260 seconds] |
16:42:32 | | katocala joins |
16:42:43 | | katocala is now authenticated as katocala |
16:43:58 | | ducky quits [Ping timeout: 260 seconds] |
16:47:44 | | ducky (ducky) joins |
17:02:45 | | loug8318142 quits [Ping timeout: 260 seconds] |
17:10:05 | | loug8318142 joins |
17:16:19 | | DigitalDragons (DigitalDragons) joins |
17:16:56 | | SootBector quits [Ping timeout: 276 seconds] |
17:18:36 | | SootBector (SootBector) joins |
17:18:43 | <masterx244|m> | stuff in the inbox is always restricted |
17:24:57 | | DigitalDragons quits [Client Quit] |
17:24:57 | | Exorcism quits [Client Quit] |
17:25:17 | | DigitalDragons (DigitalDragons) joins |
17:25:20 | | Exorcism (exorcism) joins |
17:37:10 | | etnguyen03 quits [Quit: Konversation terminated!] |
17:44:54 | <kiska> | masterx244|m: But there is stuff that is on a... need-to-know basis. The whole collection is noindex https://archive.org/details/archiveteam_bluesky |
17:45:29 | <kiska> | For example we had a twitter project starting from... a while ago and the whole collection is noindex https://archive.org/details/archiveteam_twitter |
17:53:25 | | etnguyen03 (etnguyen03) joins |
17:55:41 | | systwi_ quits [Quit: systwi_] |
18:04:26 | | imer quits [Quit: Oh no] |
18:05:05 | | imer (imer) joins |
18:14:23 | | ducky quits [Ping timeout: 260 seconds] |
18:14:51 | | ducky (ducky) joins |
18:44:33 | | loug8318142 quits [Client Quit] |
18:46:27 | | loug8318142 joins |
19:35:55 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
20:21:51 | | LddPotato__ is now known as LddPotato |
20:28:50 | <@OrIdow6> | !remindme 3d thing |
20:28:51 | <eggdrop> | [remind] ok, i'll remind you at 2024-11-30T20:28:50Z |
20:38:42 | | mls (mls) joins |
20:47:20 | | matoro quits [Ping timeout: 260 seconds] |
20:50:34 | | matoro joins |
20:53:00 | | etnguyen03 quits [Client Quit] |
21:10:03 | | jasons (jasons) joins |
21:21:17 | | sralracer quits [Quit: Ooops, wrong browser tab.] |
21:43:46 | | Island joins |
22:05:35 | | jacksonchen666 (jacksonchen666) joins |
22:09:19 | | etnguyen03 (etnguyen03) joins |
22:25:26 | <szczot3k> | When checking on my containers from time to time I see logs like " 1,548,746,752 75% 505.26kB/s 0:16:35", which are moving really, really slowly. The last lines before that seem to be the tracker confirming items, nothing more. Is it something I should look into? |
22:28:15 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
22:30:09 | <nicolas17> | szczot3k: you may have an unusually bad network route to the upload server |
22:30:21 | <nicolas17> | unless 500KB/s is your normal internet upload speed :p |
22:31:25 | <szczot3k> | It's not the case usually, most of the time it goes fine |
22:31:52 | <szczot3k> | It's a kimsufi dedi, so shouldn't be that bad |
22:32:03 | <nicolas17> | there are multiple upload servers |
22:32:06 | <nicolas17> | what project? |
22:32:13 | <szczot3k> | urls |
22:32:50 | <imer> | that is a pretty big item for urls huh |
22:33:05 | <nicolas17> | ^ |
22:33:47 | <szczot3k> | https://tracker.archiveteam.org/urls/#show-all my share of uploads to urls shows this discrepancy in the amount of uploaded data to items |
22:47:43 | | jacksonchen666 is now known as jacksonchen666_ |
22:47:51 | | jacksonchen666_ is now known as jc666 |
22:50:35 | | jacksonchen666 (jacksonchen666) joins |
22:56:43 | | jacksonchen666 quits [Remote host closed the connection] |
22:56:46 | | jacksonchen666 (jacksonchen666) joins |
23:00:13 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
23:00:36 | | Radzig quits [Remote host closed the connection] |
23:05:20 | <thuban> | JAA: ping re forum.pclab.pl? 3 days until deadline |
23:07:37 | | Radzig joins |
23:11:42 | | jc666 quits [Client Quit] |
23:29:11 | | SkilledAlpaca418962 joins |
23:48:18 | | etnguyen03 quits [Client Quit] |