00:12:21 | | Blu joins |
00:13:13 | | loug42 quits [Client Quit] |
00:14:13 | <Blu> | just like to say that roblox never shut down its user ads api after july 31st and it's still available rn (https://www.roblox.com/user-sponsorship/getadimage?adId=1123) |
00:16:33 | | nepeat quits [Ping timeout: 272 seconds] |
00:19:03 | | nepeat (nepeat) joins |
00:22:01 | | etnguyen03 (etnguyen03) joins |
00:52:49 | | Island quits [Read error: Connection reset by peer] |
00:54:57 | | Island joins |
01:03:40 | | Blu quits [Ping timeout: 255 seconds] |
01:18:34 | | DogsRNice joins |
02:36:03 | | etnguyen03 quits [Client Quit] |
02:46:58 | | etnguyen03 (etnguyen03) joins |
02:50:53 | | yarrow quits [Read error: Connection reset by peer] |
02:52:54 | | yarrow (yarrow) joins |
02:58:29 | | Arcorann (Arcorann) joins |
03:11:57 | | etnguyen03 quits [Remote host closed the connection] |
03:16:25 | | eroc1990 quits [Ping timeout: 272 seconds] |
03:21:20 | | eroc1990 (eroc1990) joins |
03:30:34 | <@JAA> | Grabbing the form submissions on https://theamericapac.org/wp-content/custom/index.php for all ZIP codes from the USPS's list at https://postalpro.usps.com/ZIP_Locale_Detail with qwarc. Context: https://www.cnbc.com/2024/08/02/elon-musk-pac-voter-data-trump-harris.html |
03:32:13 | <@JAA> | Since they're POST, that won't work in the WBM, but at least the data will be there. |
03:48:47 | <fireonlive> | oh interesting, haven't seen this before |
03:49:01 | <fireonlive> | on https://tracker.archiveteam.org/ [WIP] Stack Exchange leads to a 404: https://tracker.archiveteam.org/stack-exchange/ |
03:55:15 | | Blu joins |
04:16:09 | | Guest54 quits [Client Quit] |
04:17:54 | <tech234a> | fireonlive: technically the leaderboard URL can be set to anything I think https://warriorhq.archiveteam.org/projects.json |
04:26:05 | <fireonlive> | ahh |
04:30:13 | | Blu quits [Ping timeout: 255 seconds] |
04:36:26 | | Mist8kenGAS (Mist8kenGAS) joins |
04:44:18 | <h2ibot> | Tech234a edited Current Projects (+205, Move Reddit to hiatus, split longer term…): https://wiki.archiveteam.org/?diff=53171&oldid=52489 |
04:47:12 | <fireonlive> | tech234a++ |
04:47:12 | <eggdrop> | [karma] 'tech234a' now has 2 karma! |
04:55:32 | | DogsRNice quits [Read error: Connection reset by peer] |
05:25:04 | <pabs> | hmm, does AB not detect links from uppercase HREF attributes? |
05:26:00 | | Mist8kenGAS_ joins |
05:26:41 | <@JAA> | It should find them, and there's even a test for it in wpull's test suite. |
05:27:34 | <@JAA> | Although the test suite only tests html5lib, not lxml. Hmm... |
05:27:58 | <@JAA> | Er no, it tests both, except on PyPy. |
05:29:25 | | Mist8kenGAS quits [Ping timeout: 272 seconds] |
05:38:24 | | Wohlstand (Wohlstand) joins |
05:41:06 | <pabs> | an example where it didn't: 4iow663ym6arp9cfjvd0x1ka1 https://www.xach.com/gimp/tutorials/cvsgimp.html |
05:41:34 | <pabs> | the page contains some ftp links with HREF and one other link with href and it only got the href one |
05:47:52 | <@JAA> | All links are capitalised HREF on that page for me. |
05:48:33 | | Mist8kenGAS_ quits [Client Quit] |
05:49:36 | <@JAA> | !remindme 12h Check what 4iow663ym6arp9cfjvd0x1ka1 extracted exactly. |
05:49:36 | <eggdrop> | [remind] ok, i'll remind you at 2024-08-03T17:49:36Z |
05:53:48 | <pabs> | hmm, I must have misread |
05:54:13 | <pabs> | so it's just that the ftp links were ignored somehow |
05:54:30 | <pabs> | they are all tarballs |
05:56:02 | <@JAA> | Yeah, wpull doesn't follow links from HTTP(S) to FTP by default, and I don't think AB enables it either. |
05:57:13 | <@JAA> | That's inherited from wget, not a wpull invention. |
05:57:23 | <@JAA> | --follow-ftp is the option to enable such recursion. |
05:58:17 | <@JAA> | (Not available in AB) |
06:12:03 | | Island quits [Read error: Connection reset by peer] |
06:52:01 | | IDK (IDK) joins |
07:09:29 | | Unholy236192464537713 quits [Ping timeout: 272 seconds] |
07:20:21 | | Wohlstand quits [Client Quit] |
07:48:07 | | Arcorann quits [Ping timeout: 272 seconds] |
07:50:16 | | Arcorann (Arcorann) joins |
07:51:20 | | Arcorann quits [Remote host closed the connection] |
07:56:48 | <h2ibot> | Exorcism edited Mailman/2 (-92): https://wiki.archiveteam.org/?diff=53172&oldid=53167 |
08:01:12 | | loug42 joins |
08:02:48 | <h2ibot> | Exorcism edited MoinMoin (+0): https://wiki.archiveteam.org/?diff=53173&oldid=53169 |
08:37:11 | | michaelblob quits [Read error: Connection reset by peer] |
08:53:28 | | tzt quits [Ping timeout: 255 seconds] |
09:00:04 | | Bleo1826007227196 quits [Client Quit] |
09:01:25 | | Bleo1826007227196 joins |
09:08:55 | | andybak joins |
09:11:59 | <h2ibot> | Exorcism edited MoinMoin (-37): https://wiki.archiveteam.org/?diff=53174&oldid=53173 |
09:14:16 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
09:14:26 | | andybak quits [Client Quit] |
09:35:20 | | sonick (sonick) joins |
09:40:49 | | IDK quits [Client Quit] |
10:01:13 | | tmob quits [Read error: Connection reset by peer] |
11:00:03 | | Bleo1826007227196 quits [Client Quit] |
11:01:22 | | Bleo1826007227196 joins |
11:50:41 | | SkilledAlpaca quits [Client Quit] |
11:52:11 | | SkilledAlpaca joins |
12:00:42 | | etnguyen03 (etnguyen03) joins |
12:13:47 | | StarletCharlotte joins |
12:14:23 | <StarletCharlotte> | Hey, so... What's the best way to archive all of the files in a public S3 bucket? |
12:15:27 | <katia> | https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/s3-bucket-list |
12:15:47 | <katia> | + archivebot |
12:15:49 | <katia> | what's the bucket? |
12:19:25 | <StarletCharlotte> | @katia Well, I'm hesitant to say what it is publicly before it can be backed up because I don't want the "OMG IT'S THE PETALEAK" people to get their hands on it and get it taken down like the TestFlight stuff. Should I DM you or something? |
12:21:23 | <katia> | StarletCharlotte, sure, you can dm me, but i can't do much more than list the files with that script + archive it with archivebot |
12:21:59 | <StarletCharlotte> | @katia: As long as it all gets backed up somewhere safe. |
12:53:43 | | etnguyen03 quits [Client Quit] |
13:25:33 | | etnguyen03 (etnguyen03) joins |
13:32:28 | | Gadelhas56 quits [Ping timeout: 255 seconds] |
13:49:10 | | etnguyen03 quits [Client Quit] |
14:23:03 | | loug42 quits [Client Quit] |
14:23:15 | | loug42 joins |
14:28:49 | <h2ibot> | Jarshua edited List of websites excluded from the Wayback Machine (+26, added woke-world.com): https://wiki.archiveteam.org/?diff=53175&oldid=53165 |
14:28:50 | <h2ibot> | Censuro edited Talk:URLTeam (+94, /* Shouldn't archive.today be considered a URL…): https://wiki.archiveteam.org/?diff=53176&oldid=51920 |
14:28:51 | <h2ibot> | QuieselWusul edited Discord (+64, added roblox discord server aggregator): https://wiki.archiveteam.org/?diff=53177&oldid=52307 |
14:29:49 | <h2ibot> | Taka edited Deathwatch (+243, /* 2024 */ Added Akiba Souken): https://wiki.archiveteam.org/?diff=53178&oldid=53106 |
14:31:59 | | Gadelhas56 joins |
15:00:55 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=53179&oldid=53175 |
15:15:51 | | tzt (tzt) joins |
15:15:55 | | shgaqnyrjp_ (shgaqnyrjp) joins |
15:17:23 | | shgaqnyrjp quits [Ping timeout: 260 seconds] |
15:20:19 | | xarph_ quits [Ping timeout: 272 seconds] |
15:20:23 | | xarph joins |
15:50:37 | | midou quits [Ping timeout: 255 seconds] |
16:04:59 | | BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
16:25:57 | | midou joins |
16:31:00 | | IDK (IDK) joins |
16:31:34 | | midou quits [Ping timeout: 255 seconds] |
16:35:16 | | midou joins |
17:05:16 | | BearFortress joins |
17:15:31 | | StarletCharlotte quits [Remote host closed the connection] |
17:17:41 | | VerifiedJ9 quits [Remote host closed the connection] |
17:18:24 | | VerifiedJ9 (VerifiedJ) joins |
17:49:36 | <eggdrop> | [remind] JAA: Check what 4iow663ym6arp9cfjvd0x1ka1 extracted exactly. |
18:52:12 | | shgaqnyrjp_ is now known as shgaqnyrjp |
19:08:27 | | StarletCharlotte joins |
19:09:48 | <StarletCharlotte> | So I'm looking at the job for https://transfer.archivete.am/f45an/2024-08-03_buildkit.s3.amazonaws.com.txt, and it looks like it's missing quite a bit because of connection closed errors, despite the fact that those URLs are working. Is that... Normal? Is that going to be fixed later? |
19:09:48 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/f45an/2024-08-03_buildkit.s3.amazonaws.com.txt |
19:22:52 | <thuban> | StarletCharlotte: it's a known issue with concurrency (https://github.com/ArchiveTeam/wpull/issues/397, can manifest as "connection closed" as well as "read timeout") |
19:23:30 | <thuban> | i'll keep an eye on the job and crank it back down to 1 in time to retry the failed files |
19:23:43 | <StarletCharlotte> | Sounds good. |
19:44:12 | | nicolas17 joins |
19:55:31 | | etnguyen03 (etnguyen03) joins |
19:58:59 | | tzt quits [Ping timeout: 272 seconds] |
20:13:13 | | DogsRNice joins |
20:17:26 | <katia> | do/should we #down-the-tube youtube videos on wikipedia? |
20:21:39 | | shgaqnyrjp quits [Ping timeout: 260 seconds] |
20:26:31 | | Island joins |
20:27:29 | | yarrow quits [Read error: Connection reset by peer] |
20:32:18 | | yarrow (yarrow) joins |
20:35:38 | | jacksonchen666 (jacksonchen666) joins |
21:37:50 | | shgaqnyrjp (shgaqnyrjp) joins |
21:44:17 | | Blu joins |
22:30:40 | | michaelblob (michaelblob) joins |
22:30:59 | | Blu quits [Ping timeout: 272 seconds] |
22:40:04 | | IDK quits [Client Quit] |
22:53:39 | | SF quits [Remote host closed the connection] |
22:57:15 | | BlueMaxima joins |
22:59:37 | | that_lurker quits [Remote host closed the connection] |
23:01:53 | | that_lurker joins |
23:01:53 | | that_lurker is now authenticated as that_lurker |
23:08:21 | | StarletCharlotte quits [Ping timeout: 272 seconds] |
23:20:06 | | SF joins |
23:20:36 | | SF quits [Remote host closed the connection] |
23:29:33 | | SF joins |
23:30:31 | | jacksonchen666 quits [Client Quit] |
23:42:13 | | lennier2 quits [Ping timeout: 255 seconds] |
23:46:43 | | etnguyen03 quits [Client Quit] |
23:59:03 | | loug42 quits [Client Quit] |