00:12:21Blu joins
00:13:13loug42 quits [Client Quit]
00:14:13<Blu>just like to say that roblox never shut down its user ads api after july 31st and it's still available rn (https://www.roblox.com/user-sponsorship/getadimage?adId=1123)
00:16:33nepeat quits [Ping timeout: 272 seconds]
00:19:03nepeat (nepeat) joins
00:22:01etnguyen03 (etnguyen03) joins
00:52:49Island quits [Read error: Connection reset by peer]
00:54:57Island joins
01:03:40Blu quits [Ping timeout: 255 seconds]
01:18:34DogsRNice joins
02:36:03etnguyen03 quits [Client Quit]
02:46:58etnguyen03 (etnguyen03) joins
02:50:53yarrow quits [Read error: Connection reset by peer]
02:52:54yarrow (yarrow) joins
02:58:29Arcorann (Arcorann) joins
03:11:57etnguyen03 quits [Remote host closed the connection]
03:16:25eroc1990 quits [Ping timeout: 272 seconds]
03:21:20eroc1990 (eroc1990) joins
03:30:34<@JAA>Grabbing the form submissions on https://theamericapac.org/wp-content/custom/index.php for all ZIP codes from the USPS's list at https://postalpro.usps.com/ZIP_Locale_Detail with qwarc. Context: https://www.cnbc.com/2024/08/02/elon-musk-pac-voter-data-trump-harris.html
03:32:13<@JAA>Since they're POST, that won't work in the WBM, but at least the data will be there.
03:48:47<fireonlive>oh interesting, haven't seen this before
03:49:01<fireonlive>on https://tracker.archiveteam.org/ [WIP] Stack Exchange leads to a 404: https://tracker.archiveteam.org/stack-exchange/
03:55:15Blu joins
04:16:09Guest54 quits [Client Quit]
04:17:54<tech234a>fireonlive: technically the leaderboard URL can be set to anything I think https://warriorhq.archiveteam.org/projects.json
04:26:05<fireonlive>ahh
04:30:13Blu quits [Ping timeout: 255 seconds]
04:36:26Mist8kenGAS (Mist8kenGAS) joins
04:44:18<h2ibot>Tech234a edited Current Projects (+205, Move Reddit to hiatus, split longer term…): https://wiki.archiveteam.org/?diff=53171&oldid=52489
04:47:12<fireonlive>tech234a++
04:47:12<eggdrop>[karma] 'tech234a' now has 2 karma!
04:55:32DogsRNice quits [Read error: Connection reset by peer]
05:25:04<pabs>hmm, does AB not detect links from uppercase HREF attributes?
05:26:00Mist8kenGAS_ joins
05:26:41<@JAA>It should find them, and there's even a test for it in wpull's test suite.
05:27:34<@JAA>Although the test suite only tests html5lib, not lxml. Hmm...
05:27:58<@JAA>Er no, it tests both, except on PyPy.
05:29:25Mist8kenGAS quits [Ping timeout: 272 seconds]
05:38:24Wohlstand (Wohlstand) joins
05:41:06<pabs>an example where it didn't: job 4iow663ym6arp9cfjvd0x1ka1, https://www.xach.com/gimp/tutorials/cvsgimp.html
05:41:34<pabs>the page contains some ftp links with HREF and one other link with href and it only got the href one
05:47:52<@JAA>All links are capitalised HREF on that page for me.
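Note on the HREF question above: HTML attribute names are case-insensitive, so a conforming parser should treat `HREF` and `href` the same. A minimal sketch using Python's standard `html.parser` (not wpull's html5lib or lxml extractors, which are not shown in this log) illustrates the idea. The `LinkCollector` class, `extract_links` helper, and the example.org URLs are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, whatever the attribute's case."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this,
        # so <A HREF="..."> and <a href="..."> look identical here.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return every link target found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment mixing uppercase and lowercase attributes.
sample = (
    '<A HREF="ftp://ftp.example.org/pub/demo.tar.gz">tarball</A> '
    '<a href="https://example.org/page.html">page</a>'
)
print(extract_links(sample))
```

Both links are reported regardless of case, which matches the expectation that a missed link on such a page is more likely a filtering decision than a parsing one.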
05:48:33Mist8kenGAS_ quits [Client Quit]
05:49:36<@JAA>!remindme 12h Check what 4iow663ym6arp9cfjvd0x1ka1 extracted exactly.
05:49:36<eggdrop>[remind] ok, i'll remind you at 2024-08-03T17:49:36Z
05:53:48<pabs>hmm, I must have misread
05:54:13<pabs>so it's just that the ftp links were ignored somehow
05:54:30<pabs>they are all tarballs
05:56:02<@JAA>Yeah, wpull doesn't follow links from HTTP(S) to FTP by default, and I don't think AB enables it either.
05:57:13<@JAA>That's inherited from wget, not a wpull invention.
05:57:23<@JAA>--follow-ftp is the option to enable such recursion.
05:58:17<@JAA>(Not available in AB)
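Note on the behaviour described above: the effect of an opt-in like `--follow-ftp` can be pictured as a scheme check applied to each extracted link before it is queued. The following is only a rough sketch of that idea, not wpull's or wget's actual code; the `should_follow` function and its `follow_ftp` parameter are hypothetical, and the FTP host is a placeholder.

```python
from urllib.parse import urljoin, urlsplit


def should_follow(parent_url, link, follow_ftp=False):
    """Decide whether a crawl of an HTTP(S) page should queue `link`.

    Hypothetical helper: `follow_ftp` stands in for an option such as
    --follow-ftp and defaults to off.
    """
    # Resolve relative links against the page they were found on.
    target = urlsplit(urljoin(parent_url, link))
    if target.scheme in ("http", "https"):
        return True
    if target.scheme == "ftp":
        # Crossing from HTTP(S) to FTP only happens when explicitly enabled.
        return follow_ftp
    # Anything else (mailto:, javascript:, ...) is never queued.
    return False


page = "https://www.xach.com/gimp/tutorials/cvsgimp.html"
tarball = "ftp://ftp.example.org/pub/demo.tar.gz"  # placeholder FTP link
print(should_follow(page, tarball))
print(should_follow(page, tarball, follow_ftp=True))
```

With the default setting the FTP tarball is skipped even though the link was extracted correctly, which is consistent with what pabs observed.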
06:12:03Island quits [Read error: Connection reset by peer]
06:52:01IDK (IDK) joins
07:09:29Unholy236192464537713 quits [Ping timeout: 272 seconds]
07:20:21Wohlstand quits [Client Quit]
07:48:07Arcorann quits [Ping timeout: 272 seconds]
07:50:16Arcorann (Arcorann) joins
07:51:20Arcorann quits [Remote host closed the connection]
07:56:48<h2ibot>Exorcism edited Mailman/2 (-92): https://wiki.archiveteam.org/?diff=53172&oldid=53167
08:01:12loug42 joins
08:02:48<h2ibot>Exorcism edited MoinMoin (+0): https://wiki.archiveteam.org/?diff=53173&oldid=53169
08:37:11michaelblob quits [Read error: Connection reset by peer]
08:53:28tzt quits [Ping timeout: 255 seconds]
09:00:04Bleo1826007227196 quits [Client Quit]
09:01:25Bleo1826007227196 joins
09:08:55andybak joins
09:11:59<h2ibot>Exorcism edited MoinMoin (-37): https://wiki.archiveteam.org/?diff=53174&oldid=53173
09:14:16qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
09:14:26andybak quits [Client Quit]
09:35:20sonick (sonick) joins
09:40:49IDK quits [Client Quit]
10:01:13tmob quits [Read error: Connection reset by peer]
11:00:03Bleo1826007227196 quits [Client Quit]
11:01:22Bleo1826007227196 joins
11:50:41SkilledAlpaca quits [Client Quit]
11:52:11SkilledAlpaca joins
12:00:42etnguyen03 (etnguyen03) joins
12:13:47StarletCharlotte joins
12:14:23<StarletCharlotte>Hey, so... What's the best way to archive all of the files in a public S3 bucket?
12:15:27<katia>https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/s3-bucket-list
12:15:47<katia>+ archivebot
12:15:49<katia>what's the bucket?
12:19:25<StarletCharlotte>@katia Well, I'm hesitant to say what it is publicly before it can be backed up because I don't want the "OMG IT'S THE PETALEAK" people to get their hands on it and get it taken down like the TestFlight stuff. Should I DM you or something?
12:21:23<katia>StarletCharlotte, sure, you can dm me, but i can't do much more than list the files with that script + archive it with archivebot
12:21:59<StarletCharlotte>@katia: As long as it all gets backed up somewhere safe.
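Note on listing a public bucket: a publicly listable S3 bucket returns an XML `ListBucketResult` document when its root URL is requested, and with `?list-type=2` a truncated page carries a `NextContinuationToken` to pass back as `continuation-token`. The sketch below is not the s3-bucket-list script linked above; it only shows, under those assumptions, how one page of such a listing could be turned into a URL list for ArchiveBot. The `parse_listing_page` helper, the bucket name, and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# XML namespace used by S3 listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing_page(xml_text, bucket_url):
    """Return (object URLs, continuation token or None) for one listing page."""
    root = ET.fromstring(xml_text)
    base = bucket_url.rstrip("/") + "/"
    urls = [
        base + quote(key.text)  # percent-encode spaces etc., keep "/"
        for key in root.findall("s3:Contents/s3:Key", S3_NS)
        if key.text
    ]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    # Only report a token when the server says more pages follow.
    return urls, (token if truncated == "true" else None)


# Made-up response for a hypothetical bucket.
sample_page = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <IsTruncated>true</IsTruncated>
  <NextContinuationToken>abc123</NextContinuationToken>
  <Contents><Key>builds/app 1.0.zip</Key></Contents>
  <Contents><Key>readme.txt</Key></Contents>
</ListBucketResult>"""

urls, token = parse_listing_page(sample_page, "https://example-bucket.s3.amazonaws.com")
print("\n".join(urls))
print("next token:", token)
```

A full lister would fetch the next page with the returned token until none is reported, then write all URLs to a text file for an ArchiveBot job.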
12:53:43etnguyen03 quits [Client Quit]
13:25:33etnguyen03 (etnguyen03) joins
13:32:28Gadelhas56 quits [Ping timeout: 255 seconds]
13:49:10etnguyen03 quits [Client Quit]
14:23:03loug42 quits [Client Quit]
14:23:15loug42 joins
14:28:49<h2ibot>Jarshua edited List of websites excluded from the Wayback Machine (+26, added woke-world.com): https://wiki.archiveteam.org/?diff=53175&oldid=53165
14:28:50<h2ibot>Censuro edited Talk:URLTeam (+94, /* Shouldn't archive.today be considered a URL…): https://wiki.archiveteam.org/?diff=53176&oldid=51920
14:28:51<h2ibot>QuieselWusul edited Discord (+64, added roblox discord server aggregator): https://wiki.archiveteam.org/?diff=53177&oldid=52307
14:29:49<h2ibot>Taka edited Deathwatch (+243, /* 2024 */ Added Akiba Souken): https://wiki.archiveteam.org/?diff=53178&oldid=53106
14:31:59Gadelhas56 joins
15:00:55<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=53179&oldid=53175
15:15:51tzt (tzt) joins
15:15:55shgaqnyrjp_ (shgaqnyrjp) joins
15:17:23shgaqnyrjp quits [Ping timeout: 260 seconds]
15:20:19xarph_ quits [Ping timeout: 272 seconds]
15:20:23xarph joins
15:50:37midou quits [Ping timeout: 255 seconds]
16:04:59BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
16:25:57midou joins
16:31:00IDK (IDK) joins
16:31:34midou quits [Ping timeout: 255 seconds]
16:35:16midou joins
17:05:16BearFortress joins
17:15:31StarletCharlotte quits [Remote host closed the connection]
17:17:41VerifiedJ9 quits [Remote host closed the connection]
17:18:24VerifiedJ9 (VerifiedJ) joins
17:49:36<eggdrop>[remind] JAA: Check what 4iow663ym6arp9cfjvd0x1ka1 extracted exactly.
18:52:12shgaqnyrjp_ is now known as shgaqnyrjp
19:08:27StarletCharlotte joins
19:09:48<StarletCharlotte>So I'm looking at the job for https://transfer.archivete.am/f45an/2024-08-03_buildkit.s3.amazonaws.com.txt, and it looks like it's missing quite a bit because of connection closed errors, despite the fact that those URLs are working. Is that... Normal? Is that going to be fixed later?
19:09:48<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/f45an/2024-08-03_buildkit.s3.amazonaws.com.txt
19:22:52<thuban>StarletCharlotte: it's a known issue with concurrency (https://github.com/ArchiveTeam/wpull/issues/397, can manifest as "connection closed" as well as "read timeout")
19:23:30<thuban>i'll keep an eye on the job and crank it back down to 1 in time to retry the failed files
19:23:43<StarletCharlotte>Sounds good.
19:44:12nicolas17 joins
19:55:31etnguyen03 (etnguyen03) joins
19:58:59tzt quits [Ping timeout: 272 seconds]
20:13:13DogsRNice joins
20:17:26<katia>do/should we #down-the-tube youtube videos on wikipedia?
20:21:39shgaqnyrjp quits [Ping timeout: 260 seconds]
20:26:31Island joins
20:27:29yarrow quits [Read error: Connection reset by peer]
20:32:18yarrow (yarrow) joins
20:35:38jacksonchen666 (jacksonchen666) joins
21:37:50shgaqnyrjp (shgaqnyrjp) joins
21:44:17Blu joins
22:30:40michaelblob (michaelblob) joins
22:30:59Blu quits [Ping timeout: 272 seconds]
22:40:04IDK quits [Client Quit]
22:53:39SF quits [Remote host closed the connection]
22:57:15BlueMaxima joins
22:59:37that_lurker quits [Remote host closed the connection]
23:01:53that_lurker joins
23:08:21StarletCharlotte quits [Ping timeout: 272 seconds]
23:20:06SF joins
23:20:36SF quits [Remote host closed the connection]
23:29:33SF joins
23:30:31jacksonchen666 quits [Client Quit]
23:42:13lennier2 quits [Ping timeout: 255 seconds]
23:46:43etnguyen03 quits [Client Quit]
23:59:03loug42 quits [Client Quit]