00:05:27TheTechRobo is now known as OriginalUsername
00:05:34OriginalUsername is now known as TheTechRobo
00:08:38<pabs>aww, one site I wanted to crawl has some links only in comments :(
00:11:49eggdrop quits [Client Quit]
00:13:08eggdrop (eggdrop) joins
00:16:33<pabs>project10: I definitely think AT needs a system for feeding links discovered by #archivebot, #wikibot, #// and other places to different projects. imgur links for eg usually 429 in AB. the mailman/bugzilla/codearchiver/SWH/wikibot/etc projects could also use those auto-discovery services
00:19:02<project10>especially with (seemingly?) longer-running projects now like imgur, reddit, telegram. I assume things like imgur discovered through #// are sent via backfeed to the imgur queue in that case?
00:19:27<nicolas17>I *think* #// and other projects do feed into imgur, but not archivebot
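The routing pabs describes (send each discovered URL to the project that can actually handle it) could look roughly like the sketch below. This is purely hypothetical: the ROUTES table, the function names and the idea of matching on host suffix are illustrative assumptions, not how #// backfeed or any existing AT tooling is implemented.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlsplit

# Hypothetical routing table: host suffix -> name of the project queue.
ROUTES = {
    "imgur.com": "imgur",
    "reddit.com": "reddit",
    "t.me": "telegram",
}


def route(url: str) -> Optional[str]:
    """Return the project a URL should go to, or None if no route matches."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    for suffix, project in ROUTES.items():
        # Match the domain itself or any subdomain, but not e.g. "notimgur.com".
        if host == suffix or host.endswith("." + suffix):
            return project
    return None


def group_by_project(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket discovered URLs per project; unrouted URLs are dropped."""
    groups: Dict[str, List[str]] = {}
    for url in urls:
        project = route(url)
        if project is not None:
            groups.setdefault(project, []).append(url)
    return groups
```

Matching on a suffix boundary (".imgur.com") keeps look-alike domains out of a project's queue; anything unrouted would presumably stay with whatever job discovered it.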
00:20:07<nulldata>https://sh.itjust.works/post/4842435
00:20:47BearFortress joins
00:20:55<project10>so what, it was based on the metro novels? or just a metro-like idea?
00:21:26eggdrop quits [Ping timeout: 252 seconds]
00:25:19eggdrop (eggdrop) joins
00:45:57KoalaFritto joins
00:46:48KoalaFritto30 joins
00:47:50etnguyen03 quits [Ping timeout: 252 seconds]
00:47:50KoalaFritto30 quits [Remote host closed the connection]
00:51:05etnguyen03 (etnguyen03) joins
00:51:17KoalaFritto quits [Ping timeout: 265 seconds]
01:14:00etnguyen03 quits [Ping timeout: 265 seconds]
01:15:02etnguyen03 (etnguyen03) joins
01:26:38<h2ibot>PaulWise edited Bugzilla (+785, add bugzilla-url-list by JAA strategy): https://wiki.archiveteam.org/?diff=50756&oldid=50599
01:28:41eggdrop quits [Client Quit]
01:30:11etnguyen03 quits [Ping timeout: 252 seconds]
01:31:51eggdrop (eggdrop) joins
01:51:22eggdrop quits [Client Quit]
01:55:09eggdrop (eggdrop) joins
02:42:29<thuban>arkiver: i've checked periodically, but i still just get redirects to the shutdown notice. plcp might know more
02:56:29etnguyen03 (etnguyen03) joins
02:58:56<h2ibot>DigitalDragon edited NewsGrabber (+18): https://wiki.archiveteam.org/?diff=50757&oldid=50706
02:59:38<fireonlive>that works
03:02:46<project10>does AB have a max size per fetched URL? I see the debian.org/releases job fetching netinst ISOs but no others, I assume size limit at play?
03:04:43<pabs>hmm, didn't mean to fetch those
03:22:06krvme joins
03:25:08decagon__ quits [Ping timeout: 252 seconds]
03:34:04<thuban>archivebot jobs for katapult are all done; i will grab the meta files and extract srcset components when they get uploaded (probably tomorrow)
03:42:23dumbgoy quits [Ping timeout: 265 seconds]
03:50:58<DogsRNice>what does ab do with 429 errors?
03:53:09<DogsRNice>im noticing on the empire minecraft job that its not getting imgur links and some of them arent in the wbm, the rest were grabbed by the imgur project already
04:06:33kiryu quits [Ping timeout: 265 seconds]
04:07:31<pokechu22>It retries them twice and then dismisses them, but imgur will never succeed with AB - you'll need to download the meta-warc and send a list of them to the imgur project
04:08:18kiryu (kiryu) joins
04:11:21<DogsRNice>not really sure how to do that lol
04:11:53<DogsRNice>kind of sounds like something that could be automated (not that i know how to do that either)
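pokechu22's suggestion (pull the imgur URLs out of the job's meta-WARC and hand the list to the imgur project) can be sketched roughly like this. It assumes the meta-WARC is a gzip-compressed .warc.gz and that the failed URLs appear somewhere in its text records (for example the crawl log); the regex, the function names and the trailing-punctuation handling are illustrative guesses, and how the resulting list gets submitted to the imgur project is not shown.

```python
import gzip
import re
import sys
from typing import Iterable, List

# Hypothetical pattern: http(s) URLs on imgur.com or any subdomain (i.imgur.com, ...).
IMGUR_RE = re.compile(rb"https?://(?:[a-z0-9-]+\.)*imgur\.com/[^\s\"'<>]*", re.IGNORECASE)


def extract_imgur_urls(lines: Iterable[bytes]) -> List[str]:
    """Return unique imgur URLs found in the given byte lines, in first-seen order."""
    seen = {}  # dict preserves insertion order, so it doubles as an ordered set
    for line in lines:
        for match in IMGUR_RE.finditer(line):
            url = match.group(0).decode("utf-8", errors="replace")
            # Drop punctuation that often trails a URL in logs or prose.
            url = url.rstrip(".,);")
            seen.setdefault(url, None)
    return list(seen)


def main(path: str) -> None:
    # gzip.open handles multi-member gzip files, so per-record WARC
    # compression is read transparently; iterating streams it line by line.
    with gzip.open(path, "rb") as f:
        for url in extract_imgur_urls(f):
            print(url)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python extract_imgur.py job-meta.warc.gz > imgur-urls.txt` (file names hypothetical); the output is one URL per line, duplicates removed.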
04:15:15etnguyen03 quits [Ping timeout: 265 seconds]
04:21:46DogsRNice quits [Read error: Connection reset by peer]
04:22:36etnguyen03 (etnguyen03) joins
04:33:34kiryu quits [Remote host closed the connection]
04:35:07kiryu joins
04:35:07kiryu quits [Changing host]
04:35:07kiryu (kiryu) joins
04:51:09etnguyen03 quits [Client Quit]
05:57:14BlueMaxima quits [Read error: Connection reset by peer]
06:22:26Dango360 quits [Read error: Connection reset by peer]
06:24:46nicolas17 quits [Client Quit]
06:51:27railen63 quits [Remote host closed the connection]
07:00:08nfriedly quits [Remote host closed the connection]
07:02:31BigBrain_ quits [Ping timeout: 245 seconds]
07:02:51Arcorann (Arcorann) joins
07:03:18Arcorann quits [Remote host closed the connection]
07:05:03Unholy236131661808515 quits [Remote host closed the connection]
07:07:10Unholy236131661808515 (Unholy2361) joins
07:13:56nulldata quits [Ping timeout: 252 seconds]
07:15:58greg joins
07:16:59nulldata (nulldata) joins
07:21:02greg quits [Remote host closed the connection]
07:24:30Arcorann (Arcorann) joins
07:32:04BigBrain_ (bigbrain) joins
08:04:09Shampoo2140 quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
08:04:35Shampoo2140 joins
08:05:48nulldata quits [Ping timeout: 265 seconds]
08:06:16Shampoo2140 quits [Client Quit]
08:07:57Shampoo2140 joins
08:08:29nulldata (nulldata) joins
08:47:51qw3rty joins
09:03:23nulldata quits [Ping timeout: 252 seconds]
09:06:31nulldata (nulldata) joins
09:22:56BigBrain_ quits [Ping timeout: 245 seconds]
09:25:28BigBrain_ (bigbrain) joins
09:35:50gfhh quits [Ping timeout: 252 seconds]
09:41:12bilboed quits [Quit: The Lounge - https://thelounge.chat]
09:41:32bilboed joins
09:45:37Shampoo2140 quits [Client Quit]
09:47:22Shampoo2140 joins
10:02:00igloo22225 quits [Quit: The Lounge - https://thelounge.chat]
10:03:17igloo22225 (igloo22225) joins
10:08:31nfriedly joins
10:14:29Shampoo2140 quits [Client Quit]
10:16:13Shampoo2140 joins
11:01:17icedice (icedice) joins
11:03:26Shampoo2140 quits [Client Quit]
11:03:44Shampoo2140 joins
11:04:55Shampoo2140 quits [Client Quit]
11:06:37Shampoo2140 joins
11:35:22gfhh joins
11:59:03icedice quits [Client Quit]
12:07:59Carnildo_again joins
12:08:39Carnildo quits [Read error: Connection reset by peer]
12:21:00Island quits [Ping timeout: 265 seconds]
12:28:00Megame (Megame) joins
12:38:14etnguyen03 (etnguyen03) joins
12:52:49icedice (icedice) joins
13:11:59gfhh quits [Ping timeout: 252 seconds]
13:12:31gfhh joins
13:29:17JohnnyJ joins
13:41:14Arcorann quits [Ping timeout: 265 seconds]
13:44:30andrew quits [Client Quit]
13:45:32etnguyen03 quits [Ping timeout: 252 seconds]
13:47:08andrew (andrew) joins
13:50:53LeGoupil joins
14:12:45PredatorIWD_ joins
14:16:02PredatorIWD quits [Ping timeout: 265 seconds]
14:17:10<h2ibot>JustAnotherArchivist edited The WARC Ecosystem (+304, /* Tools */ Add ArchiveBox): https://wiki.archiveteam.org/?diff=50758&oldid=50711
14:39:24Island joins
14:44:57<anarcat>not sure if this is -ot but we might need a watch on bandcamp https://teddydd.me/2023/backup-your-bandcamp-music/
14:49:00LeGoupil quits [Client Quit]
15:03:53HP_Archivist quits [Ping timeout: 265 seconds]
15:05:58etnguyen03 (etnguyen03) joins
15:13:52Megame quits [Client Quit]
15:13:55icedice quits [Client Quit]
15:18:51<TheTechRobo>Wonder if archivebox could use wget-AT
15:26:21railen63 joins
15:35:29kiryu quits [Remote host closed the connection]
15:36:46kiryu joins
15:36:46kiryu quits [Changing host]
15:36:46kiryu (kiryu) joins
15:41:36icedice (icedice) joins
15:42:22<icedice><anarcat> not sure if this is -ot but we might need a watch on bandcamp https://teddydd.me/2023/backup-your-bandcamp-music/
15:42:31<icedice>Reminds me of Amazon Prime Video's bs
15:43:41<fireonlive>TheTechRobo: if it did i would be so happy
15:59:48Dango360 (Dango360) joins
16:08:38kiryu quits [Remote host closed the connection]
16:10:24dumbgoy joins
16:17:09<qq44|m>how can I mirror a site with wget-lua and include all page requisites?
16:17:23<qq44|m>--recursive, --mirror, and --page-requisites doesn't seem to work
16:20:39<imer>qq44|m: are you using a lua script? (I dont know the solution, I assume wget-lua behaved like regular wget, but with more scripting)
16:21:28<qq44|m>imer: im not using a script, just plain old wgetlua
16:22:40<qq44|m>grab-site seems to work properly with page requisites, but wget doesn't seem to pull them with recursive downloads
16:37:39etnguyen03 quits [Ping timeout: 265 seconds]
17:01:06andrew6 (andrew) joins
17:02:17ferro joins
17:03:16andrew quits [Ping timeout: 265 seconds]
17:03:23andrew6 is now known as andrew
17:03:35ferro quits [Remote host closed the connection]
18:00:57<h2ibot>FireonLive edited Current Projects (-163, Remove NG -- superseded by URLs): https://wiki.archiveteam.org/?diff=50759&oldid=50685
18:03:41xarph quits [Ping timeout: 265 seconds]
18:27:39<pokechu22>Looking at https://web.archive.org/web/20230000000000*/https://e.orange.fr/error404.html some captures show in blue and some show in orange - I'm pretty sure https://e.orange.fr/error404.html always returns 404, so is there a reason for them being blue? (that page has a ton of captures because any personal page that had a 404 or didn't exist would *redirect* there, and
18:27:42<pokechu22>archivebot doesn't deduplicate redirect targets)
18:28:53<qq44|m>pokechu22: perhaps the server didn't return 404 error code in the headers, and instead returned 200 but said 404 on the page?
18:30:02<pokechu22>Picking a snapshot from april 2 that shows as blue (https://web.archive.org/web/20230402090744/https://e.orange.fr/error404.html) still shows a 404 in my developer tools when loading the page
18:46:30<@JAA>I've found the colours in the calendar to be wildly inaccurate all the time.
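Since the calendar colours can't be trusted, one way to check what was actually recorded is to ask the Wayback Machine's CDX API for each capture's status code. A rough sketch, assuming the public endpoint at web.archive.org/cdx/search/cdx and its output=json format (first row holds the field names); the function names are illustrative, and non-numeric values such as "-" may show up for some record types. It only reports what the index says, not why the calendar renders a capture blue or orange.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import List

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def fetch_cdx_rows(url: str, year: str) -> List[List[str]]:
    """Fetch timestamp/statuscode rows for captures of `url` within `year`."""
    params = urllib.parse.urlencode({
        "url": url,
        "from": year,
        "to": year,
        "output": "json",
        "fl": "timestamp,statuscode",
    })
    with urllib.request.urlopen(f"{CDX_ENDPOINT}?{params}") as resp:
        body = resp.read().decode("utf-8")
    # Treat an empty body the same as an empty JSON result.
    return json.loads(body) if body.strip() else []


def count_statuses(rows: List[List[str]]) -> Counter:
    """Tally status codes; the first row is the header naming each field."""
    if not rows:
        return Counter()
    header, *records = rows
    idx = header.index("statuscode")
    return Counter(rec[idx] for rec in records)
```

For the page above, `count_statuses(fetch_cdx_rows("e.orange.fr/error404.html", "2023"))` would show whether any 2023 capture was indexed with something other than 404.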
18:50:01<fireonlive>calendars, the bane of our existence
18:51:15<@JAA>In other news, my FuzzyMemories.TV grab-site crawl finished.
18:52:05petrichor quits [Quit: ZNC 1.8.2 - https://znc.in]
18:52:08<@JAA>Three /watch/ URLs failed, otherwise it looks fine.
18:52:26<imer>nice
18:53:02<@JAA>4232 video files from 4761 attempted IDs
18:53:13petrichor (petrichor) joins
18:53:51petrichor quits [Client Quit]
18:54:50<@JAA>Random example of a video where the file is a 404: http://www.fuzzymemories.tv/watch/2276/kiddieland-amusement-park-commercial-1-1990/
18:55:04petrichor (petrichor) joins
18:55:17<fireonlive>awesome :)
18:55:28jacksonchen666 (jacksonchen666) joins
18:55:36petrichor quits [Client Quit]
18:55:46<@JAA>Total WARC size is 107 GiB. It'll be on its slow way to IA soon.
18:57:03petrichor (petrichor) joins
19:07:03jacksonchen666 quits [Client Quit]
19:07:22etnguyen03 (etnguyen03) joins
19:14:42KoalaFritto joins
19:20:19Island_ joins
19:22:57Island quits [Ping timeout: 265 seconds]
19:23:56erkinalp joins
19:24:57andrew quits [Client Quit]
19:28:05andrew (andrew) joins
19:31:16Carnildo_again is now known as Carnildo
19:32:51leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in]
19:33:14leo60228 (leo60228) joins
19:43:46erkinalp quits [Remote host closed the connection]
19:56:17<h2ibot>JustAnotherArchivist created The Museum of Classic Chicago Television (+611, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=The%20Museum%20of%20Classic%20Chicago%20Television
19:57:18<h2ibot>JustAnotherArchivist created FuzzyMemories.TV (+54, Redirected page to [[The Museum of Classic…): https://wiki.archiveteam.org/?title=FuzzyMemories.TV
19:57:19<h2ibot>JustAnotherArchivist created FuzzyMemoriesTV (+54, Redirected page to [[The Museum of Classic…): https://wiki.archiveteam.org/?title=FuzzyMemoriesTV
20:06:57katocala quits [Remote host closed the connection]
20:08:50givemeawhisper joins
20:09:13givemeawhisper quits [Remote host closed the connection]
20:54:10efeafewa quits [Remote host closed the connection]
21:06:21efeafewa joins
21:09:11shinji257 quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
21:12:37<h2ibot>FireonLive edited Reddit (+129, wording fix?): https://wiki.archiveteam.org/?diff=50763&oldid=50722
21:13:18<flashfire42>https://wiki.archiveteam.org/index.php/ArchiveBot/2019_Australian_federal_election a question do pages like this still work as originally intended?
21:13:42<pokechu22>https://wiki.archiveteam.org/index.php/Special:Contributions/HadeanEon makes me think no
21:13:52<@JAA>No
21:14:31<flashfire42>Alright then. I will have to use the viewer to work out what to put in and what not then. All good. Still good sources of things to throw in.
21:20:01nicolas17 joins
21:41:59DogsRNice joins
21:45:39KoalaFritto quits [Remote host closed the connection]
21:50:15shinji257 (shinji257) joins
21:57:08etnguyen03 quits [Ping timeout: 265 seconds]
22:11:52<h2ibot>JustAnotherArchivist edited The Museum of Classic Chicago Television (+597, Add known archives): https://wiki.archiveteam.org/?diff=50764&oldid=50760
22:17:54BlueMaxima joins
22:39:46etnguyen03 (etnguyen03) joins
22:57:17<fireonlive>-+rss:#hackernews- Microsoft to kill off third-party printer drivers in Windows: https://www.theregister.com/2023/09/11/go_native_or_go_home/ https://news.ycombinator.com/item?id=37473628
22:57:19<fireonlive>"To be clear, the end of servicing applies to drivers provided via Windows Update. Manufacturers will, according to Microsoft, "need to provide customers with an alternative means to download and install those printer drivers." Legacy v3 and v4 Windows printer drivers are facing the end of servicing ax."
23:25:07<nicolas17>I have never seen 3rd party drivers updating via Windows Update
23:26:47<fireonlive>looks like this 'Mopria' has existed for a while and more newer printers are using it?
23:38:26nicolas17 quits [Ping timeout: 252 seconds]
23:42:41nicolas17 joins