00:12:01nepeat quits [Ping timeout: 272 seconds]
00:13:32nepeat (nepeat) joins
00:17:21ATinySpaceMarine quits [Client Quit]
00:20:15nepeat quits [Ping timeout: 272 seconds]
00:21:06etnguyen03 quits [Client Quit]
00:23:30nepeat (nepeat) joins
00:32:33<klea>btw, JAA if you give dir-to-ia a folder with a file that has no content it will fail as dirty because IA doesn't allow files with no content-length
00:33:09<klea>A request of the requested method PUT requires a valid Content-length.
00:38:54<katia>We archiving empty files now ig
00:39:54<klea>no, but i noticed that it marked it as dirty when i accidentally left an empty file from test.
00:39:54<@JAA>Reasonable
00:40:13<@JAA>I mean, I could make it fail locally instead.
00:40:27<klea>you could make it not mark it as dirty, or is that a bad idea?
00:40:35<katia>Doesn’t seem better
00:40:39<katia>Fix your empty files
00:41:01<klea>yeah, that is also a good idea
00:41:13<klea>and having dir-to-ia add a marker when it has successfully finished uploading would be possible?
00:41:17<@JAA>This might actually be a bug in ia-upload-stream though. You can upload empty files to IA.
00:41:23<klea>oh.
00:41:37<katia>klea: have a patch?
00:41:48<klea>katia: i can create patch if JAA wants.
00:41:53<@JAA>A significant percentage of DPoS items have an empty .tar.
00:42:10<@JAA>Random example: https://archive.org/download/archiveteam_urls_20260125001525_d9e94fa7
00:42:17<klea>JAA: we can't see.
00:42:22<klea>oh wait what
00:42:27<@JAA>Of course you can.
00:42:52<klea>huh
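A minimal sketch of the local check JAA mentions at 00:40:13, assuming the goal is only to list zero-byte files before an upload is attempted. `find_empty_files` is a hypothetical helper and not part of dir-to-ia or ia-upload-stream. Whether such files should be rejected at all is left open above, since IA itself accepts empty files.

```python
import os


def find_empty_files(root):
    """Return sorted paths (relative to root) of zero-byte regular files."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files count; an unreadable or vanished entry
            # would raise OSError here, which is left to the caller.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(os.path.relpath(path, root))
    return sorted(empty)
```

Running it over the directory before handing it to the uploader would surface a leftover empty test file like the one klea describes.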
00:45:26<c3manu>bookdown.org is still running, but by now i queued all the books listed on https://www.bookdown.org/home/archive/ (i don't think i missed any, i went through them manually)
00:46:07<c3manu>i mean the ones that are hosted on domains other than bookdown.org
00:59:01ATinySpaceMarine joins
01:03:47Shard116 (Shard) joins
01:04:35Shard11 quits [Ping timeout: 272 seconds]
01:04:35Shard116 is now known as Shard11
01:04:59etnguyen03 (etnguyen03) joins
01:20:01midou joins
01:24:11<klea>i wonder why JAA chose the hextable approach rather than strtol or strtoul: https://stackoverflow.com/questions/10324/convert-a-hexadecimal-string-to-an-integer-efficiently-in-c
01:24:24<klea>from #archiveteam-ot (i was told that was off topic there)
01:24:33Sk1d quits [Read error: Connection reset by peer]
01:25:10<@JAA>I didn't write that code. But I suspect a static array lookup is faster than a function call.
01:25:35<steering>especially when that function call is gonna do a bunch of math on it
01:25:40<klea>oh.
01:25:50<steering>(and a ton of branches)
01:26:00<steering>https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strtol_l.c;h=ac53312ba87f48a86ca7a6e494ea8ae0838e0b6e;hb=HEAD#l236
01:26:13<@JAA>Anubis--
01:26:13<eggdrop>[karma] 'Anubis' now has -14 karma!
01:26:15<klea>anubis--
01:26:15<eggdrop>[karma] 'anubis' now has -15 karma!
01:27:03<klea>i was too far away from keyboard to type an uppercase A (also was going to type it with the wrong hand)
01:27:23midou quits [Ping timeout: 272 seconds]
01:27:25<@JAA>Yeah, fun function
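A rough sketch of the lookup-table idea discussed above, written in Python only to show its shape. The speed argument made by JAA and steering is about C, where indexing a static array avoids the branching inside strtol, and it does not carry over to Python. `HEX_TABLE` and `hex_to_int` are hypothetical names and are not taken from the code klea linked.

```python
# 256-entry table: byte value -> nibble value, or -1 for non-hex bytes.
HEX_TABLE = [-1] * 256
for value, byte in enumerate(b"0123456789abcdef"):
    HEX_TABLE[byte] = value
for value, byte in enumerate(b"ABCDEF", start=10):
    HEX_TABLE[byte] = value


def hex_to_int(text):
    """Convert a hex string (no 0x prefix) to int via table lookups."""
    if not text:
        raise ValueError("empty hex string")
    result = 0
    # encode() raises UnicodeEncodeError (a ValueError) on non-ASCII input.
    for byte in text.encode("ascii"):
        nibble = HEX_TABLE[byte]
        if nibble < 0:
            raise ValueError(f"invalid hex digit: {chr(byte)!r}")
        # Shift the accumulated value one nibble left and append the digit.
        result = (result << 4) | nibble
    return result
```

Each input byte costs one table index and one sign check, with no base detection, whitespace skipping, or locale handling.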
01:30:19<PC>is #photosucket dead? i've got some image URLs that aren't saving (and since it may be a free account, they might not be around forever, but i think they'll last a few months), but the channel seems pretty quiet
01:31:03<@JAA>Channels for inactive projects tend to be quiet. It's the right channel.
01:32:14<PC>nods. so inactive currently? wondering what the best way to get those images on the WBM would be. (i'm also curious if that'll fix them where they're embedded, since i've got a livejournal post with a bunch of them that are just showing as broken images)
01:33:45<klea>JAA: how bad of an idea is running this script in a loop somewhere as long as i won't have dir-to-ia folders in the dir-to-ia directory that will be getting new files in an incremental manner? <https://transfer.archivete.am/inline/vAP6T/clean-dir-to-ia.sh>
01:40:11<@JAA>klea: I can't be bothered to think about the interactions with the current version of dir-to-ia, but regardless, you'd be relying on implementation details that might change at any time.
01:40:34<klea>could you expose things that aren't implementation details that won't change?
01:40:49<@JAA>Those are in the config file.
01:41:00<@JAA>And in the --help text.
01:41:11PC quits [Quit: PC]
01:41:33<klea>that doesn't allow me to hook into jobs failing, or jobs completing successfully
01:41:53<klea>i wanted to delete just the .dir-to-ia metadata files and then the directory they're in.
01:41:53<@JAA>Correct
01:42:48PC joins
01:58:44Wohlstand quits [Quit: Wohlstand]
02:12:19<@Fusl>klea justauser: update just in case youre wondering, jason pointed me to a discord server with people that archive old gaming magazines and similar stuff, im in talks with them now
02:12:33<klea>ah
02:12:56<klea>Fusl: if you could, and the discord seems to not be too small, could you consider giving an invite to #discard for archival?
02:15:41<@Fusl>klea: sure, just did
02:15:47<klea>thanks
02:15:53<@Fusl>ofc!
02:20:18PC quits [Ping timeout: 256 seconds]
02:33:35midou joins
02:42:01<kiska>Fusl as someone who works in freight, do not send those magazines internationally. They will experience the most horrific things that happen when a worker has to move up to 1.2k cartons per hour
02:51:46<nicolas17>many years ago I had a subscription to a weekly electronics magazine, which came with components
02:52:25<nicolas17>together in a little bag attached to the magazine
02:52:32<nicolas17>all the way from Spain to Argentina
02:52:35<nicolas17>there wasn't a single issue where the integrated circuits came intact, I always had to un-bend the pins
02:54:04<klea>i wonder what company did that.
02:54:51<nicolas17>klea: well a few years ago I scanned every single one of the 1200 pages so here you go https://archive.org/details/fg-electronica-modular/
02:55:01<klea>huh
02:55:04<klea>how did you do those scans?
02:55:30<nicolas17>flatbed scanner
02:55:53<nicolas17>the pages were pretty much designed to be detached and put into a binder
02:56:01<klea>oh
02:56:50<that_lurker>IA has some interesting magazines stored
02:57:04<klea>btw, how did you load it to archive?
02:57:28<klea>oh simply electronica-modular-NN_images.zip files?
02:57:31<nicolas17>yes
02:57:36<klea>nice
02:58:00<nicolas17>the format is simply a zip with image files, which should have filenames that sort correctly
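A minimal sketch of building such a zip, assuming the scans are already in page order and that zero-padded numeric names are an acceptable way to make the filenames sort correctly. `build_images_zip` is a hypothetical helper, and the four-digit minimum padding is an arbitrary choice, not something IA requires.

```python
import os
import zipfile


def build_images_zip(image_paths, zip_path):
    """Write images to zip_path under zero-padded names; return the names."""
    # Pad to at least 4 digits so lexical order matches page order.
    width = max(4, len(str(len(image_paths))))
    names = []
    # ZIP_STORED: image formats like JPEG are already compressed.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for index, path in enumerate(image_paths, start=1):
            ext = os.path.splitext(path)[1].lower()
            arcname = f"{index:0{width}d}{ext}"
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

The output path would follow the `_images.zip` naming klea quotes above, for example `electronica-modular-01_images.zip`.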
02:58:20<that_lurker>nicolas17: you should add the scanner to the item metadata :-)
02:58:37<@Fusl>nicolas17: flatbed scanner is probably also how im going to archive mine if i cant find anyone who will do it for me but its going to be a lot of work to go through all of them :/
02:59:42<klea>nicolas17 apparently went through 1200 pages :p
02:59:42<nicolas17>that_lurker: it's unclear to me what the 'scanner' metadata property is supposed to mean
03:00:18<klea><https://archive.org/developers/metadata-schema/index.html#scanner>
03:00:43<nicolas17>given that the uploader likes to stick "Internet Archive HTML5 Uploader 1.7.0" there
03:01:18<klea>i suppose it should be the physical flatbed scanner that you used.
03:07:20Wohlstand1 (Wohlstand) joins
03:09:48Wohlstand1 is now known as Wohlstand
03:27:44oxtyped quits [Ping timeout: 256 seconds]
03:29:18<h2ibot>Klea edited List of websites excluded from the Wayback Machine/Partial exclusions (+246, Add…): https://wiki.archiveteam.org/?diff=60331&oldid=60321
03:35:04<cruller>That reminds me, is [[ArchiveCorps]] still alive?
03:36:46<klea>given the last actual edit with actual changes that aren't grammatical, automated changes, or changes in style was in 2015-08-30, i'd say no. <https://wiki.archiveteam.org/index.php?title=ArchiveCorps&oldid=24132>
03:36:50oxtyped joins
03:37:07Webuser516269 quits [Quit: Ooops, wrong browser tab.]
03:37:13<klea>i should've used the Special: revision link format.
03:39:39DogsRNice quits [Read error: Connection reset by peer]
03:44:23<cruller>Their site was accessible as of last August, but was not being actively updated. https://web.archive.org/web/20250101000000*/http://www.archivecorps.org/
03:47:44<klea>domain got parked 2025-09-08 so likely expired. <view-source:https://web.archive.org/web/20250908154038if_/http://www.archivecorps.org/>
03:47:57<klea>smh
03:48:01<klea>(*) <https://web.archive.org/web/20250908154038if_/http://www.archivecorps.org/>
03:53:11Webuser692881 joins
03:53:22Webuser692881 quits [Client Quit]
04:03:52Wohlstand quits [Client Quit]
04:20:03etnguyen03 quits [Quit: Konversation terminated!]
04:25:48etnguyen03 (etnguyen03) joins
04:38:05etnguyen03 quits [Client Quit]
04:41:55etnguyen03 (etnguyen03) joins
04:47:04midou quits [Ping timeout: 256 seconds]
04:48:31Wohlstand1 (Wohlstand) joins
04:50:54Wohlstand1 is now known as Wohlstand
05:04:37n9nes quits [Ping timeout: 272 seconds]
05:04:40n9nes joins
05:05:33gosc joins
05:09:10etnguyen03 quits [Remote host closed the connection]
05:12:01lukash984 joins
05:14:52<@Fusl>kiska: yeah i dont intend to send them internationally and much rather prefer to drive them somewhere tbh
05:16:41midou joins
05:23:58Guest58 quits [Read error: Connection reset by peer]
05:55:55oxtyped quits [Ping timeout: 272 seconds]
06:00:44lukash984 quits [Ping timeout: 256 seconds]
06:03:26oxtyped joins
06:20:14nexussfan quits [Quit: Konversation terminated!]
06:45:01Wohlstand quits [Client Quit]
06:51:17oxtyped quits [Read error: Connection reset by peer]
06:57:00Kotomind joins
07:00:20oxtyped joins
07:30:44Island quits [Read error: Connection reset by peer]
07:39:36gosc quits [Client Quit]
07:41:14ThetaDev_ joins
07:42:10ThetaDev quits [Ping timeout: 256 seconds]
08:21:35midou quits [Ping timeout: 272 seconds]
08:26:58Webuser908883 joins
08:38:58cyanbox quits [Read error: Connection reset by peer]
08:55:28Sk1d joins
09:01:52midou joins
09:07:11Sk1d quits [Ping timeout: 272 seconds]
09:19:59Dada joins
09:47:58APOLLO03 quits [Ping timeout: 256 seconds]
09:48:10Dada quits [Remote host closed the connection]
09:49:36Dada joins
09:53:56nyakase quits [Quit: @ERROR: max connections (-1) reached -- try again later]
09:59:20APOLLO03 joins
10:04:02nyakase (nyakase) joins
10:10:11Sk1d joins
11:02:19NatTheCat quits [Quit: Ping timeout (120 seconds)]
11:02:44NatTheCat (NatTheCat) joins
11:02:46Sk1d quits [Ping timeout: 256 seconds]
11:17:07evergreen5 quits [Quit: Bye]
11:17:47evergreen5 joins
11:31:46Webuser908883 quits [Quit: Ooops, wrong browser tab.]
11:46:48Sk1d joins
11:49:04chunkynutz609 joins
11:49:14chunkynutz60 quits [Ping timeout: 256 seconds]
11:49:14chunkynutz609 is now known as chunkynutz60
12:00:03Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:01:30PC joins
12:02:48Bleo182600722719623455222 joins
13:10:33Sk1d quits [Read error: Connection reset by peer]
13:20:28pabs quits [Ping timeout: 256 seconds]
13:21:43pabs (pabs) joins
13:43:47Webuser787687 joins
14:37:30<justauser>It'd be neat to do something with https://enigma-dev.org/, but with all the software down here (wiki, forum, bugtracker, etc), some of it custom, I'm afraid to simply let AB loose on it. Any suggestions?
14:37:57Wohlstand (Wohlstand) joins
14:43:50cyanbox joins
14:52:43cyanbox quits [Read error: Connection reset by peer]
14:58:46etnguyen03 (etnguyen03) joins
15:19:57nexussfan (nexussfan) joins
15:24:33aninternettroll quits [Remote host closed the connection]
15:27:02aninternettroll (aninternettroll) joins
15:31:10etnguyen03 quits [Client Quit]
15:38:14etnguyen03 (etnguyen03) joins
15:43:01Radzig quits [Ping timeout: 272 seconds]
15:43:16chrismrtn quits [Quit: Lost terminal]
15:43:17Radzig joins
15:45:26chrismrtn (chrismrtn) joins
15:46:27Hackerpcs quits [Remote host closed the connection]
15:49:06Hackerpcs (Hackerpcs) joins
15:54:00Hackerpcs quits [Remote host closed the connection]
15:54:49Hackerpcs (Hackerpcs) joins
15:58:05Webuser787687 quits [Client Quit]
16:00:27DogsRNice joins
16:04:12Webuser916646 joins
16:15:57Max_G quits [Ping timeout: 272 seconds]
16:17:06Max_G joins
16:27:21PC quits [Ping timeout: 272 seconds]
16:36:15<h2ibot>Klea uploaded File:Wtf-delete.png (…): https://wiki.archiveteam.org/?title=File%3AWtf-delete.png
16:43:17etnguyen03 quits [Client Quit]
16:59:19v01d joins
17:01:25PC joins
17:07:55etnguyen03 (etnguyen03) joins
17:21:05sec^nd quits [Remote host closed the connection]
17:21:25sec^nd (second) joins
17:31:49cyanbox joins
17:33:38Webuser149673 joins
17:36:02PC quits [Ping timeout: 256 seconds]
17:44:11Webuser149673 quits [Client Quit]
17:44:42katia is now known as BouncerServ
17:44:45pseudorizer (pseudorizer) joins
17:46:47BouncerServ is now known as katia
17:56:39pseudorizer quits [Ping timeout: 272 seconds]
18:07:20second (second) joins
18:09:22sec^nd quits [Ping timeout: 256 seconds]
18:09:22second is now known as sec^nd
18:21:11PC joins
18:25:27etnguyen03 quits [Client Quit]
18:36:32Island joins
18:51:14etnguyen03 (etnguyen03) joins
18:51:29<h2ibot>Justauser edited Twitter (+375, /* Archives */ - updates): https://wiki.archiveteam.org/?diff=60333&oldid=59351
19:07:16Kotomind quits [Ping timeout: 256 seconds]
19:07:34<h2ibot>Klea created Tally (+895, Created page with "{{underconstruction}} …): https://wiki.archiveteam.org/?title=Tally
19:20:43driib97 quits [Quit: The Lounge - https://thelounge.chat]
19:24:24driib97 (driib) joins
19:30:19irisfreckles13 joins
19:34:11Max_G quits [Ping timeout: 272 seconds]
19:34:46nyakase quits [Quit: @ERROR: max connections (-1) reached -- try again later]
20:31:49PC quits [Ping timeout: 272 seconds]
20:34:51Max_G joins
21:08:50PC joins
21:24:52<pokechu22>Can someone with experience in Chinese tell if https://mingpaocanada.com/tor/ is specifically shutting down on January 31, or just ceasing publication? The banner at the top says "明報加拿大網站將於2026年1月17日停止更新。即時新聞便會更新直至2026年1月31日" which Google translates to "Ming Pao Canada website will cease updates on January 17, 2026.
21:24:54<pokechu22>Breaking news updates will continue until January 31, 2026." presumably referring to https://www.mingpaocanada.com/realtimenews/tor/list_CANADA_NEW.cfm?m=0 but I don't see anything that specifically references shutdown. I also don't see any equivalent notice on https://mingpaocanada.com/van/ although it has stopped updating
21:27:14<pokechu22>both archivebot jobs seem unlikely to finish before Jan 31
21:28:39Dada quits [Remote host closed the connection]
21:32:01<klea>relating to bento, i went to https://bento.me/selim and got some beeps for exif locations on <https://storage.googleapis.com/creatorspace-public/users%2Fclb3ofthr0006jw08spl04a9x%2FfjlnKqpC4SzuTnWa-IMG_8616%25202.jpg> and <https://storage.googleapis.com/creatorspace-public/users%2Fclb3ofthr0006jw08spl04a9x%2F1kGNU3gSiguyq5gQ-IMG_1958.JPG> which seem to be on a public s3
21:32:01<klea>bucket,, should i ask in AB to get <https://storage.googleapis.com/creatorspace-public/>/
21:32:07<klea>s/\/$/?/
21:33:04<klea>asked in #archivebot .
21:34:08<pokechu22>listing that bucket now
21:36:39<klea>thanks
21:48:41SootBector quits [Remote host closed the connection]
21:49:50SootBector (SootBector) joins
21:51:29Sk1d joins
22:01:56lukash984 joins
22:13:20<PC>klea: can you add this link to that twitter batch from earlier? https://twitter.com/zunkome2/status/1713581981420618067 i'd appreciate it! (i also found a few from that batch that seem to just be 404s now, let me know if that matters)
22:13:20<eggdrop>nitter: https://nitter.net/zunkome2/status/1713581981420618067
22:13:39<klea>PC: from what i know it'd just show that those failed on #jseater
22:14:04<pokechu22>I saw one earlier that I think didn't show up as failed - twitter is too scripty to have real 404s
22:14:09<klea>oh
22:14:11<klea>shitty
22:14:13<klea>aaa
22:14:14<PC>rip
22:14:31<klea>well, it'd just archive the not found page :(
22:15:05<PC>no worries, if it's gone, it's gone
22:16:33<PC>ugh, the annoying thing with one of my links though is that it's been marked as mature (it's a completely SFW pokemon drawing...), which means it shows up as a 404 when logged out... no idea how to get that one, but it explains why it's been breaking on archive.is (my temporary backup option)
22:16:42<PC>i think ghostarchive can handle those because it seems to use a dummy account
22:17:16<PC>(the link: https://twitter.com/maruyaki45/status/1198978911444451328 )
22:17:16<eggdrop>nitter: https://nitter.net/maruyaki45/status/1198978911444451328
22:18:35<PC>i've also got 36 links that don't seem to be marked as mature or anything, but are failing on archive.is for an unknown reason... at less of a risk of getting deleted than some other ones on my list though (i lost one since i first made it a month ago </3) so as long as they can get picked up /somehow/ in a WBM-able way, all will be good
22:23:50<nicolas17>unknownsrc: I'm still in disbelief that sites like linktree actually convince people to give them money
22:56:07<klea>btw, PC i believe outlinks from those urls aren't getting grabbed, if that's what you were interested in.
23:00:09SootBector quits [Remote host closed the connection]
23:01:16SootBector (SootBector) joins
23:02:07<PC>i was hoping to have the image embedded in the page (i have the actual image up on the WBM already, it's just nice to have it visible when loading the tweet)
23:02:10v01d quits [Remote host closed the connection]
23:02:28<PC>(that is, to have the image URL reachable in some way from just the tweet URL)
23:02:33<PC>that's all really
23:02:59<PC>(aside from the tweet text and metadata, obviously)
23:04:05<klea>the images that are loaded as part of jobs are archived.
23:04:08<klea>what aren't is outlinks
23:05:03<klea>like for example: <https://mnbot.very-good-quality-co.de/item/77bac3df-2eba-4ab6-a49b-22a4437eb8c7> has the image that's shown in the screenshot: <https://mnbot.very-good-quality-co.de/item/77bac3df-2eba-4ab6-a49b-22a4437eb8c7> but it hasn't by itself made any query for archival of https://ctccomic.com/comic/498/
23:05:15<klea>and also, the image won't appear on the WBM.
23:05:23<klea>because jsbot isn't approved there.
23:06:16<PC>re: the outlinks, that's fine! i'm gonna handle those separately at some point
23:06:21<klea>oh ok
23:06:49<PC>hmm, that's rough re: the images. i will have likely already gotten that same URL up on the WBM so we can see if it'll show up embedded anyway?
23:08:01<PC>ah, damn, just checked and nope, seems that's grabbing the name=small versions. i could get those up manually if it's always name=small? and hope it embeds when it's up
23:08:30<PC>it's a pain to dig through HTML to find the image url and then check which version of it has been archived just to see what it looked like, otherwise
23:08:48SootBector quits [Remote host closed the connection]
23:09:09<klea>PC: do they appear on the urls in the job on the webpage, iirc there's an api for it
23:09:26<PC>they're under requisites it seems
23:09:49<klea>ok, then i believe it could be automated out.
23:09:55SootBector (SootBector) joins
23:09:57Webuser916646 quits [Quit: Ooops, wrong browser tab.]
23:11:16lennier2_ joins
23:12:41<PC>! that'd be great!
23:13:47etnguyen03 quits [Quit: Konversation terminated!]
23:14:20lennier2 quits [Ping timeout: 256 seconds]
23:14:42<klea>could you tell me the other name?
23:14:43etnguyen03 (etnguyen03) joins
23:14:51<klea>https://pbs.twimg.com/media/DAEJ5VaVoAAwETR?format=jpg&name=small i suppose https://pbs.twimg.com/media/DAEJ5VaVoAAwETR?format=jpg&name=big?
23:14:55<PC>large
23:15:00<klea>thanks
23:15:06<PC>that's the largest size for images that aren't 4096x4096 (those are rare)
23:15:10<PC>i've already saved all of those though!
23:15:14<PC>for all of the links i've got
23:15:54<PC>i've also saved medium ones where those would be the ones popping up when clicking on the image and opening it in a new tab
23:15:57<PC>but not the small ones
23:16:16<@JAA>name=orig is supposed to return the original size.
23:16:28<PC>oh yes! i've not saved those but could if needed
23:17:05<PC>since orig is equivalent to the largest size the image fits in (large, 4096x4096, sometimes medium or 900x900 for smaller images, small is rare since images are rarely that small)
23:17:34<PC>i can come up with a list of the orig variants of the images, if that'd be helpful
23:17:37<klea>oh
23:17:48<klea>i was going to do large only, i guess i'll do two.
23:18:03<PC>those are i think the ones that are being embedded in the generated versions of the twitterarchive ones
23:18:28<PC>orig would catch both large images and ones that are 4096x4096, though i've already gotten those at the 4096x4096 where they exist
23:18:41<PC>and all the smaller ones at large, so those are all up on the WBM
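A minimal sketch of the URL rewriting discussed above, assuming the `name=` query parameter is the only thing that differs between size variants of a pbs.twimg.com media URL. `size_variants` is a hypothetical helper. The default of `large` and `orig` follows klea's "i guess i'll do two", and other values mentioned above (`small`, `medium`, `900x900`, `4096x4096`) can be passed instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def size_variants(url, names=("large", "orig")):
    """Return copies of url with the name= parameter set to each of names."""
    parts = urlsplit(url)
    # Keep every other parameter (such as format=jpg) in its original order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "name"
    ]
    return [
        urlunsplit(parts._replace(query=urlencode(kept + [("name", name)])))
        for name in names
    ]
```

For the example URL at 23:14:51 this yields the `...?format=jpg&name=large` and `...?format=jpg&name=orig` forms, which could then be queued for archival alongside the `name=small` version the job already grabbed.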
23:24:17TunaLobster quits [Quit: So long and thanks for all the fish]
23:25:06TunaLobster joins
23:34:29Wohlstand quits [Quit: Wohlstand]
23:35:46v01d joins
23:38:22Wohlstand1 (Wohlstand) joins
23:40:48Wohlstand1 is now known as Wohlstand