00:03:37<@OrIdow6>VerifiedJ: Nice find, thankfully seems brickshelf is the biggest
00:03:46<@OrIdow6>Died at 52, wonder what it was
00:04:45APOLLO_03 quits [Remote host closed the connection]
00:05:04APOLLO_03 joins
00:05:16libksplay_ joins
00:05:19APOLLO_03 quits [Client Quit]
00:08:13libksplay quits [Ping timeout: 260 seconds]
00:12:25APOLLO03 joins
00:13:10APOLLO03 quits [Client Quit]
00:13:28APOLLO03 joins
00:17:34<nicolas17>ok this is going *slow*, I wonder if someone else is doing their own crawling and overloading brickshelf
00:19:17snvy quits [Quit: Ooops, wrong browser tab.]
00:20:40etnguyen03 quits [Client Quit]
00:23:00<@OrIdow6>The "bio-media team", whatever that is, apparently made a copy previously https://www.tumblr.com/nitro-nova/772438449878859776/hey-just-giving-an-update-while-making-this
00:23:06<@OrIdow6>But I don't see any mention of it happening now
00:24:24<nicolas17>request latency is 2-3 seconds rn
00:28:16<nicolas17>no errors or timeouts, just... slow
00:29:22<nicolas17>98.5MB in 3700 requests, got mostly html and thumbs so far, only a handful of high-res images
00:41:46<@OrIdow6>Some kind of throttling? Latency for me in a browser is ca 300 ms which isn't great, but isn't that
00:41:57<@OrIdow6>For a single request
00:44:39<nicolas17>it's better now
00:47:24UwU quits [Quit: bye]
00:49:27<h2ibot>Marlin edited List of websites excluded from the Wayback Machine (+32, add www.theknifemedia.com): https://wiki.archiveteam.org/?diff=54275&oldid=54265
00:49:28<h2ibot>FounderOfBeans edited List of websites excluded from the Wayback Machine (+25, kiddle.co): https://wiki.archiveteam.org/?diff=54276&oldid=54275
00:49:29<h2ibot>Hans5958 edited URLTeam (+65, /* "Official" shorteners */ mwb.link): https://wiki.archiveteam.org/?diff=54278&oldid=54173
00:49:35UwU joins
00:49:38AlsoHP_Archivist quits [Ping timeout: 260 seconds]
00:54:15LddPotato (LddPotato) joins
01:12:32<h2ibot>JustAnotherArchivist edited Deathwatch (+2087, /* 2025 */ Add various upcoming shutdowns): https://wiki.archiveteam.org/?diff=54279&oldid=54272
01:13:51beardicus (beardicus) joins
01:18:26beardicus quits [Ping timeout: 250 seconds]
01:19:42BornOn420 quits [Remote host closed the connection]
01:21:52BornOn420 (BornOn420) joins
01:30:02Wohlstand quits [Quit: Wohlstand]
01:32:16HP_Archivist (HP_Archivist) joins
01:34:45<@JAA>I'm poking around on EA Answers HQ. I don't like Lithium.
01:35:51<@JAA>Looks like it has per-board topic IDs, and URLs use the post/message ID instead.
01:36:12<@JAA>That's how the IDs go to over 14 million although there aren't anywhere near that many topics.
01:40:03etnguyen03 (etnguyen03) joins
01:40:34ljcool2006 quits [Quit: Leaving]
02:50:02<@JAA>I think they're actually both message IDs. Why they maintain two separate IDs for each post is anyone's guess.
03:03:29libksplay_ quits [Read error: Connection reset by peer]
03:03:42libksplay joins
03:08:28etnguyen03 quits [Client Quit]
03:11:29etnguyen03 (etnguyen03) joins
03:17:58Webuser675837 joins
03:19:27Webuser675837 quits [Client Quit]
03:28:18BlueMaxima quits [Read error: Connection reset by peer]
03:37:59StarletCharlotte joins
03:39:03<StarletCharlotte>So I was told https://918thefan.com/ was just put into archivebot, should I still put it up on Fire Drill?
03:43:54<@OrIdow6>StarletCharlotte: You can if you want, I don't think many people really check that page though
03:44:09<@OrIdow6>It seems like a pretty small and simple site?
03:44:46<StarletCharlotte>@OrIdow6: That is true. Several of the sites on Fire Drill appear to already be dead.
03:46:15Webuser584226 joins
03:48:51Webuser584226 quits [Client Quit]
04:00:34etnguyen03 quits [Remote host closed the connection]
04:39:07<@JAA>Apparently there's a media gallery, and the entries there get the same kind of ID as every post: https://answers.ea.com/t5/media/gallerypage
04:39:16<@JAA>E.g. https://answers.ea.com/t5/media/gallerypage/image-id/32077iB0518D4F06D5BD4D == 6412170
04:40:24<@JAA>There's also an 'Archived Boards' section somewhere that's not publicly accessible.
04:40:38<@JAA>Also, the response times are awful.
04:40:49<@JAA>Which makes sense considering each response is a couple hundred kB of HTML.
04:55:26<@JAA>Boards also get their own ID, although it doesn't appear anywhere. 12954682 == https://answers.ea.com/t5/Anciens-Jeux-FIFA/bd-p/FIFA-13-FR
05:02:01angenieux quits [Quit: The Lounge - https://thelounge.chat]
05:02:53angenieux (angenieux) joins
05:09:11<@arkiver>StarletCharlotte: doesn't matter if not too many people check, please put it there
05:31:31<pabs>if anyone here has time/inclination, I'm using this pad to classify TuxFamily-hosted sites; list errors, software types, related sites etc https://pad.notkiska.pw/p/archivebot-tuxfamily
05:31:48<pabs>(TuxFamily is "slowly dying")
05:32:39ljcool2006 joins
05:33:32<@arkiver>wow nice pabs!
05:33:34<@arkiver>that's a lot
05:35:48<pabs>yeah. got them from projects.tuxfamily.org outlinks and also subdomain enumeration
05:36:43<pabs>also, I already ran everything thru my wiki classification script and have done direct subdomains in #wikibot
05:37:13<pabs>but there may be some wikis missed by that, so if you see wikis linked from front pages etc, add those in please
05:43:17<Flashfire42>pabs I will attack them for telegram and youtube links If I have permission from arkiver to run them in the respective channels
05:43:43<pabs>please don't DoS them, the server may be a bit fragile :)
05:44:00<pabs>haven't seen any so far though
05:44:03<Flashfire42>I mean I will go through each link looking for telegram or youtube
05:44:09<pabs>ok cool
05:44:44<pabs>maybe also use ab2f or meta WARCs after each AB is done to see if any were found that way
05:48:01<@arkiver>pabs: what is ab2f?
05:48:34<pabs>katia's AB WebSocket storage project https://ab2f.archivingyoursh.it/
05:48:55<pabs>stores all the messages from the websocket in per-job files
05:49:29<pabs>makes it easy to see what happened recently in an AB job if you didn't have the dashboard open
05:50:37<@arkiver>that is very nice, i had no idea that existed
05:50:58<@arkiver>pabs: so first question is if there are more tuxfamily sites than you identified, and second question how to archive them?
05:50:59<pabs>was only mentioned on #archiveteam-dev I think
05:51:08<@JAA>Very new, has only existed for 2-ish weeks.
05:51:08<@arkiver>or did you figure out the "how to archive them" part yourself?
05:51:23<@arkiver>does it contain running jobs as well?
05:51:30<@JAA>Yes
05:52:33<pabs>arkiver: could be more sites, I asked #tuxfamily on their IRC server, no response yet. another option might be figuring out what domains are on the same IPs as existing domains. no idea how to do that
05:52:48<pabs>for archiving, I was just going to go through them all with AB and wikibot as appropriate
05:53:11<@arkiver>alright
05:53:25<pabs>LWN article for background https://lwn.net/Articles/1004988/
05:53:31<@arkiver>i really need to get #Y going, it would be perfect for these domains
05:53:33<@JAA>I have something semi-usable for EA Answers HQ, but I need to go sleep now, will start it when I'm around to monitor at least at the start.
05:54:15<@arkiver>and I will get that going - the archiving part is easy, the more difficult part will be to get the WARCs split like AB over the different jobs. (probably start multiple Wget-AT sessions, but I have further ideas too)
05:54:19<@arkiver>need to figure that out
05:54:26<@arkiver>pabs: i'll also check what i can come up with
05:55:27<pabs>I think the list is fairly comprehensive though, projects.tuxfamily.org outlinks should be the entire hosted domains
05:55:41<@arkiver>katia: you've probably given this thought, but zst'ing or gzipping the jsonl files will save a ton of space :P
05:55:57<pabs>they are using ZFS compression already
05:56:21<pabs>(but in theory there could be stuff outside the TuxFamily VHHFS hosting software)
05:56:26<@arkiver>ah!
05:56:38<pabs>er VHFFS
05:57:03<@arkiver>good night JAA :)
06:06:43Webuser024416 joins
06:07:38Webuser024416 quits [Client Quit]
06:08:12<@arkiver>pabs: anything interesting in here maybe? https://transfer.archivete.am/evFcd/tuxfamily_org.txt
06:08:12<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/evFcd/tuxfamily_org.txt
06:08:42<pabs>those look like already in the list, but I will check
06:10:04<pabs>oh, you found a few extra: forum.fvpatwds.tuxfamily.org gnsenespanol.tuxfamily.org snaw.tuxfamily.org
06:10:08<pabs>adding them to the list
06:11:45<pabs>all those are 404 though
06:18:00<@arkiver>maybe old ones that got deleted yeah
06:21:22worker quits [Quit: Ooops, wrong browser tab.]
07:05:35earl joins
07:09:26flotwig_ joins
07:10:18flotwig quits [Ping timeout: 250 seconds]
07:10:18flotwig_ is now known as flotwig
07:48:13beardicus (beardicus) joins
07:53:08beardicus quits [Ping timeout: 260 seconds]
07:54:40bmo quits [Quit: Ooops, wrong browser tab.]
08:21:47PotatoProton01 joins
09:07:15libksplay_ joins
09:10:43libksplay quits [Ping timeout: 260 seconds]
09:11:26Pedrosso quits [Read error: Connection reset by peer]
09:11:50Pedrosso joins
09:12:03ScenarioPlanet5 (ScenarioPlanet) joins
09:12:27TheTechRobo6 (TheTechRobo) joins
09:12:28yasomi quits [Ping timeout: 260 seconds]
09:13:03ScenarioPlanet quits [Ping timeout: 260 seconds]
09:13:03TheTechRobo quits [Ping timeout: 260 seconds]
09:13:04TheTechRobo6 is now known as TheTechRobo
09:13:04ScenarioPlanet5 is now known as ScenarioPlanet
09:23:11yasomi (yasomi) joins
09:24:55PotatoProton01 quits [Client Quit]
09:32:06BornOn420 quits [Remote host closed the connection]
09:32:34BornOn420 (BornOn420) joins
09:34:39ell7 quits [Quit: Ping timeout (120 seconds)]
10:10:42sec^nd quits [Remote host closed the connection]
10:11:17sec^nd (second) joins
10:28:29Cronfox quits [Quit: No Ping reply in 180 seconds.]
10:30:16Cronfox (Cronfox) joins
10:40:33nicolas17 quits [Ping timeout: 260 seconds]
10:45:11nicolas17 joins
10:59:51<katia>arkiver, apparent size is 809.9GiB but size on disk is 37.9GiB
11:00:25<katia>zfs zstd is magic
11:40:42<Pedrosso>compression in general is- but yes.
11:41:36PotatoProton01 joins
11:57:55Webuser633550 joins
11:58:16Webuser633550 quits [Client Quit]
12:00:02Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
12:02:55Bleo18260072271962345 joins
12:14:54<katia>anyone going to fosdem?
12:23:05mannie (nannie) joins
12:23:33<mannie>When do I get the +v/voice?
12:28:46<katia>mannie, elaborate?
12:30:08<mannie>katia: Earlier this week some mods and I had a discussion about getting +v for archivebot. I still don't have it
12:30:09beardicus (beardicus) joins
12:30:25APOLLO03 quits [Quit: Leaving]
12:31:43<katia>mannie, i read the chat and voiced you so you can try stuff but i can't add you to the autovoice list so it'll last as long as your connection.
12:32:01<mannie>Thanks katia
12:34:53beardicus quits [Ping timeout: 260 seconds]
12:46:29beardicus (beardicus) joins
12:52:02SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:54:08beardicus quits [Ping timeout: 260 seconds]
12:54:20SkilledAlpaca418962 joins
12:57:32Webuser442121 joins
12:58:09beardicus (beardicus) joins
12:59:56SarcasticDwarf8 joins
13:00:07SarcasticDwarf8 quits [Client Quit]
13:00:44SarcasticDwarf joins
13:00:57Webuser442121 leaves
13:07:40<katia>13:58:26 Webuser442121 hi
13:07:40<katia>13:58:31 Webuser442121 what is this website for ?
13:07:43<katia>hmm
13:08:52Webuser776362 joins
13:09:14Webuser776362 leaves
13:11:38beardicus quits [Ping timeout: 260 seconds]
13:28:35SW491 joins
13:29:31<SW491>https://www.discogs.com/release/31600114-71-Hello another mp3 dam is found and its now sold
13:33:04beardicus (beardicus) joins
13:33:19<SW491>hi
13:33:34<katia>ciao
13:48:50mannie quits [Client Quit]
13:56:05Wohlstand (Wohlstand) joins
13:56:57APOLLO03 joins
13:57:38beardicus quits [Ping timeout: 250 seconds]
13:58:19PotatoProton01 quits [Client Quit]
13:59:19APOLLO03 quits [Client Quit]
13:59:31APOLLO03 joins
14:00:04APOLLO03 quits [Client Quit]
14:00:14APOLLO03 joins
14:11:18SW491 quits [Client Quit]
14:22:35beardicus (beardicus) joins
14:25:19Webuser082342 joins
14:26:11<@arkiver>katia: that's pretty awesome
14:26:16<@arkiver>i was unaware zfs did that
14:27:06beardicus quits [Ping timeout: 250 seconds]
14:31:23etnguyen03 (etnguyen03) joins
14:35:46nicolas17 quits [Ping timeout: 250 seconds]
14:40:12nicolas17 joins
14:44:48<szczot3k>wonder if btrfs would work just as well
14:55:30etnguyen03 quits [Client Quit]
14:57:48<katia>feel free to run a copy of https://ab2f.archivingyoursh.it/app.py szczot3k
14:58:50<szczot3k>katia: my only storage VM doesn't have a lot of IOPS. How hard on IOPS is it?
14:59:59<katia>seems to do roughly 400iops per disk (raid1) every 4 secs
15:00:16<katia>500
15:00:50<katia>these are 2x Intel DC P3520 Series 1.2TB
15:00:55<katia>i won the OVH lottery
15:01:18<szczot3k>No way, my buyvm slabs won't go that high
15:01:48<katia>>spinning rust
15:01:56<katia>archivebot does not like, i hear
15:02:05<szczot3k>slabs are fine for backups
15:02:13<szczot3k>And this is what I'm running on it
15:02:31<katia>well you can just open http://archivebot.com and look at the kB/s counter on the top right
15:02:57<katia>i suppose you could flush to disk much less frequently and such
15:03:12<szczot3k>pg barman/rsync from other VMs/offsite of my home's PCs - good enough for it, but AB would kill them
15:03:24<katia>i have a hosthatch storage vm
15:03:39<katia>it has a 50gb 'boot ssd' i use as L2ARC and 7TB of RAID10 disk
15:03:51etnguyen03 (etnguyen03) joins
15:03:56<katia>that poor boot ssd already has like 3TB written to it in the 2 weeks i've had it
15:03:57<szczot3k>HH had some troubles after the last deal they pushed
15:04:20<katia>shrug
15:05:56libksplay_ quits [Read error: Connection reset by peer]
15:06:18libksplay joins
15:26:42etnguyen03 quits [Client Quit]
15:37:35etnguyen03 (etnguyen03) joins
15:44:55beardicus (beardicus) joins
15:49:26beardicus quits [Ping timeout: 250 seconds]
15:55:52<hexa->katia: at hetzner nix-community got 2y old ssds that have clearly been used for chia mining
15:56:09<hexa->https://github.com/nix-community/infra/issues/1644
15:56:31<hexa->2y power on, 4.5 TBW
15:57:27<earl>I have a question about the Livestream Warrior project. I downloaded ~2400 GB, uploaded ~190 GB. I see lots of .ts files being downloaded (audio transport stream?). If most of the site is audio, and the audio is likely already compressed, then why am I uploading less than 10% of what I am downloading?
16:15:09<szczot3k>earl: probs better to ask on #deadtrickle
16:31:24SootBector quits [Remote host closed the connection]
16:31:48SootBector (SootBector) joins
16:34:03PotatoProton01 joins
16:36:00<hexa->sorry, PBW*
16:36:09<hexa->2y power on, 4.5 PBW
16:50:56<katia>lol@that
17:18:30beardicus (beardicus) joins
17:21:53SkilledAlpaca418962 quits [Ping timeout: 260 seconds]
17:22:26PotatoProton01 quits [Client Quit]
17:23:03beardicus quits [Ping timeout: 260 seconds]
17:24:16<nicolas17>looks like brickshelf AB is still moving smoothly
17:24:26mannie (nannie) joins
17:25:21<mannie>katia gave me +v earlier today, can I get it again to do some more learning/experimenting?
17:27:59SkilledAlpaca418962 joins
17:28:11scurvy_duck joins
17:31:09<katia>mannie, have you read https://archivebot.readthedocs.io/ ?
17:31:30<mannie>some part
17:31:43<mannie>Will take the time now to read it all
17:34:54PredatorIWD25 quits [Read error: Connection reset by peer]
17:38:02loug8318142 joins
17:52:42etnguyen03 quits [Client Quit]
17:52:51PredatorIWD25 joins
18:02:22BornOn420 quits [Ping timeout: 276 seconds]
18:05:59<mannie>katia: I have read more than is relevant
18:06:12<mannie>like 90% of the info
18:07:42<Blueacid>How do! It seems that blogtalkradio.com is shutting down in Jan 2025.. the news is a month or so old. Is that too late to do anything, or is there any worth in trying to salvage something?
18:08:15etnguyen03 (etnguyen03) joins
18:11:43<szczot3k>Blueacid: do you have the shutdown announcement? Any idea how big/old this site is, and what to expect?
18:14:11BornOn420 (BornOn420) joins
18:35:40mannie quits [Client Quit]
18:36:10<katia>Blueacid, i've started a job for it in archivebot
18:41:31ducky quits [Read error: Connection reset by peer]
18:41:35ducky (ducky) joins
18:47:53PotatoProton01 joins
18:56:55beardicus (beardicus) joins
18:59:40nicolas17 quits [Ping timeout: 250 seconds]
19:03:58beardicus quits [Ping timeout: 260 seconds]
19:07:15nicolas17 joins
19:07:27<Blueacid>szczot3k: https://podnews.net/article/blogtalkradio-customer-email
19:08:07<Blueacid>katia: Much obliged - I don't know how much content there is, and whether there's sufficient time to get it all!
19:10:14<Blueacid>but saw a post on Reddit about it and thought I would ask here :)
19:52:18beardicus (beardicus) joins
20:04:03nicolas17 quits [Ping timeout: 260 seconds]
20:11:34PotatoProton01 quits [Client Quit]
20:15:58<that_lurker>Blueacid: Thank you for at least notifying us. We can't keep up with everything, so in these cases we can at least try to save as much as possible
20:26:08nicolas17 joins
20:30:05onetruth joins
20:47:13beardicus quits [Ping timeout: 260 seconds]
20:48:42beardicus (beardicus) joins
20:48:52<wickedplayer494>:siren: :siren: :siren: In an unprecedented move, Google has pulled all but the most recent OTA update package from https://developers.google.com/android/ota and full factory image from https://developers.google.com/android/images for the Pixel 4a (non-5G/sunfish)
20:49:09<wickedplayer494>Likely related to whatever is going on with https://wiki.rossmanngroup.com/wiki/Pixel_4a_Battery_Performance_Program which Google is clearly not being open and honest about
20:49:32<wickedplayer494>spotted by https://www.reddit.com/r/GooglePixel/comments/1iajsu3/google_removed_pixel_4a_firmware_images_from/
20:57:58etnguyen03 quits [Client Quit]
21:07:57<@JAA>My attempt at EA Answers HQ got blocked by Buttfront within minutes.
21:09:45<nicolas17>wickedplayer494: ugh
21:09:52<nicolas17>seems too late to do something about it
21:10:27<@JAA>Blueacid, katia: An AB job for BlogTalkRadio has been running for a bit already.
21:10:43BlueMaxima joins
21:11:02<katia><:|
21:11:22<@JAA>y u no HTTPS? :-)
21:11:29<@JAA>4df97m2wklx5v9wgw1zxuaqq7, over 3 TiB grabbed so far
21:11:52<katia>do i abort the mistake i made? <|:D
21:12:47earl quits []
21:16:41<katia>wickedplayer494, i archivebotted all the direct links in those 2 pages, a bunch of them are not in WBM
21:20:56scurvy_duck quits [Ping timeout: 250 seconds]
21:21:43beardicus quits [Remote host closed the connection]
21:29:02HP_Archivist quits [Read error: Connection reset by peer]
21:33:42SF quits [Remote host closed the connection]
21:34:07HP_Archivist (HP_Archivist) joins
21:34:31HP_Archivist quits [Read error: Connection reset by peer]
21:37:50lennier2_ joins
21:40:53lennier2 quits [Ping timeout: 260 seconds]
21:55:05etnguyen03 (etnguyen03) joins
22:13:46scurvy_duck joins
22:23:15PredatorIWD25 quits [Read error: Connection reset by peer]
22:25:36SootBector quits [Remote host closed the connection]
22:25:45PredatorIWD25 joins
22:25:54SootBector (SootBector) joins
22:28:53<Blueacid>JAA: Where does that AB job 'run'? Is that a warrior job, or on some other infrastructure? TY!
22:38:19notarobot1 quits [Quit: The Lounge - https://thelounge.chat]
22:40:35<nicolas17>archivebot pipelines are separate infrastructure
22:40:37<nicolas17>http://archivebot.com/pipelines
22:42:10raccoon (raccoon) joins
22:55:51Webuser214195 quits [Quit: Ooops, wrong browser tab.]
23:00:18Webuser103147 joins
23:00:54etnguyen03 quits [Client Quit]
23:01:57loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
23:07:30tech234a quits [Quit: Connection closed for inactivity]
23:07:57TastyWiener952 (TastyWiener95) joins
23:11:26TastyWiener95 quits [Ping timeout: 250 seconds]
23:11:26TastyWiener952 is now known as TastyWiener95
23:39:55Webuser082342 quits [Quit: Ooops, wrong browser tab.]
23:42:35Webuser873058 joins
23:56:05etnguyen03 (etnguyen03) joins