00:04:33etnguyen03 quits [Client Quit]
00:05:00Shard111 (Shard) joins
00:07:05Shard11 quits [Ping timeout: 272 seconds]
00:07:05Shard111 is now known as Shard11
00:15:07Dada quits [Remote host closed the connection]
00:16:10nexussfan (nexussfan) joins
00:27:10Shard112 (Shard) joins
00:29:40Shard11 quits [Ping timeout: 256 seconds]
00:29:40Shard112 is now known as Shard11
00:37:42lukash984 quits [Quit: The Lounge - https://thelounge.chat]
00:38:49useretail quits [Quit: Leaving]
01:04:12Shard115 (Shard) joins
01:04:48Shard11 quits [Ping timeout: 256 seconds]
01:04:48Shard115 is now known as Shard11
01:19:57lukash984 joins
01:22:13etnguyen03 (etnguyen03) joins
02:19:05<kiska>klea: There is, but my infra is... ancient :D
02:19:18<kiska>And I have no time to do anything
02:19:36<klea>:(
02:19:38<klea>time--
02:19:39<eggdrop>[karma] 'time' now has -1 karma!
02:20:17Webuser751678 joins
02:20:41<klea>I expected it to have a lot more negative karma.
02:21:10Webuser751678 quits [Client Quit]
02:21:47<kiska>So yeah... whatever is running is running on hopes that it doesn't break until I get around to fixing it
02:23:23<kiska>The machine that does the thing runs each project websocket in systemd so it can recover from the machine restarting; I just need to make a service for it
02:23:49<klea>Time to make a service that makes more services!
02:24:08<kiska>Hrm... time deficient :D
02:24:12<kiska>Remember
02:24:20<klea>Oh
02:24:29<kiska>I work about 65 hours per week, I have fuck all time to do anything else
02:24:34<klea>Could you share a sample of one of your systemd services with all the data filled out?
02:24:52<klea>Including the service's name :p
02:29:59<kiska>I mean... there isn't anything to filter out
02:30:01<kiska>https://paste.kiska.pw/CordonsCulminations
02:31:26<kiska>I guess replace "testing" with the project name
02:35:12<klea>ok, so stick this in the systemd directory. https://transfer.archivete.am/inline/Mcy2N/dpos-tracker-listener@.service
02:36:23<klea>Then stick this into cron or something that runs it periodically (if you want me to make a systemd timer and service, I can try to do that):
02:36:23<klea>jq --raw-output '.projects[].name|@sh|"systemctl enable --now \(.)"' <(curl --silent --compressed https://warriorhq.archiveteam.org/projects.json)
02:37:20<kiska>When I get the time, I'll make something horrifically bad :D
02:37:23<kiska>That sounds sane
02:37:37<@JAA>Isn't that missing a `dpos-tracker-listener@` or something?
02:37:50<klea>JAA: indeed, I forgot about that part :p
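    The corrected one-liner would presumably be as follows (an untested sketch; it only adds the dpos-tracker-listener@ prefix JAA pointed out, and its output is meant to be reviewed and then piped into sh):
        jq --raw-output '.projects[].name|@sh|"systemctl enable --now dpos-tracker-listener@\(.)"' <(curl --silent --compressed https://warriorhq.archiveteam.org/projects.json)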
02:38:20<klea>kiska: wait, why do you want to make an insane setup on purpose?
02:38:31nexussfan quits [Client Quit]
02:38:39<kiska>I am kinda joking about making something bad
02:38:42<@JAA>No risk, no fun.
02:39:03<kiska>Besides, the worse it is, the more fun it is to debug...
02:40:49<kiska>Hrm... I wonder if I can package up this into a bad container :D
02:41:21<klea>Oh. JAA should run a cronjob, every minute, that has a 0.0000001% chance of erasing the entirety of transfer and the wiki, and all the servers that hold the temporary files in the to-be-categorized-for-upload process.
02:42:06<kiska>Every minute... Too long, every microsecond
02:42:53<klea>And whilst we're at it, to add some more risk and thus fun: change the default project to urls for 15 hours, make every youtube link in urls be automatically queued into #down-the-tube, and then after some time set the default project back to youtube.
02:45:09nicolas17 adds a kmem russian roulette cronjob to klea's computer
02:45:26Wohlstand1 (Wohlstand) joins
02:47:48Wohlstand1 is now known as Wohlstand
02:57:36cyanbox joins
03:19:26NatTheCat2 (NatTheCat) joins
03:20:26NatTheCat quits [Read error: Connection reset by peer]
03:20:26NatTheCat2 is now known as NatTheCat
03:27:51NatTheCat quits [Ping timeout: 272 seconds]
03:54:50Wohlstand quits [Client Quit]
04:09:38useretail joins
04:13:41useretail quits [Client Quit]
04:13:58useretail joins
04:17:32NatTheCat (NatTheCat) joins
04:18:23etnguyen03 quits [Client Quit]
04:25:46etnguyen03 (etnguyen03) joins
04:42:04etnguyen03 quits [Remote host closed the connection]
04:59:42DogsRNice quits [Read error: Connection reset by peer]
05:02:48pabs quits [Ping timeout: 256 seconds]
05:04:30n9nes quits [Ping timeout: 256 seconds]
05:05:05n9nes joins
05:05:59pabs (pabs) joins
05:32:46<steering>time--
05:32:47<eggdrop>[karma] 'time' now has -2 karma!
05:34:45gosc joins
06:04:10PredatorIWD25 quits [Read error: Connection reset by peer]
06:12:45gosc quits [Client Quit]
06:13:09sg72 quits [Ping timeout: 272 seconds]
06:17:52<pabs>klea: re .cat, probably yes
06:17:59PredatorIWD25 joins
06:20:09<pabs>justauser: did you do any .arpa jobs?
06:31:19useretail quits [Remote host closed the connection]
06:31:53useretail joins
06:36:12Wohlstand1 (Wohlstand) joins
06:38:35Wohlstand1 is now known as Wohlstand
06:43:03Wohlstand quits [Client Quit]
06:43:17sg72 joins
07:00:40sg72 quits [Ping timeout: 256 seconds]
07:07:12sg72 joins
07:11:58Wohlstand (Wohlstand) joins
07:12:05sg-72 joins
07:15:51sg72 quits [Ping timeout: 272 seconds]
07:16:18SootBector quits [Remote host closed the connection]
07:17:27SootBector (SootBector) joins
07:39:29Kotomind joins
07:48:16n9nes quits [Ping timeout: 256 seconds]
07:48:43n9nes joins
07:55:45n9nes quits [Ping timeout: 272 seconds]
07:58:26n9nes joins
08:09:21Island quits [Read error: Connection reset by peer]
08:58:24BornOn420 (BornOn420) joins
09:36:16Ointment8862 quits [Remote host closed the connection]
09:46:53Sk1d joins
09:47:28Sk1d quits [Client Quit]
09:47:32Sk1d joins
10:15:53Hans2026 joins
10:23:34Shard11 quits [Read error: Connection reset by peer]
10:23:42Shard11 (Shard) joins
10:45:16Sk1d quits [Remote host closed the connection]
10:45:24Sk1d joins
10:53:05rohvani quits [Ping timeout: 272 seconds]
10:58:16<h2ibot>OrIdow6 edited Roblox (+168, Reorganize the page, mostly to get rid of the…): https://wiki.archiveteam.org/?diff=60393&oldid=59820
11:03:00NatTheCat3 (NatTheCat) joins
11:04:54NatTheCat quits [Ping timeout: 256 seconds]
11:04:54NatTheCat3 is now known as NatTheCat
11:17:19<h2ibot>Klea edited Roblox (+160, WBM links shouldn't use [[Template:Url]]): https://wiki.archiveteam.org/?diff=60394&oldid=60393
11:18:57<klea>Maybe I should make a template for WBM URLs, so the data can be specified.
11:19:17Sk1d quits [Client Quit]
11:30:21<h2ibot>Klea edited Template:IA file (+1059, Add a private option for IA access-restriction): https://wiki.archiveteam.org/?diff=60395&oldid=25589
11:31:21<h2ibot>Klea edited Roblox (+13, Set [[Template:IA file]] private=true): https://wiki.archiveteam.org/?diff=60396&oldid=60394
11:33:51<klea>Should we merge [[Template:IA file]] and [[Template:IA id]]? I believe it might be helpful, since then you'd be able to do something like {{IA id|1=archiveteam_archivebot_go_20240911183234_5bb571eb/wattsupwiththat.wpcomstaging.com-inf-20240725-034320-tchk5-00062.warc.gz}} and have the template show what it'd render with
11:33:52<klea>{{IA file|identifier=archiveteam_archivebot_go_20240911183234_5bb571eb|path=wattsupwiththat.wpcomstaging.com-inf-20240725-034320-tchk5-00062.warc.gz}}. However, I'm not entirely sure how to merge them.
11:39:22sec^nd quits [Ping timeout: 256 seconds]
11:40:39sec^nd (second) joins
11:47:24Coderjo_ quits [Ping timeout: 256 seconds]
11:47:35Coderjo_ joins
12:00:01Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:44Bleo182600722719623455222 joins
12:11:13ducky_ (ducky) joins
12:11:46ducky quits [Ping timeout: 256 seconds]
12:11:49ducky_ is now known as ducky
12:13:27<h2ibot>Klea edited Thingiverse (+28, Add unknown status for post-2015): https://wiki.archiveteam.org/?diff=60397&oldid=56780
12:13:49<klea>Should we run a project like we have for #imgone, #pastalavista, and similar, where we try to archive every Thingiverse item?
12:17:26<klea>And I suppose at that point maybe also for Sketchfab (Q7534755), Cults (Q62426317), and Printables.com (Q118399132).
12:25:56APOLLO03 quits [Ping timeout: 256 seconds]
12:31:00APOLLO03 joins
12:43:25<kline>how many objects are there?
12:46:34<klea>I believe 7286349 based on the latest id from https://www.thingiverse.com/?page=1&sort=newest
12:47:20<klea>> Hello everyone, tomorrow I'm going to try to make a lot of models in packages: American, British, and Japanese!!!
12:47:20<klea>Huh, maybe it's like [[Discourse]] where there may be new content added. :(
12:48:12<kline>they have an api but im not sure its suitable
12:48:47<klea>I suppose re-using https://github.com/archiveteam/thingiverse-grab might not be the best idea.
13:09:51nine quits [Quit: See ya!]
13:10:05nine joins
13:10:05nine quits [Changing host]
13:10:05nine (nine) joins
13:12:25Kotomind quits [Ping timeout: 272 seconds]
13:17:27etnguyen03 (etnguyen03) joins
13:30:46etnguyen03 quits [Client Quit]
13:33:56Arcorann_ quits [Ping timeout: 256 seconds]
13:36:58etnguyen03 (etnguyen03) joins
14:10:39lukash984 quits [Quit: The Lounge - https://thelounge.chat]
14:50:47<Hans2026>Hello, I was referred to this IRC by the ArchiveTeam FAQ (1st item, "Can you save it"). My question is, can you save SwamiJ.com? I'm just a reader, not connected to the owners. I don't know if it's "at risk", just that it appears not to have been updated in many years, and I find it to be a well-written and comprehensive resource on its subject
14:50:47<Hans2026>matter (traditional yoga and meditation of the Himalayan masters). It has possibly hundreds of pages (HTML, PDF, etc.).
14:53:10frank joins
14:53:21<Hans2026>It's not that I want an archive for myself, just that it would be a valuable reference for others, in case the site disappears someday.
15:11:29<kline>Hans2026, do you want to know a good way to start?
15:11:54<kline>are you on a linux system (or windows with the windows subsystem for linux)?
15:15:04<Hans2026>Since the FAQ referred me to this IRC, I can only assume that this is the right place to proceed. Yes, I have a laptop running Linux, and also have shell access to some Linux sites.
15:15:48<kline>you only need your own computer!
15:16:06<Hans2026>ok
15:16:08<@OrIdow6>kline: It sounds like they're asking for it to be run in AB or the like, not for advice on how to make warcs themselves
15:16:22<kline>OrIdow6, both are good
15:16:38<klea>sitemap.xml seems to have 2512 lines, but I think they're limiting access by UA somehow.
15:16:46<klea>also sending an Expires: 2106 header.
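    A quick way to reproduce that count (an untested sketch, assuming the UA restriction can be dodged with a browser User-Agent and that the sitemap has one <loc> per URL; the exact hostname is an assumption):
        curl --silent --compressed -A 'Mozilla/5.0' https://www.swamij.com/sitemap.xml | grep -c '<loc>'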
15:17:04<kline>Hans2026, https://wiki.archiveteam.org/index.php/Wget#Creating_WARC_with_wget has a couple of quick lines to run while a better archive is made
15:17:25<kline>if anything happens, you have a quick copy made with most of the info
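    The quick lines from that wiki page take roughly this shape (a minimal sketch; the exact flags and the politeness --wait are a matter of taste, and --user-agent is only relevant if the site really does filter by UA):
        wget --mirror --page-requisites --no-parent --wait=1 --user-agent='Mozilla/5.0' --warc-file=swamij.com --warc-cdx https://www.swamij.com/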
15:20:05<Hans2026>I may be somewhat disk-space constrained for now, so I'm not sure I want to store an archive here. Also, as I said, this isn't for me, but for the benefit of anyone interested in that subject matter. In the FAQ item, the answer to "Can you save it" began with the word "yes", so I thought that meant the Archive Team can save it. So I guess my
15:20:05<Hans2026>first question is, can the Archive Team save it, or is this more of a DIY thing?
15:20:12<@OrIdow6>kline: I think that is more complicated than most people should have to deal with
15:20:44<klea>Hans2026: Yes, Archive Team can save it if you ask in #archivebot, since it doesn't seem to be that big (relatively)
15:22:44<@OrIdow6>Hans2026: I've started an ArchiveBot job for it, you can see the progress at http://archivebot.com/?initialFilter=swamij
15:23:23<@OrIdow6>After it finishes it'll be visible in web.archive.org within a few weeks (I think; the interval fluctuates sometimes)
15:25:40<Hans2026>As I understand, web.archive.org is the Wayback Machine. Wayback already has some captures, but I don't know if the whole site was ever captured. Is it correct to understand that, regardless of whether a whole-site capture was ever done before, the new capture will be a whole-site capture?
15:28:08<klea>Yes, at least with ArchiveBot captures; they don't skip pages just because they've been captured before.
15:28:19<@OrIdow6>Hans2026: Yes; there are technical caveats on what "whole-site" means but this website looks simple enough that they shouldn't apply much
15:28:37<@OrIdow6>As long as a page can be reached by a series of links from the homepage, it'll be captured
15:29:00<@OrIdow6>Ah sorry klea
15:29:26<Hans2026>Ok cool. As far as I know, they are fairly ordinary web pages (HTML with some images, and some PDFs).
15:29:38<klea>OrIdow6: I don't mind having others confirm the same response, don't worry too hard about it.
15:30:00<klea>You explained the manual ignores applied by voices or ops on #archivebot more cleanly than I would, in any case.
15:32:03<Hans2026>Given that the main home page was already captured many times, when I go look for the ArchiveBot capture later on, will there be a way for me to select that I want to see the ArchiveBot capture rather than other captures that were done?
15:35:50<Hans2026>For example, if the site is gone later on, and I want to give people a link to the whole-site capture, it would be useful to know how to point people to it. But I might not know which link on Wayback is for the whole-site capture.
15:37:26<klea>I believe that when web.archive.org links have a timestamp, they're normally locked to a version in time; since the WARCs are timestamped, that should mean you get to see AT's version of the capture, and if the capture was somehow incomplete, any further captures in the future can give you more resources.
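    Concretely, a Wayback link has the form https://web.archive.org/web/YYYYMMDDhhmmss/URL and resolves to the capture nearest that timestamp; a quick way to see which capture a given timestamp lands on is to inspect the redirect (untested sketch, illustrative timestamp):
        curl --silent --head 'https://web.archive.org/web/20260101000000/https://www.swamij.com/' | grep -i '^location:'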
15:41:47etnguyen03 quits [Client Quit]
15:41:54Cuphead2527480 (Cuphead2527480) joins
15:50:09Webuser782625 joins
15:50:58iknownothing joins
15:51:10<Hans2026>If I understand correctly, people viewing some other (incomplete) capture can click on links and still get pages that are in this capture. I guess that works for most purposes, but I'm still wondering, is there a way to specifically look for an AT capture on Wayback (either by some selector on Wayback, or by looking up the capture/upload history
15:51:10<Hans2026>on ArchiveTeam)?
15:55:38<@OrIdow6>Hans2026: You can download the raw files that we send to the Wayback Machine and put them in a viewer (they will be at https://archive.fart.website/archivebot/viewer/?q=SwamiJ.com , and a viewer is at https://replayweb.page/ )
15:56:40<@OrIdow6>You can check the source of a page in the Wayback Machine, but there's no way to automatically restrict it to a particular source
15:57:02<klea>Also, what OrIdow6 said sometimes can't apply for certain projects where the WARCs are access-restricted, so in that case you'll have to rely on clicking the "About this capture" button.
16:00:44<Hans2026>Ok thanks. This brings to mind another question. The reason I even know ArchiveTeam exists is because I looked up a website of my own on Wayback (wheelbit.net, just a hobby site of no importance) and I found that it was archived, the most recent one (2024) mentioning ArchiveTeam under "about this capture". Is there a way to find out the AT
16:00:44<Hans2026>capture history of this site, and also why it was done (considering that there's not really any content there)?
16:09:09<@OrIdow6>Hans2026: The most recent one is from a catch-all system that automatically grabs large numbers of pages linked from various places, hard to trace
16:09:11<klea>The about section shows the collection, Archive Team URLs, i.e. #//.
16:09:19<@OrIdow6>Wow quite the timing
16:09:28<klea>Yeah, your response was better :)
16:09:53<klea>(I had made up a response about the CDX api response, and the metadata headers the webpage sends but that's probably too technical)
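    For the curious, that CDX query looks something like this (the API is real; the parameters shown are illustrative):
        curl --silent 'https://web.archive.org/cdx/search/cdx?url=wheelbit.net&output=json&limit=5'
    Each row carries the capture timestamp, status code, and digest, but not which collection it came from, which is why "About this capture" is still needed for that.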
16:14:25<Hans2026>OK, thanks for your patience with my newbie questions, but today is the first time I've heard that ArchiveTeam exists. From the FAQ, I thought AT only archived sites that are somehow deemed essential for posterity, but from your explanation, it sounds like AT runs in an automated way (not limited to "curated" selections).
16:16:37<klea>It depends, as everything does.
16:17:36<iknownothing>I know Learner.org is on the radar for the 2026 sunset. It seems that only the surrounding website is saved in the Wayback Machine, but the main point of the website is the video content. I'm wondering if saving the video files is something that you guys do, and if that is going to be considered. I know nothing, just
16:17:36<iknownothing>wanted to raise a little awareness about this.
16:25:36cyanbox_ joins
16:28:05<h2ibot>Klea edited Deathwatch (+305, /* 2026-08 */ Add Trinket.io): https://wiki.archiveteam.org/?diff=60398&oldid=60392
16:28:07cyanbox quits [Ping timeout: 272 seconds]
17:02:07frank quits [Client Quit]
17:14:01<kiska>Hrm... opendiary throwing 503?
17:14:08<kiska>Or is it their way of telling me I am banned
17:14:22Webuser782625 quits [Client Quit]
17:14:22iknownothing quits [Client Quit]
17:16:24<klea>What's the best way of archiving MoinMoin wikis? https://moin.vitali64.duckdns.org/ has an expired cert.
17:16:40<klea>https://fun.dersco.re/weblog/2025/05/12/status-update/ says it'll be lost.
17:17:32<klea>Well, it says "docs moved to git"
17:23:18<Hans2026>I'm monitoring the progress of "swamij" on archivebot, and there are a lot more files than I thought. Most of them seem to be off-site, i.e., not from "swamij" but from "fbcdn", "yimg", and many others, including "afternic" (a domain reseller) and "vistaprint" (an online printing service). Any idea why so many off-site URLs are
17:23:18<Hans2026>being captured? (I'm concerned that something might have gone wrong, as I thought this archive would capture only "swamij"... I hope the site wasn't maliciously infected with unrelated content.)
17:47:50Wohlstand quits [Quit: Wohlstand]
17:51:43Cuphead2527480 quits [Client Quit]
17:52:32Island joins
17:59:42Wohlstand (Wohlstand) joins
18:07:33datechnoman quits [Ping timeout: 272 seconds]
18:13:57Wohlstand quits [Client Quit]
18:15:14etnguyen03 (etnguyen03) joins
18:18:21gosc joins
18:19:28gosc quits [Client Quit]
18:29:53datechnoman (datechnoman) joins
19:18:01ax (ax) joins
19:22:55ducky quits [Ping timeout: 272 seconds]
19:23:04ducky (ducky) joins
19:27:30<nicolas17>maybe it should have used --no-offsite (and I wish that could be added after the fact)
19:27:46<nicolas17>however there's only 1700 URLs in queue and it's not growing
19:27:58<nicolas17>so it's not a big deal
19:27:59ducky quits [Ping timeout: 272 seconds]
19:31:23<nicolas17>opendiary ETA still march 8
19:31:56<nicolas17>arkiver: ping about making opendiary do fewer requests, so we can increase the rate limit
19:40:02ducky (ducky) joins
19:59:46etnguyen03 quits [Client Quit]
20:08:29Webuser698035 joins
20:08:57Webuser698035 quits [Client Quit]
20:08:59etnguyen03 (etnguyen03) joins
20:25:09APOLLO03 quits [Quit: .]
20:31:58etnguyen03 quits [Client Quit]