| 00:04:33 | | etnguyen03 quits [Client Quit] |
| 00:05:00 | | Shard111 (Shard) joins |
| 00:07:05 | | Shard11 quits [Ping timeout: 272 seconds] |
| 00:07:05 | | Shard111 is now known as Shard11 |
| 00:15:07 | | Dada quits [Remote host closed the connection] |
| 00:16:10 | | nexussfan (nexussfan) joins |
| 00:27:10 | | Shard112 (Shard) joins |
| 00:29:40 | | Shard11 quits [Ping timeout: 256 seconds] |
| 00:29:40 | | Shard112 is now known as Shard11 |
| 00:37:42 | | lukash984 quits [Quit: The Lounge - https://thelounge.chat] |
| 00:38:49 | | useretail quits [Quit: Leaving] |
| 01:04:12 | | Shard115 (Shard) joins |
| 01:04:48 | | Shard11 quits [Ping timeout: 256 seconds] |
| 01:04:48 | | Shard115 is now known as Shard11 |
| 01:19:57 | | lukash984 joins |
| 01:22:13 | | etnguyen03 (etnguyen03) joins |
| 02:19:05 | <kiska> | klea: There is, but my infra is... ancient :D |
| 02:19:18 | <kiska> | And I have no time to do anything |
| 02:19:36 | <klea> | :( |
| 02:19:38 | <klea> | time-- |
| 02:19:39 | <eggdrop> | [karma] 'time' now has -1 karma! |
| 02:20:17 | | Webuser751678 joins |
| 02:20:41 | <klea> | I expected it to have a lot more negative karma. |
| 02:21:10 | | Webuser751678 quits [Client Quit] |
| 02:21:47 | <kiska> | So yeah... whatever is running is running on hopes that it doesn't break until I get around to fixing it |
| 02:23:23 | <kiska> | The machine that does the thing runs each project's websocket in systemd so it can recover from the machine restarting; I just need to make a service for it |
| 02:23:49 | <klea> | Time to make a service that makes more services! |
| 02:24:08 | <kiska> | Hrm... time deficient :D |
| 02:24:12 | <kiska> | Remember |
| 02:24:20 | <klea> | Oh |
| 02:24:29 | <kiska> | I work about 65 hours per week, I have fuck all time to do anything else |
| 02:24:34 | <klea> | Could you share a sample of one of your systemd services with all the data filled out? |
| 02:24:52 | <klea> | Including the service's name :p |
| 02:29:59 | <kiska> | I mean... there isn't anything to filter out |
| 02:30:01 | <kiska> | https://paste.kiska.pw/CordonsCulminations |
| 02:31:26 | <kiska> | I guess replace "testing" with project |
| 02:35:12 | <klea> | ok, so stick this in the systemd directory. https://transfer.archivete.am/inline/Mcy2N/dpos-tracker-listener@.service |
| 02:36:23 | <klea> | Then stick this into cron or something that runs it periodically (if you want me to make a systemd timer and service i can try to do that): |
| 02:36:23 | <klea> | jq --raw-output '.projects[].name|@sh|"systemctl enable --now \(.)"' <(curl --silent --compressed https://warriorhq.archiveteam.org/projects.json) |
| 02:37:20 | <kiska> | When I get the time to I'll make something horrifically bad :D |
| 02:37:23 | <kiska> | That sounds sane |
| 02:37:37 | <@JAA> | Isn't that missing a `dpos-tracker-listener@` or something? |
| 02:37:50 | <klea> | JAA: indeed, I forgot about that part :p |
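As JAA notes, the one-liner above never names the unit: with `@sh` it prints lines like `systemctl enable --now 'urls'`. A minimal sketch of the corrected idea, assuming the template is installed as `dpos-tracker-listener@.service` (the filename from klea's transfer link) and that project names arrive one per line from `jq --raw-output '.projects[].name'`. The function name `make_enable_commands` is hypothetical, and it only prints commands so they can be reviewed before anything runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads project names (one per line) on stdin and prints
# one `systemctl enable --now` command per project for the templated unit
# dpos-tracker-listener@.service. Nothing is executed here.
make_enable_commands() {
  local name
  # `|| [ -n "$name" ]` also handles a final line without a trailing newline
  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue              # skip blank lines
    case "$name" in
      *[![:alnum:]_.-]*)                    # refuse names that would need escaping
        printf 'skipping unsafe project name: %s\n' "$name" >&2
        continue ;;
    esac
    printf 'systemctl enable --now dpos-tracker-listener@%s.service\n' "$name"
  done
}
```

Usage, under the same assumptions: review the output of `curl --silent --compressed https://warriorhq.archiveteam.org/projects.json | jq --raw-output '.projects[].name' | make_enable_commands`, then pipe it to `sh`. Validating names against a conservative character set replaces the `@sh` quoting, since a quoted name spliced into a unit name is harder to read and check.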
| 02:38:20 | <klea> | kiska: wait, why do you want to make an insane setup on purpose? |
| 02:38:31 | | nexussfan quits [Client Quit] |
| 02:38:39 | <kiska> | I am kinda joking about making something bad |
| 02:38:42 | <@JAA> | No risk, no fun. |
| 02:39:03 | <kiska> | Besides the worse it is the more fun it is to debug... |
| 02:40:49 | <kiska> | Hrm... I wonder if I can package up this into a bad container :D |
| 02:41:21 | <klea> | Oh. JAA should run a cronjob every minute that has a 0.0000001% chance of erasing the entirety of transfer, the wiki, and all the servers that hold temporary files waiting to be categorized for upload. |
| 02:42:06 | <kiska> | Every minute... Too long, every microsecond |
| 02:42:53 | <klea> | And whilst at it, to add some more risk and thus fun, change the default project to urls for 15 hours, make every youtube link on URLs be automatically queued into #down-the-tube and then after some time set the default project back to youtube. |
| 02:45:09 | | nicolas17 adds a kmem russian roulette cronjob to klea's computer |
| 02:45:26 | | Wohlstand1 (Wohlstand) joins |
| 02:47:48 | | Wohlstand1 is now known as Wohlstand |
| 02:57:36 | | cyanbox joins |
| 03:19:26 | | NatTheCat2 (NatTheCat) joins |
| 03:20:26 | | NatTheCat quits [Read error: Connection reset by peer] |
| 03:20:26 | | NatTheCat2 is now known as NatTheCat |
| 03:27:51 | | NatTheCat quits [Ping timeout: 272 seconds] |
| 03:54:50 | | Wohlstand quits [Client Quit] |
| 04:09:38 | | useretail joins |
| 04:13:41 | | useretail quits [Client Quit] |
| 04:13:58 | | useretail joins |
| 04:17:32 | | NatTheCat (NatTheCat) joins |
| 04:18:23 | | etnguyen03 quits [Client Quit] |
| 04:25:46 | | etnguyen03 (etnguyen03) joins |
| 04:42:04 | | etnguyen03 quits [Remote host closed the connection] |
| 04:59:42 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:02:48 | | pabs quits [Ping timeout: 256 seconds] |
| 05:04:30 | | n9nes quits [Ping timeout: 256 seconds] |
| 05:05:05 | | n9nes joins |
| 05:05:59 | | pabs (pabs) joins |
| 05:32:46 | <steering> | time-- |
| 05:32:47 | <eggdrop> | [karma] 'time' now has -2 karma! |
| 05:34:45 | | gosc joins |
| 06:04:10 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
| 06:12:45 | | gosc quits [Client Quit] |
| 06:13:09 | | sg72 quits [Ping timeout: 272 seconds] |
| 06:17:52 | <pabs> | klea: re .cat, probably yes |
| 06:17:59 | | PredatorIWD25 joins |
| 06:20:09 | <pabs> | justauser: did you do any .arpa jobs? |
| 06:31:19 | | useretail quits [Remote host closed the connection] |
| 06:31:53 | | useretail joins |
| 06:36:12 | | Wohlstand1 (Wohlstand) joins |
| 06:38:35 | | Wohlstand1 is now known as Wohlstand |
| 06:43:03 | | Wohlstand quits [Client Quit] |
| 06:43:17 | | sg72 joins |
| 07:00:40 | | sg72 quits [Ping timeout: 256 seconds] |
| 07:07:12 | | sg72 joins |
| 07:11:58 | | Wohlstand (Wohlstand) joins |
| 07:12:05 | | sg-72 joins |
| 07:15:51 | | sg72 quits [Ping timeout: 272 seconds] |
| 07:16:18 | | SootBector quits [Remote host closed the connection] |
| 07:17:27 | | SootBector (SootBector) joins |
| 07:39:29 | | Kotomind joins |
| 07:48:16 | | n9nes quits [Ping timeout: 256 seconds] |
| 07:48:43 | | n9nes joins |
| 07:55:45 | | n9nes quits [Ping timeout: 272 seconds] |
| 07:58:26 | | n9nes joins |
| 08:09:21 | | Island quits [Read error: Connection reset by peer] |
| 08:58:24 | | BornOn420 (BornOn420) joins |
| 09:36:16 | | Ointment8862 quits [Remote host closed the connection] |
| 09:46:53 | | Sk1d joins |
| 09:47:28 | | Sk1d quits [Client Quit] |
| 09:47:32 | | Sk1d joins |
| 10:15:53 | | Hans2026 joins |
| 10:23:34 | | Shard11 quits [Read error: Connection reset by peer] |
| 10:23:42 | | Shard11 (Shard) joins |
| 10:45:16 | | Sk1d quits [Remote host closed the connection] |
| 10:45:24 | | Sk1d joins |
| 10:53:05 | | rohvani quits [Ping timeout: 272 seconds] |
| 10:58:16 | <h2ibot> | OrIdow6 edited Roblox (+168, Reorganize the page, mostly to get rid of the…): https://wiki.archiveteam.org/?diff=60393&oldid=59820 |
| 11:03:00 | | NatTheCat3 (NatTheCat) joins |
| 11:04:54 | | NatTheCat quits [Ping timeout: 256 seconds] |
| 11:04:54 | | NatTheCat3 is now known as NatTheCat |
| 11:17:19 | <h2ibot> | Klea edited Roblox (+160, WBM links shouldn't use [[Template:Url]]): https://wiki.archiveteam.org/?diff=60394&oldid=60393 |
| 11:18:57 | <klea> | Maybe I should make a Template for WBM urls, so the data can be specified. |
| 11:19:17 | | Sk1d quits [Client Quit] |
| 11:30:21 | <h2ibot> | Klea edited Template:IA file (+1059, Add a private option for IA access-restriction): https://wiki.archiveteam.org/?diff=60395&oldid=25589 |
| 11:31:21 | <h2ibot> | Klea edited Roblox (+13, Set [[Template:IA file]] private=true): https://wiki.archiveteam.org/?diff=60396&oldid=60394 |
| 11:33:51 | <klea> | Should we merge [[Template:IA file]] and [[Template:IA id]]? I believe it might be helpful, since then you'd be able to do something like {{IA id|1=archiveteam_archivebot_go_20240911183234_5bb571eb/wattsupwiththat.wpcomstaging.com-inf-20240725-034320-tchk5-00062.warc.gz}}, and have the template show what it'd render with {{IA file|identifier=archiveteam_archivebot_go_20240911183234_5bb571eb|path=wattsupwiththat.wpcomstaging.com-inf-20240725-034320-tchk5-00062.warc.gz}}. However, I'm not entirely sure how to merge those. |
| 11:39:22 | | sec^nd quits [Ping timeout: 256 seconds] |
| 11:40:39 | | sec^nd (second) joins |
| 11:47:24 | | Coderjo_ quits [Ping timeout: 256 seconds] |
| 11:47:35 | | Coderjo_ joins |
| 12:00:01 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:44 | | Bleo182600722719623455222 joins |
| 12:11:13 | | ducky_ (ducky) joins |
| 12:11:46 | | ducky quits [Ping timeout: 256 seconds] |
| 12:11:49 | | ducky_ is now known as ducky |
| 12:13:27 | <h2ibot> | Klea edited Thingiverse (+28, Add unknown status for post-2015): https://wiki.archiveteam.org/?diff=60397&oldid=56780 |
| 12:13:49 | <klea> | Should we run a project like we have for #imgone #pastalavista and similar where we try to archive every thingiverse item? |
| 12:17:26 | <klea> | And I suppose at that point maybe also for Sketchfab (Q7534755), Cults (Q62426317), and Printables.com (Q118399132). |
| 12:25:56 | | APOLLO03 quits [Ping timeout: 256 seconds] |
| 12:31:00 | | APOLLO03 joins |
| 12:43:25 | <kline> | how many objects are there? |
| 12:46:34 | <klea> | I believe 7286349 based on the latest id from https://www.thingiverse.com/?page=1&sort=newest |
| 12:47:20 | <klea> | > Hello everyone, tomorrow I'm going to try to make a lot of models in packages: American, British, and Japanese!!! |
| 12:47:20 | <klea> | Huh, maybe it's like [[Discourse]] where there may be new content added. :( |
| 12:48:12 | <kline> | they have an api but im not sure its suitable |
| 12:48:47 | <klea> | I suppose re-using https://github.com/archiveteam/thingiverse-grab might not be the best idea. |
| 13:09:51 | | nine quits [Quit: See ya!] |
| 13:10:05 | | nine joins |
| 13:10:05 | | nine is now authenticated as nine |
| 13:10:05 | | nine quits [Changing host] |
| 13:10:05 | | nine (nine) joins |
| 13:12:25 | | Kotomind quits [Ping timeout: 272 seconds] |
| 13:17:27 | | etnguyen03 (etnguyen03) joins |
| 13:30:46 | | etnguyen03 quits [Client Quit] |
| 13:33:56 | | Arcorann_ quits [Ping timeout: 256 seconds] |
| 13:36:58 | | etnguyen03 (etnguyen03) joins |
| 14:10:39 | | lukash984 quits [Quit: The Lounge - https://thelounge.chat] |
| 14:50:47 | <Hans2026> | Hello, I was referred to this IRC by the ArchiveTeam FAQ (1st item, "Can you save it"). My question is, can you save SwamiJ.com? I'm just a reader, not connected to the owners. I don't know if it's "at risk", just that it appears not to have been updated in many years, and I find it to be a well-written and comprehensive resource on its subject matter (traditional yoga and meditation of the Himalayan masters). It has possibly hundreds of pages (HTML, PDF, etc.). |
| 14:53:10 | | frank joins |
| 14:53:21 | <Hans2026> | It's not that I want an archive for myself, just that it would be a valuable reference for others, in case the site disappears someday. |
| 15:11:29 | <kline> | Hans2026, do you want to know a good way to start? |
| 15:11:54 | <kline> | are you on a linux system (or windows with the windows subsystem for linux)? |
| 15:15:04 | <Hans2026> | Since the FAQ referred me to this IRC, I can only assume that this is the right place to proceed. Yes, I have a laptop running Linux, and also have shell access to some Linux sites. |
| 15:15:48 | <kline> | you only need your own computer! |
| 15:16:06 | <Hans2026> | ok |
| 15:16:08 | <@OrIdow6> | kline: It sounds like they're asking for it to be run in AB or the like, not for advice on how to make warcs themselves |
| 15:16:22 | <kline> | OrIdow6, both are good |
| 15:16:38 | <klea> | sitemap.xml seems to have 2512 lines, but I think they're limiting access by UA somehow. |
| 15:16:46 | <klea> | It's also sending an Expires header set to 2106. |
| 15:17:04 | <kline> | Hans2026, https://wiki.archiveteam.org/index.php/Wget#Creating_WARC_with_wget has a couple of quick lines to run while a better archive is made |
| 15:17:25 | <kline> | if anything happens, you have a quick copy made with most of the info |
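kline's link points to the wiki's recipe for creating a WARC with wget. As a rough illustration (not the wiki's exact command), here is a sketch assuming GNU wget 1.14 or newer, the release that added WARC output. The function name `archive_site_warc` and the `WGET` override are hypothetical conveniences. `--mirror` is avoided because it turns on timestamping, which wget disables when writing WARCs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quick recursive WARC capture with GNU wget (>= 1.14).
# Usage: archive_site_warc URL NAME
# Set WGET=echo to print the argument list instead of running wget.
archive_site_warc() {
  local url="$1" name="$2"
  local -a args
  args=(
    --recursive --level=inf   # follow links with no depth limit
    --page-requisites         # also fetch images/CSS needed to render pages
    --no-parent               # stay at or below the starting directory
    --wait=1                  # be polite: one second between requests
    --warc-file="$name"       # wget writes $name.warc.gz
    --warc-cdx                # plus a CDX index of the WARC records
    "$url"
  )
  "${WGET:-wget}" "${args[@]}"
}
```

For example, `archive_site_warc https://swamij.com/ swamij` should leave `swamij.warc.gz` and a CDX index in the current directory next to the mirrored files. This is a stopgap copy, not a substitute for the ArchiveBot job.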
| 15:20:05 | <Hans2026> | I may be somewhat disk-space constrained for now, so I'm not sure I want to store an archive here. Also, as I said, this isn't for me, but for the benefit of anyone interested in that subject matter. In the FAQ item, the answer to "Can you save it" began with the word "yes", so I thought that meant the Archive Team can save it. So I guess my first question is, can the Archive Team save it, or is this more of a DIY thing? |
| 15:20:12 | <@OrIdow6> | kline: I think that is more complicated than most people should have to deal with |
| 15:20:44 | <klea> | Hans2026: Yes, Archive Team can save it if you ask in #archivebot, since it doesn't seem to be that big (relatively) |
| 15:22:44 | <@OrIdow6> | Hans2026: I've started an ArchiveBot job for it, you can see the progress at http://archivebot.com/?initialFilter=swamij |
| 15:23:23 | <@OrIdow6> | After it finishes it'll be visible in web.archive.org within a few weeks (I think; the interval fluctuates sometimes) |
| 15:25:40 | <Hans2026> | As I understand, web.archive.org is the Wayback Machine. Wayback already has some captures, but I don't know if the whole site was ever captured. Is it correct to understand that, regardless of whether a whole-site capture was ever done before, the new capture will be a whole-site capture? |
| 15:28:08 | <klea> | Yes; ArchiveBot, at least, doesn't skip pages that have already been captured. |
| 15:28:19 | <@OrIdow6> | Hans2026: Yes; there are technical caveats on what "whole-site" means but this website looks simple enough that they shouldn't apply much |
| 15:28:37 | <@OrIdow6> | As long as it can find a page by a series of links from the homepage they'll be captured |
| 15:29:00 | <@OrIdow6> | Ah sorry klea |
| 15:29:26 | <Hans2026> | Ok cool. As far as I know, they are fairly ordinary web pages (HTML with some images, and some PDFs). |
| 15:29:38 | <klea> | OrIdow6: I don't mind having others confirm the same response, don't worry too hard about it. |
| 15:30:00 | <klea> | You explained the manual ignores applied by voices or ops on #archivebot more cleanly than I would, in any case. |
| 15:32:03 | <Hans2026> | Given that the main home page was already captured many times, when I go look for the ArchiveBot capture later on, will there be a way for me to select that I want to see the ArchiveBot capture rather than other captures that were done? |
| 15:35:50 | <Hans2026> | For example, if the site is gone later on, and I want to give people a link to the whole-site capture, it would be useful to know how to point people to it. But I might not know which link on Wayback is for the whole-site capture. |
| 15:37:26 | <klea> | I believe that web.archive.org links with a timestamp are normally locked to a version in time. Since the WARC is timestamped, that should show you AT's version of the capture, and if the capture was somehow incomplete, later captures can fill in the missing resources. |
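Timestamped Wayback Machine links take the form `https://web.archive.org/web/<timestamp>/<url>`, with the timestamp written as `YYYYMMDDhhmmss`; the Wayback Machine redirects to the capture nearest that time, and shorter prefixes such as a bare year are generally accepted too. A small sketch for building such a link, where the helper name `wayback_link` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Wayback Machine replay link pinned to a time.
# Usage: wayback_link TIMESTAMP URL   (TIMESTAMP = 1 to 14 digits, YYYYMMDDhhmmss)
wayback_link() {
  local timestamp="$1" url="$2"
  if [[ ! $timestamp =~ ^[0-9]{1,14}$ ]]; then
    printf 'invalid timestamp: %s\n' "$timestamp" >&2
    return 1
  fi
  printf 'https://web.archive.org/web/%s/%s\n' "$timestamp" "$url"
}
```

For example, `wayback_link 20240101000000 swamij.com` prints `https://web.archive.org/web/20240101000000/swamij.com`. Sharing a link with the exact timestamp of a known capture is one way to point people at that specific version.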
| 15:41:47 | | etnguyen03 quits [Client Quit] |
| 15:41:54 | | Cuphead2527480 (Cuphead2527480) joins |
| 15:50:09 | | Webuser782625 joins |
| 15:50:58 | | iknownothing joins |
| 15:51:10 | <Hans2026> | If I understand correctly, people viewing some other (incomplete) capture can click on links and still get pages that are in this capture. I guess that works for most purposes, but I'm still wondering, is there a way to specifically look for an AT capture on Wayback (either by some selector on Wayback, or by looking up the capture/upload history on ArchiveTeam)? |
| 15:55:38 | <@OrIdow6> | Hans2026: You can download the raw files that we send to the Wayback Machine and then open them in a viewer (they will be at https://archive.fart.website/archivebot/viewer/?q=SwamiJ.com , and a viewer is at https://replayweb.page/ ) |
| 15:56:40 | <@OrIdow6> | You can check the source of a page in the Wayback Machine, but there's no way to automatically restrict it to a particular source |
| 15:57:02 | <klea> | Also, what OrIdow6 said sometimes can't apply to certain projects where the WARCs are access-restricted, so in that case you'll have to rely on clicking the "About this capture" button. |
| 16:00:44 | <Hans2026> | Ok thanks. This brings to mind another question. The reason I even know ArchiveTeam exists is because I looked up a website of my own on Wayback (wheelbit.net, just a hobby site of no importance) and I found that it was archived, the most recent one (2024) mentioning ArchiveTeam under "about this capture". Is there a way to find out the AT capture history of this site, and also why it was done (considering that there's not really any content there)? |
| 16:09:09 | <@OrIdow6> | Hans2026: The most recent one is from a catch-all system that automatically grabs large numbers of pages linked from various places, hard to trace |
| 16:09:11 | <klea> | The about section shows the collection Archive Team urls, i.e. #//. |
| 16:09:19 | <@OrIdow6> | Wow quite the timing |
| 16:09:28 | <klea> | Yeah, your response was better :) |
| 16:09:53 | <klea> | (I had made up a response about the CDX API response, and the metadata headers the webpage sends, but that's probably too technical) |
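For the more technical route klea mentions: the Wayback Machine's CDX server at `https://web.archive.org/cdx/search/cdx` lists captures of a URL with their timestamps, and its `matchType` parameter (`exact`, `prefix`, `host`, `domain`) widens the search to a whole site. As far as I know the CDX rows do not say which collection a capture came from, so this only narrows down candidates; "About this capture" is still needed to confirm an ArchiveTeam source. A sketch that builds the query URL, where the helper name `cdx_query_url` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Wayback CDX server query listing captures.
# Usage: cdx_query_url URL [exact|prefix|host|domain]
# The URL is not percent-encoded here, so pass plain host/path values only.
cdx_query_url() {
  local url="$1" match="${2:-exact}"
  case "$match" in
    exact|prefix|host|domain) ;;
    *) printf 'unknown matchType: %s\n' "$match" >&2; return 1 ;;
  esac
  # fl selects the returned fields; output=json returns a JSON array of rows
  printf 'https://web.archive.org/cdx/search/cdx?url=%s&matchType=%s&output=json&fl=timestamp,original,statuscode\n' \
    "$url" "$match"
}
```

For example, `curl --silent "$(cdx_query_url wheelbit.net host)"` returns a JSON array whose first row holds the field names and whose remaining rows are individual captures; each timestamp can then be opened as `https://web.archive.org/web/<timestamp>/<original>`.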
| 16:14:25 | <Hans2026> | OK, thanks for your patience with my newbie questions, but today is the first time I've heard that ArchiveTeam exists. From the FAQ, I thought AT only archived sites that are somehow deemed essential for posterity, but from your explanation, it sounds like AT runs in an automated way (not limited to "curated" selections). |
| 16:16:37 | <klea> | It depends, as everything does. |
| 16:17:36 | <iknownothing> | I know Learner.org is on the radar for the 2026 sunset. It seems that only the surrounding website is saved in the Wayback Machine, but the main point of the website is the video content. I'm wondering if saving the video files is something that you guys do and if that is going to be considered. I know nothing, just wanted to raise a little awareness about this. |
| 16:25:36 | | cyanbox_ joins |
| 16:28:05 | <h2ibot> | Klea edited Deathwatch (+305, /* 2026-08 */ Add Trinket.io): https://wiki.archiveteam.org/?diff=60398&oldid=60392 |
| 16:28:07 | | cyanbox quits [Ping timeout: 272 seconds] |
| 17:02:07 | | frank quits [Client Quit] |
| 17:14:01 | <kiska> | Hrm... opendiary throwing 503? |
| 17:14:08 | <kiska> | Or is it their way of telling me I am banned |
| 17:14:22 | | Webuser782625 quits [Client Quit] |
| 17:14:22 | | iknownothing quits [Client Quit] |
| 17:16:24 | <klea> | What's the best way for archiving moinmoin wikis? https://moin.vitali64.duckdns.org/ has expired cert. |
| 17:16:40 | <klea> | https://fun.dersco.re/weblog/2025/05/12/status-update/ says it'll be lost. |
| 17:17:32 | <klea> | Well, it says "docs moved to git" |
| 17:23:18 | <Hans2026> | I'm monitoring the progress of "swamij" on archivebot, and there's a lot more files than I thought. Most of them seem to be off-site, i.e., not from "swamij", but from "fbcdn", "yimg", and many others including "afternic" (a domain reseller) and "vistaprint" (an online printing service). Any idea why so many off-site URLs are being captured? |
| 17:23:18 | <Hans2026> | (I'm concerned that something might have gone wrong, as I thought this archive would capture only "swamij"... I hope the site wasn't maliciously infected with unrelated content.) |
| 17:47:50 | | Wohlstand quits [Quit: Wohlstand] |
| 17:51:43 | | Cuphead2527480 quits [Client Quit] |
| 17:52:32 | | Island joins |
| 17:59:42 | | Wohlstand (Wohlstand) joins |
| 18:07:33 | | datechnoman quits [Ping timeout: 272 seconds] |
| 18:13:57 | | Wohlstand quits [Client Quit] |
| 18:15:14 | | etnguyen03 (etnguyen03) joins |
| 18:18:21 | | gosc joins |
| 18:19:28 | | gosc quits [Client Quit] |
| 18:29:53 | | datechnoman (datechnoman) joins |
| 19:18:01 | | ax (ax) joins |
| 19:22:55 | | ducky quits [Ping timeout: 272 seconds] |
| 19:23:04 | | ducky (ducky) joins |
| 19:27:30 | <nicolas17> | maybe it should have used --no-offsite (and I wish that could be added after the fact) |
| 19:27:46 | <nicolas17> | however there's only 1700 URLs in queue and it's not growing |
| 19:27:58 | <nicolas17> | so it's not a big deal |
| 19:27:59 | | ducky quits [Ping timeout: 272 seconds] |
| 19:31:23 | <nicolas17> | opendiary ETA still march 8 |
| 19:31:56 | <nicolas17> | arkiver: ping about making opendiary make fewer requests, so we can increase the rate limit |
| 19:40:02 | | ducky (ducky) joins |
| 19:59:46 | | etnguyen03 quits [Client Quit] |
| 20:08:29 | | Webuser698035 joins |
| 20:08:57 | | Webuser698035 quits [Client Quit] |
| 20:08:59 | | etnguyen03 (etnguyen03) joins |
| 20:25:09 | | APOLLO03 quits [Quit: .] |
| 20:31:58 | | etnguyen03 quits [Client Quit] |