00:02:33le0n quits [Ping timeout: 252 seconds]
00:07:45sralracer quits [Quit: Ooops, wrong browser tab.]
00:10:00loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
00:16:28<h2ibot>Nicolas17v2 edited Garnek.pl (-2, Remove leftover markup): https://wiki.archiveteam.org/?diff=53864&oldid=53861
00:45:07Xanthon joins
00:45:07Xanthon quits [Changing host]
00:45:07Xanthon (Xanthon) joins
01:30:42<h2ibot>Pokechu22 edited Deathwatch (+178, /* 2025 */ https://civilianpublicservice.org/): https://wiki.archiveteam.org/?diff=53865&oldid=53854
02:11:32jacksonchen666 quits [Client Quit]
04:05:44etnguyen03 quits [Remote host closed the connection]
04:39:01le0n (le0n) joins
05:06:40ducky quits [Read error: Connection reset by peer]
05:08:28ducky (ducky) joins
05:37:29toastpaint joins
05:51:20toastpaint quits [Client Quit]
06:21:23ducky quits [Ping timeout: 260 seconds]
06:27:45ducky (ducky) joins
06:33:03ducky quits [Ping timeout: 260 seconds]
06:40:19ducky (ducky) joins
06:47:44<h2ibot>PaulWise edited ArchiveBot/Monitoring (+48, I2P eepsites): https://wiki.archiveteam.org/?diff=53866&oldid=53851
06:54:44Jake quits [Quit: Leaving for a bit!]
06:55:09<pabs>^ if anyone knows of current or former i2p inproxies, please let me know. so far I have i2p.us i2p.to www.darknetproxy.com i2phides.me i2p.mk16.de i2p-inproxy.mk16.de hiddenservice.net
06:56:13mls quits [Quit: leaving]
07:00:40<pabs>and same for tor onion inproxies, so far I have \.onion(\.(ly|to|city|cab|direct))? - hiddenservice.net - portal\.mozz\.us/[^/]+/[^/]+\.onion
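A minimal sketch of how the patterns pabs lists above could be applied to a URL list. The three patterns are taken from the message, with the dot in hiddenservice.net escaped; the helper name `looks_like_onion_inproxy` and the sample URLs are assumptions made for illustration, not part of any ArchiveBot monitoring code.

```python
import re

# Patterns quoted from the chat message; combining them in a list is an assumption.
ONION_PATTERNS = [
    re.compile(r"\.onion(\.(ly|to|city|cab|direct))?"),
    re.compile(r"hiddenservice\.net"),
    re.compile(r"portal\.mozz\.us/[^/]+/[^/]+\.onion"),
]

def looks_like_onion_inproxy(url: str) -> bool:
    """Return True if any of the quoted patterns occurs anywhere in the URL."""
    return any(p.search(url) for p in ONION_PATTERNS)

# Hypothetical sample URLs, used only to exercise the helper.
for url in ("http://abc.onion.to/page", "https://example.com/"):
    print(url, looks_like_onion_inproxy(url))
```

Note that the first pattern has an optional suffix, so it also matches plain .onion hostnames and not only inproxy domains.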
07:05:51Unholy2361924645377131 (Unholy2361) joins
07:08:22Jake (Jake) joins
07:08:25<qwertyasdfuiopghjkl>I'm guessing you might also want to scan for https://en.wikipedia.org/wiki/Hyphanet urls, though I'm not entirely sure what those would look like.
07:13:30<pabs>can you add it to the ideas section on the wiki?
07:20:42<qwertyasdfuiopghjkl>I don't have an account on the wiki yet (will have time to make one once I get some other stuff done, but it'll be a while)
07:29:21Jake quits [Client Quit]
07:29:43Jake (Jake) joins
07:39:12riteo joins
07:42:45<pabs>ok will add later
07:43:23<riteo>Hi! I've been running a warrior for quite some time, and since logs are kinda broken I connected my home bouncer here, as I'd like to take a closer look at the various projects running (like the current ASK.fm preservation thing)
07:43:54<riteo>Hope I'm not writing in the wrong channel and that my bouncer won't quit/join too many times randomly, since it's a hacky home bouncer
07:55:35riteo quits [Remote host closed the connection]
07:55:44riteo (riteo) joins
07:58:02<riteo>sorry was setting up the password
08:06:15Wohlstand (Wohlstand) joins
08:26:18<@OrIdow6>Hello riteo, you can see the current projects running on the tracker homepage at https://tracker.archiveteam.org/ (except for https://tracker.archiveteam.org/garnek/ for some reason)
08:52:02BlueMaxima quits [Read error: Connection reset by peer]
09:31:34<pabs>marshallbrain.com needs an AB - died https://news.ycombinator.com/item?id=42222387
09:46:03Naruyoko5 joins
09:47:26Mist8kenGAS (Mist8kenGAS) joins
09:49:45Naruyoko quits [Ping timeout: 260 seconds]
10:29:37<h2ibot>Manu edited Webring/fediring.net (+564, /* Queued a few more pages */): https://wiki.archiveteam.org/?diff=53867&oldid=53746
10:45:38ducky quits [Ping timeout: 260 seconds]
10:48:14Island quits [Read error: Connection reset by peer]
10:49:38ducky (ducky) joins
11:13:00Commander001 quits [Ping timeout: 252 seconds]
11:13:31Commander001 joins
11:35:20rappet quits [Quit: https://quassel-irc.org - Komfortabler Chat. Überall.]
11:36:44MrMcNuggets (MrMcNuggets) joins
11:37:19rappet (rappet) joins
12:00:02Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat]
12:00:11Guest77 joins
12:00:12<eggdrop>[tell] Guest77: [2024-05-03T02:12:30Z] <TheTechRobo> https://replayweb.page is a convenient site for viewing WARCs.
12:00:13<eggdrop>[tell] Guest77: [2024-05-03T03:19:57Z] <thuban> if you want to extract files en masse rather than browse them, you can use `unar`: https://theunarchiver.com/command-line
12:00:14<eggdrop>[tell] Guest77: [2024-05-03T03:26:02Z] <thuban> note that because warc records are not necessarily one-to-one with uris (there may be multiple requests for the same location at different times, with different data, etc), simply extracting the whole thing like it's a zip may be 'lossy'; i don't know specifically how unar handles this
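The point thuban makes above, that flat extraction can be lossy, can be illustrated with a small sketch. It uses no WARC library: the tuples below are hypothetical stand-ins for response records (target URI, capture time, payload), and both function names are assumptions made for illustration. How `unar` itself handles repeated URIs is not covered here.

```python
from collections import defaultdict

# Hypothetical stand-ins for WARC response records: (target URI, capture time, payload).
records = [
    ("https://example.org/", "2024-05-01T00:00:00Z", b"first capture"),
    ("https://example.org/", "2024-05-02T00:00:00Z", b"second capture"),
    ("https://example.org/about", "2024-05-01T00:00:05Z", b"about page"),
]

def extract_flat(records):
    """Zip-style extraction: one output per URI, so later captures overwrite earlier ones."""
    out = {}
    for uri, _ts, payload in records:
        out[uri] = payload
    return out

def extract_grouped(records):
    """Keep every capture by grouping payloads under their URI with the capture time."""
    out = defaultdict(list)
    for uri, ts, payload in records:
        out[uri].append((ts, payload))
    return dict(out)

flat = extract_flat(records)
grouped = extract_grouped(records)
# Three records go in; the flat view keeps two, the grouped view keeps all three.
print(len(records), len(flat), sum(len(v) for v in grouped.values()))  # prints: 3 2 3
```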
12:00:49<Guest77>thank you very much thuban
12:01:05<Guest77>Hello, i have a question. i am trying to archive using grab-site, which uses wpull
12:01:26<Guest77>wpull is somewhat a wget analog, but for some reason, i am unable to pass "--convert-links" or "-k"(alias)
12:02:02<Guest77>i mean, i want to extract my .warc and then navigate offline through the files, but this approach does not let me do so, the links point to the original online URL site
12:02:48Bleo182600722719623 joins
12:03:21NF885 (NF885) joins
12:03:44<NF885>hi. looks like https://tracker.archiveteam.org/garnek/ isn't linked from the main tracker page
12:04:37<NF885>unless it's intentional given that it'll only run for about a day?
12:04:43Mist8kenGAS quits [Remote host closed the connection]
12:07:54<nstrom|m>quite possibly, it was a rush job
12:08:08<nstrom|m>that one is hitting the tracker rate limit already so doesn't really need more workers at the moment, though if limit is raised that may change
12:10:15<NF885>ah ok
12:10:33<NF885>I didn't know whether somebody forgot to list it
12:15:35Matthww quits [Quit: The Lounge - https://thelounge.chat]
12:20:09sralracer (sralracer) joins
12:23:30<Guest77>hmm, it seems that wpull does not have -k for some reason
12:25:30le0n quits [Ping timeout: 260 seconds]
12:35:13Guest77 quits [Client Quit]
12:36:02le0n (le0n) joins
12:39:01SkilledAlpaca41896 quits [Quit: SkilledAlpaca41896]
12:40:47SkilledAlpaca41896 joins
12:49:13Guest77 joins
12:53:33NF885 quits [Client Quit]
12:57:35lennier2 quits [Ping timeout: 260 seconds]
12:58:57lennier2 joins
13:11:12le0n quits [Client Quit]
13:13:46Guest77 quits [Client Quit]
13:14:23le0n (le0n) joins
13:21:27Matthww joins
14:03:11<thuban>!tell Guest77 base wpull does have `-k`, but the fork used for grab-site does not (it's supposed to produce accurate warcs and modifying the content would compromise their integrity). is there some reason browsing using a warc viewer is not adequate?
14:03:11<eggdrop>[tell] ok, I'll tell Guest77 when they join next
14:07:47JaffaCakes118 (JaffaCakes118) joins
14:11:02<JaffaCakes118>Could someone archive https://leilabuftonw8.wixsite.com/leila-s-portfolio with archivebot please (no coverage)
14:16:29<thuban>JaffaCakes118: sure
14:17:16<JaffaCakes118>ty
14:17:44JaffaCakes118 quits [Client Quit]
14:25:55<myself>oh this is neat, Warrior gets a mention in this list of "do-good projects" to run on spare servers: https://github.com/ArchiveBox/good-karma-kit
14:29:04Guest77 joins
14:29:05<eggdrop>[tell] Guest77: [2024-11-24T14:03:11Z] <thuban> base wpull does have `-k`, but the fork used for grab-site does not (it's supposed to produce accurate warcs and modifying the content would compromise their integrity). is there some reason browsing using a warc viewer is not adequate?
14:33:12<Guest77>i wanted to see them in the browser offline, i am testing webrecorder-player now
14:34:13<thuban>Guest77: i'd recommend replayweb.page
14:37:32etnguyen03 (etnguyen03) joins
14:45:30<Guest77>hmm, maybe i'm overcomplicating things
14:45:59<Guest77>i've been looking at the website and it's possible to have one like that one using "wabac.js"
14:46:27<Guest77>do i need a whole server to run it on my side? or would an http server be enough? (python http.server)
14:49:52<Guest77>i've been testing the appimages, they work but there are some limitations (no search bar for example)
14:52:05<thuban>i guess i'm still not sure exactly what your requirements are
14:58:44<Guest77>https://replayweb.page/ but offline
14:59:41<Guest77>this site leads me to wabac.js which is the backend used in there
15:00:31<Guest77>ReplayWeb.page has one appimage but that loads an electron UI which has no search field
15:07:03etnguyen03 quits [Client Quit]
15:08:57<thuban>i believe wabac.js is not sufficient for browsing as it doesn't do any client-side rewriting, but i've never deployed replayweb.page locally and can't really help you with usage. have you tried clicking the 'browse contents' icon in the upper left to open the panel?
15:11:39<Guest77>hmm, i think something useful
15:12:10<Guest77>activating View>Toggle developer tools shows me a search field (searches the source)
15:12:26<Guest77>i can jump to whatever keyword i like with that, so it's solved
15:12:28<Guest77>thanks thuban
15:12:40<Guest77>(replayweb appimage)
15:24:18systwi_ joins
15:38:35etnguyen03 (etnguyen03) joins
16:59:18<nicolas17>rewriting links while archiving is a bad idea and any proper warc-producing tool should stop you from doing so
16:59:29<nicolas17>links should be rewritten while replaying the warc
17:07:07Wohlstand quits [Quit: Wohlstand]
17:25:47<thuban>it would probably be helpful to document which options are absent in grab-site's wpull for that reason, though--as it is the ludios_wpull repo says it "should go away when wpull is similarly improved", so there's no indication that there are intentional differences at all
17:29:04<@JAA>wpull can save both to WARC and to plain files. The link conversion stuff should only apply to the latter.
17:31:59<thuban>i expect that's the wpull behavior, yeah, in which case it would be safe to restore grab-site's access to `-k`--but it ought to be documented regardless
17:40:15<@JAA>I mean, grab-site always uses --delete-after, so I'm not sure that makes sense anyway.
17:40:31<@JAA>It'd be more reasonable to have a tool that can convert a WARC to a -k-like structure.
17:42:00<h2ibot>Imer edited Garnek.pl (+415, Add note about image servers): https://wiki.archiveteam.org/?diff=53868&oldid=53864
17:45:01<h2ibot>Exorcism edited Garnek.pl (+0): https://wiki.archiveteam.org/?diff=53869&oldid=53868
17:51:11ducky quits [Read error: Connection reset by peer]
17:52:28ducky (ducky) joins
17:54:11MrMcNuggets quits [Quit: WeeChat 4.3.2]
18:04:22Webuser961446 joins
18:05:48<thuban>Webuser961446: ah, ok. (i wondered about that .com)
18:06:03<Webuser961446>:)
18:15:47etnguyen03 quits [Client Quit]
18:22:24Guest77 quits [Quit: -]
18:27:07etnguyen03 (etnguyen03) joins
18:50:55murb quits [Quit: gone]
18:53:29murb (murb) joins
19:01:11etnguyen03 quits [Client Quit]
19:06:51Radzig quits [Quit: ZNC 1.9.1 - https://znc.in]
19:07:19Radzig joins
19:17:51<szczot3k>thuban: continuing the drama - archivebot doesn't have SSL
19:18:42<szczot3k>s/the drama/the HTTPS drama
19:22:37Guest45 joins
19:23:57Guest45 is now known as NeonGlitch
19:24:10<szczot3k>Was there any talk about archiving GoldenLine.pl? It's a Polish job-seeking site, including employer profiles with people's opinions.
19:24:28<szczot3k>It's shutting down at the end of the month
19:25:06<szczot3k>It's been up for twenty years - major media are talking about it shutting down after this many years. https://businessinsider.com.pl/wiadomosci/koniec-goldenline-agora-zamyka-kultowy-polski-serwis/6hhbx2r (in Polish)
19:31:18th3z0l4_ quits [Ping timeout: 252 seconds]
19:32:15th3z0l4 joins
19:35:27etnguyen03 (etnguyen03) joins
19:41:35BlueMaxima joins
19:54:18<Alienmaster|m>What happened to the twitter data on archive?
20:10:07<nicolas17>https://wiki.archiveteam.org/index.php/Garnek.pl we may need more IPs on this project
20:56:30nicolas17 quits [Ping timeout: 260 seconds]
20:59:53nicolas17_ joins
21:00:36<h2ibot>JAABot edited CurrentWarriorProject (+5): https://wiki.archiveteam.org/?diff=53870&oldid=53803
21:01:59nicolas17_ is now known as nicolas17
21:02:13<nicolas17>went too hard on askfm and my piece of shit modem rebooted
21:20:33<szczot3k>How important is a residential connection for warrior? Is running it from a 'hosting' ISP a bad idea? (OVH/Hetzner/anything like that)
21:21:02<szczot3k>From my experience trying to run a VPN over OVH, I can see that a lot of websites will either block it or slap a captcha on it
21:23:42<szczot3k>Got buyvm, ifog.ch, ovh (with a few IPs), and some other providers I can spare bandwidth from for warrior. Second thing - I run a (middle) tor relay at home, so I wonder if this would skew the results
21:25:47<lennier2>It really depends on the project, I'd say. A lot of people use Hetzner, Digital Ocean, etc. (Though they're probably usually using Docker to run multiple projects and higher concurrency.)
21:26:22<lennier2>I'd say try on something like askfm and just make sure you're uploading items.
21:30:42<szczot3k>Any benefits to running the virtual appliance over docker?
21:31:49<datechnoman>It really comes down to your technical expertise/knowledge and concurrency you want to run. The warrior appliance is limited to 6 concurrent sessions where the docker container can run 20
21:32:09<lennier2>You can just leave Warrior running set to a default project, while Docker you need to start each project manually.
21:34:11<lennier2>I didn't think it was that hard to set up, though. And now I can, for example, run 2 containers with 15 concurrency each on askfm. And I was going to run garnek on top of that except my ISP is apparently blocked on that one.
21:34:13<lennier2>https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker
21:35:07<lennier2>But the concurrency you can run without being blocked varies by project.
21:36:55<szczot3k>While I appreciate docker, having to spin up the containers manually is tough
21:42:05Island joins
21:42:05<lennier2>I guess people use scripts if they have a bunch of IPs. But Warrior is definitely the set it up once then forget about it option.
21:43:53<lennier2>Or set up Warrior then supplement with Docker if you hear about a particular project like askfm or garnek that needs workers.
21:48:29<szczot3k>lennier2 the last option seems like a good idea, will prepare my env for it tomorrow
21:49:42hackbug quits [Remote host closed the connection]
21:54:22<lennier2>Cool, thanks for helping!
22:11:05<szczot3k>VM template I made has 2GB of disk space after cloning, gparted kernel panics. It really sounds like a project for tomorrow.
22:13:48NeonGlitch quits [Client Quit]
22:14:10Guest45 joins
22:15:25Guest45 quits [Client Quit]
22:28:04hackbug (hackbug) joins
22:33:58<h2ibot>OrIdow6 edited Cohost (+539, Updates): https://wiki.archiveteam.org/?diff=53871&oldid=53760
22:37:07franga2000 leaves [The Lounge - https://thelounge.chat]
22:38:14Guest45 joins
22:39:22Guest45 quits [Client Quit]
22:41:40<@OrIdow6>Holy cow so many messages
22:42:15<szczot3k>Nvm, got it working. So what would be the best safe concurrency for an askfm container?
22:42:33<szczot3k>I run three separate instances, with different IPs (on the same subnet tho).
22:45:23<@OrIdow6>pabs: Looks like that was indeed run in AB
23:01:46<lennier2>szczot3k: You should be able to run 20 per IP. I actually have 30 on my IP (need two containers to get that high). I'm just using a single IP, though, so not 100% sure about the subnet. The channel for that is #dontaskfm if you want to see if anyone has more comments on that.
23:05:47hackbug quits [Client Quit]
23:07:25Naruyoko5 quits [Read error: Connection reset by peer]
23:08:17Naruyoko joins
23:24:30hackbug (hackbug) joins
23:44:20<pabs>arkiver: idea for a DPoS project: https://www.mail-archive.com/ and https://marc.info/
23:46:02<pabs>after what happened with Gmane and the general decline in mailing lists, I worry about these two giant archives
23:46:10etnguyen03 quits [Quit: Konversation terminated!]
23:48:10<pabs>OrIdow6: unfortunately not a few of his other websites tho, so I'm looking at those