00:02:33 | | le0n quits [Ping timeout: 252 seconds] |
00:07:45 | | sralracer quits [Quit: Ooops, wrong browser tab.] |
00:10:00 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
00:16:28 | <h2ibot> | Nicolas17v2 edited Garnek.pl (-2, Remove leftover markup): https://wiki.archiveteam.org/?diff=53864&oldid=53861 |
00:45:07 | | Xanthon joins |
00:45:07 | | Xanthon is now authenticated as Xanthon |
00:45:07 | | Xanthon quits [Changing host] |
00:45:07 | | Xanthon (Xanthon) joins |
01:30:42 | <h2ibot> | Pokechu22 edited Deathwatch (+178, /* 2025 */ https://civilianpublicservice.org/): https://wiki.archiveteam.org/?diff=53865&oldid=53854 |
02:11:32 | | jacksonchen666 quits [Client Quit] |
04:05:44 | | etnguyen03 quits [Remote host closed the connection] |
04:39:01 | | le0n (le0n) joins |
05:06:40 | | ducky quits [Read error: Connection reset by peer] |
05:08:28 | | ducky (ducky) joins |
05:37:29 | | toastpaint joins |
05:51:20 | | toastpaint quits [Client Quit] |
06:21:23 | | ducky quits [Ping timeout: 260 seconds] |
06:27:45 | | ducky (ducky) joins |
06:33:03 | | ducky quits [Ping timeout: 260 seconds] |
06:40:19 | | ducky (ducky) joins |
06:47:44 | <h2ibot> | PaulWise edited ArchiveBot/Monitoring (+48, I2P eepsites): https://wiki.archiveteam.org/?diff=53866&oldid=53851 |
06:54:44 | | Jake quits [Quit: Leaving for a bit!] |
06:55:09 | <pabs> | ^ if anyone knows of current or former i2p inproxies, please let me know. so far I have i2p.us i2p.to www.darknetproxy.com i2phides.me i2p.mk16.de i2p-inproxy.mk16.de hiddenservice.net |
06:56:13 | | mls quits [Quit: leaving] |
07:00:40 | <pabs> | and same for tor onion inproxies, so far I have \.onion(\.(ly|to|city|cab|direct))? - hiddenservice.net - portal\.mozz\.us/[^/]+/[^/]+\.onion |
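For reference, the three onion inproxy patterns pabs lists above could be combined into one scanner roughly as follows. This is only a sketch: joining them with `|` and the helper name `looks_like_onion_link` are assumptions, not how ArchiveBot's monitoring actually works.

```python
import re

# Patterns copied from the message above; combining them into a single
# alternation is an assumption made for this illustration.
ONION_INPROXY = re.compile(
    r"\.onion(\.(ly|to|city|cab|direct))?"
    r"|hiddenservice\.net"
    r"|portal\.mozz\.us/[^/]+/[^/]+\.onion"
)

def looks_like_onion_link(url: str) -> bool:
    """Return True if the URL mentions an onion address or a known inproxy."""
    return ONION_INPROXY.search(url) is not None

print(looks_like_onion_link("http://example.onion.to/page"))
print(looks_like_onion_link("https://example.com/"))
```

Because `search` is used rather than `fullmatch`, the pattern matches anywhere in the URL, which suits scanning whole link lists.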
07:05:51 | | Unholy2361924645377131 (Unholy2361) joins |
07:08:22 | | Jake (Jake) joins |
07:08:25 | <qwertyasdfuiopghjkl> | I'm guessing you might also want to scan for https://en.wikipedia.org/wiki/Hyphanet urls, though I'm not entirely sure what those would look like. |
07:13:30 | <pabs> | can you add it to the ideas section on the wiki? |
07:20:42 | <qwertyasdfuiopghjkl> | I don't have an account on the wiki yet (will have time to make one once I get some other stuff done, but it'll be a while) |
07:29:21 | | Jake quits [Client Quit] |
07:29:43 | | Jake (Jake) joins |
07:39:12 | | riteo joins |
07:42:45 | <pabs> | ok will add later |
07:43:23 | <riteo> | Hi! I've been running a warrior for quite some time and since logs are kinda broken I connected my home bouncer here, as I'd like to take a closer look at the various projects running (like the current ASK.fm preservation thing) |
07:43:54 | <riteo> | Hope I'm not writing in the wrong channel and that my bouncer won't quit/join too many times randomly since it's a hacky home bouncer |
07:54:50 | | riteo is now authenticated as riteo |
07:55:35 | | riteo quits [Remote host closed the connection] |
07:55:44 | | riteo (riteo) joins |
07:58:02 | <riteo> | sorry was setting up the password |
08:06:15 | | Wohlstand (Wohlstand) joins |
08:26:18 | <@OrIdow6> | Hello riteo, you can see the current projects running on the tracker homepage at https://tracker.archiveteam.org/ (except for https://tracker.archiveteam.org/garnek/ for some reason) |
08:52:02 | | BlueMaxima quits [Read error: Connection reset by peer] |
09:31:34 | <pabs> | marshallbrain.com needs an AB - died https://news.ycombinator.com/item?id=42222387 |
09:46:03 | | Naruyoko5 joins |
09:47:26 | | Mist8kenGAS (Mist8kenGAS) joins |
09:49:45 | | Naruyoko quits [Ping timeout: 260 seconds] |
10:29:37 | <h2ibot> | Manu edited Webring/fediring.net (+564, /* Queued a few more pages */): https://wiki.archiveteam.org/?diff=53867&oldid=53746 |
10:45:38 | | ducky quits [Ping timeout: 260 seconds] |
10:48:14 | | Island quits [Read error: Connection reset by peer] |
10:49:38 | | ducky (ducky) joins |
11:13:00 | | Commander001 quits [Ping timeout: 252 seconds] |
11:13:31 | | Commander001 joins |
11:35:20 | | rappet quits [Quit: https://quassel-irc.org - Komfortabler Chat. Überall.] |
11:36:44 | | MrMcNuggets (MrMcNuggets) joins |
11:37:19 | | rappet (rappet) joins |
12:00:02 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:00:11 | | Guest77 joins |
12:00:12 | <eggdrop> | [tell] Guest77: [2024-05-03T02:12:30Z] <TheTechRobo> https://replayweb.page is a convenient site for viewing WARCs. |
12:00:13 | <eggdrop> | [tell] Guest77: [2024-05-03T03:19:57Z] <thuban> if you want to extract files en masse rather than browse them, you can use `unar`: https://theunarchiver.com/command-line |
12:00:14 | <eggdrop> | [tell] Guest77: [2024-05-03T03:26:02Z] <thuban> note that because warc records are not necessarily one-to-one with uris (there may be multiple requests for the same location at different times, with different data, etc), simply extracting the whole thing like it's a zip may be 'lossy'; i don't know specifically how unar handles this |
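A small illustration of why extracting a WARC "like a zip" can be lossy, as thuban notes above. The record tuples and both helper functions are hypothetical stand-ins, not the behavior of `unar` or any real WARC library.

```python
from collections import defaultdict

# Hypothetical (uri, capture_time, payload) tuples standing in for
# WARC response records; a real WARC would be read with a WARC library.
records = [
    ("https://example.com/", "2024-05-01T00:00:00Z", b"first capture"),
    ("https://example.com/", "2024-05-02T00:00:00Z", b"second capture"),
    ("https://example.com/about", "2024-05-01T00:00:00Z", b"about page"),
]

def naive_extract(records):
    """One file per URI: later captures silently overwrite earlier ones."""
    files = {}
    for uri, _when, payload in records:
        files[uri] = payload
    return files

def group_by_uri(records):
    """Keep every capture, grouped by URI, so nothing is lost."""
    grouped = defaultdict(list)
    for uri, when, payload in records:
        grouped[uri].append((when, payload))
    return dict(grouped)

print(len(records), "records ->", len(naive_extract(records)), "files")
```

Three records collapse to two files in the naive version, while `group_by_uri` keeps both captures of the homepage.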
12:00:49 | <Guest77> | thank you very much thuban |
12:01:05 | <Guest77> | Hello, i have a question. i am trying to archive using grab-site, which uses wpull |
12:01:26 | <Guest77> | wpull is somewhat a wget analog, but for some reason, i am unable to pass "--convert-links" or "-k"(alias) |
12:02:02 | <Guest77> | i mean, i want to extract my .warc and then navigate offline through the files, but this approach does not let me do so, the links point to the original online URL site |
12:02:48 | | Bleo182600722719623 joins |
12:03:21 | | NF885 (NF885) joins |
12:03:44 | <NF885> | hi. looks like https://tracker.archiveteam.org/garnek/ isn't linked from the main tracker page |
12:04:37 | <NF885> | unless it's intentional given that it'll only run for about a day? |
12:04:43 | | Mist8kenGAS quits [Remote host closed the connection] |
12:07:54 | <nstrom|m> | quite possibly, it was a rush job |
12:08:08 | <nstrom|m> | That one is hitting the tracker rate limit already so it doesn't really need more workers at the moment, though if the limit is raised that may change |
12:10:15 | <NF885> | ah ok |
12:10:33 | <NF885> | I didn't know whether somebody forgot to list it |
12:15:35 | | Matthww quits [Quit: The Lounge - https://thelounge.chat] |
12:20:09 | | sralracer (sralracer) joins |
12:23:30 | <Guest77> | hmm, it seems that wpull does not have -k for some reason |
12:25:30 | | le0n quits [Ping timeout: 260 seconds] |
12:35:13 | | Guest77 quits [Client Quit] |
12:36:02 | | le0n (le0n) joins |
12:39:01 | | SkilledAlpaca41896 quits [Quit: SkilledAlpaca41896] |
12:40:47 | | SkilledAlpaca41896 joins |
12:49:13 | | Guest77 joins |
12:53:33 | | NF885 quits [Client Quit] |
12:57:35 | | lennier2 quits [Ping timeout: 260 seconds] |
12:58:57 | | lennier2 joins |
13:11:12 | | le0n quits [Client Quit] |
13:13:46 | | Guest77 quits [Client Quit] |
13:14:23 | | le0n (le0n) joins |
13:21:27 | | Matthww joins |
14:03:11 | <thuban> | !tell Guest77 base wpull does have `-k`, but the fork used for grab-site does not (it's supposed to produce accurate warcs and modifying the content would compromise their integrity). is there some reason browsing using a warc viewer is not adequate? |
14:03:11 | <eggdrop> | [tell] ok, I'll tell Guest77 when they join next |
14:07:47 | | JaffaCakes118 (JaffaCakes118) joins |
14:11:02 | <JaffaCakes118> | Could someone archive https://leilabuftonw8.wixsite.com/leila-s-portfolio with archivebot please (no coverage) |
14:16:29 | <thuban> | JaffaCakes118: sure |
14:17:16 | <JaffaCakes118> | ty |
14:17:44 | | JaffaCakes118 quits [Client Quit] |
14:25:55 | <myself> | oh this is neat, Warrior gets a mention in this list of "do-good projects" to run on spare servers: https://github.com/ArchiveBox/good-karma-kit |
14:29:04 | | Guest77 joins |
14:29:05 | <eggdrop> | [tell] Guest77: [2024-11-24T14:03:11Z] <thuban> base wpull does have `-k`, but the fork used for grab-site does not (it's supposed to produce accurate warcs and modifying the content would compromise their integrity). is there some reason browsing using a warc viewer is not adequate? |
14:33:12 | <Guest77> | i wanted to see them in the browser offline, i am testing webrecorder-player now |
14:34:13 | <thuban> | Guest77: i'd recommend replayweb.page |
14:37:32 | | etnguyen03 (etnguyen03) joins |
14:45:30 | <Guest77> | hmm, maybe i'm overcomplicating things |
14:45:59 | <Guest77> | i've been looking at the website and it's possible to have one like that one using "wabac.js" |
14:46:27 | <Guest77> | do i need a whole server to run it on my side? or would an http server be enough? (python http.server) |
14:49:52 | <Guest77> | i've been testing the appimages, they work but there are some limitations (no search bar for example) |
14:52:05 | <thuban> | i guess i'm still not sure exactly what your requirements are |
14:58:44 | <Guest77> | https://replayweb.page/ but offline |
14:59:41 | <Guest77> | this site leads me to wabac.js which is the backend used in there |
15:00:31 | <Guest77> | ReplayWeb.page has one appimage but that loads an electron UI which has no search field |
15:07:03 | | etnguyen03 quits [Client Quit] |
15:08:57 | <thuban> | i believe wabac.js is not sufficient for browsing as it doesn't do any client-side rewriting, but i've never deployed replayweb.page locally and can't really help you with usage. have you tried clicking the 'browse contents' icon in the upper left to open the panel? |
15:11:39 | <Guest77> | hmm, i think i found something useful |
15:12:10 | <Guest77> | activating View > Toggle developer tools shows me a search field (searches the source) |
15:12:26 | <Guest77> | i can jump to whatever keyword i like with that, so it's solved |
15:12:28 | <Guest77> | thanks thuban |
15:12:40 | <Guest77> | (replayweb appimage) |
15:24:18 | | systwi_ joins |
15:38:35 | | etnguyen03 (etnguyen03) joins |
16:59:18 | <nicolas17> | rewriting links while archiving is a bad idea and any proper warc-producing tool should stop you from doing so |
16:59:29 | <nicolas17> | links should be rewritten while replaying the warc |
17:07:07 | | Wohlstand quits [Quit: Wohlstand] |
17:25:47 | <thuban> | it would probably be helpful to document which options are absent in grab-site's wpull for that reason, though--as it is the ludios_wpull repo says it "should go away when wpull is similarly improved", so there's no indication that there are intentional differences at all |
17:29:04 | <@JAA> | wpull can save both to WARC and to plain files. The link conversion stuff should only apply to the latter. |
17:31:59 | <thuban> | i expect that's the wpull behavior, yeah, in which case it would be safe to restore grab-site's access to `-k`--but it ought to be documented regardless |
17:40:15 | <@JAA> | I mean, grab-site always uses --delete-after, so I'm not sure that makes sense anyway. |
17:40:31 | <@JAA> | It'd be more reasonable to have a tool that can convert a WARC to a -k-like structure. |
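A rough sketch of the core step such a WARC-to-`-k` converter would need: mapping each archived URL to a local file path and rewriting links relative to it. The function names and the path layout (host directory plus `index.html` for directory URLs, query strings ignored) are assumptions for illustration, not wget's or wpull's actual mapping.

```python
import posixpath
from urllib.parse import urlsplit

def url_to_local_path(url: str) -> str:
    """Map a URL to a hypothetical on-disk path: host/path, with index.html
    filled in for directory-style URLs. Query strings are ignored here."""
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"
    if not path.startswith("/"):
        path = "/" + path
    return parts.netloc + path

def relative_link(from_url: str, to_url: str) -> str:
    """Rewrite a link so it works when browsing the extracted files offline."""
    src_dir = posixpath.dirname(url_to_local_path(from_url))
    return posixpath.relpath(url_to_local_path(to_url), start=src_dir)

print(relative_link("https://example.com/a/b.html", "https://example.com/c/d.html"))
```

The rewriting happens on extracted copies only, so the WARC itself stays untouched, in line with the point made above about rewriting at replay time rather than at capture time.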
17:42:00 | <h2ibot> | Imer edited Garnek.pl (+415, Add note about image servers): https://wiki.archiveteam.org/?diff=53868&oldid=53864 |
17:45:01 | <h2ibot> | Exorcism edited Garnek.pl (+0): https://wiki.archiveteam.org/?diff=53869&oldid=53868 |
17:51:11 | | ducky quits [Read error: Connection reset by peer] |
17:52:28 | | ducky (ducky) joins |
17:54:11 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
18:04:22 | | Webuser961446 joins |
18:05:48 | <thuban> | Webuser961446: ah, ok. (i wondered about that .com) |
18:06:03 | <Webuser961446> | :) |
18:15:47 | | etnguyen03 quits [Client Quit] |
18:22:24 | | Guest77 quits [Quit: -] |
18:27:07 | | etnguyen03 (etnguyen03) joins |
18:50:55 | | murb quits [Quit: gone] |
18:53:29 | | murb (murb) joins |
19:01:11 | | etnguyen03 quits [Client Quit] |
19:06:51 | | Radzig quits [Quit: ZNC 1.9.1 - https://znc.in] |
19:07:19 | | Radzig joins |
19:17:51 | <szczot3k> | thuban: continuing the drama - archivebot doesn't have SSL |
19:18:42 | <szczot3k> | s/the drama/the HTTPS drama |
19:22:37 | | Guest45 joins |
19:23:57 | | Guest45 is now known as NeonGlitch |
19:24:10 | <szczot3k> | Was there any talk about archiving GoldenLine.pl? It's a Polish job-seeking site, including employer profiles with people's opinions. |
19:24:28 | <szczot3k> | It's shutting down at the end of the month |
19:25:06 | <szczot3k> | It's been up for twenty years - major media are talking about it shutting down after this many years. https://businessinsider.com.pl/wiadomosci/koniec-goldenline-agora-zamyka-kultowy-polski-serwis/6hhbx2r (in Polish) |
19:31:18 | | th3z0l4_ quits [Ping timeout: 252 seconds] |
19:32:15 | | th3z0l4 joins |
19:35:27 | | etnguyen03 (etnguyen03) joins |
19:41:35 | | BlueMaxima joins |
19:54:18 | <Alienmaster|m> | What happened to the twitter data on archive? |
20:10:07 | <nicolas17> | https://wiki.archiveteam.org/index.php/Garnek.pl we may need more IPs on this project |
20:56:30 | | nicolas17 quits [Ping timeout: 260 seconds] |
20:59:53 | | nicolas17_ joins |
21:00:36 | <h2ibot> | JAABot edited CurrentWarriorProject (+5): https://wiki.archiveteam.org/?diff=53870&oldid=53803 |
21:01:59 | | nicolas17_ is now known as nicolas17 |
21:02:06 | | nicolas17 is now authenticated as nicolas17 |
21:02:13 | <nicolas17> | went too hard on askfm and my piece of shit modem rebooted |
21:20:33 | <szczot3k> | How important is a residential connection for warrior? Is running it from a 'hosting' ISP a bad idea? (OVH/Hetzner/anything like that) |
21:21:02 | <szczot3k> | From my experience trying to run a VPN over OVH I can see that a lot of websites will either block it, or slap a captcha on it |
21:23:42 | <szczot3k> | Got buyvm, ifog.ch, ovh (with a few IPs), and some other providers I can spare bandwidth from for warrior. Second thing - I run a (middle) tor relay at home, so I wonder if this would skew the results |
21:25:47 | <lennier2> | It really depends on the project, I'd say. A lot of people use Hetzner, Digital Ocean, etc. (Though they're probably usually using Docker to run multiple projects and higher concurrency.) |
21:26:22 | <lennier2> | I'd say try on something like askfm and just make sure you're uploading items. |
21:30:42 | <szczot3k> | Any benefits to running the virtual appliance over docker? |
21:31:49 | <datechnoman> | It really comes down to your technical expertise/knowledge and concurrency you want to run. The warrior appliance is limited to 6 concurrent sessions where the docker container can run 20 |
21:32:09 | <lennier2> | You can just leave Warrior running set to a default project, while Docker you need to start each project manually. |
21:34:11 | <lennier2> | I didn't think it was that hard to set up, though. And now I can, for example, run 2 containers with 15 concurrency each on askfm. And I was going to run garnek on top of that except my ISP is apparently blocked on that one. |
21:34:13 | <lennier2> | https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker |
21:35:07 | <lennier2> | But the concurrency you can run without being blocked varies by project. |
21:36:55 | <szczot3k> | While I appreciate docker, having to spin up the containers manually is tough |
21:42:05 | | Island joins |
21:42:05 | <lennier2> | I guess people use scripts if they have a bunch of IPs. But Warrior is definitely the set it up once then forget about it option. |
21:43:53 | <lennier2> | Or set up Warrior then supplement with Docker if you hear about a particular project like askfm or garnek that needs workers. |
21:48:29 | <szczot3k> | lennier2 the last option seems like a good idea, will prepare my env for it tomorrow |
21:49:42 | | hackbug quits [Remote host closed the connection] |
21:54:22 | <lennier2> | Cool, thanks for helping! |
22:11:05 | <szczot3k> | VM template I made has 2GB of disk space after cloning, gparted kernel panics. It really sounds like a project for tomorrow. |
22:13:48 | | NeonGlitch quits [Client Quit] |
22:14:10 | | Guest45 joins |
22:15:25 | | Guest45 quits [Client Quit] |
22:28:04 | | hackbug (hackbug) joins |
22:33:58 | <h2ibot> | OrIdow6 edited Cohost (+539, Updates): https://wiki.archiveteam.org/?diff=53871&oldid=53760 |
22:37:07 | | franga2000 leaves [The Lounge - https://thelounge.chat] |
22:38:14 | | Guest45 joins |
22:39:22 | | Guest45 quits [Client Quit] |
22:41:40 | <@OrIdow6> | Holy cow so many messages |
22:42:15 | <szczot3k> | Nvm, got it working. So what would be the best safe concurrency for an askfm container? |
22:42:33 | <szczot3k> | I run three separate instances, with different IPs (on the same subnet tho). |
22:45:23 | <@OrIdow6> | pabs: Looks like that was indeed run in AB |
23:01:46 | <lennier2> | szczot3k: You should be able to run 20 per IP. I actually have 30 on my IP (need two containers to get that high). I'm just using a single IP, though, so not 100% sure about the subnet. The channel for that is #dontaskfm if you want to see if anyone has more comments on that. |
23:05:47 | | hackbug quits [Client Quit] |
23:07:25 | | Naruyoko5 quits [Read error: Connection reset by peer] |
23:08:17 | | Naruyoko joins |
23:24:30 | | hackbug (hackbug) joins |
23:44:20 | <pabs> | arkiver: idea for a DPoS project: https://www.mail-archive.com/ and https://marc.info/ |
23:46:02 | <pabs> | after what happened with Gmane and the general decline in mailing lists, I worry about these two giant archives |
23:46:10 | | etnguyen03 quits [Quit: Konversation terminated!] |
23:48:10 | <pabs> | OrIdow6: unfortunately not a few of his other websites tho, so I'm looking at those |