00:14:00pabs quits [Client Quit]
00:22:24pabs (pabs) joins
00:34:16<@OrIdow6>arkiver: Here is what I have for the "framework" for clara.io so far - nearly complete, just need to add in users and README/update wget version/etc. - due to the situation with 2 different license files I've made it private but shared it with Arkiver2 https://github.com/OrIdow6/clara-with-framework
00:36:41<@OrIdow6>As an example "item" there is view_model:%7b%22url%22%3a%22https%3a%2f%2fclara%2eio%2fview%2fe4dba886%2d340d%2d4c4f%2dba14%2d55312b1452de%22%7d
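The item names in this log have three parts: an item type, a colon, and a percent-encoded JSON object. Below is a minimal Python sketch of decoding and re-encoding them. It is not taken from the framework source. The encoding rule (compact JSON, with every character that is not an ASCII letter or digit escaped as lowercase-hex %xx) is an assumption inferred from the two example items quoted in this log.

```python
import json
from urllib.parse import unquote


def decode_item(item):
    """Split 'type:%7b...%7d' into (type, decoded JSON payload)."""
    item_type, _, encoded = item.partition(":")
    return item_type, json.loads(unquote(encoded))


def encode_item(item_type, payload):
    """Inverse of decode_item: compact JSON, then escape every byte that is not
    an ASCII letter or digit as %xx in lowercase hex (inferred from examples)."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    encoded = "".join(
        chr(b) if chr(b).isascii() and chr(b).isalnum() else f"%{b:02x}"
        for b in raw
    )
    return f"{item_type}:{encoded}"


# Prints the user_main seed item quoted at 07:58:42
print(encode_item("user_main", {"url": "https://clara.io/user/bhouston"}))
```

Escaping byte-by-byte keeps the round trip exact for the examples here. If a real item used a different escaping rule, `decode_item` would still work because `unquote` accepts any valid %xx sequence.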
00:38:41<@OrIdow6>The dodgiest thing I do is in Framework/queue_item.lua , fill_in_defaults, where I set the requests (the output of this function ultimately is an item in a get_urls list) to have a zero-length body and manually clear the headers relating to the body, in order to get wget not to deduplicate them itself
00:42:56<@OrIdow6>Wait, that may have an issue
00:56:15<@OrIdow6>Or I just can't read
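The following is a hypothetical Python illustration of the default-filling idea described at 00:38:41. The real code is Lua in Framework/queue_item.lua and is not reproduced here. The field names (`method`, `body`, `headers`) and the choice of Content-Type and Content-Length as the "headers relating to the body" are assumptions made for illustration.

```python
# Hypothetical: header names treated as "relating to the body" (lowercased)
BODY_HEADERS = {"content-type", "content-length"}


def fill_in_defaults(request):
    """Return a copy of `request` with defaults filled in (illustrative only)."""
    filled = {"method": "GET", **request}
    headers = dict(filled.get("headers", {}))  # copy so the input is not mutated
    if "body" not in filled:
        # No body given: set an explicit zero-length body and drop the
        # body-related headers (the approach described at 00:38:41).
        filled["body"] = ""
        headers = {k: v for k, v in headers.items() if k.lower() not in BODY_HEADERS}
    filled["headers"] = headers
    return filled
```

Requests that already carry a body are left alone in this sketch. Whether the original does the same is not stated in the log.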
01:05:40<ivan>signs of Twitter breaking more. force logged out, can't log back in. {"errors":[{"message":"Over capacity","code":130}]}
01:06:53fishingforsoup joins
01:07:22fishingforsoup_ joins
01:08:35<ivan>https://twitter.com/ZoeSchiffer/status/1606408842417512455
01:11:48fishingforsoup quits [Ping timeout: 265 seconds]
01:12:39<@JAA>I thought they only had two datacentres? So if they're closing one and downsizing the other...
01:13:47<@JAA>At least that's what I vaguely remember reading a few years ago.
01:15:37<ivan>the login page DoSes the site by trying the request again in fairly rapid succession
01:16:11<ivan>where am I going to post now?? github?
01:18:26HackMii_ quits [Ping timeout: 245 seconds]
01:21:00HackMii_ (hacktheplanet) joins
01:21:05<@arkiver>OrIdow6: i see the repo
01:27:26<@arkiver>well let me know when it's ready and we can get it started
01:28:43fishingforsoup_ quits [Ping timeout: 265 seconds]
01:30:47<@arkiver>the framework seems like a pretty big step away from what is currently usually used
01:52:14systwi quits [Ping timeout: 264 seconds]
02:09:05eroc1990 quits [Remote host closed the connection]
02:09:28eroc1990 (eroc1990) joins
02:31:25<flashfire42>Well Obscure Gamers just died again
02:37:29AngryBunny joins
02:43:21BlueMaxima quits [Read error: Connection reset by peer]
03:06:00systwi (systwi) joins
03:32:25TheTechRobo68 joins
03:32:29TheTechRobo68 is now known as AlsoTheTechRobo
03:32:44<AlsoTheTechRobo>Did anything happen with LGTM.com?
03:32:51<AlsoTheTechRobo>Looks like it's closed.
03:33:29<AlsoTheTechRobo>Also, any news on Clara.io?
03:38:02<AlsoTheTechRobo>(I'll check logs later, wish my desktop was available)
03:38:05AlsoTheTechRobo quits [Remote host closed the connection]
03:51:34monoxane quits [Quit: Ping timeout (120 seconds)]
03:51:48monoxane (monoxane) joins
05:10:23wyatt8740 joins
05:11:44wyatt8750 quits [Read error: Connection reset by peer]
05:12:59wyatt8750 joins
05:14:51wyatt8740 quits [Ping timeout: 250 seconds]
05:17:27wyatt8750 quits [Ping timeout: 250 seconds]
05:17:39wyatt8740 joins
05:25:41wyatt8740 quits [Ping timeout: 250 seconds]
05:35:33wyatt8740 joins
06:03:10hackbug quits [Read error: Connection reset by peer]
06:18:25hackbug (hackbug) joins
06:38:15G4te_Keep3r349 quits [Client Quit]
06:49:24G4te_Keep3r349 joins
06:49:52<h2ibot>OrIdow6 uploaded File:Clara IO logo.png (Logo of clara.io shutting down end of 2022): https://wiki.archiveteam.org/?title=File%3AClara%20IO%20logo.png
06:50:52<h2ibot>OrIdow6 edited Clara.io (+27, Add logo): https://wiki.archiveteam.org/?diff=49306&oldid=49214
07:22:22Island quits [Read error: Connection reset by peer]
07:47:36Ketchup901 quits [Ping timeout: 245 seconds]
07:47:58Ketchup901 (Ketchup901) joins
07:53:29<@OrIdow6>arkiver: It's an experiment
07:54:29<@OrIdow6>The main thing I intend this to maybe be used for is sites like Wikidot where there are many tens of different page types that need to be handled individually
07:54:35<@OrIdow6>Makes it more manageable
07:54:41<@OrIdow6>But it is more tedious to write
07:54:57<@OrIdow6>I hope it would make it more manageable, that is
07:55:57TheTechRobo quits [Read error: Connection reset by peer]
07:56:39<@OrIdow6>Anyhow it's ready (unless there are further pings), needs a backfeed key at Framework/backfeed.lua line 44
07:58:42<@OrIdow6>I haven't done discovery yet but it should be able to spider around quite a bit starting with user_main:%7b%22url%22%3a%22https%3a%2f%2fclara%2eio%2fuser%2fbhouston%22%7d
08:20:56sec^nd quits [Ping timeout: 245 seconds]
08:25:49sec^nd (second) joins
08:54:16sec^nd quits [Ping timeout: 245 seconds]
08:55:19sec^nd (second) joins
09:36:46sec^nd quits [Ping timeout: 245 seconds]
09:39:16HackMii_ quits [Ping timeout: 245 seconds]
09:40:18HackMii_ (hacktheplanet) joins
09:41:40sec^nd (second) joins
09:59:41HackMii_ quits [Ping timeout: 245 seconds]
10:00:14HackMii_ (hacktheplanet) joins
10:00:46sec^nd quits [Remote host closed the connection]
10:01:13sec^nd (second) joins
11:26:07lennier1 quits [Ping timeout: 265 seconds]
11:27:25lennier1 (lennier1) joins
12:14:40Megame (Megame) joins
12:35:31HackMii_ quits [Ping timeout: 245 seconds]
12:38:03HackMii_ (hacktheplanet) joins
13:06:19Arcorann_ quits [Ping timeout: 250 seconds]
13:16:40Pomp joins
13:17:38Pomp quits [Remote host closed the connection]
13:30:43Megame quits [Client Quit]
14:31:14Jonimus quits [Ping timeout: 265 seconds]
14:56:06Jonimus joins
15:26:50omni quits [Read error: Connection reset by peer]
15:26:52omni joins
15:59:10hitgrr8 joins
16:29:42pampan7 joins
16:30:14pampan7 quits [Remote host closed the connection]
17:22:46rocketdive joins
17:26:19<rocketdive>hey! i saw a bunch of links from the geocities jp index coming through on the grab crawler on the warrior, so i was wondering if anyone could queue links from fortunecity.ws to be archived. this archive has been around since 2012, but i've noticed a lot of the links aren't backed up and there are thousands of pages. since it's in the same vein as this geocities index, i just thought i'd ask!
17:26:53<rocketdive>i just worry that one day this site will just go offline, so i think it's extremely important to back it up just in case. but i understand if nobody feels that way
17:44:26rocketdive quits [Remote host closed the connection]
18:31:25braindancer joins
19:07:17Megame (Megame) joins
19:34:55Island joins
20:15:35Iki joins
20:21:12onetruth joins
20:29:15<@OrIdow6>https://www.reuters.com/lifestyle/sports/brazilian-soccer-legend-pele-dies-82-his-daughter-says-2022-12-29/
20:33:41DopefishJustin quits [Remote host closed the connection]
20:50:12pie_[bnc] joins
20:50:19pie_[bnc] quits [Client Quit]
21:14:59Megame quits [Client Quit]
22:08:44<braindancer>Hello all! Got a question, can't google my way out of it. I am trying to start the warrior Docker container, and I see that it is getting stuck unable to communicate with github. I tried attaching to the container, but there's no "ping" or anything to troubleshoot connectivity, and I can't install anything because I can't sudo. Any ideas re: how to unblock? Thanks!
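Without ping, one way to test connectivity is to open a TCP connection to port 443 from Python. This assumes a `python3` interpreter is available inside the container. The warrior's pipeline software is Python-based, but check first. The function name and timeout are illustrative choices.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP handshake to host:port succeeds within `timeout` seconds.

    DNS failures, refused connections and timeouts all raise OSError
    (or a subclass of it), which is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach("github.com")` returning False while `can_reach("1.1.1.1")` returns True would point at DNS or filtering rather than a dead network.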
22:23:01sec^nd quits [Ping timeout: 245 seconds]
22:25:00sec^nd (second) joins
22:29:17AlsoTheTechRobo joins
22:29:33<AlsoTheTechRobo>braindancer: there's no sudo because you're running as root
22:29:37<AlsoTheTechRobo>try `apt install ping`
22:29:43<AlsoTheTechRobo>no
22:29:53<AlsoTheTechRobo>i don't remember what the package is called
22:30:50<AlsoTheTechRobo>but just plain `apt` should work
22:31:11TheTechRobo (TheTechRobo) joins
22:31:16BlueMaxima joins
22:31:21AlsoTheTechRobo quits [Remote host closed the connection]
22:32:41Pichu0102 quits [Ping timeout: 250 seconds]
22:33:01<braindancer>Unfortunately I am running as `warrior`, not root, and apt doesn't work :(
22:37:11sec^nd quits [Ping timeout: 245 seconds]
22:38:50spirit quits [Quit: Leaving]
22:39:58sec^nd (second) joins
22:55:34<h2ibot>JustAnotherArchivist edited Deathwatch (+148, /* 2022 */ Add Hipcast): https://wiki.archiveteam.org/?diff=49307&oldid=49304
22:55:44AlsoTheTechRobo joins
22:56:41spirit joins
22:56:58<AlsoTheTechRobo>braindancer: Oh, I thought that you were talking about the docker image for that specific project, not the warrior
22:59:34<braindancer>Maybe that is the better plan? I am using the generic warrior and specifying the project as an environment variable
23:00:10<AlsoTheTechRobo>Ah. If you don't want the web UI or the "ArchiveTeam's Choice" project, https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker shows you how to run the projects directly
23:01:33<AlsoTheTechRobo>The image address is usually the same as the GitHub repository, but with atdr.meo.ws instead of github.com
23:01:36<@JAA>The warrior is the 'set it up and forget about it' approach. The project images are the 'I want to archive as much as possible and don't mind having to mess with the setup constantly' approach.
23:01:44<AlsoTheTechRobo>^
23:02:15<braindancer>Thanks! I saw that page but it doesn't show the list of URLs for specific projects (I am looking for reddit atm).
23:02:25<AlsoTheTechRobo>(For example, for the github project, the source code is https://github.com/ArchiveTeam/github-grab and the docker image is atdr.meo.ws/archiveteam/github-grab )
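The naming convention described above can be sketched as a small helper. The `atdr.meo.ws` registry comes from the log. The lowercasing reflects Docker's requirement that repository names be lowercase. Because the convention only "usually" holds, check the project's wiki page when in doubt.

```python
from urllib.parse import urlparse

REGISTRY = "atdr.meo.ws"


def image_for_repo(repo_url):
    """Map a github.com repository URL to the usual project image name."""
    parsed = urlparse(repo_url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {repo_url}")
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Docker repository names must be lowercase
    return f"{REGISTRY}/{path.lower()}"
```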
23:03:00<braindancer>So I was using atdr.meo.ws/archiveteam/warrior-dockerfile
23:03:24<AlsoTheTechRobo>The github repo is listed under "Project source" in the table. For Reddit I think it's atdr.meo.ws/archiveteam/reddit-grab
23:03:34<braindancer>NVM, guessed it LOL. reddit-grab. Let me try that
23:04:42<AlsoTheTechRobo>You can use a higher concurrency when running raw Docker containers, too - up to 20 per container. (The software stops being reliable with more than 20 concurrent in one container.)
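Here is a sketch of spreading a desired total concurrency over several containers while keeping each one at or below the 20 mentioned above. It assumes the project image takes `--concurrent N` followed by a nickname, as in the commands on the wiki page linked earlier. The container names and the helper itself are hypothetical.

```python
import shlex

MAX_CONCURRENT = 20  # per-container ceiling quoted in the log


def docker_run_commands(image, nickname, total):
    """Split `total` concurrent items over as few containers as possible."""
    if total < 1:
        raise ValueError("total must be >= 1")
    # e.g. "atdr.meo.ws/archiveteam/reddit-grab:latest" -> "reddit-grab"
    base = image.rsplit("/", 1)[-1].split(":", 1)[0]
    commands = []
    remaining, index = total, 0
    while remaining > 0:
        n = min(remaining, MAX_CONCURRENT)
        argv = ["docker", "run", "-d", "--restart=unless-stopped",
                "--name", f"{base}-{index}",
                image, "--concurrent", str(n), nickname]
        commands.append(shlex.join(argv))
        remaining -= n
        index += 1
    return commands
```

A total of 45 yields three commands with 20, 20 and 5. As the rest of this log shows, many containers hitting one site from one IP will still run into rate limits.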
23:05:23<@arkiver>i see 55 pages of podcasts on hipcast - not sure if that is all
23:05:26<braindancer>Nice. Yes, I am trying to jack this up as much as I can; rolling it out on Kubernetes.
23:10:51<braindancer>Are there any projects that have no (or very relaxed) per-IP limits? I have a ton of bandwidth on a single IP and am trying to figure out how to best use it.
23:10:52<AlsoTheTechRobo>Maybe we should add a "Docker image" field to the infobox on wiki pages.
23:13:23<AlsoTheTechRobo>braindancer: I think the best way to max out an IP is to run multiple projects
23:14:06<braindancer>Gotcha. #hassle :D
23:15:18<AlsoTheTechRobo>Definitely make sure to run Telegram (but not at a very high concurrency) as that project needs tons of IPs, so the more the merrier
23:16:10<AlsoTheTechRobo>though I haven't been here all that much for the past two weeks or so, since I haven't been able to access my computer, so that may have changed
23:17:42<braindancer>Sounds good. Is there a quick/easy way to detect being banned by IP? Will it show up in the logs?
23:19:08<AlsoTheTechRobo>With the telegram project, ar.kiver is apparently working on better ban detection by checking a known good post (Telegram bans by pretending stuff doesn't exist)
23:19:15<AlsoTheTechRobo>Other than that, I have no idea
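The ban-detection idea mentioned above re-checks a post that is known to exist whenever something looks missing. It can be sketched as follows. This is not the Telegram project's code. `fetch` is a hypothetical callback that returns whether a page shows real content.

```python
from typing import Callable


def classify(fetch: Callable[[str], bool], url: str, known_good_url: str) -> str:
    """Return 'exists', 'missing', or 'probably-banned' for `url`.

    fetch(u) is a hypothetical callback: True when the page at u shows real
    content, False when the site claims it does not exist.
    """
    if fetch(url):
        return "exists"
    # The site says `url` is gone. If a post we know exists also looks gone,
    # the "not found" is more likely a disguised ban than a real deletion.
    if not fetch(known_good_url):
        return "probably-banned"
    return "missing"
```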
23:19:56<braindancer>I see. That's sneaky!
23:20:59<@arkiver>telegram is horrible when it comes to that yes
23:23:11<braindancer>Oh that reddit container is so much nicer. None of the git issues, and it goes to town way better than warrior.
23:25:42<AlsoTheTechRobo>I'm not sure why the git issues were occurring, but glad you fixed it
23:30:32spirit quits [Client Quit]
23:31:33hitgrr8 quits [Client Quit]
23:31:53<braindancer>Damn. With only 3 containers, Reddit told me to chill out :(
23:32:27<AlsoTheTechRobo>on 20 concurrent?
23:32:39<braindancer>Yup
23:33:17wyatt8740 quits [Read error: Connection reset by peer]
23:33:27<AlsoTheTechRobo>yeah, you'll get that with most websites
23:33:40<AlsoTheTechRobo>that's why a balanced ~~diet~~ set of projects is necessary
23:34:29wyatt8740 joins
23:34:33<braindancer>Haha. Yeah, the sugar rush was short-lived
23:38:14Arcorann_ joins
23:38:18wyatt8750 joins
23:38:59wyatt8740 quits [Ping timeout: 250 seconds]
23:42:05wyatt8750 quits [Remote host closed the connection]
23:44:38wyatt8740 joins
23:45:57wyatt8740 quits [Client Quit]
23:46:59wyatt8740 joins
23:51:54AngryBunny quits [Remote host closed the connection]
23:54:24AlsoTheTechRobo leaves