| 00:14:00 | | pabs quits [Client Quit] |
| 00:22:24 | | pabs (pabs) joins |
| 00:34:16 | <@OrIdow6> | arkiver: Here is what I have for the "framework" for clara.io so far - nearly complete, just need to add in users and README/update wget version/etc. - due to the situation with 2 different license files I've made it private but shared it with Arkiver2 https://github.com/OrIdow6/clara-with-framework |
| 00:36:41 | <@OrIdow6> | As an example "item" there is view_model:%7b%22url%22%3a%22https%3a%2f%2fclara%2eio%2fview%2fe4dba886%2d340d%2d4c4f%2dba14%2d55312b1452de%22%7d |
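The item name above appears to be an item type (`view_model`), a colon, and a percent-encoded JSON object holding the URL. Below is a minimal Python sketch for decoding and re-encoding such names. It assumes two things inferred only from this example and the `user_main` one later in the log: every character other than an ASCII letter or digit is escaped as a lowercase `%xx` byte, and the JSON is written compactly. The helper names `decode_item` and `encode_item` are hypothetical and are not part of OrIdow6's framework.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import unquote


def decode_item(item: str) -> Tuple[str, Dict[str, Any]]:
    """Split an item name like 'view_model:%7b...%7d' into its type and JSON payload."""
    item_type, _, encoded = item.partition(":")
    # unquote() accepts lowercase hex escapes such as %2e and %2d
    return item_type, json.loads(unquote(encoded))


def encode_item(item_type: str, payload: Dict[str, Any]) -> str:
    """Inverse of decode_item, ASSUMING every non-alphanumeric byte is escaped as lowercase %xx."""
    text = json.dumps(payload, separators=(",", ":"))  # compact JSON, as in the examples
    escaped = "".join(
        c if c.isascii() and c.isalnum() else "".join(f"%{b:02x}" for b in c.encode("utf-8"))
        for c in text
    )
    return f"{item_type}:{escaped}"


item = "view_model:%7b%22url%22%3a%22https%3a%2f%2fclara%2eio%2fview%2fe4dba886%2d340d%2d4c4f%2dba14%2d55312b1452de%22%7d"
item_type, payload = decode_item(item)
print(item_type, payload["url"])
# prints: view_model https://clara.io/view/e4dba886-340d-4c4f-ba14-55312b1452de
```

Re-encoding the decoded payload reproduces both item names quoted in this log exactly, which is the only evidence behind the escaping assumption.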
| 00:38:41 | <@OrIdow6> | The dodgiest thing I do is in Framework/queue_item.lua , fill_in_defaults, where I set the requests (the output of this function ultimately is an item in a get_urls list) to have a zero-length body and manually clear the headers relating to the body, in order to get wget not to deduplicate them itself |
| 00:42:56 | <@OrIdow6> | Wait, that may have an issue |
| 00:56:15 | <@OrIdow6> | Or I just can't read |
| 01:05:40 | <ivan> | signs of Twitter breaking more. force logged out, can't log back in. {"errors":[{"message":"Over capacity","code":130}]} |
| 01:06:53 | | fishingforsoup joins |
| 01:07:22 | | fishingforsoup_ joins |
| 01:08:35 | <ivan> | https://twitter.com/ZoeSchiffer/status/1606408842417512455 |
| 01:11:48 | | fishingforsoup quits [Ping timeout: 265 seconds] |
| 01:12:39 | <@JAA> | I thought they only had two datacentres? So if they're closing one and downsizing the other... |
| 01:13:47 | <@JAA> | At least that's what I vaguely remember reading a few years ago. |
| 01:15:37 | <ivan> | the login page DoSes the site by trying the request again in fairly rapid succession |
| 01:16:11 | <ivan> | where am I going to post now?? github? |
| 01:18:26 | | HackMii_ quits [Ping timeout: 245 seconds] |
| 01:21:00 | | HackMii_ (hacktheplanet) joins |
| 01:21:05 | <@arkiver> | OrIdow6: i see the repo |
| 01:27:26 | <@arkiver> | well let me know when it's ready and we can get it started |
| 01:28:43 | | fishingforsoup_ quits [Ping timeout: 265 seconds] |
| 01:30:47 | <@arkiver> | the framework seems like a pretty big step away from what is currently usually used |
| 01:52:14 | | systwi quits [Ping timeout: 264 seconds] |
| 02:09:05 | | eroc1990 quits [Remote host closed the connection] |
| 02:09:28 | | eroc1990 (eroc1990) joins |
| 02:31:25 | <flashfire42> | Well Obscure Gamers just died again |
| 02:37:29 | | AngryBunny joins |
| 02:43:21 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 03:06:00 | | systwi (systwi) joins |
| 03:32:25 | | TheTechRobo68 joins |
| 03:32:29 | | TheTechRobo68 is now known as AlsoTheTechRobo |
| 03:32:44 | <AlsoTheTechRobo> | Did anything happen with LGTM.com? |
| 03:32:51 | <AlsoTheTechRobo> | Looks like it's closed. |
| 03:33:29 | <AlsoTheTechRobo> | Also, any news on Clara.io? |
| 03:38:02 | <AlsoTheTechRobo> | (I'll check logs later, wish my desktop was available) |
| 03:38:05 | | AlsoTheTechRobo quits [Remote host closed the connection] |
| 03:51:34 | | monoxane quits [Quit: Ping timeout (120 seconds)] |
| 03:51:48 | | monoxane (monoxane) joins |
| 05:10:23 | | wyatt8740 joins |
| 05:11:44 | | wyatt8750 quits [Read error: Connection reset by peer] |
| 05:12:59 | | wyatt8750 joins |
| 05:14:51 | | wyatt8740 quits [Ping timeout: 250 seconds] |
| 05:17:27 | | wyatt8750 quits [Ping timeout: 250 seconds] |
| 05:17:39 | | wyatt8740 joins |
| 05:25:41 | | wyatt8740 quits [Ping timeout: 250 seconds] |
| 05:35:33 | | wyatt8740 joins |
| 06:03:10 | | hackbug quits [Read error: Connection reset by peer] |
| 06:18:25 | | hackbug (hackbug) joins |
| 06:38:15 | | G4te_Keep3r349 quits [Client Quit] |
| 06:49:24 | | G4te_Keep3r349 joins |
| 06:49:52 | <h2ibot> | OrIdow6 uploaded File:Clara IO logo.png (Logo of clara.io shutting down end of 2022): https://wiki.archiveteam.org/?title=File%3AClara%20IO%20logo.png |
| 06:50:52 | <h2ibot> | OrIdow6 edited Clara.io (+27, Add logo): https://wiki.archiveteam.org/?diff=49306&oldid=49214 |
| 07:22:22 | | Island quits [Read error: Connection reset by peer] |
| 07:47:36 | | Ketchup901 quits [Ping timeout: 245 seconds] |
| 07:47:58 | | Ketchup901 (Ketchup901) joins |
| 07:53:29 | <@OrIdow6> | arkiver: It's an experiment |
| 07:54:29 | <@OrIdow6> | The main thing I intend this to maybe be used for is sites like Wikidot where there are many tens of different page types that need to be handled individually |
| 07:54:35 | <@OrIdow6> | Makes it more manageable |
| 07:54:41 | <@OrIdow6> | But it is more tedious to write |
| 07:54:57 | <@OrIdow6> | I hope it would make it more manageable, that is |
| 07:55:57 | | TheTechRobo quits [Read error: Connection reset by peer] |
| 07:56:39 | <@OrIdow6> | Anyhow it's ready (unless there are further pings), needs a backfeed key at Framework/backfeed.lua line 44 |
| 07:58:42 | <@OrIdow6> | I haven't done discovery yet but it should be able to spider around quite a bit starting with user_main:%7b%22url%22%3a%22https%3a%2f%2fclara%2eio%2fuser%2fbhouston%22%7d |
| 08:20:56 | | sec^nd quits [Ping timeout: 245 seconds] |
| 08:25:49 | | sec^nd (second) joins |
| 08:54:16 | | sec^nd quits [Ping timeout: 245 seconds] |
| 08:55:19 | | sec^nd (second) joins |
| 09:36:46 | | sec^nd quits [Ping timeout: 245 seconds] |
| 09:39:16 | | HackMii_ quits [Ping timeout: 245 seconds] |
| 09:40:18 | | HackMii_ (hacktheplanet) joins |
| 09:41:40 | | sec^nd (second) joins |
| 09:59:41 | | HackMii_ quits [Ping timeout: 245 seconds] |
| 10:00:14 | | HackMii_ (hacktheplanet) joins |
| 10:00:46 | | sec^nd quits [Remote host closed the connection] |
| 10:01:13 | | sec^nd (second) joins |
| 11:26:07 | | lennier1 quits [Ping timeout: 265 seconds] |
| 11:27:25 | | lennier1 (lennier1) joins |
| 12:14:40 | | Megame (Megame) joins |
| 12:35:31 | | HackMii_ quits [Ping timeout: 245 seconds] |
| 12:38:03 | | HackMii_ (hacktheplanet) joins |
| 13:06:19 | | Arcorann_ quits [Ping timeout: 250 seconds] |
| 13:16:40 | | Pomp joins |
| 13:17:38 | | Pomp quits [Remote host closed the connection] |
| 13:30:43 | | Megame quits [Client Quit] |
| 14:31:14 | | Jonimus quits [Ping timeout: 265 seconds] |
| 14:56:06 | | Jonimus joins |
| 15:26:50 | | omni quits [Read error: Connection reset by peer] |
| 15:26:52 | | omni joins |
| 15:59:10 | | hitgrr8 joins |
| 16:29:42 | | pampan7 joins |
| 16:30:14 | | pampan7 quits [Remote host closed the connection] |
| 17:22:46 | | rocketdive joins |
| 17:26:19 | <rocketdive> | hey! i saw a bunch of links from the geocities jp index coming through on the grab crawler on warrior. i was wondering if anyone could queue links from fortunecity.ws to be archived. this archive has been around since 2012, but i've noticed a lot of the links aren't backed up and there are thousands of pages. since it's in the same vein as this geocities index, i just thought i'd ask! |
| 17:26:53 | <rocketdive> | i just worry that one day this site will go offline, so i think it's extremely important to back it up just in case. but i understand if nobody feels that way |
| 17:44:26 | | rocketdive quits [Remote host closed the connection] |
| 18:31:25 | | braindancer joins |
| 19:07:17 | | Megame (Megame) joins |
| 19:34:55 | | Island joins |
| 20:15:35 | | Iki joins |
| 20:21:12 | | onetruth joins |
| 20:29:15 | <@OrIdow6> | https://www.reuters.com/lifestyle/sports/brazilian-soccer-legend-pele-dies-82-his-daughter-says-2022-12-29/ |
| 20:33:41 | | DopefishJustin quits [Remote host closed the connection] |
| 20:50:12 | | pie_[bnc] joins |
| 20:50:19 | | pie_[bnc] quits [Client Quit] |
| 21:14:59 | | Megame quits [Client Quit] |
| 22:08:44 | <braindancer> | Hello all! Got a question I can't google my way out of. I am trying to start the warrior Docker container, and I see that it is getting stuck, unable to communicate with GitHub. I tried attaching to the container, but there's no "ping" or anything to troubleshoot connectivity, and I can't install anything because I can't sudo. Any ideas on how to unblock it? Thanks! |
| 22:23:01 | | sec^nd quits [Ping timeout: 245 seconds] |
| 22:25:00 | | sec^nd (second) joins |
| 22:29:17 | | AlsoTheTechRobo joins |
| 22:29:33 | <AlsoTheTechRobo> | braindancer: there's no sudo because you're running as root |
| 22:29:37 | <AlsoTheTechRobo> | try `apt install ping` |
| 22:29:43 | <AlsoTheTechRobo> | no |
| 22:29:53 | <AlsoTheTechRobo> | i don't remember what the package is called |
| 22:30:50 | <AlsoTheTechRobo> | but just plain `apt` should work |
| 22:31:11 | | TheTechRobo (TheTechRobo) joins |
| 22:31:16 | | BlueMaxima joins |
| 22:31:21 | | AlsoTheTechRobo quits [Remote host closed the connection] |
| 22:32:41 | | Pichu0102 quits [Ping timeout: 250 seconds] |
| 22:33:01 | <braindancer> | Unfortunately I am running as `warrior`, not root, and apt doesn't work :( |
| 22:37:11 | | sec^nd quits [Ping timeout: 245 seconds] |
| 22:38:50 | | spirit quits [Quit: Leaving] |
| 22:39:58 | | sec^nd (second) joins |
| 22:55:34 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+148, /* 2022 */ Add Hipcast): https://wiki.archiveteam.org/?diff=49307&oldid=49304 |
| 22:55:44 | | AlsoTheTechRobo joins |
| 22:56:41 | | spirit joins |
| 22:56:58 | <AlsoTheTechRobo> | braindancer: Oh, I thought that you were talking about the docker image for that specific project, not the warrior |
| 22:59:34 | <braindancer> | Maybe that is the better plan? I am using the generic warrior and specifying the project as an environment variable |
| 23:00:10 | <AlsoTheTechRobo> | Ah. If you don't want the web UI or the "ArchiveTeam's Choice" project, https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker shows you how to run the projects directly |
| 23:01:33 | <AlsoTheTechRobo> | The image address is usually the same as the GitHub repository, but with atdr.meo.ws instead of github.com |
| 23:01:36 | <@JAA> | The warrior is the 'set it up and forget about it' approach. The project images are the 'I want to archive as much as possible and don't mind having to mess with the setup constantly' approach. |
| 23:01:44 | <AlsoTheTechRobo> | ^ |
| 23:02:15 | <braindancer> | Thanks! I saw that page but it doesn't show the list of URLs for specific projects (I am looking for reddit atm). |
| 23:02:25 | <AlsoTheTechRobo> | (For example, for the github project, the source code is https://github.com/ArchiveTeam/github-grab and the docker image is atdr.meo.ws/archiveteam/github-grab ) |
| 23:03:00 | <braindancer> | So I was using atdr.meo.ws/archiveteam/warrior-dockerfile |
| 23:03:24 | <AlsoTheTechRobo> | The github repo is listed under "Project source" in the table. For Reddit I think it's atdr.meo.ws/archiveteam/reddit-grab |
| 23:03:34 | <braindancer> | NVM, guessed it LOL. reddit-grab. Let me try that |
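AlsoTheTechRobo's rule of thumb (the image name is usually the GitHub repository path with `atdr.meo.ws` in place of `github.com`) can be written as a small Python helper. This is a sketch of that naming convention only. `image_for_repo` is a hypothetical name, the lowercasing reflects Docker's requirement for lowercase repository names (matching `ArchiveTeam` becoming `archiveteam` in the example), and because the mapping is only "usually" right, the project's wiki page or repository remains the authority.

```python
from urllib.parse import urlparse

REGISTRY = "atdr.meo.ws"


def image_for_repo(repo_url: str) -> str:
    """Map a project's GitHub URL to its usual Docker image name (a convention, not a guarantee)."""
    parsed = urlparse(repo_url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {repo_url}")
    # Docker repository names must be lowercase, hence ArchiveTeam -> archiveteam
    path = parsed.path.strip("/").lower()
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{REGISTRY}/{path}"


print(image_for_repo("https://github.com/ArchiveTeam/github-grab"))
# prints: atdr.meo.ws/archiveteam/github-grab
```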
| 23:04:42 | <AlsoTheTechRobo> | You can use a higher concurrency when running raw Docker containers, too - up to 20 per container. (The software stops being reliable with more than 20 concurrent in one container.) |
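When scaling out (for example on Kubernetes, as braindancer mentions next), the 20-per-container ceiling means a desired total concurrency has to be split across several containers. The Python sketch below does that split and builds one `docker run` argument list per container. The trailing `--concurrent N NICK` arguments after the image name are an ASSUMPTION about how the project images pass options to their pipeline, and `plan_containers`, `docker_commands`, and `YOURNICKHERE` are hypothetical; check the "Running Archive Team Projects with Docker" wiki page for the exact command. Note that this only respects the per-container limit: as the later Reddit exchange shows, a site can still rate-limit a single IP well below that.

```python
from typing import List

MAX_PER_CONTAINER = 20  # per the advice in the log; more than this reportedly becomes unreliable


def plan_containers(total_concurrency: int, limit: int = MAX_PER_CONTAINER) -> List[int]:
    """Split a desired total concurrency into per-container values no larger than `limit`."""
    if total_concurrency < 1:
        raise ValueError("total_concurrency must be at least 1")
    full, rest = divmod(total_concurrency, limit)
    return [limit] * full + ([rest] if rest else [])


def docker_commands(image: str, nick: str, total_concurrency: int) -> List[List[str]]:
    """Build one `docker run` argument list per container.

    ASSUMPTION: the image forwards `--concurrent N NICK` to its pipeline.
    """
    return [
        ["docker", "run", "-d", "--restart=unless-stopped", image, "--concurrent", str(n), nick]
        for n in plan_containers(total_concurrency)
    ]


for cmd in docker_commands("atdr.meo.ws/archiveteam/reddit-grab", "YOURNICKHERE", 50):
    print(" ".join(cmd))
```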
| 23:05:23 | <@arkiver> | i see 55 pages of podcasts on hipcast - not sure if that is all |
| 23:05:26 | <braindancer> | Nice. Yes, I am trying to jack this up as much as I can; rolling it out on Kubernetes. |
| 23:10:51 | <braindancer> | Are there any projects that have no (or very relaxed) per-IP limits? I have a ton of bandwidth on a single IP and am trying to figure out how to best use it. |
| 23:10:52 | <AlsoTheTechRobo> | Maybe we should add a "Docker image" field to the infobox on wiki pages. |
| 23:13:23 | <AlsoTheTechRobo> | braindancer: I think the best way to max out an IP is to run multiple projects |
| 23:14:06 | <braindancer> | Gotcha. #hassle :D |
| 23:15:18 | <AlsoTheTechRobo> | Definitely make sure to run Telegram (but not at a very high concurrency) as that project needs tons of IPs, so the more the merrier |
| 23:16:10 | <AlsoTheTechRobo> | though I haven't been here all that much for the past two weeks or so, since I haven't been able to access my computer, so that may have changed |
| 23:17:42 | <braindancer> | Sounds good. Is there a quick/easy way to detect being banned by IP? Will it show up in the logs? |
| 23:19:08 | <AlsoTheTechRobo> | With the telegram project, ar.kiver is apparently working on better ban detection by checking a known good post (Telegram bans by pretending stuff doesn't exist) |
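The known-good-post idea works because a ban that disguises itself as "not found" also hides content that is known to exist. Below is a minimal Python sketch of that check. It is not the telegram-grab implementation, which the message above says is still being worked on. `classify`, the `exists` callback, and the canary URL are hypothetical stand-ins, and the callback hides whatever request and page-parsing logic a real project would need.

```python
from typing import Callable

# Hypothetical: exists(url) returns True if the site serves the content at `url`.
ExistsFn = Callable[[str], bool]


def classify(exists: ExistsFn, url: str, canary_url: str) -> str:
    """Return 'ok', 'missing', or 'suspected_ban' for one URL.

    A "not found" answer is only trusted if a known-good canary is still visible;
    if the canary also looks missing, the site is probably hiding content from this IP.
    """
    if exists(url):
        return "ok"
    if exists(canary_url):
        return "missing"
    return "suspected_ban"


# Simulated site responses for illustration only.
visible = {"canary", "post/1"}
print(classify(lambda u: u in visible, "post/2", "canary"))  # prints: missing
print(classify(lambda u: False, "post/1", "canary"))         # prints: suspected_ban
```

Checking the canary on every missing URL doubles the requests for those items; checking it once per batch and caching the result is a cheaper variant of the same idea.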
| 23:19:15 | <AlsoTheTechRobo> | Other than that, I have no idea |
| 23:19:56 | <braindancer> | I see. That's sneaky! |
| 23:20:59 | <@arkiver> | telegram is horrible when it comes to that yes |
| 23:23:11 | <braindancer> | Oh that reddit container is so much nicer. None of the git issues, and it goes to town way better than warrior. |
| 23:25:42 | <AlsoTheTechRobo> | I'm not sure why the git issues were occurring, but glad you fixed it |
| 23:30:32 | | spirit quits [Client Quit] |
| 23:31:33 | | hitgrr8 quits [Client Quit] |
| 23:31:53 | <braindancer> | Damn. With only 3 containers, Reddit told me to chill out :( |
| 23:32:27 | <AlsoTheTechRobo> | on 20 concurrent? |
| 23:32:39 | <braindancer> | Yup |
| 23:33:17 | | wyatt8740 quits [Read error: Connection reset by peer] |
| 23:33:27 | <AlsoTheTechRobo> | yeah, you'll get that with most websites |
| 23:33:40 | <AlsoTheTechRobo> | that's why a balanced ~~diet~~ set of projects is necessary |
| 23:34:29 | | wyatt8740 joins |
| 23:34:33 | <braindancer> | Haha. Yeah, the sugar rush was short-lived |
| 23:38:14 | | Arcorann_ joins |
| 23:38:18 | | wyatt8750 joins |
| 23:38:59 | | wyatt8740 quits [Ping timeout: 250 seconds] |
| 23:42:05 | | wyatt8750 quits [Remote host closed the connection] |
| 23:44:38 | | wyatt8740 joins |
| 23:45:57 | | wyatt8740 quits [Client Quit] |
| 23:46:59 | | wyatt8740 joins |
| 23:51:54 | | AngryBunny quits [Remote host closed the connection] |
| 23:54:24 | | AlsoTheTechRobo leaves |