| 00:03:22 | | AlsoHP_Archivist joins |
| 00:06:25 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 00:06:38 | | HP_Archivist (HP_Archivist) joins |
| 00:09:16 | | AlsoHP_Archivist quits [Ping timeout: 250 seconds] |
| 00:10:27 | <Ryz> | !delay bfarujiu2orkbdx4l27roivbp 0 200 |
| 00:10:29 | <Ryz> | Oops |
| 00:12:27 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 00:12:57 | | fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))] |
| 00:13:02 | | fuzzy8021 (fuzzy8021) joins |
| 00:13:11 | | BlueMaxima joins |
| 00:14:02 | | @dxrt quits [Ping timeout: 250 seconds] |
| 00:14:39 | | ave quits [Remote host closed the connection] |
| 00:14:39 | | lun4 quits [Client Quit] |
| 00:14:39 | | linuxgemini quits [Remote host closed the connection] |
| 00:14:55 | | lun4 (lun4) joins |
| 00:14:56 | | ave (ave) joins |
| 00:15:00 | | linuxgemini (linuxgemini) joins |
| 00:15:56 | | dxrt joins |
| 00:15:58 | | dxrt is now authenticated as dxrt |
| 00:15:58 | | dxrt quits [Changing host] |
| 00:15:58 | | dxrt (dxrt) joins |
| 00:15:58 | | @ChanServ sets mode: +o dxrt |
| 00:31:50 | | @OrIdow6 quits [Remote host closed the connection] |
| 00:53:40 | | CittyKat (CittyKat) joins |
| 01:02:17 | | CittyKat quits [Remote host closed the connection] |
| 01:02:34 | | dm4v quits [Ping timeout: 250 seconds] |
| 01:04:30 | | dm4v joins |
| 01:04:33 | | dm4v is now authenticated as dm4v |
| 01:04:33 | | dm4v quits [Changing host] |
| 01:04:33 | | dm4v (dm4v) joins |
| 01:17:55 | | fungisimp joins |
| 01:19:45 | | fungisimp quits [Remote host closed the connection] |
| 01:20:46 | | OrIdow6 (OrIdow6) joins |
| 01:20:46 | | @ChanServ sets mode: +o OrIdow6 |
| 01:25:04 | | Mineroboter joins |
| 01:26:09 | | Mineroboter_ quits [Ping timeout: 258 seconds] |
| 02:30:32 | | Matthww quits [Ping timeout: 250 seconds] |
| 02:34:01 | | Matthww joins |
| 02:41:40 | | sliccricc quits [Ping timeout: 258 seconds] |
| 03:24:15 | | Iki quits [Ping timeout: 244 seconds] |
| 03:36:23 | | DogsRNice quits [Read error: Connection reset by peer] |
| 03:38:01 | | Matthww quits [Ping timeout: 258 seconds] |
| 03:40:04 | | Matthww joins |
| 03:45:37 | | Matthww3 joins |
| 03:47:40 | | Matthww quits [Ping timeout: 250 seconds] |
| 03:47:40 | | Matthww3 is now known as Matthww |
| 03:56:35 | | qw3rty_ joins |
| 04:00:15 | | qw3rty quits [Ping timeout: 258 seconds] |
| 04:05:17 | | etnguyen03 quits [Client Quit] |
| 04:43:09 | | lennier1 quits [Read error: Connection reset by peer] |
| 04:43:23 | | lennier1 (lennier1) joins |
| 04:51:48 | | Wayward quits [Ping timeout: 250 seconds] |
| 05:36:45 | | sgettel joins |
| 05:37:21 | | sgettel quits [Remote host closed the connection] |
| 05:58:17 | <@JAA> | ArchiveBox released a 'good karma kit' Docker Compose thingy a few days ago which includes our warrior image: https://github.com/ArchiveBox/good-karma-kit |
| 06:20:02 | <thuban> | a nice idea; however, due to the human costs of electricity, it is important to consider the expected benefits of providing each service (see https://www.gwern.net/Charity) |
| 07:58:26 | <purplebot> | Bandcamp edited by JesseW (+185, link to archivebot job of artist_index) just now -- https://www.archiveteam.org/?diff=46563&oldid=46548 |
| 08:03:26 | <purplebot> | Coronavirus edited by Gridkr (+478, /* Miscellaneous */) just now -- https://www.archiveteam.org/?diff=46564&oldid=46271 |
| 08:04:26 | <purplebot> | Chromebot edited by Iki (+176, +info on Wayback Machine ingestion …) just now -- https://www.archiveteam.org/?diff=46565&oldid=42865 |
| 08:10:50 | | spirit joins |
| 08:32:03 | | @arkiver quits [Quit: .] |
| 08:32:16 | | arkiver (arkiver) joins |
| 08:32:16 | | @ChanServ sets mode: +o arkiver |
| 08:46:17 | | BlueMaxima quits [Client Quit] |
| 09:01:20 | | Zopolis4 (Zopolis4) joins |
| 09:49:10 | <LeighR> | "Does target site appear to support and/or prefer IPv6" looks like something to add to whatever checklist you guys use when determining grabber running advice |
| 09:50:21 | <LeighR> | because that increased the density of IP-ban-safe grabbers on a cheap Hetzner VM quite a bit |
| 09:50:36 | <LeighR> | (at least on the yahooanswers project) |
| 09:50:43 | <@HCross> | it depends tho |
| 09:50:50 | <@HCross> | on how the site blocks, if they kill off a /64 at a time |
| 09:50:53 | <@HCross> | or a /128 |
| 09:51:39 | <LeighR> | knock wood, it appears that Y!A does not block /64s (or I'm still under their threshold) |
| 09:53:50 | | shoghicp quits [Ping timeout: 250 seconds] |
| 09:54:55 | <LeighR> | I have 400 single-concurrency containers running on one VM spread across 10 /80s, but of course, at current rate-limiting, they're each doing an average of 6 jobs/hr |
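The /64 versus /128 question above can be made concrete with a small sketch. It assumes, hypothetically, that all ten /80s are carved out of a single /64 assigned to the VM (the log does not say so) and uses an address from the IPv6 documentation range rather than any real prefix. It counts how many distinct "ban units" 400 containers occupy under different blocking granularities, and works out the aggregate rate implied by 6 jobs/hr per container.

```python
import ipaddress
from itertools import islice

# Hypothetical /64 from the IPv6 documentation range (2001:db8::/32);
# the real prefix on the VM is not given in the log.
vm_prefix = ipaddress.ip_network("2001:db8:0:1::/64")

# Assumption: the ten /80s are the first ten /80 subnets of that /64.
subnets = list(islice(vm_prefix.subnets(new_prefix=80), 10))

# Give each of the 400 containers its own address, 40 per /80.
addresses = [subnets[i % 10][i // 10 + 1] for i in range(400)]


def ban_unit(address, prefix_len):
    """Return the network a site would block if it bans at this prefix length."""
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


units_per_128 = {ban_unit(a, 128) for a in addresses}
units_per_80 = {ban_unit(a, 80) for a in addresses}
units_per_64 = {ban_unit(a, 64) for a in addresses}

# Aggregate throughput at the rate quoted in the log.
jobs_per_hour = len(addresses) * 6

print(len(units_per_128))  # 400 independent targets if bans are per /128
print(len(units_per_80))   # 10 if bans are per /80
print(len(units_per_64))   # 1 if bans are per /64
print(jobs_per_hour)       # 2400
```

Under these assumptions, a site that blocks whole /64s would see all 400 containers as one client, which is why the blocking granularity matters more than the raw address count.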
| 09:57:28 | | shoghicp (shoghicp) joins |
| 10:20:54 | | shoghicp quits [Ping timeout: 258 seconds] |
| 10:24:51 | | shoghicp (shoghicp) joins |
| 10:44:57 | | mrfooooo joins |
| 11:04:20 | | Hackerpcs quits [Quit: Hackerpcs] |
| 11:17:19 | | Matthww2 joins |
| 11:18:20 | | Matthww quits [Ping timeout: 250 seconds] |
| 11:18:20 | | Matthww2 is now known as Matthww |
| 11:21:12 | | Zopolis4 quits [Remote host closed the connection] |
| 11:22:29 | | spirit quits [Client Quit] |
| 11:27:23 | | Matthww4 joins |
| 11:28:18 | | Matthww quits [Ping timeout: 250 seconds] |
| 11:28:18 | | Matthww4 is now known as Matthww |
| 11:34:50 | | Hackerpcs (Hackerpcs) joins |
| 11:36:27 | | Matthww4 joins |
| 11:38:42 | | Matthww quits [Ping timeout: 250 seconds] |
| 11:38:42 | | Matthww4 is now known as Matthww |
| 11:41:15 | | VerifiedJ quits [Client Quit] |
| 11:41:50 | | Hackerpcs quits [Client Quit] |
| 11:47:08 | | Hackerpcs (Hackerpcs) joins |
| 11:47:57 | | VerifiedJ (VerifiedJ) joins |
| 11:49:40 | | Matthww3 joins |
| 11:50:24 | | Matthww quits [Ping timeout: 250 seconds] |
| 11:50:24 | | Matthww3 is now known as Matthww |
| 11:55:02 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 12:17:15 | | Matthww3 joins |
| 12:19:26 | | Matthww quits [Ping timeout: 250 seconds] |
| 12:19:26 | | Matthww3 is now known as Matthww |
| 13:20:29 | | Sylirana quits [Ping timeout: 244 seconds] |
| 13:20:56 | | Sylirana (Sylirana) joins |
| 13:26:36 | | Iki joins |
| 15:13:38 | | Arcorann (Arcorann) joins |
| 15:51:23 | | Mineroboter quits [Client Quit] |
| 15:53:42 | | Mineroboter joins |
| 16:02:37 | | onetruth joins |
| 16:16:38 | | Arcorann quits [Ping timeout: 258 seconds] |
| 16:20:06 | | etnguyen03 (etnguyen03) joins |
| 16:23:54 | | Iki quits [Ping timeout: 244 seconds] |
| 16:30:49 | | Iki joins |
| 17:02:22 | | Wayward (wayward) joins |
| 17:18:08 | | ragu quits [Read error: Connection reset by peer] |
| 17:34:27 | | DogsRNice (Webuser299) joins |
| 18:27:26 | <purplebot> | ArchiveBot/Alternative media (political left)/list edited by Iki (+114, +urls. Some already-saved, some …) just now -- https://www.archiveteam.org/?diff=46566&oldid=38980 |
| 18:28:11 | <@JAA> | Iki: FYI, these ArchiveBot/* pages aren't updated anymore and will be migrated to something else soon™. |
| 18:49:59 | <mgrandi> | Oh is that a handy manually curated list of what has been archived recently? |
| 18:56:34 | <masterX244> | had a few stupid bugs due to site differences in my TM-exchange discovery crawler; capturing the last part of the data now, and once I've got it, it's sorting and cross-checking time |
| 19:23:05 | | superkuh quits [Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilayer] |
| 19:24:49 | | hooway joins |
| 20:09:41 | | LeighR quits [Ping timeout: 244 seconds] |
| 20:10:28 | | Barto quits [Ping timeout: 250 seconds] |
| 20:10:41 | | Barto (Barto) joins |
| 20:18:05 | | @EggplantN quits [Quit: Ping timeout (120 seconds)] |
| 20:18:23 | | EggplantN joins |
| 20:18:32 | | EggplantN is now authenticated as EggplantN |
| 20:18:32 | | EggplantN quits [Changing host] |
| 20:18:32 | | EggplantN (EggplantN) joins |
| 20:18:32 | | @ChanServ sets mode: +o EggplantN |
| 20:21:10 | | LeighR (LeighR) joins |
| 21:14:13 | | hooway quits [Client Quit] |
| 22:01:39 | | LeighR quits [Client Quit] |
| 22:27:57 | | user (user) joins |
| 22:28:18 | <user> | Hi, all. New to IRC, so tell me if I break protocol. |
| 22:30:44 | <user> | I wanted to ask why some users are able to contribute so much, while my contributions are few and far between. This has been my experience the last few days with archiving reddit and Yahoo! Answers. My internet connection is decent, yet I don't upload nearly as much as HCross or CCC3. |
| 22:31:04 | <AK> | Generally that will be related to the number of workers people are running |
| 22:31:28 | <AK> | Some of the big people run hundreds (Or thousands) of concurrent workers |
| 22:31:35 | <AK> | So we just get more assigned to us |
| 22:31:49 | <AK> | The tracker gives out tasks at random, so sometimes it's just being lucky or unlucky too |
| 22:32:01 | <user> | From different IPs, if I understand correctly? |
| 22:32:25 | <user> | As in, I can't compete with HCross or CCC3 if I use only one connection? |
| 22:32:42 | <user> | (I don't mean compete in a rivalry sense - just trying to contribute) |
| 22:34:40 | <user> | Also, is there a way to run two or more projects at the same time on one machine and IP? I run the Warrior via Docker and it lets me choose to work on one project at a time. How can I scrape reddit AND Yahoo! Answers? |
| 22:34:57 | | pcr leaves [Error from remote client] |
| 22:36:04 | | pcr joins |
| 22:48:01 | | Stilett0 quits [Ping timeout: 258 seconds] |
| 22:50:32 | <Iki> | user: You can run one project per warrior. However, I believe you can run multiple, separate warriors and run different projects on each |
| 23:26:47 | | HP_Archivist (HP_Archivist) joins |
| 23:28:32 | <atphoenix> | you can definitely run multiple warriors. Each warrior can only run one project at a time. If you use the docker images, you can have each docker image run a different project. |
| 23:29:13 | <atphoenix> | how many you can run depends on your available bandwidth and number of public IP addresses, and the characteristics of the projects you are trying to run. |
| 23:32:26 | <atphoenix> | some are IP constrained, so benefit from more unique IPs. Some projects are bandwidth heavy, so running too many workers at once will use up all your bandwidth and make each job you are running take longer. |
| 23:32:26 | | m0nika quits [Remote host closed the connection] |
| 23:32:38 | <atphoenix> | it's a balancing act overall |
| 23:36:43 | | m0nika (m0nika) joins |
| 23:52:07 | <@JAA> | Some projects use a lot of local disk for temporary storage, so you quickly run out of disk space if you run too many. Some require a lot of CPU. Some require a lot of RAM. Etc. |
| 23:52:15 | <user> | Thanks, atphoenix! I'm new to Docker and followed the instructions for running one warrior with the command: |
| 23:52:20 | <user> | sudo docker run -d --name archiveteam-warrior --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped -p 8001:8001 atdr.meo.ws/archiveteam/warrior-dockerfile |
| 23:52:40 | <@JAA> | If you want to 'set it and forget it', it's probably best to just run one warrior at the default settings with the auto project. |
| 23:52:41 | <user> | To launch more warriors, do I repeat the same command with different --name arguments? |
| 23:53:33 | <@JAA> | If you don't mind juggling resources for each project, you likely want to run the project containers, not the warrior. |
| 23:55:18 | <user> | I can give at least a few hundred GB for the warriors and will be able to monitor the resources the warriors take up. I don't want to 'set it and forget it', at least not for now. I'm OK with checking on it a few times a day and tweaking things. |
| 23:55:36 | <@JAA> | :-) |
| 23:55:38 | <user> | Thanks for the advice, I'll read up on how to run separate containers for each project |
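The question about launching more warriors is not answered directly in the log, so the following is only a sketch of one plausible approach, not confirmed advice from the channel. It assumes that each additional container needs a unique `--name` and a unique host port mapped to the warrior's port 8001, which is how Docker's `--name` and `-p host:container` options generally behave. The `archiveteam-warrior-N` naming scheme and the `warrior_command` helper are hypothetical; the image, label, and restart policy are copied from the command quoted above. The script only prints the commands and does not run Docker.

```python
import shlex

# Image name taken from the command quoted in the log.
IMAGE = "atdr.meo.ws/archiveteam/warrior-dockerfile"
CONTAINER_PORT = 8001  # port published in the quoted command


def warrior_command(index, base_port=8001):
    """Build a docker run command for the index-th warrior (hypothetical helper).

    Each container gets a distinct name and a distinct host port so the
    instances do not collide with one another.
    """
    name = f"archiveteam-warrior-{index}"  # hypothetical naming scheme
    host_port = base_port + index
    return [
        "sudo", "docker", "run", "-d",
        "--name", name,
        "--label=com.centurylinklabs.watchtower.enable=true",
        "--restart=unless-stopped",
        "-p", f"{host_port}:{CONTAINER_PORT}",
        IMAGE,
    ]


commands = [warrior_command(i) for i in range(3)]
for command in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(command))
```

Each warrior's web interface would then be reachable on its own host port (8001, 8002, 8003 in this sketch), where a different project can be selected per instance. As noted in the conversation, running the per-project containers directly may be the better fit for anyone who wants to tune resources per project.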
| 23:59:18 | <user> | By the way, is there hope to archive all of Yahoo! Answers? |
| 23:59:24 | <user> | before May 4th |