00:01:29 TheTechRobo quits [Read error: Connection reset by peer]
00:01:29 ScenarioPlanet quits [Read error: Connection reset by peer]
00:01:29 Pedrosso quits [Read error: Connection reset by peer]
00:01:54 Pedrosso joins
00:01:59 ScenarioPlanet (ScenarioPlanet) joins
00:02:11 TheTechRobo (TheTechRobo) joins
00:27:37 sec^nd quits [Remote host closed the connection]
00:27:59 sec^nd (second) joins
00:40:47 etnguyen03 quits [Client Quit]
00:41:10 lunik1 quits [Ping timeout: 255 seconds]
00:48:06 NIC007a83 quits [Client Quit]
01:02:21 etnguyen03 (etnguyen03) joins
01:20:50 Arcorann (Arcorann) joins
01:45:17 <plcp> thuban: I'll ask him, brb soon™
02:26:38 etnguyen03 quits [Client Quit]
02:45:36 etnguyen03 (etnguyen03) joins
02:49:47 p666v joins
02:51:55 p666v quits [Client Quit]
03:20:41 DogsRNice joins
03:27:44 icedice quits [Client Quit]
03:59:53 pillopillo joins
04:00:51 pillopillo quits [Client Quit]
04:08:49 <thuban> plcp: cool, thanks!
04:22:25 deadorbit quits [Client Quit]
04:22:26 Unholy2361 quits [Client Quit]
04:22:59 Unholy2361 (Unholy2361) joins
04:25:53 SootBector quits [Remote host closed the connection]
04:27:00 SootBector (SootBector) joins
04:29:28 etnguyen03 quits [Client Quit]
04:30:44 etnguyen03 (etnguyen03) joins
04:48:30 kiryu (kiryu) joins
04:50:27 midou quits [Remote host closed the connection]
04:50:38 midou joins
04:59:53 etnguyen03 quits [Remote host closed the connection]
05:11:37 beastbg8 (beastbg8) joins
05:29:28 datechnoman quits [Quit: The Lounge - https://thelounge.chat]
05:36:45 datechnoman (datechnoman) joins
06:16:10 DogsRNice quits [Read error: Connection reset by peer]
07:05:03 Unholy2361 quits [Remote host closed the connection]
07:06:10 Unholy2361 (Unholy2361) joins
07:09:43 BlueMaxima quits [Read error: Connection reset by peer]
07:24:06 sec^nd quits [Ping timeout: 255 seconds]
08:10:37 arjie joins
08:17:28 <arjie> Hey, guys, I've got an instance of the Warrior running. What I was hoping to do is to target it at some URL and have it spider that domain from there. Is there a feasible way to configure it to do this? I assume in the best case this is something like:
08:17:29 <arjie> 1. Run the tracker unofficial Docker container https://wiki.archiveteam.org/index.php/Dev/Tracker
08:17:29 <arjie> 2. Figure out how to modify that to have a custom project file
08:17:30 <arjie> 3. Point my Warrior at that tracker
08:17:30 <arjie> But just in case I'm overthinking this, is there a straightforward way for me to just do whatever smart rate-limited crawling the Warrior applies, starting from some particular URL, and spider out from there?
08:22:02 <thuban> arjie: you probably want to look at https://github.com/ArchiveTeam/grab-site/ instead
08:22:37 Island quits [Read error: Connection reset by peer]
08:23:43 <arjie> Oh that sounds _exactly_ like what I want. Is there something I can pair it with to submit the pages to the Internet Archive at archive.org as well?
08:26:14 <arjie> Ah I've found https://gist.github.com/Asparagirl/6206247 that helps upload a WARC. I'm going to try this out.
08:26:49 <thuban> arjie: you can upload the results to the internet archive yourself, but that won't make them available in the wayback machine; only whitelisted accounts are trusted sources for the wbm.
08:27:53 <thuban> the archiveteam account is whitelisted, though, so if we crawl it with archivebot it will show up. care to share the domain?
08:28:09 <arjie> Ah I see. For good reason, I suppose. I'll just leave the warrior running on auto then. I was hoping to spider pages that are part of the longer tail of Internet websites that don't get much SEO.
08:29:14 <arjie> Sort of struck me when I saw this comment https://news.ycombinator.com/item?id=40020345 on the Hacker News post for http://ascii.textfiles.com/archives/5591
08:30:37 <thuban> well, feel free to make suggestions in #archivebot for full sites, or #// if you have big lists of individual pages
08:31:06 <arjie> Okay, thank you! I imagine the usual ones like https://github.com/kagisearch/smallweb are already in there?
08:33:59 <thuban> it's definitely been mentioned, let me see if we have it covered
08:34:47 <fireonlive> we don't seem to have it in https://github.com/ArchiveTeam/urls-sources
08:37:21 <fireonlive> welcome arjie :)
08:37:29 <thuban> right, although it's been suggested. we have done a one-time capture of all the indexed feeds/homepages, though https://archive.fart.website/archivebot/viewer/?q=kagisearch
08:38:42 <arjie> Thank you, fireonlive :)
08:38:42 <arjie> Good to see it's already covered, thuban!
09:00:03 Bleo182600 quits [Client Quit]
09:01:22 Bleo182600 joins
09:29:46 grid joins
09:41:55 Larsenv quits [Quit: The Lounge - https://thelounge.chat]
09:59:42 driib quits [Client Quit]
10:01:36 tony joins
10:05:53 driib (driib) joins
10:22:56 bladem quits [Read error: Connection reset by peer]
10:30:40 tony quits [Client Quit]
11:16:07 lunik1 joins
11:49:36 grid quits [Client Quit]
12:11:59 eightthree quits [Ping timeout: 272 seconds]
12:13:01 eightthree joins
12:19:20 fuzzy8021 quits [Read error: Connection reset by peer]
12:19:53 decky joins
12:21:02 fuzzy8021 (fuzzy8021) joins
12:22:45 decky_e quits [Ping timeout: 272 seconds]
12:31:25 etnguyen03 (etnguyen03) joins
12:53:28 Bleo182600 quits [Client Quit]
12:54:08 Notrealname1234 (Notrealname1234) joins
12:54:41 Bleo182600 joins
12:57:24 Notrealname1234 quits [Client Quit]
13:20:02 Jackster joins
13:22:06 <Jackster> Anyone got a heritrix3 config that is reasonably fast at archiving? Got a site with 2m urls and it is doing it at 0.25 per second atm. I don't fancy waiting 10 years :p
13:22:32 etnguyen03 quits [Client Quit]
13:25:13 sec^nd (second) joins
14:04:45 pixel (pixel) joins
14:06:37 Arcorann quits [Ping timeout: 272 seconds]
14:10:38 grid joins
14:30:22 Notrealname1234 (Notrealname1234) joins
14:34:10 Notrealname1234 quits [Client Quit]
14:41:03 nicholl joins
14:42:42 <nicholl> I want imagsrc pics
14:43:47 Jackster quits [Client Quit]
14:53:16 decky_e joins
14:55:16 decky quits [Ping timeout: 255 seconds]
15:19:29 nicholl quits [Ping timeout: 265 seconds]
15:22:59 etnguyen03 (etnguyen03) joins
15:33:30 Guest quits [Ping timeout: 265 seconds]
15:48:51 Guest joins
16:25:25 Wohlstand (Wohlstand) joins
16:27:26 Bleo182600 quits [Client Quit]
16:27:46 Bleo182600 joins
16:39:55 DogsRNice joins
16:43:10 knecht4 quits [Client Quit]
16:46:02 knecht4 joins
17:03:52 Deewiant quits [Remote host closed the connection]
17:04:59 Deewiant (Deewiant) joins
17:09:38 BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
17:10:28 grid quits [Client Quit]
17:15:00 etnguyen03 quits [Client Quit]
17:22:13 grid joins
18:04:38 etnguyen03 (etnguyen03) joins
18:17:03 etnguyen03 quits [Client Quit]
18:25:20 Wohlstand quits [Client Quit]
18:33:14 etnguyen03 (etnguyen03) joins
18:39:56 BearFortress joins
18:49:10 Wohlstand (Wohlstand) joins
18:51:48 Island joins
19:02:05 ^ quits [Remote host closed the connection]
19:02:20 ^ (^) joins
19:09:58 zhongfu quits [Ping timeout: 255 seconds]
19:10:37 lflare quits [Ping timeout: 272 seconds]
19:21:02 zhongfu (zhongfu) joins
19:40:28 grid quits [Client Quit]
19:43:33 lflare (lflare) joins
19:51:21 etnguyen03 quits [Client Quit]
20:00:25 etnguyen03 (etnguyen03) joins
20:04:18 zhongfu quits [Client Quit]
20:07:12 zhongfu (zhongfu) joins
20:22:25 etnguyen03 quits [Client Quit]
20:23:55 jerm joins
20:24:28 jerm quits [Client Quit]
20:40:06 etnguyen03 (etnguyen03) joins
21:00:00 tapos joins
21:00:40 andrew7 (andrew) joins
21:01:29 andrew quits [Killed (NickServ (GHOST command used by andrew7))]
21:01:31 andrew7 is now known as andrew
21:03:52 Unholy2361 quits [Client Quit]
21:05:25 Unholy2361 (Unholy2361) joins
21:24:28 andrew1 (andrew) joins
21:26:19 andrew quits [Ping timeout: 255 seconds]
21:26:19 andrew1 is now known as andrew
22:26:09 BlueMaxima joins
23:13:59 arjie quits [Client Quit]
23:38:14 benjinsm joins
23:41:41 benjins quits [Ping timeout: 272 seconds]
23:50:05 Larsenv (Larsenv) joins
23:50:05 Larsenv quits [Client Quit]
23:51:39 atphoenix (atphoenix) joins
23:51:45 Larsenv (Larsenv) joins