| 00:01:29 | | TheTechRobo quits [Read error: Connection reset by peer] |
| 00:01:29 | | ScenarioPlanet quits [Read error: Connection reset by peer] |
| 00:01:29 | | Pedrosso quits [Read error: Connection reset by peer] |
| 00:01:54 | | Pedrosso joins |
| 00:01:59 | | ScenarioPlanet (ScenarioPlanet) joins |
| 00:02:11 | | TheTechRobo (TheTechRobo) joins |
| 00:27:37 | | sec^nd quits [Remote host closed the connection] |
| 00:27:59 | | sec^nd (second) joins |
| 00:40:47 | | etnguyen03 quits [Client Quit] |
| 00:41:10 | | lunik1 quits [Ping timeout: 255 seconds] |
| 00:48:06 | | NIC007a83 quits [Client Quit] |
| 01:02:21 | | etnguyen03 (etnguyen03) joins |
| 01:20:50 | | Arcorann (Arcorann) joins |
| 01:45:17 | <plcp> | thuban: I'll ask him, brb soon™ |
| 02:26:38 | | etnguyen03 quits [Client Quit] |
| 02:45:36 | | etnguyen03 (etnguyen03) joins |
| 02:49:47 | | p666v joins |
| 02:51:55 | | p666v quits [Client Quit] |
| 03:20:41 | | DogsRNice joins |
| 03:27:44 | | icedice quits [Client Quit] |
| 03:59:53 | | pillopillo joins |
| 04:00:51 | | pillopillo quits [Client Quit] |
| 04:08:49 | <thuban> | plcp: cool, thanks! |
| 04:22:25 | | deadorbit quits [Client Quit] |
| 04:22:26 | | Unholy2361 quits [Client Quit] |
| 04:22:59 | | Unholy2361 (Unholy2361) joins |
| 04:25:53 | | SootBector quits [Remote host closed the connection] |
| 04:27:00 | | SootBector (SootBector) joins |
| 04:29:28 | | etnguyen03 quits [Client Quit] |
| 04:30:44 | | etnguyen03 (etnguyen03) joins |
| 04:48:30 | | kiryu (kiryu) joins |
| 04:50:27 | | midou quits [Remote host closed the connection] |
| 04:50:38 | | midou joins |
| 04:59:53 | | etnguyen03 quits [Remote host closed the connection] |
| 05:11:37 | | beastbg8 (beastbg8) joins |
| 05:29:28 | | datechnoman quits [Quit: The Lounge - https://thelounge.chat] |
| 05:36:45 | | datechnoman (datechnoman) joins |
| 06:16:10 | | DogsRNice quits [Read error: Connection reset by peer] |
| 07:05:03 | | Unholy2361 quits [Remote host closed the connection] |
| 07:06:10 | | Unholy2361 (Unholy2361) joins |
| 07:09:43 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 07:24:06 | | sec^nd quits [Ping timeout: 255 seconds] |
| 08:10:37 | | arjie joins |
| 08:17:28 | <arjie> | Hey, guys, I've got an instance of the Warrior running. What I was hoping to do is to target it at some URL and have it spider that domain from there. Is there a feasible way to configure it to do this? I assume in the best case this is something like: |
| 08:17:29 | <arjie> | 1. Run the tracker unofficial Docker container https://wiki.archiveteam.org/index.php/Dev/Tracker |
| 08:17:29 | <arjie> | 2. Figure out how to modify that to have a custom project file |
| 08:17:30 | <arjie> | 3. Point my Warrior at that tracker |
| 08:17:30 | <arjie> | But just in case I'm overthinking this, is there a straightforward way for me to just do whatever smart rate-limited crawling that the Warrior applies to some particular URL and spider out of there? |
| 08:22:02 | <thuban> | arjie: you probably want to look at https://github.com/ArchiveTeam/grab-site/ instead |
| 08:22:37 | | Island quits [Read error: Connection reset by peer] |
| 08:23:43 | <arjie> | Oh that sounds _exactly_ like what I want. Is there something I can pair it with to submit the pages to the Internet Archive at archive.org as well? |
| 08:26:14 | <arjie> | Ah I've found https://gist.github.com/Asparagirl/6206247 that helps upload a WARC. I'm going to try this out. |
| 08:26:49 | <thuban> | arjie: you can upload the results to the internet archive yourself, but that won't make them available in the wayback machine; only whitelisted accounts are trusted sources for the wbm. |
| 08:27:53 | <thuban> | the archiveteam account is whitelisted, though, so if we crawl it with archivebot it will show up. care to share the domain? |
| 08:28:09 | <arjie> | Ah I see. For good reason, I suppose. I'll just leave the warrior running on auto then. I was hoping to spider pages that are part of the longer tail of Internet websites that don't get much SEO. |
| 08:29:14 | <arjie> | Sort of struck me when I saw this comment https://news.ycombinator.com/item?id=40020345 on the Hacker News post for http://ascii.textfiles.com/archives/5591 |
| 08:30:37 | <thuban> | well, feel free to make suggestions in #archivebot for full sites, or #// if you have big lists of individual pages |
| 08:31:06 | <arjie> | Okay, thank you! I imagine the usual ones like https://github.com/kagisearch/smallweb are already in there? |
| 08:33:59 | <thuban> | it's definitely been mentioned, let me see if we have it covered |
| 08:34:47 | <fireonlive> | we don't seem to have it in https://github.com/ArchiveTeam/urls-sources |
| 08:37:21 | <fireonlive> | welcome arjie :) |
| 08:37:29 | <thuban> | right, although it's been suggested. we have done a one-time capture of all the indexed feeds/homepages, though https://archive.fart.website/archivebot/viewer/?q=kagisearch |
| 08:38:42 | <arjie> | Thank you, fireonlive :) |
| 08:38:42 | <arjie> | Good to see it's already covered, thuban! |
| 09:00:03 | | Bleo182600 quits [Client Quit] |
| 09:01:22 | | Bleo182600 joins |
| 09:29:46 | | grid joins |
| 09:41:55 | | Larsenv quits [Quit: The Lounge - https://thelounge.chat] |
| 09:59:42 | | driib quits [Client Quit] |
| 10:01:36 | | tony joins |
| 10:05:53 | | driib (driib) joins |
| 10:22:56 | | bladem quits [Read error: Connection reset by peer] |
| 10:30:40 | | tony quits [Client Quit] |
| 11:16:07 | | lunik1 joins |
| 11:49:36 | | grid quits [Client Quit] |
| 12:11:59 | | eightthree quits [Ping timeout: 272 seconds] |
| 12:13:01 | | eightthree joins |
| 12:19:20 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 12:19:53 | | decky joins |
| 12:21:02 | | fuzzy8021 (fuzzy8021) joins |
| 12:22:45 | | decky_e quits [Ping timeout: 272 seconds] |
| 12:31:25 | | etnguyen03 (etnguyen03) joins |
| 12:53:28 | | Bleo182600 quits [Client Quit] |
| 12:54:08 | | Notrealname1234 (Notrealname1234) joins |
| 12:54:41 | | Bleo182600 joins |
| 12:57:24 | | Notrealname1234 quits [Client Quit] |
| 13:20:02 | | Jackster joins |
| 13:22:06 | <Jackster> | Anyone got a heritrix3 config that is reasonably fast at archiving? Got a site with 2m urls and it is doing it at 0.25 per second atm. I don't fancy waiting 10 years :p |
| 13:22:32 | | etnguyen03 quits [Client Quit] |
| 13:25:13 | | sec^nd (second) joins |
| 14:04:45 | | pixel (pixel) joins |
| 14:06:37 | | Arcorann quits [Ping timeout: 272 seconds] |
| 14:10:38 | | grid joins |
| 14:30:22 | | Notrealname1234 (Notrealname1234) joins |
| 14:34:10 | | Notrealname1234 quits [Client Quit] |
| 14:41:03 | | nicholl joins |
| 14:42:42 | <nicholl> | I want imagsrc pics |
| 14:43:47 | | Jackster quits [Client Quit] |
| 14:53:16 | | decky_e joins |
| 14:55:16 | | decky quits [Ping timeout: 255 seconds] |
| 15:19:29 | | nicholl quits [Ping timeout: 265 seconds] |
| 15:22:59 | | etnguyen03 (etnguyen03) joins |
| 15:33:30 | | Guest quits [Ping timeout: 265 seconds] |
| 15:48:51 | | Guest joins |
| 16:08:01 | | blue_0000ff is now authenticated as blue_0000ff |
| 16:25:25 | | Wohlstand (Wohlstand) joins |
| 16:27:26 | | Bleo182600 quits [Client Quit] |
| 16:27:46 | | Bleo182600 joins |
| 16:39:55 | | DogsRNice joins |
| 16:43:10 | | knecht4 quits [Client Quit] |
| 16:46:02 | | knecht4 joins |
| 17:03:52 | | Deewiant quits [Remote host closed the connection] |
| 17:04:59 | | Deewiant (Deewiant) joins |
| 17:09:38 | | BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
| 17:10:28 | | grid quits [Client Quit] |
| 17:15:00 | | etnguyen03 quits [Client Quit] |
| 17:22:13 | | grid joins |
| 18:04:38 | | etnguyen03 (etnguyen03) joins |
| 18:17:03 | | etnguyen03 quits [Client Quit] |
| 18:25:20 | | Wohlstand quits [Client Quit] |
| 18:33:14 | | etnguyen03 (etnguyen03) joins |
| 18:39:56 | | BearFortress joins |
| 18:49:10 | | Wohlstand (Wohlstand) joins |
| 18:51:48 | | Island joins |
| 19:02:05 | | ^ quits [Remote host closed the connection] |
| 19:02:20 | | ^ (^) joins |
| 19:09:58 | | zhongfu quits [Ping timeout: 255 seconds] |
| 19:10:37 | | lflare quits [Ping timeout: 272 seconds] |
| 19:21:02 | | zhongfu (zhongfu) joins |
| 19:40:28 | | grid quits [Client Quit] |
| 19:43:33 | | lflare (lflare) joins |
| 19:51:21 | | etnguyen03 quits [Client Quit] |
| 20:00:25 | | etnguyen03 (etnguyen03) joins |
| 20:04:18 | | zhongfu quits [Client Quit] |
| 20:07:12 | | zhongfu (zhongfu) joins |
| 20:22:25 | | etnguyen03 quits [Client Quit] |
| 20:23:55 | | jerm joins |
| 20:24:28 | | jerm quits [Client Quit] |
| 20:40:06 | | etnguyen03 (etnguyen03) joins |
| 21:00:00 | | tapos joins |
| 21:00:40 | | andrew7 (andrew) joins |
| 21:01:29 | | andrew quits [Killed (NickServ (GHOST command used by andrew7))] |
| 21:01:31 | | andrew7 is now known as andrew |
| 21:03:52 | | Unholy2361 quits [Client Quit] |
| 21:05:25 | | Unholy2361 (Unholy2361) joins |
| 21:24:28 | | andrew1 (andrew) joins |
| 21:26:19 | | andrew quits [Ping timeout: 255 seconds] |
| 21:26:19 | | andrew1 is now known as andrew |
| 22:26:09 | | BlueMaxima joins |
| 23:13:59 | | arjie quits [Client Quit] |
| 23:38:14 | | benjinsm joins |
| 23:41:41 | | benjins quits [Ping timeout: 272 seconds] |
| 23:50:05 | | Larsenv (Larsenv) joins |
| 23:50:05 | | Larsenv quits [Client Quit] |
| 23:51:39 | | atphoenix (atphoenix) joins |
| 23:51:45 | | Larsenv (Larsenv) joins |