| 00:38:50 | pabs quits [Remote host closed the connection] | |
| 00:39:39 | pabs (pabs) joins | |
| 01:33:58 | <pabs> | is AB capable of transforming the URLs that it finds before downloading them and finding more URLs in them? |
| 01:34:02 | <pabs> | alternatively, what libraries/frameworks do folks usually use for writing custom crawlers to get URLs for AB !ao < ? |
| 01:34:25 | <pabs> | (I have a site where the links are all broken, but if you transform them then they work) |
| 01:37:33 | <pokechu22> | I don't think archivebot can do that, unfortunately. I sometimes use `$$("a").map(it => it.href).join("\n")` in a browser as a very crappy way to get all the links on a given page (you can do more advanced stuff like `$$("a[href*=\"/watch\"]")` too) but that requires further postprocessing to be useful |
| 01:37:45 | <pokechu22> | (and doesn't scale at all) |
| 01:37:46 | BlueMaxima joins | |
| 01:38:56 | <pokechu22> | What site is it? Might be easier to make suggestions with examples |
| 01:40:00 | fishingforsoup__ joins | |
| 01:42:10 | <@JAA> | AB does not have URL rewriting, no. It's been on my wishlist for a while. |
| 01:42:25 | <@JAA> | I typically do such things with curl + grep/sed/awk/... |
| 01:43:19 | fishingforsoup_ quits [Ping timeout: 252 seconds] | |
| 01:43:33 | <pabs> | pokechu22: http://phase.ckut.ca/ - I'm working on anarcat's request from a few days ago |
| 01:44:14 | <pabs> | it needs to be a site-recursive crawl, with URL modifications |
| 01:44:29 | <pabs> | perhaps I'll use one of the Python libs |
| 01:45:01 | <pokechu22> | example of a page with broken links? |
| 01:46:14 | <pokechu22> | oh, http://phase.ckut.ca/cgi-bin/ckut-grid.pl -> http://ckut.ca/c/en/oldgrid -> https://ckut.ca/c/en/oldgrid -> 404 when it should be http://phase.ckut.ca/c/en/oldgrid |
| 01:46:15 | <pabs> | in particular http://phase.ckut.ca/homeless/, anything that links to http://ckut.ca/homeless/.* should be translated to http://phase.ckut.ca/.* |
| 01:46:27 | <pabs> | yeah, also the front page |
| 01:47:16 | <pokechu22> | Within http://phase.ckut.ca/c/en/oldgrid things seem to be OK though |
| 01:47:39 | <pabs> | yeah, some parts of the site use relative links I think, others (broken) absolute ones |
| 01:47:58 | <pokechu22> | Yeah, that part looks like it'd be a pain to deal with :/ |
| 01:48:46 | <@JAA> | It could be done with a wpull plugin I think, but that's not particularly pleasant. |
| 01:49:26 | <pabs> | maybe it's time I learned scrapy, I don't think it is a very large site |
| 01:54:36 | umgr036 quits [Remote host closed the connection] | |
| 01:54:39 | umgr036 joins | |
| 02:02:50 | <@JAA> | wget-at with a Lua script is another option. |
| 02:06:08 | Ketchup901 quits [Ping timeout: 276 seconds] | |
| 02:06:45 | Ketchup901 (Ketchup901) joins | |
| 02:18:20 | user_ quits [Ping timeout: 252 seconds] | |
| 02:18:40 | gazorpazorp (gazorpazorp) joins | |
| 02:47:45 | TheTechRobo quits [Remote host closed the connection] | |
| 02:49:25 | TheTechRobo (TheTechRobo) joins | |
| 04:41:14 | Ketchup901 quits [Remote host closed the connection] | |
| 04:41:49 | Ketchup901 (Ketchup901) joins | |
| 04:47:20 | <pabs> | I managed to do it with scrapy, bit hacky |
| 05:44:05 | BlueMaxima quits [Client Quit] | |
| 05:47:54 | Island quits [Read error: Connection reset by peer] | |
| 06:26:34 | Matthww1 quits [Ping timeout: 252 seconds] | |
| 06:29:30 | Matthww1 joins | |
| 06:30:39 | treora quits [Remote host closed the connection] | |
| 06:30:40 | treora joins | |
| 06:47:35 | lennier1 quits [Client Quit] | |
| 06:48:24 | lennier1 (lennier1) joins | |
| 07:25:47 | Ketchup901 quits [Remote host closed the connection] | |
| 07:27:34 | Ketchup901 (Ketchup901) joins | |
| 07:33:05 | gazorpazorp quits [Read error: Connection reset by peer] | |
| 07:33:23 | gazorpazorp (gazorpazorp) joins | |
| 08:02:16 | hackbug quits [Ping timeout: 252 seconds] | |
| 08:19:20 | hitgrr8 joins | |
| 08:39:36 | Ketchup901 quits [Client Quit] | |
| 08:42:43 | Ketchup901 (Ketchup901) joins | |
| 08:43:44 | umgr036 quits [Remote host closed the connection] | |
| 08:43:57 | umgr036 joins | |
| 08:53:02 | umgr036 quits [Remote host closed the connection] | |
| 08:58:51 | umgr036 joins | |
| 09:54:47 | Minkafighter722 quits [Quit: The Lounge - https://thelounge.chat] | |
| 09:56:41 | Minkafighter722 joins | |
| 10:26:56 | user_ joins | |
| 10:30:24 | umgr036 quits [Ping timeout: 252 seconds] | |
| 11:12:34 | dan_a quits [Read error: Connection reset by peer] | |
| 11:12:59 | dan_a (dan_a) joins | |
| 11:15:12 | zrracer joins | |
| 11:15:12 | zrracer quits [K-Lined] | |
| 11:56:48 | Ruthalas5 quits [Quit: Ping timeout (120 seconds)] | |
| 11:57:07 | Ruthalas5 (Ruthalas) joins | |
| 12:33:52 | user_ quits [Remote host closed the connection] | |
| 12:34:12 | user_ joins | |
| 12:35:39 | driib (driib) joins | |
| 12:40:52 | user_ quits [Remote host closed the connection] | |
| 12:41:06 | user_ joins | |
| 12:54:52 | eroc1990 quits [Ping timeout: 252 seconds] | |
| 12:57:21 | eroc1990 (eroc1990) joins | |
| 12:57:38 | hackbug (hackbug) joins | |
| 13:49:19 | Arcorann quits [Ping timeout: 252 seconds] | |
| 13:53:54 | CraftByte quits [Quit: Ping timeout (120 seconds)] | |
| 13:54:07 | CraftByte (DragonSec|CraftByte) joins | |
| 14:06:02 | Matthww13 joins | |
| 14:06:13 | Matthww1 quits [Read error: Connection reset by peer] | |
| 14:06:13 | Matthww13 is now known as Matthww1 | |
| 14:58:39 | eroc1990 quits [Client Quit] | |
| 15:15:45 | eroc1990 (eroc1990) joins | |
| 15:24:21 | eroc1990 quits [Client Quit] | |
| 15:30:47 | eroc1990 (eroc1990) joins | |
| 15:37:22 | Island joins | |
| 15:46:06 | Ketchup902 (Ketchup901) joins | |
| 15:48:36 | Ketchup901 quits [Remote host closed the connection] | |
| 17:01:56 | Island quits [Remote host closed the connection] | |
| 17:01:56 | user_ quits [Remote host closed the connection] | |
| 17:02:03 | Island joins | |
| 17:02:06 | user_ joins | |
| 17:06:43 | Island quits [Remote host closed the connection] | |
| 17:06:43 | _kallsyms quits [Ping timeout: 265 seconds] | |
| 17:06:43 | shreyasminocha quits [Ping timeout: 265 seconds] | |
| 17:06:43 | thehedgeh0g quits [Ping timeout: 265 seconds] | |
| 17:06:43 | atirclog quits [Ping timeout: 265 seconds] | |
| 17:06:43 | Connection closed. | |
| 17:06:55 | atirclog (atirclog) joins | |
| 17:06:55 | Topic: Lengthy ArchiveTeam-related discussions, questions here | Offtopic: #archiveteam-ot | https://twitter.com/textfiles/status/1069715869994020867 | |
| 17:06:55 | Topic set by JAA at 2021-11-02 03:43:57Z | |
| 17:07:01 | Island joins | |
| 17:09:26 | Minkafighter7225 joins | |
| 17:10:37 | raxxy-137409 quits [Quit: No Ping reply in 180 seconds.] | |
| 17:10:40 | Gereon6200 quits [Client Quit] | |
| 17:10:40 | Minkafighter722 quits [Client Quit] | |
| 17:10:40 | Island quits [Remote host closed the connection] | |
| 17:10:40 | sknebel quits [Remote host closed the connection] | |
| 17:10:40 | Minkafighter7225 is now known as Minkafighter722 | |
| 17:10:45 | Gereon6200 (Gereon) joins | |
| 17:11:20 | raxxy-137409 joins | |
| 17:11:30 | sknebel (sknebel) joins | |
| 17:39:58 | sec^nd (second) joins | |
| 17:50:23 | sec^nd quits [Remote host closed the connection] | |
| 18:03:25 | sec^nd (second) joins | |
| 18:52:38 | <h2ibot> | JustAnotherArchivist edited Issuu (+151, Update info box): https://wiki.archiveteam.org/?diff=49481&oldid=49478 |
| 19:06:41 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+383, /* 2023 */ Add AnaitGames forums): https://wiki.archiveteam.org/?diff=49482&oldid=49480 |
| 19:43:02 | sec^nd quits [Ping timeout: 276 seconds] | |
| 19:53:05 | sec^nd (second) joins | |
| 20:12:08 | qwertyasdfuiopghjkl joins | |
| 20:28:55 | umgr036 joins | |
| 20:29:10 | user_ quits [Ping timeout: 252 seconds] | |
| 20:31:56 | <h2ibot> | Katocala edited Entomology (+12289): https://wiki.archiveteam.org/?diff=49483&oldid=45742 |
| 20:37:57 | <h2ibot> | Katocala edited Entomology (+1): https://wiki.archiveteam.org/?diff=49484&oldid=49483 |
| 20:38:57 | <h2ibot> | Katocala edited Entomology (+0): https://wiki.archiveteam.org/?diff=49485&oldid=49484 |
| 20:41:28 | fl0w joins | |
| 20:44:01 | fl0w_ quits [Ping timeout: 252 seconds] | |
| 20:44:58 | <h2ibot> | Katocala edited Entomology (+1): https://wiki.archiveteam.org/?diff=49486&oldid=49485 |
| 20:44:59 | <h2ibot> | Pokechu22 edited Deathwatch (+243, /* Pining for the Fjords (Dying) */ 2026:…): https://wiki.archiveteam.org/?diff=49487&oldid=49482 |
| 20:46:59 | <h2ibot> | Katocala edited Entomology (-1): https://wiki.archiveteam.org/?diff=49488&oldid=49486 |
| 21:31:08 | umgr036 quits [Remote host closed the connection] | |
| 21:31:11 | umgr036 joins | |
| 21:47:44 | Gereon6200 quits [Client Quit] | |
| 21:47:44 | umgr036 quits [Remote host closed the connection] | |
| 21:47:57 | umgr036 joins | |
| 22:07:16 | <h2ibot> | Katocala edited Entomology (+57): https://wiki.archiveteam.org/?diff=49489&oldid=49488 |
| 22:10:17 | <h2ibot> | Katocala edited Entomology (-21): https://wiki.archiveteam.org/?diff=49490&oldid=49489 |
| 22:13:17 | <h2ibot> | Katocala edited Entomology (+0): https://wiki.archiveteam.org/?diff=49491&oldid=49490 |
| 22:16:18 | <h2ibot> | Katocala edited Entomology (-16): https://wiki.archiveteam.org/?diff=49492&oldid=49491 |
| 22:16:19 | Ketchup902 quits [Remote host closed the connection] | |
| 22:17:05 | Ketchup901 (Ketchup901) joins | |
| 22:17:18 | fishingforsoup__ is now authenticated as fishingforsoup | |
| 22:37:23 | <h2ibot> | Katocala edited Entomology (+27): https://wiki.archiveteam.org/?diff=49493&oldid=49492 |
| 23:07:03 | BlueMaxima joins | |
| 23:10:23 | hitgrr8 quits [Client Quit] |
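
Note on the URL-rewriting discussion (01:33 to 01:49): the log does not show the scrapy code pabs ended up with, so the following is only a minimal standard-library sketch of the rewrite step that was described. It assumes the rule is a plain host swap from `ckut.ca` to `phase.ckut.ca` with the path preserved (as in pokechu22's `oldgrid` example), and that the working host is served over plain http. The names `rewrite_url`, `LinkCollector`, and `extract_links` are hypothetical and do not come from ArchiveBot, wpull, or scrapy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit

BROKEN_HOST = "ckut.ca"         # host used by the broken absolute links
WORKING_HOST = "phase.ckut.ca"  # host that actually serves the pages


def rewrite_url(url):
    """Point links at the broken host to the working host, keeping the path."""
    parts = urlsplit(url)
    if parts.hostname != BROKEN_HOST:
        return url  # relative-turned-absolute and third-party links pass through
    # Assumption: the working host is reachable over plain http, as in the log.
    return urlunsplit(("http", WORKING_HOST, parts.path, parts.query, parts.fragment))


class LinkCollector(HTMLParser):
    """Collect <a href> targets, resolved against the page URL and rewritten."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                self.links.append(rewrite_url(absolute))


def extract_links(html, base_url):
    """Return every rewritten link found in one HTML page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A recursive crawl would feed each fetched page through `extract_links`, queue the results that stay on the working host, and write the final URL list to a file that can then be submitted for archiving.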