00:38:50pabs quits [Remote host closed the connection]
00:39:39pabs (pabs) joins
01:33:58<pabs>is AB capable of transforming the URLs that it finds before downloading them and finding more URLs in them?
01:34:02<pabs>alternatively, what libraries/frameworks do folks usually use for writing custom crawlers to get URLs for AB !ao < ?
01:34:25<pabs>(I have a site where the links are all broken, but if you transform them then they work)
01:37:33<pokechu22>I don't think archivebot can do that, unfortunately. I sometimes use `$$("a").map(it => it.href).join("\n")` in a browser as a very crappy way to get all the links on a given page (you can do more advanced stuff like `$$("a[href*=\"/watch\"]")` too) but that requires further postprocessing to be useful
01:37:45<pokechu22>(and doesn't scale at all)
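A rough scripted equivalent of the `$$("a").map(it => it.href)` console trick, as a minimal sketch using only the Python standard library. The names `LinkCollector` and `extract_links` are made up for illustration, and the input is assumed to be HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip <a> tags without an href (e.g. named anchors).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all <a href> targets in html as absolute URLs, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Printing the result with `"\n".join(...)` gives the same one-URL-per-line output as the browser snippet, and it still needs the same filtering and postprocessing afterwards.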
01:37:46BlueMaxima joins
01:38:56<pokechu22>What site is it? Might be easier to make suggestions with examples
01:40:00fishingforsoup__ joins
01:42:10<@JAA>AB does not have URL rewriting, no. It's been on my wishlist for a while.
01:42:25<@JAA>I typically do such things with curl + grep/sed/awk/...
01:43:19fishingforsoup_ quits [Ping timeout: 252 seconds]
01:43:33<pabs>pokechu22: http://phase.ckut.ca/ - I'm working on anarcat's request from a few days ago
01:44:14<pabs>it needs to be a site-recursive crawl, with URL modifications
01:44:29<pabs>perhaps I'll use one of the Python libs
01:45:01<pokechu22>example of a page with broken links?
01:46:14<pokechu22>oh, http://phase.ckut.ca/cgi-bin/ckut-grid.pl -> http://ckut.ca/c/en/oldgrid -> https://ckut.ca/c/en/oldgrid -> 404 when it should be http://phase.ckut.ca/c/en/oldgrid
01:46:15<pabs>in particular http://phase.ckut.ca/homeless/, anything that links to http://ckut.ca/homeless/.* should be translated to http://phase.ckut.ca/.*
01:46:27<pabs>yeah, also the front page
01:47:16<pokechu22>Within http://phase.ckut.ca/c/en/oldgrid things seem to be OK though
01:47:39<pabs>yeah, some parts of the site use relative links I think, others (broken) absolute ones
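The transformation discussed above (absolute links to `ckut.ca` that should point at `phase.ckut.ca`) could be sketched roughly like this. The function name `rewrite_url` is hypothetical, and forcing the scheme to `http` is an assumption based on the working URLs quoted in the log, not something verified against the site.

```python
from urllib.parse import urlsplit, urlunsplit

BROKEN_HOST = "ckut.ca"         # host used by the broken absolute links
WORKING_HOST = "phase.ckut.ca"  # host where the old content actually lives


def rewrite_url(url):
    """Point links at the broken host to the working host, keeping the path."""
    parts = urlsplit(url)
    if parts.hostname == BROKEN_HOST:
        # Assumption: the old site is only served over plain http.
        parts = parts._replace(scheme="http", netloc=WORKING_HOST)
    return urlunsplit(parts)
```

With this, `http://ckut.ca/c/en/oldgrid` becomes `http://phase.ckut.ca/c/en/oldgrid`, while links that already use `phase.ckut.ca` or point at other hosts pass through unchanged.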
01:47:58<pokechu22>Yeah, that part looks like it'd be a pain to deal with :/
01:48:46<@JAA>It could be done with a wpull plugin I think, but that's not particularly pleasant.
01:49:26<pabs>maybe it's time I learned scrapy, I don't think it is a very large site
01:54:36umgr036 quits [Remote host closed the connection]
01:54:39umgr036 joins
02:02:50<@JAA>wget-at with a Lua script is another option.
02:06:08Ketchup901 quits [Ping timeout: 276 seconds]
02:06:45Ketchup901 (Ketchup901) joins
02:18:20user_ quits [Ping timeout: 252 seconds]
02:18:40gazorpazorp (gazorpazorp) joins
02:47:45TheTechRobo quits [Remote host closed the connection]
02:49:25TheTechRobo (TheTechRobo) joins
04:41:14Ketchup901 quits [Remote host closed the connection]
04:41:49Ketchup901 (Ketchup901) joins
04:47:20<pabs>I managed to do it with scrapy, bit hacky
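The scrapy code itself is not shown in the log. As a stand-in, here is a minimal sketch of the same idea (a site-recursive crawl that rewrites each discovered URL before following it) using only the Python standard library. It is not pabs's solution; `crawl`, `rewrite_url`, and `fetch_page` are hypothetical names, the regex-based link extraction is deliberately crude, and the `http`-only rewrite is an assumption.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlsplit, urlunsplit
from urllib.request import urlopen

# Crude href matcher; a real HTML parser would be more robust.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def rewrite_url(url, broken_host="ckut.ca", working_host="phase.ckut.ca"):
    """Swap the broken host for the working one and drop any #fragment."""
    parts = urlsplit(url)
    if parts.hostname == broken_host:
        parts = parts._replace(scheme="http", netloc=working_host)
    return urlunsplit(parts._replace(fragment=""))


def fetch_page(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_page, limit=1000):
    """Breadth-first crawl of one host, rewriting links before queueing them."""
    allowed_host = urlsplit(start_url).hostname
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's URLError and HTTPError derive from OSError
            continue
        for href in HREF_RE.findall(html):
            link = rewrite_url(urljoin(url, href))
            if urlsplit(link).hostname != allowed_host or link in seen:
                continue
            if len(seen) >= limit:  # safety cap on the number of URLs
                return sorted(seen)
            seen.add(link)
            queue.append(link)
    return sorted(seen)
```

Something like `print("\n".join(crawl("http://phase.ckut.ca/")))` would produce a one-URL-per-line list suitable for feeding to AB with `!ao <`. The sketch fetches every same-host URL it finds, including non-HTML files, and has no rate limiting, so it is only reasonable for a small site.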
05:44:05BlueMaxima quits [Client Quit]
05:47:54Island quits [Read error: Connection reset by peer]
06:26:34Matthww1 quits [Ping timeout: 252 seconds]
06:29:30Matthww1 joins
06:30:39treora quits [Remote host closed the connection]
06:30:40treora joins
06:47:35lennier1 quits [Client Quit]
06:48:24lennier1 (lennier1) joins
07:25:47Ketchup901 quits [Remote host closed the connection]
07:27:34Ketchup901 (Ketchup901) joins
07:33:05gazorpazorp quits [Read error: Connection reset by peer]
07:33:23gazorpazorp (gazorpazorp) joins
08:02:16hackbug quits [Ping timeout: 252 seconds]
08:19:20hitgrr8 joins
08:39:36Ketchup901 quits [Client Quit]
08:42:43Ketchup901 (Ketchup901) joins
08:43:44umgr036 quits [Remote host closed the connection]
08:43:57umgr036 joins
08:53:02umgr036 quits [Remote host closed the connection]
08:58:51umgr036 joins
09:54:47Minkafighter722 quits [Quit: The Lounge - https://thelounge.chat]
09:56:41Minkafighter722 joins
10:26:56user_ joins
10:30:24umgr036 quits [Ping timeout: 252 seconds]
11:12:34dan_a quits [Read error: Connection reset by peer]
11:12:59dan_a (dan_a) joins
11:15:12zrracer joins
11:15:12zrracer quits [K-Lined]
11:56:48Ruthalas5 quits [Quit: Ping timeout (120 seconds)]
11:57:07Ruthalas5 (Ruthalas) joins
12:33:52user_ quits [Remote host closed the connection]
12:34:12user_ joins
12:35:39driib (driib) joins
12:40:52user_ quits [Remote host closed the connection]
12:41:06user_ joins
12:54:52eroc1990 quits [Ping timeout: 252 seconds]
12:57:21eroc1990 (eroc1990) joins
12:57:38hackbug (hackbug) joins
13:49:19Arcorann quits [Ping timeout: 252 seconds]
13:53:54CraftByte quits [Quit: Ping timeout (120 seconds)]
13:54:07CraftByte (DragonSec|CraftByte) joins
14:06:02Matthww13 joins
14:06:13Matthww1 quits [Read error: Connection reset by peer]
14:06:13Matthww13 is now known as Matthww1
14:58:39eroc1990 quits [Client Quit]
15:15:45eroc1990 (eroc1990) joins
15:24:21eroc1990 quits [Client Quit]
15:30:47eroc1990 (eroc1990) joins
15:37:22Island joins
15:46:06Ketchup902 (Ketchup901) joins
15:48:36Ketchup901 quits [Remote host closed the connection]
17:01:56Island quits [Remote host closed the connection]
17:01:56user_ quits [Remote host closed the connection]
17:02:03Island joins
17:02:06user_ joins
17:06:43Island quits [Remote host closed the connection]
17:06:43_kallsyms quits [Ping timeout: 265 seconds]
17:06:43shreyasminocha quits [Ping timeout: 265 seconds]
17:06:43thehedgeh0g quits [Ping timeout: 265 seconds]
17:06:43atirclog quits [Ping timeout: 265 seconds]
17:06:43Connection closed.
17:06:55atirclog (atirclog) joins
17:06:55Topic: Lengthy ArchiveTeam-related discussions, questions here | Offtopic: #archiveteam-ot | https://twitter.com/textfiles/status/1069715869994020867
17:06:55Topic set by JAA at 2021-11-02 03:43:57Z
17:07:01Island joins
17:09:26Minkafighter7225 joins
17:10:37raxxy-137409 quits [Quit: No Ping reply in 180 seconds.]
17:10:40Gereon6200 quits [Client Quit]
17:10:40Minkafighter722 quits [Client Quit]
17:10:40Island quits [Remote host closed the connection]
17:10:40sknebel quits [Remote host closed the connection]
17:10:40Minkafighter7225 is now known as Minkafighter722
17:10:45Gereon6200 (Gereon) joins
17:11:20raxxy-137409 joins
17:11:30sknebel (sknebel) joins
17:39:58sec^nd (second) joins
17:50:23sec^nd quits [Remote host closed the connection]
18:03:25sec^nd (second) joins
18:52:38<h2ibot>JustAnotherArchivist edited Issuu (+151, Update info box): https://wiki.archiveteam.org/?diff=49481&oldid=49478
19:06:41<h2ibot>JustAnotherArchivist edited Deathwatch (+383, /* 2023 */ Add AnaitGames forums): https://wiki.archiveteam.org/?diff=49482&oldid=49480
19:43:02sec^nd quits [Ping timeout: 276 seconds]
19:53:05sec^nd (second) joins
20:12:08qwertyasdfuiopghjkl joins
20:28:55umgr036 joins
20:29:10user_ quits [Ping timeout: 252 seconds]
20:31:56<h2ibot>Katocala edited Entomology (+12289): https://wiki.archiveteam.org/?diff=49483&oldid=45742
20:37:57<h2ibot>Katocala edited Entomology (+1): https://wiki.archiveteam.org/?diff=49484&oldid=49483
20:38:57<h2ibot>Katocala edited Entomology (+0): https://wiki.archiveteam.org/?diff=49485&oldid=49484
20:41:28fl0w joins
20:44:01fl0w_ quits [Ping timeout: 252 seconds]
20:44:58<h2ibot>Katocala edited Entomology (+1): https://wiki.archiveteam.org/?diff=49486&oldid=49485
20:44:59<h2ibot>Pokechu22 edited Deathwatch (+243, /* Pining for the Fjords (Dying) */ 2026:…): https://wiki.archiveteam.org/?diff=49487&oldid=49482
20:46:59<h2ibot>Katocala edited Entomology (-1): https://wiki.archiveteam.org/?diff=49488&oldid=49486
21:31:08umgr036 quits [Remote host closed the connection]
21:31:11umgr036 joins
21:47:44Gereon6200 quits [Client Quit]
21:47:44umgr036 quits [Remote host closed the connection]
21:47:57umgr036 joins
22:07:16<h2ibot>Katocala edited Entomology (+57): https://wiki.archiveteam.org/?diff=49489&oldid=49488
22:10:17<h2ibot>Katocala edited Entomology (-21): https://wiki.archiveteam.org/?diff=49490&oldid=49489
22:13:17<h2ibot>Katocala edited Entomology (+0): https://wiki.archiveteam.org/?diff=49491&oldid=49490
22:16:18<h2ibot>Katocala edited Entomology (-16): https://wiki.archiveteam.org/?diff=49492&oldid=49491
22:16:19Ketchup902 quits [Remote host closed the connection]
22:17:05Ketchup901 (Ketchup901) joins
22:37:23<h2ibot>Katocala edited Entomology (+27): https://wiki.archiveteam.org/?diff=49493&oldid=49492
23:07:03BlueMaxima joins
23:10:23hitgrr8 quits [Client Quit]