03:14:51<HP_Archivist>WBM captured this page: https://encompass.com/model/EPSE10000XLPH
03:15:17<HP_Archivist>But what I want to do is capture the outlinks for each part listed on each of the 7 pages. Any suggestions other than manually going in and doing that over again for all?
03:22:41<pabs>HP_Archivist: locally download all the pages, extract outlinks from each of them, compile into a list, upload to transfer, do #archivebot !ao < $transferurl
03:23:13<HP_Archivist>I used HTTrack to do that, but couldn't figure out where to look for the links
03:23:17<HP_Archivist>pabs ^
03:24:00<pabs>personally I would just do this: curl -sL $page | pup 'a attr{href}' | sort -u
03:24:27<pabs>have you got the URLs to the other pages? seems to need JS?
03:25:13<pabs>(pup is an HTML parser; the 'a attr{href}' command means find all <a> tags and grab their href attributes)
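For readers without pup installed, the same idea (find every <a> tag, take its href, de-duplicate) can be sketched with Python's standard library. This is only an illustration of the technique pabs describes, not what was actually run in the channel; the names `HrefCollector` and `extract_hrefs` are made up for this sketch, and the page HTML is assumed to have been downloaded already (for example with the curl command above).

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, similar to pup 'a attr{href}'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_hrefs(html_text):
    """Return the sorted, de-duplicated hrefs in html_text (roughly the `sort -u` step)."""
    collector = HrefCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(set(collector.hrefs))
```

Calling `extract_hrefs(page_html)` on a saved copy of the page returns the hrefs exactly as written in the markup, so relative links stay relative at this stage.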
03:28:53<HP_Archivist>No, that's just it. Need an easy way to grab those URLs to the other pages/individual parts pages
03:29:11<pabs>parts pages will work with pup
03:29:20<pabs>the 7 pages not sure, will check
03:31:03<HP_Archivist>Luckily, WBM did just fine with the 7 pages
03:31:11<HP_Archivist>Did not grab the parts pages though
03:33:12<HP_Archivist>I mean, the individual parts pages don't have much detail from this partner supplies company. The pages don't list much other than part number and some vague name. But they're still important to save as reference
03:36:04<pabs>lol, those "pages" are all in the same HTML file
03:36:20<pabs>so a simple pup href extraction works, sec
03:37:48<HP_Archivist>Ah, that's why SPN did fine with it
03:40:05<pabs>if you did SPN outlinks then it should have all pages
03:40:14<pabs>but if not, here is what pup got: https://transfer.archivete.am/S4sVE/encompass.com-model-EPSE10000XLPH-outlinks.txt
03:40:53<pabs>edit that, reupload it, do something like this in the #archivebot chan: !ao < https://transfer.archivete.am/S4sVE/encompass.com-model-EPSE10000XLPH-outlinks.txt
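The "edit that" step is not spelled out in the conversation. One plausible reading, offered here only as an assumption, is turning the raw hrefs into a clean list of absolute URLs before re-uploading it for `!ao <`. The sketch below resolves relative links against the page URL, drops fragments and non-HTTP schemes such as mailto:, and optionally keeps only links on the same host. The function name `build_url_list` and the same-host filter are hypothetical choices for illustration.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def build_url_list(base_url, hrefs, same_host_only=True):
    """Turn raw hrefs into a sorted list of unique absolute http(s) URLs.

    same_host_only is an assumed filter (keep only links on base_url's host);
    set it to False to keep off-site links as well.
    """
    base_host = urlsplit(base_url).netloc.lower()
    urls = set()
    for href in hrefs:
        # Resolve relative links against the page URL, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, tel:, and similar
        if same_host_only and parts.netloc.lower() != base_host:
            continue
        urls.add(absolute)
    return sorted(urls)
```

Writing `"\n".join(build_url_list(page_url, hrefs))` to a text file gives one URL per line, which matches the layout of the outlinks list linked above.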
03:40:59<HP_Archivist>I did, but maybe I just need to give it time for them to show up in WBM, because clicking each part link just brings up a "not crawled" SPN page
03:41:07<HP_Archivist>Many thanks, pabs
03:41:19<HP_Archivist>I'll submit it in ab in a few
03:42:05<HP_Archivist>This particular scanner from Epson is used in the VGPC community and is also widely used generally for large format archival scanning. So, any reference to its technical literature and parts list identification is good to save
04:11:28Justin[home] is now known as DopefishJustin
04:21:43fireonlive quits [Client Quit]
04:23:27fireonlive (fireonlive) joins
06:01:46AlsoHP_Archivist joins
06:02:02atphoenix__ (atphoenix) joins
06:02:24s-crypt20 (s-crypt) joins
06:02:35sepro8 (sepro) joins
06:02:35fireonlive quits [Client Quit]
06:02:35Flashfire42 quits [Client Quit]
06:02:35s-crypt2 quits [Client Quit]
06:02:35nulldata quits [Client Quit]
06:02:35kiska quits [Client Quit]
06:02:36s-crypt20 is now known as s-crypt2
06:02:39TheTechRobo quits [Client Quit]
06:02:39sepro quits [Client Quit]
06:02:39HP_Archivist quits [Remote host closed the connection]
06:02:39atphoenix_ quits [Remote host closed the connection]
06:02:39sepro8 is now known as sepro
06:02:39andrew quits [Client Quit]
06:02:39project10 quits [Client Quit]
06:02:39Ryz quits [Client Quit]
06:02:52nulldata (nulldata) joins
06:03:11Ryz (Ryz) joins
06:04:03kiska (kiska) joins
06:04:12fireonlive (fireonlive) joins
06:04:37TheTechRobo (TheTechRobo) joins
06:22:23Jake quits [Ping timeout: 272 seconds]
06:28:55Jake (Jake) joins
06:37:46Flashfire42 joins
06:56:44Arcorann (Arcorann) joins
06:59:06project10 (project10) joins
08:44:40qwertyasdfuiopghjkl quits [Remote host closed the connection]
09:48:24qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
09:56:30[42] quits [Remote host closed the connection]
09:57:25[42] (N4Y) joins
11:53:02dxrt_ joins
11:53:11project10 quits [Client Quit]
11:53:11TheTechRobo quits [Client Quit]
11:53:12fireonlive quits [Client Quit]
11:53:12[42] quits [Max SendQ exceeded]
11:53:12dxrt quits [Client Quit]
11:53:12qwertyasdfuiopghjkl quits [Client Quit]
11:53:12balrog quits [Client Quit]
11:53:21[42] (N4Y) joins
11:53:57balrog (balrog) joins
11:53:57project10 (project10) joins
11:54:44fireonlive (fireonlive) joins
11:57:06fireonlive quits [Killed (NickServ (GHOST command used by fireonlive9))]
11:57:42project10 quits [Client Quit]
11:57:57project10 (project10) joins
11:58:35fireonlive (fireonlive) joins
11:59:40TheTechRobo (TheTechRobo) joins
12:01:36TheTechRobo quits [Excess Flood]
12:03:59TheTechRobo (TheTechRobo) joins
12:44:55Arcorann quits [Ping timeout: 272 seconds]
14:11:31Matthww1192 joins
14:13:35Matthww119 quits [Ping timeout: 272 seconds]
14:13:35Matthww1192 is now known as Matthww119
17:30:31nicolas17 joins
21:16:01project10 quits [Ping timeout: 272 seconds]
21:21:52project10 (project10) joins
21:44:59Jake quits [Client Quit]
21:45:53Jake (Jake) joins
21:48:49qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
21:51:02Jake quits [Client Quit]
21:52:55Jake (Jake) joins
22:30:42BearFortress_ joins
22:34:34BearFortress quits [Ping timeout: 265 seconds]
22:42:43TheTechRobo quits [Client Quit]
22:42:43project10 quits [Client Quit]
22:43:29project10 (project10) joins
22:48:02TheTechRobo (TheTechRobo) joins
23:00:25BearFortress_ quits [Read error: Connection reset by peer]
23:00:31BearFortress joins
23:00:41AlsoHP_Archivist quits [Read error: Connection reset by peer]
23:01:29AlsoHP_Archivist joins
23:01:43atphoenix_ (atphoenix) joins
23:01:49sepro quits [Client Quit]
23:02:01fireonlive quits [Client Quit]
23:02:01Jake quits [Client Quit]
23:02:09s-crypt21 (s-crypt) joins
23:02:20sepro (sepro) joins
23:02:20Justin[home] joins
23:02:22Flashfire424 joins
23:02:25G4te_Keep3r3492 quits [Quit: Ping timeout (120 seconds)]
23:02:31Ryz quits [Client Quit]
23:02:32geezabiscuit quits [Read error: Connection reset by peer]
23:02:39nulldata3 (nulldata) joins
23:02:47G4te_Keep3r3492 joins
23:02:49dxrt_ quits [Client Quit]
23:03:00Lord_Nightmare quits [Client Quit]
23:03:00geezabiscuit joins
23:03:00geezabiscuit quits [Changing host]
23:03:00geezabiscuit (geezabiscuit) joins
23:03:07dxrt joins
23:03:09dxrt quits [Changing host]
23:03:09dxrt (dxrt) joins
23:03:10Ryz (Ryz) joins
23:03:15nulldata quits [Read error: Connection reset by peer]
23:03:16nulldata3 is now known as nulldata
23:03:20Lord_Nightmare (Lord_Nightmare) joins
23:03:27Jake (Jake) joins
23:03:38fireonlive (fireonlive) joins
23:03:54kiska6 (kiska) joins
23:04:19s-crypt2 quits [Ping timeout: 272 seconds]
23:04:19DopefishJustin quits [Ping timeout: 272 seconds]
23:04:19s-crypt21 is now known as s-crypt2
23:04:57Flashfire42 quits [Ping timeout: 272 seconds]
23:04:57kiska quits [Ping timeout: 272 seconds]
23:04:57atphoenix__ quits [Ping timeout: 272 seconds]
23:04:57kiska6 is now known as kiska
23:04:58Flashfire424 is now known as Flashfire42
23:16:39DLoader_ joins
23:16:39DLoader_ quits [Excess Flood]
23:17:27DLoader_ joins
23:18:15rewby quits [Ping timeout: 272 seconds]
23:18:40rewby (rewby) joins
23:20:09DLoader quits [Ping timeout: 272 seconds]
23:20:09DLoader_ is now known as DLoader