00:05:55Arcorann (Arcorann) joins
01:02:35dm4v_ joins
01:03:55dm4v quits [Ping timeout: 265 seconds]
01:03:55dm4v_ is now known as dm4v
01:03:56dm4v quits [Changing host]
01:03:56dm4v (dm4v) joins
01:10:56internet_od (internet_od) joins
01:13:29dm4v_ joins
01:16:00dm4v quits [Ping timeout: 265 seconds]
01:16:00dm4v_ is now known as dm4v
01:16:01dm4v quits [Changing host]
01:16:01dm4v (dm4v) joins
01:16:18benjinsmith joins
01:16:22benjins quits [Ping timeout: 245 seconds]
01:25:15internet_od quits [Client Quit]
01:26:50benjinsmith is now known as benjins
01:51:47Discant quits [Ping timeout: 245 seconds]
02:15:41Iki quits [Remote host closed the connection]
02:15:57Iki joins
02:27:32bonga quits [Ping timeout: 265 seconds]
02:29:07<h2ibot>Entartet edited List of websites excluded from the Wayback Machine (+47, Added host.net and theepicentre.com.): https://wiki.archiveteam.org/?diff=48676&oldid=48673
02:31:07<h2ibot>PolarManne edited 4chan (+659): https://wiki.archiveteam.org/?diff=48677&oldid=48659
02:36:08<h2ibot>JustAnotherArchivist edited Wikidot (+28, Merge edit by…): https://wiki.archiveteam.org/?diff=48678&oldid=48645
02:36:41<@JAA>aismallard: ^ Sorry, that took much longer than it should have because there was an edit conflict.
02:37:58bonga joins
02:45:37<aismallard>Cool, thanks
05:43:19Iki1 joins
05:47:09Iki quits [Ping timeout: 265 seconds]
06:00:46BlueMaxima joins
06:24:27endrift|ZNC is now known as endrift
07:20:00lennier1 quits [Client Quit]
07:20:45lennier1 (lennier1) joins
07:38:43Discant joins
07:47:13lennier2 joins
07:49:42lennier1 quits [Ping timeout: 245 seconds]
07:50:02lennier2 is now known as lennier1
08:05:48savefactwirehk joins
08:06:30<savefactwirehk>Hi (sorry if this is a double post, new to IRC). Factwire, an independent HK media outlet, announced 10 mins ago it will be shut down today. Any way to back it all up before it shuts down in a few hours? Not many articles, probably ~1000, so should be doable.
08:06:39<savefactwirehk>Their shutdown statement: https://www.instagram.com/p/CencgN-vDaZ/?hl=en
08:08:05<savefactwirehk>Website: https://www.factwire.org | Facebook: https://www.facebook.com/factwire | Twitter: https://twitter.com/factwirenews | IG: https://www.instagram.com/factwirenews/ | Youtube: https://www.youtube.com/FactWireVideo | MeWe: https://mewe.com/p/factwire
08:08:07lennier1 quits [Read error: Connection reset by peer]
08:08:23<savefactwirehk>Website: https://www.factwire.org | Facebook: https://www.facebook.com/factwire | Twitter: https://twitter.com/factwirenews | IG: https://www.instagram.com/factwirenews/ | Youtube: https://www.youtube.com/FactWireVideo | MeWe: https://mewe.com/p/factwire
08:08:25lennier1 (lennier1) joins
08:12:16<Doranwen>I left a msg in the announcements channel and hopefully someone can throw it into ArchiveBot
08:14:21lennier2 joins
08:14:34savefactwirehk62 joins
08:14:38Discant quits [Remote host closed the connection]
08:14:57Discant joins
08:15:51<savefactwirehk62>Hopefully this sends correctly. Factwire webpages: 1. Website: https://www.factwire.org 2. Facebook: https://www.facebook.com/factwire 3. Twitter: https://twitter.com/factwirenews 4. IG: https://www.instagram.com/factwirenews/ 5. Youtube: https://www.youtube.com/FactWireVideo 6. MeWe: https://mewe.com/p/factwire
08:16:30savefactwirehk quits [Ping timeout: 265 seconds]
08:18:06<Doranwen>savefactwirehk62: you've sent it three times now
08:18:11lennier2_ joins
08:18:21<Doranwen>so it's just waiting for the people who can deal with it to see it
08:18:55lennier1 quits [Ping timeout: 265 seconds]
08:19:01lennier2_ is now known as lennier1
08:21:47lennier2 quits [Ping timeout: 245 seconds]
08:23:30<@Sanqui>ArchiveBot can't do it :/
08:28:30<savefactwirehk62>Yeah sorry the formatting shows up weird on my web IRC client every time I send it so tried changing the formatting. i'm an irc noob.
08:47:43pie_ quits []
08:47:50JackThompson4 joins
08:47:59pie_ joins
08:49:17JackThompson quits [Ping timeout: 245 seconds]
08:49:17JackThompson4 is now known as JackThompson
08:51:47qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
09:02:41Megame (Megame) joins
09:06:08lennier1 quits [Read error: Connection reset by peer]
09:06:26lennier1 (lennier1) joins
09:13:53lennier1 quits [Ping timeout: 265 seconds]
09:17:56savefactwirehk62 quits [Remote host closed the connection]
09:28:00lennier1 (lennier1) joins
09:29:23pie_ quits [Remote host closed the connection]
09:53:51BlueMaxima quits [Client Quit]
11:30:21Discant quits [Remote host closed the connection]
11:30:28Discant joins
11:32:07pie_ joins
11:39:59pie_ quits [Ping timeout: 265 seconds]
11:53:41pie_ joins
11:54:28Mateon1 quits [Quit: Mateon1]
11:55:45Mateon1 joins
12:01:11Mateon1 quits [Remote host closed the connection]
12:01:34Mateon1 joins
12:05:28HackMii quits [Remote host closed the connection]
12:08:02HackMii (hacktheplanet) joins
12:54:42LeGoupil joins
13:30:43pabs quits [Client Quit]
13:36:31Chris5010 quits [Client Quit]
13:37:36driib quits [Client Quit]
13:37:36Matthww quits [Client Quit]
13:37:36JackThompson quits [Client Quit]
13:37:36dm4v quits [Client Quit]
13:37:40dm4v joins
13:37:41dm4v quits [Changing host]
13:37:41dm4v (dm4v) joins
13:37:42driib (driib) joins
13:37:46Matthww joins
13:37:51pabs (pabs) joins
13:37:51JackThompson joins
13:46:35Chris5010 (Chris5010) joins
14:01:22Arcorann quits [Ping timeout: 245 seconds]
14:04:26Megame quits [Client Quit]
14:05:03<@arkiver>checking factwire.org
15:08:22qwertyasdfuiopghjkl joins
17:03:52robbi5 quits [Remote host closed the connection]
17:17:24jdp__ is now known as drexler
17:17:42<drexler>How fast am I allowed to download from Internet Archive?
17:17:52<drexler>I'd like to train on some of the collections but don't want to overload their servers.
17:18:16<drexler>I tried searching for the answer to this and couldn't find a clear answer.
18:09:42Mateon1 quits [Ping timeout: 245 seconds]
18:09:55Mateon1 joins
18:20:40bonga quits [Ping timeout: 265 seconds]
18:20:43bonga joins
18:23:19<@JAA>drexler: Well, there isn't a clear answer. Although there are exceptions, their servers and/or network are constantly under high load. So if you throw anything significant at it, it will have an impact on others. How much do you want to download?
18:40:46<thuban>in re factwire: main site is behind cloudflare; facebook and instagram are subject to heavy rate limiting and can't practically be scraped; twitter is done (archivebot job id cxbpyyoiqc754i0ync7rq885y); youtube is done; mewe appears to require login
18:49:07<drexler>JAA, I'm not in a hurry, but I'd like to be able to grab all of stuff like Rave Archive
18:49:19<drexler>So I'm willing to spread the download out over weeks/months
18:55:38<@JAA>drexler: Right. Well, the longer you spread it out, the better, basically. But all of collection:ravearchive is only 289 GiB, so that won't impact things for long even if you pull it very fast.
19:07:04HP_Archivist (HP_Archivist) joins
19:09:23<drexler>JAA, alright, thanks.
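The "spread it out" advice above can be turned into rough arithmetic. The following is a minimal sketch, not an official Internet Archive limit: it only computes the average transfer rate needed to fetch the 289 GiB quoted for collection:ravearchive over a chosen number of days. The function name and the 7/30/90-day windows are illustrative assumptions.

```python
def average_rate_mib_per_s(total_gib, days):
    """Average rate (MiB/s) needed to move total_gib GiB in the given number of days."""
    total_mib = total_gib * 1024      # 1 GiB = 1024 MiB
    seconds = days * 24 * 60 * 60     # length of the download window in seconds
    return total_mib / seconds


# 289 GiB is the collection size quoted in the conversation;
# the windows below are hypothetical choices, not recommendations.
for days in (7, 30, 90):
    rate = average_rate_mib_per_s(289, days)
    print(f"{days:>3} days: {rate:.3f} MiB/s on average")
```

A downloader capped at or below the resulting figure (most tools have a bandwidth-limit option) would finish within the chosen window while keeping the load low.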
19:10:16HP_Archivist quits [Client Quit]
19:18:17<Ryz>Heya folks, is there a way to extract the download links from stuff like https://www.futuremanagers.com/download/ and https://www.futuremanagers.com/exams/ ? ArchiveBot can't pick it up since it's behind JS, and holy shit is this tedious trying to just fetch the links manually because of the weird way these are set up... I would like to have them
19:18:17<Ryz>processed into a text file so I can run it via AB
19:18:59<drexler>If the JS is too heavy, use selenium or something.
19:33:11HP_Archivist (HP_Archivist) joins
19:40:17bonga quits [Ping timeout: 265 seconds]
19:41:35bonga joins
19:43:48HP_Archivist quits [Client Quit]
19:46:16pie_ quits [Remote host closed the connection]
20:31:47Discant quits [Ping timeout: 245 seconds]
20:33:09gurrs joins
20:33:30<gurrs>is anybody still here?
20:34:16gurrs leaves
20:35:14<TheTechRobo>Ryz: The links are in the page source, I don't know what you're talking about there. But everything links to a 404 for me anyway.
20:35:39<TheTechRobo>gurrs: If you're reading logs, don't ask to ask, just ask
20:41:48<Ryz>That's the problem TheTechRobo; it opens up more folders and files within the same page, but opening it on its own just gives 404s
20:42:09jacobk quits [Ping timeout: 265 seconds]
20:42:11<thuban>Ryz: in cases like this, you basically have three options:
20:42:24<TheTechRobo>Oh, you click the folder icon, not the name.
20:42:28<TheTechRobo>Kind of counter-intuitive imo
20:42:32<thuban>(1) reverse-engineering the ajax to write a scraper,
20:43:41<thuban>(2) scripting a browser automation tool (like selenium) to do the necessary interaction, then saving and processing the page state,
20:44:38<thuban>or (3) doing the necessary interaction manually, then saving and processing the page state (i believe most browsers have a menu item for this but you can save the html from the page inspector if all else fails).
20:45:02<thuban>which one is the most work will depend on your skills and the details of the particular site
20:45:15<thuban>but i'm running (1) for you right now, stay tuned...
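The "saving and processing the page state" step shared by options (2) and (3) above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the scraper thuban actually ran: it assumes the expanded page has already been saved as HTML and that the files appear as ordinary `<a href>` links. The names `LinkCollector` and `extract_links`, the sample markup, and the example.org base URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect unique absolute http(s) links from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Resolve relative links against the page's original URL.
            absolute = urljoin(self.base_url, value)
            # Skip javascript:, mailto:, and similar non-fetchable schemes.
            if not absolute.startswith(("http://", "https://")):
                continue
            if absolute not in self.links:
                self.links.append(absolute)


def extract_links(html_text, base_url):
    """Return the links found in html_text, in document order, without duplicates."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical saved-page fragment; a real run would read the saved HTML file instead.
sample = (
    '<a href="/files/a.pdf">A</a>'
    '<a href="b.zip">B</a>'
    '<a href="javascript:void(0)">toggle</a>'
    '<a href="/files/a.pdf">duplicate</a>'
)
links = extract_links(sample, "https://example.org/download/")
print("\n".join(links))
```

Writing the printed list to a text file, one URL per line, gives the kind of list Ryz asked for.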
20:45:47<TheTechRobo>thuban: oh, i was going to create a scraper!
20:46:00<TheTechRobo>do you want me to send you some findings?
20:47:22<@arkiver>do we know if the AB job will be enough for Unity Answers?
20:47:33<thuban>TheTechRobo: seems redundant as mine's already done, but maybe as a double-check?
20:47:35LeGoupil quits [Client Quit]
20:47:38pie_ joins
20:47:47<TheTechRobo>thuban: yours is already done? wow
20:48:12<TheTechRobo>Nevermind then. I thought you'd just started.
20:54:44<thuban>Ryz: https://transfer.archivete.am/Lfrvh/download.txt ("exams" coming up next)
20:59:56TheTechRobo quits [Remote host closed the connection]
21:01:34<thuban>Ryz: https://transfer.archivete.am/h6vo1/exams.txt
21:02:19TheTechRobo (TheTechRobo) joins
21:02:54TheTechRobo quits [Remote host closed the connection]
21:03:39<thuban>these are _just_ the files; the ajax calls are all post and wouldn't play back in the wbm anyway.
21:05:04TheTechRobo (TheTechRobo) joins
21:17:43Trent joins
21:39:45TheTechRobo quits [Remote host closed the connection]
21:40:08TheTechRobo (TheTechRobo) joins
21:43:21spirit quits [Quit: Leaving]
21:50:27<Ryz>Hmm s:
22:00:15Trent quits [Remote host closed the connection]
22:00:28Sluggs quits [Ping timeout: 276 seconds]
22:01:30Sluggs joins
22:01:47bonga quits [Ping timeout: 245 seconds]
22:02:06bonga joins
22:19:18lukash7 quits [Remote host closed the connection]
22:38:12lukash7 joins
22:47:14lukash7 quits [Client Quit]
23:02:20lukash7 joins
23:13:52michaelblob quits [Ping timeout: 245 seconds]
23:25:24lukash7 quits [Client Quit]
23:43:11michaelblob (michaelblob) joins
23:55:04lukash7 joins
23:57:54lukash7 quits [Client Quit]