| 00:05:55 | | Arcorann (Arcorann) joins |
| 01:02:35 | | dm4v_ joins |
| 01:03:55 | | dm4v quits [Ping timeout: 265 seconds] |
| 01:03:55 | | dm4v_ is now known as dm4v |
| 01:03:56 | | dm4v is now authenticated as dm4v |
| 01:03:56 | | dm4v quits [Changing host] |
| 01:03:56 | | dm4v (dm4v) joins |
| 01:10:56 | | internet_od (internet_od) joins |
| 01:13:29 | | dm4v_ joins |
| 01:16:00 | | dm4v quits [Ping timeout: 265 seconds] |
| 01:16:00 | | dm4v_ is now known as dm4v |
| 01:16:01 | | dm4v is now authenticated as dm4v |
| 01:16:01 | | dm4v quits [Changing host] |
| 01:16:01 | | dm4v (dm4v) joins |
| 01:16:18 | | benjinsmith joins |
| 01:16:22 | | benjins quits [Ping timeout: 245 seconds] |
| 01:25:15 | | internet_od quits [Client Quit] |
| 01:26:50 | | benjinsmith is now known as benjins |
| 01:26:52 | | benjins is now authenticated as benjins |
| 01:51:47 | | Discant quits [Ping timeout: 245 seconds] |
| 02:15:41 | | Iki quits [Remote host closed the connection] |
| 02:15:57 | | Iki joins |
| 02:27:32 | | bonga quits [Ping timeout: 265 seconds] |
| 02:29:07 | <h2ibot> | Entartet edited List of websites excluded from the Wayback Machine (+47, Added host.net and theepicentre.com.): https://wiki.archiveteam.org/?diff=48676&oldid=48673 |
| 02:31:07 | <h2ibot> | PolarManne edited 4chan (+659): https://wiki.archiveteam.org/?diff=48677&oldid=48659 |
| 02:36:08 | <h2ibot> | JustAnotherArchivist edited Wikidot (+28, Merge edit by…): https://wiki.archiveteam.org/?diff=48678&oldid=48645 |
| 02:36:41 | <@JAA> | aismallard: ^ Sorry, that took much longer than it should have because there was an edit conflict. |
| 02:37:58 | | bonga joins |
| 02:45:37 | <aismallard> | Cool, thanks |
| 05:43:19 | | Iki1 joins |
| 05:47:09 | | Iki quits [Ping timeout: 265 seconds] |
| 06:00:46 | | BlueMaxima joins |
| 06:24:27 | | endrift|ZNC is now known as endrift |
| 07:20:00 | | lennier1 quits [Client Quit] |
| 07:20:45 | | lennier1 (lennier1) joins |
| 07:38:43 | | Discant joins |
| 07:47:13 | | lennier2 joins |
| 07:49:42 | | lennier1 quits [Ping timeout: 245 seconds] |
| 07:50:02 | | lennier2 is now known as lennier1 |
| 08:05:48 | | savefactwirehk joins |
| 08:06:30 | <savefactwirehk> | Hi (sorry if this is a double post, new to IRC). Factwire, an independent HK media outlet, announced 10 mins ago it will be shut down today. Any way to back it all up before it shuts down in a few hours? Not many articles, probably ~1000, so should be doable. |
| 08:06:39 | <savefactwirehk> | Their shutdown statement: https://www.instagram.com/p/CencgN-vDaZ/?hl=en |
| 08:08:05 | <savefactwirehk> | Website: https://www.factwire.org Facebook: https://www.facebook.com/factwire Twitter: https://twitter.com/factwirenews IG: https://www.instagram.com/factwirenews/ Youtube: https://www.youtube.com/FactWireVideo MeWe: https://mewe.com/p/factwire |
| 08:08:07 | | lennier1 quits [Read error: Connection reset by peer] |
| 08:08:23 | <savefactwirehk> | Website: https://www.factwire.org Facebook: https://www.facebook.com/factwire Twitter: https://twitter.com/factwirenews IG: https://www.instagram.com/factwirenews/ Youtube: https://www.youtube.com/FactWireVideo MeWe: https://mewe.com/p/factwire |
| 08:08:25 | | lennier1 (lennier1) joins |
| 08:12:16 | <Doranwen> | I left a msg in the announcements channel and hopefully someone can throw it into ArchiveBot |
| 08:14:21 | | lennier2 joins |
| 08:14:34 | | savefactwirehk62 joins |
| 08:14:38 | | Discant quits [Remote host closed the connection] |
| 08:14:57 | | Discant joins |
| 08:15:51 | <savefactwirehk62> | Hopefully this sends correctly. Factwire webpages: 1. Website: https://www.factwire.org 2. Facebook: https://www.facebook.com/factwire 3. Twitter: https://twitter.com/factwirenews 4. IG: https://www.instagram.com/factwirenews/ 5. Youtube: https://www.youtube.com/FactWireVideo 6. MeWe: https://mewe.com/p/factwire |
| 08:16:30 | | savefactwirehk quits [Ping timeout: 265 seconds] |
| 08:18:06 | <Doranwen> | savefactwirehk62: you've sent it three times now |
| 08:18:11 | | lennier2_ joins |
| 08:18:21 | <Doranwen> | so it's just waiting for the people who can deal with it to see it |
| 08:18:55 | | lennier1 quits [Ping timeout: 265 seconds] |
| 08:19:01 | | lennier2_ is now known as lennier1 |
| 08:21:47 | | lennier2 quits [Ping timeout: 245 seconds] |
| 08:23:30 | <@Sanqui> | ArchiveBot can't do it :/ |
| 08:28:30 | <savefactwirehk62> | Yeah sorry the formatting shows up weird on my web IRC client every time I send it so tried changing the formatting. i'm an irc noob. |
| 08:47:43 | | pie_ quits [] |
| 08:47:50 | | JackThompson4 joins |
| 08:47:59 | | pie_ joins |
| 08:49:17 | | JackThompson quits [Ping timeout: 245 seconds] |
| 08:49:17 | | JackThompson4 is now known as JackThompson |
| 08:51:47 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 09:02:41 | | Megame (Megame) joins |
| 09:06:08 | | lennier1 quits [Read error: Connection reset by peer] |
| 09:06:26 | | lennier1 (lennier1) joins |
| 09:13:53 | | lennier1 quits [Ping timeout: 265 seconds] |
| 09:17:56 | | savefactwirehk62 quits [Remote host closed the connection] |
| 09:28:00 | | lennier1 (lennier1) joins |
| 09:29:23 | | pie_ quits [Remote host closed the connection] |
| 09:53:51 | | BlueMaxima quits [Client Quit] |
| 11:30:21 | | Discant quits [Remote host closed the connection] |
| 11:30:28 | | Discant joins |
| 11:32:07 | | pie_ joins |
| 11:39:59 | | pie_ quits [Ping timeout: 265 seconds] |
| 11:53:41 | | pie_ joins |
| 11:54:28 | | Mateon1 quits [Quit: Mateon1] |
| 11:55:45 | | Mateon1 joins |
| 12:01:11 | | Mateon1 quits [Remote host closed the connection] |
| 12:01:34 | | Mateon1 joins |
| 12:05:28 | | HackMii quits [Remote host closed the connection] |
| 12:08:02 | | HackMii (hacktheplanet) joins |
| 12:54:42 | | LeGoupil joins |
| 13:30:43 | | pabs quits [Client Quit] |
| 13:36:31 | | Chris5010 quits [Client Quit] |
| 13:37:36 | | driib quits [Client Quit] |
| 13:37:36 | | Matthww quits [Client Quit] |
| 13:37:36 | | JackThompson quits [Client Quit] |
| 13:37:36 | | dm4v quits [Client Quit] |
| 13:37:40 | | dm4v joins |
| 13:37:41 | | dm4v is now authenticated as dm4v |
| 13:37:41 | | dm4v quits [Changing host] |
| 13:37:41 | | dm4v (dm4v) joins |
| 13:37:42 | | driib (driib) joins |
| 13:37:46 | | Matthww joins |
| 13:37:51 | | pabs (pabs) joins |
| 13:37:51 | | JackThompson joins |
| 13:46:35 | | Chris5010 (Chris5010) joins |
| 14:01:22 | | Arcorann quits [Ping timeout: 245 seconds] |
| 14:04:26 | | Megame quits [Client Quit] |
| 14:05:03 | <@arkiver> | checking factwire.org |
| 15:08:22 | | qwertyasdfuiopghjkl joins |
| 17:03:52 | | robbi5 quits [Remote host closed the connection] |
| 17:17:24 | | jdp__ is now known as drexler |
| 17:17:42 | <drexler> | How fast am I allowed to download from Internet Archive? |
| 17:17:52 | <drexler> | I'd like to train on some of the collections but don't want to overload their servers. |
| 17:18:16 | <drexler> | I tried searching for the answer to this and couldn't find a clear answer. |
| 18:09:42 | | Mateon1 quits [Ping timeout: 245 seconds] |
| 18:09:55 | | Mateon1 joins |
| 18:20:40 | | bonga quits [Ping timeout: 265 seconds] |
| 18:20:43 | | bonga joins |
| 18:23:19 | <@JAA> | drexler: Well, there isn't a clear answer. Although there are exceptions, their servers and/or network are constantly under high load. So if you throw anything significant at it, it will have an impact on others. How much do you want to download? |
| 18:40:46 | <thuban> | in re factwire: main site is behind cloudflare; facebook and instagram are subject to heavy rate limiting and can't practically be scraped; twitter is done (archivebot job id cxbpyyoiqc754i0ync7rq885y); youtube is done; mewe appears to require login |
| 18:49:07 | <drexler> | JAA, I'm not in a hurry, but I'd like to be able to grab all of stuff like Rave Archive |
| 18:49:19 | <drexler> | So I'm willing to spread the download out over weeks/months |
| 18:55:38 | <@JAA> | drexler: Right. Well, the longer you spread it out, the better, basically. But all of collection:ravearchive is only 289 GiB, so that won't impact things for long even if you pull it very fast. |
| 19:07:04 | | HP_Archivist (HP_Archivist) joins |
| 19:09:23 | <drexler> | JAA, alright, thanks. |
| 19:10:16 | | HP_Archivist quits [Client Quit] |
| 19:18:17 | <Ryz> | Heya folks, is there a way to extract the download links from stuff like https://www.futuremanagers.com/download/ and https://www.futuremanagers.com/exams/ ? ArchiveBot can't pick it up since it's behind JS, and holy shit is this tedious trying to just fetch the links manually because of the weird way these are set up... I would like to have them |
| 19:18:17 | <Ryz> | processed into a text file so I can run it via AB |
| 19:18:59 | <drexler> | If the JS is too heavy, use selenium or something. |
| 19:33:11 | | HP_Archivist (HP_Archivist) joins |
| 19:40:17 | | bonga quits [Ping timeout: 265 seconds] |
| 19:41:35 | | bonga joins |
| 19:43:48 | | HP_Archivist quits [Client Quit] |
| 19:46:16 | | pie_ quits [Remote host closed the connection] |
| 20:31:47 | | Discant quits [Ping timeout: 245 seconds] |
| 20:33:09 | | gurrs joins |
| 20:33:30 | <gurrs> | is anybody still here? |
| 20:34:16 | | gurrs leaves |
| 20:35:14 | <TheTechRobo> | Ryz: The links are in the page source, I don't know what you're talking about there. But everything links to a 404 for me anyway. |
| 20:35:39 | <TheTechRobo> | gurrs: If you're reading logs, don't ask to ask, just ask |
| 20:41:48 | <Ryz> | That's the problem TheTechRobo; it opens up more folders and files within the same page, but opening it on its own just gives 404s |
| 20:42:09 | | jacobk quits [Ping timeout: 265 seconds] |
| 20:42:11 | <thuban> | Ryz: in cases like this, you basically have three options: |
| 20:42:24 | <TheTechRobo> | Oh, you click the folder icon, not the name. |
| 20:42:28 | <TheTechRobo> | Kind of counter-intuitive imo |
| 20:42:32 | <thuban> | (1) reverse-engineering the ajax to write a scraper, |
| 20:43:41 | <thuban> | (2) scripting a browser automation tool (like selenium) to do the necessary interaction, then saving and processing the page state, |
| 20:44:38 | <thuban> | or (3) doing the necessary interaction manually, then saving and processing the page state (i believe most browsers have a menu item for this but you can save the html from the page inspector if all else fails). |
| 20:45:02 | <thuban> | which one is the most work will depend on your skills and the details of the particular site |
| 20:45:15 | <thuban> | but i'm running (1) for you right now, stay tuned... |
| 20:45:47 | <TheTechRobo> | thuban: oh, i was going to create a scraper! |
| 20:46:00 | <TheTechRobo> | do you want me to send you some findings? |
| 20:47:22 | <@arkiver> | do we know if the AB job will be enough for Unity Answers? |
| 20:47:33 | <thuban> | TheTechRobo: seems redundant as mine's already done, but maybe as a double-check? |
| 20:47:35 | | LeGoupil quits [Client Quit] |
| 20:47:38 | | pie_ joins |
| 20:47:47 | <TheTechRobo> | thuban: yours is already done? wow |
| 20:48:12 | <TheTechRobo> | Nevermind then. I thought you'd just started. |
| 20:54:44 | <thuban> | Ryz: https://transfer.archivete.am/Lfrvh/download.txt ("exams" coming up next) |
| 20:59:56 | | TheTechRobo quits [Remote host closed the connection] |
| 21:01:34 | <thuban> | Ryz: https://transfer.archivete.am/h6vo1/exams.txt |
| 21:02:19 | | TheTechRobo (TheTechRobo) joins |
| 21:02:54 | | TheTechRobo quits [Remote host closed the connection] |
| 21:03:39 | <thuban> | these are _just_ the files; the ajax calls are all post and wouldn't play back in the wbm anyway. |
| 21:05:04 | | TheTechRobo (TheTechRobo) joins |
| 21:17:43 | | Trent joins |
| 21:39:45 | | TheTechRobo quits [Remote host closed the connection] |
| 21:40:08 | | TheTechRobo (TheTechRobo) joins |
| 21:43:21 | | spirit quits [Quit: Leaving] |
| 21:50:27 | <Ryz> | Hmm s: |
| 22:00:15 | | Trent quits [Remote host closed the connection] |
| 22:00:28 | | Sluggs quits [Ping timeout: 276 seconds] |
| 22:01:30 | | Sluggs joins |
| 22:01:47 | | bonga quits [Ping timeout: 245 seconds] |
| 22:02:06 | | bonga joins |
| 22:19:18 | | lukash7 quits [Remote host closed the connection] |
| 22:38:12 | | lukash7 joins |
| 22:47:14 | | lukash7 quits [Client Quit] |
| 23:02:20 | | lukash7 joins |
| 23:13:52 | | michaelblob quits [Ping timeout: 245 seconds] |
| 23:25:24 | | lukash7 quits [Client Quit] |
| 23:43:11 | | michaelblob (michaelblob) joins |
| 23:55:04 | | lukash7 joins |
| 23:57:54 | | lukash7 quits [Client Quit] |
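
Option (3) from thuban's list in the log above (do the interaction manually, save the page state, then process it) might look roughly like the sketch below. It is not the scraper thuban actually ran, and nothing here is specific to futuremanagers.com: the names `LinkCollector`, `extract_links`, and `write_url_list`, the `example.org` base URL, and the file names are all hypothetical. It assumes the expanded page was saved as HTML and that the download links are ordinary `<a href>` elements; it uses only the Python standard library and writes one URL per line, the kind of text file Ryz asked for.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags in saved HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative hrefs
        self.links = []           # unique URLs in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip missing hrefs and same-page fragment links.
            if name != "href" or not value or value.startswith("#"):
                continue
            url = urljoin(self.base_url, value)
            # Drop javascript:, mailto:, and similar non-web schemes.
            if url.startswith(("http://", "https://")) and url not in self.links:
                self.links.append(url)


def extract_links(html, base_url):
    """Return the unique absolute links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def write_url_list(links, path):
    """Write one URL per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as f:
        for url in links:
            f.write(url + "\n")


if __name__ == "__main__":
    # Hypothetical input: a page saved from the browser after expanding every folder.
    with open("saved_page.html", encoding="utf-8") as f:
        saved_html = f.read()
    write_url_list(extract_links(saved_html, "https://example.org/download/"), "urls.txt")
```

The base URL matters because saved pages often contain relative links; pass the address the page was originally loaded from. If the site builds its links from data attributes or script calls rather than `href`, this sketch will miss them and the parser would need to be adapted to that markup.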