| 00:29:20 | <Ryz> | Heya folks, I need help determining whether http://schreibfabrik.de/spielenacht/ is found from http://schreibfabrik.de/ - if not, have to do two archives of it on AB instead of one; I tried to find even a single link from searching those pages a bit harder, but nope |
| 00:29:34 | <Ryz> | I can maybe archive it as http://www.schreibfabrik.de/spielenacht but I have no idea if it'll find a link to http://www.schreibfabrik.de/ then |
| 00:30:32 | <Ryz> | ...Oh, oooooh, it would have to be 3, because archiving http://schreibfabrik.de/spielenacht/ would not get into the photos and videos sections in http://schreibfabrik.de/spielenacht/rueckblick.php - since the URLs are under something like http://www.schreibfabrik.de/img/spielenacht2019/ (and I can't archive it as http://www.schreibfabrik.de/img/ .. |
| 00:30:32 | <Ryz> | .uuuuuugh) |
| 00:31:10 | | enterprisey joins |
| 00:33:35 | <enterprisey> | when will the Yahoo Answers project start, or is there a way I can be notified when it does? |
| 00:35:16 | <thuban> | enterprisey: watch this channel (or if you're using a warrior, just set it to "archiveteam's choice"; it will be switched over automatically) |
| 00:35:45 | <enterprisey> | sounds good, thanks! |
| 00:37:39 | <enterprisey> | I looked in the FAQs for anything on the legal implications, if any, of running the warrior, but couldn't find any |
| 00:39:51 | <@JAA> | Ryz: It's linked on http://schreibfabrik.de/leipzig.php |
| 00:39:59 | <thuban> | ^ whoops, forgot which channel i was in. you should watch #noanswers for the yahoo answers project |
| 00:41:27 | <Ryz> | Thank you JAA~ Meanwhile I'm building up a small list of websites associated with http://schreibfabrik.de/ or linked from http://schreibfabrik.de/ that I'll be archiving~ |
| 00:54:02 | | enterprisey quits [Remote host closed the connection] |
| 01:00:48 | <Ryz> | Oh goodness, sometimes going through the backlog of links to archive is basically hard mode on going through potlinks x_x; |
| 01:03:18 | | dm4v quits [Ping timeout: 250 seconds] |
| 01:05:57 | | dm4v joins |
| 01:05:59 | | dm4v is now authenticated as dm4v |
| 01:05:59 | | dm4v quits [Changing host] |
| 01:05:59 | | dm4v (dm4v) joins |
| 01:10:46 | | qw3rty__ quits [Read error: Connection reset by peer] |
| 01:24:48 | | lunik1 quits [Read error: Connection reset by peer] |
| 01:25:08 | | lunik1 joins |
| 01:25:14 | | lunik1 quits [Client Quit] |
| 01:28:10 | | lunik1 joins |
| 01:33:06 | | nico_32_ quits [Ping timeout: 244 seconds] |
| 01:33:14 | | nico_32 joins |
| 01:42:49 | | Mineroboter joins |
| 01:45:12 | | Mineroboter_ quits [Ping timeout: 258 seconds] |
| 01:52:10 | <@OrIdow6> | tech234a: Looks like it's been changed to "Archiveteam" now |
| 01:52:40 | <tech234a> | Yeah |
| 02:41:08 | | eyo quits [Quit: WeeChat 2.9] |
| 02:41:38 | | eyo (eyo) joins |
| 02:43:04 | | Ryz quits [Remote host closed the connection] |
| 02:43:53 | | Ryz (Ryz) joins |
| 03:42:56 | | sec^nd quits [Remote host closed the connection] |
| 03:43:20 | | sec^nd (second) joins |
| 03:50:04 | | etnguyen03 quits [Client Quit] |
| 03:55:17 | <Ryz> | Need help whether or not https://www.lysator.liu.se/~celeborn/sync/ can be found from https://www.lysator.liu.se/~celeborn/ - there's 5 pages initially but I can't seem to find it at all; I see the phrase 'sync' a bunch but never the link x_x; |
| 04:00:33 | <Ryz> | I may need a better way to find those links rather than having to do it manually, and ideally without asking other people for help if possible <_>; |
| 04:04:42 | <thuban> | Ryz: is there some reason you can't just run the parent site, then check the archivebot logs and run the child site if it didn't get hit? ab doesn't ascend to parent dirs, so there's no risk of duplication |
| 04:07:12 | <Ryz> | If it's a small website, maybe, but if running a much larger website, that sounds like a waste of archiving resources |
| 04:08:33 | <thuban> | perhaps i've misunderstood; what exactly is the goal here? |
| 04:10:11 | <Ryz> | Okay, I wanna archive https://www.lysator.liu.se/~celeborn/ - but I'm not sure if https://www.lysator.liu.se/~celeborn/sync/ can be found while doing AB; if it's not possible, I would have to archive both of those separately S: |
| 04:11:33 | <Ryz> | I'm not sure if trying to find it through search engines would be sufficient or maybe I'm doing that wrong |
| 04:12:35 | <thuban> | so you want to archive both of them anyway? i don't see how what i've suggested would be a waste then |
| 04:14:12 | <Ryz> | It's more about maintaining a good habit; like would I do something like that if there's a section that's 1 TiB worth of content? If say I archive both links, that would just be a 1 TiB waste of duplicate data |
| 04:14:52 | <thuban> | which is why you would check the logs from the parent site, to see whether you got the child site already |
| 04:16:45 | <Ryz> | I would consider that too much of a delay, especially for my tastes~ |
| 04:18:27 | <@OrIdow6> | In any case you're basically going to have the same process |
| 04:18:48 | <@JAA> | That is the proper way, though it won't work if there are links from that path to other sections of the site that aren't linked elsewhere. All hope is lost in that case without some AB dev work. |
| 04:19:00 | <@OrIdow6> | There's no general way to discover whether the child is linked other than by spidering the site |
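A minimal sketch of the per-page check that spidering repeats for every page, assuming the page HTML has already been fetched. `LinkCollector`, `links_under`, and the sample HTML are hypothetical names and data for illustration; this is not how ArchiveBot itself decides what to follow. Note that the comparison is a plain prefix match, so `http://schreibfabrik.de/` and `http://www.schreibfabrik.de/` count as different sites, as in the earlier example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from a single HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def links_under(html, page_url, child_prefix):
    """Return absolute URLs found in `html` that start with `child_prefix`."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for raw in collector.links:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, raw))
        if absolute.startswith(child_prefix):
            found.append(absolute)
    return found


# Hypothetical page content, only for demonstration.
sample_html = '<a href="sync/">sync</a> <a href="/other/">elsewhere</a>'
parent = "https://www.lysator.liu.se/~celeborn/"
child = "https://www.lysator.liu.se/~celeborn/sync/"
print(links_under(sample_html, parent, child))
```

An empty result for one page says nothing about the rest of the site, which is why the check only becomes conclusive once every reachable page has been crawled.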
| 04:43:26 | <purplebot> | ArchiveTeam Warrior edited by Tech234a (+51, Share Docker socket to Warrior …) just now -- https://www.archiveteam.org/?diff=46511&oldid=46510 |
| 04:44:28 | <@JAA> | ^ That sounds like a horrible idea, but I don't know enough about Docker to be sure. |
| 05:04:51 | | Jonboy345 joins |
| 05:07:13 | | jonboy3452 quits [Ping timeout: 258 seconds] |
| 05:09:12 | | Gaelan quits [Quit: ZNC 1.8.2 - https://znc.in] |
| 05:09:44 | | Gaelan (Gaelan) joins |
| 05:10:30 | | fuzzy802 joins |
| 05:10:30 | | fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))] |
| 05:10:32 | | fuzzy802 is now known as fuzzy8021 |
| 05:10:32 | | fuzzy8021 is now authenticated as fuzzy8021 |
| 05:10:32 | | fuzzy8021 quits [Changing host] |
| 05:10:32 | | fuzzy8021 (fuzzy8021) joins |
| 05:10:34 | | snowpanda joins |
| 05:11:07 | <snowpanda> | Hi, I'm curious about why archive.org allows the archiveteam to upload WARCs to archive.org - from what I understand the archiveteam uses software that runs on a user's machine to scrape |
| 05:11:18 | <snowpanda> | Doesn't this mean that a user could potentially manipulate/edit the page captures? |
| 05:12:23 | <snowpanda> | Ie, couldn't a malicious member of the archiveteam manipulate pages before they are uploaded to the Wayback Machine? Ie, this could result in the Wayback Machine displaying a manipulated/edited page for a given URL |
| 05:34:37 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 05:38:34 | | sec^nd quits [Ping timeout: 255 seconds] |
| 05:38:57 | | Ajay1 joins |
| 05:39:48 | | Ajay quits [Ping timeout: 258 seconds] |
| 05:49:34 | | fuzzy802 joins |
| 05:49:34 | | fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))] |
| 05:49:38 | | fuzzy802 is now known as fuzzy8021 |
| 05:49:42 | | fuzzy8021 is now authenticated as fuzzy8021 |
| 05:49:42 | | fuzzy8021 quits [Changing host] |
| 05:49:42 | | fuzzy8021 (fuzzy8021) joins |
| 05:49:43 | | fuzzy8021 quits [Excess Flood] |
| 05:50:03 | | fuzzy8021 joins |
| 05:50:03 | | fuzzy8021 is now authenticated as fuzzy8021 |
| 05:50:03 | | fuzzy8021 quits [Changing host] |
| 05:50:03 | | fuzzy8021 (fuzzy8021) joins |
| 05:52:11 | | rynomad joins |
| 05:52:36 | <snowpanda> | Anyone have any idea about my questions? |
| 05:59:26 | <purplebot> | Google Poly edited by CosmicCoyote (+43) just now -- https://www.archiveteam.org/?diff=46512&oldid=46041 |
| 05:59:26 | <purplebot> | Reddit edited by S-crypt (+33, DoV banned) just now -- https://www.archiveteam.org/?diff=46513&oldid=45215 |
| 06:02:51 | | snowpanda quits [Remote host closed the connection] |
| 06:34:26 | | rynomad quits [Remote host closed the connection] |
| 06:46:08 | <tech234a> | JAA: fair enough, I wanted to get it in the instructions so it could be used in the future. Watchtower also works by bindmounting the Docker socket, which gave me the idea. The Docker container would only be stopped using this method on the Warrior when the shut down button is pressed in the web UI and this socket has been shared with the container. (Without this, the container would immediately be restarted when it shuts down.) |
| 07:09:45 | <tech234a> | that said, if someone thinks the socket mount should be removed, feel free to do so |
| 07:30:34 | | hooway joins |
| 07:38:00 | | Arcorann (Arcorann) joins |
| 07:42:26 | <purplebot> | ArchiveTeam Warrior edited by Tech234a (-51, Reverting) just now -- https://www.archiveteam.org/?diff=46514&oldid=46511 |
| 07:42:30 | <tech234a> | I decided to take it out for now |
| 08:03:40 | <mgrandi> | @snowpanda yeah, that is a risk yes |
| 09:00:25 | <purplebot> | URLTeam edited by Aarchi (+0, Fix title capitalization) just now -- https://www.archiveteam.org/?diff=46515&oldid=46480 |
| 09:56:28 | | ThreeHea1 (ThreeHeadedMonkey) joins |
| 09:56:44 | | ThreeHeadedMonkey quits [Ping timeout: 250 seconds] |
| 10:43:21 | | AlsoHP_Archivist quits [Read error: Connection reset by peer] |
| 10:44:06 | | AlsoHP_Archivist joins |
| 10:48:41 | | ThreeHea1 is now known as ThreeHeadedMonkey |
| 11:09:39 | | LeGoupil joins |
| 11:12:28 | | Arcorann_ joins |
| 11:15:36 | | Arcorann quits [Ping timeout: 250 seconds] |
| 11:51:10 | | Zopolis4 (Zopolis4) joins |
| 11:51:43 | <Zopolis4> | is it possible to incorporate a direct dump of a server into the wbm? |
| 11:56:37 | | Zopolis4 quits [Remote host closed the connection] |
| 11:58:53 | | Zopolis4 (Zopolis4) joins |
| 12:02:12 | | murmur quits [Read error: Connection reset by peer] |
| 12:07:13 | | murmur joins |
| 12:33:10 | | katocala quits [Ping timeout: 250 seconds] |
| 12:33:27 | <@OrIdow6> | Zopolis4: Not that I know of, though I don't see why you can't run a crawler on the same machine as (or close on the network to) the site |
| 12:35:43 | | yawkat quits [Ping timeout: 258 seconds] |
| 12:40:50 | <Zopolis4> | because the site is dead |
| 12:40:56 | <Zopolis4> | but we do have a server dump |
| 12:41:01 | <Zopolis4> | well once it gets released |
| 12:43:56 | <@OrIdow6> | I think that once ArchiveTeam took such a dump, restored the site, and then crawled it, at some point before I got here |
| 12:50:48 | | Zopolis4 quits [Remote host closed the connection] |
| 12:56:43 | | Zopolis4 (Zopolis4) joins |
| 12:56:47 | | Zopolis4 quits [Remote host closed the connection] |
| 12:59:55 | | katocala joins |
| 13:01:47 | | Zopolis4 (Zopolis4) joins |
| 13:01:51 | <Zopolis4> | seems a bit of a blunt solution but it'll work |
| 13:02:01 | | etnguyen03 (etnguyen03) joins |
| 13:02:23 | | katocala is now authenticated as katocala |
| 13:17:56 | | yawkat (yawkat) joins |
| 13:30:45 | | brgtt2 joins |
| 13:45:37 | | spirit joins |
| 14:07:43 | | katocala quits [Ping timeout: 258 seconds] |
| 14:07:59 | | katocala joins |
| 14:08:06 | | katocala is now authenticated as katocala |
| 14:14:54 | | superkuh__ joins |
| 14:14:57 | | superkuh_ quits [Read error: Connection reset by peer] |
| 14:15:30 | | atphoenix_ (atphoenix) joins |
| 14:17:41 | | atphoenix quits [Ping timeout: 258 seconds] |
| 14:25:29 | | Viniter (Viniter) joins |
| 14:31:56 | | Viniter_ joins |
| 14:33:47 | | Viniter_ quits [Client Quit] |
| 14:35:22 | | Viniter quits [Ping timeout: 250 seconds] |
| 14:37:19 | | Viniter (Viniter) joins |
| 14:47:25 | <purplebot> | File:Yahooanswers logo.png overwritten by Arkiver (+0) just now -- https://www.archiveteam.org/?diff=46516&oldid=0 |
| 15:13:16 | | katocala quits [Ping timeout: 258 seconds] |
| 15:14:01 | | katocala joins |
| 15:18:05 | | s-crypt quits [Remote host closed the connection] |
| 15:18:05 | | kiska quits [Remote host closed the connection] |
| 15:18:05 | | flashfire42 quits [Remote host closed the connection] |
| 15:21:41 | | Mateon1 quits [Remote host closed the connection] |
| 15:22:00 | | Mateon1 joins |
| 15:22:21 | | brgtt2 quits [Read error: Connection reset by peer] |
| 15:22:43 | | brgtt2 joins |
| 15:23:38 | | brgtt2 quits [Client Quit] |
| 16:04:50 | | DogsRNice (Webuser299) joins |
| 16:06:03 | | Arcorann__ joins |
| 16:09:37 | | Arcorann_ quits [Ping timeout: 258 seconds] |
| 16:15:22 | | Arcorann__ quits [Ping timeout: 258 seconds] |
| 16:30:05 | | Zopolis4 quits [Ping timeout: 244 seconds] |
| 16:41:34 | | sec^nd (second) joins |
| 16:53:03 | | sec^nd quits [Remote host closed the connection] |
| 16:54:45 | | sec^nd (second) joins |
| 17:06:08 | | sec^nd quits [Remote host closed the connection] |
| 17:07:25 | | sec^nd (second) joins |
| 17:12:49 | | sec^nd quits [Remote host closed the connection] |
| 17:14:07 | | sec^nd (second) joins |
| 17:36:54 | | snowpanda joins |
| 17:37:23 | <snowpanda> | Hi, anyone have an answer for my question for why archive.org allows archive-team to upload WARC page captures to the Wayback Machine? |
| 17:37:50 | <@EggplantN> | They allow anyone afaik? |
| 17:37:57 | <snowpanda> | Doesn't this introduce the risk that a malicious archiveteam member could edit a page capture and then upload it, thus creating edited/incorrect page history? |
| 17:38:10 | <@EggplantN> | Yes it does and if that happens we can take action |
| 17:38:25 | <snowpanda> | I don't think they allow anyone, otherwise the Wayback Machine is not reliable as an archive... |
| 17:38:37 | <snowpanda> | But how would you detect if it happens? |
| 17:39:04 | <snowpanda> | If they allowed anyone to upload then there could be tons of manipulated / edited pages in the archive |
| 17:39:40 | <AK> | Theoretically there could be yes, but there is a level of error checking and trust |
| 17:39:50 | <@EggplantN> | We trust each other. There are only 3 people who handle the upload process for warrior projects. I trust the other 2 and the other 2 trust me |
| 17:39:55 | <@EggplantN> | I have better things to do with my time |
| 17:39:58 | <@EggplantN> | As do both of them |
| 17:40:23 | <snowpanda> | I see, I thought anyone could download the archiveteam software and run uploads? |
| 17:40:34 | <Sanqui> | In general there is no way to protect against a malicious actor anyway |
| 17:40:41 | <@EggplantN> | To our targets yes. |
| 17:41:02 | <@EggplantN> | Nobody uploads directly to the IA on warrior based projects |
| 17:41:14 | <snowpanda> | Ah I see, that makes sense then |
| 17:41:51 | <snowpanda> | Sanqui not sure what you mean by that. As far as I'm aware, the Wayback Machine does not put user uploaded warcs from random people into the archive |
| 17:41:54 | <jodizzle> | For clarity: anyone can upload WARCs to IA, but only WARCs from whitelisted accounts make it into the WBM |
| 17:42:11 | <snowpanda> | Because of the possible lack of authenticity concern |
| 17:42:25 | <snowpanda> | jodizzle: Okay, that makes sense |
| 17:43:32 | <@JAA> | There was a bug with that whitelisting a few years ago which led to all uploaded WARCs being included in the WBM. And guess what, there was manipulated stuff in there, and that's how the bug was discovered. |
| 17:44:26 | <snowpanda> | Interesting, good to know there are safeguards. The archive wouldn't be quite as useful if I couldn't have a good amount of trust in the contents |
| 17:45:49 | <jodizzle> | There was a bit of discussion a while back about having the Warrior process do some more evaluation of the content sent to the targets. There was a separate, non-AT project for saving youtube annotations that had a "trust" system based on saving annotations multiple times and people sending similar-looking annotations. |
| 17:46:58 | <jodizzle> | As a method for building trust, I mean. If people sent annotations that matched what other people were sending, their worker built trust. |
| 17:47:23 | <snowpanda> | Yeah, that sounds like a useful check |
| 17:47:24 | <jodizzle> | In principle something like that could be done for the Warriors too, but it would be work |
| 17:48:05 | <snowpanda> | Of course I wouldn't publicize whatever security mechanism you choose to use :) makes it easier for people to get around it |
| 17:49:15 | <jodizzle> | JAA: Is there any insight on how content was manipulated? I'm curious. |
| 17:53:24 | | kiskaWeebChat quits [Ping timeout: 250 seconds] |
| 17:54:02 | <@JAA> | jodizzle: I have no idea. The above is essentially all I know. 'We found some manipulated content, which shouldn't have entered the WBM, and now it's fixed' or similar. |
| 18:18:27 | | dm4v_ joins |
| 18:19:11 | | dm4v quits [Ping timeout: 258 seconds] |
| 18:19:11 | | dm4v_ is now known as dm4v |
| 18:19:11 | | dm4v is now authenticated as dm4v |
| 18:19:11 | | dm4v quits [Changing host] |
| 18:19:11 | | dm4v (dm4v) joins |
| 18:23:08 | <tech234a> | Perhaps another option would be to somehow include HTTPS verification data in the WARCs (probably not in the current standard though) |
| 18:23:54 | <Sanqui> | tech234a: there is no such verification data possible with https |
| 18:23:54 | <@JAA> | That's impossible. |
| 18:24:36 | <tech234a> | Got it. Would have assumed somehow the certificates would sign the page... |
| 18:24:50 | <AK> | The biggest downside to the method of having multiple people request the same thing is the performance and size. That turns the 25TB I've done of urls into 50 or 75 if we do 2 or 3 downloads to check |
| 18:24:58 | <AK> | On some projects there's barely time to get everything once |
| 18:25:02 | <AK> | Let alone 2/3 times |
| 18:25:31 | <snowpanda> | AK: I see the Wayback Machine does display the source of the capture though, so I can tell if something is from a Wayback crawl or from another collection |
| 18:25:34 | <Sanqui> | tech234a: nope. HTTPS, or rather TLS, does not provide non-repudiation by design |
| 18:26:07 | <AK> | snowpanda: yep it should be clear if it was us or someone else |
| 18:26:10 | <Sanqui> | (I've looked into the same possibility) |
| 18:27:10 | <tech234a> | Interesting |
| 18:28:17 | <@OrIdow6> | (From last time this came up) Part of the problem with comparing different warrior results is that pages change |
| 18:28:39 | <AK> | Or get served from different places |
| 18:28:46 | <AK> | (Anycast dns or a cdn) |
| 18:28:52 | <@OrIdow6> | Random tokens generated server-side, some sort of multithreaded page generation that rearranges the order of things, A/B testing |
| 18:29:26 | <@OrIdow6> | So you would need to do a large amount of work to define what counts as equivalent |
| 18:29:56 | <snowpanda> | Hmm yeah, I guess for being confident about reliability the best way is maybe still to just check the source of the capture |
| 18:30:32 | <@OrIdow6> | (Not an exact quote of what I wrote last time, but I think that was basically it) |
| 18:30:38 | <snowpanda> | I'm at least pretty confident that page captures in the Wayback Machine that were from Wayback crawls or the "save page now" feature don't have the risk of user manipulation |
| 18:31:11 | <snowpanda> | Or at least, shouldn't have the risk of user manipulation as long as the Wayback Machine's software systems don't have security holes :) |
| 18:32:40 | <AK> | Fairly certain they use an identifiable user agent |
| 18:32:46 | <AK> | So that's still not guaranteed |
| 18:33:26 | <snowpanda> | AK: Why would an identifiable user agent introduce risk of user manipulation? |
| 18:33:40 | <snowpanda> | Oh I guess from the website owner themselves is what you're saying |
| 18:33:56 | <AK> | Yeah from the website owner |
| 18:34:08 | <snowpanda> | Ie, the website owner could serve different content based on the user agent. I was talking about user manipulation from a third party with no control over the website |
| 18:37:07 | <AK> | But the way AT works means there is an element of trust |
| 18:37:10 | <AK> | And sometimes things get weird |
| 18:37:31 | <AK> | We archive the location that the dns gives us at the time |
| 18:37:50 | <AK> | If dns returns 0.0.0.0 because the new domain owner is weird, we will archive whatever website your local ip returns |
| 18:37:51 | <AK> | https://web.archive.org/web/20210214121720/https://rapidshare.com/ |
| 18:37:58 | <AK> | In this case, my "Oh hello" page from nginx |
| 18:38:35 | <@JAA> | Connecting to 0.0.0.0 should fail though. You mean 127.0.0.0/8? |
| 18:38:47 | <AK> | I thought it was returning 0.0.0.0 at the time, lemme check irc logs again |
| 18:39:19 | <AK> | rapidshare.com returns 0.0.0.0 |
| 18:39:29 | <@JAA> | Mhm |
| 18:39:34 | <@JAA> | 0.0.0.0 is non-routable though. |
| 18:39:36 | <AK> | Which for whatever reason meant the pipeline requested against the nginx on the host |
| 18:40:02 | <@JAA> | That sounds like something's very broken then. |
| 18:40:39 | <@OrIdow6> | Looks like that is how wget acts |
| 18:41:07 | <AK> | I think on Ubuntu 0.0.0.0 can refer to default route |
| 18:41:16 | <AK> | Curl returns my website if I "curl rapidshare.com" |
| 18:41:20 | <@OrIdow6> | Oh, could be the OS too, I'm on Debian |
| 18:41:29 | <snowpanda> | AK: Hmm but for Wayback Machine's crawls and "save page now" features, it should be secure from third parties who don't have control over the website right? |
| 18:41:35 | <@OrIdow6> | Yeah, looks like it |
| 18:42:14 | <AK> | snowpanda: Yep it should be, I tend to trust wayback machine and check a couple of crawls before+after to confirm whether something changed or it stayed the same |
| 18:42:48 | <@JAA> | tech234a: TLS establishes a shared symmetric key between client and server. There is no asymmetric signature or similar, so the client can manipulate it freely. Or rather, the client can't possibly prove to someone else that the data wasn't modified. |
| 18:43:13 | <@JAA> | (This is true even for AES-GCM cipher suites etc. They don't authenticate the contents.) |
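A simplified illustration of the point about symmetric keys, assuming an HMAC as a stand-in for whatever integrity protection a session uses. This is not the TLS record layer; `shared_key`, `make_tag`, and `verify` are hypothetical names. It only shows that anyone holding the shared key can produce a valid tag for arbitrary content, so a tag cannot prove to a third party which side wrote the data.

```python
import hashlib
import hmac
import os

# Stands in for a session key that both client and server hold (assumption).
shared_key = os.urandom(32)


def make_tag(key, message):
    """Compute an HMAC-SHA256 tag over `message` with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Check a tag in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# What the server actually sent.
server_body = b"<h1>original page</h1>"
server_tag = make_tag(shared_key, server_body)

# The client holds the same key, so it can tag edited content just as well.
forged_body = b"<h1>edited page</h1>"
forged_tag = make_tag(shared_key, forged_body)

print(verify(shared_key, server_body, server_tag))  # True
print(verify(shared_key, forged_body, forged_tag))  # True: indistinguishable to a third party
print(verify(shared_key, forged_body, server_tag))  # False: only catches tampering by outsiders
```

Non-repudiation would need the server to sign the content with a private key that the client never sees, which is the piece this setup lacks.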
| 18:44:14 | <@JAA> | This is another fun one due to 'search example.org' lines in resolv.conf: https://web.archive.org/web/*/http://www/ |
| 18:46:05 | <yano> | heh, fun |
| 18:47:34 | <@OrIdow6> | You're connected to Internet, please refresh your page. |
| 18:50:04 | <@OrIdow6> | (From someone running #//) |
| 18:51:19 | | snowpanda quits [Remote host closed the connection] |
| 18:53:02 | <AK> | For about 5 seconds I did contemplate changing my "Oh Hello" to a joke for people looking back later. But I decided that wasn't helpful |
| 18:53:32 | <@OrIdow6> | Realistically, I think maybe something should be added to prevent this in projects |
| 18:53:53 | <@OrIdow6> | Not sure how, though, there are several things going on that cause this |
| 18:54:20 | <tech234a> | Start providing our own DNS server? |
| 18:55:02 | | Daloader_ joins |
| 18:55:16 | <AK> | You'll get ddos'd I think when we spin up large |
| 18:55:33 | <tech234a> | True |
| 18:55:37 | <AK> | I think the best option is gonna be having the workers/warriors check what the domain resolves to |
| 18:55:44 | <@OrIdow6> | Though I suppose it doesn't do much damage as long as it's confined to "weird" URLs |
| 18:55:58 | <AK> | And cancelling if it resolves to a local domain, or to something else |
| 18:56:05 | <@OrIdow6> | (I've seen this in the "real" WBM crawls, too) |
| 18:56:26 | <@JAA> | tech234a: Own DNS server doesn't really fix that problem I mentioned. We'd need to implement the lookups directly in wget or whatever. |
| 18:57:02 | <tech234a> | Hmm |
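A rough sketch of the check AK describes (resolve the hostname first and cancel if it points at a local address), using only the Python standard library. The function names are hypothetical and this is not existing Warrior or pipeline code. It relies on the system resolver through `socket.getaddrinfo`, so it does not address the `search` line case JAA mentions; that would still need lookups done directly in the crawler.

```python
import ipaddress
import socket


def is_suspicious_address(address):
    """True for addresses a public website should never resolve to."""
    # Strip an IPv6 zone index such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return (
        ip.is_unspecified  # 0.0.0.0 or ::
        or ip.is_loopback  # 127.0.0.0/8 or ::1
        or ip.is_private   # RFC 1918 and similar ranges
        or ip.is_link_local
    )


def resolved_addresses(hostname):
    """Resolve `hostname` with the system resolver; may raise socket.gaierror."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def safe_to_crawl(hostname, resolver=resolved_addresses):
    """Refuse hosts that resolve to nothing or to any local address."""
    addresses = resolver(hostname)
    return bool(addresses) and not any(is_suspicious_address(a) for a in addresses)


# The rapidshare.com case from the log, simulated with a fake resolver.
print(safe_to_crawl("rapidshare.com", resolver=lambda host: ["0.0.0.0"]))
```

Passing the resolver as a parameter keeps the address check testable without network access; a real worker would use the default and treat a `socket.gaierror` as a failed item.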
| 19:01:28 | | snowpanda joins |
| 19:04:35 | | snowpanda quits [Remote host closed the connection] |
| 19:58:23 | | LeGoupil quits [Client Quit] |
| 20:00:41 | | katocala is now authenticated as katocala |
| 20:15:31 | | AlsoIDK quits [Remote host closed the connection] |
| 20:16:24 | | thuban quits [Ping timeout: 250 seconds] |
| 20:18:19 | | thuban joins |
| 20:22:54 | | Daloader_ quits [Ping timeout: 250 seconds] |
| 20:27:24 | | jtagcat quits [Quit: Bye!] |
| 20:47:26 | <purplebot> | ArchiveTeam Warrior edited by Tech234a (-12, Change restart policy for Warrior …) just now -- https://www.archiveteam.org/?diff=46517&oldid=46514 |
| 20:47:55 | | rsn quits [Ping timeout: 258 seconds] |
| 20:48:09 | <tech234a> | ^ that solves the immediate reboot problem when shutting down using the web interface |
| 20:48:25 | <tech234a> | thanks Fu sl |
| 20:50:54 | <thuban> | does shutdown of the host count as 'failure'? |
| 20:52:22 | <tech234a> | Yes, so it will restart the container on restart (I tested by using the restart Docker option on Windows) |
| 20:52:38 | <thuban> | ah, cool |
| 20:53:12 | | jtagcat (jtagcat) joins |
| 20:53:22 | | rsn joins |
| 21:07:20 | | marked quits [Remote host closed the connection] |
| 21:07:47 | | marked joins |
| 21:08:26 | | DogsRNice_ (Webuser299) joins |
| 21:09:03 | | s-crypt (s-crypt) joins |
| 21:09:12 | | flashfire42 (flashfire42) joins |
| 21:09:23 | | DogsRNice quits [Ping timeout: 258 seconds] |
| 21:09:47 | | kiska (kiska) joins |
| 21:20:23 | <SketchTheCow> | Hey Jason, |
| 21:20:23 | <SketchTheCow> | My company is working on setting up a kind of "crowdfunded X-Prize" product, and would like to use it to drum up support and incentive for archivists working on saving Yahoo Answers data before it gets blinkered out. In researching the situation, I found archiveteam, and realized that y'all have a substantial headstart in winning any eventual prize-pool. I think there may be an opportunity to |
| 21:20:29 | <SketchTheCow> | work together such that we can maximize the net percentage of archiving done by the deadline. |
| 21:20:32 | <SketchTheCow> | Would you, or someone else running point on the Yahoo project, be open to a quick chat sometime this week? |
| 21:20:35 | <SketchTheCow> | Best, |
| 21:20:38 | <SketchTheCow> | -Ryan |
| 21:20:40 | <SketchTheCow> | .... I'll be declining |
| 21:22:15 | <@EggplantN> | Free money SketchTheCow :P |
| 21:22:16 | <@EggplantN> | /s |
| 21:23:02 | <Barto> | ask him to donate to archive.org instead. |
| 21:26:36 | | Viniter quits [Ping timeout: 250 seconds] |
| 21:43:04 | | rsn_ joins |
| 21:43:56 | | rsn quits [Ping timeout: 250 seconds] |
| 21:47:11 | | LeighR (LeighR) joins |
| 21:50:55 | | Eighty quits [Remote host closed the connection] |
| 22:01:20 | <SketchTheCow> | I literally did that. |
| 22:01:29 | <SketchTheCow> | Like, that was my actual response |
| 22:01:36 | <SketchTheCow> | So I am glad we're all in lockstep |
| 22:01:38 | | rsn joins |
| 22:02:50 | <Ajay1> | they sent the same message in the yahoo answers IRC channel a few days ago |
| 22:03:49 | | rsn_ quits [Ping timeout: 258 seconds] |
| 22:05:19 | | rsn_ joins |
| 22:07:45 | | rsn quits [Ping timeout: 250 seconds] |
| 22:09:23 | | rsn joins |
| 22:11:14 | | rsn_ quits [Ping timeout: 250 seconds] |
| 22:13:47 | | Wayward quits [Ping timeout: 258 seconds] |
| 22:17:18 | | AlsoHP_Archivist quits [Ping timeout: 250 seconds] |
| 22:18:05 | | AlsoHP_Archivist joins |
| 22:19:42 | <tech234a> | FYI It looks like you can save a blank revision to a redirect page to fix the caching problem (literally save it without changing anything, it won't log as a revision but it seems to clear the cache) |
| 22:19:46 | <tech234a> | on the wiki |
| 22:20:31 | <Ajay1> | when I tried that, it didn't allow it to be accepted |
| 22:21:25 | <tech234a> | Is your account manually moderated or automoderated? |
| 22:21:39 | <Ajay1> | manual |
| 22:21:52 | <@JAA> | Yeah, the mod tool doesn't like empty revisions. |
| 22:21:58 | <tech234a> | Yeah it probably worked for me because mine is automoderated |
| 22:22:27 | <@JAA> | Have you tried action=purge? |
| 22:22:40 | <@JAA> | That clears the cache directly. |
| 22:23:26 | <tech234a> | I wasn't aware that existed, good to know |
| 22:28:15 | | LeighR leaves |
| 22:28:26 | <purplebot> | Yahoo! Answers edited by Ajay (+59, Added new tracker and that archiving …) just now -- https://www.archiveteam.org/?diff=46518&oldid=46501 |
| 22:29:25 | <purplebot> | GeoCities edited by C-Nagy (+9, Fixed a few links) just now -- https://www.archiveteam.org/?diff=46519&oldid=45435 |
| 23:05:55 | | AlsoHP_Archivist quits [Ping timeout: 258 seconds] |
| 23:06:17 | | AlsoHP_Archivist joins |
| 23:07:26 | <purplebot> | FTP/List edited by Pokechu22 (+412, consistent whitespace before {{online}}; …) just now -- https://www.archiveteam.org/?diff=46521&oldid=46288 |
| 23:11:52 | | rsn_ joins |
| 23:14:44 | | rsn quits [Ping timeout: 258 seconds] |
| 23:23:26 | <purplebot> | Yahoo! Answers edited by Ajay (-59, Undo revision 46518 by [[Special:Contributions/Ajay|Ajay]] …) 17 minutes ago -- https://www.archiveteam.org/?diff=46520&oldid=46518 |
| 23:45:01 | | hooway quits [Client Quit] |