| 00:13:46 | | BigBrain quits [Ping timeout: 245 seconds] |
| 00:15:40 | | BigBrain (bigbrain) joins |
| 00:26:48 | <tech234a> | https://blog.google/technology/safety-security/updating-our-inactive-account-policies/ |
| 00:36:15 | <pabs> | JAA: the opensource.com job has completed, can you do the grey area cookie for downloads thing? |
| 00:38:39 | <pabs> | the cookie btw, so you don't need to search IRC logs: STYXKEY_Drupal_visitor_gatedemail=osdc-gated-content |
| 00:41:46 | | lennier1 quits [Ping timeout: 252 seconds] |
| 00:44:11 | | lennier1 (lennier1) joins |
| 00:51:53 | <h2ibot> | Jscott edited Current Projects (+0, /* Warrior-based projects */): https://wiki.archiveteam.org/?diff=49788&oldid=49736 |
| 01:06:05 | <fireonlive> | oops |
| 01:08:50 | <SketchCow> | !!!!!!!!!!!!!!!!!!! |
| 01:08:53 | <SketchCow> | I was cast to -bs! |
| 01:08:54 | <SketchCow> | Me! |
| 01:09:31 | <fireonlive> | nah, me |
| 01:09:35 | <fireonlive> | :p |
| 01:10:03 | <fireonlive> | luckily for y'all i should be gone in a few days! |
| 01:10:13 | <Terbium> | we are all confined into the prison known as "-bs" together :P |
| 01:10:25 | <@JAA> | pabs: Ack, setting the extraction up now, though the last data hasn't been uploaded yet it seems. |
| 01:12:21 | | test___ quits [Remote host closed the connection] |
| 01:19:17 | <pabs> | great, thanks! |
| 01:21:24 | | nostalgebraist joins |
| 01:59:13 | <wickedplayer494> | Ah fuck, I really did do 2013 instead of 2023 on that DPReview bullet |
| 01:59:15 | <wickedplayer494> | Good catch, Jason! |
| 02:06:56 | <hlgs|m> | huh. getting a "save page now browser crashed on [url]" error for several urls |
| 02:07:10 | <hlgs|m> | they're all links to files |
| 02:07:33 | | tbc1887 quits [Read error: Connection reset by peer] |
| 02:07:39 | <pabs> | AB !ao < them instead? |
| 02:08:31 | | AlsoTheTechRobo is now known as TheTechRobo |
| 02:08:36 | <@JAA> | #internetarchive for IA/WBM/SPN stuff. |
| 02:08:45 | <hlgs|m> | i could hmm. this was both with the spn scripts and manually |
| 02:08:48 | <hlgs|m> | oh thanks! |
| 02:09:05 | <hlgs|m> | i don't actually know what channels there are <.< |
| 02:10:16 | <@JAA> | Most are documented *somewhere* on our wiki, but yeah, there isn't a simple list. |
| 02:12:56 | <hlgs|m> | (oh never mind, the urls have been saved, the error was just throwing me for a loop) |
| 02:13:01 | <hlgs|m> | (and throwing the spn scripts into a loop) |
| 02:13:08 | <hlgs|m> | (ignore me) |
| 02:33:49 | | nostalgebraist quits [Client Quit] |
| 02:40:58 | | xkey quits [Client Quit] |
| 02:41:21 | | xkey (xkey) joins |
| 02:47:58 | | decky_e joins |
| 02:50:26 | | BigBrain quits [Ping timeout: 245 seconds] |
| 03:27:12 | | decky_e quits [Read error: Connection reset by peer] |
| 03:59:13 | <fireonlive> | thanks ja a |
| 03:59:46 | | jackt1365|m joins |
| 04:00:00 | | aGerman quits [Client Quit] |
| 04:03:08 | | aGerman (aGerman) joins |
| 04:08:22 | | decky_e joins |
| 04:49:58 | | Island quits [Read error: Connection reset by peer] |
| 04:51:24 | | BigBrain (bigbrain) joins |
| 05:04:24 | <@JAA> | pabs: I'm using the presence of a 'gated-form' as the indicator. And there is indeed content outside of /downloads/ and the book which is behind a wall, e.g. https://opensource.com/content/cheat-sheet-gimp |
| 05:05:18 | <pabs> | ah, thanks for checking, that approach sounds good |
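
A minimal sketch of the two pieces discussed above, using only the Python standard library: sending the gating cookie pabs quoted, and checking for the 'gated-form' marker JAA mentions. The helper names and the plain substring test are assumptions for illustration; the extraction JAA actually set up is not shown in this log.

```python
import urllib.request

# Cookie value quoted by pabs earlier in this log.
GATE_COOKIE = "STYXKEY_Drupal_visitor_gatedemail=osdc-gated-content"


def build_request(url):
    """Hypothetical helper: build a GET request that carries the gating cookie."""
    return urllib.request.Request(url, headers={"Cookie": GATE_COOKIE})


def is_gated(html):
    """Assumption: a page counts as gated if 'gated-form' appears anywhere in its HTML."""
    return "gated-form" in html


def fetch(url):
    """Fetch a page with the cookie set and return its decoded body."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Fetching a page once without the cookie and running `is_gated` on the result would flag candidates; `fetch` could then be used to retrieve them with the cookie set.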
| 05:12:26 | | dumbgoy quits [Ping timeout: 265 seconds] |
| 05:57:44 | <h2ibot> | Nemo bis edited Google (+510, /* Vital Signs */ YouTube, Google Docs and…): https://wiki.archiveteam.org/?diff=49789&oldid=46262 |
| 06:17:09 | | Justin[home] is now known as DopefishJustin |
| 06:33:09 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 06:33:44 | | Minkafighter joins |
| 07:17:20 | | that_lurker quits [Client Quit] |
| 07:18:29 | | that_lurker (that_lurker) joins |
| 07:19:56 | | BlueMaxima quits [Client Quit] |
| 07:28:11 | | Arcorann (Arcorann) joins |
| 07:32:52 | | birdjj quits [Client Quit] |
| 07:46:30 | | parfait (kdqep) joins |
| 07:46:59 | | birdjj joins |
| 07:55:19 | | LeGoupil joins |
| 07:57:18 | | birdjj1 joins |
| 07:59:44 | | birdjj quits [Ping timeout: 252 seconds] |
| 07:59:45 | | birdjj1 is now known as birdjj |
| 07:59:59 | | fullpwnmedia joins |
| 08:02:58 | | TastyWiener95 quits [Client Quit] |
| 08:04:37 | | TastyWiener95 (TastyWiener95) joins |
| 08:23:03 | | decky_e quits [Read error: Connection reset by peer] |
| 08:48:12 | | birdjj quits [Client Quit] |
| 08:48:33 | | birdjj joins |
| 09:02:27 | | birdjj quits [Read error: Connection reset by peer] |
| 09:02:31 | | birdjj1 joins |
| 09:05:14 | | birdjj1 quits [Read error: Connection reset by peer] |
| 09:05:16 | | birdjj joins |
| 09:32:58 | | birdjj quits [Read error: Connection reset by peer] |
| 09:33:03 | | Megame (Megame) joins |
| 09:33:15 | | birdjj joins |
| 09:50:49 | | sonick quits [Client Quit] |
| 10:07:31 | | vukky joins |
| 10:26:33 | | threedeeitguy quits [Quit: The Lounge - https://thelounge.chat] |
| 10:51:00 | | Megame quits [Client Quit] |
| 10:55:47 | | threedeeitguy joins |
| 11:36:26 | | threedeeitguy quits [Ping timeout: 252 seconds] |
| 11:54:20 | | vukky quits [Client Quit] |
| 11:54:25 | | vukky joins |
| 11:55:30 | | threedeeitguy joins |
| 12:00:05 | | vukky is now authenticated as vukky |
| 12:00:09 | | vukky quits [Client Quit] |
| 12:00:14 | | vukky (vukky) joins |
| 12:02:49 | | vukky quits [Changing host] |
| 12:02:49 | | vukky (vukky) joins |
| 12:10:21 | | nighthnh099_ joins |
| 12:13:35 | <nighthnh099_> | mentioned this in the archivebot channel, I found a gacha game that will shut down tomorrow, I did some https intercepting essentially to get the urls which I saved in #archivebot, the list is incomplete and also I'm not sure what to do about the apis the game uses |
| 12:14:16 | <nighthnh099_> | the game is Tales of Asteria (jp.co.bandainamcogames.NBGI0197) by the way |
| 12:14:41 | <nighthnh099_> | needs a japanese region google account to download, but I think it will work fine if installed with an apk |
| 12:25:46 | <pabs> | maybe upload the APK to an archive.org item? |
| 12:28:56 | <nighthnh099_> | I was going to do that but my problem would be the game apis and the rest of the game's data |
| 12:29:27 | <nighthnh099_> | for the rest of the data, maybe I should just upload the downloaded data since I'm having a hard time archiving the raw data from the server it downloads from |
| 12:29:53 | <nighthnh099_> | for the game apis, I have a pcap file with some responses |
| 12:44:38 | <pabs> | do the APIs require logins? if not, we could archive the URLs in that pcap |
| 12:44:49 | <pabs> | if they do, then upload the pcap I guess |
| 13:01:46 | | Raya joins |
| 13:03:48 | | threedeeitguy7 joins |
| 13:04:51 | <Raya> | Hello! Hope this is the right place for this. I'm having a hard time using the warrior today. Archiveteam's choice defaults to imgur, but their tasks all give "server returned bad response" and get stuck in loops. I can try installing other projects, but no other tasks will start, and I can't shut the warrior down. I have to force-shutdown with "stop immediately" to get it to run any other tasks |
| 13:05:08 | | kiwec joins |
| 13:05:28 | | kiwec leaves |
| 13:07:11 | <nighthnh099_> | pabs: the urls don't need logins but they do need parameters; I already archived the ones I am aware of in that txt file earlier |
| 13:07:51 | <nighthnh099_> | my concern is the stuff that I don't have, since there doesn't seem to be an active effort to archive that game in its own community |
| 13:09:01 | <masterx244|m> | Raya: #imgone and the 403s are expected due to some imgur fuckery |
| 13:12:25 | | icedice (icedice) joins |
| 13:12:58 | <Raya> | ty! Thought this was a more generic issue, will check in there. Also there's some projects that fail to load and the warrior doesn't do anything about them, should I report that in the specific channels or is it useless/redundant/they know? |
| 13:17:10 | <icedice> | Sanqui Sanqui|m JAA Could you take a look at the ArchiveBot archivation job for Bulbagarden Forums when you have time? iirc it got some error and aborted itself: https://archive.fart.website/archivebot/viewer/job/ckr2m |
| 13:17:54 | <icedice> | It'd be nice to know if it has to be run again before Imgur has wiped everything |
| 13:19:39 | <masterx244|m> | <Raya> "ty! Thought this was a more..." <- Some projects fail in the warrior due to outdated code and only work when running with their dedicated docker image. that's known |
| 13:20:29 | | dumbgoy joins |
| 13:34:37 | | Arcorann quits [Ping timeout: 265 seconds] |
| 13:35:38 | | sonick (sonick) joins |
| 13:39:24 | | todb joins |
| 13:50:35 | | parfait_ joins |
| 13:52:31 | | Pingerfowder quits [Quit: ZNC - https://znc.in] |
| 13:52:39 | | Pingerfowder (Pingerfowder) joins |
| 13:52:42 | | hyvac joins |
| 13:53:46 | | parfait quits [Ping timeout: 252 seconds] |
| 13:55:31 | <todb> | Hello AT! In my flailing around looking for archive solutions to battle CVE linkrot, I came across your website. I could sure use some help, guidance, tips, and criticism on my quixotic effort to have useful archives of security vulnerability information that spans all sorts of websites, is fragile, and is rotting away from under us. Background and so far: https://github.com/todb/junkdrawer/blob/main/cve-kev-refs/README.md |
| 13:56:45 | <todb> | I figure if anyone has already solved most of the first-steps problems I'm running into, it'd be you. Thanks in advance. |
| 14:04:09 | | hitgrr8 joins |
| 14:05:19 | | threedeeitguy quits [Client Quit] |
| 14:05:19 | | threedeeitguy7 is now known as threedeeitguy |
| 14:15:58 | <pabs> | todb: not sure but I think for "Automating archiving new CVE ID references", the URLs project (channel #//) is the best option for this. basically it regularly downloads things and passes the URLs found therein to volunteers who download those URLs. you would need a page that contains the *new* URLs to be archived |
| 14:16:29 | <pabs> | usually this is used for news and link posting sites, by grabbing time-based index pages |
| 14:16:55 | <pabs> | https://wiki.archiveteam.org/index.php/URLs https://tracker.archiveteam.org/urls/ |
| 14:17:34 | <pabs> | this is the github repo with all the URL sources https://github.com/ArchiveTeam/urls-sources |
| 14:19:55 | <todb> | pabs ah thanks for the pointer I'll check it out! Yeah I have two overall goals -- archive everything new, and also archive what's still around in the extant CVE refs. |
| 14:21:37 | | threedeeitguy is now authenticated as threedeeitguy |
| 14:23:35 | <pabs> | how many links are we talking in the second category? presumably those are just the link itself and any page resources? ie no outlinks or subdirectories? |
| 14:24:37 | <pabs> | hopefully the second category is also not stuff loaded by JS, because ArchiveBot doesn't support that, and AB would be a reasonable way to do it if the list isn't too big |
| 14:24:59 | <pabs> | if it is big then it will have to go through the URLs project I am guessing |
| 14:25:08 | | anarcat (anarcat) joins |
| 14:26:10 | <todb> | pabs: so there are about 215k CVE IDs in the world today. My wild guess is that there are maybe 3 references on average for each, but not all of them are unique. So.... 600k links? No subdirs or anything, they're all endpoints, and they're links to things like mailing list archives, specific advisories, blogs, tweets, etc |
| 14:26:20 | | katocala quits [Remote host closed the connection] |
| 14:26:47 | <todb> | There is /loads/ of JS in there. Tweets, for example. Very heavyweight |
| 14:27:49 | <anarcat> | that's not so big for archivebot :) |
| 14:27:53 | <pabs> | that makes it slightly more complicated. there is snscrape for twitter stuff, but I'm not sure it does individual tweets, usually we do entire accounts (except there is a 3200-recent-tweets limit right now) |
| 14:28:10 | <anarcat> | tweets? |
| 14:28:19 | <anarcat> | i mean archivebot should be able to deal with one tweet, iirc |
| 14:28:32 | <pabs> | the only thing that can do JS properly these days is SPN2 I thought |
| 14:28:37 | <anarcat> | ah |
| 14:28:41 | <todb> | The most interesting CVE ID refs that are tweets also are whole twitter threads |
| 14:29:35 | <todb> | Twitter is the reason why I started worrying about all this -- a huge swath of infosec twitter left or got banned in October and thus, all their vuln intel disappeared. |
| 14:29:50 | <pabs> | ouch. |
| 14:30:03 | <anarcat> | ouch indeed |
| 14:30:04 | <pabs> | anarcat suggested elsewhere that starting a wiki page about this would be good. then we can map out all the potential issues |
| 14:30:05 | <todb> | https://github.com/todb/junkdrawer/tree/main/cve-twitter-refs |
| 14:30:27 | <anarcat> | todb: yeah pabs shared that link elsewhere already, but that's not writeable by us :) |
| 14:30:41 | <anarcat> | i mean if you want archiveteam people to jump in there, you need to jump in the tools too ;) |
| 14:30:43 | <todb> | pabs: i am 100% on board with getting help through the AT wiki :) |
| 14:30:47 | <anarcat> | although i think you first need to request an account |
| 14:31:04 | <anarcat> | i forgot how that works, i got the edit bit at some point and promptly forgot :p |
| 14:31:52 | <pabs> | ok, snscrape does support tweets+threads https://github.com/JustAnotherArchivist/snscrape |
| 14:32:01 | <pabs> | (not sure if that is working right now though) |
| 14:32:09 | <todb> | i shall poke around for howto get an AT wiki username tysm |
| 14:32:39 | <anarcat> | i can just make a page i guess |
| 14:33:30 | <anarcat> | actually, just go to https://wiki.archiveteam.org/index.php?title=Special:CreateAccount&returnto=Main+Page |
| 14:35:11 | <pabs> | todb: to start with, can you make a text file, one URL per line, of all the refs (minus twitter)? then do this: curl --upload-file cve-refs https://transfer.archivete.am/cve-refs.txt |
| 14:35:17 | <todb> | anarcat: sweet thanks done. Also, that is a wild captcha :) |
| 14:35:31 | <pabs> | then we can run ArchiveBot over them: http://archiveteam.org/index.php?title=ArchiveBot |
| 14:35:31 | <anarcat> | isn't it :) |
| 14:35:36 | <anarcat> | yeah |
| 14:35:55 | <anarcat> | that's a great start |
| 14:35:56 | <pabs> | this AB will give us a baseline without the tweets and without JS stuff, but we can go from there |
| 14:36:07 | <todb> | roger that will do |
| 14:36:07 | <anarcat> | i think it will fetch some of the twitter stuff, personally |
| 14:36:10 | <anarcat> | but maybe i got that wrong |
| 14:36:23 | <anarcat> | i think the thing with snscrape is that it recurses through tweets |
| 14:36:26 | <anarcat> | so it gets the threading right |
| 14:36:33 | <anarcat> | and replies and so on |
| 14:36:41 | <anarcat> | but AB should be able to get *one* tweet, no? |
| 14:37:04 | <pabs> | there is no content for me in the browser when JS is off |
| 14:37:59 | <pabs> | hmm, ISTR there being a u-a trick for changing that but can't remember the details :( |
| 14:40:47 | | pabs peruses https://wiki.archiveteam.org/index.php?title=Twitter |
| 14:40:53 | <anarcat> | pabs: yeah, but if AB pulls all the bits and shoves them in the wayback machine, and *then* you have JS on when you browse wayback, it should work right? |
| 14:41:11 | <anarcat> | i mean in any case i think it's a good idea to start with a cve-refs.txt and shove that in archivebot |
| 14:41:23 | <anarcat> | then we can trim that down to social media and shove *that* in snscrape |
| 14:41:44 | <anarcat> | and of course at some point JAA will be awake and will correct all the bullshit i said and set us up straight again :) |
| 14:42:12 | <pabs> | but AB would need to run JS to pull the tweet content though |
| 14:43:45 | <pabs> | because it simply isn't in the HTML and isn't linked to by it |
| 14:43:59 | <anarcat> | ah right |
| 14:44:00 | <anarcat> | makes sense |
| 14:44:03 | <anarcat> | stupid web |
| 14:44:39 | <pabs> | indeed, gopher ftw :) |
| 14:44:50 | <anarcat> | ha |
| 14:44:55 | <anarcat> | i'm fine with plain HTML |
| 14:45:07 | <anarcat> | but this is getting -ot :p |
| 15:01:25 | | Guest7273 joins |
| 15:31:16 | <@JAA> | pabs: Tweet scraping works fine. |
| 15:31:37 | <@JAA> | anarcat: AB hasn't been able to grab tweets properly for a good while now. |
| 15:34:35 | | Island joins |
| 15:35:20 | | LeGoupil quits [Client Quit] |
| 15:39:21 | | hyvac quits [Remote host closed the connection] |
| 15:39:42 | | hyvac joins |
| 15:40:19 | | nostalgebraist joins |
| 15:40:25 | | nostalgebraist quits [Client Quit] |
| 15:53:49 | | hyvac quits [Ping timeout: 265 seconds] |
| 15:54:49 | <@JAA> | icedice: Just ran the log through wpull2-log-extract-errors, the only significant errors were the pagination of profile posts on one huge profile, which are too slow and exceed AB's time limit. |
| 15:55:01 | <@JAA> | Cc Sanqui ^ |
| 16:04:27 | | parfait_ quits [Ping timeout: 265 seconds] |
| 16:09:16 | | test___ (decky_e) joins |
| 16:13:50 | | Billy549 quits [Ping timeout: 252 seconds] |
| 16:15:50 | | test___ quits [Ping timeout: 252 seconds] |
| 16:24:40 | | cascode joins |
| 16:25:11 | | lflare quits [Ping timeout: 252 seconds] |
| 16:25:37 | <todb> | pabs: thanks again for your help; https://transfer.archivete.am/P6uNh/cve-refs.txt is up now (all minus twitter links). I'll read up on https://wiki.archiveteam.org/index.php?title=ArchiveBot to learn how to track progress and run a node and all that. I'm super new to all this. |
| 16:32:55 | <todb> | (I also extracted all the Twitter links and threw them up at https://transfer.archivete.am/EZdNi/cve-twitter-refs.txt , there's only 405 of them but maybe half or so are already gone or are useless.) |
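
The split todb describes (one deduplicated list without Twitter links, one with only Twitter links) could be sketched roughly as follows. This is an illustration, not todb's actual script; the function name and the set of hostnames treated as Twitter are assumptions.

```python
from urllib.parse import urlsplit

# Assumption: these hostnames are what counts as a "Twitter link".
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}


def split_refs(lines):
    """Return (other_urls, twitter_urls), deduplicated, in first-seen order."""
    seen = set()
    other, twitter = [], []
    for line in lines:
        url = line.strip()
        if not url or url in seen:
            continue  # skip blank lines and repeats
        seen.add(url)
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            host = ""  # malformed URL: keep it in the non-Twitter list
        if host in TWITTER_HOSTS:
            twitter.append(url)
        else:
            other.append(url)
    return other, twitter
```

Writing each returned list to a file with one URL per line gives the two uploads mentioned above.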
| 16:37:15 | <icedice> | <JAA> icedice: Just ran the log through wpull2-log-extract-errors, the only significant errors were the pagination of profile posts on one huge profile, which are too slow and exceed AB's time limit. |
| 16:37:22 | <icedice> | Ah, that's not a big issue then |
| 16:37:31 | <icedice> | Pretty much nobody reads profile posts |
| 16:37:43 | <icedice> | Threads and images are what's important |
| 16:37:56 | <icedice> | I thought the archivation job aborted itself or something lol |
| 16:40:15 | | icedice quits [Client Quit] |
| 16:58:57 | | Guest7273 quits [Client Quit] |
| 17:11:42 | <h2ibot> | JustAnotherArchivist edited ArchiveBot (-175, /* Volunteer to run a Pipeline */ Make it…): https://wiki.archiveteam.org/?diff=49790&oldid=49131 |
| 17:13:42 | <h2ibot> | Hans5958 edited URLTeam/Dead (+2125, Checking round on 2023-05-16): https://wiki.archiveteam.org/?diff=49791&oldid=49186 |
| 17:13:43 | <h2ibot> | Hans5958 edited URLTeam (-2636, Checking round on 2023-05-16, put example on…): https://wiki.archiveteam.org/?diff=49792&oldid=49785 |
| 17:14:12 | | icedice (icedice) joins |
| 17:14:22 | <icedice> | JAA: Could you check that Serebii Forums finished successfully without any major errors? I missed the end of that archivation job: https://archive.fart.website/archivebot/viewer/job/2c1vq |
| 17:14:42 | <h2ibot> | Vukky edited Deathwatch (+26, link to PTCGO section of website instead of the…): https://wiki.archiveteam.org/?diff=49795&oldid=49779 |
| 17:14:47 | <icedice> | That's the last big Pokémon forum I requested archivation for |
| 17:16:39 | <icedice> | I'll be back later |
| 17:16:41 | <todb> | Alright created https://wiki.archiveteam.org/index.php?title=ArchiveBot/CVE&modqueued=1 per advice here. Hope I'm doing it right. |
| 17:16:43 | | icedice quits [Client Quit] |
| 17:19:23 | <@JAA> | I don't think this belongs on a subpage of ArchiveBot, but it can be moved later. |
| 17:19:43 | <h2ibot> | Todb created ArchiveBot/CVE (+2765, Kick off a CVE reference project.): https://wiki.archiveteam.org/?title=ArchiveBot/CVE |
| 17:20:13 | <todb> | JAA: yeah I'm super noob and not sure how organization on the wiki works yet |
| 17:37:00 | <nicolas17> | there's some random video uploaded to the dynabook / tb2b ftp lol |
| 17:38:08 | <nicolas17> | (fullpwnmedia said the ftp server has write access) |
| 17:38:34 | <nicolas17> | er fullpwndotnet |
| 17:49:32 | | icedice (icedice) joins |
| 17:56:05 | | nighthnh099_ quits [Client Quit] |
| 18:00:51 | <h2ibot> | JAABot edited URLTeam/Dead (+0): https://wiki.archiveteam.org/?diff=49797&oldid=49791 |
| 18:01:33 | <@JAA> | TIL my bot does that. |
| 18:12:44 | | rr9 quits [Client Quit] |
| 18:13:13 | | rr (rr) joins |
| 18:15:59 | | rr quits [Client Quit] |
| 18:16:45 | | rr (rr) joins |
| 18:19:40 | | HP_Archivist (HP_Archivist) joins |
| 18:40:39 | | hyvac joins |
| 18:43:56 | | cascode quits [Read error: Connection reset by peer] |
| 18:44:03 | | cascode joins |
| 18:44:25 | | cascode quits [Read error: Connection reset by peer] |
| 18:44:42 | | cascode joins |
| 18:55:39 | | katocala joins |
| 18:56:02 | | katocala is now authenticated as katocala |
| 19:09:50 | | cascode quits [Ping timeout: 252 seconds] |
| 19:10:01 | | hyvac quits [Remote host closed the connection] |
| 19:10:09 | | hyvac joins |
| 19:10:50 | | cascode joins |
| 19:32:52 | <Raya> | Hey - I've been working on reddit all day. Then tried switching to imgur, got a "project did not install correctly" message. Tried rebooting a couple times. Now, most projects give me the same message. For example, reddit returns this: |
| 19:32:54 | <Raya> | 2023-05-17 19:32:28,257 - seesaw.warrior - ERROR - Project failed to install: Cloning into '/home/warrior/projects/reddit'... |
| 19:32:54 | <Raya> | fatal: unable to access 'https://github.com/ArchiveTeam/reddit-grab/': gnutls_handshake() failed: The TLS connection was non-properly terminated. |
| 19:32:54 | <Raya> | git returned 128 |
| 19:32:56 | <Raya> | 2023-05-17 19:32:28,261 - seesaw.warrior - DEBUG - Result of the install process: False |
| 19:32:58 | <Raya> | 2023-05-17 19:32:28,262 - seesaw.warrior - WARNING - Project reddit did not install correctly and we're ignoring this problem. |
| 19:33:51 | <Raya> | Do you have a clue what this could be? Few projects seem to be working, and the ones I worked on until 20 mins ago suddenly don't anymore |
| 19:33:54 | <nicolas17> | I heard github is having problems |
| 19:34:33 | <Raya> | Oooh that'd explain it |
| 19:35:59 | | cascode quits [Read error: Connection reset by peer] |
| 19:36:16 | | cascode joins |
| 19:43:25 | <fireonlive> | current advice is to just retry until it grabs |
| 19:54:59 | <threedeeitguy> | My warriors broke so I just switched to docker, no more issues. |
| 20:00:22 | | parfait (kdqep) joins |
| 20:07:25 | <@JAA> | The container images don't require contacting GitHub, so that makes sense. |
| 20:07:39 | <@JAA> | Well, the project images don't, the warrior image does I think. |
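
The "just retry until it grabs" advice can be illustrated with a small retry loop around `git clone`. This is a generic sketch, not how the warrior itself handles a failed install; the function name, attempt count, and backoff delays are assumptions.

```python
import subprocess
import time


def clone_with_retry(repo_url, dest, attempts=5, base_delay=2.0,
                     run=subprocess.run, sleep=time.sleep):
    """Try `git clone` up to `attempts` times, doubling the wait after each failure.

    `run` and `sleep` are injectable so the loop can be exercised without
    touching the network.
    """
    for attempt in range(1, attempts + 1):
        result = run(["git", "clone", repo_url, dest])
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Waits base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return False
```

A call such as `clone_with_retry("https://github.com/ArchiveTeam/reddit-grab/", "/tmp/reddit-grab")` would keep trying through a short GitHub outage and give up after the last attempt.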
| 20:08:31 | <@JAA> | icedice: Seven random thread pages failed on forums.serebii.net, i.e. it can be considered complete as well. |
| 20:09:33 | <@JAA> | And all but one of those are indeed broken on the server side, returning 500. |
| 20:13:52 | | nimaje1 is now known as nimaje |
| 20:22:02 | | parfait quits [Read error: Connection reset by peer] |
| 20:45:46 | | umgr036 joins |
| 20:46:36 | | umgr036 quits [Remote host closed the connection] |
| 20:46:49 | | umgr036 joins |
| 20:46:57 | | umgr036 quits [Client Quit] |
| 20:54:56 | | cascode quits [Ping timeout: 265 seconds] |
| 20:54:59 | | cascode joins |
| 21:01:59 | | lexikiq joins |
| 21:15:52 | | HP_Archivist quits [Client Quit] |
| 21:34:18 | | hitgrr8 quits [Client Quit] |
| 21:36:53 | | test___ (decky_e) joins |
| 21:36:53 | | test___ is now known as decky_e |
| 21:37:57 | | cascode quits [Read error: Connection reset by peer] |
| 21:38:10 | | cascode joins |
| 21:42:17 | <icedice> | <JAA> And all but one of those are indeed broken on the server side, returning 500. |
| 21:42:22 | <icedice> | Nothing to do about that then |
| 21:43:01 | <icedice> | Good that it got archived and Imgur didn't 429 on Bulbagarden Forums or Serebii Forums |
| 21:43:22 | | decky_e leaves |
| 21:47:04 | <masterX244> | i ignore imgur straight away on my scrapes due to that |
| 21:47:27 | | decky_e (decky_e) joins |
| 21:48:04 | <@JAA> | icedice: I only checked onsite stuff for errors. |
| 21:48:41 | <icedice> | Ah |
| 21:49:15 | <icedice> | Could you check a few of the Imgur links from both sites when you have time? |
| 21:51:20 | | cascode quits [Ping timeout: 252 seconds] |
| 21:51:43 | | cascode joins |
| 21:51:46 | <@JAA> | Not sure when I have time for that, maybe Friday. |
| 21:51:54 | <icedice> | Ok |
| 21:51:56 | <icedice> | Thanks |
| 21:52:18 | <icedice> | Could you check what hosting provider they were archived via? |
| 21:52:32 | <icedice> | If it's OVH or Hetzner we pretty much already know the answer |
| 21:52:59 | <icedice> | Though the sites did get archived pretty early on |
| 21:53:02 | <icedice> | So who knows |
| 21:53:19 | | cascode quits [Read error: Connection reset by peer] |
| 21:53:35 | | cascode joins |
| 21:53:59 | <@JAA> | Most likely one of those, yes. |
| 21:55:24 | <@JAA> | I think they were with --no-offsite-links anyway, so any links to images/albums/whatever would be missing anyway. |
| 22:03:32 | <andrew> | is there any web crawler (like wget -m or httrack) that supports concurrency and WARC writing? |
| 22:03:42 | <andrew> | or is it better to just run httrack behind a WARC-writing proxy or something |
| 22:03:59 | | decky_e quits [Ping timeout: 252 seconds] |
| 22:04:22 | | decky_e (decky_e) joins |
| 22:06:48 | <andrew> | oh maybe I can use grab-site :D |
| 22:10:01 | <masterX244> | yeah, grab-site is the way to go for warcs as a normal user |
| 22:10:27 | <andrew> | ideally I'd like to also have convenient access through the filesystem with wget's link rewriting |
| 22:10:40 | <andrew> | but I suppose I can just replay the WARC and point wget at it |
| 22:12:10 | | lflare (lflare) joins |
| 22:19:57 | <andrew> | and 5 yaks later I'm rebuilding ffmpeg from source |
| 22:21:34 | | hyvac quits [Client Quit] |
| 22:27:40 | | MrTumnus joins |
| 22:38:22 | <icedice> | <JAA> I think they were with --no-offsite-links anyway, so any links to images/albums/whatever would be missing anyway. |
| 22:38:26 | <icedice> | What the shit |
| 22:38:50 | <icedice> | Getting Imgur links archived at the same time was the reason that they were being archived now |
| 22:39:14 | <pokechu22> | The plan is to extract them from the WARC, not the log, to my understanding |
| 22:39:20 | <icedice> | I guess the links could be extracted from the WARC and archived separately |
| 22:39:26 | <icedice> | Yeah, true |
| 22:39:28 | <pokechu22> | and then throw them into #imgone, yeah |
| 22:39:37 | <pokechu22> | rather than trying to make archivebot run slowly to not get banned by imgur |
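
As a rough illustration of the extraction step pokechu22 describes, the sketch below pulls Imgur URLs out of a blob of response bytes with a regular expression. It assumes the response bodies have already been read out of the WARC by some other tool; the pattern and function name are hypothetical and not the pipeline actually used for #imgone.

```python
import re

# Assumption: any URL on imgur.com or a subdomain of it is of interest.
IMGUR_URL = re.compile(
    rb"https?://(?:[a-z0-9-]+\.)?imgur\.com/[A-Za-z0-9/_.\-]+",
    re.IGNORECASE,
)


def extract_imgur_urls(payload):
    """Return the sorted, deduplicated Imgur URLs found in raw response bytes."""
    found = {m.group(0).decode("ascii") for m in IMGUR_URL.finditer(payload)}
    return sorted(found)
```

Running this over every HTML response and merging the results would give a list to queue separately, without ArchiveBot ever contacting Imgur during the forum crawl.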
| 22:39:57 | <icedice> | Assuming Imgur doesn't wipe everything by then |
| 22:40:19 | <icedice> | Any idea if they've started deleting stuff yet? |
| 22:42:04 | <fireonlive> | some signs of deletion yes but very slowly (as the catalogue is massive) |
| 22:42:42 | <icedice> | Yeah, I thought so |
| 22:43:16 | <icedice> | Noticed some three day old upload had gotten deleted, so I figured that was probably by Imgur |
| 22:44:31 | <icedice> | Them not locking down uploads to registered accounts when announcing the purge is just another level of assholeishness |
| 22:44:32 | <fireonlive> | none of my self-uploaded canaries have died yet, and of a chunk of a list of 'worked recently' links, only a few are gone (but it can be hard to tell why) |
| 22:45:03 | <fireonlive> | yeah; there's a lot of newly uploaded stuff by people who have no idea that's just going to go away it seems |
| 22:45:19 | <icedice> | Like they're actively encouraging people to get their shit deleted at this point |
| 22:45:56 | <icedice> | True, it's impossible to know for sure if it's Imgur or the uploader that deleted it |
| 22:46:38 | <icedice> | Have any NSFW subreddits banned Imgur links yet? |
| 22:46:44 | <icedice> | They really should tbh |
| 22:47:14 | <icedice> | Future links, not the ones already there |
| 22:48:56 | <fireonlive> | not sure. some of the ones I monitor still have imgur links trickling in but I haven't checked for rule updates |
| 22:58:27 | | nicolas17 quits [Client Quit] |
| 23:00:37 | | Raya quits [Client Quit] |
| 23:14:26 | | eroc1990 quits [Client Quit] |
| 23:20:12 | | BlueMaxima joins |
| 23:31:59 | | wyatt8740 quits [Ping timeout: 252 seconds] |
| 23:32:29 | | wyatt8740 joins |
| 23:58:36 | | wyatt8740 quits [Ping timeout: 265 seconds] |
| 23:58:51 | | wyatt8740 joins |