00:00:00Jake quits [Quit: Leaving for a bit!]
00:01:40<qwertyasdfuiopghjkl>JAA: Looks like you can just fill in whatever in the email/country selection thing on pages like https://opensource.com/downloads/linux-metacharacters-cheat-sheet and it sets a cookie with the name STYXKEY_Drupal_visitor_gatedemail and value osdc-gated-content that makes pages show the actual download link without needing a login. Guessing it's
00:01:41<qwertyasdfuiopghjkl>just to get people to sign up for the newsletter.
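The cookie trick described above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: the cookie name and value are taken from the message, while the helper name build_gated_request is hypothetical, and whether the site still honours the cookie is an assumption. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Cookie name and value as reported in the message above.
GATE_COOKIE_NAME = "STYXKEY_Drupal_visitor_gatedemail"
GATE_COOKIE_VALUE = "osdc-gated-content"


def build_gated_request(url: str) -> Request:
    """Return an unsent GET request carrying the gate cookie (hypothetical helper)."""
    request = Request(url)
    request.add_header("Cookie", f"{GATE_COOKIE_NAME}={GATE_COOKIE_VALUE}")
    return request


request = build_gated_request(
    "https://opensource.com/downloads/linux-metacharacters-cheat-sheet"
)
# Passing `request` to urllib.request.urlopen() would perform the actual fetch.
```

If the server behaves as described, the response body should contain the direct download link instead of the email form.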
00:03:50Jake (Jake) joins
00:07:04sonick (sonick) joins
00:23:35nicolas17 quits [Client Quit]
00:27:47graham (graham) joins
00:42:31programmerq quits [Ping timeout: 252 seconds]
00:44:03programmerq (programmerq) joins
00:56:36benjinsm is now known as benjins
02:17:27tzt quits [Remote host closed the connection]
02:17:50tzt (tzt) joins
02:23:06Guest50 quits [Client Quit]
02:24:48Guest50 joins
02:52:36Guest50 quits [Client Quit]
02:54:02dumbgoy__ joins
02:56:43dumbgoy_ quits [Ping timeout: 252 seconds]
03:49:07TheTechRobo quits [Remote host closed the connection]
03:49:47<Hans5958|m>What is a good way to scrape links from a Google search?
03:49:47TheTechRobo (TheTechRobo) joins
03:58:18TheTechRobo quits [Read error: Connection reset by peer]
03:58:25AlsoTheTechRobo (TheTechRobo) joins
04:01:07AlsoTheTechRobo quits [Remote host closed the connection]
04:01:56AlsoTheTechRobo (TheTechRobo) joins
04:05:20<lennier1>"According to Twitter's policy, users should log in to their account at least once every 30 days to avoid permanent removal due to prolonged inactivity." I had not heard of this policy before.
04:05:25eythian quits [Quit: http://quassel-irc.org - Chat comfortabel. Waar dan ook.]
04:05:55eythian joins
04:22:19<datechnoman>30 days is far from "prolonged inactivity" lol? Maybe like 12 months or more.....
04:26:08Guest50 joins
04:36:39AlsoTheTechRobo quits [Remote host closed the connection]
04:37:15AlsoTheTechRobo (TheTechRobo) joins
04:40:58Sluggs quits [Excess Flood]
04:44:04Sluggs joins
04:44:13<Hans5958|m>Even Google has two years
04:45:26sepro quits [Ping timeout: 252 seconds]
04:53:23sepro (sepro) joins
04:55:42Nulo quits [Ping timeout: 252 seconds]
05:15:58Nulo joins
05:19:00qwertyasdfuiopghjkl quits [Quit: qwertyasdfuiopghjkl]
05:26:46Jonboy3451 joins
05:29:48Jonboy345 quits [Ping timeout: 252 seconds]
05:38:50BlueMaxima quits [Client Quit]
06:15:47Island quits [Read error: Connection reset by peer]
06:16:44Guest50 quits [Ping timeout: 265 seconds]
06:18:03hitgrr8 joins
06:54:00<pabs>rewby|backup: was a date mentioned?
06:54:23<@rewby|backup>No
06:54:54<@rewby|backup>Preemptive more than anything
06:55:07<@rewby|backup>If a site loses all staff it doesn't go down immediately
06:55:10<pabs>would AB be the thing to use to save it?
06:55:16<@rewby|backup>But also probably won't live long
06:55:37<@rewby|backup>Unsure, I'm not an expert on rgat
06:55:39<@rewby|backup>*that
06:56:52<pabs>I'll ask a RedHat person if they are able to make downloads public
07:01:30<pabs>you might want to ask through your channels too
07:01:53<@rewby|backup>I don't have any channels
07:06:00<pabs>where did you read about the staff firing?
07:07:02<@rewby|backup>pabs: Friend who has friends there told me
07:08:35<pabs>emailing their site address says: Thank you for your contributions. The Opensource.com community publication, including this email listserv, is no longer supported by Red Hat.
07:10:52<pabs>JAA: perhaps we should kick off an AB job and then later if they make the downloads public, do just those?
07:12:41pabs started with a snscrape of the twitter account for now
07:18:59lexikiq quits [Client Quit]
07:50:32Arcorann (Arcorann) joins
07:55:01sec^nd quits [Ping timeout: 245 seconds]
08:01:32sec^nd (second) joins
08:15:09<@OrIdow6>Wasn't there some kerfuffle a few years ago about Twitter deleting old accounts?
08:15:38<@OrIdow6>People complained, they paused it, and that was the last I heard of it
08:35:22<AK>Think it was around people who had passed away iirc
08:36:00<AK>Families+Friends wanted their accounts to remain as a preserved memory
09:15:34Ivan226 quits [Ping timeout: 265 seconds]
09:56:10pie_ quits [Ping timeout: 265 seconds]
10:04:16pie_ joins
10:04:37Ruthalas5 quits [Ping timeout: 252 seconds]
10:06:02Ruthalas5 (Ruthalas) joins
11:57:22Barto quits [Ping timeout: 252 seconds]
12:10:03Icyelut (Icyelut) joins
12:11:06fred44 joins
12:11:18Icyelut|2 quits [Ping timeout: 252 seconds]
12:13:25fred44 quits [Remote host closed the connection]
12:19:22benjins quits [Ping timeout: 252 seconds]
12:35:39icedice (icedice) joins
12:38:36tjwds quits [Quit: Ping timeout (120 seconds)]
12:38:42tjwds joins
12:47:50therubberduckie quits [Remote host closed the connection]
12:53:50TastyWiener95 quits [Ping timeout: 252 seconds]
13:11:48HP_Archivist quits [Ping timeout: 252 seconds]
13:23:57benjins joins
13:35:59CaldeiraG (CaldeiraG) joins
13:37:20Barto (Barto) joins
13:42:55rubberduck joins
14:03:38rubberduck quits [Ping timeout: 265 seconds]
14:09:00Billy549 quits [Ping timeout: 252 seconds]
14:10:24Arcorann quits [Ping timeout: 265 seconds]
14:10:45pabs quits [Read error: Connection reset by peer]
14:11:56pabs (pabs) joins
14:16:01<icedice>Sanqui: How is it going with https://pokecommunity.com/, by the way?
14:16:15<icedice>Any ETA for when that archival job might begin?
14:22:47Billy549 (Billy549) joins
14:28:46CaldeiraG quits [Ping timeout: 265 seconds]
14:41:58Island joins
14:42:10umgr036 joins
14:43:07umgr036 quits [Remote host closed the connection]
14:43:21umgr036 joins
14:53:55<@Sanqui>icedice: unfortunately there's a cloudflare wall so it can't be done with archivebot
15:08:16rubberduck joins
15:12:15nostalgebraist joins
15:15:38umgr036 quits [Remote host closed the connection]
15:15:54umgr036 joins
15:34:37AlsoTheTechRobo quits [Remote host closed the connection]
15:35:15AlsoTheTechRobo (TheTechRobo) joins
15:37:44Ivan226 joins
15:40:53Guest50 joins
15:51:50umgr036 quits [Remote host closed the connection]
15:52:05umgr036 joins
16:01:12nicolas17 joins
16:04:07retromouse (retromouse) joins
16:07:01Nulo quits [Read error: Connection reset by peer]
16:07:07Nulo joins
16:12:53<@JAA>pabs: I don't have time to watch it, but yes, an AB job now is a good idea either way.
16:20:09AlsoTheTechRobo quits [Remote host closed the connection]
16:20:49AlsoTheTechRobo (TheTechRobo) joins
16:23:51threedeeitguy_ joins
16:27:14threedeeitguy quits [Ping timeout: 252 seconds]
16:39:31Guest50 quits [Ping timeout: 252 seconds]
16:57:25Hackerpcs quits [Quit: Hackerpcs]
17:00:39Hackerpcs (Hackerpcs) joins
17:07:02Guest50 joins
17:12:26threedeeitguy joins
17:14:32threedeeitguy_ quits [Ping timeout: 252 seconds]
17:21:19Guest50 quits [Read error: Connection reset by peer]
17:22:45<icedice>Saqui: Well shit. Is there any way to do it?
17:23:08<icedice>Or are we stuck with just scraping Imgur URLs from it?
17:23:18<icedice>Or is that even possible
17:23:37Webuser710 joins
17:23:42<icedice>If nothing else I could try asking the webmaster for assistance
17:24:06<icedice>Assuming he doesn't just yell at me and ban me for even suggesting it
17:24:51<icedice>* Sanqui
17:24:57<pokechu22>There's a note at the top of https://www.pokecommunity.com/forumdisplay.php?fn=scarlet-violet mentioning imgur, so they at least know about it, and probably would be happy to help if they can
17:25:03<icedice>(I spelled your nick wrong)
17:25:10<icedice>Yeah
17:25:30<@Sanqui>Yes if somebody could get a list of imgur urls either by scraping through other means than archivebot or from the forum administrators that would be ideal.
17:25:33<pokechu22>I don't see a search feature (it may only be available when signed in), but if there is one you can try searching imgur and seeing what links show up
17:25:49<pokechu22>though depending on the forum software that might be incomplete
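However the forum pages or search results end up being obtained, pulling the imgur links out of the saved text might look something like the sketch below. The regular expression and the function name extract_imgur_urls are assumptions for illustration; the pattern is deliberately simple and may keep trailing punctuation or miss unusual link forms.

```python
import re

# Matches imgur.com and any subdomain (i.imgur.com, m.imgur.com, ...).
# The URL is cut at whitespace, quotes, angle brackets, ")" or "]".
IMGUR_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*imgur\.com/[^\s\"'<>)\]]+",
    re.IGNORECASE,
)


def extract_imgur_urls(text: str) -> list[str]:
    """Return the unique imgur URLs found in text, in first-seen order."""
    matches = (m.group(0) for m in IMGUR_URL.finditer(text))
    return list(dict.fromkeys(matches))


# Made-up sample input; the image IDs are placeholders.
sample = (
    '<a href="https://i.imgur.com/abc123.png">pic</a> '
    "see also http://imgur.com/a/XyZ789 and https://i.imgur.com/abc123.png "
    "but not https://example.com/imgur.com.png"
)
urls = extract_imgur_urls(sample)
```

The resulting list could then be written one URL per line and handed over for archiving.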
17:25:53<icedice>Of the three major Pokémon forums, they're the one that is the most chill by far
17:26:05<icedice>Has a ROM hacking section
17:26:22<icedice>Used to allow manga scans until 2009 when VIZ started publishing again
17:27:07<icedice>Is there some way for the admin to add an exception for ArchiveBot in CloudFlare?
17:27:16<icedice>Like an allowed IP or user-agent or something
17:27:52<pokechu22>Oh, yeah, another large forum: https://hypixel.net/forums/ which also seems to be imgur-heavy (but also 4 million threads and 33 million posts). It does have a search: https://hypixel.net/search/11815554/?q=imgur&o=date - and from https://hypixel.net/search/11815554/?page=5&q=imgur&o=date there's a "view older results" link so you can keep going back, but I haven't seen how
17:27:54<pokechu22>far back it actually goes
17:30:16<myself>icedice: CF theoretically has a concept of "friendly bots" but I don't know if anyone's ever pursued getting AB listed as such.
17:33:47<icedice>I remember that a manga ripper had a CloudFlare bypass thing
17:33:54<icedice>Something with cookies iirc
17:35:27<pokechu22>On further thought I can probably hack together a bookmarklet to do the hypixel forums
17:36:17<pokechu22>CloudFlare tends to give you a cookie that allows longer access once you load the page in a browser (and possibly complete a captcha) but that cookie is time-limited. It's not really suitable for archivebot (since you can't set custom cookies on it) but it is something I've done for a few wikis
17:52:05<icedice><pokechu22> On further thought I can probably hack together a bookmarklet to do the hypixel forums
17:52:16<icedice>Could you make one for The PokéCommunity as well?
17:52:52<icedice>Assuming the admin doesn't agree to help
17:53:06<pokechu22>If there's a search page, maybe, but it really depends on how the search page behaves
17:54:11Barto quits [Ping timeout: 265 seconds]
17:54:49<pokechu22>Some forums show the whole post if it's a search result, and others only give a snippet and a link. Hypixel is the latter so I'd need to extract content from each post separately (I'm planning on using archivebot for that, which would be an issue for PokéCommunity)
18:00:41<icedice>The PokéCommunity uses XenForo in case that tells you anything
18:03:38HP_Archivist (HP_Archivist) joins
18:07:30<pokechu22>ah, found it: https://www.pokecommunity.com/search.php requires being signed in
18:08:59theavery joins
18:10:03<pokechu22>yeah, I don't think that's going to work in the same way :/
18:11:22icedice quits [Ping timeout: 252 seconds]
18:13:56umgr036 quits [Remote host closed the connection]
18:14:10umgr036 joins
18:17:13icedice (icedice) joins
18:26:57theavery quits [Remote host closed the connection]
18:59:08Guest50 joins
19:00:08whoami quits [Ping timeout: 252 seconds]
19:05:43Ivan226 quits [Ping timeout: 265 seconds]
19:10:12Guest50_ joins
19:12:14Guest50 quits [Ping timeout: 252 seconds]
19:24:02Megame (Megame) joins
19:24:05Guest50_ quits [Ping timeout: 265 seconds]
19:41:23sonick quits [Client Quit]
19:56:07BigBoris57 joins
19:56:50Barto (Barto) joins
19:59:22BigBoris quits [Ping timeout: 265 seconds]
20:08:06qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
20:30:04hitgrr8 quits [Client Quit]
20:48:10Webuser710 quits [Remote host closed the connection]
21:11:55Billy549 quits [Client Quit]
21:30:02TastyWiener95 (TastyWiener95) joins
21:32:49Billy549 (Billy549) joins
21:33:36Webuser124 joins
21:34:49<@rewby>Now that things are working correctly, we've slammed into the 10 limit immediately
21:38:07AlsoTheTechRobo quits [Remote host closed the connection]
21:38:50AlsoTheTechRobo (TheTechRobo) joins
21:44:29lexikiq joins
21:53:55dumbgoy__ quits [Ping timeout: 265 seconds]
22:06:09Guest50 joins
22:06:12<datechnoman>rewby What is the 10 limit you set out of curiosity?
22:06:58<Billy549>Hey, something a friend forked has just got DMCA'd but is still up on GitHub - what's the best way to immediately archive it?
22:07:11<Billy549>https://github.com/shchmue/Lockpick_RCM for reference
22:07:23<@rewby>datechnoman: I've implemented some code to limit our max upload concurrency
22:07:51<datechnoman>Ahhh upload concurrency. Nice
22:08:06<datechnoman>Thanks :)
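A cap on upload concurrency like the one mentioned above is commonly built around a semaphore. The sketch below is only an illustration of that general technique and is not rewby's actual code: the limit of 10 comes from the conversation, while fake_upload and the counters are hypothetical stand-ins.

```python
import threading
import time

MAX_CONCURRENT_UPLOADS = 10  # the "10 limit" from the conversation

upload_slots = threading.BoundedSemaphore(MAX_CONCURRENT_UPLOADS)
stats_lock = threading.Lock()
active = 0  # uploads currently in progress
peak = 0    # highest number of simultaneous uploads observed


def fake_upload(item: int) -> None:
    """Hypothetical stand-in for a real upload; it only sleeps briefly."""
    global active, peak
    with upload_slots:  # blocks while 10 uploads are already running
        with stats_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)
        with stats_lock:
            active -= 1


threads = [threading.Thread(target=fake_upload, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With 50 queued items, at most 10 hold a slot at any moment; the rest wait until a slot is released.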
22:21:51dumbgoy__ joins
22:35:00BigBoris57 quits [Ping timeout: 265 seconds]
22:39:23<JTL>Billy549: Not seeing any DMCA registered with GitHub for anything related to that, so... ?
22:39:56<Billy549>JTL: they've been privately sent the DMCA request
22:39:59<@JAA>Billy549: #gitgud is our GitHub project.
22:40:13<Billy549>"ah it's going to be processed after 1 business day / grace period for counter notice or making changes
22:40:18<Billy549>@JAA noted
22:40:19dumbgoy__ quits [Ping timeout: 252 seconds]
22:40:19<JTL>Billy549: ahh
22:40:56<@JAA>So best to request it there with note of urgency. I'll take care of it now.
22:41:04<Billy549>Thank you ^^
22:42:42dumbgoy__ joins
22:46:26Webuser124 quits [Remote host closed the connection]
22:51:18Megame quits [Client Quit]
22:53:05dumbgoy joins
22:53:22dumbgoy__ quits [Ping timeout: 265 seconds]
22:58:27<@OrIdow6>It appears that the newworld and playlostark forums have indeed frozen
22:58:38<@OrIdow6>Does Discourse work well in AB?
23:00:06retromouse quits [Read error: Connection reset by peer]
23:00:43retromouse (retromouse) joins
23:01:17sonick (sonick) joins
23:02:37<pokechu22>Yes, ish - there isn't an ignoreset for it IIRC but it generally works OK
23:02:56<@OrIdow6>Nothing in the forums ignoreset?
23:04:01<pokechu22>I usually still apply it, but I think nothing in it specifically targets discourse - see https://github.com/ArchiveTeam/ArchiveBot/issues/317
23:06:11<@OrIdow6>Ah thanks
23:06:26<@OrIdow6>Good that we have people keeping track of this stuff
23:07:46<@JAA>Discourse works reasonably well for archival, but playback in the WBM is often broken unless you disable JS.
23:09:56<@OrIdow6>A shame but better than many other sites
23:12:11icedice quits [Client Quit]
23:13:37Ruthalas5 quits [Client Quit]
23:13:58Ruthalas5 (Ruthalas) joins
23:14:32icedice (icedice) joins
23:17:39<h2ibot>OrIdow6 edited Discourse (+300, Archiving with AB & New World/Lost Ark forums): https://wiki.archiveteam.org/?diff=49737&oldid=49670
23:29:49Guest50 quits [Ping timeout: 252 seconds]
23:49:04retromouse quits [Ping timeout: 252 seconds]
23:49:58Guest50 joins
23:55:34Arcorann (Arcorann) joins