00:34:56blankie joins
00:34:56blankie quits [Changing host]
00:34:56blankie (blankie) joins
00:49:57Mineroboter_ joins
00:51:07Mineroboter quits [Ping timeout: 258 seconds]
00:58:33IKI joins
00:59:04HackMii_ quits [Remote host closed the connection]
01:00:48HackMii_ (hacktheplanet) joins
01:02:51dm4v quits [Read error: Connection reset by peer]
01:06:03Qub3d (Qub3d) joins
01:07:21dm4v joins
01:07:24dm4v quits [Changing host]
01:07:24dm4v (dm4v) joins
01:24:43britm0b joins
01:26:00britmob quits [Ping timeout: 258 seconds]
01:28:16Mineroboter_ quits [Client Quit]
01:30:44Mineroboter joins
01:48:41<flashfire42>https://app.box.com/s/6b9wmjvr582c95uzma1136exumk6p989/folder/135953042066 we should probably keep an eye on this
01:51:26<@OrIdow6>flashfire42: Are those documents involved in the Apple lawsuit?
01:51:33<flashfire42>yes
01:51:53<flashfire42>came from https://www.ign.com/articles/epic-vs-apple-shows-the-courts-were-not-prepared-for-the-games-industrys-obsessive-secrecy
01:51:59<flashfire42>tried archivebot and it didn't work
01:59:55<@JAA>Yup, Box is JS crap. I'm not aware of any good way to archive it.
01:59:58<@OrIdow6>Yeah, that's way too JS-heavy
02:00:06<@OrIdow6>There's a download button that compiles it into a ZIP
02:00:31<@OrIdow6>Obviously not going to work in the WBM
02:21:59<mgrandi>Reuters is now on the naughty list apparently: https://twitter.com/felixsalmon/status/1389649958895333380
02:33:57<thuban>in re giantbomb discussion: the same late-2020 acquisition (viacomcbs selling cnet media group to red ventures) put gamefaqs under new ownership; it has been suggested on the subreddit that the boards may be in danger
02:34:20<thuban>(op says "The website was bought by a new company recently and they have said that most of the people who visit the site these days just wanna look at the faqs and the message boards are not visited much anymore..." but doesn't provide a source, zero points)
02:35:32<thuban>eyeballing the front page suggests several million topics, let alone pages, so probably not an archivebot job, but maybe worth looking into
02:37:06<Jack_Thompson>Would be pretty devastating to lose GameFAQs imo
02:37:11<Jack_Thompson>A lot of gaming history there
02:41:34<Krownest>Moved from doing everything manually to docker.
02:44:24<mgrandi>We can start a project for each, all it takes is one person to create a url list honestly
02:45:38<@JAA>thuban: 80 million topic IDs, post IDs are approaching a billion.
02:46:54<@JAA>Definitely no AB matter, obviously.
02:47:16<@arkiver>reuters losing it
02:47:31<@arkiver>JAA: nice, could do a tiny project
02:48:04<mgrandi>So on the reuters thing, they might have just been migrating stuff
02:48:06<mgrandi>https://twitter.com/felixsalmon/status/1389672394353258498
02:50:37<mgrandi>But is giant bomb owned by the same parent company as gamefaqs?
02:52:29<thuban>mgrandi: yes, they were both part of viacomcbs
02:55:55<thuban>JAA: ids look sequential, but (a) board urls appear to require a name slug and (b) while i suspect thread ids are unique sitewide, thread urls appear to require the board id/name
02:56:57<@JAA>Yep, I found the same thing. Looking for a URL that just uses the topic ID, but haven't seen anything yet.
02:57:15<thuban>that and their 'one board for every single game, browse by system' would make enumeration doable but nontrivial
03:00:46<thuban>(oh, and the non-game boards have equally bespoke categorization and listing)
03:03:32<mgrandi>Do they require a name slug?
03:03:37<mgrandi>Those are usually optional
03:04:10<thuban>the boards? looks like, yes (unless there is a secret alternate url scheme)
03:04:25<@JAA>Nope, slug is optional: https://gamefaqs.gamespot.com/boards/234547-/79440096
03:04:29<@JAA>But you need the ID.
03:04:57<thuban>oh! i was fooled because i didn't leave the _trailing hyphen_
03:06:18<mgrandi>Cool
03:06:34<mgrandi>Although one might want the full slug since that's what people would want
03:06:45<mgrandi>Maybe doing a header only request for that url will return the full url?
03:07:39<thuban>nope
03:08:50<thuban>the thread and (more importantly) pagination links all include the slug, though, so it would be easily had
03:09:28<@JAA>We're obviously not going to bruteforce 300k board IDs times 80M topic IDs.
03:09:58<thuban>no, of course not
03:11:14<@JAA>But we could use it to bruteforce the boards without having to traverse their games list etc.
03:11:46<thuban>right right, and if necessary (seems likely) use them to spider topics
03:12:00<@JAA>Hmm, https://gamefaqs.gamespot.com/boards/3- only shows topics from the past week.
03:14:46<thuban>aggressive pruning? some of the game boards definitely have very old posts
03:17:12<@JAA>Smells like it, yeah. https://gamefaqs.gamespot.com/boards/3-poll-of-the-day/77020198 existed in late 2018: https://web.archive.org/web/20181007202729/https://gamefaqs.gamespot.com/boards/3-poll-of-the-day/77020198
03:21:05<@JAA>Some boards have access restrictions: https://gamefaqs.gamespot.com/boards/306-gamefaqs-usa-atlantic
03:21:48<mgrandi>I'm not saying brute forcing, all of the board IDs should be discoverable
03:21:52<thuban>er, and by "very old" i mean 2008; gamefaqs appears to have had boards (of some form) as far back as 2000.
03:24:06nerdguy1138 quits [Ping timeout: 250 seconds]
03:24:46<@JAA>Wikipedia mentions that boards were shared between GameFAQs and GameSpot between 2004 and 2012.
03:25:04<@JAA>I've only seen a couple posts from before 2012, so that seems related.
03:26:19DogsRNice quits [Read error: Connection reset by peer]
03:27:25<@JAA>There are also topics that don't show up on the corresponding board: https://gamefaqs.gamespot.com/boards/11-sballin/62745571
03:27:36<mgrandi>They say they have boards per game
03:28:07<mgrandi>https://gamefaqs.gamespot.com/boards/533287-super-mario-sunshine?page=33 has posts from 2008
03:29:02<mgrandi>https://gamefaqs.gamespot.com/boards/198848-super-mario-64?page=61 maybe 2008 is as far back as they go?
03:30:23<mgrandi>https://gamefaqs.gamespot.com/boards/197341-final-fantasy-vii?page=1247 also 2008
03:31:04<nuroten>arkiver, SketchTheCow: what was the question about RTHK? was someone searching for translation, general suggestions on what to focus on backing up or ...
03:34:41systwi quits [Quit: Give me your HAND, and I'll help you across.]
03:38:18nerdguy1138 (nerdguy1138) joins
03:39:00<nuroten>if the question is about who might be interested in an archive of the materials — maybe independent online media, haven't got a name but there was a museum that displayed items from the Umbrella movement
03:44:42qw3rty_ joins
03:45:48<nuroten>https://www.newschoolfreepress.com/2020/09/30/we-are-all-hongkongers-an-art-exhibit-that-recorded-a-revolution/
03:47:55<nuroten>the LIHKG forum is frequented by local residents, they will probably be able to provide some names
03:48:22qw3rty__ quits [Ping timeout: 250 seconds]
03:54:00<nuroten>local museums and libraries may have the connections but I'm guessing most won't take them anymore
03:57:14systwi (systwi) joins
03:58:40etnguyen03 quits [Client Quit]
04:00:49<nuroten>thuban: I'm slowly wading through the list of Letter to Hong Kong, at least to grab the audio files, no need to look into that one in particular
04:06:40<nuroten>back later, thanks archiveteam and anyone else helping with the RTHK things!
04:14:52atphoenix quits [Read error: Connection reset by peer]
04:15:31atphoenix (atphoenix) joins
04:39:09<mgrandi>Later :)
05:05:38<@JAA>I was going to write to MeriStation, but their contact form is broken (404 after submission). Welp.
05:09:17Jonboy3451 joins
05:11:47Jonboy345 quits [Ping timeout: 258 seconds]
05:45:57guest00014 joins
05:49:46<guest00014>All I can say is that you are doing a great job. You've done so much for archiving Internet sites. And the Internet is so unexpectedly (for me) fragile... Have you heard of codepad.org, which hasn't responded for several weeks? That is terrifying (in a way)...
06:04:02<thuban>the good news is that the 'streaming' version of _hong kong connection_ is in 720p (as opposed to the 'archive', which is 480p)
06:05:19<thuban>the bad news is that it will therefore take one zillion years to download
06:05:50<mgrandi>@guest00014: what about codepad?
06:08:35<nuroten>thuban: haha ... whichever you think is the better approach
06:11:57pawbs quits [Quit: My ZNC server died. Probably updating my kernel...]
06:11:59<guest00014>@mgrandi codepad.org used to be an online code interpreter (you know, like jsfiddle, but for several languages), created by Steven Hazel back in 2008. People used it to save code under user accounts, etc. It seems like it just disappeared in March 2021, without notice...
06:17:09<thuban>arkiver: each 'Hong Kong Connection' episode comes in a 720p 'vod' version hosted as a playlist with segments, and a 480p 'archive' version hosted as a single video. i plan to download the higher-quality versions to upload to ia, but which do we want for the wbm? both?
06:20:15<thuban>(whoops, actually there are several segmented versions for each ep, at different qualities; i've just only been paying attention to the best)
06:24:16<guest00014>And since it had no CORS restrictions, it used to be a cosy place to store JS code and then embed the raw code as part of bookmarklets etc. (e.g. on pastebin disabling CORS is paid-only, hastebin uses CORS, etc.). A bit terrifying (and maybe unexpected) is that there is no info about what has really happened. But it seems like it can happen to any site...
06:46:56LeGoupil joins
07:07:08hooway joins
07:25:01VukkyWork (VukkyWork) joins
07:32:52guest0001476 joins
07:35:29guest00014 quits [Ping timeout: 244 seconds]
07:38:59lennier2 joins
07:41:30lennier1 quits [Ping timeout: 250 seconds]
07:41:30lennier2 is now known as lennier1
07:59:15guest0001476 quits [Ping timeout: 244 seconds]
08:45:30BlueMaxima quits [Client Quit]
08:55:53Arcorann (Arcorann) joins
09:09:55<masterX244>Was off for the night. Only linked that apple vs epic box.com link since i got a shitty upload at my end and no quick way to bounce the pdfs over my server for faster upload
09:14:24Arcorann_ joins
09:16:24Arcorann quits [Ping timeout: 250 seconds]
10:11:57<@OrIdow6>masterX244: "Only linked"? I don't understand what it was you did
10:59:10Inhonion quits [Remote host closed the connection]
11:01:59Inhonion joins
11:08:08<masterX244>referring to a link i posted earlier in this chat
11:08:28<masterX244>was in response to a comment that it doesn't work with the WBM
11:09:45<masterX244>I got really slow upload ==>600MB takes a while for upload. Linked that folder here yesterday so others are aware of it
11:32:02VukkyWork quits [Remote host closed the connection]
12:32:27Peca21 joins
12:33:21<Peca21>I am getting this error in warrior Retrying after 60 seconds...
12:33:21<Peca21>exit code 5 for Item y0iGzQ6W
12:33:33<Peca21>max connections (-1) reached??? why -1
12:35:18<rewby>For what project?
12:37:22<Peca21>Pastebin
12:38:56<Peca21>also this
12:38:57<Peca21>Retrying after 60 seconds...
12:38:58<Peca21>RsyncUpload for Item gGR6rR0w
12:39:20<Peca21>NVM thats the same thing
12:40:23<@Kaz>EggplantN: your box?
12:40:37<Peca21>I switched to reddit now but it wont switch because its stuck on this error
12:40:47<@EggplantN>pastebin?
12:40:47<@EggplantN>ye
12:49:16serx joins
12:52:41<Peca21>reddit works fine for me
13:15:36serx quits [Remote host closed the connection]
13:17:02Peca21 quits [Remote host closed the connection]
13:36:04hooway_ joins
13:36:04hooway quits [Read error: Connection reset by peer]
13:40:21mary joins
13:58:30<yarrow>Does anyone know how 130k items got added to the Yahoo Answers archive between today and yesterday?
13:59:39<rewby>kid urls were still responding for a few hours, so they tried scraping them
14:00:08<@EggplantN>A question for #noanswers
14:01:45<yarrow>I was inexplicably banned from there despite personally creating over 3% of the Yahoo Answers archive, recruiting many people to the project, and taking time to answer questions in the channel =\
14:06:12<yarrow>I think maybe you were annoyed that when someone asked how much we had archived I joked that it was 69.420%?
14:08:34<yarrow>Please understand I'm a volunteer, I gave up my time for this, I spent my own money on spinning up a lot of VMs, and it took away from my obligations and relationships to devote my attention to this project. I just ask to be treated with basic respect and decency.
14:10:39Mateon1 quits [Remote host closed the connection]
14:10:48Mateon1 joins
14:16:15etnguyen03 (etnguyen03) joins
14:26:42<@Kaz>checking..
14:38:05Jonboy3451 quits [Read error: Connection reset by peer]
14:44:21Jonboy345 joins
15:11:56spirit joins
15:20:13nuroten quits [Remote host closed the connection]
15:20:50nuroten joins
15:24:12<nuroten>nyany: looks like framasoft is already on deathwatch but maybe a request could be made to queue the pastebin earlier?
15:27:25Arcorann_ quits [Ping timeout: 258 seconds]
15:36:42<ThreeHeadedMonkey>Looks like you'd have to provide the decryption key as part of the URL in order to view anything on framabin or you'll just see an error message
15:38:11<ThreeHeadedMonkey>Although the encrypted ciphertext is always downloaded even if the key is missing, so that could be saved
15:38:36<ThreeHeadedMonkey>Although saving a bunch of encrypted messages without key doesn't exactly sound very useful...
15:38:41<ThreeHeadedMonkey>keys*
15:43:24Qub3d quits [Client Quit]
15:46:21<@hook54321>nyany: apparently they were set to expire after a week by default
15:49:14<nyany>nuroten / hook54321 ah i should have checked dw first
15:49:24<nyany>i just happened to stumble upon it lol
15:54:08<nuroten>yeah, they've been slowly winding down their hosted services for some time, a few at a time, so maybe it's not a bad idea to ask if backups haven't already been scheduled
16:03:20<nuroten>the closing schedule: https://alt.framasoft.org/en/
16:07:01second (second) joins
16:09:28sec^nd quits [Ping timeout: 255 seconds]
16:09:28second is now known as sec^nd
16:10:47<ThreeHeadedMonkey>"We refuse to become the « default » solution and to monopolize your uses and attention" - What a creative excuse for deleting their users' data
16:10:48<@arkiver>nyany: any public data on framabin?
16:15:47<nuroten>they are a small non-profit that probably wanted to raise awareness about open source, let people try out the apps. the sunset is happening slowly at least, not disappearing overnight
16:17:00<nuroten>they don't really have a business model to sustain the hosting, it was done through crowdfunding basically
16:18:24sec^nd quits [Remote host closed the connection]
16:18:44sec^nd (second) joins
16:24:08<nuroten>the initial announcement was back in September 2019, most of the stuff is still online but read-only
16:24:29<lunik1>it might be worth reaching out to them directly?
16:55:57hooway_ quits [Read error: Connection reset by peer]
16:56:07hooway joins
17:01:52<nyany>I don't know if that's possible because it's an encrypted pastebin
17:02:27<nyany>arkiver: from a quick glance no, but crawls contain links back to their service
17:06:47LeonardoSaponara joins
17:07:32LeonardoSaponara quits [Client Quit]
17:19:30<betamax>Has anyone looked at archiving the websites / social media from the UK elections tomorrow?
17:19:33<betamax>("UK elections" => actually parliament elections for Scotland / Wales, and council elections for England)
17:19:53<betamax>I can get a lot of websites / twitters / facebooks / etc... from democracy club
17:20:36<betamax>But I recall from a month or so ago that there were issues archiving from facebook / instagram due to rate limiting - anyone know if this is still the case (JAA?)
17:23:05<@JAA>betamax: That is still the case, and in fact Facebook has become even worse recently.
17:23:26<betamax>Is twitter still OK? (with the latest version of snscrape?)
17:23:49<@JAA>Yeah
17:24:14<betamax>OK, I'll focus on candidate + party websites and twitter first
17:40:49madcarbs quits [Changing host]
17:40:50madcarbs (madcarbs) joins
18:02:16russss (russss) joins
18:05:46<@JAA>MeriStation is gone, by the way.
18:06:26<Jake>:(
18:06:50DogsRNice (Webuser299) joins
18:20:14@OrIdow6 quits [Ping timeout: 250 seconds]
18:34:51<thuban>so, uh... liveleak is not actually gone. the front page and 'browse' are redirecting, but channel pages and individual video items are still there. probably not for long. are we going to get on this?
18:40:40<Ryz>*breathes in heavily* We better get a load of LiveLeak as /much/ as possible before it completely BOMBS itself!
18:40:54<Ryz>JAA, arkiver, etc ^
18:41:23DogsRNice quits [Ping timeout: 258 seconds]
18:42:12<thuban>very little xhr on video pages (video urls in html as <video> <source>, related item links in html wrapped in some js); channels are less nice
18:42:31<Ryz>Provide as much information as possible on how the content can be accessed
18:42:42DogsRNice (Webuser299) joins
18:43:09<thuban>example video: https://www.liveleak.com/v?t=rpjp29oypa
18:43:22<thuban>example channel: https://www.liveleak.com/v/cute
18:43:44<Jake>is there an easy way to discover channels?
18:44:02<thuban>search is also working: https://www.liveleak.com/list?q=test&a=list&submit=Submit
18:44:12<Ryz>New IRC channel suggestion: liveleaked ?
18:44:28<thuban>deadleak, surely
18:44:47<Ryz>Waiting for JAA for approval
18:45:05<Ryz>livedry?
18:45:35<thuban>^ whoops, my mistake: channel and search pages both include results in the html, no api finagling needed
18:45:46<holbrooke>"live die repeat" -> "live-die-liveleak"
18:46:44<@JAA>++deadleak
18:47:06<Ryz>Waiting for the channel to openm
18:47:07<Ryz>*open
18:47:28<Jake>deadleak is my favorite of the bunch
18:47:31Doranwen likes deadleak, even though she can't really help with this one
18:48:10<@JAA>Created
18:48:23<Ryz>My contribution is saying obvious and crappy channel names and hope others come up with a more creative name :p
18:48:27<Ryz>#deadleak
19:18:52VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
19:20:32<@arkiver>#liveleaked
19:20:43<@arkiver>^ thats the channel
19:21:37VerifiedJ (VerifiedJ) joins
19:22:18<@arkiver>use #deadleak
19:32:18<Terbium>is there any existing tool to dedupe a bunch of WARC files by digest/hash? I have a bunch of WARCs with dupe record payloads.
19:45:20<@JAA>If anyone here knows/understands Bulgarian and could tell me what a guy in a short 35-second video is roughly saying, please get in touch.
19:45:33<@JAA>Yes, this is archival-related. :-)
19:54:01<gazorpazorp>I'm Bulgarian
19:54:18<nyany>JAA: ^
19:54:43<@JAA>Wonderful!
19:56:01<gazorpazorp>I'll translate, just send something to translate :)
19:57:42<@JAA>See PM
20:16:27mutantmnky quits [Ping timeout: 258 seconds]
20:16:44mutantmnky (mutantmonkey) joins
20:55:57spirit quits [Client Quit]
21:11:03<crispyalice2>Any word on when the parler stuff is going to be made public?
21:23:32<mgrandi>No idea how the upload is going
21:23:58LeGoupil quits [Client Quit]
21:24:48<crispyalice2>theres a collection on the archive.org with a bunch of items but its privated
21:24:56dm4v quits [Client Quit]
21:25:59dm4v joins
21:26:01dm4v quits [Changing host]
21:26:01dm4v (dm4v) joins
21:26:59<mgrandi>I'm not sure if that is from the project, it might be marked as private due to the weird legal issues around it, would have to ask
21:36:35hooway quits [Read error: Connection reset by peer]
21:36:37hooway_ joins
21:41:59IKI quits [Remote host closed the connection]
21:54:52marked quits [Client Quit]
21:55:30marked joins
22:12:59nerdguy1138 quits [Ping timeout: 258 seconds]
22:24:14hooway_ quits [Client Quit]
22:28:09nerdguy1138 (nerdguy1138) joins
22:40:17BlueMaxima joins
23:15:08OrIdow6 (OrIdow6) joins
23:15:08@ChanServ sets mode: +o OrIdow6