00:05:15Arcorann (Arcorann) joins
00:28:06HP_Archivist (HP_Archivist) joins
00:50:32<TheTechRobo>How ironic is it that I can't find an AT repo?
00:54:01<@JAA>Which one?
00:54:26<TheTechRobo>Could've sworn I put it here. https://github.com/ArchiveTeam/shutdownify-grab
00:56:09<Jake>According to WBM that always 404'd.
00:56:55<h2ibot>TheTechRobo edited LiveJournal (-45, The IRC channel isn't registered or configured,…): https://wiki.archiveteam.org/?diff=48883&oldid=47338
00:57:00<@JAA>What was it supposed to be?
00:57:18<TheTechRobo>JAA: It's linked from here. https://wiki.archiveteam.org/index.php/Shutdownify
00:57:34<TheTechRobo>Not sure if it's a private repository or deleted, but it's not accessible.
00:59:51<joepie91|m>mmm https://gizmodo.com/shutdownify-is-fake-1725170019
01:00:20<@JAA>I don't think that repo ever existed.
01:00:39<TheTechRobo>Is it possible that the grab scripts were never completed?
01:00:40<@JAA>Chances are it was added to the wiki page in anticipation of getting created later.
01:00:47<TheTechRobo>Ah
01:05:26<TheTechRobo>This twitter account is actually a goldmine https://twitter.com/status_updates
01:05:55tzt (tzt) joins
01:07:56<h2ibot>TheTechRobo edited LiveJournal (+85): https://wiki.archiveteam.org/?diff=48884&oldid=48883
01:09:57<TheTechRobo>From JAA's talk page:
01:09:59<TheTechRobo>> Bangladesh gov blocked about 178 news portals sometime ago. Could you help archiving them? The list is at https://www.banglatribune.com/national/707022/১৭৮টি-অনিবন্ধিত-নিউজ-পোর্টাল-বন্ধ-করলো-বিটিআরসি Hasional (talk) 10:14, 21 June 2022 (UTC)
01:21:46BlueMaxima quits [Read error: Connection reset by peer]
01:27:13tzt quits [Read error: Connection reset by peer]
01:27:59<h2ibot>TheTechRobo edited LISWiki (-48, I thought I'd already fixed this...): https://wiki.archiveteam.org/?diff=48887&oldid=48841
01:29:48<thuban>^ i'll ocr the list
01:30:00tzt (tzt) joins
01:32:53<@OrIdow6>We did that
01:33:02<@OrIdow6>I remember trying to work OCR around that
01:33:15<@OrIdow6>I believe Ryz was managing it
01:33:16<TheTechRobo>Oh, interesting. I hadn't remembered.
01:33:20<@OrIdow6>thuban
01:36:41<thuban>hm, i see some ab jobs from after the publication of that article but before the talk page message
01:36:59<thuban>but spot-checking suggests that only the first 100 were covered
01:40:15<@OrIdow6>What kind of "spot-checking" is that? You may be getting tripped up by the fact that many were already dead
01:40:34<@OrIdow6>Check your #ab logs
01:42:39<thuban>i don't log in #archivebot
01:46:19<thuban>does look like a lot of them are dead though. however, i think we should retrieve (or reassemble) a list of the live ones to add to urls-sources
02:19:07<h2ibot>TheTechRobo edited Distributed recursive crawls (-3, Update status): https://wiki.archiveteam.org/?diff=48888&oldid=48430
02:46:03HP_Archivist quits [Client Quit]
03:00:45sec^nd quits [Remote host closed the connection]
03:01:35sec^nd (second) joins
03:10:16march_happy quits [Ping timeout: 240 seconds]
03:10:29march_happy (march_happy) joins
03:14:46march_happy quits [Ping timeout: 240 seconds]
03:15:11march_happy (march_happy) joins
03:23:46sec^nd quits [Ping timeout: 240 seconds]
03:23:58BlueMaxima joins
03:26:46march_happy quits [Ping timeout: 240 seconds]
03:27:03march_happy (march_happy) joins
03:28:06sec^nd (second) joins
03:31:16march_happy quits [Ping timeout: 240 seconds]
03:32:22march_happy (march_happy) joins
03:37:13march_happy quits [Ping timeout: 265 seconds]
03:38:05march_happy (march_happy) joins
03:40:10march_happy quits [Read error: Connection reset by peer]
03:55:35qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
03:57:47michaelblob_ (michaelblob) joins
04:01:23michaelblob quits [Ping timeout: 265 seconds]
04:05:00michaelblob (michaelblob) joins
04:05:16michaelblob_ quits [Ping timeout: 240 seconds]
04:50:28thuban quits [Ping timeout: 240 seconds]
04:52:17thuban joins
04:53:58qwertyasdfuiopghjkl joins
04:56:00Arcorann quits [Ping timeout: 265 seconds]
05:45:40h3ndr1k quits [Ping timeout: 265 seconds]
06:21:15thuban quits [Read error: Connection reset by peer]
06:21:35h3ndr1k (h3ndr1k) joins
06:21:48thuban joins
06:24:52sec^nd quits [Remote host closed the connection]
06:25:29sec^nd (second) joins
07:00:02<Ryz>Twas trying to find some posts regarding Diablo II since playing a fair amount of Diablo II: Resurrected; https://www.diabloii.net/forums/threads/token-of-absolution.751221/ doesn't work anymore, but I already archived it back in 2021 March
07:00:14<Ryz>This is why I do my proactive archives like this nowadays <#>;
07:32:06<Ryz>Also a reminder, if there's something that you hold dear, you should seriously /seriously/ archive it~
07:56:25<systwi_>^ I say this also applies to physical media, such as photos, homemade CDs/DVDs, VHS/βeta tapes, etc. & also things like buildings/monuments, interviews with elderly family members, the list goes on.
07:56:35<systwi_>Soooo many things to save >_<;
07:57:54Arcorann (Arcorann) joins
08:00:31<thuban>i couldn't find a complete list of the proscribed bangladeshi news links in the #-bs logs, so here is one (all _sic_): https://transfer.archivete.am/J8foX/bangladesh_links.txt
08:02:09<thuban>and here's a list of those i confirmed were working (and not just parking pages): https://transfer.archivete.am/12WUw4/bangladesh_links_filtered.txt
08:06:01<thuban>it would be nice to put those in urls-sources, but nb: one (https://www.shompadak.com/) is behind cloudflare, and a few others (http://www.ctgpost.com/, http://dailybdnews.net/, http://www.eccnews.net/index.php) don't seem quite on topic and may be squatters
08:23:21qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
08:37:40<datechnoman>thuban I'd link those news sites on the urls project channel as news sites are scanned hourly for new articles etc
09:08:56BlueMaxima quits [Read error: Connection reset by peer]
09:43:46sec^nd quits [Ping timeout: 240 seconds]
09:45:49sec^nd (second) joins
09:59:55tech_exorcist (tech_exorcist) joins
10:08:15tech_exorcist quits [Write error: Broken pipe]
10:09:07tech_exorcist (tech_exorcist) joins
10:10:20<systwi_>Not sure when the Something Awful forums made this change (I don't visit there), but I noticed that threads older than "six months or so" [sic] are paywalled: http://forums.somethingawful.com/showthread.php?threadid=3423670
10:10:35<systwi_>"Viewing this content requires the archives upgrade. Regular users can only view threads from the last six months or so. Users with archives access can view old threads dating all the way back to 2001. WOWZERS!"
10:11:54<systwi_>I'm not entirely familiar with the forum but I have heard that it's pretty popular and, I think, non-PC (which probably explains the paywall).
10:20:23tech_exorcist_ (tech_exorcist) joins
10:22:46tech_exorcist quits [Ping timeout: 240 seconds]
10:25:54<@OrIdow6>https://forums.sufficientvelocity.com/threads/murazors-big-list-of-recommended-fanfics.1487/page-6 suggests the 6 months for the "archives upgrade" (the text to search) has been in place since at least 2013
10:27:15<systwi_>Thanks for researching this. :-)
10:29:44<@OrIdow6>Np
11:54:44sec^nd quits [Remote host closed the connection]
11:55:25sec^nd (second) joins
12:02:08Iki joins
12:34:04lennier1 quits [Ping timeout: 240 seconds]
13:14:02WesleyBidsnipes joins
13:28:10Bee joins
13:28:20Bee quits [Remote host closed the connection]
14:02:28wyatt8740 quits [Ping timeout: 240 seconds]
14:07:14wyatt8740 joins
14:27:46Arcorann quits [Ping timeout: 240 seconds]
14:52:50Nulo quits [Read error: Connection reset by peer]
15:03:38Nulo joins
15:13:48WesleyBidsnipes quits [Client Quit]
15:23:51jacobk quits [Ping timeout: 265 seconds]
15:47:04sec^nd quits [Remote host closed the connection]
15:47:44sec^nd (second) joins
15:48:47WesleyBidsnipes joins
15:52:32atphoenix_ is now known as atphoenix
15:58:10adamus1red quits [Quit: SigTerm]
15:59:44adamus1red (adamus1red) joins
16:21:09lennier1 joins
16:21:09hackbug quits [Quit: Lost terminal]
16:23:16hackbug (hackbug) joins
16:41:18AK quits [Quit: AK]
16:55:41jacobk joins
17:17:34AK (AK) joins
17:25:46Atom-- quits [Ping timeout: 240 seconds]
17:26:52qwertyasdfuiopghjkl joins
17:46:45LeGoupil joins
17:52:23tech_exorcist_ quits [Remote host closed the connection]
17:52:40tech_exorcist (tech_exorcist) joins
18:27:25tech_exorcist quits [Remote host closed the connection]
18:27:48tech_exorcist (tech_exorcist) joins
18:38:35tech_exorcist quits [Write error: Connection reset by peer]
18:39:13tech_exorcist (tech_exorcist) joins
18:48:55<h2ibot>Hasional edited ArchiveBot/Educational institutions/list (+208, /* Bangladesh */): https://wiki.archiveteam.org/?diff=48889&oldid=48675
18:49:55<h2ibot>NightHnh099 edited List of websites excluded from the Wayback Machine (+63, Added www.laurentien.com): https://wiki.archiveteam.org/?diff=48890&oldid=48831
18:49:56<h2ibot>Themadprogramer edited Discourse (+103, Category: Social Networks + /* Active…): https://wiki.archiveteam.org/?diff=48891&oldid=48788
18:49:57<h2ibot>Themadprogramer edited Template:Navigation box (+20, Added Discourse under Forums/Message boards): https://wiki.archiveteam.org/?diff=48892&oldid=48024
18:56:07AK quits [Remote host closed the connection]
18:57:26AK (AK) joins
19:01:22AK quits [Remote host closed the connection]
19:07:58AK (AK) joins
19:08:20<AK>Seems like scratch-grab ate up my storage (110GB+ per container), anyone else seeing this?
19:15:02luxx99 joins
19:15:08luxx99 quits [Remote host closed the connection]
19:20:44tech_exorcist quits [Write error: Broken pipe]
19:20:50tech_exorcist (tech_exorcist) joins
19:27:22LeGoupil quits [Client Quit]
19:27:30LeGoupil joins
19:39:50LeGoupil quits [Client Quit]
19:53:04jacobk quits [Ping timeout: 265 seconds]
20:28:27jacobk joins
20:37:53IDK (IDK) joins
20:48:10jacobk quits [Ping timeout: 265 seconds]
20:55:08jacobk joins
20:58:15jacobk quits [Remote host closed the connection]
20:59:39jacobk joins
21:04:46jacobk quits [Ping timeout: 240 seconds]
21:07:47lennier1 quits [Client Quit]
21:08:45lennier1 (lennier1) joins
21:14:18tech_exorcist quits [Remote host closed the connection]
21:14:45tech_exorcist (tech_exorcist) joins
22:09:01tech_exorcist quits [Client Quit]
22:09:19Stilett0 joins
22:09:22Stiletto quits [Ping timeout: 265 seconds]
22:11:24WesleyBidsnipes quits [Client Quit]
22:21:32WesleyBidsnipes joins
22:22:22<Ruk8>Hello everyone! Sorry if I bother you with this again. Like all my previous messages, I have another list of urls to be archived. Here's the link: https://transfer.archivete.am/v4UXP/urls2.txt
22:26:17WesleyBidsnipes quits [Ping timeout: 265 seconds]
22:26:47<systwi_>Hi Ruk8, there's no worry about bothering us (or me, at least). Could you explain your list and why you want it saved?
22:28:18<systwi_>Some small suggestions to your list: I see URLs like:
22:28:20<systwi_>https://static.wixstatic.com/media/eb7f03_e03521ce8b814bf3a931c4aee15d9e53~mv2.jpg/v1/fill/w_786,h_442,al_c,q_80,usm_0.66_1.00_0.01,enc_auto/eb7f03_e03521ce8b814bf3a931c4aee15d9e53~mv2.jpg
22:28:25<systwi_>which appear to work as:
22:28:29<systwi_>https://static.wixstatic.com/media/eb7f03_e03521ce8b814bf3a931c4aee15d9e53~mv2.jpg
22:28:46<systwi_>That should, in theory, be the unmodified image. ^
22:30:12<systwi_>Although likely a slight difference, the same applies here:
22:30:23<systwi_>https://static.wixstatic.com/media/eb7f03_bd53ecc423424f4aa12a5b69392b9f76~mv2_d_1270_2268_s_2.png?dn=20190113ayaka.png
22:30:25<systwi_>and:
22:30:28<systwi_>https://static.wixstatic.com/media/eb7f03_bd53ecc423424f4aa12a5b69392b9f76~mv2_d_1270_2268_s_2.png
22:31:33<Ruk8>This time it's all mixed up, sorry for that; most of it is not so important, but still not trash. The majority of the URLs are from (in order): a utauloid site (an anime/manga Japanese-themed vocal synthesizer for music), resales of some software from Open Cascade, archives of the brain map of the Allen Institute, required by the brain map explorer 2 toolkit, and finally some pages of a stenography blog.
22:31:41jacobk joins
22:32:31<Ruk8>I forgot to clear the urls 💀
22:32:55<Ruk8>releases*
22:32:56<systwi_>Ah, no worries, the URLs you have in the list look okay to me.
22:33:25<systwi_>I think, regarding cleaning up the URLs, it might be better to include both the original ones (like you have already) _and_ the clean ones.
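[editor's sketch] The cleanup systwi_ describes (dropping the resize suffix and the "?dn=" query from static.wixstatic.com links, while keeping the original URLs in the list too) could look roughly like the Python below. This is only an illustration: the function names clean_wix_url and expand are made up here, and the idea that everything from "/v1/" onward is a transform suffix is an assumption drawn from the two example URLs above, not from any Wix documentation.

```python
from urllib.parse import urlsplit, urlunsplit

WIX_HOST = "static.wixstatic.com"  # host taken from the example URLs in the log


def clean_wix_url(url: str) -> str:
    """Return the presumed unmodified-media URL for a wixstatic link.

    Assumption: anything from "/v1/" onward is a resize/transform suffix,
    and the query string (e.g. "?dn=...") is only a download-name hint.
    URLs on other hosts are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != WIX_HOST:
        return url
    path = parts.path
    idx = path.find("/v1/")
    if idx != -1:
        path = path[:idx]  # keep only ".../media/<file>"
    # Rebuild without query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def expand(urls):
    """Yield each original URL and its cleaned variant, skipping duplicates."""
    seen = set()
    for url in urls:
        for candidate in (url, clean_wix_url(url)):
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


if __name__ == "__main__":
    import sys

    # Usage: python clean_urls.py < urls2.txt > urls2_expanded.txt
    lines = (line.strip() for line in sys.stdin)
    for out in expand(line for line in lines if line):
        print(out)
```

Emitting both variants follows the suggestion above: the original URLs are what the pages actually reference, and the cleaned ones are only presumed to be the full-size files.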
22:35:23<systwi_>Do you have an estimate of the total size these will add up to?
22:36:53<Ruk8>more than 10gb, less than 60gb.
22:37:16<Ruk8>If that's necessary, I could do a more precise estimate
22:37:21<systwi_>Thanks, that sounds about accurate to me.
22:37:39<systwi_>That's okay, the estimate is okay.
22:37:47<systwi_>*the estimate you provided
22:38:40<systwi_>If you're able to also include the clean URLs in the list, I can run it through for you.
22:39:01<systwi_>I mean, I could either way, but it would be nice.
22:39:02<Ruk8>Yeah, sure
22:39:06<systwi_>Thanks.
22:52:07<Ruk8>Done, here's the list: https://transfer.archivete.am/OVa2v/urls2.txt
22:53:23<systwi_>Awesome, thanks. Will run it through #archivebot for you.
22:53:52<Ruk8>Thx a lot
22:54:07<Ruk8>Ultimately, I have another question
22:56:14<Jake>https://dontasktoask.com/ feel free to put the question in :)
22:57:35<systwi_>Jake has it right. :-)
23:00:37<Ruk8>As I said here some time ago, I own a website that has not been updated in about a year and a half (and it will not be). That site was probably not crawled by ArchiveBot or the Internet Archive's crawler, since no archive is present on the Wayback Machine. So... is there any way I could help simplify the process of crawling the pages, since I'm the webmaster? N.B. the website is hosted on Blogger; there is no hurry to archive the site, but obviously it will not
23:00:37<Ruk8>remain online forever.
23:03:03<Jake>Yeah. Just give us the URL and we'll give it a shot.
23:03:30<@OrIdow6>We have a system set up to recursively crawl sites fairly easily
23:04:19<Ruk8>here's the site: mag.greenwolves.xyz
23:07:36<systwi_>If there are any anti-bot mechanisms, like captcha pages or rate limiting, disabling those would make the archival process much, much easier.
23:13:10<Ruk8>I guess there is one from Google, since the site is hosted by them
23:13:20<thuban>i was going to suggest a sitemap, but i see you've already got one :)
23:14:08<Ruk8>Ruk8: And searching around there is no option to disable or change such things
23:15:57<systwi_>Hmm, too bad. :-/ Maybe, and hopefully, it won't be problematic during the archive.
23:18:40sepro quits [Quit: Ping timeout (120 seconds)]
23:19:05sepro (sepro) joins
23:27:08@OrIdow6 is now known as @OrIdow6^2
23:27:11@OrIdow6^2 is now known as @OrIdow6