| 00:05:15 | | Arcorann (Arcorann) joins |
| 00:28:06 | | HP_Archivist (HP_Archivist) joins |
| 00:50:32 | <TheTechRobo> | How ironic is it that I can't find an AT repo? |
| 00:54:01 | <@JAA> | Which one? |
| 00:54:26 | <TheTechRobo> | Could've sworn I put it here. https://github.com/ArchiveTeam/shutdownify-grab |
| 00:56:09 | <Jake> | According to WBM that always 404'd. |
| 00:56:55 | <h2ibot> | TheTechRobo edited LiveJournal (-45, The IRC channel isn't registered or configured,…): https://wiki.archiveteam.org/?diff=48883&oldid=47338 |
| 00:57:00 | <@JAA> | What was it supposed to be? |
| 00:57:18 | <TheTechRobo> | JAA: It's linked from here. https://wiki.archiveteam.org/index.php/Shutdownify |
| 00:57:34 | <TheTechRobo> | Not sure if it's a private repository or deleted, but it's not accessible. |
| 00:59:51 | <joepie91|m> | mmm https://gizmodo.com/shutdownify-is-fake-1725170019 |
| 01:00:20 | <@JAA> | I don't think that repo ever existed. |
| 01:00:39 | <TheTechRobo> | Is it possible that the grab scripts were never completed? |
| 01:00:40 | <@JAA> | Chances are it was added to the wiki page in anticipation of getting created later. |
| 01:00:47 | <TheTechRobo> | Ah |
| 01:05:26 | <TheTechRobo> | This twitter account is actually a goldmine https://twitter.com/status_updates |
| 01:05:55 | | tzt (tzt) joins |
| 01:07:56 | <h2ibot> | TheTechRobo edited LiveJournal (+85): https://wiki.archiveteam.org/?diff=48884&oldid=48883 |
| 01:09:57 | <TheTechRobo> | From JAA's talk page: |
| 01:09:59 | <TheTechRobo> | > Bangladesh gov blocked about 178 news portals sometime ago. Could you help archiving them? The list is at https://www.banglatribune.com/national/707022/১৭৮টি-অনিবন্ধিত-নিউজ-পোর্টাল-বন্ধ-করলো-বিটিআরসি Hasional (talk) 10:14, 21 June 2022 (UTC) |
| 01:21:46 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 01:27:13 | | tzt quits [Read error: Connection reset by peer] |
| 01:27:59 | <h2ibot> | TheTechRobo edited LISWiki (-48, I thought I'd already fixed this...): https://wiki.archiveteam.org/?diff=48887&oldid=48841 |
| 01:29:48 | <thuban> | ^ i'll ocr the list |
| 01:30:00 | | tzt (tzt) joins |
| 01:32:53 | <@OrIdow6> | We did that |
| 01:33:02 | <@OrIdow6> | I remember trying to work OCR around that |
| 01:33:15 | <@OrIdow6> | I believe Ryz was managing it |
| 01:33:16 | <TheTechRobo> | Oh, interesting. I hadn't remembered. |
| 01:33:20 | <@OrIdow6> | thuban |
| 01:36:41 | <thuban> | hm, i see some ab jobs from after the publication of that article but before the talk page message |
| 01:36:59 | <thuban> | but spot-checking suggests that only the first 100 were covered |
| 01:40:15 | <@OrIdow6> | What kind of "spot-checking" is that? You may be getting tripped up by the fact that many were already dead |
| 01:40:34 | <@OrIdow6> | Check your #ab logs |
| 01:42:39 | <thuban> | i don't log in #archivebot |
| 01:46:19 | <thuban> | does look like a lot of them are dead though. however, i think we should retrieve (or reassemble) a list of the live ones to add to urls-sources |
| 02:19:07 | <h2ibot> | TheTechRobo edited Distributed recursive crawls (-3, Update status): https://wiki.archiveteam.org/?diff=48888&oldid=48430 |
| 02:46:03 | | HP_Archivist quits [Client Quit] |
| 03:00:45 | | sec^nd quits [Remote host closed the connection] |
| 03:01:35 | | sec^nd (second) joins |
| 03:10:16 | | march_happy quits [Ping timeout: 240 seconds] |
| 03:10:29 | | march_happy (march_happy) joins |
| 03:14:46 | | march_happy quits [Ping timeout: 240 seconds] |
| 03:15:11 | | march_happy (march_happy) joins |
| 03:23:46 | | sec^nd quits [Ping timeout: 240 seconds] |
| 03:23:58 | | BlueMaxima joins |
| 03:26:46 | | march_happy quits [Ping timeout: 240 seconds] |
| 03:27:03 | | march_happy (march_happy) joins |
| 03:28:06 | | sec^nd (second) joins |
| 03:31:16 | | march_happy quits [Ping timeout: 240 seconds] |
| 03:32:22 | | march_happy (march_happy) joins |
| 03:37:13 | | march_happy quits [Ping timeout: 265 seconds] |
| 03:38:05 | | march_happy (march_happy) joins |
| 03:40:10 | | march_happy quits [Read error: Connection reset by peer] |
| 03:55:35 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 03:57:47 | | michaelblob_ (michaelblob) joins |
| 04:01:23 | | michaelblob quits [Ping timeout: 265 seconds] |
| 04:05:00 | | michaelblob (michaelblob) joins |
| 04:05:16 | | michaelblob_ quits [Ping timeout: 240 seconds] |
| 04:50:28 | | thuban quits [Ping timeout: 240 seconds] |
| 04:52:17 | | thuban joins |
| 04:53:58 | | qwertyasdfuiopghjkl joins |
| 04:56:00 | | Arcorann quits [Ping timeout: 265 seconds] |
| 05:45:40 | | h3ndr1k quits [Ping timeout: 265 seconds] |
| 06:21:15 | | thuban quits [Read error: Connection reset by peer] |
| 06:21:35 | | h3ndr1k (h3ndr1k) joins |
| 06:21:48 | | thuban joins |
| 06:24:52 | | sec^nd quits [Remote host closed the connection] |
| 06:25:29 | | sec^nd (second) joins |
| 07:00:02 | <Ryz> | Twas trying to find some posts regarding Diablo II since playing a fair amount of Diablo II: Resurrected; https://www.diabloii.net/forums/threads/token-of-absolution.751221/ doesn't work anymore, but I already archived it back in 2021 March |
| 07:00:14 | <Ryz> | This is why I do my proactive archives like this nowadays <#>; |
| 07:32:06 | <Ryz> | Also a reminder, if there's something that you hold dear, you should seriously /seriously/ archive it~ |
| 07:56:25 | <systwi_> | ^ I say this also applies to physical media, such as photos, homemade CDs/DVDs, VHS/βeta tapes, etc. & also things like buildings/monuments, interviews with elderly family members, the list goes on. |
| 07:56:35 | <systwi_> | Soooo many things to save >_<; |
| 07:57:54 | | Arcorann (Arcorann) joins |
| 08:00:31 | <thuban> | i couldn't find a complete list of the proscribed bangladeshi news links in the #-bs logs, so here is one (all _sic_): https://transfer.archivete.am/J8foX/bangladesh_links.txt |
| 08:02:09 | <thuban> | and here's a list of those i confirmed were working (and not just parking pages): https://transfer.archivete.am/12WUw4/bangladesh_links_filtered.txt |
| 08:06:01 | <thuban> | it would be nice to put those in urls-sources, but nb: one (https://www.shompadak.com/) is behind cloudflare, and a few others (http://www.ctgpost.com/, http://dailybdnews.net/, http://www.eccnews.net/index.php) don't seem quite on topic and may be squatters |
| 08:23:21 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 08:37:40 | <datechnoman> | thuban I'd link those news sites on the urls project channel as news sites are scanned hourly for new articles etc |
| 09:08:56 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 09:43:46 | | sec^nd quits [Ping timeout: 240 seconds] |
| 09:45:49 | | sec^nd (second) joins |
| 09:52:21 | | fangfufu is now authenticated as fangfufu |
| 09:59:55 | | tech_exorcist (tech_exorcist) joins |
| 10:08:15 | | tech_exorcist quits [Write error: Broken pipe] |
| 10:09:07 | | tech_exorcist (tech_exorcist) joins |
| 10:10:20 | <systwi_> | Not sure when the Something Awful forums made this change (I don't visit there), but I noticed that threads older than "six months or so" [sic] are paywalled: http://forums.somethingawful.com/showthread.php?threadid=3423670 |
| 10:10:35 | <systwi_> | "Viewing this content requires the archives upgrade. Regular users can only view threads from the last six months or so. Users with archives access can view old threads dating all the way back to 2001. WOWZERS!" |
| 10:11:54 | <systwi_> | I'm not entirely familiar with the forum but I have heard that it's pretty popular and, I think, non-PC (which probably explains the paywall). |
| 10:20:23 | | tech_exorcist_ (tech_exorcist) joins |
| 10:22:46 | | tech_exorcist quits [Ping timeout: 240 seconds] |
| 10:25:54 | <@OrIdow6> | https://forums.sufficientvelocity.com/threads/murazors-big-list-of-recommended-fanfics.1487/page-6 suggests the 6 months for the "archives upgrade" (the text to search) has been in place since at least 2013 |
| 10:27:15 | <systwi_> | Thanks for researching this. :-) |
| 10:29:44 | <@OrIdow6> | Np |
| 11:54:44 | | sec^nd quits [Remote host closed the connection] |
| 11:55:25 | | sec^nd (second) joins |
| 12:02:08 | | Iki joins |
| 12:34:04 | | lennier1 quits [Ping timeout: 240 seconds] |
| 13:14:02 | | WesleyBidsnipes joins |
| 13:28:10 | | Bee joins |
| 13:28:20 | | Bee quits [Remote host closed the connection] |
| 14:02:28 | | wyatt8740 quits [Ping timeout: 240 seconds] |
| 14:07:14 | | wyatt8740 joins |
| 14:27:46 | | Arcorann quits [Ping timeout: 240 seconds] |
| 14:52:50 | | Nulo quits [Read error: Connection reset by peer] |
| 15:03:38 | | Nulo joins |
| 15:13:48 | | WesleyBidsnipes quits [Client Quit] |
| 15:23:51 | | jacobk quits [Ping timeout: 265 seconds] |
| 15:47:04 | | sec^nd quits [Remote host closed the connection] |
| 15:47:44 | | sec^nd (second) joins |
| 15:48:47 | | WesleyBidsnipes joins |
| 15:52:32 | | atphoenix_ is now known as atphoenix |
| 15:58:10 | | adamus1red quits [Quit: SigTerm] |
| 15:59:44 | | adamus1red (adamus1red) joins |
| 16:21:09 | | lennier1 joins |
| 16:21:09 | | lennier1 is now authenticated as lennier1 |
| 16:21:09 | | hackbug quits [Quit: Lost terminal] |
| 16:23:16 | | hackbug (hackbug) joins |
| 16:41:18 | | AK quits [Quit: AK] |
| 16:55:41 | | jacobk joins |
| 17:17:34 | | AK (AK) joins |
| 17:25:46 | | Atom-- quits [Ping timeout: 240 seconds] |
| 17:26:52 | | qwertyasdfuiopghjkl joins |
| 17:46:45 | | LeGoupil joins |
| 17:52:23 | | tech_exorcist_ quits [Remote host closed the connection] |
| 17:52:40 | | tech_exorcist (tech_exorcist) joins |
| 18:27:25 | | tech_exorcist quits [Remote host closed the connection] |
| 18:27:48 | | tech_exorcist (tech_exorcist) joins |
| 18:38:35 | | tech_exorcist quits [Write error: Connection reset by peer] |
| 18:39:13 | | tech_exorcist (tech_exorcist) joins |
| 18:48:55 | <h2ibot> | Hasional edited ArchiveBot/Educational institutions/list (+208, /* Bangladesh */): https://wiki.archiveteam.org/?diff=48889&oldid=48675 |
| 18:49:55 | <h2ibot> | NightHnh099 edited List of websites excluded from the Wayback Machine (+63, Added www.laurentien.com): https://wiki.archiveteam.org/?diff=48890&oldid=48831 |
| 18:49:56 | <h2ibot> | Themadprogramer edited Discourse (+103, Category: Social Networks + /* Active…): https://wiki.archiveteam.org/?diff=48891&oldid=48788 |
| 18:49:57 | <h2ibot> | Themadprogramer edited Template:Navigation box (+20, Added Discourse under Forums/Message boards): https://wiki.archiveteam.org/?diff=48892&oldid=48024 |
| 18:56:07 | | AK quits [Remote host closed the connection] |
| 18:57:26 | | AK (AK) joins |
| 19:01:22 | | AK quits [Remote host closed the connection] |
| 19:07:58 | | AK (AK) joins |
| 19:08:20 | <AK> | Seems like scratch-grab ate up my storage, (110GB+) per container, anyone else seeing this? |
| 19:15:02 | | luxx99 joins |
| 19:15:08 | | luxx99 quits [Remote host closed the connection] |
| 19:20:44 | | tech_exorcist quits [Write error: Broken pipe] |
| 19:20:50 | | tech_exorcist (tech_exorcist) joins |
| 19:27:22 | | LeGoupil quits [Client Quit] |
| 19:27:30 | | LeGoupil joins |
| 19:39:50 | | LeGoupil quits [Client Quit] |
| 19:53:04 | | jacobk quits [Ping timeout: 265 seconds] |
| 20:28:27 | | jacobk joins |
| 20:37:53 | | IDK (IDK) joins |
| 20:48:10 | | jacobk quits [Ping timeout: 265 seconds] |
| 20:55:08 | | jacobk joins |
| 20:58:15 | | jacobk quits [Remote host closed the connection] |
| 20:59:39 | | jacobk joins |
| 21:04:46 | | jacobk quits [Ping timeout: 240 seconds] |
| 21:07:47 | | lennier1 quits [Client Quit] |
| 21:08:45 | | lennier1 (lennier1) joins |
| 21:14:18 | | tech_exorcist quits [Remote host closed the connection] |
| 21:14:45 | | tech_exorcist (tech_exorcist) joins |
| 22:09:01 | | tech_exorcist quits [Client Quit] |
| 22:09:19 | | Stilett0 joins |
| 22:09:22 | | Stiletto quits [Ping timeout: 265 seconds] |
| 22:11:24 | | WesleyBidsnipes quits [Client Quit] |
| 22:21:32 | | WesleyBidsnipes joins |
| 22:22:22 | <Ruk8> | Hello everyone! Sorry if I bother you with this again. Like all my previous messages, I have another list of urls to be archived. Here's the link: https://transfer.archivete.am/v4UXP/urls2.txt |
| 22:26:17 | | WesleyBidsnipes quits [Ping timeout: 265 seconds] |
| 22:26:47 | <systwi_> | Hi Ruk8, there's no worry about bothering us (or me, at least). Could you explain your list and why you want it saved? |
| 22:28:18 | <systwi_> | Some small suggestions to your list: I see URLs like: |
| 22:28:20 | <systwi_> | https://static.wixstatic.com/media/eb7f03_e03521ce8b814bf3a931c4aee15d9e53~mv2.jpg/v1/fill/w_786,h_442,al_c,q_80,usm_0.66_1.00_0.01,enc_auto/eb7f03_e03521ce8b814bf3a931c4aee15d9e53~mv2.jpg |
| 22:28:25 | <systwi_> | which appear to work as: |
| 22:28:29 | <systwi_> | https://static.wixstatic.com/media/eb7f03_e03521ce8b814bf3a931c4aee15d9e53~mv2.jpg |
| 22:28:46 | <systwi_> | That should, in theory, be the unmodified image. ^ |
| 22:30:12 | <systwi_> | Although likely a slight difference, the same applies here: |
| 22:30:23 | <systwi_> | https://static.wixstatic.com/media/eb7f03_bd53ecc423424f4aa12a5b69392b9f76~mv2_d_1270_2268_s_2.png?dn=20190113ayaka.png |
| 22:30:25 | <systwi_> | and: |
| 22:30:28 | <systwi_> | https://static.wixstatic.com/media/eb7f03_bd53ecc423424f4aa12a5b69392b9f76~mv2_d_1270_2268_s_2.png |
| 22:31:33 | <Ruk8> | This time it's all mixed up, sorry for that; most of it is not so important, but still not trash. The majority of the urls are from (in order): a utauloid site (an anime/manga Japanese-themed vocal synthesizer for music), resales of some software from open cascade, archives of the brain map of the allen institute, required by the brain map explorer 2 toolkit, and finally some pages of a stenography blog. |
| 22:31:41 | | jacobk joins |
| 22:32:31 | <Ruk8> | I forgot to clear the urls 💀 |
| 22:32:55 | <Ruk8> | releases* |
| 22:32:56 | <systwi_> | Ah, no worries, the URLs you have in the list look okay to me. |
| 22:33:25 | <systwi_> | I think, regarding cleaning up the URLs, it might be better to include both the original ones (like you have already) _and_ the clean ones. |
| 22:35:23 | <systwi_> | Do you have an estimate of the total size these will come to? |
| 22:36:53 | <Ruk8> | more than 10gb, less than 60gb. |
| 22:37:16 | <Ruk8> | If that's necessary, I could do a more precise estimate |
| 22:37:21 | <systwi_> | Thanks, that sounds about accurate to me. |
| 22:37:39 | <systwi_> | That's okay, the estimate is okay. |
| 22:37:47 | <systwi_> | *the estimate you provided |
| 22:38:40 | <systwi_> | If you're able to also include the clean URLs in the list, I can run it through for you. |
| 22:39:01 | <systwi_> | I mean, I could either way, but it would be nice. |
| 22:39:02 | <Ruk8> | Yeah, sure |
| 22:39:06 | <systwi_> | Thanks. |
| 22:52:07 | <Ruk8> | Done, here's the list: https://transfer.archivete.am/OVa2v/urls2.txt |
| 22:53:23 | <systwi_> | Awesome, thanks. Will run it through #archivebot for you. |
| 22:53:52 | <Ruk8> | Thx a lot |
| 22:54:07 | <Ruk8> | Lastly, I have another question |
| 22:56:14 | <Jake> | https://dontasktoask.com/ feel free to put the question in :) |
| 22:57:35 | <systwi_> | Jake has it right. :-) |
| 23:00:37 | <Ruk8> | As I said here some time ago, I own a website that has not been updated in about a year and a half (and it will not be). That site was probably not indexed by the archivebot or the internet archive's crawler, since no archive is present on the wayback machine. So... Is there any way I could help to simplify the process of crawling the pages, since I'm the webmaster? N.B. the website is hosted on blogger; there is no hurry to archive the site, but obviously it will not |
| 23:00:37 | <Ruk8> | remain online forever. |
| 23:03:03 | <Jake> | Yeah. Just give us the URL and we'll give it a shot. |
| 23:03:30 | <@OrIdow6> | We have a system set up to recursively crawl sites fairly easily |
| 23:04:19 | <Ruk8> | here's the site: mag.greenwolves.xyz |
| 23:07:36 | <systwi_> | If there are any anti-bot mechanisms, like captcha pages or rate limiting, disabling those would make the archival process much, much easier. |
| 23:13:10 | <Ruk8> | I guess there is one from Google, since the site is hosted by them |
| 23:13:20 | <thuban> | i was going to suggest a sitemap, but i see you've already got one :) |
| 23:14:08 | <Ruk8> | Ruk8: And searching around there is no option to disable or change such things |
| 23:15:57 | <systwi_> | Hmm, too bad. :-/ Maybe, and hopefully, it won't be problematic during the archive. |
| 23:18:40 | | sepro quits [Quit: Ping timeout (120 seconds)] |
| 23:19:05 | | sepro (sepro) joins |
| 23:27:08 | | @OrIdow6 is now known as @OrIdow6^2 |
| 23:27:11 | | @OrIdow6^2 is now known as @OrIdow6 |
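
The Wix URL cleanup that systwi_ suggests at 22:28 to 22:33 (keep the original URLs and add the "clean" ones) could be sketched roughly as follows. This is only an illustration: the function names `clean_wix_url` and `with_clean_urls` are made up here, and the rule that everything after `/media/<file>` plus the query string can be dropped is an assumption based solely on the two example URLs in the log, not on any documented Wix behaviour.

```python
from urllib.parse import urlsplit, urlunsplit

# Host taken from the example URLs in the log.
WIX_HOST = "static.wixstatic.com"


def clean_wix_url(url: str) -> str:
    """Return the presumed unmodified-image URL for a Wix media URL.

    Assumption (from the log's examples only): the path looks like
    /media/<file>[/v1/<transform>/...] and the original image lives at
    /media/<file> with no query string. Other URLs are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != WIX_HOST:
        return url
    segments = parts.path.split("/")  # ['', 'media', '<file>', 'v1', ...]
    if len(segments) < 3 or segments[1] != "media":
        return url
    clean_path = "/".join(segments[:3])
    # Drop the query string and fragment as well.
    return urlunsplit((parts.scheme, parts.netloc, clean_path, "", ""))


def with_clean_urls(urls):
    """Return each original URL followed by its clean form, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        for candidate in (url, clean_wix_url(url)):
            if candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result
```

Running `with_clean_urls` over the lines of a URL list keeps the order of the originals and places each clean variant directly after the URL it was derived from, which matches the "include both" suggestion rather than replacing anything.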