| 00:13:15 | <@OrIdow6> | It would be nice to have this glencoe stuff moved to a channel |
| 00:14:10 | <@arkiver> | agreed |
| 00:14:29 | <@arkiver> | any ideas? |
| 00:19:54 | <yay> | #glencoh-no :p |
| 00:20:26 | <yay> | or #glencohno without the - |
| 00:22:28 | <TheTechRobo> | #glenc-ohno |
| 00:22:30 | <TheTechRobo> | ? |
| 00:22:43 | <TheTechRobo> | or without the dash, I don't mind |
| 00:31:14 | | driib8 (driib) joins |
| 00:33:19 | <yay> | I'm partial to #glenc-ohno |
| 00:33:52 | | driib quits [Ping timeout: 245 seconds] |
| 00:33:52 | | driib8 is now known as driib |
| 00:40:04 | | Arcorann (Arcorann) joins |
| 01:02:34 | | dm4v_ joins |
| 01:04:36 | | dm4v quits [Ping timeout: 265 seconds] |
| 01:04:36 | | dm4v_ is now known as dm4v |
| 01:04:37 | | dm4v is now authenticated as dm4v |
| 01:04:37 | | dm4v quits [Changing host] |
| 01:04:37 | | dm4v (dm4v) joins |
| 01:37:10 | <h2ibot> | JustAnotherArchivist edited Internet infrastructure (+604, Add NTP): https://wiki.archiveteam.org/?diff=48657&oldid=48635 |
| 01:50:08 | <thuban> | <@arkiver> is everything just single pages like http://glencoe.mheducation.com/sites/2138132181/information_center_view0/ , or is there some deeper structure sometimes? |
| 01:51:57 | <thuban> | there is deeper structure (but i believe it is always crawlable with standard wpull extraction). previous examples/discussion: https://hackint.logs.kiska.pw/archiveteam-bs/20220520#c318850 |
| 01:52:30 | <thuban> | (i like #glencohno) |
| 01:59:02 | <Jake> | Did you see the list above in the pdf thuban? |
| 02:05:25 | <thuban> | yes, interesting albeit known incomplete. |
| 02:07:03 | <thuban> | unfortunately the ban prevents me from poking around in the www.glencoe.com or highered.mheducation.com domains |
| 02:09:30 | <thuban> | former doesn't really look enumerable; latter seems to have the same url structure as glencoe.mheducation.com, but at this point i have little hope for being able to search it unless someone discovers something clever |
| 02:11:41 | <yay> | hmm, didn't know that highered.mheducation.com existed |
| 02:12:45 | <yay> | would it be possible to ask Warrior instances to help brute-force it? |
| 02:14:45 | <thuban> | possible, yes; practical, i have doubts |
| 02:15:10 | <yay> | how many are there, anyways? |
| 02:16:23 | <thuban> | warrior instances? |
| 02:16:46 | <yay> | yep |
| 02:17:48 | <thuban> | i'm not sure how many there are in the general fleet. usually when there's a project on much of the work is contributed by a few volunteers who spin up huge numbers of docker containers |
| 02:18:11 | <thuban> | (https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker) |
| 02:18:31 | <yay> | I see |
| 02:18:48 | <yay> | I don't know how many more requests/s the glencoe servers can support, too |
| 02:19:20 | <yay> | probably not much? |
| 02:20:52 | <thuban> | hard to say. (project trackers support centralized rate limiting, for what that's worth) |
| 02:26:09 | <@JAA> | DPoS projects won't get anywhere near 7k req/s from one CPU. More like 7. |
| 02:27:00 | <@JAA> | So it'd be a gigantic waste of energy compared to what we've been doing. |
| 02:27:45 | | march_happy quits [Read error: Connection reset by peer] |
| 02:28:03 | <yay> | on the bright side, 1,000,000,000 / 2.628e+6 seconds in a month only gives us 380.5 requests/s |
| 02:28:31 | <@JAA> | Yeah, we were going much faster than needed. |
| 02:28:45 | <@JAA> | Although we'll still need time to archive the sites themselves after the bruteforcing. |
| 02:29:21 | <@JAA> | Aanyway, channel... I don't like glencoh-no, but glencohno and glenc-ohno both seem fine to me. |
| 02:29:45 | <TheTechRobo> | ditto for the channel |
| 02:30:16 | <yay> | thuban likes #glencohno, and so it shall be |
| 02:30:37 | <yay> | (either that or #glenc-ohno since there's several people there already) |
| 02:30:52 | <Jake> | Incomplete as in, missing any actual books or missing those test sites we found, or just unknown? |
| 02:31:41 | <yay> | let's move to #glencohno |
| 02:33:19 | | march_happy (march_happy) joins |
| 02:45:48 | <klg> | recently Yahoo Japan stopped serving some of its content to European addresses |
| 02:46:16 | <klg> | I miss direct access to chiebukuro :'( |
| 02:46:22 | <@OrIdow6> | Do you know why? |
| 02:46:46 | <klg> | they don't give reason, just "From Wednesday, April 6, 2022, Yahoo! JAPAN is no longer available in the EEA and the United Kingdom" |
| 02:47:36 | <Jake> | (Possibly GDPR related?) |
| 02:47:58 | <Jake> | Appears to be fine from US still. |
| 02:49:29 | <@OrIdow6> | https://www.theverge.com/2022/2/1/22911965/yahoo-japan-europe-offline-regulations-compliance-gdpr |
| 02:49:33 | <klg> | that would be my guess, but it's a bit late to notice GDPR just now; anyway doesn't seem to be shitting down |
| 02:49:58 | | yay quits [Ping timeout: 265 seconds] |
| 02:50:24 | <klg> | shutting* |
| 02:51:00 | <Jake> | hilarious. Only like 4 years late. |
| 02:53:23 | | qwertyasdfuiopghjkl joins |
| 03:07:43 | <Doranwen> | that's Yahoo for you, lol |
| 03:08:28 | <Doranwen> | I recall referring to it as "stunningly incompetent" at least once during the Yahoo Groups project |
| 03:10:14 | <@JAA> | Yahoo! Japan is completely unrelated to Yahoo! elsewhere though. The only thing they have in common is the brand. |
| 03:10:30 | <@JAA> | As I understand it, anyway. |
| 03:19:29 | | yay joins |
| 03:30:37 | | sec^nd quits [Remote host closed the connection] |
| 03:30:45 | <tech234a> | thuban, yay: as for Warrior fleet size there's 12 Warriors set to auto according to https://warriorhq.archiveteam.org/ though I'm not sure how accurate that site is anymore. Note that it only includes Warriors and not project-specific containers. |
| 03:31:44 | | sec^nd (second) joins |
| 03:32:03 | <Jake> | (I'd imagine it's MUCH higher than 12, even for just warrior.) |
| 03:33:58 | <@JAA> | Yeah, no way that's accurate. |
| 03:35:00 | <tech234a> | Yeah I don't think all the Warriors register themselves with the database to get an ID from warriorhq |
| 03:45:21 | | yay is now authenticated as yay |
| 04:18:45 | <@OrIdow6> | goat.me/goat.at coming along |
| 04:19:03 | <@OrIdow6> | I increased the speed, hopefully it should finish in time |
| 04:30:31 | | Swicher joins |
| 04:30:51 | <Swicher> | Hello everyone, according to https://twitter.com/Hispachan/status/1531053620694659072 it seems that hispachan.org is going to close in less than 48 hours (there is even a countdown on the page), would it be possible to archive the site using the Warrior? Thank you very much in advance. |
| 04:34:17 | <@OrIdow6> | Swicher: according to Google Translate, the Tweet makes a distinction between something happening on May 31 and something on June 12; can you clarify what it is? |
| 04:35:40 | <yay> | something about "eliminating all(toda) pages" |
| 04:37:45 | <tech234a> | It sort of looks like read only May 31, full deletion June 12? |
| 04:38:36 | <@OrIdow6> | I am asking for a spanish speaker |
| 04:38:47 | <@JAA> | Based on the counter on the homepage and the post IDs in the various boards, it looks like posts also get continuously deleted, so we'd only be archiving recent activity? |
| 04:39:29 | <Swicher> | Yes, what happens is that supposedly the site will remain as "read only" after May 31 until June 12 (and then they will delete everything). I say supposedly because the site administration usually does things from one moment to the next without warning and I don't have much confidence that they keep their word (for example, yesterday they deleted a very popular board called /mx/ and only gave an hour's notice). |
| 04:39:30 | <Swicher> | It would be more than anything a preventive backup, but if you want to wait until June 1, I can understand. |
| 04:40:20 | <@OrIdow6> | JAA: https://www.hispachan.org/reglas/ -> What Google translate gives as "WHY HAS MY THREAD OR POST BEEN DELETED? " supports that |
| 04:42:06 | <@JAA> | It's also how chans typically operate, so it wouldn't surprise me one bit. |
| 04:42:33 | <Swicher> | @OrIdow6, my native language is Spanish (if I take a long time to respond, it's because I'm using a translator). @JAA the site is an anonymous image board like 4chan, so you can already get an idea of how it works. |
| 04:44:09 | <@OrIdow6> | Don't see anything substantial on the Bibanon wiki |
| 04:44:49 | <Swicher> | ...but unlike 4chan, for Hispachan there aren't many working archive projects, only http://hispafiles.ru/ but there you have to archive threads manually and in the Wayback Machine the site is blocked. |
| 04:44:55 | <@JAA> | https://hispafiles.ru/ calls itself an 'Archivo de Hispachan'. |
| 04:45:01 | <@JAA> | Heh |
| 04:45:12 | <@JAA> | Ah, right. |
| 04:46:15 | <@OrIdow6> | I also see mentions of a wiki somewhere |
| 04:46:53 | <@JAA> | Hmm https://archive.org/details/hispachan_wiki |
| 04:47:06 | <@OrIdow6> | Oh, it closed |
| 04:47:08 | <@OrIdow6> | yeah |
| 05:04:41 | <h2ibot> | Themadprogramer edited Discourse (+47, /* Active Discourses */ added Elastic Stack): https://wiki.archiveteam.org/?diff=48658&oldid=48595 |
| 05:04:42 | <h2ibot> | Arcorann edited 4chan (+2, /* yuki.la */ fix heading): https://wiki.archiveteam.org/?diff=48659&oldid=48634 |
| 05:05:18 | | yay quits [Ping timeout: 265 seconds] |
| 05:28:47 | | jtagcat6 quits [Quit: Bye!] |
| 05:29:05 | | jtagcat6 (jtagcat) joins |
| 05:32:30 | <Swicher> | From what I read in the #archivebot topic, the Warrior seems to be saturated. Do you prefer that I come back in 2 days to request the archive again (if Hispachan is still accessible) or can you start now? |
| 05:36:21 | <tech234a> | Swicher: ArchiveBot is separate from the Warrior |
| 05:36:46 | <h2ibot> | Switchnode edited Deathwatch (+213, /* 2022 */ add hispachan): https://wiki.archiveteam.org/?diff=48660&oldid=48656 |
| 05:37:05 | <@OrIdow6> | Which is immaterial, since this is more a question of what there is to archive |
| 05:38:31 | <@JAA> | (And whether we want to, given what we have/haven't been doing about image boards in the past due to questionable content etc.) |
| 05:43:30 | <Swicher> | Well, I don't know if the site will disappear on June 12 or if it will before because of what I said above, so I thought that an emergency job could be put in place to cover the entire site just in case. |
| 05:43:31 | <Swicher> | JAA, as far as I've seen the site is quite responsible for moderating problematic content, so I doubt there's anything to do with legal issues or something else. |
| 05:44:08 | <@JAA> | Ok, that sounds better than most *chans then. :-) |
| 06:09:18 | | DiscantX joins |
| 07:00:32 | | march_happy quits [Ping timeout: 245 seconds] |
| 07:00:57 | | march_happy (march_happy) joins |
| 07:06:11 | <pabs> | I asked the Ubuntu/Canonical sysadmins to bring back lococouncil.ubuntu.com so it can go in AB one last time shut down. they are asking which IP addresses to allow access to the service in their firewall. does AB have defined IPs? is there a way to do that? |
| 07:06:24 | <pabs> | er, one last time before shut down |
| 07:08:57 | | pronoiac joins |
| 07:09:41 | <pronoiac> | Hey all! Would this be a good place to suggest a site that might be worth archiving? |
| 07:13:03 | <@OrIdow6> | pronoiac: Yes |
| 07:13:23 | | tech234a|m leaves |
| 07:13:25 | <@Sanqui|m> | pabs: that would be possible, somebody (like JAA, sorry!) would have to pick a pipeline |
| 07:13:26 | <@Sanqui|m> | pronoiac: indeed! |
| 07:16:00 | <pronoiac> | Cool |
| 07:16:43 | <pronoiac> | context: I was reading this blog post - https://raymii.org/s/blog/Using_a_Windows_Mobile_2003_PDA_hp_ipaq_in_2022_including_whatsapp.html |
| 07:17:19 | <pronoiac> | It linked this collection of Pocket PC games - https://oldhandhelds.com/?dir=Pocket%20Pc%20Software/Games/ |
| 07:18:18 | <pronoiac> | Poking around, it feels like it might be worth crawling. I could crawl the whole thing, but I lack tools for sorting individual files automatically. |
| 07:19:17 | <pronoiac> | Of possible specific note is https://oldhandhelds.com/?dir=Full-Dump |
| 07:19:59 | <pronoiac> | The tarball in there was picked up by the Wayback Machine on 2022-01-10. |
| 07:21:09 | <@Sanqui|m> | interesting |
| 07:21:51 | <@Sanqui|m> | I actually tried to archive oldhandhelds.com with ArchiveBot once but an 83GB file is a bit too much for archivebot at this time |
| 07:22:22 | <@Sanqui|m> | so the job crashed due to running out of disk space 😅 |
| 07:22:27 | <pronoiac> | I'm downloading it right now |
| 07:22:29 | <@Sanqui|m> | I'll make a note of it though |
| 07:22:50 | <@Sanqui|m> | nvm, it's already in my knowledge base. lol |
| 07:23:29 | <pronoiac> | would uploading it to the Internet Archive cause the same crash? |
| 07:24:00 | <@Sanqui|m> | no, you can go right ahead and do that! |
| 07:25:04 | <pronoiac> | Ugh, that would take almost a day to upload from here. |
| 07:25:46 | <pronoiac> | I'm in San Francisco. I think the last time I visited the Archive and tried to upload over wifi, it was slow. |
| 07:26:30 | <@Sanqui|m> | I think IA is sorta slow for everybody |
| 07:27:31 | <pronoiac> | I mean, I think the connection right there was slower than my home connection |
| 07:27:49 | <@OrIdow6> | <Sanqui|m> I think IA is sorta slow for everybody |
| 07:28:05 | <@OrIdow6> | Which remains an accurate observation even though there are OOM differences of what "slow" is |
| 07:34:39 | | Arcorann quits [Remote host closed the connection] |
| 07:34:39 | <pronoiac> | Ok, I'll try to get the tarball and upload it, and you've entered it on your to-do list, so I think we're good. Thanks! |
| 07:34:39 | | chrismeller quits [Remote host closed the connection] |
| 07:34:39 | | onetruth quits [Remote host closed the connection] |
| 07:34:39 | | dm4v quits [Client Quit] |
| 07:34:39 | | Iki quits [Remote host closed the connection] |
| 07:34:43 | | dm4v joins |
| 07:34:44 | | dm4v is now authenticated as dm4v |
| 07:34:44 | | dm4v quits [Changing host] |
| 07:34:44 | | dm4v (dm4v) joins |
| 07:34:45 | | Iki joins |
| 07:34:49 | | onetruth joins |
| 07:35:07 | | chrismeller (chrismeller) joins |
| 07:35:30 | | chrismeller quits [Remote host closed the connection] |
| 07:36:36 | | chrismeller (chrismeller) joins |
| 07:37:00 | | chrismeller quits [Remote host closed the connection] |
| 07:38:06 | | chrismeller (chrismeller) joins |
| 07:38:30 | | chrismeller quits [Remote host closed the connection] |
| 07:39:36 | | chrismeller (chrismeller) joins |
| 07:40:00 | | chrismeller quits [Remote host closed the connection] |
| 07:40:46 | | Arcorann (Arcorann) joins |
| 07:41:06 | | chrismeller (chrismeller) joins |
| 07:41:30 | | chrismeller quits [Remote host closed the connection] |
| 07:42:36 | | chrismeller (chrismeller) joins |
| 07:42:48 | | pronoiac leaves |
| 07:43:00 | | chrismeller quits [Remote host closed the connection] |
| 07:44:06 | | chrismeller (chrismeller) joins |
| 07:44:30 | | chrismeller quits [Remote host closed the connection] |
| 07:45:36 | | chrismeller (chrismeller) joins |
| 07:46:00 | | chrismeller quits [Remote host closed the connection] |
| 07:46:23 | | chrismeller (chrismeller) joins |
| 09:41:54 | | T31M_ joins |
| 09:42:16 | | T31M quits [Client Quit] |
| 09:42:16 | | jtagcat6 quits [Client Quit] |
| 09:42:16 | | mikael quits [Client Quit] |
| 09:42:16 | | Iki quits [Remote host closed the connection] |
| 09:42:16 | | dm4v quits [Client Quit] |
| 09:42:16 | | Stiletto quits [Remote host closed the connection] |
| 09:42:16 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 09:42:16 | | T31M_ is now known as T31M |
| 09:42:18 | | jtagcat6 (jtagcat) joins |
| 09:42:20 | | dm4v joins |
| 09:42:20 | | mikael joins |
| 09:42:22 | | Iki joins |
| 09:42:29 | | dm4v is now authenticated as dm4v |
| 09:42:29 | | dm4v quits [Changing host] |
| 09:42:29 | | dm4v (dm4v) joins |
| 09:43:01 | | Stiletto joins |
| 10:18:40 | | qwertyasdfuiopghjkl joins |
| 10:23:02 | | nepeat quits [Quit: ZNC - https://znc.in] |
| 10:48:43 | | nepeat (nepeat) joins |
| 10:58:52 | | wickedplayer494 quits [Ping timeout: 245 seconds] |
| 11:04:52 | <pabs> | JAA: any thoughts on my AB IP address question from above? |
| 11:09:25 | | wickedplayer494 joins |
| 11:09:44 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 11:16:08 | | sec^nd quits [Remote host closed the connection] |
| 11:19:16 | | seednode4943 quits [Client Quit] |
| 11:19:23 | | seednode4943 (seednode) joins |
| 11:19:26 | | sec^nd (second) joins |
| 11:19:48 | | nepeat quits [Client Quit] |
| 11:22:09 | | nepeat (nepeat) joins |
| 11:32:36 | | ertzuio joins |
| 11:32:38 | | Webuser513 joins |
| 11:32:54 | | Webuser513 quits [Remote host closed the connection] |
| 11:33:39 | | ertzuio quits [Remote host closed the connection] |
| 11:34:17 | | qwertyasdfuiopghjkl93 joins |
| 11:34:44 | | qwertyasdfuiopghjkl93 quits [Client Quit] |
| 11:35:42 | | chrismeller quits [Ping timeout: 265 seconds] |
| 11:35:49 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 11:35:58 | | qwertyasdfuiopghjkl joins |
| 12:18:51 | | wickedplayer494 quits [Ping timeout: 265 seconds] |
| 12:46:53 | | march_happy quits [Ping timeout: 265 seconds] |
| 12:47:40 | | HP_Archivist (HP_Archivist) joins |
| 13:22:10 | | LeGoupil joins |
| 13:26:31 | <pabs> | is it possible to AB a directory? this person is retiring from Debian https://ftp-master.debian.org/users/twerner/ |
| 13:27:15 | <pabs> | (see the emeritus process note on https://nm.debian.org/person/twerner/ ) |
| 13:38:02 | | DiscantX quits [Ping timeout: 245 seconds] |
| 13:50:30 | <thuban> | pabs: yes, the default settings for `!archive` should handle it just fine |
| 13:51:09 | <pabs> | great! thanks to whoever does it :) |
| 13:55:39 | <pabs> | is it best to put AB requests here or in #archivebot? |
| 13:56:48 | <thuban> | probably #archivebot (but i've already mentioned this one there) |
| 13:58:35 | <pabs> | ok, will do in future |
| 13:59:42 | | Arcorann quits [Ping timeout: 245 seconds] |
| 14:03:39 | <thuban> | running now :) |
| 14:08:48 | | Mateon1 quits [Remote host closed the connection] |
| 14:09:45 | | Mateon1 joins |
| 14:15:00 | | HP_Archivist quits [Client Quit] |
| 14:42:10 | | wickedplayer494 joins |
| 14:42:28 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 14:49:59 | | Swicher quits [Client Quit] |
| 15:00:59 | | yay joins |
| 15:01:15 | | yay is now authenticated as yay |
| 15:26:15 | | yay quits [Remote host closed the connection] |
| 15:33:39 | | march_happy (march_happy) joins |
| 16:03:05 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 16:04:58 | | yay (yay) joins |
| 16:09:01 | | qwertyasdfuiopghjkl joins |
| 16:27:09 | | nerdguy1138 quits [Ping timeout: 265 seconds] |
| 16:28:27 | | march_happy quits [Ping timeout: 245 seconds] |
| 16:37:12 | | bonga quits [Ping timeout: 245 seconds] |
| 16:38:12 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 16:38:20 | | bonga joins |
| 16:40:46 | | Lord_Nightmare (Lord_Nightmare) joins |
| 16:42:58 | | nerdguy1138 (nerdguy1138) joins |
| 16:50:58 | | yay quits [Ping timeout: 265 seconds] |
| 17:02:28 | | Caspian joins |
| 17:06:08 | | Caspian quits [Remote host closed the connection] |
| 17:09:06 | | spirit joins |
| 17:20:29 | | niku quits [Remote host closed the connection] |
| 17:28:16 | | yay (yay) joins |
| 17:31:03 | | yay leaves |
| 17:31:49 | | yay (yay) joins |
| 18:31:04 | | a joins |
| 18:31:19 | | a quits [Remote host closed the connection] |
| 19:25:25 | | DogsRNice (Webuser299) joins |
| 19:33:18 | | DogsRNice quits [Remote host closed the connection] |
| 19:34:00 | | DogsRNice (Webuser299) joins |
| 20:25:01 | | nerdguy1138 quits [Client Quit] |
| 20:25:52 | | nerdguy1138 (nerdguy1138) joins |
| 20:28:55 | | michaelblob quits [Read error: Connection reset by peer] |
| 20:30:19 | | michaelblob (michaelblob) joins |
| 20:31:09 | | spirit quits [Client Quit] |
| 20:31:51 | | michaelblob quits [Read error: Connection reset by peer] |
| 20:47:57 | | michaelblob (michaelblob) joins |
| 21:34:35 | | LeGoupil quits [Client Quit] |
| 21:56:13 | | DiscantX joins |
| 22:05:49 | | march_happy (march_happy) joins |
| 22:12:46 | | kn1003 joins |
| 22:13:52 | | kn100 quits [Ping timeout: 245 seconds] |
| 22:13:52 | | kn1003 is now known as kn100 |
| 22:16:20 | <@OrIdow6> | Goat.me/goat.at download is waiting on one large straggler item |
| 22:56:39 | | BlueMaxima joins |
| 23:07:58 | | DiscantX quits [Ping timeout: 265 seconds] |
| 23:38:46 | | geezabiscuit quits [Ping timeout: 265 seconds] |
| 23:46:21 | | geezabiscuit (geezabiscuit) joins |