00:13:15<@OrIdow6>It would be nice to have this glencoe stuff moved to a channel
00:14:10<@arkiver>agreed
00:14:29<@arkiver>any ideas?
00:19:54<yay>#glencoh-no :p
00:20:26<yay>or #glencohno without the -
00:22:28<TheTechRobo>#glenc-ohno
00:22:30<TheTechRobo>?
00:22:43<TheTechRobo>or without the dash, I don't mind
00:31:14driib8 (driib) joins
00:33:19<yay>I'm partial to #glenc-ohno
00:33:52driib quits [Ping timeout: 245 seconds]
00:33:52driib8 is now known as driib
00:40:04Arcorann (Arcorann) joins
01:02:34dm4v_ joins
01:04:36dm4v quits [Ping timeout: 265 seconds]
01:04:36dm4v_ is now known as dm4v
01:04:37dm4v quits [Changing host]
01:04:37dm4v (dm4v) joins
01:37:10<h2ibot>JustAnotherArchivist edited Internet infrastructure (+604, Add NTP): https://wiki.archiveteam.org/?diff=48657&oldid=48635
01:50:08<thuban><@arkiver> is everything just single pages like http://glencoe.mheducation.com/sites/2138132181/information_center_view0/ , or is there some deeper structure sometimes?
01:51:57<thuban>there is deeper structure (but i believe it is always crawlable with standard wpull extraction). previous examples/discussion: https://hackint.logs.kiska.pw/archiveteam-bs/20220520#c318850
01:52:30<thuban>(i like #glencohno)
01:59:02<Jake>Did you see the list above in the pdf thuban?
02:05:25<thuban>yes, interesting albeit known incomplete.
02:07:03<thuban>unfortunately the ban prevents me from poking around in the www.glencoe.com or highered.mheducation.com domains
02:09:30<thuban>former doesn't really look enumerable; latter seems to have the same url structure as glencoe.mheducation.com, but at this point i have little hope for being able to search it unless someone discovers something clever
02:11:41<yay>hmm, didn't know that highered.mheducation.com existed
02:12:45<yay>would it be possible to ask Warrior instances to help brute-force it?
02:14:45<thuban>possible, yes; practical, i have doubts
02:15:10<yay>how many are there, anyways?
02:16:23<thuban>warrior instances?
02:16:46<yay>yep
02:17:48<thuban>i'm not sure how many there are in the general fleet. usually when there's a project on much of the work is contributed by a few volunteers who spin up huge numbers of docker containers
02:18:11<thuban>(https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker)
02:18:31<yay>I see
02:18:48<yay>I don't know how many more requests/s the glencoe servers can support, too
02:19:20<yay>probably not much?
02:20:52<thuban>hard to say. (project trackers support centralized rate limiting, for what that's worth)
02:26:09<@JAA>DPoS projects won't get anywhere near 7k req/s from one CPU. More like 7.
02:27:00<@JAA>So it'd be a gigantic waste of energy compared to what we've been doing.
02:27:45march_happy quits [Read error: Connection reset by peer]
02:28:03<yay>on the bright side, 1,000,000,000 / 2.628e+6 seconds in a month only gives us 380.5 requests/s
02:28:31<@JAA>Yeah, we were going much faster than needed.
02:28:45<@JAA>Although we'll still need time to archive the sites themselves after the bruteforcing.
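The rate arithmetic in the exchange above can be checked with a short sketch. It uses only the figures quoted in the chat (1,000,000,000 requests, 2.628e+6 seconds in a month, and roughly 7 requests/s per CPU for DPoS projects); the function names and the idea of dividing the rate evenly across workers are illustrative assumptions, not something taken from any tracker's actual configuration.

```python
import math

# Figure quoted in the chat: about 2.628e+6 seconds in a month.
SECONDS_PER_MONTH = 2.628e6


def required_rate(total_requests: float, seconds: float = SECONDS_PER_MONTH) -> float:
    """Average requests per second needed to finish within the given time."""
    return total_requests / seconds


def workers_needed(rate: float, per_worker_rate: float) -> int:
    """Hypothetical helper: workers required if each sustains per_worker_rate req/s."""
    return math.ceil(rate / per_worker_rate)


rate = required_rate(1_000_000_000)
print(f"{rate:.1f} requests/s")   # 380.5 requests/s
print(workers_needed(rate, 7.0))  # 55, assuming ~7 req/s per CPU as estimated above
```

This only covers the brute-forcing phase; as noted above, archiving the discovered sites afterwards needs additional time.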
02:29:21<@JAA>Aanyway, channel... I don't like glencoh-no, but glencohno and glenc-ohno both seem fine to me.
02:29:45<TheTechRobo>ditto for the channel
02:30:16<yay>thuban likes #glencohno, and so it shall be
02:30:37<yay>(either that or #glenc-ohno since there's several people there already)
02:30:52<Jake>Incomplete as in, missing any actual books or missing those test sites we found, or just unknown?
02:31:41<yay>let's move to #glencohno
02:33:19march_happy (march_happy) joins
02:45:48<klg>recently Yahoo Japan stopped serving some of its content to European addresses
02:46:16<klg>I miss direct access to chiebukuro :'(
02:46:22<@OrIdow6>Do you know why?
02:46:46<klg>they don't give reason, just "From Wednesday, April 6, 2022, Yahoo! JAPAN is no longer available in the EEA and the United Kingdom"
02:47:36<Jake>(Possibly GDPR related?)
02:47:58<Jake>Appears to be fine from US still.
02:49:29<@OrIdow6>https://www.theverge.com/2022/2/1/22911965/yahoo-japan-europe-offline-regulations-compliance-gdpr
02:49:33<klg>that would be my guess, but it's a bit late to notice GDPR just now; anyway doesn't seem to be shitting down
02:49:58yay quits [Ping timeout: 265 seconds]
02:50:24<klg>shutting*
02:51:00<Jake>hilarious. Only like 4 years late.
02:53:23qwertyasdfuiopghjkl joins
03:07:43<Doranwen>that's Yahoo for you, lol
03:08:28<Doranwen>I recall referring to it as "stunningly incompetent" at least once during the Yahoo Groups project
03:10:14<@JAA>Yahoo! Japan is completely unrelated to Yahoo! elsewhere though. The only thing they have in common is the brand.
03:10:30<@JAA>As I understand it, anyway.
03:19:29yay joins
03:30:37sec^nd quits [Remote host closed the connection]
03:30:45<tech234a>thuban, yay: as for Warrior fleet size there's 12 Warriors set to auto according to https://warriorhq.archiveteam.org/ though I'm not sure how accurate that site is anymore. Note that it only includes Warriors and not project-specific containers.
03:31:44sec^nd (second) joins
03:32:03<Jake>(I'd imagine it's MUCH higher than 12, even for just warrior.)
03:33:58<@JAA>Yeah, no way that's accurate.
03:35:00<tech234a>Yeah I don't think all the Warriors register themselves with the database to get an ID from warriorhq
04:18:45<@OrIdow6>goat.me/goat.at coming along
04:19:03<@OrIdow6>I increased the speed, hopefully it should finish in time
04:30:31Swicher joins
04:30:51<Swicher>Hello everyone, according to https://twitter.com/Hispachan/status/1531053620694659072 it seems that hispachan.org is going to close in less than 48 hours (there is even a countdown on the page), would it be possible to archive the site using the Warrior? Thank you very much in advance.
04:34:17<@OrIdow6>Swicher: according to Google Translate, the Tweet makes a distinction between something happening on May 31 and something on June 12; can you clarify what it is?
04:35:40<yay>something about "eliminating all(toda) pages"
04:37:45<tech234a>It sort of looks like read only May 31, full deletion June 12?
04:38:36<@OrIdow6>I am asking for a spanish speaker
04:38:47<@JAA>Based on the counter on the homepage and the post IDs in the various boards, it looks like posts also get continuously deleted, so we'd only be archiving recent activity?
04:39:29<Swicher>Yes, what happens is that supposedly the site will remain as "read only" after May 31 until June 12 (and then they will delete everything). I say supposedly because the site administration usually does things from one moment to the next without warning and I don't have much confidence that they keep their word (for example, yesterday they deleted a very popular board called /mx/ and only gave an hour's notice). It would be more than anything a
04:39:30<Swicher>preventive backup, but if you want to wait until June 1, I can understand.
04:40:20<@OrIdow6>JAA: https://www.hispachan.org/reglas/ -> What Google translate gives as "WHY HAS MY THREAD OR POST BEEN DELETED? " supports that
04:42:06<@JAA>It's also how chans typically operate, so it wouldn't surprise me one bit.
04:42:33<Swicher>@OrIdow6, my native language is Spanish (if I take a long time to respond, it's because I'm using a translator). @JAA the site is an anonymous image board like 4chan, so you can already get an idea of how it works.
04:44:09<@OrIdow6>Don't see anything substantial on the Bibanon wiki
04:44:49<Swicher>...but unlike 4chan, for Hispachan there aren't many working archive projects, only http://hispafiles.ru/ but there you have to archive threads manually and in the Wayback Machine the site is blocked.
04:44:55<@JAA>https://hispafiles.ru/ calls itself an 'Archivo de Hispachan'.
04:45:01<@JAA>Heh
04:45:12<@JAA>Ah, right.
04:46:15<@OrIdow6>I also see mentions of a wiki somewhere
04:46:53<@JAA>Hmm https://archive.org/details/hispachan_wiki
04:47:06<@OrIdow6>Oh, it closed
04:47:08<@OrIdow6>yeah
05:04:41<h2ibot>Themadprogramer edited Discourse (+47, /* Active Discourses */ added Elastic Stack): https://wiki.archiveteam.org/?diff=48658&oldid=48595
05:04:42<h2ibot>Arcorann edited 4chan (+2, /* yuki.la */ fix heading): https://wiki.archiveteam.org/?diff=48659&oldid=48634
05:05:18yay quits [Ping timeout: 265 seconds]
05:28:47jtagcat6 quits [Quit: Bye!]
05:29:05jtagcat6 (jtagcat) joins
05:32:30<Swicher>From what I read in the #archivebot topic, the Warrior seems to be saturated. Do you prefer that I come back in 2 days to request the archive again (if Hispachan is still accessible) or can you start now?
05:36:21<tech234a>Swicher: ArchiveBot is separate from the Warrior
05:36:46<h2ibot>Switchnode edited Deathwatch (+213, /* 2022 */ add hispachan): https://wiki.archiveteam.org/?diff=48660&oldid=48656
05:37:05<@OrIdow6>Which is immaterial, since this is more a question of what there is to archive
05:38:31<@JAA>(And whether we want to, given what we have/haven't been doing about image boards in the past due to questionable content etc.)
05:43:30<Swicher>Well, I don't know if the site will disappear on June 12 or if it will before because of what I said above, so I thought that an emergency job could be put in place to cover the entire site just in case.
05:43:31<Swicher>JAA, as far as I've seen the site is quite responsible for moderating problematic content, so I doubt there's anything to do with legal issues or something else.
05:44:08<@JAA>Ok, that sounds better than most *chans then. :-)
06:09:18DiscantX joins
07:00:32march_happy quits [Ping timeout: 245 seconds]
07:00:57march_happy (march_happy) joins
07:06:11<pabs>I asked the Ubuntu/Canonical sysadmins to bring back lococouncil.ubuntu.com so it can go in AB one last time shut down. they are asking which IP addresses to allow access to the service in their firewall. does AB have defined IPs? is there a way to do that?
07:06:24<pabs>er, one last time before shut down
07:08:57pronoiac joins
07:09:41<pronoiac>Hey all! Would this be a good place to suggest a site that might be worth archiving?
07:13:03<@OrIdow6>pronoiac: Yes
07:13:23tech234a|m leaves
07:13:25<@Sanqui|m>pabs: that would be possible, somebody (like JAA, sorry!) would have to pick a pipeline
07:13:26<@Sanqui|m>proniac: indeed!
07:16:00<pronoiac>Cool
07:16:43<pronoiac>context: I was reading this blog post - https://raymii.org/s/blog/Using_a_Windows_Mobile_2003_PDA_hp_ipaq_in_2022_including_whatsapp.html
07:17:19<pronoiac>It linked this collection of Pocket PC games - https://oldhandhelds.com/?dir=Pocket%20Pc%20Software/Games/
07:18:18<pronoiac>Poking around, it feels like it might be worth crawling. I could crawl the whole thing, but I lack tools for sorting individual files automatically.
07:19:17<pronoiac>Of possible specific note is https://oldhandhelds.com/?dir=Full-Dump
07:19:59<pronoiac>The tarball in there was picked up by the Wayback Machine on 2022-01-10.
07:21:09<@Sanqui|m>interesting
07:21:51<@Sanqui|m>I actually tried to archive oldhandhelds.com with ArchiveBot once but a 83GB file is a bit too much for archivebot at this time
07:22:22<@Sanqui|m>so the job crashed due to running out of disk space 😅
07:22:27<pronoiac>I'm downloading it right now
07:22:29<@Sanqui|m>I'll make a note of it though
07:22:50<@Sanqui|m>nvm, it's already in my knowledge base. lol
07:23:29<pronoiac>would uploading it to the Internet Archive cause the same crash?
07:24:00<@Sanqui|m>no, you can go right ahead and do that!
07:25:04<pronoiac>Ugh, that would take almost a day to upload from here.
07:25:46<pronoiac>I'm in San Francisco. I think the last time I visited the Archive and tried to upload over wifi, it was slow.
07:26:30<@Sanqui|m>I think IA is sorta slow for everybody
07:27:31<pronoiac>I mean, I think the connection right there was slower than my home connection
07:27:49<@OrIdow6><Sanqui|m> I think IA is sorta slow for everybody
07:28:05<@OrIdow6>Which remains an accurate observation even though there are OOM differences of what "slow" is
07:34:39Arcorann quits [Remote host closed the connection]
07:34:39<pronoiac>Ok, I'll try to get the tarball and upload it, and you've entered it on your to-do list, so I think we're good. Thanks!
07:34:39chrismeller quits [Remote host closed the connection]
07:34:39onetruth quits [Remote host closed the connection]
07:34:39dm4v quits [Client Quit]
07:34:39Iki quits [Remote host closed the connection]
07:34:43dm4v joins
07:34:44dm4v quits [Changing host]
07:34:44dm4v (dm4v) joins
07:34:45Iki joins
07:34:49onetruth joins
07:35:07chrismeller (chrismeller) joins
07:35:30chrismeller quits [Remote host closed the connection]
07:36:36chrismeller (chrismeller) joins
07:37:00chrismeller quits [Remote host closed the connection]
07:38:06chrismeller (chrismeller) joins
07:38:30chrismeller quits [Remote host closed the connection]
07:39:36chrismeller (chrismeller) joins
07:40:00chrismeller quits [Remote host closed the connection]
07:40:46Arcorann (Arcorann) joins
07:41:06chrismeller (chrismeller) joins
07:41:30chrismeller quits [Remote host closed the connection]
07:42:36chrismeller (chrismeller) joins
07:42:48pronoiac leaves
07:43:00chrismeller quits [Remote host closed the connection]
07:44:06chrismeller (chrismeller) joins
07:44:30chrismeller quits [Remote host closed the connection]
07:45:36chrismeller (chrismeller) joins
07:46:00chrismeller quits [Remote host closed the connection]
07:46:23chrismeller (chrismeller) joins
09:41:54T31M_ joins
09:42:16T31M quits [Client Quit]
09:42:16jtagcat6 quits [Client Quit]
09:42:16mikael quits [Client Quit]
09:42:16Iki quits [Remote host closed the connection]
09:42:16dm4v quits [Client Quit]
09:42:16Stiletto quits [Remote host closed the connection]
09:42:16qwertyasdfuiopghjkl quits [Client Quit]
09:42:16T31M_ is now known as T31M
09:42:18jtagcat6 (jtagcat) joins
09:42:20dm4v joins
09:42:20mikael joins
09:42:22Iki joins
09:42:29dm4v quits [Changing host]
09:42:29dm4v (dm4v) joins
09:43:01Stiletto joins
10:18:40qwertyasdfuiopghjkl joins
10:23:02nepeat quits [Quit: ZNC - https://znc.in]
10:48:43nepeat (nepeat) joins
10:58:52wickedplayer494 quits [Ping timeout: 245 seconds]
11:04:52<pabs>JAA: any thoughts on my AB IP address question from above?
11:09:25wickedplayer494 joins
11:16:08sec^nd quits [Remote host closed the connection]
11:19:16seednode4943 quits [Client Quit]
11:19:23seednode4943 (seednode) joins
11:19:26sec^nd (second) joins
11:19:48nepeat quits [Client Quit]
11:22:09nepeat (nepeat) joins
11:32:36ertzuio joins
11:32:38Webuser513 joins
11:32:54Webuser513 quits [Remote host closed the connection]
11:33:39ertzuio quits [Remote host closed the connection]
11:34:17qwertyasdfuiopghjkl93 joins
11:34:44qwertyasdfuiopghjkl93 quits [Client Quit]
11:35:42chrismeller quits [Ping timeout: 265 seconds]
11:35:49qwertyasdfuiopghjkl quits [Client Quit]
11:35:58qwertyasdfuiopghjkl joins
12:18:51wickedplayer494 quits [Ping timeout: 265 seconds]
12:46:53march_happy quits [Ping timeout: 265 seconds]
12:47:40HP_Archivist (HP_Archivist) joins
13:22:10LeGoupil joins
13:26:31<pabs>is it possible to AB a directory? this person is retiring from Debian https://ftp-master.debian.org/users/twerner/
13:27:15<pabs>(see the emeritus process note on https://nm.debian.org/person/twerner/ )
13:38:02DiscantX quits [Ping timeout: 245 seconds]
13:50:30<thuban>pabs: yes, the default settings for `!archive` should handle it just fine
13:51:09<pabs>great! thanks to whoever does it :)
13:55:39<pabs>is it best to put AB requests here or in #archivebot?
13:56:48<thuban>probably #archivebot (but i've already mentioned this one there)
13:58:35<pabs>ok, will do in future
13:59:42Arcorann quits [Ping timeout: 245 seconds]
14:03:39<thuban>running now :)
14:08:48Mateon1 quits [Remote host closed the connection]
14:09:45Mateon1 joins
14:15:00HP_Archivist quits [Client Quit]
14:42:10wickedplayer494 joins
14:49:59Swicher quits [Client Quit]
15:00:59yay joins
15:26:15yay quits [Remote host closed the connection]
15:33:39march_happy (march_happy) joins
16:03:05qwertyasdfuiopghjkl quits [Remote host closed the connection]
16:04:58yay (yay) joins
16:09:01qwertyasdfuiopghjkl joins
16:27:09nerdguy1138 quits [Ping timeout: 265 seconds]
16:28:27march_happy quits [Ping timeout: 245 seconds]
16:37:12bonga quits [Ping timeout: 245 seconds]
16:38:12Lord_Nightmare quits [Quit: ZNC - http://znc.in]
16:38:20bonga joins
16:40:46Lord_Nightmare (Lord_Nightmare) joins
16:42:58nerdguy1138 (nerdguy1138) joins
16:50:58yay quits [Ping timeout: 265 seconds]
17:02:28Caspian joins
17:06:08Caspian quits [Remote host closed the connection]
17:09:06spirit joins
17:20:29niku quits [Remote host closed the connection]
17:28:16yay (yay) joins
17:31:03yay leaves
17:31:49yay (yay) joins
18:31:04a joins
18:31:19a quits [Remote host closed the connection]
19:25:25DogsRNice (Webuser299) joins
19:33:18DogsRNice quits [Remote host closed the connection]
19:34:00DogsRNice (Webuser299) joins
20:25:01nerdguy1138 quits [Client Quit]
20:25:52nerdguy1138 (nerdguy1138) joins
20:28:55michaelblob quits [Read error: Connection reset by peer]
20:30:19michaelblob (michaelblob) joins
20:31:09spirit quits [Client Quit]
20:31:51michaelblob quits [Read error: Connection reset by peer]
20:47:57michaelblob (michaelblob) joins
21:34:35LeGoupil quits [Client Quit]
21:56:13DiscantX joins
22:05:49march_happy (march_happy) joins
22:12:46kn1003 joins
22:13:52kn100 quits [Ping timeout: 245 seconds]
22:13:52kn1003 is now known as kn100
22:16:20<@OrIdow6>Goat.me/goat.at download is waiting on one large straggler item
22:56:39BlueMaxima joins
23:07:58DiscantX quits [Ping timeout: 265 seconds]
23:38:46geezabiscuit quits [Ping timeout: 265 seconds]
23:46:21geezabiscuit (geezabiscuit) joins