| 00:00:37 | | chrismeller (chrismeller) joins |
| 00:01:01 | | chrismeller quits [Remote host closed the connection] |
| 00:01:02 | <Jake> | Yes, everything times out, pings and HTTP request. Blanket ban on that IP. |
| 00:01:24 | | chrismeller (chrismeller) joins |
| 00:03:02 | | nerdguy1138 (nerdguy1138) joins |
| 00:05:55 | | chrismeller quits [Remote host closed the connection] |
| 00:05:55 | | onetruth quits [Remote host closed the connection] |
| 00:06:06 | | onetruth joins |
| 00:06:37 | | chrismeller (chrismeller) joins |
| 00:07:01 | | chrismeller quits [Remote host closed the connection] |
| 00:08:07 | | chrismeller (chrismeller) joins |
| 00:08:31 | | chrismeller quits [Remote host closed the connection] |
| 00:09:37 | | chrismeller (chrismeller) joins |
| 00:10:01 | | chrismeller quits [Remote host closed the connection] |
| 00:11:07 | | chrismeller (chrismeller) joins |
| 00:11:11 | | qw3rty quits [Read error: Connection reset by peer] |
| 00:11:31 | | chrismeller quits [Remote host closed the connection] |
| 00:11:32 | | Atom__ quits [Read error: Connection reset by peer] |
| 00:11:51 | | qw3rty joins |
| 00:12:14 | | Atom__ joins |
| 00:12:37 | | chrismeller (chrismeller) joins |
| 00:13:01 | | chrismeller quits [Remote host closed the connection] |
| 00:14:07 | | chrismeller (chrismeller) joins |
| 00:14:31 | | chrismeller quits [Remote host closed the connection] |
| 00:15:37 | | chrismeller (chrismeller) joins |
| 00:16:01 | | chrismeller quits [Remote host closed the connection] |
| 00:16:22 | | yay joins |
| 00:17:02 | <yay> | might want to consider starting to download everything before they implement measures to prevent "DOS attempts" |
| 00:17:07 | | chrismeller (chrismeller) joins |
| 00:17:31 | | chrismeller quits [Remote host closed the connection] |
| 00:18:37 | | chrismeller (chrismeller) joins |
| 00:19:01 | | chrismeller quits [Remote host closed the connection] |
| 00:20:07 | | chrismeller (chrismeller) joins |
| 00:20:31 | | chrismeller quits [Remote host closed the connection] |
| 00:21:37 | | chrismeller (chrismeller) joins |
| 00:22:01 | | chrismeller quits [Remote host closed the connection] |
| 00:22:59 | | yay quits [Client Quit] |
| 00:23:07 | | chrismeller (chrismeller) joins |
| 00:23:28 | | wickedplayer494 quits [Remote host closed the connection] |
| 00:23:31 | | chrismeller quits [Remote host closed the connection] |
| 00:24:37 | | chrismeller (chrismeller) joins |
| 00:25:01 | | chrismeller quits [Remote host closed the connection] |
| 00:26:07 | | chrismeller (chrismeller) joins |
| 00:26:31 | | chrismeller quits [Remote host closed the connection] |
| 00:27:37 | | chrismeller (chrismeller) joins |
| 00:28:01 | | chrismeller quits [Remote host closed the connection] |
| 00:29:07 | | chrismeller (chrismeller) joins |
| 00:29:31 | | chrismeller quits [Remote host closed the connection] |
| 00:29:49 | | wickedplayer494 joins |
| 00:30:37 | | chrismeller (chrismeller) joins |
| 00:30:52 | <thuban> | JAA: thanks for putting that in! i should probably make a matching pr for the manpage (and one for the config loc...) |
| 00:30:59 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 00:31:01 | | chrismeller quits [Remote host closed the connection] |
| 00:32:07 | | chrismeller (chrismeller) joins |
| 00:32:31 | | chrismeller quits [Remote host closed the connection] |
| 00:32:42 | <thuban> | i don't appear to be banned (on the ip i'm now using) either; possibly the range i was running finished before sysadmin started paying attention |
| 00:33:37 | | chrismeller (chrismeller) joins |
| 00:34:01 | | chrismeller quits [Remote host closed the connection] |
| 00:35:07 | | chrismeller (chrismeller) joins |
| 00:35:31 | | chrismeller quits [Remote host closed the connection] |
| 00:36:37 | | chrismeller (chrismeller) joins |
| 00:37:01 | | chrismeller quits [Remote host closed the connection] |
| 00:38:07 | | chrismeller (chrismeller) joins |
| 00:38:31 | | chrismeller quits [Remote host closed the connection] |
| 00:39:37 | | chrismeller (chrismeller) joins |
| 00:40:01 | | chrismeller quits [Remote host closed the connection] |
| 00:41:07 | | chrismeller (chrismeller) joins |
| 00:41:31 | | chrismeller quits [Remote host closed the connection] |
| 00:41:49 | <thuban> | yay: i'm hopeful that they won't (sophisticated dos protection seems like a waste of money on a lame-duck site, whereas pulling the plug before the promised shutdown date would be pretty cold) |
| 00:41:54 | <thuban> | but it's a thought. |
| 00:42:35 | <thuban> | arkiver, you mentioned setting up a project; do you want hits as they come in? |
| 00:42:37 | | chrismeller (chrismeller) joins |
| 00:43:01 | | chrismeller quits [Remote host closed the connection] |
| 00:43:15 | <@arkiver> | thuban: 'hits'? |
| 00:44:07 | | chrismeller (chrismeller) joins |
| 00:44:18 | <thuban> | actual sites to archive |
| 00:44:31 | | chrismeller quits [Remote host closed the connection] |
| 00:44:49 | <thuban> | (as opposed to urls where there are no sites, 'misses' in the brute-force search) |
| 00:45:37 | | chrismeller (chrismeller) joins |
| 00:46:01 | | chrismeller quits [Remote host closed the connection] |
| 00:47:07 | | chrismeller (chrismeller) joins |
| 00:47:31 | | chrismeller quits [Remote host closed the connection] |
| 00:48:37 | | chrismeller (chrismeller) joins |
| 00:49:01 | | chrismeller quits [Remote host closed the connection] |
| 00:50:07 | | chrismeller (chrismeller) joins |
| 00:50:31 | | chrismeller quits [Remote host closed the connection] |
| 00:51:37 | | chrismeller (chrismeller) joins |
| 00:52:01 | | chrismeller quits [Remote host closed the connection] |
| 00:53:07 | | chrismeller (chrismeller) joins |
| 00:53:31 | | chrismeller quits [Remote host closed the connection] |
| 00:54:37 | | chrismeller (chrismeller) joins |
| 00:54:58 | | tbc1887 (tbc1887) joins |
| 00:55:01 | | chrismeller quits [Remote host closed the connection] |
| 00:55:03 | | tbc1887 quits [Remote host closed the connection] |
| 00:56:07 | | chrismeller (chrismeller) joins |
| 00:56:31 | | chrismeller quits [Remote host closed the connection] |
| 00:57:37 | | chrismeller (chrismeller) joins |
| 00:58:01 | | chrismeller quits [Remote host closed the connection] |
| 00:59:07 | | chrismeller (chrismeller) joins |
| 00:59:31 | | chrismeller quits [Remote host closed the connection] |
| 00:59:54 | | chrismeller (chrismeller) joins |
| 01:02:24 | | dm4v_ joins |
| 01:03:47 | | dm4v quits [Ping timeout: 265 seconds] |
| 01:03:47 | | dm4v_ is now known as dm4v |
| 01:03:48 | | dm4v is now authenticated as dm4v |
| 01:03:48 | | dm4v quits [Changing host] |
| 01:03:48 | | dm4v (dm4v) joins |
| 01:07:22 | | thetechrobo_ is now known as TheTechRobo |
| 01:07:30 | | TheTechRobo is now authenticated as TheTechRobo |
| 01:33:51 | <ThreeHM> | Got banned as well; happened some time after my last crawl finished |
| 01:40:44 | <Jake> | very interesting. |
| 01:44:04 | <@JAA> | My second machine also got banned a couple hours ago. |
| 01:44:50 | <@JAA> | Had only started it there tonight. |
| 01:49:31 | <TheTechRobo> | Haven't really been following the conversation; if there's a command I can run that even me with my slow internet can help, I'll run it :-) |
| 01:49:53 | <thuban> | TheTechRobo: see https://pad.notkiska.pw/p/glencoe |
| 01:53:10 | <TheTechRobo> | thuban: I assume I include the square brackets in the range? |
| 01:53:33 | <thuban> | yep (see curl's manpage for syntax details) |
| 01:54:24 | <TheTechRobo> | claimed a range |
| 01:54:31 | <thuban> | thanks! |
| 01:56:11 | <thuban> | n.b. that there appears to be a memory leak in curl and large ranges may get oom killed. |
| 01:57:10 | <thuban> | i've been sticking to 1e6 urls per curl process--ime that finishes a little too quickly to be convenient, but you can chain a bunch in a bash loop |
| 01:57:11 | <TheTechRobo> | > (you probably didn't mean to claim the whole remaining range) |
| 01:57:16 | <TheTechRobo> | yeah, i fixed it but my internet sucks |
| 01:57:27 | <TheTechRobo> | let me reload |
| 01:58:08 | <@JAA> | Who is geo? |
| 01:58:21 | | DiscantX joins |
| 02:02:14 | <TheTechRobo> | did my claiming of 2610000000-2650000000 go through? |
| 02:02:25 | <@JAA> | Yep |
| 02:02:25 | <TheTechRobo> | hard to tell, it doesn't confirm that my edits went through other than through reloading |
| 02:02:37 | <TheTechRobo> | Thans |
| 02:02:40 | <TheTechRobo> | *thanks |
| 02:05:05 | | yay joins |
| 02:06:25 | <yay> | regarding the curl memory leak: I ran the steps from https://unix.stackexchange.com/questions/36450/how-can-i-find-a-memory-leak-of-a-running-process, and the memory dump is many repeated instances of something like |
| 02:06:25 | <yay> | /etc/ssl/certs/etc/ssl/certs/ca-certificates.crtSayonara=Dewa_mata.com/dev/null/7.83.1TP/1.http://glencoe.mheducation.com/sites/1973909851/ |
| 02:06:39 | <yay> | (mentally insert newlines where appropriate) |
| 02:08:02 | <Jake> | .... that shouldn't be happening.... hahahaha |
| 02:09:57 | <TheTechRobo> | Weird |
| 02:10:02 | <thuban> | it's possible the problem is in openssl rather than curl itself (that's happened before) |
| 02:11:30 | <TheTechRobo> | That would make sense |
| 02:12:25 | | yay quits [Ping timeout: 265 seconds] |
| 02:16:00 | | Arcorann (Arcorann) joins |
| 02:23:28 | <TheTechRobo> | I got a single timeout but none after |
| 02:23:47 | <TheTechRobo> | or actually, connection reset by peer |
| 02:23:53 | <TheTechRobo> | i can reproduce it, too |
| 02:24:00 | <TheTechRobo> | anyone else get a connection reset here? http://glencoe.mheducation.com/sites/1810526697/ |
| 02:24:15 | <@JAA> | Yep |
| 02:24:23 | <TheTechRobo> | Weird. |
| 02:28:34 | <@JAA> | Getting a 404 now. |
| 02:28:44 | | qwertyasdfuiopghjkl joins |
| 02:30:32 | <TheTechRobo> | thuban: what do you mean by "ob1 error"? |
| 02:30:49 | <Jake> | A good test for me was to ping their site, all packets are blocked when you are banned. |
| 02:31:12 | <thuban> | off by one |
| 02:31:31 | <TheTechRobo> | oh |
| 02:31:53 | <TheTechRobo> | fixed |
| 02:37:48 | <TheTechRobo> | that memory leak is really annoying |
| 02:43:58 | | yay joins |
| 02:44:14 | <yay> | JAA: geo is a guy from Discord that I roped into helping |
| 02:46:14 | <@JAA> | Alright, thanks. |
| 02:52:58 | <TheTechRobo> | going to bed, gn all |
| 03:05:35 | <yay> | here I am debugging curl memory leaks :p |
| 03:22:19 | | yay quits [Client Quit] |
| 03:43:07 | | rmrm joins |
| 03:52:18 | | JackThompson05 is now known as JackThompson |
| 03:57:10 | | yay joins |
| 03:57:44 | <yay> | glencoe seems to be becoming a bit sluggish |
| 03:59:21 | <Jake> | Looking at the list, we have more people than ever hitting it, and at 300 requests/s? Somewhat surprised it's survived so far. |
| 04:00:02 | <yay> | pings are erratic, ranging from 70ms to 110ms |
| 04:00:29 | <@JAA> | 300? I was doing 7000... |
| 04:00:39 | <thuban> | that's not 300 requests per second, it's 300 requests in parallel. more like--yeah, what JAA said |
| 04:01:00 | <yay> | yeah that's max concurrent connections and not requests/sec |
| 04:02:08 | <Jake> | Yes, my bad. :) |
| 04:03:34 | <@JAA> | I'm really impressed by curl, by the way. I've done 800 req/s before with qwarc on the same machine, but that was with four processors, not one. Though it was full GET requests and also writing to WARC, not just fetching and discarding except for the status code. |
| 04:04:56 | <yay> | I still don't know about the curl memory leak |
| 04:05:02 | <yay> | kinda think it might be openssl now? |
| 04:05:31 | <yay> | all the memory leak detectors I've tried so far report 0B memory usage |
| 04:05:31 | <Jake> | Regardless, still a very impressive amount of requests to be handling for an EOL service :) |
| 04:06:49 | <@JAA> | Indeed |
| 04:15:07 | | lennier2 joins |
| 04:15:07 | | lennier2 quits [Excess Flood] |
| 04:15:34 | | lennier2 joins |
| 04:17:37 | | lennier1 quits [Ping timeout: 245 seconds] |
| 04:17:46 | | lennier2 is now known as lennier1 |
| 04:23:49 | | bonga quits [Remote host closed the connection] |
| 04:24:48 | | bonga joins |
| 04:32:31 | | thuban quits [Read error: Connection reset by peer] |
| 04:32:45 | | thuban joins |
| 04:34:11 | <yay> | https://github.com/curl/curl/issues/8933 filed a github issue |
| 04:34:18 | <@JAA> | Jake: On the upper end of your range, cf. the comment below geo's. |
| 04:35:16 | <Jake> | checking. |
| 05:03:31 | | seednode4943 quits [Ping timeout: 265 seconds] |
| 05:18:07 | <Jake> | I _think_ I did! Do we have a good way of double checking? |
| 05:21:19 | <@JAA> | I'll add my procedure to the pad. |
| 05:25:31 | <@JAA> | Oh yeah, I actually did it manually at the time. Hold my beer. :-) |
| 05:34:49 | <@JAA> | Jake: There. |
| 05:40:38 | <Jake> | ty. I can't say I used anything like that on OOM to figure out the "true" end, just looked at a hundred or so and found the lowest. (I assumed it was somewhat sequential!) Happy to rerun it if you think we've missed something. |
| 05:42:07 | <@JAA> | It's somewhat sequential, but in my OOM abort case, there was a difference of over 30k IDs between lowest unfetched and highest fetched. |
| 05:42:35 | <@JAA> | Which is only a few seconds worth, naturally. |
| 05:44:15 | <@JAA> | You can adapt the comm command to display all IDs in the range that are not in the output file quite easily. Just replace the second seq parameter with your perceived end of range and remove the `head -1`. |
| 05:45:02 | <@JAA> | That'll give you a list of IDs within the range you claimed on the list but which are not in the output file. Then a little sed and piping to xargs curl to retry them, I guess. |
| 05:45:52 | <@JAA> | (And then upload a second filtered-* file if there is anything non-404 in those, which is unlikely but you never know.) |
| 05:46:22 | <Jake> | 👍will check |
| 05:46:53 | <yay> | it might be worth uploading the entire (compressed!) logs to make sure we didn't miss anything |
| 05:48:12 | <@JAA> | We could do that, I guess. It would be ~25 GiB with zstd -10 from my numbers. |
| 05:48:17 | | michaelblob_ quits [Read error: Connection reset by peer] |
| 05:49:22 | | yay quits [Remote host closed the connection] |
| 05:50:30 | | michaelblob (michaelblob) joins |
| 05:50:52 | | yay joins |
| 05:52:26 | <@JAA> | Or something smaller, like a list of attempted IDs. |
| 05:53:00 | <@JAA> | Or everyone can just run that comm command regardless of whether it was an OOM/^C/whatever or not to verify that it's correct before marking a range as done. |
| 05:58:18 | <Jake> | As long as everyone is verifying I don't think we need to waste the space of uploading full logs. I'm off to bed, but I'll check mine first thing tomorrow. |
| 06:05:36 | <@JAA> | Yeah, 25 GiB of almost entirely sequential ints and 404s is quite some waste of space. |
| 06:08:32 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:41:26 | | yay quits [Remote host closed the connection] |
| 07:31:25 | | rmrm quits [Ping timeout: 265 seconds] |
| 08:00:16 | <h2ibot> | OrIdow6 edited Tracker (+1773, Add a little history, all discerned through…): https://wiki.archiveteam.org/?diff=48654&oldid=46622 |
| 08:03:47 | <@OrIdow6> | FWIW goat.me/goat.at is one of those rare sites that I might be able to do myself |
| 08:03:55 | <@OrIdow6> | I.e. with GNU Parallel and wget-at |
| 08:04:10 | <@OrIdow6> | So will do that tomorrow |
| 08:06:09 | <@OrIdow6> | I suppose I could have done that instead of writing about the history of the tracker, but whatever |
| 08:25:01 | <@Sanqui> | history is important! |
| 08:25:36 | <@Sanqui> | archive team celebrated 13 this year, after all, we already have a fair bit of our own history |
| 08:27:31 | <@OrIdow6> | Yeah, I am glad to have slightly lit up the mostly dark abyss that is that 13 years |
| 09:14:17 | | Moritz72 joins |
| 09:14:28 | | Moritz72 quits [Remote host closed the connection] |
| 10:17:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 10:35:01 | | syntaxx leaves [The Lounge - https://thelounge.chat] |
| 11:19:09 | <thuban> | welp, i'm banned |
| 11:28:45 | <thuban> | (this means i will be unable to do re-runs, if i'm collating. there shouldn't be too many in total; treat them as hits?) |
| 11:35:59 | | march_happy quits [Ping timeout: 265 seconds] |
| 11:36:05 | | march_happy (march_happy) joins |
| 11:56:13 | | spirit joins |
| 12:16:47 | | march_happy quits [Ping timeout: 245 seconds] |
| 12:16:59 | | march_happy (march_happy) joins |
| 12:38:56 | | Iki1 quits [Client Quit] |
| 12:58:01 | | ThreeHM quits [Ping timeout: 265 seconds] |
| 13:10:32 | | march_happy quits [Ping timeout: 245 seconds] |
| 13:11:02 | | march_happy (march_happy) joins |
| 13:11:04 | | chrismeller quits [Ping timeout: 265 seconds] |
| 13:26:02 | | seednode4943 (seednode) joins |
| 13:27:06 | | HP_Archivist (HP_Archivist) joins |
| 13:38:27 | | Arcorann quits [Ping timeout: 245 seconds] |
| 13:48:54 | | march_happy quits [Ping timeout: 265 seconds] |
| 13:50:07 | | march_happy (march_happy) joins |
| 13:59:32 | | march_happy quits [Ping timeout: 265 seconds] |
| 13:59:51 | | march_happy (march_happy) joins |
| 13:59:52 | <TheTechRobo> | > Because of the parallel processing, responses are not received in order, so it will usually be the case that the end of the fully covered ID range is not the highest ID appearing in the output. |
| 14:00:03 | <TheTechRobo> | I don't really see why that's a problem; it's just a little bit of duplication, no? |
| 14:00:39 | <TheTechRobo> | (if you set the start of the range to be the last line of the file when it isn't the highest one) |
| 14:02:32 | <TheTechRobo> | Oh, I see now. |
| 14:03:50 | <TheTechRobo> | Some requests might be incomplete, etc. |
| 14:03:54 | <TheTechRobo> | That makes sense |
| 14:32:02 | | adamus1red quits [Quit: SigTerm] |
| 14:32:15 | <TheTechRobo> | https://transfer.archivete.am/O4B5L/2610000000-2620000000.tsv.zst for 261-262 |
| 14:32:30 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 14:34:21 | | adamus1red (adamus1red) joins |
| 14:36:22 | <TheTechRobo> | that has been verified, btw |
| 14:43:52 | | yay joins |
| 14:43:55 | <yay> | welp, I'm banned to |
| 14:43:59 | <yay> | *too |
| 14:44:08 | <yay> | I have another IP though |
| 14:50:04 | | qwertyasdfuiopghjkl joins |
| 15:02:38 | | yay quits [Remote host closed the connection] |
| 15:12:07 | | Minkafighter joins |
| 15:31:00 | | HP_Archivist quits [Client Quit] |
| 15:47:37 | | DiscantX quits [Ping timeout: 245 seconds] |
| 15:54:39 | | Jake quits [Quit: Leaving for a bit!] |
| 15:54:53 | | Jake (Jake) joins |
| 16:07:32 | | rmrm joins |
| 16:26:57 | | qw3rty_ joins |
| 16:30:07 | | qw3rty quits [Ping timeout: 245 seconds] |
| 16:50:04 | <@JAA> | 24 hours later, I'm still banned. |
| 16:53:21 | <michaelblob> | on another note, didn't expect bare metal to be almost 2x faster than a vm when running curl |
| 16:57:49 | | Iki joins |
| 17:02:25 | | Minkafighter quits [Client Quit] |
| 17:10:21 | | Minkafighter joins |
| 17:44:56 | | ThreeHM (ThreeHeadedMonkey) joins |
| 18:13:52 | | nerdguy1138 quits [Ping timeout: 245 seconds] |
| 18:37:42 | | yay joins |
| 18:38:08 | <yay> | you know, it would be funny if I got my school banned by spamming glencoe with curl requests |
| 18:43:58 | | spirit quits [Client Quit] |
| 18:45:36 | <TheTechRobo> | Somehow still not banned on my VPS |
| 18:46:09 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 18:48:04 | <TheTechRobo> | How should I retry "Connection reset by peer" ones? My https://transfer.archivete.am/O4B5L/2610000000-2620000000.tsv.zst only has "connection reset" ones, it seems. Can someone either retry them or give me a command to use? |
| 18:48:40 | <yay> | I think we're retrying them later once we get the bulk of them dome |
| 18:48:45 | <yay> | *done |
| 18:49:14 | <TheTechRobo> | alright |
| 18:56:06 | <h2ibot> | OrIdow6 edited Tracker (+7, /* History */): https://wiki.archiveteam.org/?diff=48655&oldid=48654 |
| 19:05:57 | | emman joins |
| 19:06:16 | | emman quits [Remote host closed the connection] |
| 19:07:10 | | Minkafighter quits [Client Quit] |
| 19:08:33 | | Minkafighter joins |
| 19:09:12 | | knecht420 quits [Quit: The Lounge - https://thelounge.chat] |
| 19:09:44 | | knecht420 (knecht420) joins |
| 19:13:17 | | knecht420 quits [Read error: Connection reset by peer] |
| 19:13:20 | | knecht4209 (knecht420) joins |
| 19:16:44 | | knecht4209 quits [Client Quit] |
| 19:21:06 | | knecht4209 (knecht420) joins |
| 19:26:35 | | knecht4209 is now known as knecht420 |
| 19:41:19 | <@arkiver> | JAA: thuban: is it correct 'glencoe' is not on deathwatch page? |
| 19:41:44 | <@arkiver> | shutdown is 30 june according to https://pad.notkiska.pw/p/glencoe, but no mention on deathwatch page |
| 19:41:53 | <@JAA> | Yeah, appears to be missing. |
| 19:42:03 | <@arkiver> | can we please add it? |
| 19:42:10 | <@arkiver> | so many messages here, I'm not sure what is going on |
| 19:42:40 | <@arkiver> | and let's create a channel |
| 19:43:48 | <@arkiver> | so, I check https://transfer.archivete.am/ovn0l/filtered-1900000000-1959299859.tsv.zst and get http://glencoe.mheducation.com/sites/1901365308/ from that list. but it gives me a 404 |
| 19:43:52 | <@arkiver> | is that correct? |
| 19:44:48 | <@JAA> | arkiver: Those files contain '000' lines, which failed on the first retrieval attempt and need to be retried. Look for lines with '200' as the second field for working sites. |
| 19:44:57 | <@arkiver> | i see |
| 19:45:08 | <@JAA> | (Equivalent to wget response status 0) |
| 19:45:52 | <yay> | glencoe.mheducation.com/sites/1911124000/ is the single 200 response in that file, btw |
| 19:46:21 | <@arkiver> | we're not checking all IDs? |
| 19:46:27 | <@arkiver> | i see many gaps, what is the reason? |
| 19:46:34 | | DiscantX joins |
| 19:46:35 | <yay> | we are |
| 19:46:53 | <yay> | those are just verified 404s, which would be pretty pointless to include |
| 19:47:04 | <@arkiver> | i see, so everything not in the lists is not 404? |
| 19:47:05 | | DiscantX quits [Remote host closed the connection] |
| 19:47:14 | <yay> | arkiver: yes |
| 19:47:20 | <@arkiver> | then why would we include lines with 200 in the file? |
| 19:47:28 | <@JAA> | 200 = sites that exist |
| 19:47:33 | <@JAA> | 404 = sites that definitely don't exist |
| 19:47:35 | <@JAA> | 000 = ??? |
| 19:47:44 | | DiscantX joins |
| 19:47:54 | <@arkiver> | right, misinterpreted yay , thanks |
| 19:48:34 | <@JAA> | It looks like the proper domain might be glencoe.com, not glencoe.mheducation.com? |
| 19:48:53 | <yay> | > The glencoe.com site was retired on August 11th, 2017 as part of a continuous effort to provide you with the most relevant and up to date content. |
| 19:49:24 | <@JAA> | I'm getting the same retirement notice about June 30 on http://glencoe.com/ as I do on http://glencoe.mheducation.com/. |
| 19:49:33 | <yay> | mhm |
| 19:49:40 | <yay> | I wonder what "associate sites" there are |
| 19:49:58 | <@JAA> | But the 'sites' are 404s, yeah. |
| 19:50:13 | <@arkiver> | is everything just single pages like http://glencoe.mheducation.com/sites/2138132181/information_center_view0/ , or is there some deeper structure sometimes? |
| 19:50:46 | <yay> | I don't think so |
| 19:50:52 | <yay> | also https://mhedu.force.com/DTS/s/article/Glencoe-Online-Learning-Centers might be a useful reference |
| 19:51:30 | <yay> | that doesn't include all the sites, though |
| 19:51:37 | <yay> | it would be _such_ a tragedy if we lost ron's test page to history |
| 19:52:15 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+373, /* 2022 */ Add Glencoe): https://wiki.archiveteam.org/?diff=48656&oldid=48640 |
| 19:53:02 | <yay> | thanks arkiver |
| 19:53:05 | <@JAA> | novella.mhhe.com was also a thing, apparently. |
| 19:53:13 | <@JAA> | (That was me.) |
| 19:53:44 | <@arkiver> | thanks JAA |
| 19:54:32 | <yay> | ^ |
| 19:59:05 | | yay quits [Remote host closed the connection] |
| 19:59:58 | | yay joins |
| 20:05:20 | <yay> | https://transfer.archivete.am/wLH1P/known_sites.txt |
| 20:07:29 | <yay> | could we do something with these 700-ish sites from the official list? |
| 20:28:27 | | DiscantX quits [Ping timeout: 245 seconds] |
| 20:51:23 | | sonick (sonick) joins |
| 21:01:21 | | mikael quits [Ping timeout: 265 seconds] |
| 21:08:37 | | mikael joins |
| 21:13:02 | | mikael quits [Ping timeout: 245 seconds] |
| 21:14:57 | | mikael joins |
| 21:30:04 | <@OrIdow6> | goat.me/goat.at is running |
| 21:32:45 | <yay> | I'm downloading the known list of glencoe sites using wget to WARC |
| 21:32:50 | <yay> | should I upload to IA afterwards? |
| 21:33:30 | <yay> | looking at the lines scrolling by is definitely giving me a bit of a headache |
| 21:45:22 | | nerdguy1138 (nerdguy1138) joins |
| 21:50:59 | | nerdguy1138 quits [Client Quit] |
| 21:51:28 | | nerdguy1138 (nerdguy1138) joins |
| 22:42:10 | | BlueMaxima joins |
| 22:46:50 | | rmrm quits [Remote host closed the connection] |
| 22:48:31 | <TheTechRobo> | yay: yeah, but keep in mind that it's smart to do it via archivebot |
| 22:48:42 | <TheTechRobo> | only whitelisted users' warcs go into the wbm |
| 22:48:54 | <TheTechRobo> | and most people aren't whitelisted[citation needed] |
| 22:49:14 | <yay> | why archivebot? |
| 22:49:27 | <yay> | is that a "trusted" downloader or something? |
| 22:50:05 | <TheTechRobo> | It's an official Archive Team project created by trusted people, so yeah. |
| 22:50:39 | <TheTechRobo> | Uploading WARCs is fine, just don't expect them to go into the Wayback Machine. |
| 22:50:56 | <TheTechRobo> | Not a member of IA though so take my info with a grain of salt. :-) |
| 22:51:26 | <TheTechRobo> | s/member/employee |
| 22:53:20 | <TheTechRobo> | yay: ^ |
| 22:53:24 | <yay> | ArchiveBot looks like a bit of a pain to get running :) |
| 22:54:26 | <TheTechRobo> | Only to self-host, in which case your WARCs won't go in. |
| 22:54:34 | <TheTechRobo> | Otherwise, it's as simple as |
| 22:54:45 | <TheTechRobo> | !ao < https://example.com/imagelist |
| 22:54:55 | <TheTechRobo> | (a bit more difficult than that, but that's the gist of it) |
| 22:55:36 | <TheTechRobo> | https://wiki.archiveteam.org/index.php/ArchiveBot for more details |
| 22:55:37 | <TheTechRobo> | yay: ^ |
| 22:56:07 | <yay> | yep |
| 22:56:11 | <yay> | don't think I have perms though |
| 23:18:05 | <TheTechRobo> | You don't , but you can request someone else to do it for you |
| 23:24:07 | | BlueMaxima quits [Client Quit] |
| 23:27:21 | <yay> | I basically already have a list of 657 official urls which could be started on immediately |
| 23:27:59 | <yay> | the brute-force search might discover more, so I don't know if we want to wait on that for convenience with collating all of the discovered hits |
| 23:28:42 | <Jake> | Is this the same list that thuban had? |
| 23:29:58 | <yay> | I don't know |
| 23:30:15 | <yay> | it's the list that the http://glencoe.com/ retirement notice points to |
| 23:31:04 | <yay> | > A list of programs that have active Online Learning Centers can be found [here](https://mhedu.force.com/DTS/s/article/Glencoe-Online-Learning-Centers). |
| 23:40:30 | | march_happy quits [Ping timeout: 265 seconds] |
| 23:41:15 | | march_happy (march_happy) joins |
| 23:51:00 | | omglolbah_ quits [Ping timeout: 265 seconds] |
| 23:56:20 | | omglolbah joins |
| 23:59:52 | | chrismeller (chrismeller) joins |
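
The range verification JAA describes (listing every ID in a claimed range that is missing from the output file) and the 200/404/000 status split can be sketched in Python as below. This is only an illustration, not the procedure from the pad: the tab-separated layout with the ID or URL in the first field is an assumption (the log only says the status is the second field), the range is assumed to be inclusive at both ends, and the function names `parse_results`, `missing_ids` and `needs_retry` are hypothetical.

```python
import re


def parse_results(lines):
    """Map site ID -> status string from tab-separated result lines.

    Assumed layout (not confirmed by the log): the first field is either a
    bare numeric ID or a URL ending in the ID; the second field is the
    status code, e.g. '200', '404' or '000'.
    """
    results = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        match = re.search(r"(\d+)/?$", fields[0])
        if match:
            results[int(match.group(1))] = fields[1]
    return results


def missing_ids(start, end, results):
    """IDs in the inclusive range [start, end] that never appear in the output."""
    return [i for i in range(start, end + 1) if i not in results]


def needs_retry(results):
    """IDs whose first attempt failed ('000'), e.g. connection resets."""
    return sorted(i for i, status in results.items() if status == "000")


if __name__ == "__main__":
    sample = [
        "http://glencoe.mheducation.com/sites/100/\t404",
        "http://glencoe.mheducation.com/sites/101/\t200",
        "103\t000",
    ]
    parsed = parse_results(sample)
    print("missing:", missing_ids(100, 104, parsed))
    print("retry:", needs_retry(parsed))
```

The first element of the `missing_ids` result, if any, is the lowest unfetched ID, which is the safe restart point after an OOM kill or ^C; because responses arrive out of order, the highest ID in the output is not a reliable end of coverage.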