00:02:10 | | etnguyen03 quits [Quit: Konversation terminated!] |
00:13:33 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
00:20:30 | | robin quits [Ping timeout: 260 seconds] |
01:06:26 | | etnguyen03 (etnguyen03) joins |
01:06:39 | | nulldata quits [Read error: Connection reset by peer] |
01:07:37 | | nulldata (nulldata) joins |
01:35:02 | | etnguyen03 quits [Client Quit] |
01:37:43 | | Wohlstand quits [Remote host closed the connection] |
02:11:08 | | etnguyen03 (etnguyen03) joins |
03:01:39 | | etnguyen03 quits [Remote host closed the connection] |
03:05:52 | | linuxgemini (linuxgemini) joins |
04:38:01 | | Exorcism quits [Quit: Ping timeout (120 seconds)] |
04:38:20 | | Exorcism (exorcism) joins |
05:05:24 | | Wohlstand (Wohlstand) joins |
05:13:06 | <@JAA> | So apparently Kadokawa (owners of FromSoftware) are asking Sony to acquire them so there won't be a hostile takeover by Kakao, or something along those lines. It'd probably be good to archive Kadokawa, FromSoftware, and related things. |
05:44:39 | <@JAA> | > <b>Warning</b>: count(): Parameter must be an array or an object that implements Countable in <b>/var/www/forum.pclab.pl/system/Theme/Theme.php(847) : eval()'d code</b> on line <b>4389</b><br /> |
05:44:43 | <@JAA> | lol |
05:46:32 | <thuban> | i am literally looking at that now, was gonna write the spec script myself :'D |
05:51:02 | <@JAA> | Interestingly, I didn't get any of these when I went to test things for a bit under load. |
05:51:11 | <@JAA> | But I did while clicking around in a browser. |
05:52:21 | <@JAA> | I'd love to know why and what they're `eval`ing... |
06:08:42 | <@JAA> | ETA 26 hours |
06:09:40 | <thuban> | awesome, thank you!! <3 |
06:09:53 | <@JAA> | So it depends on when exactly they shut down. |
06:10:59 | <thuban> | the phrasing of the announcement makes me suspect it'll be midnight of the 29th/30th, but it could also be some time during the 30th |
06:11:12 | <@JAA> | Yeah |
06:11:46 | <thuban> | sorry for being kind of aggressive about this, the bus factor here is just not great |
06:14:50 | <@JAA> | That's fine, sorry for taking so long. :-/ |
06:15:22 | <@JAA> | Response times from the site aren't great. |
06:18:12 | <@JAA> | My logic for avoiding those PHP errors is checking for `>Powered by Invision Community<` (i.e. footer) and, on topic pages, for `id='elComment_` (i.e. at least one post). |
06:18:55 | <@JAA> | If the check fails, I don't write to WARC and retry. |
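A minimal sketch of the check described above, in Python. The two marker strings are the ones quoted in the log; the function names, the `fetch` callable, and the retry limit are assumptions for illustration and are not taken from the actual qwarc spec script.

```python
# Marker strings quoted in the log.
FOOTER_MARKER = ">Powered by Invision Community<"  # page footer
POST_MARKER = "id='elComment_"                     # at least one post on a topic page


def looks_complete(body: str, is_topic_page: bool) -> bool:
    """Return True if the response looks like a fully rendered page."""
    if FOOTER_MARKER not in body:
        return False
    if is_topic_page and POST_MARKER not in body:
        return False
    return True


def fetch_with_retry(fetch, url: str, is_topic_page: bool, max_attempts: int = 5):
    """Call `fetch(url)` until the body passes the check.

    `fetch` is a hypothetical callable returning the response body as a str.
    Returns the body, or None if every attempt failed the check. A real
    crawler would write to the WARC only when a body is returned.
    """
    for _ in range(max_attempts):
        body = fetch(url)
        if looks_complete(body, is_topic_page):
            return body
    return None
```

Checking for the footer catches responses that were cut short by a PHP error, while the post marker catches topic pages that rendered their frame but no content.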
06:31:28 | <@JAA> | ETA has already shot up to 38 hours. |
06:46:06 | | bilboed0 quits [Quit: The Lounge - https://thelounge.chat] |
06:51:18 | | bilboed0 joins |
07:05:52 | | Unholy2361924645377131 (Unholy2361) joins |
07:14:40 | | Barto quits [Ping timeout: 260 seconds] |
07:31:00 | | Wohlstand quits [Ping timeout: 260 seconds] |
07:52:01 | | pixel (pixel) joins |
07:53:08 | <@JAA> | The site has slowed down further. ETA 54 hours now. |
08:09:37 | | flotwig_ joins |
08:10:40 | | flotwig quits [Ping timeout: 260 seconds] |
08:10:40 | | flotwig_ is now known as flotwig |
08:57:20 | | M60_ quits [Ping timeout: 260 seconds] |
08:57:37 | | loug8318142 joins |
08:57:59 | | M60_ joins |
09:01:08 | <szczot3k> | JAA is it really the site's fault, or can we just put more workers on it? |
09:01:22 | <szczot3k> | I like throwing compute power at problems and seeing if that resolves them |
09:06:05 | | ThreeHM quits [Ping timeout: 260 seconds] |
09:08:02 | | ThreeHM (ThreeHeadedMonkey) joins |
09:08:44 | | i_have_n0_idea (i_have_n0_idea) joins |
09:12:50 | <c3manu> | oh, netcup restocked the RS 2000 G11 deal for Manassas again |
09:12:59 | <c3manu> | that wasn't the case when i looked earlier |
09:15:18 | | nulldata quits [Quit: So long and thanks for all the fish!] |
09:15:46 | <c3manu> | JAA: are you awake? |
09:16:09 | | nulldata (nulldata) joins |
09:17:35 | <@JAA> | c3manu: pong |
09:17:56 | <c3manu> | should i get us another RS 2000 then? |
09:18:07 | <@JAA> | szczot3k: Yeah, it seems to be the site. qwarc is plenty capable. I've done close to 1k req/s with it from a single mediocre machine before. |
09:18:20 | <c3manu> | oh crap, i can't login from here |
09:18:58 | <c3manu> | nevermind. i'll have to check when i'm back from work whether it's still available |
09:19:04 | <c3manu> | sorry for the noise |
09:19:22 | <szczot3k> | If needed I can donate a node in Poland for pclab |
09:22:10 | <@JAA> | I'm grabbing it from Vienna, so pretty nearby already. Latency isn't the issue, response times from the server are just awful. |
09:23:08 | <@JAA> | There might be an impact from very large topics. |
09:25:23 | <szczot3k> | Got it, if needed I'm still up for it |
09:27:12 | <@JAA> | When I was testing earlier, I got response times of roughly 400 ms. That shot up to about a second quite quickly after starting the real grab though. |
09:28:17 | <@JAA> | Not immediately though, which is a bit strange. Took like 15 minutes. |
09:28:26 | <szczot3k> | forum.pclab currently redirects to https://www.komputerswiat.pl/ |
09:28:34 | <szczot3k> | at least from my work |
09:29:00 | <@JAA> | Oh, did they shut it down already? |
09:29:05 | <szczot3k> | Might be the case of already switched DNS |
09:29:13 | <@JAA> | Yeah, possibly. I'm still getting 200s. |
09:29:35 | <szczot3k> | Unless my work is hijacking pclab's dns, it might be already down |
09:29:43 | <szczot3k> | or at least 'preparing to be down' |
09:29:43 | <@JAA> | Nope, I'm seeing the redirect as well here. |
09:30:00 | <szczot3k> | :/ |
09:30:43 | <szczot3k> | Trying to archive it might have accelerated the decision to shut it down |
09:31:19 | <@JAA> | DNS switch indeed. I'm connected to 91.199.101.202, but it now resolves to 178.239.128.26 and 195.93.178.26. |
09:32:03 | <szczot3k> | Will it affect archiving? |
09:32:43 | | Barto (Barto) joins |
09:35:08 | <@JAA> | That's a complex topic. |
09:36:30 | <@JAA> | It can be archived. Whether it can go into the Wayback Machine is a different question. |
09:36:52 | <@JAA> | Looks like they did something between 08:25 and 08:30. |
09:38:05 | <@JAA> | Worked fine at 08:25, got TLS errors at 08:30 until 08:53, then 404s. Not entirely sure how accurate this is with regards to reusing existing connections etc., but shouldn't be far off. |
09:38:39 | <@JAA> | I didn't mention this earlier, but I was running two processes, one for everything that was posted and one that continuously fetched new posts as they were being made. |
09:39:02 | <@JAA> | So I should have the last couple of hours' worth of posts, but only spotty coverage of the earlier ones. |
09:41:16 | <Barto> | fiber installed, 920 MBps down, 550 MBps up :-) |
09:47:50 | <katia> | MB!? |
09:48:40 | | mgrytbak quits [Ping timeout: 260 seconds] |
09:50:11 | | Gadelhas56283 joins |
09:53:55 | | Gadelhas5628 quits [Ping timeout: 260 seconds] |
09:53:55 | | Gadelhas56283 is now known as Gadelhas5628 |
09:54:32 | <@JAA> | I mean, could be 10G that doesn't saturate. :-P |
09:56:18 | | mgrytbak joins |
10:00:18 | | driib quits [Quit: The Lounge - https://thelounge.chat] |
10:00:48 | | driib (driib) joins |
10:04:06 | <c3manu> | JAA: if i can, should i get one of those netcup deals later? or do we have enough for the US stuff already? |
10:51:39 | <@JAA> | (Answered in #archivebot because that's where most of the prior discussion happened.) |
10:51:45 | <@JAA> | I'm still getting 200s from PCLab. |
10:53:59 | <@JAA> | This is still the already-established connection. I'm waiting for the process to finish with a 10k-page topic. Then I'll restart it with a DNS override. We can figure out the details of what to do with the data later. |
10:54:17 | <@JAA> | s/restart/resume/ I guess but with slightly different code. |
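One way to do a DNS override like this in Python is to wrap the resolver so that the hostname maps to a pinned address. The hostname and IP below are the ones from the log; wrapping `socket.getaddrinfo` is an assumption for illustration, not necessarily how the grab was actually resumed.

```python
import socket

# Hostname -> IP overrides. The address is the one from the log: the old
# server was still reachable there after the DNS records changed.
OVERRIDES = {"forum.pclab.pl": "91.199.101.202"}

_original_getaddrinfo = socket.getaddrinfo


def getaddrinfo_with_override(host, port, *args, **kwargs):
    """Resolve overridden hostnames to their pinned IP, others normally."""
    return _original_getaddrinfo(OVERRIDES.get(host, host), port, *args, **kwargs)


def install_override():
    """Patch the resolver process-wide so stdlib clients use the pinned IP.

    Only name resolution changes. With the standard library's HTTP client,
    the TLS server name and the Host header still come from the URL.
    """
    socket.getaddrinfo = getaddrinfo_with_override
```

After `install_override()`, a request to `https://forum.pclab.pl/` made with `urllib.request` would connect to the pinned address instead of the newly published ones.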
10:56:25 | <@JAA> | The last post was made at 2024-11-29T08:31:33Z. I didn't catch that one with my continuous thing. |
10:57:13 | <@JAA> | I wonder if someone will figure out the DNS trick and make another post. |
10:57:41 | <szczot3k> | On it ;) |
10:57:42 | <szczot3k> | Jk |
10:57:59 | <szczot3k> | DNS propagation is a real thing though, especially with some ISPs' DNS |
10:58:07 | <szczot3k> | Wonder what's the TTL |
11:04:35 | <@JAA> | 300 seconds now, don't know what it was before, but if their sysadmins are half-competent, probably the same. |
11:06:37 | <@JAA> | Resumed, and I'm also running another continuous update crawl to catch that last post and anything still made now. |
11:06:47 | <@JAA> | Response times are much better than before, let's see if that holds. |
11:06:59 | <@JAA> | I suspect that 10k-page topic played a role. |
11:15:46 | <@JAA> | Yeah, same thing is happening again now that I'm hitting some topics with hundreds of pages. |
11:15:52 | <@JAA> | They bog everything down. |
11:17:42 | <@JAA> | Anyway, it's running, it'll grab what it'll grab until they take the actual server down. Data destination TBD. |
11:20:03 | <@JAA> | At the moment, the ETA is 19-ish hours, but I doubt it'll stay that way. |
11:20:29 | <szczot3k> | While on the 'Polish forums' topic - was mpcforum.pl ever grabbed? |
11:20:46 | <@JAA> | Imagine your entire forum bogging down because one person loads a page from a big topic. |
11:21:04 | <szczot3k> | JAA, well, that might be one reason for the shutdown |
11:21:27 | <Barto> | katia: correct, Mbps, sorry about the confusion. good tho |
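For reference, the MBps/Mbps mix-up is a factor of eight: one byte is eight bits. A small Python sketch of the conversion, using the figures from the log (the function name is made up for illustration):

```python
def mbps_to_megabytes_per_second(megabits_per_second: float) -> float:
    """Convert a line rate in megabits per second to megabytes per second."""
    return megabits_per_second / 8


down = mbps_to_megabytes_per_second(920)  # 115.0 MB/s
up = mbps_to_megabytes_per_second(550)    # 68.75 MB/s

# Read literally, 920 MB/s would need a 7360 Mbps line, hence the 10G joke.
literal_megabits = 920 * 8                # 7360
```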
11:23:10 | <@JAA> | There was an attempt at mpcforum.pl a few years ago by someone who recklessly threw a huge number of big things into ArchiveBot. Something was grabbed, I guess, but it was aborted. |
11:23:39 | <szczot3k> | Would it be feasible to grab mpcforum? It's huge, and has a large impact on the Polish (gaming) community |
11:24:08 | <szczot3k> | I might throw some compute/networking and a couple of TBs of space |
11:24:13 | <@JAA> | I see strict Buttflare. |
11:25:13 | <szczot3k> | Does that mean it's impossible? |
11:26:18 | <@JAA> | At the moment, we don't have a way to get past that. |
11:26:40 | | Commander001 quits [Ping timeout: 260 seconds] |
11:27:03 | <szczot3k> | Is the problem strict rate limits, or something else? |
11:27:18 | <@JAA> | The JS challenge blocks all access. |
11:27:33 | <@JAA> | This is something the site operator can adjust. |
11:28:24 | <szczot3k> | So with the operator's permission, and cooperation it would be possible |
11:29:27 | <@JAA> | Yes |
11:29:53 | <szczot3k> | Wonder if they'd be open to this idea |
11:37:25 | | Commander001 joins |
12:00:04 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:52 | | Bleo182600722719623 joins |
12:04:45 | | robin joins |
12:20:30 | | Unholy23619246453771312 (Unholy2361) joins |
12:24:25 | | Unholy2361924645377131 quits [Ping timeout: 260 seconds] |
12:25:35 | | Unholy23619246453771312 quits [Ping timeout: 260 seconds] |
12:38:05 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:39:42 | | SkilledAlpaca418962 joins |
13:19:32 | | sralracer (sralracer) joins |
14:37:45 | | Xanthon joins |
14:37:45 | | Xanthon is now authenticated as Xanthon |
14:37:45 | | Xanthon quits [Changing host] |
14:37:45 | | Xanthon (Xanthon) joins |
14:59:08 | | Wohlstand (Wohlstand) joins |
15:35:45 | | wyatt8740 quits [Ping timeout: 260 seconds] |
15:47:44 | | wyatt8740 joins |
15:57:20 | | Wohlstand quits [Ping timeout: 260 seconds] |
15:57:25 | <thuban> | JAA: what are the chances of (1) open-sourcing the existing set of qwarc spec scripts for forums so that others can more effectively contribute, and/or (2) getting at least one other op into a position to vet/run qwarc jobs? |
15:57:31 | <thuban> | we lost the sims forums in essentially this same scenario |
16:02:59 | | Xanthon quits [Read error: Connection reset by peer] |
16:26:27 | | th3z0l4 joins |
16:30:35 | | th3z0l4_ quits [Ping timeout: 260 seconds] |
16:42:35 | | Fiduro4830139107 quits [Quit: The Lounge - https://thelounge.chat] |
16:42:46 | | Fiduro4830139107 joins |
16:49:28 | | nulldata quits [Quit: Ping timeout (120 seconds)] |
16:50:24 | | nulldata (nulldata) joins |
17:01:21 | <TheTechRobo> | thuban: qwarc spec files are stored in the meta WARC. |
17:03:29 | | Webuser410129 joins |
17:06:35 | <Webuser410129> | I am looking for a track (https://theartistunion.com/tracks/19486a) that was uploaded to The Artist Union in 2016. I have found the archives at https://archive.org/details/archiveteam_theartistunion but I do not know how to go about searching them. I tried the search at the top of the archives, but nothing was found. Any thoughts on how to search, please? Thank you |
17:08:49 | <thuban> | Webuser410129: https://tau.thetechrobo.ca/ |
17:10:17 | <Webuser410129> | ah thank you ever so much |
17:11:59 | <thuban> | TheTechRobo: ah, good point. would be nice to have them in a repo though |
17:12:50 | | Webuser410129 quits [Client Quit] |
17:29:49 | <h2ibot> | TheTechRobo edited The Artist Union (+206, Add link to retrieval tool): https://wiki.archiveteam.org/?diff=53902&oldid=48826 |
17:31:29 | | lflare quits [Quit: Bye] |
17:31:53 | | lflare (lflare) joins |
17:36:12 | | Webuser752070 joins |
17:36:23 | | Webuser752070 quits [Client Quit] |
17:48:50 | | Notrealname1234 (Notrealname1234) joins |
17:53:31 | | Notrealname1234 quits [Client Quit] |
18:52:28 | | hackbug quits [Remote host closed the connection] |
18:59:51 | | hackbug (hackbug) joins |
19:02:31 | | VerifiedJ9 (VerifiedJ) joins |
19:27:24 | | Wohlstand (Wohlstand) joins |
19:56:36 | <@JAA> | thuban: I guess they're not technically open-source since I didn't attach a licence to them, but yeah, the spec files are all in the meta WARCs. Although some projects aren't uploaded yet. |
19:57:48 | <@JAA> | I've been thinking about and procrastinating on putting all my project code (qwarc or not) in a repo or multiple repos for a while though. |
20:05:47 | | klaffty quits [Quit: klaffty] |
20:15:30 | <@JAA> | PCLab is still going. ETA 28 hours now |
20:16:03 | <@JAA> | No further posts have been made. |
20:36:58 | | Guest77 joins |
20:36:59 | <eggdrop> | [tell] Guest77: [2024-11-28T20:20:41Z] <OrIdow6> the setup you described is for viewing static files in a directory with wabac.js? |
20:37:32 | <Guest77> | OrIdow6: hi, no. It is for viewing the file without wabac.js, but only having a .warc file, obtained using grab-site |
20:38:13 | <Guest77> | there is (for sure) a way to use wabac.js, but i think you need to set up a web server with nodejs, or something more complex than a plain http server |
20:39:48 | <Guest77> | i.e. you download a .warc file from the web, and upon extracting it you have all the original URLs. With this approach the extracted files are "processed" and shown in your usual browser, just by running the bash script i mentioned. Extracting several .warc files into a common folder would let you navigate through those other websites as well |
20:39:57 | <Guest77> | .help |
20:40:35 | <Guest77> | hmm, how do i use the eggdrop tell command? |
20:45:32 | <@OrIdow6> | Guest77: !tell but because I'm in the channel that won't work; it's for people who disconnect and reconnect |
20:45:49 | <Guest77> | oh, i see |
20:45:59 | <Guest77> | only works with offline users, didnt know that |
20:45:59 | <@OrIdow6> | So you're extracting everything to disk and letting relative links resolve themselves through the filesystem? |
20:46:10 | <Guest77> | i think so OrIdow6 |
20:46:17 | <Guest77> | i'm not sure if its the best way |
20:46:26 | <Guest77> | may be a problem for big sites |
20:46:44 | <@OrIdow6> | I imagine Javascript would break on it in some way |
20:47:35 | <Guest77> | yes, i guess something like AJAX or a jquery thing would not work as expected |
20:48:18 | <Guest77> | wabac.js would of course cover those things, since it's served from a proper web server |
20:49:37 | <Guest77> | not sure if replaywebpage extracts all the files, i imagine it only gets what it is asked for |
20:49:55 | <Guest77> | designed with big websites in mind |
21:06:32 | <h2ibot> | OrIdow6 created Qwarc (+447, Created page with "A Python framework written…): https://wiki.archiveteam.org/?title=Qwarc |
21:07:25 | <@OrIdow6> | Guest77: Well, wabac.js works *most of the time*... |
21:07:39 | <Guest77> | you know how to mount it? OrIdow6 |
21:07:56 | <Guest77> | i mean, the whole procedure to do your own replayweb.page "offline" |
21:08:11 | <@OrIdow6> | Also Guest77, does the fact that you're replacing the domain with / mean that you have to extract everything in your root directory? |
21:08:53 | <Guest77> | i replace it with a empty "" value. with that i get: localhost:PORT/URL_folder/index.html |
21:09:00 | <@OrIdow6> | Not sure what you're referring to with "mount it" Guest77 |
21:09:37 | <@OrIdow6> | If you mean set up a local copy of replayweb.page, I haven't done that no |
21:09:54 | <@OrIdow6> | I've used the online version and pywb pretty much |
21:10:08 | <Guest77> | oh lol |
21:10:14 | <Guest77> | i have just found the instructions |
21:10:26 | <Guest77> | to do it, i should read carefully before asking |
21:15:31 | <Guest77> | hmm, it didnt work, i guess there is a problem with dependency versions |
21:18:51 | <Guest77> | working now :D |
21:22:35 | <h2ibot> | TheTechRobo edited Qwarc (+158, Some additional information): https://wiki.archiveteam.org/?diff=53904&oldid=53903 |
21:26:28 | <@OrIdow6> | Thanks TheTechRobo, documenting stuff like this seems like it never ends |
21:26:44 | <@OrIdow6> | I'd reference that Tumblr meme but I can't remember how to spell Sysphisian |
21:27:07 | | TheTechRobo is not familiar with that meme |
21:28:39 | <steering> | sisyphean? also dunno the meme though |
21:40:55 | <@OrIdow6> | Umm like https://www.tumblr.com/cannibalchicken/725394050160771072 |
21:46:27 | | Island joins |
21:48:30 | | Wohlstand quits [Ping timeout: 260 seconds] |
22:00:26 | | Guest77 quits [Client Quit] |
22:00:40 | | Unholy23619246453771312 (Unholy2361) joins |
22:01:01 | | BlueMaxima joins |
22:16:16 | | robin quits [Quit: Konversation terminated!] |
22:20:45 | <h2ibot> | OrIdow6 edited Archiveteam:IRC (-2509, "Special Archive Team IRC rules" - remove a…): https://wiki.archiveteam.org/?diff=53905&oldid=53749 |
22:33:13 | <@arkiver> | OrIdow6: are those removed in the FAQ then? https://wiki.archiveteam.org/?diff=53905&oldid=53749 |
22:34:54 | <@arkiver> | i don't know about simply removing those OrIdow6 |
22:34:58 | <@arkiver> | CC JAA |
22:36:13 | <@arkiver> | these have been in there a long time, i don't know the background behind each of them, but |
22:36:15 | <@arkiver> | > "Special Archive Team IRC rules" - remove a bunch, this put me off when I first joined |
22:36:21 | <@OrIdow6> | arkiver: You can put them back if you want, or some of them; my main issue is that it seems most of them ("don't be childish", "don't demand answers from others", "don't [be] malicious[]") are not going to be read by the people who breach them anyway |
22:36:23 | <@arkiver> | does not seem like a good enough reason to just get rid of them at once |
22:37:01 | <@OrIdow6> | Maybe they can be rephrased |
22:37:17 | <@arkiver> | a discussion could have been started here, on these things i like to know what JAA thinks for example - some people here have been around a long time on IRC and know the background behind all this |
22:37:34 | <@arkiver> | outright deleting them without any form of discussion is not the way in my opinion |
22:37:38 | <@JAA> | That section predates my presence here. |
22:37:39 | <@OrIdow6> | Alright then |
22:37:47 | <h2ibot> | OrIdow6 edited Archiveteam:IRC (+2509, Undo revision 53905 by…): https://wiki.archiveteam.org/?diff=53906&oldid=53905 |
22:38:31 | <@JAA> | I do wonder how many people read them though, and also I don't think we've banned anyone for these reasons in a long time. |
22:38:40 | <@arkiver> | i think we could rewrite them perhaps, also to sound less 'aggressive' (not sure if that is the right word) |
22:38:57 | <@JAA> | Yeah, I think I'd agree with that. |
22:41:11 | <@arkiver> | JAA: i believe we have occasionally pointed to them, like the "don't feed the trolls" one |
22:41:14 | <@arkiver> | but yeah |
22:48:26 | <steering> | IMHO... I don't think we need to enumerate the rules of IRC |
22:48:46 | <steering> | there are lots of people here who have been around the block and can point out netiquette (which is all any of those really are) when needed |
22:49:29 | <steering> | or rather, they're all netiquette other than "Don't try to convince Archive Team about that archiving is bad." |
22:50:15 | <pokechu22> | It might be worth mentioning not pasting large blocks of text and instead uploading it onto https://transfer.archivete.am/ ... but that generally hasn't been much of an issue |
22:50:37 | <@arkiver> | perhaps most of the "standard" ones can be summarized in a line or two, maybe with reference to some text outside of the wiki that explains more |
22:51:02 | <@arkiver> | while elaborating/expanding on the part about not trying to convince us that archiving is bad |
22:51:15 | <@arkiver> | pokechu22: yeah good one |
23:07:15 | | Unholy23619246453771312 quits [Ping timeout: 260 seconds] |
23:26:58 | | StarletCharlotte quits [Remote host closed the connection] |
23:27:12 | | StarletCharlotte joins |
23:32:37 | <@JAA> | The request rate on PCLab is still varying wildly, even when I average over a full hour. So I'm not even going to calculate an ETA anymore. |
23:32:46 | <@JAA> | But it's still going. |
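To illustrate why a varying request rate makes the ETA unreliable, here is a rough Python sketch of the usual estimate: remaining requests divided by the rate observed over some window. All numbers are hypothetical and not taken from the actual grab.

```python
def eta_hours(remaining_requests: int, requests_in_window: int, window_seconds: float) -> float:
    """Estimate hours remaining from the request rate observed in one window."""
    if requests_in_window <= 0:
        return float("inf")  # no progress in the window, so no finite estimate
    rate = requests_in_window / window_seconds  # requests per second
    return remaining_requests / rate / 3600


remaining = 360_000                       # hypothetical backlog
fast = eta_hours(remaining, 9_000, 3600)  # 2.5 req/s over the last hour: 40.0 hours
slow = eta_hours(remaining, 3_600, 3600)  # 1 req/s over the last hour: 100.0 hours
```

Because the estimate is inversely proportional to the observed rate, an hour at less than half the speed more than doubles the ETA, which matches the swings reported throughout the day.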
23:35:16 | | loug8318142 quits [Quit: The Lounge - https://thelounge.chat] |
23:40:54 | | etnguyen03 (etnguyen03) joins |