00:02:10etnguyen03 quits [Quit: Konversation terminated!]
00:13:33loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
00:20:30robin quits [Ping timeout: 260 seconds]
01:06:26etnguyen03 (etnguyen03) joins
01:06:39nulldata quits [Read error: Connection reset by peer]
01:07:37nulldata (nulldata) joins
01:35:02etnguyen03 quits [Client Quit]
01:37:43Wohlstand quits [Remote host closed the connection]
02:11:08etnguyen03 (etnguyen03) joins
03:01:39etnguyen03 quits [Remote host closed the connection]
03:05:52linuxgemini (linuxgemini) joins
04:38:01Exorcism quits [Quit: Ping timeout (120 seconds)]
04:38:20Exorcism (exorcism) joins
05:05:24Wohlstand (Wohlstand) joins
05:13:06<@JAA>So apparently Kadokawa (owners of FromSoftware) are asking Sony to acquire them so there won't be a hostile takeover by Kakao, or something along those lines. It'd probably be good to archive Kadokawa, FromSoftware, and related things.
05:44:39<@JAA>> <b>Warning</b>: count(): Parameter must be an array or an object that implements Countable in <b>/var/www/forum.pclab.pl/system/Theme/Theme.php(847) : eval()'d code</b> on line <b>4389</b><br />
05:44:43<@JAA>lol
05:46:32<thuban>i am literally looking at that now, was gonna write the spec script myself :'D
05:51:02<@JAA>Interestingly, I didn't get any of these when I went to test things for a bit under load.
05:51:11<@JAA>But I did while clicking around in a browser.
05:52:21<@JAA>I'd love to know why and what they're `eval`ing...
06:08:42<@JAA>ETA 26 hours
06:09:40<thuban>awesome, thank you!! <3
06:09:53<@JAA>So it depends on when exactly they shut down.
06:10:59<thuban>the phrasing of the announcement makes me suspect it'll be midnight of the 29th/30th, but it could also be some time during the 30th
06:11:12<@JAA>Yeah
06:11:46<thuban>sorry for being kind of aggressive about this, the bus factor here is just not great
06:14:50<@JAA>That's fine, sorry for taking so long. :-/
06:15:22<@JAA>Response times from the site aren't great.
06:18:12<@JAA>My logic for avoiding those PHP errors is checking for `>Powered by Invision Community<` (i.e. footer) and, on topic pages, for `id='elComment_` (i.e. at least one post).
06:18:55<@JAA>If the check fails, I don't write to WARC and retry.
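The check JAA describes above could look roughly like the following. This is only a sketch built from the two marker strings quoted in the log; the function names, the `fetch` callable, and the retry limit are hypothetical and are not taken from the actual qwarc spec script.

```python
from typing import Callable, Optional

# Marker strings quoted in the log above.
FOOTER_MARKER = ">Powered by Invision Community<"  # page footer
POST_MARKER = "id='elComment_"                     # at least one post on a topic page


def looks_complete(body: str, is_topic_page: bool) -> bool:
    """Return True if the page appears fully rendered rather than a PHP error page."""
    if FOOTER_MARKER not in body:
        return False
    if is_topic_page and POST_MARKER not in body:
        return False
    return True


def fetch_until_complete(
    fetch: Callable[[], str],  # hypothetical: performs one request and returns the body
    is_topic_page: bool,
    max_tries: int = 5,        # assumed limit; the log does not mention one
) -> Optional[str]:
    """Retry until a body passes the check; only that body would be written to WARC."""
    for _ in range(max_tries):
        body = fetch()
        if looks_complete(body, is_topic_page):
            return body
    return None
```

Here a failed check simply discards the response and tries again, which matches the "don't write to WARC and retry" behaviour described, though the real script may handle limits and backoff differently.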
06:31:28<@JAA>ETA has already shot up to 38 hours.
06:46:06bilboed0 quits [Quit: The Lounge - https://thelounge.chat]
06:51:18bilboed0 joins
07:05:52Unholy2361924645377131 (Unholy2361) joins
07:14:40Barto quits [Ping timeout: 260 seconds]
07:31:00Wohlstand quits [Ping timeout: 260 seconds]
07:52:01pixel (pixel) joins
07:53:08<@JAA>The site has slowed down further. ETA 54 hours now.
08:09:37flotwig_ joins
08:10:40flotwig quits [Ping timeout: 260 seconds]
08:10:40flotwig_ is now known as flotwig
08:57:20M60_ quits [Ping timeout: 260 seconds]
08:57:37loug8318142 joins
08:57:59M60_ joins
09:01:08<szczot3k>JAA is it really the site's fault, or can we just put more workers on it?
09:01:22<szczot3k>I like throwing compute power at problems and see if it resolves it
09:06:05ThreeHM quits [Ping timeout: 260 seconds]
09:08:02ThreeHM (ThreeHeadedMonkey) joins
09:08:44i_have_n0_idea (i_have_n0_idea) joins
09:12:50<c3manu>oh, netcup restocked the RS 2000 G11 deal for Manassas again
09:12:59<c3manu>that wasn't the case when i looked earlier
09:15:18nulldata quits [Quit: So long and thanks for all the fish!]
09:15:46<c3manu>JAA: are you awake?
09:16:09nulldata (nulldata) joins
09:17:35<@JAA>c3manu: pong
09:17:56<c3manu>should i get us another RS 2000 then?
09:18:07<@JAA>szczot3k: Yeah, it seems to be the site. qwarc is plenty capable. I've done close to 1k req/s with it from a single mediocre machine before.
09:18:20<c3manu>oh crap, i can't login from here
09:18:58<c3manu>nevermind. i'll have to check when i'm back from work whether it's still available
09:19:04<c3manu>sorry for the noise
09:19:22<szczot3k>If needed I can donate a node in poland for pclab
09:22:10<@JAA>I'm grabbing it from Vienna, so pretty nearby already. Latency isn't the issue, response times from the server are just awful.
09:23:08<@JAA>There might be an impact from very large topics.
09:25:23<szczot3k>Got it, if needed I'm still up for it
09:27:12<@JAA>When I was testing earlier, I got response times of roughly 400 ms. That shot up to about a second quite quickly after starting the real grab though.
09:28:17<@JAA>Not immediately though, which is a bit strange. Took like 15 minutes.
09:28:26<szczot3k>forum.pclab currently redirects to https://www.komputerswiat.pl/
09:28:34<szczot3k>at least from my work
09:29:00<@JAA>Oh, did they shut it down already?
09:29:05<szczot3k>Might be the case of already switched DNS
09:29:13<@JAA>Yeah, possibly. I'm still getting 200s.
09:29:35<szczot3k>Unless my work is hijacking pclab's dns, it might be already down
09:29:43<szczot3k>or at least 'preparing to be down'
09:29:43<@JAA>Nope, I'm seeing the redirect as well here.
09:30:00<szczot3k>:/
09:30:43<szczot3k>Trying to archive it might have accelerated the decision to shut it down
09:31:19<@JAA>DNS switch indeed. I'm connected to 91.199.101.202, but it now resolves to 178.239.128.26 and 195.93.178.26.
09:32:03<szczot3k>Will it affect archiving?
09:32:43Barto (Barto) joins
09:35:08<@JAA>That's a complex topic.
09:36:30<@JAA>It can be archived. Whether it can go into the Wayback Machine is a different question.
09:36:52<@JAA>Looks like they did something between 08:25 and 08:30.
09:38:05<@JAA>Worked fine at 08:25, got TLS errors at 08:30 until 08:53, then 404s. Not entirely sure how accurate this is with regards to reusing existing connections etc., but shouldn't be far off.
09:38:39<@JAA>I didn't mention this earlier, but I was running two processes, one for everything that was posted and one that continuously fetched new posts as they were being made.
09:39:02<@JAA>So I should have the last couple hours worth of posts, but only spotty coverage of the earlier ones.
09:41:16<Barto>fiber installed, 920 MBps down, 550 MBps up :-)
09:47:50<katia>MB!?
09:48:40mgrytbak quits [Ping timeout: 260 seconds]
09:50:11Gadelhas56283 joins
09:53:55Gadelhas5628 quits [Ping timeout: 260 seconds]
09:53:55Gadelhas56283 is now known as Gadelhas5628
09:54:32<@JAA>I mean, could be 10G that doesn't saturate. :-P
09:56:18mgrytbak joins
10:00:18driib quits [Quit: The Lounge - https://thelounge.chat]
10:00:48driib (driib) joins
10:04:06<c3manu>JAA: if i can, should i get one of those netcup deals later? or do we have enough for the US stuff already?
10:51:39<@JAA>(Answered in #archivebot because that's where most of the prior discussion happened.)
10:51:45<@JAA>I'm still getting 200s from PCLab.
10:53:59<@JAA>This is still the already-established connection. I'm waiting for the process to finish with a 10k-page topic. Then I'll restart it with a DNS override. We can figure out the details of what to do with the data later.
10:54:17<@JAA>s/restart/resume/ I guess but with slightly different code.
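A DNS override of this kind amounts to connecting to the old IP address directly while still presenting the original hostname for TLS and in the `Host` header. The sketch below uses only the Python standard library and is not how qwarc does it; the function names are hypothetical, and the hostname `forum.pclab.pl` in the usage note is assumed from the path quoted earlier in the log.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the original hostname."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_pinned(ip: str, host: str, path: str = "/",
                 port: int = 443, timeout: float = 30.0) -> bytes:
    """Connect to a fixed IP, bypassing DNS, and return the raw HTTP response."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        # server_hostname sets SNI and is what the certificate is checked against.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host, path))
            chunks = []
            while True:
                chunk = tls.recv(65536)
                if not chunk:
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

With the address mentioned above, a call would be something like `fetch_pinned("91.199.101.202", "forum.pclab.pl")`. Whether the old server still presents a certificate that validates is not something the log confirms.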
10:56:25<@JAA>The last post was made at 2024-11-29T08:31:33Z. I didn't catch that one with my continuous thing.
10:57:13<@JAA>I wonder if someone will figure out the DNS trick and make another post.
10:57:41<szczot3k>On it ;)
10:57:42<szczot3k>Jk
10:57:59<szczot3k>DNS propagation is a real thing though, especially with some ISPs DNS
10:58:07<szczot3k>Wonder what's the TTL
11:04:35<@JAA>300 seconds now, don't know what it was before, but if their sysadmins are half-competent, probably the same.
11:06:37<@JAA>Resumed, and I'm also running another continuous update crawl to catch that last post and anything still made now.
11:06:47<@JAA>Response times are much better than before, let's see if that holds.
11:06:59<@JAA>I suspect that 10k-page topic played a role.
11:15:46<@JAA>Yeah, same thing is happening again now that I'm hitting some topics with hundreds of pages.
11:15:52<@JAA>They bog everything down.
11:17:42<@JAA>Anyway, it's running, it'll grab what it'll grab until they take the actual server down. Data destination TBD.
11:20:03<@JAA>At the moment, the ETA is 19-ish hours, but I doubt it'll stay that way.
11:20:29<szczot3k>While on the 'polish forums' topic - was mpcforum.pl ever grabbed?
11:20:46<@JAA>Imagine your entire forum bogging down because one person loads a page from a big topic.
11:21:04<szczot3k>JAA, well, that might be one reason for the shutdown
11:21:27<Barto>katia: correct, Mbps, sorry about the confusion. good tho
11:23:10<@JAA>There was an attempt at mpcforum.pl a few years ago by someone who recklessly threw a huge number of big things into ArchiveBot. Something was grabbed, I guess, but it was aborted.
11:23:39<szczot3k>Would it be feasible to grab mpcforum? It's huge, and has large impact on the polish (gaming) community
11:24:08<szczot3k>I might throw some compute/networking and a couple of TBs of space
11:24:13<@JAA>I see strict Buttflare.
11:25:13<szczot3k>Does that mean it's impossible?
11:26:18<@JAA>At the moment, we don't have a way to get past that.
11:26:40Commander001 quits [Ping timeout: 260 seconds]
11:27:03<szczot3k>Is the problem strict rate limits, or something else?
11:27:18<@JAA>The JS challenge blocks all access.
11:27:33<@JAA>This is something the site operator can adjust.
11:28:24<szczot3k>So with the operator's permission, and cooperation it would be possible
11:29:27<@JAA>Yes
11:29:53<szczot3k>Wonder if they'd be open to this idea
11:37:25Commander001 joins
12:00:04Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat]
12:02:52Bleo182600722719623 joins
12:04:45robin joins
12:20:30Unholy23619246453771312 (Unholy2361) joins
12:24:25Unholy2361924645377131 quits [Ping timeout: 260 seconds]
12:25:35Unholy23619246453771312 quits [Ping timeout: 260 seconds]
12:38:05SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:39:42SkilledAlpaca418962 joins
13:19:32sralracer (sralracer) joins
14:37:45Xanthon joins
14:37:45Xanthon quits [Changing host]
14:37:45Xanthon (Xanthon) joins
14:59:08Wohlstand (Wohlstand) joins
15:35:45wyatt8740 quits [Ping timeout: 260 seconds]
15:47:44wyatt8740 joins
15:57:20Wohlstand quits [Ping timeout: 260 seconds]
15:57:25<thuban>JAA: what are the chances of (1) open-sourcing the existing set of qwarc spec scripts for forums so that others can more effectively contribute, and/or (2) getting at least one other op into a position to vet/run qwarc jobs?
15:57:31<thuban>we lost the sims forums in essentially this same scenario
16:02:59Xanthon quits [Read error: Connection reset by peer]
16:26:27th3z0l4 joins
16:30:35th3z0l4_ quits [Ping timeout: 260 seconds]
16:42:35Fiduro4830139107 quits [Quit: The Lounge - https://thelounge.chat]
16:42:46Fiduro4830139107 joins
16:49:28nulldata quits [Quit: Ping timeout (120 seconds)]
16:50:24nulldata (nulldata) joins
17:01:21<TheTechRobo>thuban: qwarc spec files are stored in the meta WARC.
17:03:29Webuser410129 joins
17:06:35<Webuser410129>I am looking for a track that was on artist union that was uploaded in 2016 I have found the archives at https://archive.org/details/archiveteam_theartistunion but i do not know how to go about searching this is the track I am looking for https://theartistunion.com/tracks/19486a i have tried a search at the top of the archives, but i get nothing
17:06:35<Webuser410129>found. any thoughts on how to search please? thank you
17:08:49<thuban>Webuser410129: https://tau.thetechrobo.ca/
17:10:17<Webuser410129>ah thank you ever so much
17:11:59<thuban>TheTechRobo: ah, good point. would be nice to have them in a repo though
17:12:50Webuser410129 quits [Client Quit]
17:29:49<h2ibot>TheTechRobo edited The Artist Union (+206, Add link to retrieval tool): https://wiki.archiveteam.org/?diff=53902&oldid=48826
17:31:29lflare quits [Quit: Bye]
17:31:53lflare (lflare) joins
17:36:12Webuser752070 joins
17:36:23Webuser752070 quits [Client Quit]
17:48:50Notrealname1234 (Notrealname1234) joins
17:53:31Notrealname1234 quits [Client Quit]
18:52:28hackbug quits [Remote host closed the connection]
18:59:51hackbug (hackbug) joins
19:02:31VerifiedJ9 (VerifiedJ) joins
19:27:24Wohlstand (Wohlstand) joins
19:56:36<@JAA>thuban: I guess they're not technically open-source since I didn't attach a licence to them, but yeah, the spec files are all in the meta WARCs. Although some projects aren't uploaded yet.
19:57:48<@JAA>I've been thinking about and procrastinating on putting all my project code (qwarc or not) in a repo or multiple repos for a while though.
20:05:47klaffty quits [Quit: klaffty]
20:15:30<@JAA>PCLab is still going. ETA 28 hours now
20:16:03<@JAA>No further posts have been made.
20:36:58Guest77 joins
20:36:59<eggdrop>[tell] Guest77: [2024-11-28T20:20:41Z] <OrIdow6> the setup you described is for viewing static files in a directory with wabac.js?
20:37:32<Guest77>OrIdow6: hi, no. It is for viewing the file without wabac.js, but only having a .warc file, obtained using grab-site
20:38:13<Guest77>there is (for sure) a way to use wabac.js, but i think you need to mount a webserver with nodejs or more complex than a plain http server
20:39:48<Guest77>i.e. you download a .warc file from the web, and upon extracting it you have all the original URLs. With this approach the extracted files are "processed" and shown in your usual browser, just by running this bash script i mention. Extracting several .warc files in a common folder would let you navigate through those other websites as well
20:39:57<Guest77>.help
20:40:35<Guest77>hmm, how do i use the eggdrop tell command?
20:45:32<@OrIdow6>Guest77: !tell but because I'm in the channel that won't work; it's for people who disconnect and reconnect
20:45:49<Guest77>oh, i see
20:45:59<Guest77>only works with offline users, didnt know that
20:45:59<@OrIdow6>So you're extracting everything to disk and letting relative links resolve themselves through the filesystem?
20:46:10<Guest77>i think so OrIdow6
20:46:17<Guest77>i'm not sure if its the best way
20:46:26<Guest77>may be a problem for big sites
20:46:44<@OrIdow6>I imagine Javascript would break on it in some way
20:47:35<Guest77>yes, i guess something as AJAX or a jquery thing would not work as expected
20:48:18<Guest77>wabac.js would of course cover those things, since its mounted in a proper web server
20:49:37<Guest77>not sure if replaywebpage extracts all the files, i imagine it only gets what is asked to
20:49:55<Guest77>designed for big websites in mind
21:06:32<h2ibot>OrIdow6 created Qwarc (+447, Created page with "A Python framework written…): https://wiki.archiveteam.org/?title=Qwarc
21:07:25<@OrIdow6>Guest77: Well, wabac.js works *most of the time*...
21:07:39<Guest77>you know how to mount it? OrIdow6
21:07:56<Guest77>i mean, the whole procedure to do your own replayweb.page "offline"
21:08:11<@OrIdow6>Also Guest77, does the fact that you're replacing the domain with / mean that you have to extract everything in your root directory?
21:08:53<Guest77>i replace it with a empty "" value. with that i get: localhost:PORT/URL_folder/index.html
21:09:00<@OrIdow6>Not sure what you're referring to with "mount it" Guest77
21:09:37<@OrIdow6>If you mean set up a local copy of replayweb.page, I haven't done that no
21:09:54<@OrIdow6>I've used the online version and pywb pretty much
21:10:08<Guest77>oh lol
21:10:14<Guest77>i just have found the instructions
21:10:26<Guest77>to do it, i should read carefully before asking
21:15:31<Guest77>hmm, it didnt work, i guess there is a problem with dependencies versions
21:18:51<Guest77>working now :D
21:22:35<h2ibot>TheTechRobo edited Qwarc (+158, Some additional information): https://wiki.archiveteam.org/?diff=53904&oldid=53903
21:26:28<@OrIdow6>Thanks TheTechRobo, documenting stuff like this seems like it never ends
21:26:44<@OrIdow6>I'd reference that Tumblr meme but I can't remember how to spell Sysphisian
21:27:07TheTechRobo is not familiar with that meme
21:28:39<steering>sisyphean? also dunno the meme though
21:40:55<@OrIdow6>Umm like https://www.tumblr.com/cannibalchicken/725394050160771072
21:46:27Island joins
21:48:30Wohlstand quits [Ping timeout: 260 seconds]
22:00:26Guest77 quits [Client Quit]
22:00:40Unholy23619246453771312 (Unholy2361) joins
22:01:01BlueMaxima joins
22:16:16robin quits [Quit: Konversation terminated!]
22:20:45<h2ibot>OrIdow6 edited Archiveteam:IRC (-2509, "Special Archive Team IRC rules" - remove a…): https://wiki.archiveteam.org/?diff=53905&oldid=53749
22:33:13<@arkiver>OrIdow6: are those removed in the FAQ then? https://wiki.archiveteam.org/?diff=53905&oldid=53749
22:34:54<@arkiver>i don't know about simply removing those OrIdow6
22:34:58<@arkiver>CC JAA
22:36:13<@arkiver>these have been in there a long time, i don't know the background behind each of them, but
22:36:15<@arkiver>> "Special Archive Team IRC rules" - remove a bunch, this put me off when I first joined
22:36:21<@OrIdow6>arkiver: You can put them back if you want, or some of them; my main issue is that it seems most of them ("don't be childish", "don't demand answers from others", "don't [be] malicious[]") are not going to be read by the people who breach them anyway
22:36:23<@arkiver>does not seem like a good enough reason to just get rid of them at once
22:37:01<@OrIdow6>Maybe they can be rephrased
22:37:17<@arkiver>a discussion could have been started here, on these things i like to know what JAA thinks for example - some people here have been around a long time on IRC and know the background behind all this
22:37:34<@arkiver>outright deleting them without any form of discussion is not the way in my opinion
22:37:38<@JAA>That section predates my presence here.
22:37:39<@OrIdow6>Alright then
22:37:47<h2ibot>OrIdow6 edited Archiveteam:IRC (+2509, Undo revision 53905 by…): https://wiki.archiveteam.org/?diff=53906&oldid=53905
22:38:31<@JAA>I do wonder how many people read them though, and also I don't think we've banned anyone for these reasons in a long time.
22:38:40<@arkiver>i think we could rewrite them perhaps, also to sound less 'aggressive' (not sure if that is the right word)
22:38:57<@JAA>Yeah, I think I'd agree with that.
22:41:11<@arkiver>JAA: i believe we have occasionally pointed to them, like the "don't feed the trolls" one
22:41:14<@arkiver>but yeah
22:48:26<steering>IMHO... I don't think we need to enumerate the rules of IRC
22:48:46<steering>there are lots of people here who have been around the block and can point out netiquette (which is all any of those really are) when needed
22:49:29<steering>or rather, they're all netiquette other than "Don't try to convince Archive Team about that archiving is bad."
22:50:15<pokechu22>It might be worth mentioning not pasting large blocks of text and instead uploading it onto https://transfer.archivete.am/ ... but that generally hasn't been much of an issue
22:50:37<@arkiver>perhaps most of the "standard" ones can be summarized in a line or two, maybe with reference to some text outside of the wiki that explains more
22:51:02<@arkiver>while elaborating/expanding on the part about not convincing archiving is bad
22:51:15<@arkiver>pokechu22: yeah good one
23:07:15Unholy23619246453771312 quits [Ping timeout: 260 seconds]
23:26:58StarletCharlotte quits [Remote host closed the connection]
23:27:12StarletCharlotte joins
23:32:37<@JAA>The request rate on PCLab is still varying wildly, even when I average over a full hour. So I'm not even going to calculate an ETA anymore.
23:32:46<@JAA>But it's still going.
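The ETAs quoted throughout the day presumably come from dividing the remaining work by a recent request rate, which is why they swing so much when the rate does. A minimal sketch of that arithmetic follows; the function name and parameters are hypothetical, and the log gives no actual request counts.

```python
from typing import Optional


def eta_hours(remaining: int, done_in_window: int,
              window_seconds: float = 3600.0) -> Optional[float]:
    """Estimate hours left from the request rate averaged over a recent window."""
    if done_in_window <= 0 or window_seconds <= 0:
        return None  # no progress in the window, so no meaningful estimate
    rate = done_in_window / window_seconds  # requests per second
    return remaining / rate / 3600.0
```

Because the estimate scales inversely with the measured rate, halving the rate in the window doubles the ETA, so a rate that varies wildly even over a full hour makes the number close to meaningless.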
23:35:16loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
23:40:54etnguyen03 (etnguyen03) joins