00:30:44 | <Flashfire42> | https://www.msn.com/en-au/money/other/billionaire-backed-sports-streamer-buys-foxtel/ar-AA1wkBwC?ocid=BingNewsSerp gonna archive some foxtel stuff I guess |
01:17:49 | | szczot3k9 (szczot3k) joins |
01:21:22 | | szczot3k quits [Ping timeout: 260 seconds] |
01:21:22 | | szczot3k9 is now known as szczot3k |
01:58:37 | <pabs> | next year I want to move my AB monitoring off my PC, does anyone want to sponsor server resources for it? |
01:58:47 | <pabs> | websocket is ~1TB/month, plus uploads to transfer. minimum 34G disk (some to be compressed), growing. CPU usage constant (multiple jq doing regexen). RAM is negligible. needs Debian with curl from backports, plus a few other packages |
01:58:50 | <pabs> | https://wiki.archiveteam.org/?title=ArchiveBot/Monitoring |
02:08:08 | <monoxane> | fuck me thats a lot of data on a ws |
02:08:35 | <monoxane> | id say yes but you wont get 5 9s of uptime since I dont have my 40g memebox in colo anymore |
02:08:36 | <kiska> | It's not really |
02:09:02 | <kiska> | The tracker websocket is about 12TB/month |
02:09:11 | <monoxane> | wew |
02:09:14 | <TheTechRobo> | o_O |
02:09:20 | <monoxane> | 12t of json is crazy |
02:09:26 | <monoxane> | are we archiving it? /s |
02:12:58 | <kiska> | If you consider some downsampling of the data to be "archiving" |
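[Editor's note] To put the two monthly volumes quoted above in perspective, here is a small arithmetic sketch that converts them to average sustained rates. It assumes decimal terabytes and a 30-day month; neither is stated in the log. The function name is made up for illustration.

```python
# Assumption: a 30-day month and decimal units (1 TB = 1e12 bytes).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_mbit_per_s(tb_per_month):
    """Average rate in Mbit/s for a monthly volume given in terabytes."""
    bytes_per_second = tb_per_month * 1e12 / SECONDS_PER_MONTH
    return bytes_per_second * 8 / 1e6


# Figures quoted in the conversation: ~1 TB/month and ~12 TB/month.
for label, tb in (("AB monitoring websocket", 1), ("tracker websocket", 12)):
    print(f"{label}: ~{average_mbit_per_s(tb):.1f} Mbit/s average")
```

Under those assumptions the figures come out to roughly 3 Mbit/s and 37 Mbit/s sustained. Real traffic is presumably burstier than the average.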
02:12:59 | <pabs> | also there is lots of scope to increase the amount of things being monitored for (add ideas to the wiki), which means more disk, more CPU. my PC is almost full, and is too old and too inadequately cooled to really add anything more |
02:13:26 | <kiska> | What specs are ideal? |
02:14:26 | <kiska> | I may have a dual Xeon server I can start up for this purpose |
02:14:44 | <pabs> | recent CPUs with higher clock speed I guess, whatever runs jq regex the fastest. and adequate cooling for constant CPU usage |
02:15:16 | <pabs> | my PC is from a dumpster, Intel i5 CPU manufactured in like 2013 |
02:15:43 | <@JAA> | Have you implemented the pre-filtering yet? That should drop CPU usage considerably. |
02:16:03 | <pabs> | I did swap around the regexen, helped a bit |
02:16:13 | <@JAA> | No, filtering before feeding to jq at all. |
02:16:21 | <pabs> | ah, not yet no |
02:16:31 | <pabs> | I need to try rewriting stuff in Python too |
02:17:11 | <@JAA> | If you do, don't use the built-in `json` module; it's very slow. |
02:17:55 | <@JAA> | (Fine for basic usage, not fine for mass parsing.) |
02:18:13 | <@JAA> | But yeah, filtering with simple fixed-string grep will be a massive improvement. |
02:18:53 | <pabs> | fixed-string grep might not be feasible for all of them, especially the code one, which IIRC is the biggest CPU usage right now |
02:19:36 | <@JAA> | As long as there are some fixed substrings in your regex, it will help. |
02:19:51 | <pabs> | hmm, I wonder what katia's repeater uses for json. gunicorn uses a bit of CPU too |
02:20:30 | <@JAA> | Nothing, probably. I believe it just forwards the upstream data. |
02:22:35 | | skarz quits [Quit: Ooops, wrong browser tab.] |
02:22:48 | <pabs> | ah yep socket.recv_string |
02:23:22 | <pabs> | ah thats the zmq bit, ws is ws.recv() |
02:24:30 | | cow_2001 quits [Quit: ✡] |
02:25:02 | <pabs> | so you mean like this: grep -F -e foo -e bar -e baz |
02:25:07 | <@JAA> | Yes |
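[Editor's note] The pre-filtering idea discussed above (a cheap fixed-string check before any JSON parsing, in the spirit of `grep -F -e foo -e bar -e baz`) could look roughly like this in Python. This is a minimal sketch, not the actual monitoring code. The needles `foo`, `bar`, `baz` are the placeholders from the chat, and the function names are hypothetical. The standard-library `json` module is used only to keep the example self-contained; a faster parser could be swapped in.

```python
import json

# Placeholder fixed substrings from the chat; the real patterns are not in the log.
NEEDLES = ("foo", "bar", "baz")


def prefilter(lines, needles=NEEDLES):
    """Yield only lines containing at least one fixed substring (like grep -F)."""
    for line in lines:
        if any(needle in line for needle in needles):
            yield line


def matching_records(lines, needles=NEEDLES):
    """Parse JSON only for the lines that survive the cheap substring check."""
    for line in prefilter(lines, needles):
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # The substring check can let through non-JSON lines; skip them.
            continue


if __name__ == "__main__":
    sample = [
        '{"url": "http://example.com/foo"}',
        '{"url": "http://example.com/other"}',
        '{"msg": "baz"}',
    ]
    for record in matching_records(sample):
        print(record)
```

A substring hit only means the line might match, since the needle can appear in any field. The precise regex or jq check would still run afterwards, just on far fewer lines.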
02:25:32 | | cow_2001 joins |
02:27:07 | <pabs> | !remindme 5d add to AB monitoring: grep -F -e foo -e bar -e baz |
02:27:07 | <eggdrop> | [remind] ok, i'll remind you at 2024-12-28T02:27:07Z |
04:28:25 | | Naruyoko joins |
06:44:00 | | BlueMaxima quits [Read error: Connection reset by peer] |
07:09:37 | | Unholy23619246453771312 quits [Ping timeout: 260 seconds] |
07:27:04 | | Wohlstand (Wohlstand) joins |
08:08:51 | <h2ibot> | Himond000 edited Deathwatch (+189, /* 2026 */ add nakayamamiho.com): https://wiki.archiveteam.org/?diff=54086&oldid=54082 |
09:52:23 | | Wohlstand quits [Ping timeout: 260 seconds] |
10:04:34 | | loug8318142 joins |
10:14:18 | | Radzig2 joins |
10:16:18 | | Radzig quits [Ping timeout: 260 seconds] |
10:16:18 | | Radzig2 is now known as Radzig |
10:46:13 | | ducky (ducky) joins |
11:22:27 | | Island quits [Read error: Connection reset by peer] |
11:24:40 | | ShastaTheDog joins |
11:43:45 | | MrMcNuggets (MrMcNuggets) joins |
12:00:01 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:47 | | Bleo182600722719623 joins |
12:48:43 | | Barto quits [Quit: WeeChat 4.4.3] |
12:52:53 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:54:53 | | SkilledAlpaca418962 joins |
13:00:13 | | Barto (Barto) joins |
13:57:12 | | PredatorIWD2 quits [Read error: Connection reset by peer] |
14:19:53 | | IDK (IDK) joins |
14:22:07 | | PredatorIWD2 joins |
15:18:29 | | PredatorIWD2 quits [Read error: Connection reset by peer] |
15:24:02 | | PredatorIWD2 joins |
15:28:02 | | Barto quits [Client Quit] |
15:28:18 | | Barto (Barto) joins |
17:03:58 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
17:09:43 | <pabs> | https://newsroom.lexmark.com/2024-12-23-Xerox-to-Acquire-Lexmark |
17:21:42 | <mikolaj|m> | https://wearesaudis.net is a web forum run by the person responsible for the recent terrorist attack in Magdeburg. It appears to be still up. Was there any attempt to archive it? I randomly checked and found some threads there to be missing on the Wayback Machine's Web interface |
17:26:18 | | eleanorsilly (eleanorsilly) joins |
17:26:49 | <eleanorsilly> | hey, can someone help me with installing wget-lua? I'm getting the following error: |
17:26:52 | <eleanorsilly> | gettext infrastructure mismatch: using a Makefile.in.in from gettext version 0.20 but the autoconf macros are from gettext version 0.22 |
17:27:11 | <eleanorsilly> | I am running gettext 0.23 and it is my only install on this system |
17:28:52 | | IDK quits [Quit: Connection closed for inactivity] |
17:38:07 | | HP_Archivist (HP_Archivist) joins |
17:43:24 | | IDK (IDK) joins |
18:54:05 | | BearFortress quits [] |
19:40:33 | | eleanorsilly leaves |
19:41:58 | <mikolaj|m> | Ryz: thanks |
20:06:39 | | BearFortress joins |
20:31:27 | <@JAA> | mikolaj|m: Already archived that. :-) |
20:36:21 | | AlsoHP_Archivist joins |
20:36:52 | | Wohlstand (Wohlstand) joins |
20:40:07 | | HP_Archivist quits [Ping timeout: 252 seconds] |
20:52:36 | | Webuser2291391 joins |
20:58:34 | | Guest54 joins |
21:03:13 | | Guest54 quits [Ping timeout: 260 seconds] |
21:09:08 | | Webuser2291391 quits [Client Quit] |
21:21:32 | | Island joins |
21:48:52 | | IDK quits [Quit: Connection closed for inactivity] |
22:08:35 | | BornOn420 quits [Remote host closed the connection] |
22:09:11 | | BornOn420 (BornOn420) joins |
22:47:29 | | Mateon1 joins |
23:01:43 | | klg quits [Quit: brb] |
23:04:14 | | klg (klg) joins |
23:21:31 | <AlsoHP_Archivist> | pokechu22: Continue here? |
23:21:50 | <AlsoHP_Archivist> | How would you propose to grab? |
23:22:57 | | BlueMaxima joins |
23:27:52 | <Czechball> | Hello. Any ideas how to archive webpages that maliciously prevent archiving? |
23:28:06 | <Czechball> | it's detecting the IAs user agent I think |
23:28:45 | <Czechball> | I'm collecting Steam scam / phishing websites for fun in my saved web archives, but recently the scammers have caught up and prevent archiving |
23:30:57 | <Czechball> | IA at least managed to get a screenshot of the actual phishing page: https://web.archive.org/web/20241223232447/http://web.archive.org/screenshot/http://steam.workshopmodel.com/ |
23:31:26 | <Czechball> | but when visiting the archived url, it will just redirect to an actual steam workshop page |
23:31:52 | <Czechball> | I also tried archiving with archive.ph but got the same result |
23:32:27 | <Czechball> | obvious disclaimer, do not visit the live archive url lol |
23:32:34 | <Czechball> | *archived |
23:32:55 | <pokechu22> | I tried the live URL and it just redirected to https://steamcommunity.com/sharedfiles/filedetails/?id=3246613760 |
23:33:22 | <pokechu22> | so it might be a thing where they just randomly make it redirect? |
23:33:49 | <Czechball> | interesting... maybe they killed it while I was trying to save it? |
23:34:05 | <Czechball> | so I guess the IA screenshot is also after redirect |
23:34:14 | <pokechu22> | Yeah, that's my guess as to what happened |
23:34:38 | <pokechu22> | IA screenshots are of the redirect target, but show on the redirect source, which is a bit jank |
23:35:56 | <Czechball> | I received the phishing DM ~3 hours before I tried archiving it, maybe it was just a short campaign. Or maybe the campaign hasn't even started yet |
23:37:53 | <pokechu22> | AlsoHP_Archivist: with the way https://www.silverfast.com/get_demo/en.html uses cookies I don't think it's really feasible to save it at all :/ |
23:38:41 | <pokechu22> | That said it also looks like there aren't that many different products; it just makes you choose a specific scanner and model but then seems to recommend the same thing |
23:58:55 | <AlsoHP_Archivist> | pokechu22: Are you sure it's not recommended scanner/hardware and the language? I figured there were many different combos you could select. |
23:59:07 | <AlsoHP_Archivist> | Could be wrong though |
23:59:40 | <pokechu22> | I got the same links for a few different scanners. It's possible that it gives other links for very specific scanners but checking that seems like it would be a pain |