00:30:44<Flashfire42>https://www.msn.com/en-au/money/other/billionaire-backed-sports-streamer-buys-foxtel/ar-AA1wkBwC?ocid=BingNewsSerp gonna archive some foxtel stuff I guess
01:17:49szczot3k9 (szczot3k) joins
01:21:22szczot3k quits [Ping timeout: 260 seconds]
01:21:22szczot3k9 is now known as szczot3k
01:58:37<pabs>next year I want to move my AB monitoring off my PC, does anyone want to sponsor server resources for it?
01:58:47<pabs>websocket is ~1TB/month, plus uploads to transfer. minimum 34G disk (some to be compressed), growing. CPU usage constant (multiple jq doing regexen). RAM is negligible. needs Debian with curl from backports, plus a few other packages
01:58:50<pabs>https://wiki.archiveteam.org/?title=ArchiveBot/Monitoring
02:08:08<monoxane>fuck me thats a lot of data on a ws
02:08:35<monoxane>id say yes but you wont get 5 9s of uptime since I dont have my 40g memebox in colo anymore
02:08:36<kiska>It's not really
02:09:02<kiska>The tracker websocket is about 12TB/month
02:09:11<monoxane>wew
02:09:14<TheTechRobo>o_O
02:09:20<monoxane>12t of json is crazy
02:09:26<monoxane>are we archiving it? /s
02:12:58<kiska>If you consider some downsampling of the data to be "archiving"
02:12:59<pabs>also there is lots of scope to increase the amount of things being monitored for (add ideas to the wiki), which means more disk, more CPU. my PC is almost full, and is too old and too inadequately cooled to really add anything more
02:13:26<kiska>What specs are ideal?
02:14:26<kiska>I may have a dual Xeon server I can start up for this purpose
02:14:44<pabs>recent CPUs with higher clock speed I guess, whatever runs jq regex the fastest. and adequate cooling for constant CPU usage
02:15:16<pabs>my PC is from a dumpster, Intel i5 CPU manufactured in like 2013
02:15:43<@JAA>Have you implemented the pre-filtering yet? That should drop CPU usage considerably.
02:16:03<pabs>I did swap around the regexen, helped a bit
02:16:13<@JAA>No, filtering before feeding to jq at all.
02:16:21<pabs>ah, not yet no
02:16:31<pabs>I need to try rewriting stuff in Python too
02:17:11<@JAA>If you do, don't use the built-in `json` module; it's pure Python and very slow.
02:17:55<@JAA>(Fine for basic usage, not fine for mass parsing.)
02:18:13<@JAA>But yeah, filtering with simple fixed-string grep will be a massive improvement.
02:18:53<pabs>fixed-string grep might not be feasible for all of them, especially the code one, which IIRC is the biggest CPU usage right now
02:19:36<@JAA>As long as there are some fixed substrings in your regex, it will help.
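[editor's note] A minimal sketch of the prefilter idea in Python, since pabs mentions rewriting the monitor in Python. The substrings, field name, and regex below are placeholders, not the real AB monitoring patterns: the point is that lines lacking every fixed substring are dropped before any JSON parsing or regex work, so the expensive steps only run on the small matching fraction of the stream. (Per JAA's note, the stdlib `json` module is slow for mass parsing; a faster third-party parser could be swapped in for the surviving lines.)

```python
import json
import re

# Placeholder patterns; the real monitor's regexen live in the
# AB monitoring scripts, not here.
FIXED_SUBSTRINGS = ("foo", "bar")       # cheap fixed-string prefilter
PATTERN = re.compile(r"(?:foo|bar)\w*")  # expensive check, run rarely

def interesting(lines):
    """Yield parsed JSON objects whose 'url' field matches PATTERN,
    skipping the JSON parse entirely for lines with no fixed substring."""
    for line in lines:
        if not any(s in line for s in FIXED_SUBSTRINGS):
            continue  # dropped before json/regex ever run
        obj = json.loads(line)
        if PATTERN.search(obj.get("url", "")):
            yield obj

stream = [
    '{"url": "https://example.com/foo1"}',
    '{"url": "https://example.com/quux"}',
    '{"url": "https://example.com/bar2"}',
]
print([o["url"] for o in interesting(stream)])
# → ['https://example.com/foo1', 'https://example.com/bar2']
```

This is the same effect `grep -F -e foo -e bar` gives ahead of jq in a shell pipeline, as discussed below.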
02:19:51<pabs>hmm, I wonder what katia's repeater uses for json. gunicorn uses a bit of CPU too
02:20:30<@JAA>Nothing, probably. I believe it just forwards the upstream data.
02:22:35skarz quits [Quit: Ooops, wrong browser tab.]
02:22:48<pabs>ah yep socket.recv_string
02:23:22<pabs>ah thats the zmq bit, ws is ws.recv()
02:24:30cow_2001 quits [Quit: ✡]
02:25:02<pabs>so you mean like this: grep -F -e foo -e bar -e baz
02:25:07<@JAA>Yes
02:25:32cow_2001 joins
02:27:07<pabs>!remindme 5d add to AB monitoring: grep -F -e foo -e bar -e baz
02:27:07<eggdrop>[remind] ok, i'll remind you at 2024-12-28T02:27:07Z
04:28:25Naruyoko joins
06:44:00BlueMaxima quits [Read error: Connection reset by peer]
07:09:37Unholy23619246453771312 quits [Ping timeout: 260 seconds]
07:27:04Wohlstand (Wohlstand) joins
08:08:51<h2ibot>Himond000 edited Deathwatch (+189, /* 2026 */ add nakayamamiho.com): https://wiki.archiveteam.org/?diff=54086&oldid=54082
09:52:23Wohlstand quits [Ping timeout: 260 seconds]
10:04:34loug8318142 joins
10:14:18Radzig2 joins
10:16:18Radzig quits [Ping timeout: 260 seconds]
10:16:18Radzig2 is now known as Radzig
10:46:13ducky (ducky) joins
11:22:27Island quits [Read error: Connection reset by peer]
11:24:40ShastaTheDog joins
11:43:45MrMcNuggets (MrMcNuggets) joins
12:00:01Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat]
12:02:47Bleo182600722719623 joins
12:48:43Barto quits [Quit: WeeChat 4.4.3]
12:52:53SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:54:53SkilledAlpaca418962 joins
13:00:13Barto (Barto) joins
13:57:12PredatorIWD2 quits [Read error: Connection reset by peer]
14:19:53IDK (IDK) joins
14:22:07PredatorIWD2 joins
15:18:29PredatorIWD2 quits [Read error: Connection reset by peer]
15:24:02PredatorIWD2 joins
15:28:02Barto quits [Client Quit]
15:28:18Barto (Barto) joins
17:03:58MrMcNuggets quits [Quit: WeeChat 4.3.2]
17:09:43<pabs>https://newsroom.lexmark.com/2024-12-23-Xerox-to-Acquire-Lexmark
17:21:42<mikolaj|m>https://wearesaudis.net is a web forum run by the person responsible for the recent terrorist attack in Magdeburg. It appears to be still up. Was there any attempt to archive it? I randomly checked and found some threads there to be missing on the Wayback Machine's web interface
17:26:18eleanorsilly (eleanorsilly) joins
17:26:49<eleanorsilly>hey, can someone help me with installing wget-lua? I'm getting the following error:
17:26:52<eleanorsilly>gettext infrastructure mismatch: using a Makefile.in.in from gettext version 0.20 but the autoconf macros are from gettext version 0.22
17:27:11<eleanorsilly>I am running gettext 0.23 and it is my only install on this system
17:28:52IDK quits [Quit: Connection closed for inactivity]
17:38:07HP_Archivist (HP_Archivist) joins
17:43:24IDK (IDK) joins
18:54:05BearFortress quits []
19:40:33eleanorsilly leaves
19:41:58<mikolaj|m>Ryz: thanks
20:06:39BearFortress joins
20:31:27<@JAA>mikolaj|m: Already archived that. :-)
20:36:21AlsoHP_Archivist joins
20:36:52Wohlstand (Wohlstand) joins
20:40:07HP_Archivist quits [Ping timeout: 252 seconds]
20:52:36Webuser2291391 joins
20:58:34Guest54 joins
21:03:13Guest54 quits [Ping timeout: 260 seconds]
21:09:08Webuser2291391 quits [Client Quit]
21:21:32Island joins
21:48:52IDK quits [Quit: Connection closed for inactivity]
22:08:35BornOn420 quits [Remote host closed the connection]
22:09:11BornOn420 (BornOn420) joins
22:47:29Mateon1 joins
23:01:43klg quits [Quit: brb]
23:04:14klg (klg) joins
23:21:31<AlsoHP_Archivist>pokechu22: Continue here?
23:21:50<AlsoHP_Archivist>How would you propose to grab?
23:22:57BlueMaxima joins
23:27:52<Czechball>Hello. Any ideas how to archive webpages that maliciously prevent archiving?
23:28:06<Czechball>it's detecting the IA's user agent I think
23:28:45<Czechball>I'm collecting Steam scam / phishing websites for fun in my saved web archives, but recently the scammers have caught up and prevent archiving
23:30:57<Czechball>IA at least managed to get a screenshot of the actual phishing page: https://web.archive.org/web/20241223232447/http://web.archive.org/screenshot/http://steam.workshopmodel.com/
23:31:26<Czechball>but when visiting the archived url, it will just redirect to an actual steam workshop page
23:31:52<Czechball>I also tried archiving with archive.ph but got the same result
23:32:27<Czechball>obvious disclaimer, do not visit the live archive url lol
23:32:34<Czechball>*archived
23:32:55<pokechu22>I tried the live URL and it just redirected to https://steamcommunity.com/sharedfiles/filedetails/?id=3246613760
23:33:22<pokechu22>so it might be a thing where they just randomly make it redirect?
23:33:49<Czechball>interesting... maybe they killed it while I was trying to save it?
23:34:05<Czechball>so I guess the IA screenshot is also after redirect
23:34:14<pokechu22>Yeah, that's my guess as to what happened
23:34:38<pokechu22>IA screenshots are of the redirect target, but show on the redirect source, which is a bit jank
23:35:56<Czechball>I received the phishing DM ~3 hours before I tried archiving it, maybe it was just a short campaign. Or maybe the campaign hasn't even started yet
23:37:53<pokechu22>AlsoHP_Archivist: with the way https://www.silverfast.com/get_demo/en.html uses cookies I don't think it's really feasible to save it at all :/
23:38:41<pokechu22>That said it also looks like there aren't that many different products; it just makes you choose a specific scanner and model but then seems to recommend the same thing
23:58:55<AlsoHP_Archivist>pokechu22: Are you sure it's not recommended scanner/hardware and the language? I figured there were many different combos you could select.
23:59:07<AlsoHP_Archivist>Could be wrong though
23:59:40<pokechu22>I got the same links for a few different scanners. It's possible that it gives other links for very specific scanners but checking that seems like it would be a pain