00:11:43scurvy_duck quits [Ping timeout: 260 seconds]
00:29:00etnguyen03 (etnguyen03) joins
00:33:36elliewebz joins
00:34:13elliewebz quits [Client Quit]
00:45:39@imer quits [Quit: Oh no]
00:46:02sec^nd quits [Ping timeout: 276 seconds]
00:46:13imer (imer) joins
00:46:14@ChanServ sets mode: +o imer
00:49:52sec^nd (second) joins
00:52:50cascode quits [Ping timeout: 250 seconds]
00:53:32cascode joins
00:54:23etnguyen03 quits [Client Quit]
01:00:31cascode quits [Read error: Connection reset by peer]
01:01:02cascode joins
01:37:00scurvy_duck joins
01:40:17<@arkiver>we're going to slowly start other projects again, which may be at the expense of #UncleSamsArchive a bit, will try to prioritize better there
01:45:26Webuser760773 joins
01:46:29etnguyen03 (etnguyen03) joins
01:54:13<pabs>c3manu: re web search scraping, I have a few browser console scripts for Google/Bing/Yandex. the little-things stuff wasn't working for me, probably because of captchas IIRC
01:54:37<pabs>(also Yandex gives captchas often, even with the browser console scripts)
02:08:45<pabs>also https://wiki.archiveteam.org/index.php/Site_exploration#Search_Engines
02:09:06cascode quits [Ping timeout: 250 seconds]
02:13:20cascode joins
02:17:20pabs did a scrape of nfdc.faa.gov and sent it to #UncleSamsArchive
02:19:17cascode quits [Read error: Connection reset by peer]
02:19:30cascode joins
02:19:47<@arkiver>pabs: is that something you can also !ao < in AB?
02:20:21<pabs>not sure what c3manu had planned for it, but I guess so yes. some of it might be better to !a like the airport codes section
02:21:32<@arkiver>!a < can still be used as well i think
02:24:57Hackerpcs quits [Quit: Hackerpcs]
02:26:08<h2ibot>PaulWise edited Site exploration (+656, add browser console scripts): https://wiki.archiveteam.org/?diff=54386&oldid=50868
02:28:40Hackerpcs (Hackerpcs) joins
02:36:49<pabs>I put it into !ao < for now
02:47:49Webuser760773 quits [Client Quit]
02:55:37<Hans5958>Someone should tell r/datahoarder that donating to AT is also an option. At this point there are so many warriors that the better option would be helping to scale the targets instead
02:56:58<Hans5958>Or, yeah, make the "Donate" link more prominent
02:57:03<nicolas17>++
02:57:05<Hans5958>on the wiki
02:58:45<that_lurker>Donating to IA would be ideal as well. They are the ones that take the end bulk of data
03:05:19Webuser179214 joins
03:05:43Webuser179214 quits [Client Quit]
03:07:07<Hans5958>That one is true but I think speed is also important, since sites are going down, which means the targets should also be the priority
03:07:27<Hans5958>(by the way, I apologize for being a busybody. Donated 1$ since my currency sucks)
03:19:19BlueMaxima quits [Read error: Connection reset by peer]
03:33:03etnguyen03 quits [Client Quit]
03:35:18scurvy_duck quits [Ping timeout: 260 seconds]
03:45:43etnguyen03 (etnguyen03) joins
03:48:02Webuser489155 joins
03:48:49Webuser489155 quits [Client Quit]
04:26:54etnguyen03 quits [Remote host closed the connection]
04:27:48nicolas17 quits [Ping timeout: 260 seconds]
04:31:23nicolas17 joins
04:35:28nicolas17 quits [Client Quit]
04:35:44nicolas17 joins
04:45:46i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat]
04:46:15i_have_n0_idea (i_have_n0_idea) joins
04:48:01lennier2_ joins
05:09:41scurvy_duck joins
05:18:57Roboguy joins
05:22:22<Roboguy>Hi! I was wondering if anyone has an archive of nifty before it went down last year. I'm looking for a specific blogpost made by Naoyuki Katoh on the development of a model kit (http://homepage2.nifty.com/NaoKatoh/making/starship_troopers/episode_8/index.html).
05:22:28Webuser637326 quits [Quit: Ooops, wrong browser tab.]
05:26:43Roboguy quits [Client Quit]
05:29:56Webuser482381 joins
05:35:22Roboguy joins
05:39:12Webuser482381 quits [Client Quit]
05:39:12Roboguy quits [Client Quit]
05:47:34earl joins
05:54:54Webuser457588 joins
05:56:26SootBector quits [Remote host closed the connection]
05:56:42SootBector (SootBector) joins
06:07:38<h2ibot>PaulWise edited Flickr (+379, document current status with AB /cc arkiver): https://wiki.archiveteam.org/?diff=54387&oldid=53134
06:13:32some_body quits [Quit: Leaving.]
06:14:28some_body joins
06:15:31SootBector quits [Remote host closed the connection]
06:15:48SootBector (SootBector) joins
06:17:50katocala quits [Ping timeout: 250 seconds]
06:18:16katocala joins
06:22:04qinplus_phone joins
06:26:56katocala quits [Ping timeout: 250 seconds]
06:27:09katocala joins
06:33:22DogsRNice quits [Read error: Connection reset by peer]
06:44:42some_body6 joins
06:45:08some_body quits [Ping timeout: 250 seconds]
06:45:08some_body6 is now known as some_body
06:56:39<pabs>Roboguy's URL has "This URL has been excluded from the Wayback Machine." in the WBM
06:58:18scurvy_duck quits [Ping timeout: 260 seconds]
07:39:14Island quits [Read error: Connection reset by peer]
07:43:31Naruyoko5 joins
07:47:53Naruyoko quits [Ping timeout: 260 seconds]
08:39:39<h2ibot>Bzc6p edited Kephost.com (+65, /* Archiving */ add total size of archives): https://wiki.archiveteam.org/?diff=54388&oldid=51294
08:46:40<h2ibot>Bzc6p edited Network.hu (-26, fix archives links): https://wiki.archiveteam.org/?diff=54389&oldid=51085
08:52:38<pabs>hmm, this URL is excluded from the WBM https://web.archive.org/web/20250209084927/https://www.dropbox.com/s/zqwuy8bfrw65156/Shoup%20CV%20March%202023.pdf?dl=1
08:52:48<pabs>is Dropbox more generally excluded?
08:53:26<tzt>pabs: yes, see https://wiki.archiveteam.org/index.php/List_of_websites_excluded_from_the_Wayback_Machine/Partial_exclusions
08:53:54SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
08:54:01<pabs>ugh...
08:54:37pabs wonders why the search doesn't find it https://wiki.archiveteam.org/index.php?search=dropbox&title=Special%3ASearch&fulltext=Search
09:03:44<h2ibot>Thezt edited Deathwatch (+105, Add WATMM): https://wiki.archiveteam.org/?diff=54390&oldid=54367
09:05:24SkilledAlpaca418962 joins
09:06:45<h2ibot>Thezt edited Deathwatch (+91, Add reference): https://wiki.archiveteam.org/?diff=54391&oldid=54390
09:12:45earl quits []
10:21:53qinplus_phone quits [Quit: Connection closed for inactivity]
11:41:41Gadelhas5628737 quits [Quit: auf Wiedersehen]
12:00:06Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
12:02:52Bleo18260072271962345 joins
12:04:31<@OrIdow6>!remind 8h me look into foro3djuegos
12:04:32<eggdrop>[remind] unable to parse time: unable to convert date-time string "me": syntax error (characters 0-1)
12:08:14fangfufu quits [Quit: ZNC 1.8.2+deb3.1+deb12u1 - https://znc.in]
12:12:59fangfufu joins
12:18:40<@OrIdow6>!remindme 8h look into foro3djuegos
12:18:41<eggdrop>[remind] ok, i'll remind you at 2025-02-09T20:18:40Z
12:34:15SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:34:46SkilledAlpaca418962 joins
12:40:03sec^nd quits [Remote host closed the connection]
12:46:55Gadelhas5628737 joins
13:01:36carleski joins
13:02:05carleski leaves
13:02:43sec^nd (second) joins
13:09:52carleski joins
13:10:46carleski quits [Client Quit]
13:19:57CapyLord joins
13:22:27CapyLord quits [Client Quit]
13:28:30monoxane4 (monoxane) joins
13:30:18monoxane quits [Ping timeout: 250 seconds]
13:30:19monoxane4 is now known as monoxane
13:35:37etnguyen03 (etnguyen03) joins
13:43:22catbottom quits [Quit: ZNC 1.8.2+deb2+deb11u1 - https://znc.in]
13:43:36catbottom joins
13:45:21scurvy_duck joins
14:07:05catbottom quits [Client Quit]
14:07:49catbottom joins
14:14:00catbottom quits [Client Quit]
14:15:14catbottom joins
14:35:38scurvy_duck quits [Ping timeout: 260 seconds]
14:52:24scurvy_duck joins
14:52:57Gadelhas5628737 quits [Client Quit]
14:54:56Gadelhas5628737 joins
15:21:52caylin quits [Quit: Ping timeout (120 seconds)]
15:22:11caylin (caylin) joins
15:29:59Deewiant quits [Remote host closed the connection]
15:31:10Deewiant (Deewiant) joins
15:55:01etnguyen03 quits [Client Quit]
15:55:22<Blueacid>There was a brief conversation yesterday in the UncleSam Archive channel about the Targets & how much they punish their disks. I was wondering: how many targets (typically) are there, and how much disk space do they typically consume when busy? My reasoning for asking being: Is there any worth in using the Hetzner S3-alike service for uploads? (e.g. get presigned URL for each warrior 'put' and
15:55:28<Blueacid>use that s3 for staging). Free ingress, free API calls for puts (it seems?) and free bandwidth out to servers hosted in the same Hetzner DC. So for $less-than-a-big-ssd-server it might be worth a thought? Aware this may well have been considered & dismissed, but wanted to ask in case it hadn't been thought of?
15:55:52<Blueacid>(and if ~50TB needs to hole up somewhere for a day or two, it might not be that expensive)
15:56:53<Blueacid>I was pricing up buying a server to offer, but wondered about just donating cash, given many of the target operators (seem to be?) using Hetzner and DO
16:01:18etnguyen03 (etnguyen03) joins
16:03:39Gadelhas5628737 quits [Client Quit]
16:03:56Gadelhas5628737 joins
16:07:01<@imer>Blueacid: https://wiki.archiveteam.org/index.php/Dev/Targets
16:07:02<@imer>S3 isn't a great fit since we need to pack individual uploads into larger files for uploading to archive.org
16:08:06<@imer>you can throw us money for general operational costs here https://opencollective.com/archiveteam (or donate to archive.org - helps us indirectly too since they host the data)
16:08:49<Blueacid>Yeah, the unsteered thought I'd had was: warriors --> s3-esque service --> targets pull from here to repack --> Upload to archive.org
16:10:09<Blueacid>Thinking: the connections from warriors to s3 are then the 'problem' of Hetzner to deal with (so the cpu / firewalling / whatever is on them), then when a target is ready it immediately grabs, say, 100GB of warcs quickly over the Hetzner LAN, repacks, and uploads
16:10:11<@imer>ingest from warriors & packing usually isn't the problem - so that wouldn't do much for us aside from add complexity
16:10:18<Blueacid>Aha very fair :)
16:10:27<Blueacid>I appreciate you taking the time to listen and to explain <3
16:12:17<Blueacid>Oh, one further thought was: If targets are not ready to push to IA, or if they're already flat out, then S3 is what sits holding the data until it's drawn down. Don't know if that solves any problems or merely makes new & different ones :D
16:12:34etnguyen03 quits [Client Quit]
16:13:23<@imer>np, main bottleneck is usually uploading data to IA (for a variety of reasons, currently things are going pretty well and it's basically going as fast as IA can handle)
16:14:20<@imer>Blueacid: we do something like that for our temporary storage, yeah. do have to get that unloaded at some point though, can't keep piling things on top :D
16:14:22<Blueacid>Yeah I read they're limited to circa 12gbit ingest, and a finite number of connections
16:15:49<Blueacid>And yep, if the 12gbit limit at the IA stays as-is for the foreseeable, that might start to become a challenge as projects get bigger (I mean, Livestream + US Gov both seem pretty large)
16:17:15<@imer>if only these video services would stop shutting down left and right. lol
16:17:15@imer looks at #dailydemotion
16:19:29<Blueacid>Haha yep, #dailydemotion is going to be tricky..
16:20:18<Blueacid>I work adjacent to video streaming for $dayjob, it's scary how large some of the SANs we have at work are.. multiple petabytes is seen as a 'meh, oh yeah' sort of quantity
16:24:08scurvy_duck quits [Ping timeout: 260 seconds]
16:31:33scurvy_duck joins
16:40:58scurvy_duck quits [Ping timeout: 250 seconds]
16:41:50devkev joins
16:43:19<Hans5958>Wanted to drop this so these can be backed up. Some Indonesian ministries were split by the elected president (PS. no mention of it on the "Elections" tab). This is the compilation of the priority ones to backup: https://transfer.archivete.am/inline/LDRMP/ministryid.md
16:44:43etnguyen03 (etnguyen03) joins
16:48:53dreamstream joins
16:51:29dreamstream quits [Client Quit]
17:07:45Webuser949870 joins
17:08:04Webuser949870 quits [Client Quit]
17:38:15DogsRNice joins
17:50:40^ quits [Remote host closed the connection]
17:50:49^ (^) joins
17:57:09Sidpatchy quits [Quit: The Lounge - https://thelounge.chat]
18:18:52devkev quits [Remote host closed the connection]
18:19:25devkev joins
18:23:06devkev quits [Remote host closed the connection]
18:24:01devkev joins
18:28:58devkev quits [Ping timeout: 260 seconds]
18:29:26devkev joins
18:34:04devkev quits [Ping timeout: 250 seconds]
18:42:20devkev joins
18:43:13datechnoman quits [Quit: Ping timeout (120 seconds)]
18:43:35datechnoman (datechnoman) joins
18:55:13devkev quits [Ping timeout: 260 seconds]
19:04:59Hackerpcs quits [Quit: Hackerpcs]
19:07:58devkev joins
19:09:39Hackerpcs (Hackerpcs) joins
19:11:16etnguyen03 quits [Client Quit]
19:12:38devkev quits [Ping timeout: 250 seconds]
19:13:06devkev joins
19:15:37devkev quits [Remote host closed the connection]
19:15:44devkev joins
19:17:23s-crypt quits [Ping timeout: 260 seconds]
19:17:23Ryz2 quits [Ping timeout: 260 seconds]
19:17:23Flashfire42 quits [Ping timeout: 260 seconds]
19:17:23kiska quits [Ping timeout: 260 seconds]
19:26:16Ryz2 (Ryz) joins
19:26:27s-crypt (s-crypt) joins
19:27:01Flashfire42 joins
19:27:12kiska (kiska) joins
19:33:13moth_ quits [Quit: Quit]
19:43:39sec^nd quits [Remote host closed the connection]
19:44:04sec^nd (second) joins
20:06:07<devkev>hello, is there any benefit to running a project-specific docker container over warrior-dockerfile with a project selected?
20:07:42<devkev>I was following the docker instructions after running a specific project and just stumbled upon the warrior-dockerfile which only appears to be mentioned in the podman section not the docker one
20:10:52devkev quits [Remote host closed the connection]
20:11:25devkev joins
20:14:04<TheTechRobo>Project-specific ones are much lower overhead, can have their concurrency set up to 20, and output logs to stdout
20:14:21<TheTechRobo>They are much better for running projects in bulk
20:14:28<TheTechRobo>The Warrior is a nice 'set it and forget it' thing
20:16:18devkev quits [Ping timeout: 260 seconds]
20:18:42<eggdrop>[remind] OrIdow6: look into foro3djuegos
20:20:09devkev joins
20:25:49etnguyen03 (etnguyen03) joins
20:32:05devkev quits [Remote host closed the connection]
20:32:33devkev joins
20:37:18devkev quits [Ping timeout: 260 seconds]
20:40:38scurvy_duck joins
20:46:05BornOn420 quits [Remote host closed the connection]
20:46:55BornOn420 (BornOn420) joins
20:47:02westonal joins
20:50:12devkev joins
20:53:41<westonal>Hi all, I made myself a command line leaderboard watcher, I found the web-based one too noisy and hard to find myself or see my rank. https://github.com/westonal/archive-warrior-leaderboard-cli hope someone else enjoys it. If anyone wants to contribute, main thing that's not ideal about it is it polls a legacy API rather than using the web socket.
20:54:28devkev quits [Ping timeout: 250 seconds]
20:57:22lunik11 quits [Quit: :x]
21:01:06devkev joins
21:04:08ats quits [Ping timeout: 260 seconds]
21:06:01ats (ats) joins
21:07:00Island joins
21:31:12etnguyen03 quits [Client Quit]
21:31:32lunik11 joins
21:32:32dvb (dvb) joins
21:32:33lunik11 quits [Client Quit]
21:32:49lunik11 joins
21:39:32devkev quits [Ping timeout: 250 seconds]
21:39:51lennier2 joins
21:42:38lennier2_ quits [Ping timeout: 260 seconds]
21:43:46devkev joins
21:44:11devkev quits [Remote host closed the connection]
21:44:18devkev joins
21:58:54BlueMaxima joins
22:06:15devkev_ joins
22:07:43etnguyen03 (etnguyen03) joins
22:08:13devkev_ quits [Client Quit]
22:08:23devkev_ joins
22:09:03devkev_ quits [Client Quit]
22:10:16devkev_mobile (devkev) joins
22:13:39devkev quits [Remote host closed the connection]
22:14:16devkev_mobile quits [Remote host closed the connection]
22:15:11devkev (devkev) joins
22:19:35westonal quits [Client Quit]
22:19:47loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
22:19:50devkev quits [Ping timeout: 250 seconds]
22:21:21devkev (devkev) joins
22:22:24devkev_mobile (devkev) joins
22:22:57scurvy_duck quits [Remote host closed the connection]
22:23:16scurvy_duck joins
22:23:29hexa- quits [Quit: WeeChat 4.4.3]
22:24:20devkev_mobile quits [Remote host closed the connection]
22:24:57hexa- (hexa-) joins
22:26:45devkev_mobile (devkev) joins
22:26:58devkev quits [Ping timeout: 260 seconds]
22:28:26westonal joins
22:29:56devkev (devkev) joins
22:30:34devkev_mobile quits [Remote host closed the connection]
22:40:28etnguyen03 quits [Client Quit]
22:43:18scurvy_duck quits [Ping timeout: 260 seconds]
22:45:04balrog quits [Quit: Bye]
22:45:24devkev quits [Ping timeout: 250 seconds]
22:50:18cascode quits [Ping timeout: 260 seconds]
22:50:54cascode joins
22:53:34balrog (balrog) joins
22:58:09devkev (devkev) joins
23:02:44devkev quits [Ping timeout: 250 seconds]
23:03:12devkev (devkev) joins
23:07:48devkev quits [Ping timeout: 260 seconds]
23:10:46devkev (devkev) joins
23:10:57devkev quits [Client Quit]
23:12:58devkev (devkev) joins
23:16:00cascode quits [Read error: Connection reset by peer]
23:16:13cascode joins
23:16:53dontwashyourhands (dontwashyourhands) joins
23:20:39scurvy_duck joins
23:32:12devkev quits [Ping timeout: 250 seconds]
23:37:10devkev (devkev) joins
23:41:38devkev quits [Ping timeout: 260 seconds]
23:41:38scurvy_duck quits [Ping timeout: 260 seconds]
23:45:31devkev (devkev) joins
23:46:40etnguyen03 (etnguyen03) joins
23:52:14<pabs>AB is currently archiving over a million pets. is this maybe excessive? for eg https://www.savethislife.com/900164000215773
23:52:49<that_lurker>every pet is important :-)
23:53:34<pabs>(their sitemap contains every single microchip ID)
23:53:40lunik11 quits [Client Quit]
23:53:52devkev quits [Ping timeout: 250 seconds]
23:53:56lunik11 joins
23:54:03pabs always walks into giant AB jobs...
23:54:37<nicolas17>how long has brickshelf been running? D:
23:55:24<pabs>https://po.savethislife.com/image-pets/0A12610808.jpg
23:55:28<nulldata>Gotta climb the Shameboard!
23:56:14<nicolas17>ugh I forgot the shameboard was showing bytes
23:56:15etnguyen03 quits [Client Quit]
23:56:24<Flashfire42>it's fine, it's fine.
23:56:24<nicolas17>instead of a more readable unit