00:08:31utulien joins
00:10:16Webuser724606 quits [Client Quit]
00:10:53utulien is now known as utulien_
00:11:28cascode quits [Ping timeout: 260 seconds]
00:12:12cascode joins
00:19:15etnguyen03 quits [Client Quit]
00:28:44HP_Archivist (HP_Archivist) joins
00:36:27scurvy_duck joins
00:38:18utulien_ quits [Ping timeout: 260 seconds]
00:49:24etnguyen03 (etnguyen03) joins
00:55:33Webuser419634 quits [Client Quit]
01:07:06missaustraliana joins
01:07:07<eggdrop>[tell] missaustraliana: [2024-02-28T05:46:36Z] <fireonlive> thanks! grabbed :)
01:07:49<missaustraliana>datechnoman did you just point ur machines to usgovernment
01:08:00<datechnoman>mayyybbbeeee
01:08:19<missaustraliana>i js saw that real time ;-;
01:08:35<missaustraliana>recording too i'm cryinggg
01:08:36<datechnoman>Im only half spun up
01:08:44loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
01:08:46<datechnoman>Will see how it goes for a bit
01:09:03<missaustraliana>i literally said, he rlly said "partys over" and swept in
01:09:28HP_Archivist quits [Read error: Connection reset by peer]
01:09:59<missaustraliana>is it still scraping the urls? because the to do is now going down
01:10:32HP_Archivist (HP_Archivist) joins
01:10:55<datechnoman>We just aren't discovering as much as we are processing basically
01:11:12<missaustraliana>okay so we are outrunning the to do?
01:11:15<missaustraliana>if so thats great
01:12:09<missaustraliana>datechnoman u hit top 5 already i'm cryin
01:13:00<datechnoman>missaustraliana - https://tracker.archiveteam.org/urls/
01:13:03<datechnoman>Look at that :P
01:13:09<datechnoman>Thats where I normally hang out
01:13:39<missaustraliana>samesies, i do it under fullpwnmedia though. how long have you been archiving urls for?
01:15:47HP_Archivist quits [Read error: Connection reset by peer]
01:17:02HP_Archivist (HP_Archivist) joins
01:17:04<@JAA>That message from fireonlive... :-|
01:17:18<TheTechRobo>fireonlive++
01:17:19<eggdrop>[karma] 'fireonlive' now has 869 karma!
01:19:30<@OrIdow6>fireonlive++
01:19:31<eggdrop>[karma] 'fireonlive' now has 870 karma!
01:20:58<missaustraliana>what is eggdrop and why did it text me "thanks! grabbed :)" ;-;
01:21:53<@JAA>missaustraliana: eggdrop's a bot. In this case, fireonlive used it to send a message to you the next time you joined the channel because you weren't around when he said that.
01:22:48<missaustraliana>um.. february 28 2024..
01:22:50<missaustraliana>yikies
01:24:01<@JAA>The reason it makes me sad is that fireonlive passed away last year. Literally a message from the grave. :-(
01:24:13<missaustraliana>WHAT
01:24:21<missaustraliana>i loved fireonlive
01:25:48<missaustraliana>when did that happen and why if you don't mind me asking
01:27:30<nicolas17>august 2024 "apparently he had a stroke weekend before last and is still barely responsive."
01:27:52<h2ibot>Nulldata edited US Government (+157, In progress...): https://wiki.archiveteam.org/?diff=54316&oldid=54313
01:28:32<nulldata>fireonlive++
01:28:33<eggdrop>[karma] 'fireonlive' now has 871 karma!
01:28:37<missaustraliana>noooo thats rlly sad :(
01:29:13<pokechu22>fireonlive++
01:29:13<eggdrop>[karma] 'fireonlive' now has 872 karma!
01:29:22<DigitalDragons>fireonlive++
01:29:23<eggdrop>[karma] 'fireonlive' now has 873 karma!
01:29:57<@JAA>He had a stroke in August, was in hospital for several weeks, and passed away on 2024-09-06.
01:31:07<missaustraliana>i'm actually like shocked, i couldnt tell if you were joking or not. i'm so sorry guys :(
01:32:47HP_Archivist quits [Read error: Connection reset by peer]
01:36:54<nicolas17>missaustraliana: for a while we were hoping *he* would join IRC and say it was all a joke
01:38:15<missaustraliana>that is actually so awful.. i apologise for bringing this up ;-;
01:39:28<@JAA>We all knew he wouldn't've made such a cruel joke, but yeah, that irrational feeling was certainly there.
01:39:32HP_Archivist (HP_Archivist) joins
01:39:35cascode quits [Read error: Connection reset by peer]
01:39:56cascode joins
01:41:38scurvy_duck quits [Ping timeout: 250 seconds]
01:42:07<missaustraliana>nooo this is actually making me sad
01:42:17HP_Archivist quits [Read error: Connection reset by peer]
01:42:30<missaustraliana>and i didnt speak to him other than just this channel
01:42:41<missaustraliana>i hope you guys are ok :(
01:49:36HP_Archivist (HP_Archivist) joins
01:54:45kitonthenet quits [Client Quit]
01:54:58etnguyen03 quits [Client Quit]
01:58:46dave quits [Remote host closed the connection]
01:59:08dave (dave) joins
02:02:40surrogatekey joins
02:08:27surrogatekey quits [Client Quit]
02:20:17Radzig quits [Remote host closed the connection]
02:21:43etnguyen03 (etnguyen03) joins
02:21:52Webuser137267 joins
02:25:30Radzig joins
02:30:45sec^nd quits [Remote host closed the connection]
02:30:46SootBector quits [Write error: Broken pipe]
02:31:05SootBector (SootBector) joins
02:31:07sec^nd (second) joins
02:37:31<@OrIdow6>JAA: On page move settings, it looks like letting automoderateds move pages is the default behavior; maybe the configuration has followed this "tip" and disabled it? https://www.mediawiki.org/wiki/Extension:Moderation#Additional_anti-vandalism_tips
02:46:49etnguyen03 quits [Client Quit]
02:46:58HP_Archivist quits [Read error: Connection reset by peer]
02:49:56HP_Archivist (HP_Archivist) joins
02:52:47HP_Archivist quits [Read error: Connection reset by peer]
02:58:24HP_Archivist (HP_Archivist) joins
02:58:26etnguyen03 (etnguyen03) joins
03:01:17HP_Archivist quits [Read error: Connection reset by peer]
03:09:56th3z0l4 joins
03:12:18th3z0l4_ quits [Ping timeout: 260 seconds]
03:24:33nine quits [Ping timeout: 260 seconds]
03:25:23<missaustraliana>JAA umm yalls charts for urlteam2 and urls are cooked
03:25:40<missaustraliana>https://tracker.archiveteam.org/urls/charts.json
03:27:11<missaustraliana>oh it's not working at all
03:27:47<@JAA>missaustraliana: charts.json has been disabled for years.
03:28:15<missaustraliana>oh! yeah it's also not fetching anything on warrior
03:30:30<missaustraliana>yeah is urls alright
03:31:12<missaustraliana>i can't grab any tasks
03:34:37nine joins
03:34:37nine quits [Changing host]
03:34:37nine (nine) joins
03:35:00<TheTechRobo>URLs is paused to make room for the US government archival
03:35:48theSquashSH joins
03:36:36<theSquashSH>cross-posting here for more visibility: If I wanted to manually create some compliant WARCs for a whole domain, should I use wget, wget-lua, or wpull? wget --server-response --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --compression=auto -e robots=off --restrict-file-names=unix --timeout=60
03:36:36<theSquashSH>--page-requisites --no-check-certificate --no-hsts --mirror --no-parent --recursive --level=9 --warc-file="$(date +%s)" https://gml.noaa.gov/aftp/
03:38:53<TheTechRobo>Wget writes WARCs that a lot of tools can't handle, so avoid that. (Specifically, it puts angle brackets around the URLs, which was technically required by the standard at the time, but nobody else did it.) Wget-Lua and wpull are ok.
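A quick way to see the difference TheTechRobo is describing is to look at the WARC-Target-URI headers of a finished file; a rough sketch, with example.warc.gz standing in for whatever the crawl produced:

  zgrep -a -m 5 '^WARC-Target-URI' example.warc.gz

In Wget's output the URI comes back wrapped in angle brackets (e.g. WARC-Target-URI: <https://gml.noaa.gov/aftp/>), while Wget-Lua and wpull write the bare URL.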
03:39:41<TheTechRobo>wpull standalone is kinda painful. It only works on EOL Python versions, for example.
03:40:16<theSquashSH>ok, I'm trying with wget-lua from its Dockerfile on github now
03:40:21<TheTechRobo>I'd suggest grab-site (https://github.com/ArchiveTeam/grab-site) for grabbing a page; it wraps a slightly updated version of wpull, has a dashboard, and allows you to control the crawl when it's running. Otherwise, I'd suggest Wget-Lua just because it's nicer to use than wpull.
03:52:40<theSquashSH>ok wow grab-site is awesome, setup was easy and it's now downloading at >1GB/min
03:52:48<theSquashSH>way faster than wget
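For anyone following along later, a minimal grab-site session looks roughly like this; the options and the dashboard port are quoted from memory, so treat grab-site --help and the project README as authoritative:

  gs-server &                              # dashboard, by default at http://127.0.0.1:29000/
  grab-site 'https://gml.noaa.gov/aftp/' --concurrency=4 --no-offsite-links

If memory serves, ignores, delay, and concurrency can then be tweaked mid-crawl by editing the control files grab-site creates in the crawl directory.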
03:57:15tech234a (tech234a) joins
04:03:51etnguyen03 quits [Client Quit]
04:07:46etnguyen03 (etnguyen03) joins
04:20:33nine quits [Ping timeout: 260 seconds]
04:27:07skyrocket joins
04:31:17skyrocket quits [Client Quit]
04:35:02Webuser662749 joins
04:35:08lennier2_ quits [Ping timeout: 260 seconds]
04:36:02skyrocket joins
04:41:20lennier2_ joins
04:46:03missaustraliana quits [Quit: Ooops, wrong browser tab.]
05:17:20Webuser662749 quits [Client Quit]
05:33:58Sidpatchy (Sidpatchy) joins
05:39:32icedice quits [Quit: Leaving]
05:47:43etnguyen03 quits [Remote host closed the connection]
05:55:46scurvy_duck joins
06:09:49theSquashSH quits [Client Quit]
06:19:05DogsRNice quits [Read error: Connection reset by peer]
06:29:40earl joins
06:42:56<h2ibot>Himond000 edited Deathwatch (+522, /* 2025 */ add nonbirinakai.co.jp): https://wiki.archiveteam.org/?diff=54317&oldid=54304
06:53:08<BornOn420>https://www.theatlantic.com/health/archive/2025/01/cdc-dei-scientific-data/681531/
07:14:52scurvy_duck quits [Ping timeout: 250 seconds]
07:27:58Miki_57 joins
07:48:13cascode quits [Ping timeout: 260 seconds]
07:48:30cascode joins
08:10:06nine joins
08:10:06nine quits [Changing host]
08:10:06nine (nine) joins
08:37:12lennier2_ quits [Ping timeout: 250 seconds]
08:38:36lennier2 joins
08:39:33cascode quits [Ping timeout: 260 seconds]
08:39:48cascode joins
08:53:25Terbium quits [Remote host closed the connection]
09:11:24Dango360_ (Dango360) joins
09:11:52Dango360 quits [Ping timeout: 250 seconds]
09:13:15meisnick joins
09:14:58_Dango360 (Dango360) joins
09:17:28monoxane quits [Ping timeout: 260 seconds]
09:18:28meisnick quits [Client Quit]
09:18:38Dango360_ quits [Ping timeout: 260 seconds]
09:20:03lennier2_ joins
09:20:31Terbium joins
09:22:33lennier2__ joins
09:22:42lennier2 quits [Ping timeout: 250 seconds]
09:25:38lennier2_ quits [Ping timeout: 260 seconds]
09:32:32Wohlstand (Wohlstand) joins
09:35:14monoxane (monoxane) joins
09:42:12<@OrIdow6>https://peppafanon.fandom.com/ being removed by Wikia on the 13th https://peppafanon.fandom.com/wiki/Message_Wall:Sonicthehedgehog223?threadId=4400000000000080657&useskin=fandomdesktop
09:45:21<@OrIdow6>Also another use of the interesting phrase "become lost media" which I think I have pointed out in the past
09:48:44<@OrIdow6>On a quick look I don't see instructions on the wiki to save Wikia sites in AB
09:53:18T31M quits [Quit: ZNC - https://znc.in]
09:55:13T31M joins
10:08:13cascode quits [Ping timeout: 260 seconds]
10:08:27Wohlstand quits [Client Quit]
10:09:00cascode joins
10:09:26BennyOtt quits [Quit: ZNC 1.9.1 - https://znc.in]
10:10:52BennyOtt (BennyOtt) joins
10:13:13Bleo18260072271962345 quits [Quit: Ping timeout (120 seconds)]
10:13:26Bleo18260072271962345 joins
10:13:47BennyOtt quits [Remote host closed the connection]
10:14:14Island quits [Read error: Connection reset by peer]
10:15:36BennyOtt (BennyOtt) joins
10:20:28cascode quits [Ping timeout: 260 seconds]
10:21:06cascode joins
10:29:22Dango360 (Dango360) joins
10:31:10_Dango360 quits [Ping timeout: 250 seconds]
10:34:24Dango360_ (Dango360) joins
10:37:14Dango360 quits [Ping timeout: 250 seconds]
10:38:59<@arkiver>OrIdow6: oh yeah, we should probably add those instructions...
10:52:14Webuser917028 joins
11:04:38Webuser917028 leaves
11:04:48Megame1_ (Megame) joins
11:13:35Megame1_ quits [Remote host closed the connection]
11:13:58Megame1_ (Megame) joins
11:14:22Megame1_ quits [Client Quit]
11:19:14Megame (Megame) joins
11:20:50ducky quits [Read error: Connection reset by peer]
11:28:48Megame quits [Client Quit]
11:29:17Megame joins
11:31:50ducky (ducky) joins
11:34:19Megame quits [Client Quit]
11:49:15Bleo18260072271962345 quits [Client Quit]
11:49:29Bleo18260072271962345 joins
12:00:02Bleo18260072271962345 quits [Client Quit]
12:00:59Chewie9999 joins
12:02:03<Chewie9999>hi everyone, is there some sort of notification system that can let you know when a new project is added to the warrior? I leave my warrior on all the time, but sometimes I miss joining projects that are time-sensitive, and I'm not on IRC all the time.
12:02:47Bleo18260072271962345 joins
12:09:14<pabs>OrIdow6: #wikibot works for fandom sites, also good to AB as well
12:10:45<pabs>ISTR the usual mediawiki ignores work fine
12:34:47SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:35:17SkilledAlpaca418962 joins
13:01:22nepeat quits [Read error: Connection reset by peer]
13:05:00nepeat (nepeat) joins
13:18:54nepeat quits [Client Quit]
13:21:41nepeat (nepeat) joins
13:29:14notarobot10 joins
13:30:38notarobot1 quits [Ping timeout: 260 seconds]
13:30:39notarobot10 is now known as notarobot1
13:32:44Radzig quits [Ping timeout: 250 seconds]
13:37:02Radzig joins
13:54:33BennyOtt_ joins
13:55:43BennyOtt quits [Ping timeout: 260 seconds]
13:55:45BennyOtt_ is now known as BennyOtt
14:05:34SootBector quits [Ping timeout: 276 seconds]
14:07:05SootBector (SootBector) joins
14:17:43<TheTechRobo>Chewie9999: there are notifications in #at-changes and push notifications at https://notify.nulldata.foo/ , I believe ATWarrior is the one for new projects
14:18:59etnguyen03 (etnguyen03) joins
14:40:43<myself>I never set my warrior to AT's Choice as the project, but would that do the needful?
14:41:39loug8318142 joins
14:45:56etnguyen03 quits [Client Quit]
14:50:36loug8318142 quits [Client Quit]
14:50:54loug8318142 joins
14:53:08th3z0l4_ joins
14:53:50etnguyen03 (etnguyen03) joins
14:54:12th3z0l4 quits [Ping timeout: 250 seconds]
14:57:34scurvy_duck joins
14:57:45mete quits [Remote host closed the connection]
14:59:19<nulldata>myself - right now it would go to Telegram, but yeah generally that'll make your warrior go to whatever project is considered in need of resources. https://warriorhq.archiveteam.org/projects.json
14:59:44<nulldata>"auto_project" in the above JSON
14:59:53mete joins
15:00:31mete quits [Remote host closed the connection]
15:02:54mete joins
15:33:37etnguyen03 quits [Client Quit]
15:40:33etnguyen03 (etnguyen03) joins
16:11:16<Hans5958>I was wondering, what is being considered when it comes to AT's choice? US Gov't seems important enough yet Telegram is still being chosen (as usual).
16:11:23<Hans5958>(Not being a busybody or anything, just wondering)
16:12:33<Hans5958>I'd assume inbound of warriors == inbound of targets?
16:13:36etnguyen03 quits [Client Quit]
16:30:30<angenieux>What is the usgovernment-inbox project?
16:33:33<@arkiver>angenieux: something to get around limitations of our software/infrastructure
16:35:35<angenieux>I see
16:35:57<angenieux>Is the ftp project still running?
16:44:46<nulldata>Hans5958 - yeah there's a few different factors for considering the auto project. Urgency, how stable/tested the code is, how stable the targets are, how stable the site being archived is, risk to those providing warriors, etc are all factors I think.
16:54:48th3z0l4 joins
16:57:08th3z0l4_ quits [Ping timeout: 260 seconds]
16:57:23HP_Archivist (HP_Archivist) joins
17:07:54<nicolas17>Hans5958: I think we're hitting other bottlenecks in usgovernment project right now rather than "lack of warriors"
17:11:16<anarcat>should we take #UncleSamsArchive as an opportunity to recruit?
17:13:02<Hans5958><nicolas17> "Hans5958: I think we're hitting..." <- What is it? I was thinking about the targets but I probably didn't notice the others
17:13:21<nicolas17>yeah targets
17:13:31<nicolas17>we're doing like 650MiB/s
17:17:12scurvy_duck quits [Ping timeout: 250 seconds]
17:27:36APOLLO03 quits [Quit: Leaving]
17:49:53scurvy_duck joins
18:03:03cascode quits [Ping timeout: 260 seconds]
18:03:11cascode joins
18:05:58scurvy_duck quits [Ping timeout: 260 seconds]
18:26:16HP_Archivist quits [Read error: Connection reset by peer]
18:35:42Webuser230651 joins
18:35:58Webuser230651 quits [Client Quit]
18:36:58scurvy_duck joins
18:51:14tzt quits [Ping timeout: 250 seconds]
18:59:28HP_Archivist (HP_Archivist) joins
19:02:51kansei quits [Quit: ZNC 1.9.1 - https://znc.in]
19:10:16kansei (kansei) joins
19:30:54<pokechu22>OrIdow6, pabs: doing wikia/fandom in #archivebot sucks because fandom sucks (and links to tons of other sites and has javascript messiness :|). #wikibot works well (though it doesn't end up on web.archive.org; it only provides the stuff needed to import into a new wiki)
19:34:55<h2ibot>Cooljeanius edited US Government (+16, use URL template): https://wiki.archiveteam.org/?diff=54318&oldid=54316
19:43:09Webuser663647 joins
19:43:33Webuser663647 quits [Client Quit]
20:03:23tzt (tzt) joins
20:11:12BornOn420 quits [Remote host closed the connection]
20:11:46BornOn420 (BornOn420) joins
20:37:04icedice (icedice) joins
20:39:35BlueMaxima joins
20:40:12BlueMaxima quits [Read error: Connection reset by peer]
20:44:03scurvy_duck quits [Ping timeout: 260 seconds]
20:46:17BlueMaxima joins
21:02:04icedice quits [Client Quit]
21:02:16<@JAA>BlogTalkRadio is still up. I'm at a little over 20% listed.
21:03:05<@JAA>But given what else is going on, not sure we can do the ~100 TB that would take currently.
21:03:16<@arkiver>JAA: what is their website?
21:03:27<@JAA>https://www.blogtalkradio.com/
21:05:18earl quits []
21:19:03scurvy_duck joins
21:30:16cascode quits [Ping timeout: 250 seconds]
21:30:18cascode joins
21:38:32BlueMaxima_ joins
21:41:48cascode quits [Ping timeout: 260 seconds]
21:41:48BlueMaxima quits [Ping timeout: 260 seconds]
21:41:52cascode joins
21:46:38BlueMaxima__ joins
21:49:58BlueMaxima_ quits [Ping timeout: 260 seconds]
22:09:57etnguyen03 (etnguyen03) joins
22:11:22<tech234a>DSLReports went down for a couple weeks. Now the home page is up again with a notice: "NEWS: The site will be switching to read-only shortly. While that is being arranged, only the home page is available". Would make sense to archive the threads once that happens.
22:13:04<@JAA>Ah, it's returning from its vacation.
22:21:10etnguyen03 quits [Client Quit]
22:21:24scurvy_duck quits [Ping timeout: 250 seconds]
22:29:31utulien joins
22:36:44utulien is now known as utulien_
22:37:16utulien_ is now known as utulien
22:37:49utulien is now known as utulien_
22:41:58BlueMaxima joins
22:44:48BlueMaxima__ quits [Ping timeout: 250 seconds]
22:48:08<@OrIdow6>pokechu22: Does it at least get the basic HTML pages?
22:48:12<@OrIdow6>The page text and the like
22:48:35<pokechu22>Yes; the problem is that the job gets too much stuff from unrelated wikis
22:52:12<@OrIdow6>Ah
22:52:39<@OrIdow6>Stupid question but would an ignore of like, [^n]%.fandom%.com work?
22:52:45<@OrIdow6>Ignore the Lua escaping
22:52:57<@OrIdow6>To filter out 25/26 of stuff
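One caveat with the [^n] idea: it would also let through every other wiki whose subdomain happens to end in "n". Since the ignore patterns are, as far as I recall, Python-flavoured regexes rather than Lua patterns, a negative lookahead pinned to the exact host is tighter. A throwaway sanity check of a candidate pattern, using GNU grep's -P mode to stand in for the real ignore engine and community.fandom.com as an arbitrary example of "some other wiki":

  printf '%s\n' \
    'https://peppafanon.fandom.com/wiki/Main_Page' \
    'https://community.fandom.com/wiki/Community_Central' |
    grep -P '^https?://(?!peppafanon\.fandom\.com/)[^/]+\.fandom\.com/'
  # only the community.fandom.com URL matches, i.e. only it would be ignored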
22:53:21<@OrIdow6>My issue with Wikibot is just that few people are gonna look there
22:53:34<@OrIdow6>Especially the kind of person who writes for this site who I imagine is nontechnical
22:53:48<@OrIdow6>Maybe we can generate a list of URLs from the dump?
22:54:55meisnick joins
22:57:24etnguyen03 (etnguyen03) joins
23:09:58HP_Archivist quits [Quit: Leaving]
23:14:18Island joins
23:16:19scurvy_duck joins
23:51:59jen joins
23:54:42jen quits [Read error: Connection reset by peer]