00:03:02decky_e quits [Read error: Connection reset by peer]
00:14:09umgr036 quits [Remote host closed the connection]
00:14:24umgr036 joins
00:31:34Jake2 (Jake) joins
00:33:35Jake quits [Ping timeout: 252 seconds]
00:33:35Jake2 is now known as Jake
00:54:26fullpwnmedia quits [Read error: Connection reset by peer]
00:54:42fullpwnmedia joins
00:58:32<Ivan226>(resending unless it was archived in a different channel) can someone get these for me thanks https://transfer.archivete.am/vABs9/honkaiwiki-newlinks.txt https://transfer.archivete.am/7axZa/honkaiwiki-newfiles.txt
00:59:24<pokechu22>Ivan226: I did those earlier today in #archivebot
00:59:48<Ivan226>ah got it
01:07:40<pabs>tomodachi94: in #archivebot, `socialbot: snscrape twitter-profile foo` works a reasonable amount of time. it can gather 3200 recent tweets. the twitter-user option doesn't work at the moment, but there is a fix in snscrape git that isn't released yet
01:15:23DopefishJustin quits [Ping timeout: 252 seconds]
01:17:39DopefishJustin joins
01:18:50<nicolas17>apparently archive.org is being hammered by thousands of AWS instances "downloading the OCR text from our materials"
01:27:17icedice quits [Read error: Connection reset by peer]
01:27:59icedice (icedice) joins
01:32:23<@JAA>Doranwen: The Dropbox link recovered, and I'm pulling a copy.
01:36:27<nicolas17>IA S3 stats show a massive drop in uploads about 8 hours ago... did our targets finally catch up or what? :P
01:55:46AmAnd0A quits [Read error: Connection reset by peer]
01:56:02AmAnd0A joins
02:05:08<fireonlive>oh does IA store stuff in AWS?
02:05:53<@JAA>Of course not, but there's an S3-ish interface.
02:06:46<fireonlive>ah! i see :)
02:06:51<fireonlive>i thought that rather odd haha
02:07:46thuban joins
02:18:36<tomodachi94>pabs: thanks!
02:18:44<tomodachi94>Can someone snag this one too? https://transfer.archivete.am/1E4dG/tos.txt
02:19:12<pabs>archivebot can't do individual posts, but we can snag the user
02:19:44<pabs>I just did this: socialbot: snscrape twitter-profile TheOrderofSith
02:20:03<tomodachi94>Ah okay, much appreciated.
02:24:24AmAnd0A quits [Read error: Connection reset by peer]
02:25:00AmAnd0A joins
02:27:21<Doranwen>JAA: Thanks! Glad it did :)
02:38:08<tomodachi94>Oh lovely...... (full message at <https://matrix.hackint.org/_matrix/media/v3/download/hackint.org/uiXAnJTsoHshnzOiwaOqHSEY>)
03:23:21icee quits [Quit: leaving]
03:23:59decky_e joins
03:29:05AmAnd0A quits [Read error: Connection reset by peer]
03:29:22AmAnd0A joins
03:38:23AmAnd0A quits [Ping timeout: 252 seconds]
03:39:28AmAnd0A joins
03:39:58AmAnd0A quits [Read error: Connection reset by peer]
03:40:13AmAnd0A joins
03:46:38AmAnd0A quits [Ping timeout: 252 seconds]
03:47:10AmAnd0A joins
04:01:27fullpwnmedia quits [Remote host closed the connection]
04:01:40fullpwnmedia joins
04:23:24AmAnd0A quits [Read error: Connection reset by peer]
04:23:37AmAnd0A joins
04:51:21fredgido quits [Quit: I will be back]
04:55:42umgr036 quits [Remote host closed the connection]
04:57:35AmAnd0A quits [Ping timeout: 265 seconds]
04:59:26AmAnd0A joins
05:00:13umgr036 joins
05:01:07umgr036 quits [Remote host closed the connection]
05:01:21umgr036 joins
05:21:12fredgido (fredgido) joins
05:21:47AmAnd0A quits [Ping timeout: 252 seconds]
05:22:28BlueMaxima quits [Client Quit]
05:27:15AmAnd0A joins
05:27:22hitgrr8 joins
05:33:07c3manu (c3manu) joins
06:11:33<fireonlive>can grab-site be resumed in-place from another server? would like to move off this (higher priced) one at some point
06:12:17<fireonlive>'oh i'll only have this server like 4 days... "up 20 days"' where did the time gooo
06:12:54PredatorIWD quits [Quit: Leaving]
07:00:02nfriedly quits [Remote host closed the connection]
07:01:43tbc1887_ joins
07:04:50tbc1887 quits [Ping timeout: 252 seconds]
07:22:42Arcorann (Arcorann) joins
07:24:28igloo22225 quits [Quit: The Lounge - https://thelounge.chat]
07:24:36igloo22225 (igloo22225) joins
07:24:47retromouse (retromouse) joins
07:38:25retromouse quits [Client Quit]
08:22:09lukash joins
08:51:49decky_e quits [Read error: Connection reset by peer]
09:04:24fefstattoo joins
09:04:50fefstattoo quits [Remote host closed the connection]
09:20:52nfriedly joins
09:24:35decky_e (decky_e) joins
09:53:50cdreimanu (c3manu) joins
09:56:46c3manu quits [Ping timeout: 265 seconds]
10:02:32Ruthalas5 quits [Client Quit]
10:02:52Ruthalas5 (Ruthalas) joins
10:13:22<@arkiver>last two days have been chaotic for me. i may have missed important messages in channels, if you think I missed something please ping again
10:25:37<h2ibot>Tomodachi94 created Prnt.sc (+439, Create page): https://wiki.archiveteam.org/?title=Prnt.sc
10:25:38<h2ibot>Manu edited Coronavirus (+159, Add some German case, vaccination (+more) data…): https://wiki.archiveteam.org/?diff=49850&oldid=48266
10:32:24driib quits [Read error: Connection reset by peer]
10:33:20driib (driib) joins
10:38:14driib quits [Ping timeout: 252 seconds]
10:42:36<imer>i've now gone through my youtube archive and picked out all videos that aren't on yt anymore (either private or missing completely) - 4.9k videos/1.3tb
10:42:36<imer>mainly looking for some guidance if/how I should submit those to IA? (the yt-dl command used mostly matches the one on the wiki except for the info-json which I didn't add at the time)
11:02:03JohnnyJ quits [Read error: Connection reset by peer]
11:11:28bf_ joins
11:16:30decky_e quits [Remote host closed the connection]
11:36:32cdreimanu quits [Remote host closed the connection]
12:03:01icedice quits [Client Quit]
12:07:48driib (driib) joins
12:08:02Iki1 joins
12:12:05AnotherIki quits [Ping timeout: 252 seconds]
12:49:26<pabs>imer: join #down-the-tube
12:49:50<pabs>oh, you said no longer on YT, oops
12:51:48<imer>pabs: well, it's in there as well now haha, probably more appropriate either way :)
12:51:54<imer>thanks
12:58:49icedice (icedice) joins
13:07:07icedice quits [Client Quit]
13:18:40icedice (icedice) joins
13:37:23umgr036 quits [Remote host closed the connection]
13:41:49umgr036 joins
13:42:45umgr036 quits [Remote host closed the connection]
13:42:58umgr036 joins
13:48:20Arcorann quits [Ping timeout: 252 seconds]
14:12:29Jon quits [Quit: ZNC - http://znc.in]
14:14:26Jon joins
14:16:29<@arkiver>JAA: do you know if we got the hl2dm.net forum?
14:16:37<@arkiver>closing may 31
14:16:42<@arkiver>according to deathwatch
14:18:04AnotherIki joins
14:22:07Iki1 quits [Ping timeout: 265 seconds]
14:25:29<pabs>arkiver: pokechu22 did it according to https://archive.fart.website/archivebot/viewer/job/64nco
14:55:37Minkafighter quits [Quit: The Lounge - https://thelounge.chat]
14:59:11Minkafighter joins
15:10:47Minkafighter quits [Client Quit]
15:14:21Minkafighter joins
15:14:47Minkafighter quits [Client Quit]
15:17:42Minkafighter joins
15:26:08<@arkiver>pabs: perfect, thank you
15:36:00lflare quits [Quit: Bye]
15:36:45lflare (lflare) joins
15:41:02Dirtmanisdirt joins
15:41:39Dirtmanisdirt quits [Remote host closed the connection]
15:44:53spirit joins
15:56:09lflare quits [Client Quit]
15:56:55lflare (lflare) joins
16:03:42dumbgoy joins
16:29:50c3manu (c3manu) joins
17:01:25JohnnyJ joins
17:24:06zhongfu (zhongfu) joins
17:31:38zhongfu quits [Client Quit]
17:36:21spirit quits [Client Quit]
17:38:43zhongfu (zhongfu) joins
17:39:36zhongfu quits [Client Quit]
17:46:25zhongfu (zhongfu) joins
17:47:07zhongfu quits [Client Quit]
17:58:58zhongfu (zhongfu) joins
18:09:22cultpony quits [Client Quit]
18:11:37cultpony (cultpony) joins
18:15:47zhongfu quits [Client Quit]
18:17:55Dango360 quits [Read error: Connection reset by peer]
18:21:10zhongfu (zhongfu) joins
18:42:58umgr036 quits [Ping timeout: 252 seconds]
18:47:53decky_e (decky_e) joins
18:53:10geezabiscuit quits [Read error: Connection reset by peer]
18:53:29geezabiscuit (geezabiscuit) joins
18:59:41drin joins
18:59:44geezabiscuit quits [Read error: Connection reset by peer]
19:00:20drin is now known as geezabiscuit
19:00:55<tomodachi94>Alright here's a second dump of 14452 URLs related to that group, sourced from a dump of their Discord: https://transfer.archivete.am/KlnAM/urls.txt
19:01:11<tomodachi94>It's mostly images.
19:07:21geezabiscuit quits [Read error: Connection reset by peer]
19:07:23Dango360 (Dango360) joins
19:13:32c3manu quits [Remote host closed the connection]
19:32:02Dango360 quits [Read error: Connection reset by peer]
19:40:16<h2ibot>Entartet edited List of websites excluded from the Wayback Machine (+30, Added patrickcollison.com.): https://wiki.archiveteam.org/?diff=49852&oldid=49833
19:43:44Dango360 (Dango360) joins
19:59:28that_lurker quits [Quit: my throat's getting sore from humming modem tones into my phone]
19:59:46that_lurker (that_lurker) joins
19:59:49Dango360 quits [Read error: Connection reset by peer]
20:01:03geezabiscuit (geezabiscuit) joins
20:02:41<manu|m>If I want to run a Warrior on a dedicated machine at home (headlessly via Docker), what would be reasonable specs for it?
20:06:03<@JAA>manu|m: There's no general answer as it depends entirely on the project. Some projects are CPU-intensive (e.g. sitemap parsing on URLs), some projects require significant disk space (anything with videos, e.g. YouTube), some require a lot of RAM due to recursion (e.g. Enjin I think)... If you want to run multiple projects at once, consider using the project images rather than the warrior.
20:08:02dumbgoy quits [Ping timeout: 252 seconds]
20:08:07tbc1887 (tbc1887) joins
20:08:28Dango360 (Dango360) joins
20:09:06<manu|m>so different projects/pipelines will pick the warriors they use based on their specs? I’d just like to make use of my internet connection when I don’t need it for myself
20:09:18imer quits [Quit: Oh no]
20:10:32<@JAA>No, either the warrior runs a specific selected project, or it runs the default project that we set on the tracker side, which is the same for all warriors set to 'ArchiveTeam's choice'.
20:10:32imer (imer) joins
20:10:35tbc1887_ quits [Ping timeout: 252 seconds]
20:11:34<manu|m>oh okay, thanks
20:11:59<manu|m>i'll check out the project pages then
20:12:52imer quits [Client Quit]
20:13:33<@JAA>If you want a 'set it up once and forget about it' thing, you'll want the warrior set to AT's choice.
20:13:45imer (imer) joins
20:13:53<@JAA>But a dedicated machine for that is a bit overkill.
20:15:39<manu|m>i’m not getting a second tower or a server rack for that, i just thought it might be a good idea to have it running on a machine that draws a bit less power than my desktop setup
20:20:29geezabiscuit quits [Ping timeout: 252 seconds]
20:20:41<manu|m>another question: once or twice a year (when there isn’t a pandemic going on) i’m attending Chaos events where there’s 4-7 days of practically unlimited bandwidth available, where it’s possible to colocate machines. would it be useful to bring a warrior (or more) there to crunch through larger projects, or would that be counter-productive?
20:22:59<nicolas17>depends on the project, sometimes the website being archived has per-IP limits so having more bandwidth doesn't actually help
20:39:39leo60228 (leo60228) joins
20:53:46Jake quits [Client Quit]
20:54:01Jake (Jake) joins
20:55:42geezabiscuit (geezabiscuit) joins
21:04:34hitgrr8 quits [Client Quit]
21:05:27Island joins
21:12:36HiccupJul (HiccupJul) joins
21:14:12<HiccupJul>Is there a way to make archivebot login-walled content? I want to backup redump.org using archivebot, but some of the content is walled behind the requirement to submit a few discs to the site. It's not exactly public but it's not exactly private either.
21:14:26<HiccupJul>*archivebot archive login-walled content
21:16:04<pokechu22>Archivebot can't do that, no :/ (I think there might be other tools that can (e.g. grab-site installed locally) but I haven't worked with those)
21:16:27<pokechu22>... hmm, and redump.org isn't loading for me at all, that's not a good sign :|
21:17:01<HiccupJul>Yeah it goes down occasionally, the admin is unresponsive, and the admin is against public backups.
21:17:22<HiccupJul>so I think it'd be a good thing to have it in the wayback machine
21:18:08<HiccupJul>is there any tool that supports login-walled sites, that can be used to ingest data into the wayback machine?
21:18:28<pokechu22>There have been some backups in the past, but without being logged in
21:19:25<HiccupJul>yeah, it's just that that misses a lot of data
21:19:34<pokechu22>Looks like the last full run of redump.org was on 2022-05-10, and forum.redump.org was last run on 2023-04-29, but both of those wouldn't be logged in
21:19:58<pokechu22>For main redump.org, it'd be missing data for a few more recent systems, and also revision history, right? While forum.redump.org would be missing basically everything to my understanding
21:19:59<HiccupJul>i.e. it misses modern systems, change history, dump submission sub-forums
21:20:22<HiccupJul>hah, we said almost the same thing
21:20:42<HiccupJul>so yeah, we are on the same page
21:21:33<HiccupJul>all that stuff is pretty important to the continued operation of the site, imo
21:22:05<pokechu22>My understanding is that grab-site produces warcs, and those *can* be ingested into web.archive.org but won't necessarily be by default
21:22:34<that_lurker>https://github.com/webrecorder/browsertrix-crawler supports logins as profiles https://github.com/webrecorder/browsertrix-crawler#creating-and-using-browser-profiles
21:24:49<HiccupJul>i assume wayback doesn't ingest stuff made by random people
21:25:30<HiccupJul>only things archived by archive team or IA services, or stuff from companies like alexa
21:26:23<pokechu22>My understanding is that yeah, that's roughly the case. Most outsider stuff ends up in https://archive.org/details/warczone
21:26:48<pokechu22>One other aspect to consider is that if you do save login-walled content, every single page will show you being logged in
21:28:02<@JAA>Data behind logins doesn't make it into the Wayback Machine in general.
21:28:59<@JAA>That's a relatively hard rule with few exceptions.
21:32:17<HiccupJul>i guess i should look into using something like archivebot, and then just hosting the static pages on free hosting so they can be browsed, in addition to the WARCs
21:32:26<@JAA>(And the exceptions are of historical nature, e.g. our SPUF project in 2017.)
21:32:32tzt quits [Ping timeout: 252 seconds]
21:33:47tzt (tzt) joins
21:33:53<@JAA>That sounds reasonable. grab-site is basically like AB but local, and you can give it cookies.
21:34:11<HiccupJul>someone did run grab-site, but i think it was a pretty long process
21:34:17<HiccupJul>and hard to get working
21:34:53<HiccupJul>i should check if the output of that was okay, then maybe i can just set that up to run every week and upload to IA (and github pages/neocities for a browsable version)
21:35:12tbc1887_ joins
21:35:17<@JAA>Shouldn't be hard to get working unless there's annoying 'DDoS protection' stuff in the way or extensive use of JS, but it certainly won't be fast, yeah.
21:35:59<HiccupJul>seems like that grab-site run was incomplete
21:36:05<HiccupJul>so the issues with it apparently weren't resolved
21:36:09<@JAA>Note that making the WARCs publicly accessible might allow others to hijack your account.
21:36:19<HiccupJul>i think i'll make a dummy account for it
21:36:53<HiccupJul>actually i believe i know someone who has one like that already
21:37:04<pokechu22>redump.org doesn't support https; I doubt it has proper DDoS protection :P
21:37:28<HiccupJul>but yeah i don't think i want to do it with my account, if only to avoid my name being plastered over it
21:37:58<pokechu22>I assume you'd be grabbing the submission history subforums but not the dumpers subforum, then?
21:38:16<HiccupJul>the account could probably get dumper access
21:38:34tbc1887 quits [Ping timeout: 265 seconds]
21:38:36<HiccupJul>i have plenty of low-priority discs (e.g. already verified ps2 shovelware) that i can use
21:39:06tbc1887 (tbc1887) joins
21:39:13<HiccupJul>actually, i can probably ask a moderator just to promote an account even without any disc submissions
21:41:20tbc1887_ quits [Ping timeout: 252 seconds]
21:41:25<HiccupJul>this command was used for the grab-site attempt: https://bpa.st/6EQWS
21:41:34<HiccupJul>ignores was: http://redump.org/discs/.*?/dumper/.*?
21:41:59<HiccupJul>took 33 hours, not too bad
21:42:12tbc1887_ joins
21:42:34<HiccupJul>ah, this bug prevented forum attachments being saved: https://github.com/ArchiveTeam/wpull/issues/291
21:43:49<@JAA>HTTP/0.9? Eww.
21:44:23<@JAA>Technically, that can't go into WARC either.
21:45:01<HiccupJul>technically?
21:45:11tbc1887 quits [Ping timeout: 252 seconds]
21:45:50dumbgoy joins
21:46:43<@JAA>The spec only permits HTTP/1.1, strictly speaking.
21:49:01<HiccupJul>hm, as long as it works, i guess it'd be fine
21:58:11decky_e quits [Remote host closed the connection]
22:01:44tbc1887 (tbc1887) joins
22:03:53tbc1887_ quits [Ping timeout: 252 seconds]
22:13:56tbc1887_ joins
22:17:05tbc1887 quits [Ping timeout: 252 seconds]
22:22:28lennier2 joins
22:24:29bf_ quits [Ping timeout: 265 seconds]
22:25:20lennier1 quits [Ping timeout: 252 seconds]
22:25:30lennier2 is now known as lennier1
22:27:22AmAnd0A quits [Ping timeout: 252 seconds]
22:27:48AmAnd0A joins
22:47:20bf_ joins
23:03:50ymgve quits [Ping timeout: 252 seconds]
23:06:52eroc19909 (eroc1990) joins
23:07:34<flashfire42>HiccupJul I see you got IRC working. Yes redump sucks in terms of tech
23:08:22<HiccupJul>i think hackint may have been down for a bit, or maybe some system clock fluke that caused a certificate error
23:09:32eroc1990 quits [Ping timeout: 252 seconds]
23:11:24<@JAA>hackint hasn't been down in a good while, but the webchat thingy was broken for about a week recently.
23:40:24BlueMaxima joins
23:40:24Sluggs quits [Excess Flood]
23:40:47dumbgoy_ joins
23:40:49Sluggs joins
23:43:37AmAnd0A quits [Read error: Connection reset by peer]
23:43:53AmAnd0A joins
23:44:32dumbgoy quits [Ping timeout: 252 seconds]