| 00:03:02 | | decky_e quits [Read error: Connection reset by peer] |
| 00:14:09 | | umgr036 quits [Remote host closed the connection] |
| 00:14:24 | | umgr036 joins |
| 00:31:34 | | Jake2 (Jake) joins |
| 00:33:35 | | Jake quits [Ping timeout: 252 seconds] |
| 00:33:35 | | Jake2 is now known as Jake |
| 00:54:26 | | fullpwnmedia quits [Read error: Connection reset by peer] |
| 00:54:42 | | fullpwnmedia joins |
| 00:58:32 | <Ivan226> | (resending unless it was archived in a different channel) can someone get these for me thanks https://transfer.archivete.am/vABs9/honkaiwiki-newlinks.txt https://transfer.archivete.am/7axZa/honkaiwiki-newfiles.txt |
| 00:59:24 | <pokechu22> | Ivan226: I did those earlier today in #archivebot |
| 00:59:48 | <Ivan226> | ah got it |
| 01:07:40 | <pabs> | tomodachi94: in #archivebot, `socialbot: snscrape twitter-profile foo` works a reasonable amount of the time. it can gather 3200 recent tweets. the twitter-user option doesn't work at the moment, but there is a fix in snscrape git that isn't released yet |
| 01:15:23 | | DopefishJustin quits [Ping timeout: 252 seconds] |
| 01:17:39 | | DopefishJustin joins |
| 01:17:39 | | DopefishJustin is now authenticated as DopefishJustin |
| 01:18:50 | <nicolas17> | apparently archive.org is being hammered by thousands of AWS instances "downloading the OCR text from our materials" |
| 01:27:17 | | icedice quits [Read error: Connection reset by peer] |
| 01:27:59 | | icedice (icedice) joins |
| 01:32:23 | <@JAA> | Doranwen: The Dropbox link recovered, and I'm pulling a copy. |
| 01:36:27 | <nicolas17> | IA S3 stats show a massive drop in uploads about 8 hours ago... did our targets finally catch up or what? :P |
| 01:55:46 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 01:56:02 | | AmAnd0A joins |
| 02:05:08 | <fireonlive> | oh does IA store stuff in AWS? |
| 02:05:53 | <@JAA> | Of course not, but there's an S3-ish interface. |
| 02:06:46 | <fireonlive> | ah! i see :) |
| 02:06:51 | <fireonlive> | i thought that rather odd haha |
| 02:07:46 | | thuban joins |
| 02:18:36 | <tomodachi94> | pabs: thanks! |
| 02:18:44 | <tomodachi94> | Can someone snag this one too? https://transfer.archivete.am/1E4dG/tos.txt |
| 02:19:12 | <pabs> | archivebot can't do individual posts, but we can snag the user |
| 02:19:44 | <pabs> | I just did this: socialbot: snscrape twitter-profile TheOrderofSith |
| 02:20:03 | <tomodachi94> | Ah okay, much appreciated. |
| 02:24:24 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 02:25:00 | | AmAnd0A joins |
| 02:27:21 | <Doranwen> | JAA: Thanks! Glad it did :) |
| 02:38:08 | <tomodachi94> | Oh lovely...... (full message at <https://matrix.hackint.org/_matrix/media/v3/download/hackint.org/uiXAnJTsoHshnzOiwaOqHSEY>) |
| 03:23:21 | | icee quits [Quit: leaving] |
| 03:23:59 | | decky_e joins |
| 03:29:05 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 03:29:22 | | AmAnd0A joins |
| 03:38:23 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 03:39:28 | | AmAnd0A joins |
| 03:39:58 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 03:40:13 | | AmAnd0A joins |
| 03:46:38 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 03:47:10 | | AmAnd0A joins |
| 04:01:27 | | fullpwnmedia quits [Remote host closed the connection] |
| 04:01:40 | | fullpwnmedia joins |
| 04:23:24 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 04:23:37 | | AmAnd0A joins |
| 04:51:21 | | fredgido quits [Quit: I will be back] |
| 04:55:42 | | umgr036 quits [Remote host closed the connection] |
| 04:57:35 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 04:59:26 | | AmAnd0A joins |
| 05:00:13 | | umgr036 joins |
| 05:01:07 | | umgr036 quits [Remote host closed the connection] |
| 05:01:21 | | umgr036 joins |
| 05:21:12 | | fredgido (fredgido) joins |
| 05:21:47 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 05:22:28 | | BlueMaxima quits [Client Quit] |
| 05:27:15 | | AmAnd0A joins |
| 05:27:22 | | hitgrr8 joins |
| 05:33:07 | | c3manu (c3manu) joins |
| 06:11:33 | <fireonlive> | can grab-site be resumed in-place from another server? would like to move off this (higher priced) one at some point |
| 06:12:17 | <fireonlive> | 'oh i'll only have this server like 4 days... "up 20 days"' where did the time gooo |
| 06:12:54 | | PredatorIWD quits [Quit: Leaving] |
| 07:00:02 | | nfriedly quits [Remote host closed the connection] |
| 07:01:43 | | tbc1887_ joins |
| 07:04:50 | | tbc1887 quits [Ping timeout: 252 seconds] |
| 07:22:42 | | Arcorann (Arcorann) joins |
| 07:24:28 | | igloo22225 quits [Quit: The Lounge - https://thelounge.chat] |
| 07:24:36 | | igloo22225 (igloo22225) joins |
| 07:24:47 | | retromouse (retromouse) joins |
| 07:38:25 | | retromouse quits [Client Quit] |
| 08:22:09 | | lukash joins |
| 08:51:49 | | decky_e quits [Read error: Connection reset by peer] |
| 09:04:24 | | fefstattoo joins |
| 09:04:50 | | fefstattoo quits [Remote host closed the connection] |
| 09:20:52 | | nfriedly joins |
| 09:24:35 | | decky_e (decky_e) joins |
| 09:53:50 | | cdreimanu (c3manu) joins |
| 09:56:46 | | c3manu quits [Ping timeout: 265 seconds] |
| 10:02:32 | | Ruthalas5 quits [Client Quit] |
| 10:02:52 | | Ruthalas5 (Ruthalas) joins |
| 10:13:22 | <@arkiver> | last two days have been chaotic for me. i may have missed important messages in channels, if you think I missed something please ping again |
| 10:25:37 | <h2ibot> | Tomodachi94 created Prnt.sc (+439, Create page): https://wiki.archiveteam.org/?title=Prnt.sc |
| 10:25:38 | <h2ibot> | Manu edited Coronavirus (+159, Add some German case, vaccination (+more) data…): https://wiki.archiveteam.org/?diff=49850&oldid=48266 |
| 10:32:24 | | driib quits [Read error: Connection reset by peer] |
| 10:33:20 | | driib (driib) joins |
| 10:38:14 | | driib quits [Ping timeout: 252 seconds] |
| 10:42:36 | <imer> | i've now gone through my youtube archive and picked out all videos that aren't on yt anymore (either private or missing completely) - 4.9k videos/1.3tb |
| 10:42:36 | <imer> | mainly looking for some guidance if/how I should submit those to IA? (the yt-dl command used mostly matches the one on the wiki except for the info-json which I didn't add at the time) |
| 11:02:03 | | JohnnyJ quits [Read error: Connection reset by peer] |
| 11:11:28 | | bf_ joins |
| 11:16:30 | | decky_e quits [Remote host closed the connection] |
| 11:36:32 | | cdreimanu quits [Remote host closed the connection] |
| 12:03:01 | | icedice quits [Client Quit] |
| 12:07:48 | | driib (driib) joins |
| 12:08:02 | | Iki1 joins |
| 12:12:05 | | AnotherIki quits [Ping timeout: 252 seconds] |
| 12:49:26 | <pabs> | imer: join #down-the-tube |
| 12:49:50 | <pabs> | oh, you said no longer on YT, oops |
| 12:51:48 | <imer> | pabs: well, its in there as well now haha, probably more appropriate either way :) |
| 12:51:54 | <imer> | thanks |
| 12:58:49 | | icedice (icedice) joins |
| 13:07:07 | | icedice quits [Client Quit] |
| 13:18:40 | | icedice (icedice) joins |
| 13:37:23 | | umgr036 quits [Remote host closed the connection] |
| 13:41:49 | | umgr036 joins |
| 13:42:45 | | umgr036 quits [Remote host closed the connection] |
| 13:42:58 | | umgr036 joins |
| 13:48:20 | | Arcorann quits [Ping timeout: 252 seconds] |
| 14:12:29 | | Jon quits [Quit: ZNC - http://znc.in] |
| 14:14:26 | | Jon joins |
| 14:16:29 | <@arkiver> | JAA: do you know if we got the hl2dm.net forum? |
| 14:16:37 | <@arkiver> | closing may 31 |
| 14:16:42 | <@arkiver> | according to deathwatch |
| 14:18:04 | | AnotherIki joins |
| 14:22:07 | | Iki1 quits [Ping timeout: 265 seconds] |
| 14:25:29 | <pabs> | arkiver: pokechu22 did it according to https://archive.fart.website/archivebot/viewer/job/64nco |
| 14:55:37 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 14:59:11 | | Minkafighter joins |
| 15:10:47 | | Minkafighter quits [Client Quit] |
| 15:14:21 | | Minkafighter joins |
| 15:14:47 | | Minkafighter quits [Client Quit] |
| 15:17:42 | | Minkafighter joins |
| 15:26:08 | <@arkiver> | pabs: perfect, thank you |
| 15:36:00 | | lflare quits [Quit: Bye] |
| 15:36:45 | | lflare (lflare) joins |
| 15:41:02 | | Dirtmanisdirt joins |
| 15:41:39 | | Dirtmanisdirt quits [Remote host closed the connection] |
| 15:44:53 | | spirit joins |
| 15:56:09 | | lflare quits [Client Quit] |
| 15:56:55 | | lflare (lflare) joins |
| 16:03:42 | | dumbgoy joins |
| 16:29:50 | | c3manu (c3manu) joins |
| 17:01:25 | | JohnnyJ joins |
| 17:24:06 | | zhongfu (zhongfu) joins |
| 17:31:38 | | zhongfu quits [Client Quit] |
| 17:36:21 | | spirit quits [Client Quit] |
| 17:38:43 | | zhongfu (zhongfu) joins |
| 17:39:36 | | zhongfu quits [Client Quit] |
| 17:46:25 | | zhongfu (zhongfu) joins |
| 17:47:07 | | zhongfu quits [Client Quit] |
| 17:58:58 | | zhongfu (zhongfu) joins |
| 18:09:22 | | cultpony quits [Client Quit] |
| 18:11:37 | | cultpony (cultpony) joins |
| 18:15:47 | | zhongfu quits [Client Quit] |
| 18:17:55 | | Dango360 quits [Read error: Connection reset by peer] |
| 18:21:10 | | zhongfu (zhongfu) joins |
| 18:42:58 | | umgr036 quits [Ping timeout: 252 seconds] |
| 18:47:53 | | decky_e (decky_e) joins |
| 18:53:10 | | geezabiscuit quits [Read error: Connection reset by peer] |
| 18:53:29 | | geezabiscuit (geezabiscuit) joins |
| 18:59:41 | | drin joins |
| 18:59:44 | | geezabiscuit quits [Read error: Connection reset by peer] |
| 19:00:20 | | drin is now known as geezabiscuit |
| 19:00:55 | <tomodachi94> | Alright here's a second dump of 14452 URLs related to that group, sourced from a dump of their Discord: https://transfer.archivete.am/KlnAM/urls.txt |
| 19:01:11 | <tomodachi94> | It's mostly images. |
| 19:07:21 | | geezabiscuit quits [Read error: Connection reset by peer] |
| 19:07:23 | | Dango360 (Dango360) joins |
| 19:13:32 | | c3manu quits [Remote host closed the connection] |
| 19:32:02 | | Dango360 quits [Read error: Connection reset by peer] |
| 19:40:16 | <h2ibot> | Entartet edited List of websites excluded from the Wayback Machine (+30, Added patrickcollison.com.): https://wiki.archiveteam.org/?diff=49852&oldid=49833 |
| 19:43:44 | | Dango360 (Dango360) joins |
| 19:59:28 | | that_lurker quits [Quit: my throat's getting sore from humming modem tones into my phone] |
| 19:59:46 | | that_lurker (that_lurker) joins |
| 19:59:49 | | Dango360 quits [Read error: Connection reset by peer] |
| 20:01:03 | | geezabiscuit (geezabiscuit) joins |
| 20:02:41 | <manu|m> | If I want to run a Warrior on a dedicated machine at home (headlessly via Docker), what would be reasonable specs for it? |
| 20:06:03 | <@JAA> | manu|m: There's no general answer as it depends entirely on the project. Some projects are CPU-intensive (e.g. sitemap parsing on URLs), some projects require significant disk space (anything with videos, e.g. YouTube), some require a lot of RAM due to recursion (e.g. Enjin I think)... If you want to run multiple projects at once, consider using the project images rather than the warrior. |
| 20:08:02 | | dumbgoy quits [Ping timeout: 252 seconds] |
| 20:08:07 | | tbc1887 (tbc1887) joins |
| 20:08:28 | | Dango360 (Dango360) joins |
| 20:09:06 | <manu|m> | so different projects/pipelines will pick the warriors they use based on their specs? I’d just like to make use of my internet connection when I don’t need it for myself |
| 20:09:18 | | imer quits [Quit: Oh no] |
| 20:10:32 | <@JAA> | No, either the warrior runs a specific selected project, or it runs the default project that we set on the tracker side, which is the same for all warriors set to 'ArchiveTeam's choice'. |
| 20:10:32 | | imer (imer) joins |
| 20:10:35 | | tbc1887_ quits [Ping timeout: 252 seconds] |
| 20:11:34 | <manu|m> | oh okay, thanks |
| 20:11:59 | <manu|m> | i'll check out the project pages then |
| 20:12:52 | | imer quits [Client Quit] |
| 20:13:33 | <@JAA> | If you want a 'set it up once and forget about it' thing, you'll want the warrior set to AT's choice. |
| 20:13:45 | | imer (imer) joins |
| 20:13:53 | <@JAA> | But a dedicated machine for that is a bit overkill. |
| 20:15:39 | <manu|m> | i’m not getting a second tower or a server rack for that, i just thought it might be a good idea to have it running on a machine that draws a bit less power than my desktop setup |
| 20:20:29 | | geezabiscuit quits [Ping timeout: 252 seconds] |
| 20:20:41 | <manu|m> | another question: once or twice a year (when there isn’t a pandemic going on) i’m attending Chaos events where there’s 4-7 days of practically unlimited bandwidth available, where it’s possible to colocate machines. would it be useful to bring a warrior (or more) there to crunch through larger projects, or would that be counter-productive? |
| 20:22:59 | <nicolas17> | depends on the project, sometimes the website being archived has per-IP limits so having more bandwidth doesn't actually help |
| 20:39:39 | | leo60228 (leo60228) joins |
| 20:53:46 | | Jake quits [Client Quit] |
| 20:54:01 | | Jake (Jake) joins |
| 20:55:42 | | geezabiscuit (geezabiscuit) joins |
| 21:04:34 | | hitgrr8 quits [Client Quit] |
| 21:05:27 | | Island joins |
| 21:12:36 | | HiccupJul (HiccupJul) joins |
| 21:14:12 | <HiccupJul> | Is there a way to make archivebot login-walled content? I want to backup redump.org using archivebot, but some of the content is walled behind the requirement to submit a few discs to the site. It's not exactly public but it's not exactly private either. |
| 21:14:26 | <HiccupJul> | *archivebot archive login-walled content |
| 21:16:04 | <pokechu22> | Archivebot can't do that, no :/ (I think there might be other tools that can (e.g. grab-site installed locally) but I haven't worked with those) |
| 21:16:27 | <pokechu22> | ... hmm, and redump.org isn't loading for me at all, that's not a good sign :| |
| 21:17:01 | <HiccupJul> | Yeah it goes down occasionally, the admin is unresponsive, and the admin is against public backups. |
| 21:17:22 | <HiccupJul> | so I think it'd be a good thing to have it in the wayback machine |
| 21:18:08 | <HiccupJul> | is there any tool that supports login-walled sites, that can be used to ingest data into the wayback machine? |
| 21:18:28 | <pokechu22> | There have been some backups in the past, but without being logged in |
| 21:19:25 | <HiccupJul> | yeah, its just that misses a lot of data |
| 21:19:34 | <pokechu22> | Looks like the last full run of redump.org was on 2022-05-10, and forum.redump.org was last run on 2023-04-29, but both of those wouldn't be logged in |
| 21:19:58 | <pokechu22> | For main redump.org, it'd be missing data for a few more recent systems, and also revision history, right? While forum.redump.org would be missing basically everything to my understanding |
| 21:19:59 | <HiccupJul> | i.e. it misses modern systems, change history, dump submission sub-forums |
| 21:20:22 | <HiccupJul> | hah, we said almost the same thing |
| 21:20:42 | <HiccupJul> | so yeah, we are on the same page |
| 21:21:33 | <HiccupJul> | all that stuff is pretty important to the continued operation of the site, imo |
| 21:22:05 | <pokechu22> | My understanding is that grab-site produces warcs, and those *can* be ingested into web.archive.org but won't necessarily be by default |
| 21:22:34 | <that_lurker> | https://github.com/webrecorder/browsertrix-crawler supports logins via browser profiles: https://github.com/webrecorder/browsertrix-crawler#creating-and-using-browser-profiles |
| 21:24:49 | <HiccupJul> | i assume wayback doesn't ingest stuff made by random people |
| 21:25:30 | <HiccupJul> | only things archived by archive team or IA services, or stuff from companies like alexa |
| 21:26:23 | <pokechu22> | My understanding is that yeah, that's roughly the case. Most outsider stuff ends up in https://archive.org/details/warczone |
| 21:26:48 | <pokechu22> | One other aspect to consider is that if you do save login-walled content, every single page will show you being logged in |
| 21:28:02 | <@JAA> | Data behind logins doesn't make it into the Wayback Machine in general. |
| 21:28:59 | <@JAA> | That's a relatively hard rule with few exceptions. |
| 21:32:17 | <HiccupJul> | i guess i should look into using something like archivebot, and then just hosting the static pages on free hosting so they can be browsed, in addition to the WARCs |
| 21:32:26 | <@JAA> | (And the exceptions are of historical nature, e.g. our SPUF project in 2017.) |
| 21:32:32 | | tzt quits [Ping timeout: 252 seconds] |
| 21:33:47 | | tzt (tzt) joins |
| 21:33:53 | <@JAA> | That sounds reasonable. grab-site is basically like AB but local, and you can give it cookies. |
| 21:34:11 | <HiccupJul> | someone did run grab-site, but i think it was a pretty long process |
| 21:34:17 | <HiccupJul> | and hard to get working |
| 21:34:53 | <HiccupJul> | i should check if the output of that was okay, then maybe i can just set that up to run every week and upload to IA (and github pages/neocities for a browsable version) |
| 21:35:12 | | tbc1887_ joins |
| 21:35:17 | <@JAA> | Shouldn't be hard to get working unless there's annoying 'DDoS protection' stuff in the way or extensive use of JS, but it certainly won't be fast, yeah. |
| 21:35:59 | <HiccupJul> | seems like that grab-site run was incomplete |
| 21:36:05 | <HiccupJul> | so the issues with it apparently weren't resolved |
| 21:36:09 | <@JAA> | Note that making the WARCs publicly accessible might allow others to hijack your account. |
| 21:36:19 | <HiccupJul> | i think i'll make a dummy account for it |
| 21:36:53 | <HiccupJul> | actually i believe i know someone who has one like that already |
| 21:37:04 | <pokechu22> | redump.org doesn't support https; I doubt it has proper DDoS protection :P |
| 21:37:28 | <HiccupJul> | but yeah i don't think i want to do it with my account, if only to avoid my name being plastered over it |
| 21:37:58 | <pokechu22> | I assume you'd be grabbing the submission history subforums but not the dumpers subforum, then? |
| 21:38:16 | <HiccupJul> | the account could probably get dumper access |
| 21:38:34 | | tbc1887 quits [Ping timeout: 265 seconds] |
| 21:38:36 | <HiccupJul> | i have plenty of low-priority discs (e.g. already verified ps2 shovelware) that i can use |
| 21:39:06 | | tbc1887 (tbc1887) joins |
| 21:39:13 | <HiccupJul> | actually, i can probably ask a moderator just to promote an account even without any disc submissions |
| 21:41:20 | | tbc1887_ quits [Ping timeout: 252 seconds] |
| 21:41:25 | <HiccupJul> | this command was used for the grab-site attempt: https://bpa.st/6EQWS |
| 21:41:34 | <HiccupJul> | ignores were: http://redump.org/discs/.*?/dumper/.*? |
| 21:41:59 | <HiccupJul> | took 33 hours, not too bad |
| 21:42:12 | | tbc1887_ joins |
| 21:42:34 | <HiccupJul> | ah, this bug prevented forum attachments being saved: https://github.com/ArchiveTeam/wpull/issues/291 |
| 21:43:49 | <@JAA> | HTTP/0.9? Eww. |
| 21:44:23 | <@JAA> | Technically, that can't go into WARC either. |
| 21:45:01 | <HiccupJul> | technically? |
| 21:45:11 | | tbc1887 quits [Ping timeout: 252 seconds] |
| 21:45:50 | | dumbgoy joins |
| 21:46:43 | <@JAA> | The spec only permits HTTP/1.1, strictly speaking. |
| 21:49:01 | <HiccupJul> | hm, as long as it works, i guess it'd be fine |
| 21:58:11 | | decky_e quits [Remote host closed the connection] |
| 22:01:44 | | tbc1887 (tbc1887) joins |
| 22:03:53 | | tbc1887_ quits [Ping timeout: 252 seconds] |
| 22:13:56 | | tbc1887_ joins |
| 22:17:05 | | tbc1887 quits [Ping timeout: 252 seconds] |
| 22:22:28 | | lennier2 joins |
| 22:24:29 | | bf_ quits [Ping timeout: 265 seconds] |
| 22:25:20 | | lennier1 quits [Ping timeout: 252 seconds] |
| 22:25:30 | | lennier2 is now known as lennier1 |
| 22:27:22 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 22:27:48 | | AmAnd0A joins |
| 22:47:20 | | bf_ joins |
| 23:03:50 | | ymgve quits [Ping timeout: 252 seconds] |
| 23:06:52 | | eroc19909 (eroc1990) joins |
| 23:07:34 | <flashfire42> | HiccupJul I see you got IRC working. Yes redump sucks in terms of tech |
| 23:08:22 | <HiccupJul> | i think hackint may have been down for a bit, or maybe some system clock fluke that caused a certificate error |
| 23:09:32 | | eroc1990 quits [Ping timeout: 252 seconds] |
| 23:11:24 | <@JAA> | hackint hasn't been down in a good while, but the webchat thingy was broken for about a week recently. |
| 23:40:24 | | BlueMaxima joins |
| 23:40:24 | | Sluggs quits [Excess Flood] |
| 23:40:47 | | dumbgoy_ joins |
| 23:40:49 | | Sluggs joins |
| 23:43:37 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 23:43:53 | | AmAnd0A joins |
| 23:44:32 | | dumbgoy quits [Ping timeout: 252 seconds] |
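The 21:42–21:46 exchange concerns a wpull bug triggered by HTTP/0.9 responses, which consist of the body alone with no status line or headers. As JAA notes, such responses don't fit cleanly into WARC. The sketch below is a rough, hypothetical aid for spotting them before starting a crawl. It assumes you already have the raw response bytes from a plain-TCP request. The helper name `looks_like_http09` is illustrative only and is not part of wpull or grab-site. It is a heuristic check for a leading HTTP/1.x status line, not a full HTTP parser.

```python
import re

# An HTTP/1.x response begins with a status line such as "HTTP/1.1 200 OK".
# An HTTP/0.9 response has no status line or headers: the body starts immediately.
STATUS_LINE = re.compile(rb"HTTP/1\.[01] \d{3}")


def looks_like_http09(raw: bytes) -> bool:
    """Heuristic: True if the raw response bytes lack an HTTP/1.x status line.

    Hypothetical helper for illustration; not part of wpull or grab-site.
    """
    # re.match only matches at the start of the buffer.
    return STATUS_LINE.match(raw) is None


if __name__ == "__main__":
    samples = {
        "http/1.1": b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>",
        "http/0.9": b"<html><body>attachment body</body></html>",
    }
    for name, raw in samples.items():
        print(name, looks_like_http09(raw))
```

Running it prints `http/1.1 False` and `http/0.9 True`. A server that sends a malformed status line would also be flagged, so treat a positive result as a prompt to inspect the response manually.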