00:00:43 | | Arcorann (Arcorann) joins |
00:04:05 | | jtagcat quits [Quit: Bye!] |
00:04:28 | | jtagcat (jtagcat) joins |
00:14:50 | | etnguyen03 quits [Ping timeout: 252 seconds] |
00:34:56 | | etnguyen03 (etnguyen03) joins |
00:40:54 | | railen63 joins |
00:50:11 | | Mateon1 joins |
00:51:39 | | etnguyen03 quits [Ping timeout: 265 seconds] |
01:09:31 | | etnguyen03 (etnguyen03) joins |
01:33:13 | | etnguyen03 quits [Ping timeout: 265 seconds] |
02:13:37 | | etnguyen03 (etnguyen03) joins |
02:44:59 | | etnguyen03 quits [Ping timeout: 252 seconds] |
02:49:23 | | AntiLiberal2 quits [Ping timeout: 252 seconds] |
02:54:33 | | etnguyen03 (etnguyen03) joins |
03:00:22 | | pseudorizer quits [Quit: ZNC 1.8.2 - https://znc.in] |
03:01:06 | | pseudorizer (pseudorizer) joins |
03:19:39 | | katocala joins |
03:19:53 | | katocala is now authenticated as katocala |
03:42:18 | | emery quits [Client Quit] |
03:44:20 | | ehmry joins |
04:17:25 | | BigBrain quits [Remote host closed the connection] |
04:17:49 | | BigBrain (bigbrain) joins |
04:34:57 | | dumbgoy__ quits [Ping timeout: 265 seconds] |
04:37:08 | <mgrandi> | Even if it has ads, the original video is still there for archival purposes so it's not the end of the world at that end |
04:45:06 | | project10 quits [Client Quit] |
04:50:38 | | project10 (project10) joins |
05:03:28 | | etnguyen03 quits [Ping timeout: 265 seconds] |
05:04:57 | | etnguyen03 (etnguyen03) joins |
05:13:14 | | DogsRNice quits [Read error: Connection reset by peer] |
05:27:40 | | etnguyen03 quits [Client Quit] |
05:43:46 | | project10 quits [Client Quit] |
05:49:27 | | project10 (project10) joins |
06:24:59 | | magmaus3 quits [Ping timeout: 252 seconds] |
06:30:28 | | treora quits [Ping timeout: 265 seconds] |
06:31:07 | | treora joins |
07:22:45 | | treora quits [Remote host closed the connection] |
07:22:47 | | treora joins |
07:38:04 | | magmaus3 (magmaus3) joins |
07:50:13 | | BlueMaxima quits [Read error: Connection reset by peer] |
08:05:39 | | hitgrr8 joins |
08:13:07 | | qwertyasdfuiopghjkl quits [Client Quit] |
08:14:46 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
08:16:19 | | sepro quits [Read error: Connection reset by peer] |
08:17:33 | | sepro (sepro) joins |
08:32:39 | | sepro2 (sepro) joins |
08:34:14 | | sepro quits [Ping timeout: 252 seconds] |
08:34:14 | | sepro2 is now known as sepro |
08:44:39 | | sepro quits [Read error: Connection reset by peer] |
08:45:04 | | sepro (sepro) joins |
08:51:40 | | Island quits [Read error: Connection reset by peer] |
08:54:30 | | bladem (bladem) joins |
09:25:20 | | Earendil7 quits [Quit: Leaving] |
09:26:37 | | Earendil7 (Earendil7) joins |
09:33:12 | <immibis> | i am banned from the-archive and distributed youtube archive, or else i might see what they have going for "important channel lists" |
09:33:45 | <immibis> | the petty politics is not good for archiving without duplication |
09:57:55 | | Kinille quits [] |
09:58:50 | | Kinille (Kinille) joins |
10:00:01 | | railen63 quits [Remote host closed the connection] |
10:00:17 | | railen63 joins |
10:04:48 | | Kinille quits [Client Quit] |
10:15:10 | | Kinille (Kinille) joins |
10:47:27 | | railen63 quits [Remote host closed the connection] |
11:19:47 | | lennier2 quits [Ping timeout: 252 seconds] |
11:21:30 | | lennier2 joins |
11:31:35 | | le0n quits [Ping timeout: 265 seconds] |
11:45:04 | | le0n (le0n) joins |
13:34:21 | | Arcorann quits [Ping timeout: 265 seconds] |
13:36:13 | <@arkiver> | immibis: what do you mean? |
13:42:23 | | dumbgoy__ joins |
13:57:29 | <immibis> | if the channels can't all be archived, having a list of channels that should be archived but currently aren't is still useful for when someone can do them |
14:02:42 | | jackdanielsfan55 joins |
14:16:32 | | etnguyen03 (etnguyen03) joins |
14:38:53 | | treora quits [Ping timeout: 252 seconds] |
14:39:25 | | treora joins |
15:26:02 | <@arkiver> | immibis: what channels are these and do you have descriptions of them? we also have #down-the-tube , but there are _very_ strict rules for that on the wiki |
15:40:23 | <immibis> | arkiver: that's what I meant - I'm not aware of any watchlist of channels worth archiving in advance of actually archiving them |
15:41:09 | <immibis> | my own system has a prioritized list of channels, and works through them at its own rate, with *months* of backlog |
15:44:25 | <audrooku|m> | > i am banned from the-archive and distributed youtube archive, or else i might see what they have going for "important channel lists" |
15:44:25 | <audrooku|m> | Nothing |
15:49:52 | <immibis> | i know that DYA has a spreadsheet covering stuff that is already archived |
15:50:03 | <immibis> | i'm pretty sure neither has a "want to have" list |
15:50:16 | <immibis> | probably because they just archive it, instead of putting it on a list |
15:51:46 | <audrooku|m> | Yeah |
15:54:02 | | etnguyen03 quits [Ping timeout: 265 seconds] |
15:57:01 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
16:16:12 | | Reece joins |
16:22:38 | | Reece quits [Remote host closed the connection] |
16:22:45 | | Reece joins |
16:22:55 | | Reece quits [Remote host closed the connection] |
16:26:35 | | Reece joins |
16:26:40 | <Reece> | !archive https://angrybirds.miraheze.org |
16:26:51 | | Reece leaves |
16:35:10 | <fireonlive> | lol |
16:35:16 | <audrooku|m> | "my work here is done" |
16:38:56 | | Wohlstand (Wohlstand) joins |
16:40:00 | <that_lurker> | at least it's the correctish channel :P |
16:57:21 | | jackdanielsfan55 quits [Ping timeout: 265 seconds] |
17:15:38 | | fuzzy8021 quits [Read error: Connection reset by peer] |
17:16:35 | | fuzzy8021 (fuzzy8021) joins |
17:40:22 | | fuzzy8021 quits [Read error: Connection reset by peer] |
17:40:54 | | fuzzy8021 (fuzzy8021) joins |
18:02:50 | | etnguyen03 (etnguyen03) joins |
18:05:18 | <fireonlive> | well, they did try in #archiveteam after |
18:05:20 | <fireonlive> | xP |
18:21:05 | | etnguyen03 quits [Ping timeout: 252 seconds] |
18:22:08 | | driib quits [Client Quit] |
18:32:12 | <that_lurker> | oh yeah :P |
18:38:18 | | driib (driib) joins |
18:39:11 | | DogsRNice joins |
19:13:48 | | etnguyen03 (etnguyen03) joins |
19:25:13 | | just1602 quits [Quit: WeeChat 3.8] |
19:55:39 | | jacksonchen666 (jacksonchen666) joins |
20:13:06 | | tttt quits [Ping timeout: 265 seconds] |
20:37:59 | | dumbgoy__ quits [Read error: Connection reset by peer] |
20:38:29 | | jacksonchen666 quits [Remote host closed the connection] |
20:38:49 | | jacksonchen666 (jacksonchen666) joins |
20:45:19 | <@arkiver> | JAA: who runs archivebot pipelines? |
20:45:48 | <@arkiver> | we should really archive some hamas (and likely related) sites - but this may have negative implications for whatever IP this is run on |
20:45:49 | <@JAA> | arkiver: Yours truly. |
20:46:09 | <@arkiver> | JAA: ah :) |
20:46:15 | <@JAA> | Two machines are my own, the rest are rented by others, and I run everything from there on. |
20:46:31 | <@JAA> | s/rented // (not all are rented servers, actually) |
20:47:10 | <@arkiver> | so basically looking for someone who might want to run a temporary archivebot which we can use to archive hamas and related content |
20:55:05 | | tzt quits [Ping timeout: 252 seconds] |
20:56:23 | | tzt (tzt) joins |
21:22:35 | | Island joins |
21:22:45 | | BlueMaxima joins |
21:33:02 | | etnguyen03 quits [Ping timeout: 252 seconds] |
21:36:12 | <mgrandi> | Is there an existing project that handles a phpbb forum? Xentax is closing at the end of the year and has a lot of attachments that aren't hosted anywhere else |
21:38:02 | <thuban> | mgrandi: not specifically, plus (while there's been an ab job for the forums) i'm given to understand that attachments are login-walled |
21:38:58 | <mgrandi> | Yeah, that makes it hard for AB right |
21:39:22 | <mgrandi> | But I was seeing if maybe I could look at the seesaw project code if one exists for a phpbb forum |
21:41:50 | <thuban> | sort of--archivebot is technically capable of doing logged-in crawls, but the interface is designed not to allow them to be configured, because as a matter of policy we don't send them to the wbm |
21:42:38 | <thuban> | grab-site is basically the same internals and would work well for an 'unofficial' crawl if given login cookies (just use the forums igset) |
21:42:56 | | jacksonchen666 quits [Ping timeout: 245 seconds] |
21:45:30 | <mgrandi> | I know past seesaw scrapes do login crawls, dunno if policy has changed |
21:46:31 | <thuban> | the only one i'm aware of was yahoo groups, and that was agreed on as a special case |
21:50:00 | <mgrandi> | I know a few art site scrapes were cause you needed to be logged in to see nsfw art |
21:51:45 | <thuban> | til, i must not have been around for those |
21:52:47 | <thuban> | anyway, a grab-site run would be a good start--could dump it on ia as an item if nothing else |
21:53:12 | <pokechu22> | I think there's also JAA's qwarc - it *probably* could do logged in stuff if needed |
21:53:32 | <mgrandi> | https://github.com/ArchiveTeam/furaffinity-grab |
21:53:41 | <mgrandi> | Was one of them |
21:54:02 | <mgrandi> | I can see if I can write a script to grab urls, or see if one exists |
21:54:48 | <thuban> | 2015, wow |
21:55:26 | <@JAA> | Yeah, we did a couple projects with accounts, but the most recent one was over 5 years ago I believe. |
21:56:09 | <@JAA> | And generally, such data won't go into the WBM these days. |
21:56:25 | <thuban> | JAA: yahoo groups was 2019-2020. but that one was... special |
21:56:45 | <@JAA> | Well, yeah, but we didn't create WARCs with accounts there, I believe. |
21:57:42 | <@JAA> | Maybe I'm misremembering, but I think it was only for GMD exports. |
21:58:41 | <@arkiver> | i believe so yes |
21:58:47 | <@arkiver> | (but same disclaimer here) |
21:59:51 | <thuban> | the api grab used login cookies and generated warcs https://wiki.archiveteam.org/index.php/Yahoo!_Groups#2019_API_grab https://github.com/ArchiveTeam/yahoo-group-archiver |
22:01:18 | <@JAA> | > from warcio import WARCWriter |
22:01:23 | <@JAA> | *twitch* |
22:01:24 | <thuban> | :( |
22:01:44 | <@JAA> | 'Special' indeed... |
22:01:53 | <imer> | oof. >"1/3 is in australia, 1/3 with me, and 1/3 on IA"[12]. As of September 2022 neither of the first two parts have been uploaded. |
22:02:17 | <thuban> | just the flowchart makes me wanna cry |
22:02:24 | <@arkiver> | oh |
22:02:29 | <@arkiver> | was this that alternative project? |
22:02:46 | <@JAA> | Yeah |
22:02:50 | <thuban> | (the state of affairs depicted by the flowchart, not the flowchart itself, it's a very nice flowchart, thank you Doranwen) |
22:03:02 | <@arkiver> | sigh |
22:03:06 | <@arkiver> | where did those WARCs end up? |
22:03:27 | <@arkiver> | that was annoying from what i remember |
22:04:06 | <@arkiver> | ah the data was never uploaded? |
22:04:19 | <@arkiver> | it should be, but not in the wayback machine due to warcio and login |
22:05:05 | <thuban> | some of it's been uploaded (but not in wbm afaik), some of it's floating around in limbo |
22:05:39 | <thuban> | ask marked and/or lennier1 (lennier2?) |
22:06:11 | <@arkiver> | they'll upload when they upload, there's been plenty of time |
22:06:47 | <thuban> | anyway |
22:06:57 | <thuban> | mgrandi: writing your own script(s) seems like overkill; something wrong with grab-site? |
22:07:49 | <@arkiver> | mgrandi: are the links you get to through login actually then downloadable without login? |
22:08:11 | <thuban> | oh, good question |
22:09:07 | <mgrandi> | No I think it needs a cookie but I can find out later |
22:09:15 | <mgrandi> | I dunno how grab site works or if I can run that heh |
22:09:59 | <thuban> | it's basically local archivebot, dashboard and all |
22:10:12 | <thuban> | https://github.com/ArchiveTeam/grab-site/ |
22:13:12 | <h2ibot> | Kevidryon2 created "osu!" (+2283, Version 2): https://wiki.archiveteam.org/?title=%22osu%21%22 |
22:13:13 | <h2ibot> | JustAnotherArchivist moved "osu!" to Osu!: https://wiki.archiveteam.org/?title=Osu%21 |
22:13:26 | <lennier2> | I never did figure out for sure what that "1/3 in Australia" was referring to. Did datechnoman run any targets? I was meaning to ask them. |
22:15:12 | <h2ibot> | JustAnotherArchivist edited Osu! (-59, Cleanup): https://wiki.archiveteam.org/?diff=50978&oldid=50977 |
22:27:43 | | etnguyen03 (etnguyen03) joins |
22:30:19 | <mgrandi> | https://forum.xentax.com/download/file.php?id=22167 here is an example file |
22:31:20 | <mgrandi> | Looks like you need to be logged in, maybe it does a redirect to the raw file that might not check |
22:32:04 | <thuban> | oh dammit, they moved the date up |
22:34:16 | <h2ibot> | Switchnode edited Deathwatch (-1, /* 2023 */ update xentax with new deadline): https://wiki.archiveteam.org/?diff=50979&oldid=50973 |
22:34:37 | <thuban> | (dec 1 now) |
22:48:39 | | magmaus31 (magmaus3) joins |
22:48:44 | | magmaus3 quits [Ping timeout: 265 seconds] |
22:48:44 | | magmaus31 is now known as magmaus3 |
22:50:24 | | hitgrr8 quits [Client Quit] |
22:59:56 | | etnguyen03 quits [Ping timeout: 252 seconds] |
23:04:26 | | dumbgoy joins |
23:18:10 | | etnguyen03 (etnguyen03) joins |
23:20:19 | <fireonlive> | :/ |
23:35:34 | <Doranwen> | thuban: I didn't create the flowchart, lol - I did contribute to some of the *data* being acquired, and to sorting it all out (still working on that!) - but I'd have to search through logs to see who created the flowchart |
23:38:01 | <audrooku|m> | I appreciate your dedication <3 |
23:39:45 | | jacksonchen666 (jacksonchen666) joins |
23:41:28 | | Wohlstand quits [Client Quit] |
23:43:09 | <thuban> | oh, my mistake--looks like it was OrIdow6. i hereby redirect my thanks |
23:43:11 | <thuban> | ditto, though |
23:44:11 | | jacksonchen666 quits [Ping timeout: 245 seconds] |
23:50:32 | | etnguyen03 quits [Ping timeout: 252 seconds] |
23:54:19 | | Megame (Megame) joins |