00:00:00 | | Radzig joins |
00:06:00 | | nepeat quits [Ping timeout: 260 seconds] |
00:09:36 | | nepeat (nepeat) joins |
00:12:38 | | collat quits [Ping timeout: 240 seconds] |
00:13:34 | | collat joins |
00:29:19 | | etnguyen03 quits [Client Quit] |
00:36:30 | <pabs> | Barto JAA - anarcat is working on the shutdown, and asked me questions about doing the final archive before that. so it should be fine |
00:38:15 | <@JAA> | pabs: Lovely. Do you know what the 2024-09-17 is referring to? |
00:38:25 | | sralracer quits [Client Quit] |
00:38:32 | <@JAA> | anarcat++ |
00:38:32 | <eggdrop> | [karma] 'anarcat' now has 3 karma! |
00:38:49 | <pabs> | https://archive.fart.website/archivebot/viewer/job/20240918051701e1k3d |
00:38:57 | <pabs> | hmm, that's only shallow |
00:49:37 | <@JAA> | Yeah, that's the !ao pokechu22 mentioned earlier. |
00:50:12 | <pabs> | anarcat: that's my mistake, I misread the fart web interface. |
00:50:27 | <pabs> | anarcat: so 20221021 looks like the full archive |
00:50:32 | <pabs> | *last full archive |
00:53:37 | | nick5 joins |
00:54:37 | | nick5 quits [Client Quit] |
00:58:01 | <pabs> | alexlehm: nice spotting, I'll add that to AB |
01:03:28 | <pabs> | oh shit, it has many subdomains |
01:03:48 | | decky_e_ quits [Read error: Connection reset by peer] |
01:04:09 | | decky_e_ joins |
01:05:11 | | Bleo18260072271962 quits [Quit: Ping timeout (120 seconds)] |
01:05:12 | | yonerboner quits [Quit: Ping timeout (120 seconds)] |
01:05:28 | | yonerboner joins |
01:05:37 | | ^ quits [Remote host closed the connection] |
01:05:46 | | nepeat quits [Client Quit] |
01:05:48 | | Ryz quits [Quit: Ping timeout (120 seconds)] |
01:06:00 | | tek_dmn- quits [Quit: ZNC - https://znc.in] |
01:06:02 | | Aoede_ quits [Quit: ZNC - https://znc.in] |
01:06:41 | | Ryz (Ryz) joins |
01:07:15 | | @rewby quits [Ping timeout: 260 seconds] |
01:07:25 | | Aoede (Aoede) joins |
01:07:43 | | Bleo18260072271962 joins |
01:07:57 | | ^ (^) joins |
01:08:20 | | rewby (rewby) joins |
01:08:20 | | @ChanServ sets mode: +o rewby |
01:08:51 | | tek_dmn (tek_dmn) joins |
01:09:13 | | nepeat (nepeat) joins |
01:11:47 | | etnguyen03 (etnguyen03) joins |
01:27:08 | | ghbmmm joins |
01:28:50 | | ghbmmm quits [Client Quit] |
01:31:03 | | mikemcnarland joins |
01:31:07 | | msrn_ quits [Ping timeout: 255 seconds] |
01:31:20 | | mikael joins |
01:31:41 | | DLoader7 joins |
01:32:20 | | M60_ quits [Ping timeout: 260 seconds] |
01:32:28 | | DLoader quits [Ping timeout: 255 seconds] |
01:32:29 | | DLoader7 is now known as DLoader |
01:32:30 | | mikemcnarland quits [Client Quit] |
01:32:44 | | M60_ joins |
02:02:54 | <h2ibot> | Cooljeanius edited Deathwatch (+355, add botsin.space): https://wiki.archiveteam.org/?diff=53675&oldid=53669 |
02:05:55 | <h2ibot> | Cooljeanius edited Mastodon (+120, /* Dead and dying instances */ add botsin.space): https://wiki.archiveteam.org/?diff=53676&oldid=53507 |
02:47:53 | | egallager joins |
02:47:58 | <egallager> | hi |
02:48:08 | | etnguyen03 quits [Client Quit] |
02:48:17 | <egallager> | just wanted to make sure people had seen about botsin.space shutting down: https://muffinlabs.com/posts/2024/10/29/10-29-rip-botsin-space/ |
02:51:41 | | Guest54 joins |
02:52:51 | | etnguyen03 (etnguyen03) joins |
03:03:02 | | etnguyen03 quits [Remote host closed the connection] |
03:15:13 | | egallager quits [Client Quit] |
03:30:04 | | accordsviewer joins |
03:32:15 | | accordsviewer quits [Client Quit] |
03:33:57 | | chrismrtn (chrismrtn) joins |
03:34:12 | | nicolas17 quits [Quit: Konversation terminated!] |
04:04:34 | | leo60228 quits [Ping timeout: 255 seconds] |
04:09:52 | | wa joins |
04:19:19 | | Yuio joins |
04:19:37 | | Yuio quits [Client Quit] |
04:25:24 | | BlueMaxima quits [Read error: Connection reset by peer] |
04:52:32 | | ladlounge joins |
04:52:36 | | angenieux quits [Quit: The Lounge - https://thelounge.chat] |
04:54:36 | | angenieux (angenieux) joins |
04:57:44 | | Guest54 quits [Client Quit] |
04:58:57 | | ladlounge quits [Client Quit] |
05:26:15 | | Commander001 quits [Ping timeout: 260 seconds] |
05:27:52 | | Commander001 joins |
05:28:43 | | SAM joins |
05:28:59 | | SAM quits [Client Quit] |
05:29:03 | | Island quits [Read error: Connection reset by peer] |
05:30:20 | | magmaus3 quits [Ping timeout: 260 seconds] |
05:31:21 | | Island joins |
05:39:32 | | magmaus3 (magmaus3) joins |
05:54:43 | | Commander001 quits [Read error: Connection reset by peer] |
05:54:56 | | Commander001 joins |
06:44:25 | | collat quits [Ping timeout: 260 seconds] |
06:50:04 | | collat joins |
06:50:57 | | le0n (le0n) joins |
06:51:50 | | leo60228 (leo60228) joins |
06:55:54 | | mike joins |
06:57:00 | | mike quits [Client Quit] |
07:05:53 | | Unholy236192464537713 (Unholy2361) joins |
07:11:06 | | Unholy2361924645377131 (Unholy2361) joins |
07:14:45 | | Unholy236192464537713 quits [Ping timeout: 260 seconds] |
07:14:45 | | Unholy2361924645377131 is now known as Unholy236192464537713 |
07:21:45 | | flotwig quits [Ping timeout: 260 seconds] |
07:21:46 | | flotwig_ joins |
07:22:16 | | flotwig_ is now known as flotwig |
07:35:26 | | qw3rty joins |
07:35:45 | | qw3rty_ quits [Ping timeout: 260 seconds] |
08:07:22 | <asie> | more for the deathwatch: https://forum.pclab.pl/ - the fairly large (~10 million+ posts) forum PCLab.pl, connected to an online portal that shut down in 2020, is shutting down on November 30th, 2024 |
08:08:21 | <asie> | this is probably just an AB job, but a very big one |
08:09:16 | <asie> | November 30th or 29th; the announcement actually gives both dates, so it's safer to assume 29th |
08:09:46 | <asie> | (I assume the intent is something like "the hours between late November 29th and early November 30th") |
08:10:01 | <asie> | (or maybe it's just a typo.) |
08:11:08 | <asie> | It's worth noting that the forum continues to be active until that date, however; only the creation of new accounts has been blocked. |
09:16:13 | <thuban> | asie: thanks, running. |
09:18:17 | <thuban> | i'm a little concerned by its tendency to return giant php farts with status code 200 (particularly on subforum pages?), which might impede recursion, but i guess we can do some manual requeues if necessary |
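thuban's worry is that error dumps served with HTTP 200 look like successful fetches, so recursion can quietly stall. A minimal sketch of one way to flag such pages for manual requeueing, assuming the errors use PHP's default HTML error format (`<b>Warning</b>: ...` and similar). The real pclab.pl error bodies were not quoted here, so the marker list, the sample URLs and the `looks_like_soft_error` / `requeue_candidates` names are all hypothetical:

```python
import re

# Assumed markers: PHP's default HTML error output looks like
# "<br />\n<b>Warning</b>:  message in <b>file</b> on line <b>N</b>".
# The real pclab.pl error pages were not quoted, so adjust to match them.
PHP_ERROR = re.compile(r"<b>(?:Fatal error|Parse error|Warning|Notice|Deprecated)</b>:")


def looks_like_soft_error(status: int, body: str) -> bool:
    """True for a 200 response whose body contains a PHP error marker."""
    return status == 200 and PHP_ERROR.search(body) is not None


def requeue_candidates(responses):
    """responses: iterable of (url, status, body); returns URLs worth a manual requeue."""
    return [url for url, status, body in responses if looks_like_soft_error(status, body)]


if __name__ == "__main__":
    # Hypothetical responses, not real pclab.pl data.
    sample = [
        ("https://forum.example.org/a", 200, "<br />\n<b>Fatal error</b>:  out of memory"),
        ("https://forum.example.org/b", 200, "<html><body>normal page</body></html>"),
    ]
    print(requeue_candidates(sample))
```

This is only a heuristic: a post that happens to contain a bolded "Warning:" would also be flagged, so the output is a list to review rather than something to requeue blindly.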
09:38:12 | <asie> | it's very likely the service has been "deteriorating" since 2020 |
09:38:26 | <asie> | given that's when the online portal was shut down, the forum was probably largely left to its own devices |
09:45:37 | | s4n1ty7 (s4n1ty) joins |
09:46:07 | | Miori quits [Ping timeout: 255 seconds] |
09:46:17 | | s4n1ty quits [Read error: Connection reset by peer] |
09:46:34 | | Miori joins |
09:47:27 | | s4n1ty7 is now known as s4n1ty |
09:57:22 | | thehedgeh0g quits [Ping timeout: 255 seconds] |
09:57:22 | | noname quits [Ping timeout: 255 seconds] |
09:57:49 | | evan quits [Ping timeout: 255 seconds] |
09:58:16 | | alittleglitchy quits [Ping timeout: 255 seconds] |
09:58:16 | | c3manu quits [Ping timeout: 255 seconds] |
09:59:10 | | shreyasminocha quits [Ping timeout: 255 seconds] |
09:59:11 | | BornOn420 quits [Remote host closed the connection] |
09:59:15 | | klaffty quits [Ping timeout: 260 seconds] |
09:59:49 | | klaffty joins |
10:00:22 | <h2ibot> | Switchnode edited Deathwatch (+360, /* 2024 */ add lewd.it and forum.pclab.pl): https://wiki.archiveteam.org/?diff=53677&oldid=53675 |
10:01:16 | | BornOn420 (BornOn420) joins |
10:04:55 | <masterx244|m> | <thuban> "i'm a little concerned by its..." <- in the worst case you do an ugly enumeration crawl without AB and then do a non-recursive crawl with AB on a URL list where you know all topic lengths already |
10:05:38 | <masterx244|m> | (had to do something similar once, too when i had a page where pagination bricked itself in the middle for some odd reason. crawl was done with an ugly C# program to generate the ID listings needed) |
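The second step of masterx244's approach, turning already-enumerated topic lengths into a flat URL list for a non-recursive job, can be sketched as below. The URL template, the posts-per-page value and the topic counts are hypothetical placeholders: the actual forum.pclab.pl pagination scheme was not discussed, and the enumeration step that finds each topic's post count is assumed to have happened already.

```python
from math import ceil

# Hypothetical URL template; substitute the real site's pagination format.
TEMPLATE = "https://forum.example.org/topic/{topic}/?start={offset}"


def topic_page_urls(topic_id: int, post_count: int, per_page: int) -> list[str]:
    """Every page URL for one topic, given its total post count."""
    pages = max(1, ceil(post_count / per_page))  # an empty topic still has one page
    return [TEMPLATE.format(topic=topic_id, offset=p * per_page) for p in range(pages)]


def build_url_list(topic_lengths: dict[int, int], per_page: int) -> list[str]:
    """Flatten {topic_id: post_count} into one ordered URL list."""
    urls: list[str] = []
    for topic_id, count in sorted(topic_lengths.items()):
        urls.extend(topic_page_urls(topic_id, count, per_page))
    return urls


if __name__ == "__main__":
    # Hypothetical enumeration results: topic id -> number of posts.
    lengths = {101: 45, 102: 20, 103: 0}
    for url in build_url_list(lengths, per_page=20):
        print(url)
```

Because every page URL is known up front, the archiving job no longer depends on the site's own pagination links, which is the point when those links are broken or replaced by error pages.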
10:07:23 | | c3manu (c3manu) joins |
10:07:26 | | evan joins |
10:08:04 | | thehedgeh0g (mrHedgehog0) joins |
10:09:33 | | vix5110_ joins |
10:17:31 | | magmaus3 quits [Read error: Connection reset by peer] |
10:17:42 | | magmaus3 (magmaus3) joins |
10:18:04 | | shreyasminocha (shreyasminocha) joins |
10:18:20 | | atweedie joins |
10:19:06 | | noname joins |
10:21:25 | | le0n quits [Ping timeout: 260 seconds] |
10:23:51 | | le0n (le0n) joins |
10:24:42 | | alittleglitchy joins |
11:00:04 | | Bleo18260072271962 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:50 | | Bleo18260072271962 joins |
11:09:38 | | le0n quits [Ping timeout: 240 seconds] |
11:17:19 | | useretail joins |
11:37:51 | | le0n (le0n) joins |
11:45:25 | | noname quits [Ping timeout: 260 seconds] |
11:45:25 | | evan quits [Ping timeout: 260 seconds] |
11:46:00 | | thehedgeh0g quits [Ping timeout: 260 seconds] |
11:46:35 | | alittleglitchy quits [Ping timeout: 260 seconds] |
11:46:35 | | c3manu quits [Ping timeout: 260 seconds] |
11:47:10 | | atweedie quits [Ping timeout: 260 seconds] |
11:47:10 | | shreyasminocha quits [Ping timeout: 260 seconds] |
11:47:10 | | klaffty quits [Ping timeout: 260 seconds] |
11:48:03 | | AK quits [Quit: AK] |
11:51:56 | | klaffty joins |
11:53:00 | <pabs> | alexlehm: did a bunch of AB jobs and a couple of code archiving jobs for the few git repos. archive.org uploads still down though, so it could be a while |
11:53:42 | | SkilledAlpaca418 quits [Quit: SkilledAlpaca418] |
11:54:14 | | noname joins |
11:54:14 | | evan joins |
11:54:16 | | atweedie joins |
11:54:22 | | alittleglitchy joins |
11:54:23 | | thehedgeh0g (mrHedgehog0) joins |
11:54:23 | | shreyasminocha (shreyasminocha) joins |
11:54:23 | | c3manu (c3manu) joins |
11:54:54 | | AK (AK) joins |
11:56:39 | | SkilledAlpaca418 joins |
12:21:26 | | pablo joins |
12:28:11 | | pablo quits [Client Quit] |
12:28:44 | | pablo joins |
12:34:40 | | pablo quits [Client Quit] |
12:40:27 | | Unholy236192464537713 quits [Quit: Ping timeout (120 seconds)] |
12:58:59 | | anarcat waves |
12:59:27 | <anarcat> | so i'm happy to coordinate an AT activity WRT the mailman2 retirement, but before people freak out: 1. i'm keeping the /pipermail/ stuff indefinitely and 2. i'm here! |
12:59:35 | <anarcat> | so who wants to launch the job? should i? |
12:59:41 | <anarcat> | i'm happy to just delegate that, i'm rusty |
13:00:01 | <anarcat> | if you do, mention my nick ("anarcat") in the job so i get pinged though |
13:00:09 | <anarcat> | i would otherwise launch that "soon" |
13:03:17 | <anarcat> | JAA / pabs ^ |
13:16:47 | | kdy quits [Remote host closed the connection] |
13:16:59 | | kdy (kdy) joins |
13:29:48 | | DigitalDragons quits [Quit: Ping timeout (120 seconds)] |
13:31:03 | | DigitalDragons (DigitalDragons) joins |
13:33:56 | | Wohlstand (Wohlstand) joins |
14:00:02 | | FireFly quits [Quit: Leaving] |
14:00:11 | | FireFly joins |
14:01:58 | <anarcat> | JAA / pabs : so i think i would do `!a https://lists.torproject.org/ --reason "one more full crawl before mm3 migration"`, then `!ignore $job /pipermail/[^/]+/(1\d{3}|20[0-1]\d|2020)` |
14:02:04 | <anarcat> | i wonder if i need the /options/ ignore |
14:02:08 | <anarcat> | and whether that makes sense, advice welcome! |
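One quick way to sanity-check the proposed `!ignore` pattern before using it is to run it against sample URLs. This sketch assumes ArchiveBot applies ignore patterns as a Python regular-expression search over the full URL, so the pattern does not need to match from the start of the URL. The sample URLs are hypothetical but follow the `/pipermail/<list>/<year>-<Month>/` layout that appears in the 2021 job's logs in this discussion.

```python
import re

# The ignore pattern proposed in the chat, quoted verbatim.
# Assumption: ignores are tested with a regex *search* against the full URL.
IGNORE = re.compile(r"/pipermail/[^/]+/(1\d{3}|20[0-1]\d|2020)")


def is_ignored(url: str) -> bool:
    """Return True if the URL would be skipped under the proposed ignore."""
    return IGNORE.search(url) is not None


# Hypothetical sample URLs in the Mailman 2 pipermail layout.
samples = [
    "https://lists.torproject.org/pipermail/tor-talk/2018-March/044000.html",
    "https://lists.torproject.org/pipermail/tor-talk/2020-December/045000.html",
    "https://lists.torproject.org/pipermail/tor-talk/2021-January/046000.html",
    "https://lists.torproject.org/pipermail/tor-talk/",
]

for url in samples:
    print("ignore" if is_ignored(url) else "fetch ", url)
```

The three alternatives cover the years 1000 through 1999, 2000 through 2019, and 2020, i.e. every monthly archive the 2021 run could have covered, while list index pages and newer months are still fetched. The pattern only helps for lists that the earlier run actually saved.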
14:11:07 | <anarcat> | i'm also trying to figure out how the previous job was run, it looks like it didn't have any ignores at all |
14:11:21 | <anarcat> | *except* there's stuff like this in the logs: 2021-11-01 16:00:12,337 - archivebot.pipeline.wpull_plugin - INFO - Ignore https://lists.torproject.org/pipermail/tor-commits/2018-March/140986.html using pattern ^https?://lists\.torproject\.org/pipermail/tor-commits/ |
14:11:31 | <anarcat> | so i wonder if tor-commits is somehow in a builtin ignore list |
14:14:02 | <pabs> | sounds like someone added ^https?://lists\.torproject\.org/pipermail/tor-commits/ as an ignore |
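The logged pattern is anchored to a single list on a single host, which fits pabs's reading that it was added by hand for that job rather than coming from a built-in ignore set. A small check using the exact pattern and URL from the 2021 log line; the second URL is a hypothetical one from another list on the same host:

```python
import re

# Pattern and URL copied verbatim from the 2021 ArchiveBot log line.
TOR_COMMITS = re.compile(r"^https?://lists\.torproject\.org/pipermail/tor-commits/")

logged_url = "https://lists.torproject.org/pipermail/tor-commits/2018-March/140986.html"
# Hypothetical URL from a different list on the same host.
other_list = "https://lists.torproject.org/pipermail/tor-talk/2018-March/044000.html"

print(bool(TOR_COMMITS.search(logged_url)))  # True
print(bool(TOR_COMMITS.search(other_list)))  # False
```

The `https?` makes it cover both schemes, and the `^` anchor means it cannot match tor-commits URLs that merely appear inside another URL's query string.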
14:14:14 | <h2ibot> | Anarcat edited Mailman/2 (-79, remove duplicate lists.tpo entries): https://wiki.archiveteam.org/?diff=53678&oldid=53667 |
14:14:40 | <anarcat> | pabs: but i can't find this in the warc logs |
14:14:53 | <anarcat> | and now i wonder if we should have a job just crawling that list |
14:15:15 | <h2ibot> | Anarcat edited Mailman/2 (+60, update last save timestamp for tpo): https://wiki.archiveteam.org/?diff=53679&oldid=53678 |
14:46:14 | | sralracer joins |
14:46:33 | | sralracer is now authenticated as sralracer |
14:51:21 | | Megame (Megame) joins |
14:54:25 | | katocala quits [Ping timeout: 260 seconds] |
14:54:50 | | katocala joins |
14:55:46 | | yonerboner quits [Quit: The Lounge - https://thelounge.chat] |
15:01:21 | | Guest54 joins |
15:07:12 | | nicolas17 joins |
15:07:18 | | nicolas17 is now authenticated as nicolas17 |
15:12:00 | <pabs> | anarcat: probably yes but without outlinks? OTOH the info is all in git |
15:12:12 | <anarcat> | there shouldn't be much outlinks in there |
15:12:21 | <anarcat> | i kind of like it gives an out of band audit log |
15:12:58 | | katocala quits [Ping timeout: 240 seconds] |
15:13:17 | | katocala joins |
15:13:25 | <pabs> | ah, I thought there would be links to git |
15:15:27 | <anarcat> | ah maybe |
15:15:28 | <anarcat> | i don't know |
15:15:44 | <vix5110_> | Hey, prolly a dumb question, but why are projects still stalled since uploads were resumed on archive.org's side ? At least this is what i understood |
15:15:47 | <anarcat> | there are links to gitlab, for sure, actually |
15:15:50 | <anarcat> | so i guess i'll ignore that as well |
15:18:37 | <DigitalDragons> | vix5110_: The Wayback Machine and the archive.org website are back online, but you still can't upload anything (or even log into the site) yet |
15:19:51 | <DigitalDragons> | Some projects are using temporary storage now, but others are still waiting for IA to come back |
15:20:38 | <nicolas17> | vix5110_: uploads were not resumed on archive.org's side |
15:24:21 | <vix5110_> | they mentioned institutional uploads and institutional web archiving |
15:24:30 | <vix5110_> | maybe that's not related at all tho |
15:25:05 | <vix5110_> | also i understand it's a lot of work for them but |
15:25:12 | <vix5110_> | they could give an ETA at least |
15:25:43 | | Megame quits [Client Quit] |
15:27:41 | <vix5110_> | also how is archivebot still active if uploads are not available |
15:28:27 | | danwellby quits [Read error: Connection reset by peer] |
15:28:36 | <pabs> | temporary storage |
15:28:54 | <vix5110_> | oh ok |
15:29:08 | <vix5110_> | thanks |
15:29:56 | | danwellby joins |
15:36:25 | | Wohlstand quits [Ping timeout: 260 seconds] |
16:07:55 | | myself quits [Ping timeout: 260 seconds] |
16:15:08 | | myself (myself) joins |
16:28:02 | | xkey quits [Quit: WeeChat 4.2.2] |
16:28:45 | | xkey (xkey) joins |
16:47:20 | | katocala is now authenticated as katocala |
17:19:38 | | Wohlstand (Wohlstand) joins |
17:28:04 | | fuzzy8021 quits [Remote host closed the connection] |
17:28:35 | | fuzzy80211 (fuzzy80211) joins |
18:28:04 | <@JAA> | anarcat, pabs: Yep, tor-commits was explicitly ignored, and it's the only onsite thing that was ignored on that 2021 run as far as I can see. |
18:35:29 | <@JAA> | We could do --no-offsite and throw those (possibly minus the Git stuff) into #// at a later time. |
18:40:38 | <anarcat> | i figured we could ignore all of tor-commits for now, and do a !ao just for tor-commits? |
18:52:12 | <@JAA> | If we do --no-offsite, no need to do anything about tor-commits. |
18:52:56 | <@JAA> | And it would also finish much faster due to not needing to do random offsite stuff that will often fail etc. |
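The follow-up JAA describes, keeping the crawl onsite and handing the offsite links to #// later minus the Git links, amounts to partitioning the collected outlinks by host. A minimal post-processing sketch, not something ArchiveBot does itself: the function name and the set of Git hosts are assumptions (anarcat mentions GitLab links, but the exact hosts to exclude would need checking), and the sample links are hypothetical.

```python
from urllib.parse import urlsplit

SITE_HOST = "lists.torproject.org"
# Assumption: hosts counted as "the Git stuff" and left out of the later
# #// submission. Adjust after looking at the actual outlinks.
GIT_HOSTS = {"gitlab.torproject.org", "gitweb.torproject.org"}


def split_outlinks(urls):
    """Return (onsite, offsite) lists; Git hosts are dropped, offsite is deduplicated."""
    onsite, offsite = [], []
    for url in urls:
        host = urlsplit(url).hostname or ""  # .hostname is already lowercased
        if host == SITE_HOST:
            onsite.append(url)
        elif host not in GIT_HOSTS:
            offsite.append(url)
    # dict.fromkeys keeps first-seen order while removing duplicates
    return onsite, list(dict.fromkeys(offsite))


if __name__ == "__main__":
    links = [
        "https://lists.torproject.org/pipermail/tor-talk/",
        "https://gitlab.torproject.org/tpo/core/tor",
        "https://www.torproject.org/",
        "https://www.torproject.org/",
    ]
    onsite, later = split_outlinks(links)
    print(onsite)
    print(later)
```

Deduplicating matters here because a mailing list archive links to the same few offsite pages from thousands of messages.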
18:54:00 | | anarcat shrugs |
18:54:03 | <anarcat> | i'd say it's your call |
18:54:10 | <anarcat> | i haven't done this in a while |
18:55:20 | <@JAA> | Since we haven't done tor-commits in the previous run, the suggested ignore won't work either. It might be fine to just regrab it all? |
18:57:25 | <@JAA> | Let's try that and assess in a few hours how it's going. :-) |
18:59:54 | <anarcat> | thanks JAA |
19:01:35 | <anarcat> | odd, it's not on the dashboard yet |
19:01:39 | <anarcat> | i guess that takes a couple minutes? |
19:01:44 | <anarcat> | or it's queued or something |
19:02:04 | <anarcat> | right, it's part of the 7 pending i guess |
19:10:24 | <@JAA> | anarcat: It's running now. |
19:11:46 | <katia> | then go catch it! |
19:11:58 | | katia left the channel (Parted) |
19:12:09 | <anarcat> | that site is kicking my browser's ass |
19:12:20 | <that_lurker> | kinky |
19:39:12 | <DigitalDragons> | that_lurker bonk++ |
19:39:12 | <eggdrop> | [karma] 'that_lurker bonk' now has 15 karma! |
20:39:45 | | Wohlstand quits [Ping timeout: 260 seconds] |
21:01:16 | <katia> | that_lurker bonk++ |
21:01:17 | <eggdrop> | [karma] 'that_lurker bonk' now has 16 karma! |
21:11:32 | | dimkauzh (dimkauzh) joins |
21:16:26 | <Barto> | anarcat++ |
21:16:27 | <eggdrop> | [karma] 'anarcat' now has 4 karma! |
21:16:32 | <Barto> | fireonlive++ |
21:16:32 | <eggdrop> | [karma] 'fireonlive' now has 695 karma! |
21:17:04 | <Barto> | thanks for doing it the best way :-) |
21:18:19 | | etnguyen03 (etnguyen03) joins |
21:21:23 | | dimkauzh leaves |
21:39:50 | | collat quits [Ping timeout: 260 seconds] |
21:51:36 | | collat joins |
21:52:36 | | Naruyoko joins |
22:04:22 | | vix5110_ quits [Client Quit] |
22:08:57 | <@JAA> | anarcat: Looks to be running well and should probably finish sometime tomorrow. |
22:10:10 | | collat quits [Ping timeout: 260 seconds] |
22:21:55 | | collat joins |
22:22:16 | | Wohlstand (Wohlstand) joins |
22:31:10 | | collat quits [Ping timeout: 260 seconds] |
22:31:54 | | collat joins |
22:33:47 | | BlueMaxima joins |
22:36:38 | | collat quits [Ping timeout: 240 seconds] |
22:36:54 | | collat joins |
22:42:15 | | collat quits [Ping timeout: 260 seconds] |
22:44:54 | <h2ibot> | Pokechu22 edited Mailman/2 (+106, /* Archived */ lists.masc.org, lists.massupt.org): https://wiki.archiveteam.org/?diff=53680&oldid=53679 |
22:51:35 | | useretail quits [Quit: Leaving] |
22:52:54 | | collat joins |
22:59:58 | | collat quits [Ping timeout: 240 seconds] |
23:12:24 | | collat joins |
23:17:50 | | collat quits [Ping timeout: 260 seconds] |
23:29:06 | | collat joins |
23:35:34 | | Dango360_ quits [Quit: Leaving] |
23:35:46 | | Dango360 (Dango360) joins |
23:38:50 | | collat quits [Ping timeout: 260 seconds] |
23:50:57 | | collat joins |
23:58:05 | | collat quits [Ping timeout: 260 seconds] |