00:07:44jacobk joins
00:10:25Eighty quits [Ping timeout: 265 seconds]
00:10:40Eighty (Eighty) joins
00:28:32<Maakuth>Hmm, how about the numerical redirects that are predictable? I suppose ArchiveBot will fill them in in time, so no need to generate them
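For illustration only, a minimal sketch of what "generating" predictable numerical redirects could look like. It assumes the forum answers a slug-less URL of the form <base><id>/ with a redirect to the full thread URL; that pattern is an assumption, not something confirmed in this log, and numeric_thread_urls is a hypothetical helper.

```python
def numeric_thread_urls(base: str, first_id: int, last_id: int) -> list[str]:
    """Build candidate slug-less thread URLs for an inclusive ID range.

    Assumption: the forum redirects <base><id>/ to the full thread URL.
    """
    return [f"{base}{thread_id}/" for thread_id in range(first_id, last_id + 1)]


if __name__ == "__main__":
    # The base URL is taken from the thread link quoted later in the log;
    # the ID range is arbitrary and only for demonstration.
    for url in numeric_thread_urls(
        "https://murobbs.muropaketti.com/threads/", 1442456, 1442458
    ):
        print(url)
```

A list like this could be handed to a crawler as seed URLs, but as noted above, a recursive crawl is likely to discover the same redirects on its own.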
00:32:31Eighty quits [Ping timeout: 265 seconds]
00:34:03Eighty (Eighty) joins
00:44:53Kenshin quits [Quit: ZNC - http://znc.in]
00:44:53drexler_ quits [Remote host closed the connection]
00:44:53yano quits [Remote host closed the connection]
00:44:55RKenshin joins
00:45:12drexler_ joins
00:45:23yano (yano) joins
00:49:55tzt quits [Ping timeout: 265 seconds]
01:11:20@arkiver quits [Remote host closed the connection]
01:11:43arkiver (arkiver) joins
01:11:43@ChanServ sets mode: +o arkiver
01:14:54DiscantX quits [Remote host closed the connection]
01:15:16DiscantX joins
01:35:29Arcorann quits [Ping timeout: 265 seconds]
02:16:35qwertyasdfuiopghjkl joins
02:29:37lukash7 quits [Ping timeout: 265 seconds]
02:44:36lukash7 joins
03:48:18MrRadar quits [Quit: Rebooting]
03:51:45Discant joins
03:51:52DiscantX quits [Remote host closed the connection]
03:51:52gazorpazorp quits [Remote host closed the connection]
03:51:52Iki quits [Remote host closed the connection]
03:52:01drexler_ quits [Remote host closed the connection]
03:52:02[42] quits [Max SendQ exceeded]
03:52:02Mateon1 quits [Remote host closed the connection]
03:52:04gazorpazorp (gazorpazorp) joins
03:52:04Iki joins
03:52:04Mateon1 joins
03:52:08drexler_ joins
03:52:16MrRadar (MrRadar) joins
03:52:29[42] (N4Y) joins
03:53:46Iki quits [Remote host closed the connection]
03:53:46[42] quits [Excess Flood]
03:53:51gazorpazorp quits [Max SendQ exceeded]
03:54:03Iki joins
03:54:08gazorpazorp (gazorpazorp) joins
03:54:45[42] (N4Y) joins
04:06:54eroc1990 quits [Client Quit]
04:07:20eroc1990 (eroc1990) joins
04:13:32wyatt8740 quits [Ping timeout: 265 seconds]
04:55:51drexler_ quits [Remote host closed the connection]
04:56:08drexler_ joins
04:58:11Iki quits [Remote host closed the connection]
04:58:11gazorpazorp quits [Remote host closed the connection]
04:58:11drexler_ quits [Remote host closed the connection]
04:58:11[42] quits [Max SendQ exceeded]
04:58:12AK quits [Quit: Ping timeout (120 seconds)]
04:58:13drexler_ joins
04:58:13gazorpazorp (gazorpazorp) joins
04:58:16Iki joins
04:58:35AK9 (AK) joins
04:59:13[42] (N4Y) joins
05:33:33<Maakuth>derivation appears to be done, at least no status visible on the archive page. but still I can click through threads and end up on "not archived" page. for example https://web.archive.org/web/20220411192221/https://murobbs.muropaketti.com/threads/maailman-kenties-kalleinta-peliae-voi-kokeilla-ilmaiseksi-avaruuden-pimeydestae-paeaesee-nauttimaan-viikon-verran.1442456/
05:34:00<Maakuth>is there some other step between derivation and snapshot visibility in wayback machine?
05:43:02<Jake>Yes, there's an additional step afterwards.
05:47:29<Sanqui>yeah the "ingestion" process into WBM can take time
05:50:58<Sanqui|m>BTW, now that the bridge is back in full operation (the failure was caused by an expired certificate, and I understand it shouldn't happen again), here's a reminder that hackint.org is also available from the Matrix network, and should you wish to use a client on this side, I offer a "Space" which groups together Archive Team main & project channels in an organized manner. https://matrix.to/#/#archive-team:matrix.org
05:52:06<Sanqui>anyways, the murobbs AT job is doing well too, only 300k remaining!
05:53:59lennier1 quits [Client Quit]
05:54:27lennier1 (lennier1) joins
06:01:46user_ (gazorpazorp) joins
06:01:50mgrytbak1 joins
06:01:51nepeat_ (nepeat) joins
06:02:03driib0835 (driib) joins
06:02:04lunik10 joins
06:02:05niku1 joins
06:02:09mrfooooo4 joins
06:02:10Ryz28 (Ryz) joins
06:02:16cpina_ joins
06:02:45Iki quits [Remote host closed the connection]
06:02:45nepeat quits [Quit: ZNC - https://znc.in]
06:02:45Matthww quits [Client Quit]
06:02:45monika quits [Quit: Ping timeout (120 seconds)]
06:02:45coderobe quits [Quit: Ping timeout (120 seconds)]
06:02:45niku quits [Client Quit]
06:02:45Craigle quits [Quit: Ping timeout (120 seconds)]
06:02:45mrfooooo quits [Quit: Ping timeout (120 seconds)]
06:02:45Mayk quits [Quit: ZNC 1.7.5+deb4 - https://znc.in]
06:02:45cpina quits [Quit: Bye!]
06:02:45Ruthalas quits [Client Quit]
06:02:45rsn quits [Quit: Ping timeout (120 seconds)]
06:02:45mgrytbak quits [Quit: Ping timeout (120 seconds)]
06:02:45drexler_ quits [Remote host closed the connection]
06:02:45driib083 quits [Client Quit]
06:02:45lukash7 quits [Client Quit]
06:02:45lunik1 quits [Quit: Ping timeout (120 seconds)]
06:02:45niku1 is now known as niku
06:02:45gazorpazorp quits [Remote host closed the connection]
06:02:45Ryz2 quits [Quit: Ping timeout (120 seconds)]
06:02:45katocala quits [Remote host closed the connection]
06:02:45dm4v quits [Client Quit]
06:02:45simon816 quits [Quit: ZNC 1.8.2 - https://znc.in]
06:02:45jspiros_ quits []
06:02:45kiskaLogBot quits [Remote host closed the connection]
06:02:45mgrytbak1 is now known as mgrytbak
06:02:45driib0835 is now known as driib083
06:02:45lunik10 is now known as lunik1
06:02:45mrfooooo4 is now known as mrfooooo
06:02:46Ryz28 is now known as Ryz2
06:02:49mikael quits [Quit: ZNC - http://znc.in]
06:02:49TappyToes quits [Quit: ZNC - https://znc.in]
06:02:49Sanqui quits [Quit: .]
06:02:49omglolbah quits [Quit: ZNC - https://znc.in]
06:02:49Soul_ quits [Quit: http://drsclan.net]
06:02:51kiskaLogBot joins
06:02:51katocala joins
06:02:52Matthww joins
06:02:52msrn_ joins
06:02:53lukash7 joins
06:02:53monika (boom) joins
06:02:57dm4v joins
06:03:00dm4v quits [Changing host]
06:03:00dm4v (dm4v) joins
06:03:00Soulflare joins
06:03:02drexler joins
06:03:02Craigle (Craigle) joins
06:03:03Sanqui joins
06:03:05coderobe (coderobe) joins
06:03:08Iki joins
06:03:13jspiros (jspiros) joins
06:04:13simon816 (simon816) joins
06:04:28omglolbah joins
06:04:45TappyToes joins
06:10:13Mayk78 joins
06:16:26<Maakuth>Sanqui: s/AT/AB/ ?
06:16:38<Sanqui>AB yes sorry
06:18:02<Maakuth>well, that one should fill in the redirects, media and outlinks, I suppose?
06:18:59<Sanqui>some of them, but I also made the outlink ignores somewhat aggressive. I do believe JAA also put outlinks through #// though. Maybe we do need a wiki page to track this work at this point XD
06:23:02<Maakuth>the threads go back a decade, so too late for many outlinks anyway
06:23:41<Maakuth>here's the successor forum btw, should we take it too? https://bbs.io-tech.fi/
06:24:33<Sanqui>yes -- and any other major (or not) finnish forums you can think of
06:24:52<Sanqui>I'm happy to manage the archivebot jobs for them if you make a list
06:25:07<Maakuth>should we go for a single wiki page?
06:25:27<Sanqui>that would work
06:26:45<Sanqui>I did a mini-project for discovering and archiving estonian forums last year lol
06:26:54<Sanqui>but I didn't put the list on the wiki :(
06:28:32<Maakuth>I registered on the wiki. should I just create another project page then?
06:29:11<Sanqui>yeah, if you make just a bullet point list I can try to put it in shape
06:29:42<Maakuth>I have some mediawiki background myself, but no idea of local policies of course
06:30:13<Sanqui>the project template is more geared towards individual sites
06:30:18<Maakuth>ok
06:30:41<Sanqui>I "maintain" a template like this myself https://wiki.archiveteam.org/index.php?title=Template:Czech_websites
06:30:57<Sanqui>but I've sorta moved onto a "move fast break things" mentality with my own jobs lol
06:36:01<Maakuth>ok, I put just the two forums there. is anonymous editing allowed there? I would like to enlist some finns to help
06:39:53<Sanqui>nope, all mediawikis I know of have problems with spam
06:40:09<Sanqui>you might as well start an etherpad or something and then copy it yourself
06:41:33<Maakuth>ok
06:52:27HackMii quits [Remote host closed the connection]
06:53:39HackMii (hacktheplanet) joins
06:54:32<Maakuth>https://wiki.archiveteam.org/index.php?title=Finnish_Web in modqueue, I've dug up some forums there and will add more
06:57:00shogchips joins
06:58:21shoghicp quits [Ping timeout: 265 seconds]
07:02:05Gereon62 quits [Ping timeout: 265 seconds]
07:03:06HackMii quits [Remote host closed the connection]
07:04:16HackMii (hacktheplanet) joins
07:13:39<Sanqui>Maakuth: cool, I'm not an admin on the wiki so I'll have to wait until your account gets approved
07:14:23Gereon62 (Gereon) joins
07:28:49<Maakuth>Sanqui: http://muistio.tieke.fi/p/suomifoorumit here's the etherpad if you want to queue something already
07:32:00HackMii quits [Remote host closed the connection]
07:33:13HackMii (hacktheplanet) joins
07:41:48wyatt8740 joins
07:54:34<Maakuth>about 70 found already :)
08:13:30Arcorann (Arcorann) joins
08:36:49<Sanqui>Maakuth: awesome, love these regional fora, collections of these are hard to get without people resources
08:51:25Webuser354 joins
09:09:34syntaxx (syntaxx) joins
09:21:43sec^nd quits [Ping timeout: 252 seconds]
09:27:36sec^nd (second) joins
10:11:01Webuser354 quits [Remote host closed the connection]
10:11:01drexler quits [Remote host closed the connection]
10:11:01jspiros quits [Client Quit]
10:11:09drexler joins
10:11:09jspiros (jspiros) joins
10:22:08<Maakuth>fun to see AB chew through suomi24 sitemaps. every item takes a dozen seconds and adds something like 40k items to remaining
10:45:28pabs quits [Killed (ing.hackint.org (Nickname regained by services))]
10:45:31pabs (pabs) joins
11:11:52@arkiver quits [Remote host closed the connection]
11:12:14arkiver (arkiver) joins
11:12:14@ChanServ sets mode: +o arkiver
12:01:45Discant quits [Ping timeout: 265 seconds]
12:16:22Zeklyn quits [Quit: %VPS Died%]
12:16:22Iki quits [Remote host closed the connection]
12:16:22drexler quits [Remote host closed the connection]
12:16:25Zeklyn_ joins
12:16:28Iki joins
12:16:41drexler joins
12:31:50sec^nd quits [Remote host closed the connection]
12:33:07sec^nd (second) joins
12:43:19katocala quits [Remote host closed the connection]
12:43:49katocala joins
12:44:17katocala leaves
12:44:29<msrn_>finnish imageboards would be a good target, ylilauta.org tries their hardest to block every archival effort.
12:47:19<Sanqui>we (archive team) don't generally dedicate our time to imageboards, they're a lot of trouble lol, but there are other groups
12:48:26msrn_ is now known as mikael
12:48:55<Sanqui>btw I noticed in 2017 Aoede started a job for keskustelu.suomi24.fi without outlinks and had to abort it ten days later probably because it was so massive haha, will be interesting
12:49:42<Maakuth>yeah, I think that's the very largest finnish forum
12:50:20<mikael>I once tried to wget some ylilauta threads, but they are doing some JavaScript checking in the background
12:51:05<Maakuth>trying to be ephemeral somehow I suppose
12:56:42<Sanqui>btw - I'm booked for a 1h research interview with New Design Congress, the people also behind Webrecorder, Browsertrix etc. I'd be interested to be briefed on Archive Team's opinion on their projects (especially from JAA), so I can make it as constructive as possible
13:04:39Minkafighter quits [Client Quit]
13:05:25Minkafighter joins
13:09:43syntaxx quits [Client Quit]
13:09:51syntaxx (syntaxx) joins
13:13:34driib083 quits [Client Quit]
13:13:49driib0835 (driib) joins
13:21:03katocala joins
13:30:00Mateon1 quits [Read error: Connection reset by peer]
13:30:23Mateon1 joins
14:08:26Arcorann quits [Ping timeout: 240 seconds]
15:17:09<@JAA>Sanqui: Since it's all based on warcio, it has significant data accuracy issues, and apparently the authors don't seem to care about that at all (given that they *intentionally* introduced mangling a while back and I had to convince them to even see it as a problem)...
15:17:28<@JAA>Nothing happened in almost a year since I reported those, so yeah.
15:19:55<pcr>https://github.com/webrecorder/warcio/issues/created_by/JustAnotherArchivist
15:20:21<@JAA>To say I'm not a fan of warcio is an understatement at this point. :-)
15:27:42lukash7 quits [Client Quit]
15:27:59lukash7 joins
15:34:15<Sanqui>JAA: right, thanks! I'll pass on the concern and let's hope something beneficial comes from it
15:34:39<Sanqui>ugh, all these issues just make me want to double down on raw packet captures. i.e. pcapng
15:34:52lennier1 quits [Client Quit]
15:35:04<Sanqui>even though parsing them with wireshark has been less-than-fun, at least you can be 100% certain the data is correct, forever
15:36:16<@JAA>Yeah, not very usable though.
15:36:20lennier1 (lennier1) joins
15:36:28<Sanqui>only a question of tooling.
15:36:47<Sanqui>and evidently, there is *still* need for new tooling
15:37:02<@JAA>Yes, some of which I'm working on. :-)
15:37:36<@JAA>And then I'll try to push for several things in the WARC spec.
15:37:49<Sanqui>the discord thesis has given me an excellent opportunity to play with pcapng files, so I'm seeing all the pain points right now. They're annoying, but fixable
15:37:51<@JAA>It's a slow-moving body.
15:38:56<Sanqui>the webrecorder group is working on a WACZ format: https://webrecorder.net/2021/01/18/wacz-format-1-0.html
15:39:05<Sanqui>I hope we don't end up in a situation with competing formats
15:40:52<Sanqui>(though as a higher-fidelity format I don't think pcapng can truly compete)
15:41:00<@JAA>That's basically just a ZIP that packs up a WARC, a CDX, and some other index files.
15:42:22<@JAA>I suppose it's slightly more convenient for a user as you only have to download one file instead of two or more, but I don't see any technical advantages.
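A minimal sketch of the "it's basically just a ZIP" point, using only Python's standard zipfile module. The member names below are made up for the demo and are not taken from the WACZ specification; list_members and build_demo_container are hypothetical helpers.

```python
import io
import zipfile


def list_members(container: bytes) -> list[str]:
    """Return the member names of a ZIP-based container, in archive order."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return zf.namelist()


def build_demo_container() -> bytes:
    """Pack a placeholder WARC and a placeholder CDX into one ZIP, in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Hypothetical layout: names and contents are stand-ins only.
        zf.writestr("archive/data.warc.gz", b"placeholder WARC bytes")
        zf.writestr("indexes/index.cdx", b"placeholder CDX bytes")
    return buf.getvalue()


if __name__ == "__main__":
    for name in list_members(build_demo_container()):
        print(name)
```

The same list_members call works on the bytes of any ZIP file, which is the sense in which such a bundle adds download convenience rather than a new archival format.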
15:44:30<Sanqui>anyway, call's scheduled for friday, so if you think of anything else to pass on to the group, lemme know, I hope to make the best of it. As I have learned during my work at the Czechoslovak game archive, cross-organization exchange of stances, procedures, and general awareness tends to be lacking, so I hope to maybe represent Archive Team and build a bridge :)
15:45:19<@JAA>arkiver may have some thoughts as well. ^^^
15:45:38<@JAA>(Start reading at 12:56.)
15:46:46<@JAA>And yeah, definitely agreed on the cross-org exchange. I've chatted a bit with people via IIPC before but am not really active there.
15:47:09<IDK>Someone told me that the Wayback Machine deletes some duplicates; does that apply to Archive Team's captures?
15:47:26<IDK>found that on wikipedia
15:47:52<@JAA>lolno
15:47:56<Sanqui>IIPC's a whole other group. That goes into my knowledge base, thank you
15:48:08<IDK>I see, citation needed
15:48:15<IDK>https://en.wikipedia.org/wiki/Wayback_Machine
15:48:33<@JAA>Yeah, the Webrecorder people are also in IIPC though, along with UK Web Archive and various others.
15:48:53<Sanqui>"Deleting duplicates" is a near-oxymoron. You can deduplicate without losing data.
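A minimal sketch of deduplicating without losing data, under the assumption that captures are simple (url, timestamp, body) tuples: each distinct body is stored once under its SHA-256 digest, and every capture is kept as a reference to that digest. This is an illustration of the general idea, not a description of how the Wayback Machine or any WARC tool implements it.

```python
import hashlib


def deduplicate(captures):
    """Store each distinct body once; keep every capture as a digest reference."""
    payloads = {}  # digest -> body, one entry per distinct payload
    index = []     # (url, timestamp, digest), one entry per capture
    for url, timestamp, body in captures:
        digest = hashlib.sha256(body).hexdigest()
        payloads.setdefault(digest, body)
        index.append((url, timestamp, digest))
    return payloads, index


def restore(payloads, index):
    """Rebuild the original capture list from the deduplicated form."""
    return [(url, ts, payloads[digest]) for url, ts, digest in index]


if __name__ == "__main__":
    # Made-up sample data for the demo.
    captures = [
        ("https://example.org/", "20220411", b"hello"),
        ("https://example.org/", "20220412", b"hello"),
        ("https://example.org/a", "20220412", b"other"),
    ]
    payloads, index = deduplicate(captures)
    print(len(payloads), "payloads stored for", len(index), "captures")
    print("lossless:", restore(payloads, index) == captures)
```

Every capture remains addressable by URL and timestamp; only the redundant copies of identical bodies are dropped.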
15:49:33<@JAA>'Only the content creator can decide where their content is published or duplicated, so the Archive would have to delete pages from its system upon request of the creator.'
15:49:46<@JAA>That's something entirely different than deduping archives.
15:49:56<Sanqui>That's copyright law.
15:50:00<@JAA>Yup
15:51:10<@JAA>And since IA is a library, they can't actually be forced to delete it, I think. But they can be forced to cease distribution, i.e. WBM exclusions and blocking access to the WARCs.
15:52:37<@JAA>(Well, they could be forced to delete things, but I'd imagine that'd have to go through courts etc.)
15:57:10<IDK>Under uses/limitations: Starting in April 2018, administrative staff members of the Wayback Machine's archive team have enforced the Quarter month rule, by occasionally deleting time intervals of 23 days or 39 days (3/4 and 5/4 of a month, respectively), to reduce the queue size.**[citation needed]**
15:57:50<IDK>I think someone thought the page they archived was gone, but it usually pops up a few days later
15:59:20Discant joins
15:59:36<@JAA>ಠ_ಠ
16:38:42Discant quits [Ping timeout: 265 seconds]
16:54:39CraftByte quits [Ping timeout: 265 seconds]
17:08:13<duce1337>archive this https://samples.vx-underground.org/samples/Blocks/
17:09:23nico_32 quits [Remote host closed the connection]
17:09:45benjins quits [Remote host closed the connection]
17:11:16benjins joins
17:11:23nico_32 (nico) joins
17:12:03mikael quits [Ping timeout: 265 seconds]
17:13:44CraftByte (DragonSec|CraftByte) joins
17:15:07<@JAA>duce1337: Reason?
17:16:23mikael joins
17:17:40<duce1337>JAA: just for sake of saving, like most things on internet
17:21:38<@JAA>Well, it's huge and essentially a mirror...
17:32:40AK9 is now known as AK
17:34:36<Maakuth>The murobbs snapshot appears to be live in wayback machine, nice 🙂
17:36:17<h2ibot>Maakuth created Finnish Web (+856, Created page with "Project to crawl through…): https://wiki.archiveteam.org/?title=Finnish%20Web
18:15:06abobik232 joins
18:15:12abobik232 quits [Remote host closed the connection]
18:51:37jacobk quits [Ping timeout: 265 seconds]
19:20:39Craigle quits [Client Quit]
19:21:06Craigle (Craigle) joins
19:21:08jacobk joins
19:29:38<h2ibot>Maakuth edited Finnish Web (+2931, added a lot of Finnish forums): https://wiki.archiveteam.org/?diff=48466&oldid=48465
19:30:45Iki quits [Remote host closed the connection]
19:30:45qwertyasdfuiopghjkl quits [Remote host closed the connection]
19:30:58Iki joins
19:32:31<Maakuth>okay Sanqui, these should get AB nice and busy for a while :)
19:42:07lennier2 joins
19:45:16lennier1 quits [Ping timeout: 265 seconds]
19:45:19lennier2 is now known as lennier1
20:34:17Iki quits [Remote host closed the connection]
20:34:17CraftByte quits [Client Quit]
20:34:17@arkiver quits [Excess Flood]
20:34:17Matthww quits [Client Quit]
20:34:17monika quits [Client Quit]
20:34:17coderobe quits [Client Quit]
20:34:17drexler quits [Remote host closed the connection]
20:34:17dm4v quits [Client Quit]
20:34:17katocala quits [Remote host closed the connection]
20:34:21dm4v joins
20:34:23CraftByte9 (DragonSec|CraftByte) joins
20:34:26drexler joins
20:34:27Iki joins
20:34:27Matthww joins
20:34:28monika (boom) joins
20:34:30dm4v quits [Changing host]
20:34:30dm4v (dm4v) joins
20:34:33katocala joins
20:34:36coderobe (coderobe) joins
20:34:50arkiver (arkiver) joins
20:34:50@ChanServ sets mode: +o arkiver
20:39:11pabs quits [Remote host closed the connection]
20:50:11qwertyasdfuiopghjkl joins
20:50:50VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
20:51:21VerifiedJ (VerifiedJ) joins
20:58:08eroc1990 quits [Client Quit]
20:58:36eroc1990 (eroc1990) joins
21:01:09jacobk quits [Ping timeout: 265 seconds]
21:13:33jacobk joins
21:32:13DiscantX joins
21:35:13Discant joins
21:36:06Matthww quits [Ping timeout: 265 seconds]
21:36:47Matthww joins
21:37:03DiscantX quits [Ping timeout: 265 seconds]
21:37:32jacobk quits [Ping timeout: 265 seconds]
21:37:46plums joins
21:37:56<plums>Hi is someone here
21:40:52plums quits [Remote host closed the connection]
21:43:35Matthww quits [Client Quit]
21:43:53Matthww joins
21:50:15Matthww quits [Client Quit]
21:50:29Matthww joins
21:55:13<h2ibot>Maakuth edited Finnish Web (+0, link formatting): https://wiki.archiveteam.org/?diff=48467&oldid=48466
21:55:14<h2ibot>Michaelblob edited WikiTeam (+265): https://wiki.archiveteam.org/?diff=48468&oldid=48326
21:57:15Matthww2 joins
21:58:46Matthww quits [Read error: Connection reset by peer]
21:58:46Matthww2 is now known as Matthww
22:01:09atphoenix_ (atphoenix) joins
22:03:59atphoenix quits [Ping timeout: 265 seconds]
22:05:05Matthww quits [Ping timeout: 265 seconds]
22:10:13Matthww joins
22:12:54Matthww quits [Client Quit]
22:13:08Matthww joins
22:36:30qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
22:38:29HackMii quits [Ping timeout: 252 seconds]
22:41:01jacobk joins
22:41:25HackMii (hacktheplanet) joins
22:46:49superkuh quits [Remote host closed the connection]
22:47:04superkuh joins
22:48:36jtagcat6 quits [Quit: Bye!]
22:48:52jtagcat6 (jtagcat) joins
22:57:14jtagcat6 quits [Client Quit]
22:57:40jtagcat6 (jtagcat) joins
23:11:38pabs (pabs) joins
23:39:50@arkiver quits [Excess Flood]
23:40:13BlueMaxima joins
23:40:14arkiver (arkiver) joins
23:40:14@ChanServ sets mode: +o arkiver