00:05:43Arcorann (Arcorann) joins
00:14:49DogsRNice_ joins
00:17:50DogsRNice quits [Ping timeout: 240 seconds]
00:40:02jasons (jasons) joins
00:46:36qwertyasdfuiopghjkl quits [Remote host closed the connection]
01:02:45DogsRNice__ joins
01:04:24qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
01:06:47DogsRNice_ quits [Ping timeout: 272 seconds]
01:19:28Mateon2 joins
01:21:21Mateon1 quits [Ping timeout: 272 seconds]
01:21:21Mateon2 is now known as Mateon1
01:35:55jasons quits [Ping timeout: 272 seconds]
01:48:54<h2ibot>Pokechu22 edited Deathwatch (+122, /* 2024 */ virusradar.com (AB job in progress…): https://wiki.archiveteam.org/?diff=51577&oldid=51576
02:39:18jasons (jasons) joins
02:45:35Naruyoko5 quits [Ping timeout: 272 seconds]
02:49:23<@JAA>The SAP Q&A/blog migration is complete. It looks like answers.sap.com started redirecting to community.sap.com at around 2024-01-25 21:35 (or at least that's when my 403s stopped and I got 301s instead).
02:50:41<@JAA>They have redirects in place, but not for all URLs that were valid previously.
02:53:36<fireonlive>:/
03:08:14Naruyoko joins
03:18:15fuzzy8021 quits [Read error: Connection reset by peer]
03:20:32fuzzy8021 (fuzzy8021) joins
03:21:10fuzzy8021 quits [Read error: Connection reset by peer]
03:21:36fuzzy8021 (fuzzy8021) joins
03:37:20jasons quits [Ping timeout: 240 seconds]
04:40:46jasons (jasons) joins
04:59:36DogsRNice__ quits [Read error: Connection reset by peer]
05:16:45BlueMaxima quits [Client Quit]
05:37:20jasons quits [Ping timeout: 240 seconds]
06:07:50<h2ibot>Pokechu22 edited Rumble (+743, a bit of info on embeds and regular videos): https://wiki.archiveteam.org/?diff=51578&oldid=50648
06:38:26cas joins
06:39:44<cas>https://www.reddit.com/r/DanmeiNovels/comments/19eld53/bilibili_comics_international_app_is_shutting/ I got words that bilibili comics is shutting down soon on Feb 29 2024
06:40:04<cas>I wonder if AT can afford to save and archive its content
06:40:48jasons (jasons) joins
06:42:45<cas>https://www.reddit.com/r/Piracy/comments/19d8py9/a_bulgarian_videosharing_platform_is_going_to/ there's also this bulgarian videosharing platform that's going to delete everything it has in a month too. Here's the link to it. https://www.vbox7.com/ I recall I sent messages about this already tho
06:43:31<cas>Ok I see there's an article about the site, disregard my message about vbox7
07:25:24cas quits [Remote host closed the connection]
07:27:44parfait (kdqep) joins
07:36:50jasons quits [Ping timeout: 240 seconds]
07:42:09<h2ibot>Pokechu22 edited Rumble (+64): https://wiki.archiveteam.org/?diff=51579&oldid=51578
07:56:33Ruthalas59 quits [Ping timeout: 272 seconds]
08:12:24qwertyasdfuiopghjkl quits [Remote host closed the connection]
08:40:25jasons (jasons) joins
09:04:29Ruthalas59 (Ruthalas) joins
09:35:50jasons quits [Ping timeout: 240 seconds]
10:00:04Bleo18260 quits [Client Quit]
10:01:22Bleo18260 joins
10:39:20jasons (jasons) joins
10:42:23qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
11:06:50programmerq quits [Ping timeout: 240 seconds]
11:41:23jasons quits [Ping timeout: 272 seconds]
11:50:47driib quits [Quit: The Lounge - https://thelounge.chat]
11:53:48murmur quits [Read error: Connection reset by peer]
11:53:50murmur joins
12:02:57programmerq (programmerq) joins
12:09:48driib (driib) joins
12:21:49driib quits [Client Quit]
12:22:52driib (driib) joins
12:44:07jasons (jasons) joins
13:09:50Arcorann quits [Ping timeout: 240 seconds]
13:12:56h3ndr1k quits [Client Quit]
13:14:56h3ndr1k (h3ndr1k) joins
13:15:18<h2ibot>Bzc6p edited Vbox7 (+5, Status: this is more like a special case): https://wiki.archiveteam.org/?diff=51580&oldid=51564
13:23:07Megame (Megame) joins
13:44:53jasons quits [Ping timeout: 272 seconds]
14:21:12Inti83 joins
14:21:13<eggdrop>[tell] Inti83: [2023-12-06T07:04:23Z] <thuban> all of the sites on the argentina wiki page have been submitted to archivebot; you can monitor running jobs at http://archivebot.com/ and retrieve finished ones from https://archive.fart.website/archivebot/viewer/
14:21:14<eggdrop>[tell] Inti83: [2023-12-06T07:04:29Z] <thuban> note that a job succeeding does not necessarily mean the site was adequately captured (if eg there is heavy use of javascript)
14:22:26<Inti83>Hi there I am back just to let you know Educ.ar was taken down. I added the youtube channel to the wiki, just in case they take that down as well. Unfortunately the archived site seems to be broken, probably because of heavy use of JS :/
14:23:43<Inti83>We downloaded quite a few sites with grab-site, but downloaders are a bit worried about the metadata giving away their identities. Do you know if there is a way of parsing warcs to change, for instance, the home directory? Without corrupting the integrity of the warc?
14:24:15Shjosan quits [Client Quit]
14:24:31Shjosan (Shjosan) joins
14:33:25<thuban>Inti83: hello! looks like your last edit is still in the moderation queue, but if you join #down-the-tube you can queue the youtube channel yourself
14:35:23<thuban>as for your question, are you concerned specifically about the directory paths in the warcinfo?
14:44:04<yzqzss>Hi,
14:44:04<yzqzss>We're archiving a Chinese painting app called 画吧 (huabar). It will be shut down on 2024-02-08.
14:44:19<yzqzss>It has a total of ~19,000,000 valid painting ids (noteid). The project files and images of the paintings add up to 10-13 TiB. We have downloaded 70% of them, and the rest will be done in 3 days.
14:44:38<yzqzss>We are considering uploading the archive to IA. This is technically possible, but we are not sure whether 10 TiB+ of data is acceptable for IA. Any experience/suggestions?
14:45:14<yzqzss>https://wiki.saveweb.org/画吧
14:45:14<yzqzss>https://wiki.saveweb.org/en:画吧
14:48:06jasons (jasons) joins
14:57:25inti8365 joins
14:57:35inti8365 quits [Remote host closed the connection]
14:58:24<Inti83>@thub
14:58:45<Inti83>thuban: hi, sorry am not used to irc
14:59:08<Inti83>Yes, I traced down the date of the warc the bot created and then looked up that date
14:59:14<Inti83>in archive.org
14:59:19<Inti83>I'm not sure if that's how it works
14:59:28<Inti83>I didn't download the warcs
14:59:32Iki1 joins
15:00:00<Inti83>But most of the front-page links are broken. I should look into the actual warc that the bot created, perhaps try it out on replay.page
15:01:50<yzqzss>edit: it's a painting app from China, not "Chinese painting" app
15:03:25Iki quits [Ping timeout: 272 seconds]
15:04:57<thuban>Inti83: the job generated a lot of data (https://archive.fart.website/archivebot/viewer/job/2023120605514614pkg), so hopefully the relevant pages have been saved even if the front-page links don't work; you may be able to find them through the menus
15:06:16<Inti83>thuban: cool, yeah I saw all the files, will check thanks!
15:06:20Megame quits [Ping timeout: 240 seconds]
15:07:03qwertyasdfuiopghjkl quits [Remote host closed the connection]
15:10:25qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:10:26qwertyasdfuiopghjkl quits [Excess Flood]
15:11:45qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:18:45<h2ibot>Switchnode edited Deathwatch (+412, /* 2024 */ add huabar): https://wiki.archiveteam.org/?diff=51581&oldid=51577
15:26:00<thuban>yzqzss: nice work! your question is probably more appropriate for #internetarchive, but my _guess_ is that if you contact them in advance it will be fine
15:26:46<thuban>(a lot of archiveteam projects are around that size; some are much bigger)
15:31:02Inti83 quits [Ping timeout: 265 seconds]
15:47:45jasons quits [Ping timeout: 272 seconds]
15:55:14<yzqzss>thuban: OK, I'll contact info@IA.
16:03:25<@JAA>yzqzss: Talk to arkiver about it! :-)
16:05:34<@JAA>!tell Inti83 There is no good tooling for editing WARCs (warcio might look like it, but stay away from that!), and attempting to do so is generally strongly recommended against precisely because it's so easy to corrupt something. Changing the warcinfo record would require rewriting the entire WARC(s).
16:05:35<eggdrop>[tell] ok, I'll tell Inti83 when they join next
16:06:47<thuban>JAA: so you're telling me i shouldn't pop it open in my text editor? :)
16:07:38<@JAA>thuban: Might still be a better option than warcio. :-)
16:08:25<thuban>probably is, considering the header-rewriting stuff
16:08:38<yzqzss>JAA: yeap, I PMed arkiver days ago, but he was traveling (https://irclogs.archivete.am/archiveteam-bs/2024-01-26#lf158747a) and I didn't receive a reply. lol.
16:08:50<@JAA>Assuming your editor doesn't try to convert to UTF-8 or similar things, at least, yeah.
16:09:25<@JAA>yzqzss: Right, yeah, just try again. :-) You'll have more success than with info@ in my experience.
16:15:51<thuban>in all seriousness i did almost suggest that, with appropriate limitations and caveats. but i didn't want to jump the gun when i wasn't entirely clear on what was being asked...
16:16:15<thuban>(did you know about http://purl.org/dc/terms/provenance ? :o)
16:45:10<@arkiver>yzqzss: i missed your message indeed
16:50:44jasons (jasons) joins
16:53:32<@arkiver>(it was through discord)
16:54:06<@arkiver>just as a note to everyone - the most reliable way to reach me is at arkiver@protonmail.com , or IRC. if I don't reply on something like Discord, please send me an email instead
17:00:31Inti83 joins
17:00:32<eggdrop>[tell] Inti83: [2024-01-27T16:05:34Z] <JAA> There is no good tooling for editing WARCs (warcio might look like it, but stay away from that!), and attempting to do so is generally strongly recommended against precisely because it's so easy to corrupt something. Changing the warcinfo record would require rewriting the entire WARC(s).
17:11:15Megame (Megame) joins
17:13:26bladem quits [Quit: Leaving]
17:38:11EmeraldSnorlax|m is now known as rain|m
17:42:43<Inti83>eggdrop: if a site is now archived in archive.org, how can we use the warcs? Can we download them to use them or process them? Like with warc2html if needed? Is there any way to navigate the warc files?
17:43:02Shjosan quits [Client Quit]
17:43:32Shjosan (Shjosan) joins
17:50:21<Inti83>Another question: if the bot creates this warc, www.educ.ar-inf-20231206-055146-14pkg-00000.warc.gz, does it correspond to an entry for 6 December at 05:51 in archive.org?
17:52:09<nulldata>Inti83 - eggdrop is a bot relaying a message from JAA made while you were gone
17:52:38<Inti83>Ahh thanks for letting me know nulldata
18:04:09<@JAA>thuban: Yeah, there are lots of issues with the approach (e.g. also line endings), so I would never recommend it, but it can work. Compression is a different beast though.
18:05:18<thuban>mmm
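[Editor's sketch] The warnings above about hand-editing a WARC can be illustrated with a toy record. This is only a sketch under stated assumptions: the record below is a simplified, uncompressed stand-in for a real warcinfo record, the `example-path` field and the `/home/alice` path are hypothetical, and `build_record` and `length_is_consistent` are helper names invented for this example. It shows that a replacement of a different length invalidates the declared Content-Length, and that normalising CRLF to LF destroys the header/body separator.

```python
def build_record(body: bytes) -> bytes:
    """Build a toy WARC-style record: CRLF header lines, blank line, body, CRLF CRLF."""
    head = (
        b"WARC/1.0\r\n"
        b"WARC-Type: warcinfo\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    )
    return head + b"\r\n" + body + b"\r\n\r\n"


def length_is_consistent(record: bytes) -> bool:
    """Check that the declared Content-Length matches where the body actually ends."""
    head, sep, rest = record.partition(b"\r\n\r\n")
    if not sep:
        return False  # no CRLF CRLF separator, e.g. after line-ending conversion
    declared = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
    if declared is None:
        return False
    # After exactly `declared` body bytes, only the record terminator may remain.
    return rest[declared:] == b"\r\n\r\n"


# Hypothetical field and path, used purely for illustration.
original = build_record(b"example-path: /home/alice/grabs\r\n")

shorter = original.replace(b"/home/alice", b"/home/x")       # length changes
same_len = original.replace(b"/home/alice", b"/home/xxxxx")  # length preserved
lf_only = original.replace(b"\r\n", b"\n")                   # editor "fixed" line endings

if __name__ == "__main__":
    for label, rec in [("original", original), ("shorter", shorter),
                       ("same length", same_len), ("LF only", lf_only)]:
        print(label, length_is_consistent(rec))
```

Even the same-length case only holds for an uncompressed file. Real records may also carry digest headers, and .warc.gz files are compressed, so any edit there means decompressing and recompressing, which fits the point that changing the warcinfo record means rewriting the WARC.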
18:06:06<@JAA>Inti83: You can download ArchiveBot WARCs, sure. I'm not familiar with the quirks of the tooling to dump the contents into static files or similar, except that it's a bit of a pain. Personally, when I've needed WARC playback locally, I've used pywb in the past, which worked well enough (although it inherits warcio's problems, but if you don't care too much about header accuracy, it's acceptable for
18:06:12<@JAA>playback).
18:06:57<@JAA>AB job data is spread over items in the collection, but you can find all the files for a job through the AB viewer.
18:07:15<@JAA>In this case: https://archive.fart.website/archivebot/viewer/job/2023120605514614pkg
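[Editor's sketch] The filename Inti83 quoted and the viewer job ID in the link above appear to share the same date, time, and suffix. The snippet below is a rough sketch of that relationship, not a documented ArchiveBot format: the pattern is inferred from the single filename in this log, `parse_ab_warc_name` is a name made up for this example, and treating the timestamp as UTC is an assumption.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from one filename in this log (assumption); other
# ArchiveBot filenames may not follow it exactly.
_AB_NAME = re.compile(
    r"^(?P<site>.+?)-(?P<kind>[a-z]+)-(?P<date>\d{8})-(?P<time>\d{6})"
    r"-(?P<suffix>[0-9a-z]+)-(?P<seq>\d{5})\.warc\.gz$"
)


def parse_ab_warc_name(name: str) -> dict:
    """Split an ArchiveBot-style WARC filename into its apparent parts."""
    m = _AB_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised filename: {name!r}")
    started = datetime.strptime(
        m["date"] + m["time"], "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)  # UTC is an assumption
    return {
        "site": m["site"],
        "started": started,
        # Date + time + suffix, which matches the ID in the viewer URL above.
        "viewer_job_id": m["date"] + m["time"] + m["suffix"],
        "sequence": int(m["seq"]),
    }


if __name__ == "__main__":
    info = parse_ab_warc_name("www.educ.ar-inf-20231206-055146-14pkg-00000.warc.gz")
    print(info["viewer_job_id"], info["started"].isoformat())
```

The timestamp in the name is presumably when the job started, so individual page captures would carry their own, later timestamps rather than exactly 05:51.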
18:11:27<thuban>the quickest way to play back a warc is https://replayweb.page/; i believe unar/"The Unarchiver" can extract files if you need to do that for some reason
18:14:07<@JAA>Will it rewrite the href and src attributes to match the structure after extraction though?
18:15:57Megame quits [Ping timeout: 272 seconds]
18:17:26Megame (Megame) joins
18:18:10<thuban>no, it extracts files.
18:18:22Megame quits [Remote host closed the connection]
18:19:39Megame (Megame) joins
18:21:37<@JAA>Yeah, as expected.
18:21:54<@JAA>So it's virtually useless for playback unless it's all media or similar.
18:22:12<thuban>that's why i didn't recommend it for playback!
18:22:36<nicolas17>Inti83: WARCs that were archived using archivebot are available on wayback machine, so it's rare that you need to download them and extract them yourself
18:22:45<@JAA>The entire WARC ecosystem is so awkward to work with...
18:25:36Shjosan quits [Client Quit]
18:25:52Shjosan (Shjosan) joins
18:29:14Inti83 quits [Remote host closed the connection]
18:34:12katia_ (katia) joins
18:34:49katia_ quits [Remote host closed the connection]
18:45:12katia_ (katia) joins
18:46:22Megame quits [Remote host closed the connection]
18:46:44Megame (Megame) joins
19:01:07decky_e_ quits [Read error: Connection reset by peer]
19:01:10katia_ quits [Remote host closed the connection]
19:01:27decky_e_ joins
19:02:34decky joins
19:03:22nyany_ quits [Quit: (516): and then you went into taco bell without pants...and surprisingly you weren't the only one there without pants]
19:03:32nyany (nyany) joins
19:05:50decky_e_ quits [Ping timeout: 240 seconds]
19:09:25itachi1706 quits [Quit: Bye :P]
19:11:15itachi1706 (itachi1706) joins
19:26:34Wohlstand (Wohlstand) joins
19:35:42Megame quits [Client Quit]
19:43:36alpine joins
19:44:50jasons quits [Ping timeout: 240 seconds]
19:55:39alpine quits [Remote host closed the connection]
19:59:09Shrinks99 joins
19:59:34<Shrinks99>Gah, apologies, maybe there's a few other links on the wiki that should be updated to this channel :P
19:59:44Alyssa joins
19:59:50<Alyssa>BS = bullshit?
20:00:17<Alyssa>Can we substitute "probabilistic epistemology"?
20:01:54<thuban>we've discussed changing the channel names. but perhaps it _would_ make sense to change that front page link...
20:02:14<Alyssa>Hi, thuban :)
20:02:28<Alyssa>I wanna help you guys.
20:02:35<Alyssa>I have a few ideas.
20:02:54<Alyssa>Wanna wget -mb some stuff with me?
20:03:28<Alyssa>Btw, I know this was started in January 2009 by SketchCow.
20:03:48<Alyssa>I think we can really make the Internet Archive famous
20:04:01<Alyssa>It's the only reliable way to establish intellectual property.
20:04:27<@JAA>Either 'bullshit' or 'bikeshed', depending on who you ask.
20:04:40<Alyssa>JAA: Hi *-*
20:04:54<@JAA>It used to be the offtopic channel, nowadays it's the archival discussion channel, and -ot is for offtopic.
20:05:03<Alyssa>Ok.
20:05:33<Alyssa>Basically, I think we should archive music that's in danger of being deleted off of youtube.
20:05:43<Alyssa>The problem, of course, is copyright law
20:05:58<Alyssa>And despite my autodidactic BA in legalese
20:06:04JC|m leaves
20:06:06<Alyssa>I can't really find the right way to do it.
20:06:11<thuban>Shrinks99: we don't have any such rules that i'm aware of, although perhaps people have Opinions. what did you want to update?
20:06:43<Shrinks99>Better description for ReplayWebpage, docs link, contributor count update
20:07:17<thuban>go right ahead, imho
20:07:21<Shrinks99>Would also want to add ArchiveWeb.page, our extension for interactive archiving of pages in Chrome
20:07:22<Alyssa>https://www.youtube.com/watch?v=5EZRA-KQx58
20:08:18<@JAA>Yeah, I'm curious if that can actually comply with the WARC spec, but I haven't taken a close look at it yet.
20:08:30<Shrinks99>Both of the above support WARCs, Would maybe add an entry for Browsertrix Crawler — though I'm not sure if it actually supports WARC files directly (you can always extract them from the WACZs)
20:08:33<@JAA>If it uses the APIs rather than MITM proxying, I don't think it can.
20:09:11<nicolas17>Alyssa: archive.org is not immune to the DMCA, and big recording labels like UMG *will* send takedown requests
20:10:03<@JAA>Feel free to add it though. Probably leave the 'recommendation' column as a question mark unless you're familiar with our position on WARC spec compliance etc.
20:10:15<Shrinks99>AFAIK ArchiveWeb.page is spec compliant? But as our UX designer admittedly that's not the area I have the most knowledge about heh
20:10:22<Shrinks99>Yeah, will leave the recommendation column to you
20:10:46<@JAA>The big thing is that WARC needs to preserve the data exactly as it is sent by the server.
20:11:12<@JAA>If you use browser APIs to get the response headers and then combine that back into lines, that may not be what the server sent.
20:11:17<Alyssa>nicolas17: Good to know ;)
20:11:19<@JAA>So anything that does that does not comply with the spec.
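[Editor's sketch] JAA's point about rebuilding header lines from parsed values can be shown with a toy round trip. This is a minimal illustration, not how any particular browser API or archiving tool behaves: the raw header block is made up, and `parse_like_an_api` and `rebuild` are hypothetical helpers that mimic common normalisations (lowercased names, trimmed whitespace, merged duplicate fields).

```python
import hashlib

# Made-up raw response header block, byte for byte as a server might send it.
RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"content-type:  text/html\r\n"
    b"X-Example: a\r\n"
    b"X-Example: b\r\n"
    b"\r\n"
)


def parse_like_an_api(raw: bytes):
    """Mimic a header API: lowercase names, trim values, merge duplicates."""
    lines = raw.split(b"\r\n")
    status = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # blank line ends the header block
        name, _, value = line.partition(b":")
        key = name.strip().lower()
        value = value.strip()
        if key in headers:
            headers[key] = headers[key] + b", " + value
        else:
            headers[key] = value
    return status, headers


def rebuild(status: bytes, headers: dict) -> bytes:
    """Join the parsed fields back into header lines, as a tool might."""
    out = [status] + [k.title() + b": " + v for k, v in headers.items()]
    return b"\r\n".join(out) + b"\r\n\r\n"


rebuilt = rebuild(*parse_like_an_api(RAW))

if __name__ == "__main__":
    print(rebuilt == RAW)
    print(hashlib.sha1(RAW).hexdigest())
    print(hashlib.sha1(rebuilt).hexdigest())
```

The rebuilt block carries the same information for most purposes, but casing, spacing, and the duplicated field no longer match the original bytes, so a record written from it would not be what the server sent.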
20:11:33<Shrinks99>Yeah, unsure
20:11:51<@JAA>warcio has similar problems, as you may have seen mentioned on that page.
20:12:30<Shrinks99>I'm guessing WARCIT is also out of the question then haha
20:13:01<@JAA>Oh, that thing, yes, definitely.
20:13:30<@JAA>But putting it on the page is still useful so we can put a big red warning next to it. :-)
20:13:47<Shrinks99>Fair enough ;P
20:14:14<Shrinks99>FWIW I wouldn't recommend it RN either, I don't think it even writes WARC records correctly?
20:17:45<h2ibot>Inti83 edited Argentina (+175, Add Educ.ar youtube channel): https://wiki.archiveteam.org/?diff=51587&oldid=51435
20:17:46<h2ibot>CreaZyp154 edited URLTeam/Warrior (+249, Added note for is.gd and v.gd erroring on URL…): https://wiki.archiveteam.org/?diff=51582&oldid=50413
20:17:47<h2ibot>CreaZyp154 edited URLTeam/Dead (+149, Fixed example links for zud.me and checked and…): https://wiki.archiveteam.org/?diff=51583&oldid=51338
20:17:48<h2ibot>CreaZyp154 edited URLTeam (+998, Added a few shorteners): https://wiki.archiveteam.org/?diff=51584&oldid=51492
20:17:49<h2ibot>CreaZyp154 edited List of website hosts (+179, Added Free Web Hosting Area): https://wiki.archiveteam.org/?diff=51585&oldid=51565
20:17:50<h2ibot>CreaZyp154 edited List of websites excluded from the Wayback Machine/Partial exclusions (+99, …): https://wiki.archiveteam.org/?diff=51586&oldid=51538
20:20:45<thuban>JAA: i'm not sure whether it can, but it definitely doesn't https://github.com/webrecorder/archiveweb.page/blob/main/src/requestresponseinfo.js
20:20:49<@JAA>Essentially, feel free to add any software that interacts with WARCs to that page. I'd recommend keeping the description neutral, but otherwise, anything should be there.
20:22:12<@JAA>thuban: Yeah, that's about what I expected. The outcome from the whole crocoite/chromebot debacle was that it's impossible to do inside the browser. That's likely also why brozzler uses warcprox.
20:22:59<@JAA>I doubt anything changed in the browser APIs in the past couple years to allow raw HTTP data access.
20:23:10<@JAA>There's also the part where WARC can only store HTTP/1.1, not HTTP/2.
20:23:36<thuban>life sure is complicated
20:23:54<@JAA>And some webrecorder software writes fake HTTP/1.1 records to get around that IIRC.
20:24:17<@JAA>(Which, naturally violates the spec.)
20:24:38<Shrinks99>Wouldn't be surprised, I know in the past Ilya has tried to push for updates to the WARC spec and run into a brick wall to the point where we just made our own file format that builds upon WARCs
20:25:15<fireonlive>life bad :(
20:25:22<@JAA>Sounds about right, and he's also not very concerned about data integrity from the discussions I've had with him on GitHub issues.
20:26:09<Shrinks99>I wouldn't say "not concerned"... There's a reason we have a whole spec for cryptographic signing of archives ;)
20:26:10<@JAA>I'm not sure warcio was *ever* compliant with the spec.
20:26:26<@JAA>But it certainly hasn't been for years now.
20:27:30<@JAA>https://github.com/webrecorder/warcio/issues/128 and https://github.com/webrecorder/warcio/issues/129 immediately break that. The former has been in the code since at least 2018.
20:28:00inedia quits [Quit: WeeChat 4.1.2]
20:28:20<@JAA>And Ilya's replies in 128 make it quite clear what his stance is, I'd say.
20:28:43DogsRNice joins
20:29:36<@JAA>To be clear, I'm referring to the contents of the WARC, not the integrity of the WARC after capture.
20:29:48<@JAA>The latter is irrelevant if the former is broken, in my opinion.
20:29:53<thuban>he did agree with you in the end, didn't he? it's just that it's not, you know, actually been changed
20:30:12<@JAA>He did, but it required a lot more convincing than it should have, and yeah, still unfixed.
20:30:37<Shrinks99>I can maybe offer some insight as to why it's still unfixed which is that warcio hasn't been the priority for us for a while :P
20:30:41<@JAA>(I believe I also discussed this elsewhere with him at the time.)
20:31:16inedia (inedia) joins
20:31:24<@JAA>Right, it's been more about accessibility?
20:31:36<@JAA>Or user friendliness, or whatever you want to call it.
20:31:44<Shrinks99>Wasn't around in 2021 so I don't have first-hand insights, but we'd probably be open to a PR? ...Though there's plenty in the repo that are newer and also unanswered :\
20:32:21<Shrinks99>RN the priority is "high fidelity capture" which — putting aside spec compliance — browsers appear to be much better at
20:33:10<Shrinks99>And of course on my side, yeah, making web crawling & archiving tools more accessible with better UX
20:34:07<@JAA>Yeah, that's exactly the thing though. 'Putting aside spec compliance' is not something that should ever cross the mind when working on archival. Spec compliance and integrity is paramount to preservation, otherwise you can't be sure whether what is archived is correct.
20:34:37<thuban>yes, interesting notion of "fidelity", although of course i see what you mean :P
20:34:56<Shrinks99>Well... I might argue that you can't ensure that either way because not everything sent to browsers over HTTP is content-addressed / signed
20:35:00<@JAA>It's great to work on those things. But it has to happen within the restrictions of correctly preserving the data, too.
20:35:14<Shrinks99>But I hear your concerns
20:35:19<@JAA>True, but no such issue with HTTPS, mostly.
20:35:37<@JAA>Unfortunately, you can never prove that the data wasn't modified by the crawler, of course.
20:35:45<Shrinks99>Yeah, that's the big problem
20:36:00<Shrinks99>and until that's in the spec, IMO, the best you can do is give a good paper trail of who created the archive
20:36:16<@JAA>It can't be in the spec, because it's impossible with current technology.
20:36:23<@JAA>TLS would have to be redesigned for it.
20:36:27<Shrinks99>yes, the TLS spec
20:36:36<@JAA>And that's never going to happen because it's not a design goal of TLS.
20:36:50<Shrinks99>Yep!
20:37:05<thuban>...and make sure that the people who created the archive have a reputation for being real sticklers about correctness!
20:37:22<@JAA>And even if it were, you still can't guarantee the data wasn't created later, after the key was leaked or similar.
20:38:00<Shrinks99>heh, well, that's the thing with providing the record of who created it — the viewer gets to judge if the archive is any good or not based on if they trust the software & user who created it
20:38:17<Shrinks99>So if Webrecorder's aren't good enough for you, you can make that informed decision!
20:38:29<@JAA>Indeed
20:39:11<@JAA>Almost all WARC software writes a warcinfo record with the relevant details. :-)
20:39:57<Shrinks99>*almost* https://github.com/webrecorder/browsertrix-crawler/issues/452
20:40:00<Shrinks99>:P
20:40:22<@JAA>lol
20:40:23<thuban>:(
20:40:56@JAA is not surprised.
20:44:46<Shrinks99>IDK if that issue is actually correct, I'm pretty sure Browsertrix supports writing warcinfo records
20:45:11<Shrinks99>ah but maybe not if there's multiple warcs
20:48:37jasons (jasons) joins
20:48:50<Shrinks99>Okay well, in my wiki edit for ArchiveWebpage, I'm noting (in yellow) that "Because ArchiveWeb.page intercepts the browser's network requests, it may not write fully spec-compliant WARC files."
20:49:39<Shrinks99>Because I don't want to leave this here and not elaborate in case it doesn't get further updates, but also edit later if that's not accurate enough for ya?
20:50:00<@JAA>I'll give it a close look later and edit as necessary. :-)
20:50:09<Shrinks99>Great, TY :)
20:50:33<Shrinks99>Ah it gets sent to review
20:50:43<Shrinks99>well I suppose everything will get sorted out then!
20:50:46<@JAA>Yeah, because spam.
20:51:04<@JAA>Not a content review (except in extreme cases).
20:51:52<h2ibot>Shrinks99 edited The WARC Ecosystem (+466, Added up to date data about ReplayWebpage,…): https://wiki.archiveteam.org/?diff=51588&oldid=51519
20:52:07line quits [Remote host closed the connection]
20:52:09<nicolas17>who wants to pay me to make a pcap to warc conversion tool? :p
20:52:26line joins
20:54:41<@JAA>> who wants ... me to make a pcap to warc conversion tool?
20:54:43<@JAA>Yes please!
20:54:44<@JAA>:-P
20:55:20<fireonlive>:P
20:55:51line quits [Remote host closed the connection]
20:55:53<h2ibot>Shrinks99 edited The WARC Ecosystem (+3, Fixes formatting, also updates PYWB link): https://wiki.archiveteam.org/?diff=51589&oldid=51588
20:56:10<Shrinks99>(I fucked up the table formatting oof)
20:56:44<TheTechRobo>Ugh, my internet being spotty caused me to fuck it up again. lol
20:56:53<h2ibot>TheTechRobo edited The WARC Ecosystem (+1, fix ArchiveWeb.page being on the same line as…): https://wiki.archiveteam.org/?diff=51590&oldid=51589
20:56:59<TheTechRobo>lol
20:57:33<Shrinks99>Forgot the pipe characters
20:58:23line joins
20:58:24line quits [Remote host closed the connection]
20:58:54<h2ibot>TheTechRobo edited The WARC Ecosystem (+1, Really fix formatting): https://wiki.archiveteam.org/?diff=51591&oldid=51590
20:59:16<nicolas17>Preview button
20:59:54<h2ibot>JustAnotherArchivist changed the user rights of User:Shrinks99
20:59:55<h2ibot>JustAnotherArchivist changed the user rights of User:CreaZyp154
21:00:26<TheTechRobo>Yeah, I did preview my edit first, but I think S.hrinks99 did an edit at the same time as me causing weirdness, then it took awhile for the edit to submit
21:00:49<Shrinks99>You're not going to like this but I also submitted a change that added the pipe :P
21:00:54<h2ibot>JustAnotherArchivist changed the user rights of User:Inti83
21:00:55<Shrinks99>Either way, should be sorted now
21:00:55line joins
21:01:31<TheTechRobo>Hahaha I should have left it up to you :P
21:01:37line quits [Remote host closed the connection]
21:01:49<TheTechRobo>s/up to/for/
21:02:33line joins
21:03:51line quits [Remote host closed the connection]
21:05:22line joins
21:06:30line quits [Remote host closed the connection]
21:06:55line joins
21:08:35<Shrinks99>Alrighty well, signing off for now, @JAA I'll pass along your thoughts and tag these issues but can't promise fixes any time soon — I think we're open to PRs tho!
21:09:40<fireonlive>Shrinks99: thanks for stopping by! hope to see you back again
21:09:45line quits [Remote host closed the connection]
21:09:50<Shrinks99><3
21:09:54Shrinks99 quits [Client Quit]
21:10:12<fireonlive>i was about to say we’re in the same open to PR pickle but too late haha
21:10:41line joins
21:12:57<h2ibot>TheTechRobo edited Strawpoll.me (+77, Add comments URL): https://wiki.archiveteam.org/?diff=51592&oldid=51408
21:13:04<TheTechRobo>Two questions:
21:14:25<TheTechRobo>1. Should the wiki page on the wiki's Twitch page be updated to #burnthetwitch, or is that channel just for my bot?
21:14:32<TheTechRobo>(I forgot my second question, lol)
21:16:10line quits [Remote host closed the connection]
21:17:01line joins
21:19:58<h2ibot>TheTechRobo edited Twitch.tv (+27, Add link to Archives section in infobox): https://wiki.archiveteam.org/?diff=51593&oldid=51474
21:22:26line quits [Remote host closed the connection]
21:22:46line joins
21:24:38line quits [Remote host closed the connection]
21:25:48line joins
21:28:31line quits [Remote host closed the connection]
21:29:01line joins
21:34:36line quits [Remote host closed the connection]
21:34:56line joins
21:35:08<pokechu22>line: please fix your connection when you get a chance
21:42:29BlueMaxima joins
21:45:20jasons quits [Ping timeout: 240 seconds]
21:46:43f_ (funderscore) joins
21:59:42line quits [Remote host closed the connection]
22:00:03line joins
22:05:29line quits [Remote host closed the connection]
22:05:50line joins
22:11:38<line>pokechu22: yes, reconfigured router, sorry. fixed now
22:12:10<fireonlive>:)
22:13:12Arcorann (Arcorann) joins
22:15:21tertu2 quits [Ping timeout: 272 seconds]
22:17:11tertu (tertu) joins
22:17:56evan quits [Remote host closed the connection]
22:17:56c3manu quits [Read error: Connection reset by peer]
22:17:56shreyasminocha quits [Remote host closed the connection]
22:17:56thehedgeh0g quits [Remote host closed the connection]
22:17:59evan joins
22:18:02thehedgeh0g (mrHedgehog0) joins
22:18:02shreyasminocha (shreyasminocha) joins
22:18:02c3manu (c3manu) joins
22:22:46<fireonlive>all: the situation with nitter is looking dire and all nitter instances will probably stop working in ~2-3 weeks for an unknown period of time. given this, if you have any accounts you'd like archived burning a hole in your pocket (or mean to look for some) please add them to https://pad.notkiska.pw/p/archivebot-twitter [give the notes a quick read
22:22:46<fireonlive>if you haven't, check if they're not on the list already please] so they can be run while nitter is still functional... :c
22:22:46<fireonlive>if you don't have access to AB yourself please leave a reason after the username in parens (i.e. (died, bankrupt, notable, whatever)) for archival and don't forget to set your name in the pad
22:22:51<fireonlive>further reading about nitter itself in the recent comments of https://github.com/zedeus/nitter/issues/983 if curious
22:23:54<fireonlive>also at some point earlier eggdrop will probably stop nitterizing every twitter link for similar reasons :p
22:26:59bladem (bladem) joins
22:30:30<@HCross>Can we not put random Nitter instances through AB please
22:30:35<@HCross>that'll just be nasty
22:31:14<fireonlive>HCross: indeed, we're using a special one for AT hosted by Barto
22:31:56<@HCross>I still don't think that's an amazing idea unless you're using different tokens
22:32:09<@JAA>Each instance has its own tokens, yes.
22:32:41<@HCross>Oh good
22:33:08<fireonlive>=]
22:38:57<Barto>HCross: you're somewhat not wrong, and we tried to stay on a single instance as much as we could.
22:39:24<Barto>i would have loved to not have done this, but rich electric imposteur decided the opposite
22:42:30<fireonlive>(thanks Barto by the way!)
22:46:51<Barto>fireonlive: i really feel like the smallest gear in this whole machinery
22:48:08<fireonlive>infrastructure providers/upkeepers need love too :D
22:49:14jasons (jasons) joins
22:50:28<Barto>this setup is literally running under a chair, on the floor. Can't get more ghetto than that (or shall we call it cyberpunk?).
22:50:58<Barto>it's a lovely small Odroid H2+, with 16GB of ram
22:58:49<fireonlive>i couldn't think of anything more archiveteam
22:58:52<fireonlive>:3
22:59:19<DigitalDragons>i was gonna say, isn't jank the norm here?
23:00:24<Barto>:-)
23:02:39<fireonlive>:)
23:04:04<@JAA>It's a requirement.
23:26:36<fireonlive>Zed (creator of nitter): Guest accounts have been removed, they weren't just left to believe that. With real accounts getting rate limited immediately and likely banned, I don't see any path forward for Nitter. ~ https://github.com/zedeus/nitter/issues/983#issuecomment-1913362376
23:32:12<DogsRNice>just scrape using an account that spams fake bitcoin stuff so it wont be banned
23:33:17<@JAA>Not dogecoin?
23:38:19<fireonlive>soooo many porn bots
23:38:23<fireonlive>and they're all straight >:|
23:46:20jasons quits [Ping timeout: 240 seconds]