00:13:47HackMii quits [Remote host closed the connection]
00:14:06HackMii (hacktheplanet) joins
00:47:15etnguyen03 quits [Client Quit]
01:11:30h|ca2 (h) joins
01:15:04xkey quits [Quit: WeeChat 4.7.2]
01:16:16xkey (xkey) joins
01:22:56h|ca2 quits [Client Quit]
01:27:10h|ca2 (h) joins
01:27:17etnguyen03 (etnguyen03) joins
01:38:28h|ca2 quits [Client Quit]
01:38:46h|ca2 (h) joins
02:38:42nulldata-alt9 (nulldata) joins
02:40:51nulldata-alt quits [Ping timeout: 272 seconds]
02:40:51nulldata-alt9 is now known as nulldata-alt
02:43:56<steering>i can't think of any problem with a http/1.1 browser trying to talk to an http/1.0 server *except* if the server actually checked the version in the request :P
02:49:43agtsmith quits [Ping timeout: 272 seconds]
02:55:23<h2ibot>PaulWise edited YouTube (+376): https://wiki.archiveteam.org/?diff=58688&oldid=58472
02:55:50steering can't really find confirmation but assumes browsers don't send chunked requests in practice
02:57:37<nicolas17>right, chunked requests are rare in general
02:58:00<nicolas17>but I think the problem here was sending a chunked response to an HTTP1.0 request?
02:58:37agtsmith joins
02:59:27<steering>yeah the actual error is probably a chunked response. just, in theory, if a browser sent a chunked request body it wouldn't be compatible with http/1.0 servers.
02:59:46<steering>other than that the only difference is some extra headers that the http/1.0 server should just ignore
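A minimal Python sketch of the framing issue discussed above (not from the log). Chunked transfer coding exists only in HTTP/1.1, and RFC 9112 says a server must not send Transfer-Encoding unless the request indicates HTTP/1.1 or later. So a server should fall back to Content-Length when the request line says HTTP/1.0. The function names `encode_chunked` and `build_response` are hypothetical. The sketch ignores trailers, keep-alive and connection handling, and it only treats the literal string "HTTP/1.1" as chunk-capable.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Frame a body with HTTP/1.1 chunked transfer coding (no trailers)."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        # Each chunk is: size in hex, CRLF, the data, CRLF.
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    # A zero-length last chunk plus an empty line terminates the body.
    out += b"0\r\n\r\n"
    return bytes(out)


def build_response(request_version: str, body: bytes) -> bytes:
    """Choose body framing based on the HTTP version of the request."""
    if request_version == "HTTP/1.1":
        head = "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
        return head.encode("ascii") + encode_chunked(body)
    # An HTTP/1.0 client would read chunk-size lines as body bytes,
    # so send a plain Content-Length body instead.
    head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_response("HTTP/1.0", b"hello world"))
    print(build_response("HTTP/1.1", b"hello world"))
```

If a client that doesn't understand chunked coding receives the HTTP/1.1 branch, it sees the raw `b\r\n...0\r\n\r\n` framing mixed into the content. That is the kind of breakage described above.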
03:16:56etnguyen03 quits [Client Quit]
03:18:55etnguyen03 (etnguyen03) joins
03:20:43egallager quits [Quit: Leaving]
03:40:24AlsoHP_Archivist joins
03:40:26etnguyen03 quits [Remote host closed the connection]
03:40:29PredatorIWD255 joins
03:42:18PredatorIWD25 quits [Ping timeout: 256 seconds]
03:42:18PredatorIWD255 is now known as PredatorIWD25
03:44:34HP_Archivist quits [Ping timeout: 256 seconds]
03:45:27sg72 quits [Ping timeout: 272 seconds]
03:55:55AlsoHP_Archivist quits [Client Quit]
03:56:10HP_Archivist (HP_Archivist) joins
03:56:15ericgallager joins
04:11:52v01d joins
04:17:25sg72 joins
05:12:58v01d quits [Ping timeout: 256 seconds]
05:13:57<nulldata>*sigh* looks like a couple of guys took stuff from my Rockstar Games archive to make their own archive site and give zero credit.
05:15:22DogsRNice quits [Read error: Connection reset by peer]
05:20:13<nulldata>I wouldn't even be so mad if they didn't have a credits page giving themselves pats on the back for all their "efforts"
05:20:19<nulldata>https://rockstar-archive.github.io/credits.html
05:38:08andrewnyr quits [Quit: Ping timeout (120 seconds)]
05:39:23andrewnyr joins
05:45:55<hexagonwin>another south korean webtoon platform shutting down at the end of this year.. https://www.myktoon.com/web/support/notice_view.kt?noticeseq=1636&currentPageNumber=1
05:48:47<h2ibot>Brad edited Deathwatch (-1895, Added iDIN and corrected datetime year of Dutch…): https://wiki.archiveteam.org/?diff=58689&oldid=58634
05:49:12<@JAA>klea, cruller: ^ The Edit™ went through this time.
05:49:47<h2ibot>Hexagonwin edited 짱공유닷컴 (+205, Mention data loss for last few posts due to…): https://wiki.archiveteam.org/?diff=58690&oldid=58614
05:51:47<h2ibot>JustAnotherArchivist edited Deathwatch (+213, Merge edit by…): https://wiki.archiveteam.org/?diff=58691&oldid=58689
05:59:44chrismeller0 (chrismeller) joins
06:01:36nexussfan quits [Read error: Connection reset by peer]
06:01:42chrismeller quits [Ping timeout: 256 seconds]
06:01:42chrismeller0 is now known as chrismeller
06:01:56nexussfan (nexussfan) joins
06:20:31nexussfan quits [Client Quit]
06:35:58<klea>interesting migration: https://github.com/rockstar-archive/rockstar-archive.github.io/commit/86d5d329c55e59a6700f0c9abe91032ffd06dcd5 <- https://github.com/the-rg-archive/the-rg-archive.github.io
06:36:48<klea>JAA: ACK
06:36:52klea has to go away soon-ish
06:51:39CYBERDEV quits [Ping timeout: 272 seconds]
06:54:14<cruller>JAA: Maybe I can fix it soon.
06:54:26<cruller>CC klea
06:56:10CYBERDEV joins
06:58:05<BlankEclair>https://archive.org/details/not-your-parents-web?tab=about
06:58:10<BlankEclair>huh, what's this collection?
07:04:19Suika_ joins
07:04:57Suika quits [Ping timeout: 272 seconds]
07:06:32DopefishJustin quits [Remote host closed the connection]
07:14:44DopefishJustin joins
07:17:44dendory3 joins
07:19:31dendory quits [Ping timeout: 272 seconds]
07:19:53<nulldata>https://fil.org/blog/the-web-isn-t-forever-new-research-findings-from-not-your-parents-web-project
07:20:20<BlankEclair>thanks!
07:25:09<h2ibot>Cruller edited Deathwatch (+3764, Partially undo revision 58689 by…): https://wiki.archiveteam.org/?diff=58692&oldid=58691
07:26:09<h2ibot>Cruller edited Deathwatch (+213, Reapply the diff…): https://wiki.archiveteam.org/?diff=58693&oldid=58692
07:28:49<cruller>OK. I'm checking https://wiki.archiveteam.org/index.php?title=Deathwatch&type=revision&diff=58693&oldid=58634 now.
07:43:05<cruller>Precisely, edit 58692 was "revert" rather than "undo"? Because it didn't include the newer diff.
07:46:29<cruller>Well, it's no big deal. Anyway, it's impossible to describe such a slightly complex operation in a single word.
08:41:34flotwig quits [Read error: Connection reset by peer]
08:45:18<hexagonwin>it seems like https://namu.wiki and https://arca.live (same operator) is kinda in danger, but they have very strict bot detections (mainly cloudflare)
08:45:53<hexagonwin>for arca.live i do get far less captchas with an account though (almost none?), how does archiveteam usually handle cases like this?
08:47:16flotwig joins
08:49:46NeonGlitch_ (NeonGlitch) joins
08:50:34NeonGlitch quits [Ping timeout: 256 seconds]
09:35:17Webuser686268 joins
09:35:22Webuser686268 leaves
09:54:10Wohlstand (Wohlstand) joins
09:55:32twiswist_ (twiswist) joins
09:58:50chrismeller5 (chrismeller) joins
09:59:45twiswist quits [Ping timeout: 272 seconds]
10:01:58chrismeller quits [Ping timeout: 256 seconds]
10:01:58chrismeller5 is now known as chrismeller
10:34:14Dada joins
10:39:55<twiswist_>hexagonwin: I pointed out namu.wiki a couple months ago after hearing about infighting in the staff team(?), but nothing appears to have happened to it yet. An AB job was started and it predictably got hit by its anti-bot defenses, haven't thought about it since
10:50:52<hexagonwin>twiswist_: yeah, their anti-bot defenses are very intense. it's the largest wiki in korean language with lots of valuable information and no proper backup (think all fandom wiki combined but in korean)
10:52:52<hexagonwin>they're sorta in a legal gray-zone (thus they claim to operate in paraguay, by some paper company) so there's no guarantee of its long-term existence at all
10:53:14<hexagonwin>actually pretty impressive they managed to operate for 10 years at this point
11:22:24cyanbox_ joins
11:25:53cyanbox quits [Ping timeout: 272 seconds]
11:26:24ducky quits [Ping timeout: 260 seconds]
11:27:10cyan_box joins
11:29:00ducky (ducky) joins
11:30:19cyanbox_ quits [Ping timeout: 272 seconds]
11:35:45<h2ibot>Cruller edited Alive... OR ARE THEY (+119, /* Alarm */ Add Namuwiki): https://wiki.archiveteam.org/?diff=58694&oldid=58639
11:39:22<cruller>twiswist_: I'm not sure if your source is good, but I've added Namuwiki to Fire Drill anyway. If you have other sources, please add them yourself or let someone know.
11:43:46<h2ibot>Cruller edited Deathwatch (+1, /* 2026 */ Fix ref for Schwung! app): https://wiki.archiveteam.org/?diff=58695&oldid=58693
11:44:19<hexagonwin>cruller: also need to add arca.live, it's operated by the same people. i'll just create a new wiki page for namu.wiki soon
11:46:32<cruller>arca.live has already been added to Deathwatch. And Namuwiki page exists. https://wiki.archiveteam.org/index.php/Namuwiki (stub)
11:47:24<hexagonwin>oh seems like i missed it. thanks.
11:48:19<cruller>:)
11:57:35<Dango360>BlankEclair, nulldata: the report on the NYPW data is very interesting https://ws-dl.blogspot.com/2024/09/2024-09-20-some-urls-are-immortal-most.html
12:05:32<cruller>Namuwiki and arca.live have several special considerations, so there may not have been a similar case in the past.
12:05:49<cruller>Wiki, CAPTCHAs, bypassing CAPTCHAs by logging in, and a long-term crisis with no clear deadline are all considerable.
12:06:12<hexagonwin>i'm writing a wiki page to provide related information, my english sucks so it takes some time tho
12:10:20<cruller>I use https://translate.google.com/ https://papago.naver.com/ https://www.deepl.com/ https://translate.kagi.com/ https://translate.preferredai.jp/ https://plus.miraitranslate.com/ in combination.
12:10:37<cruller>For english writing
12:10:43<hexagonwin>heh thanks a lot for sharing those
12:42:31pabs quits [Ping timeout: 272 seconds]
12:53:02VerifiedJ quits [Remote host closed the connection]
12:53:36VerifiedJ (VerifiedJ) joins
13:06:37pabs (pabs) joins
13:23:09<h2ibot>OrIdow6 edited Amino (-43): https://wiki.archiveteam.org/?diff=58696&oldid=58652
13:28:02<hexagonwin>finally finished writing that wiki page. for some reason I can't upload images :/
13:28:53<hexagonwin>"The file you uploaded seems to be empty. This might be due to a typo in the filename. Please check whether you really want to upload this file."
13:32:27<hexagonwin>also - can someone please run this http://www.haenaem.org/ on archivebot? just a random personal homepage i discovered while surfing
13:33:54benjins3 quits [Ping timeout: 256 seconds]
13:37:31lexikiq quits [Quit: Ping timeout (120 seconds)]
13:37:50lexikiq joins
14:03:47dendory3 is now known as dendory
14:03:56dendory quits [Changing host]
14:03:56dendory (dendory) joins
14:17:07sec^nd quits [Remote host closed the connection]
14:17:25sec^nd (second) joins
14:19:10<klea>cruller: ACK
14:29:27Radzig2 joins
14:32:43Radzig quits [Ping timeout: 272 seconds]
14:32:43Radzig2 is now known as Radzig
14:36:43benjins3 joins
14:42:35pedantic-darwin joins
14:59:29<nicolas17>imer: are youtube videos accumulating on the targets? how much disk space remains free?
15:08:12SootBector quits [Remote host closed the connection]
15:09:24SootBector (SootBector) joins
15:10:24<h2ibot>Hans5958 edited Adobe Aero (+94): https://wiki.archiveteam.org/?diff=58697&oldid=58460
15:10:25<h2ibot>Hans5958 edited Main Page/Current Projects (+0, Adobe Aero done): https://wiki.archiveteam.org/?diff=58698&oldid=58680
15:11:24<h2ibot>Hans5958 edited In The Media (+113, Recover (and improve) the lead sentence): https://wiki.archiveteam.org/?diff=58699&oldid=58673
15:14:25<h2ibot>Hans5958 edited In The Media (+241, Add "To Save its Content, Archive Team is…): https://wiki.archiveteam.org/?diff=58700&oldid=58699
15:16:25<h2ibot>Hans5958 edited In The Media (+184, Add "Legacy Update expands archive of vanished…): https://wiki.archiveteam.org/?diff=58701&oldid=58700
15:18:25<h2ibot>Hans5958 edited In The Media (+252, Add "The race to save our online lives from a…): https://wiki.archiveteam.org/?diff=58702&oldid=58701
15:18:26<h2ibot>Hans5958 edited In The Media (+0, Wrong way around): https://wiki.archiveteam.org/?diff=58703&oldid=58702
15:20:26<h2ibot>Manu edited Discourse/active (+63, Add community.toradex.com): https://wiki.archiveteam.org/?diff=58704&oldid=58578
15:20:27<h2ibot>Hans5958 edited In The Media (+29, /* 2017 */ Add date): https://wiki.archiveteam.org/?diff=58705&oldid=58703
15:22:26<h2ibot>Manu edited Discourse/active (+45, Add forum.openwrt.org): https://wiki.archiveteam.org/?diff=58706&oldid=58704
15:22:27<h2ibot>Hans5958 edited In The Media (+248, /* 2019 */ Add "Internet Archive team begins…): https://wiki.archiveteam.org/?diff=58707&oldid=58705
15:24:26<h2ibot>Hans5958 edited In The Media (+236, /* 2014 */ Add "Lost forever? Archive Team says…): https://wiki.archiveteam.org/?diff=58708&oldid=58707
15:27:27<h2ibot>Hans5958 edited In The Media (+6, /* 2011 */ Fix unavailable link): https://wiki.archiveteam.org/?diff=58709&oldid=58708
15:28:27<h2ibot>Hans5958 edited In The Media (+81, /* 2025 */ Readd Odysee based on earlier format): https://wiki.archiveteam.org/?diff=58710&oldid=58709
15:30:27<h2ibot>Hans5958 edited In The Media (+172, /* 2017 */ Add "A Team of Volunteers Is…): https://wiki.archiveteam.org/?diff=58711&oldid=58710
15:36:28<h2ibot>Hans5958 edited In The Media (+166, Add "Preserving The Internet's Digital Past"): https://wiki.archiveteam.org/?diff=58712&oldid=58711
15:38:28<h2ibot>Hans5958 edited In The Media (+280, /* 2009 */ Add "Yahoo is Shutting Down…): https://wiki.archiveteam.org/?diff=58713&oldid=58712
15:38:43Webuser866623 joins
15:38:49Webuser866623 quits [Client Quit]
15:40:28<h2ibot>Hans5958 edited In The Media (+184, /* 2013 */ Add "Archive Team voltooit back-up…): https://wiki.archiveteam.org/?diff=58714&oldid=58713
15:44:29<h2ibot>KleaBot edited Main Page/In The Media (+70, Updated from [[In The Media]]): https://wiki.archiveteam.org/?diff=58715&oldid=58683
15:54:35@rewby quits [Quit: WeeChat 4.4.2]
16:19:35cyan_box quits [Read error: Connection reset by peer]
16:23:30<@imer>nicolas17: doesnt look like it
16:25:45<@imer>looks like they’re sat in the inbox, did an automation break arkiver?
16:30:44rewby (rewby) joins
16:30:44@ChanServ sets mode: +o rewby
16:37:03<that_lurker>maybe because of the data loss code change
16:39:13NF885 (NF885) joins
16:42:14<NF885>imer looks like it's only broken for some projects
16:42:37<NF885>e.g. Blogger and Ad Library are getting the collection
16:43:27<NF885>but URLs, YouTube and US gov aren't
16:46:07<NF885>also Telegram isn't
16:52:12<NF885>actually, looks like all the US gov stuff is not getting moved to the AT account
16:54:54<NF885>(side note: shouldn't the EOT collection stop being added to the US gov items?)
16:59:20<nicolas17>imer: oh so stuff is being uploaded to IA, just not moved to the right collection?
16:59:43<nicolas17>that's good news actually, I thought we were *still* not uploading since dec 5
17:08:22<NF885>BTW I found https://archive.org/details/github-downloads-2012-12 - shouldn't this be under the AT collection?
17:21:20<NF885>also, for future reference, the 256 items in the AT collection directly should probably be cleaned up at some point
17:21:21<NF885>https://archive.org/details/archiveteam?tab=collection&query=primary_collection%3Aarchiveteam+-mediatype%3Acollection&sort=-addeddate&not%5B%5D=collection%3A%22archiveteam_usgovernment%22
17:54:34_null quits [Ping timeout: 256 seconds]
17:54:47<h2ibot>Klea created Moinmoin (+22, Create redirect to [[MoinMoin]]): https://wiki.archiveteam.org/?title=Moinmoin
17:55:47<h2ibot>Klea edited MoinMoin (-1, /* Lost */ Update reference text): https://wiki.archiveteam.org/?diff=58717&oldid=53174
18:04:48<h2ibot>Nintendofan885 edited ArchiveBot/Monitoring (+7, /* Current monitoring */ use IRC template): https://wiki.archiveteam.org/?diff=58718&oldid=58503
18:43:35z_ joins
18:44:33z_ is now known as tmg1|michelson
18:49:39NF885 quits [Client Quit]
19:33:44sg72 quits [Ping timeout: 256 seconds]
19:35:54sg72 joins
19:51:18<klea>apparently revolt.chat was renamed to stoat.chat - https://stoat.chat/updates/long-live-stoat
19:59:18v01d joins
20:10:28ducky quits [Ping timeout: 260 seconds]
20:29:13Webuser603821 joins
20:29:41Webuser603821 quits [Client Quit]
20:38:13_null (_null) joins
21:19:19fetcher quits [Ping timeout: 272 seconds]
21:24:16fetcher joins
21:35:42twiswist_ quits [Read error: Connection reset by peer]
21:46:59Dada quits [Remote host closed the connection]
21:48:07archiveDrill quits [Quit: The Lounge - https://thelounge.chat]
21:49:50archiveDrill joins
22:06:10flotwig quits [Ping timeout: 256 seconds]
22:12:59flotwig joins
22:19:05azalea_sh_ (azalea_sh_) joins
22:19:29<azalea_sh_>Hey Hey! Glad to not be on the cost shameboard because I have found another set of public media URLs!
22:19:46flotwig quits [Ping timeout: 256 seconds]
22:19:48<azalea_sh_>I'll run some metrics for estimated storage costs and such
22:21:15<azalea_sh_>This time the db file storing the URLs is only like 1GB so it shouldn't be *that* bad (hopefully)
22:21:31<nicolas17>now that's a feature idea
22:21:47<nicolas17>send a URL list to archivebot but attribute it to another username for shameboard purposes :3c
22:21:55<azalea_sh_>:+1:
22:21:56flotwig joins
22:23:55nine quits [Ping timeout: 272 seconds]
22:29:05nine joins
22:29:05nine quits [Changing host]
22:29:05nine (nine) joins
22:33:33Wohlstand quits [Quit: Wohlstand]
22:40:00<azalea_sh_>Size Est is between 400 and 600GB
22:40:12<azalea_sh_>I'll split the url file rq
22:42:35<azalea_sh_>After filtering for uniques it's way lower, like half that
22:43:57<azalea_sh_>https://transfer.archivete.am/VIa5U/stickers_part_000.txt
22:43:57<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/VIa5U/stickers_part_000.txt
22:43:58<azalea_sh_>:3#
22:44:31StarletCharlotte joins
22:45:25<azalea_sh_>(2,773,420) URLs 50-100 KB avg filesize
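This is a rough check of the sticker-list estimate using the figures quoted just above: 2,773,420 URLs at an average of 50 to 100 KB each. The sketch assumes that every listed URL is fetched exactly once and that units are decimal (1 GB = 1,000,000 KB). It ignores WARC record headers and other overhead. The helper name `estimate_size_gb` is hypothetical.

```python
def estimate_size_gb(url_count: int, avg_kb_low: float, avg_kb_high: float) -> tuple[float, float]:
    """Return (low, high) total size in GB for url_count files of the given average size in KB."""
    kb_per_gb = 1_000_000  # decimal units assumed, not GiB
    return (url_count * avg_kb_low / kb_per_gb, url_count * avg_kb_high / kb_per_gb)


low, high = estimate_size_gb(2_773_420, 50, 100)
print(f"{low:.0f}-{high:.0f} GB")  # 139-277 GB
```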
22:46:40<StarletCharlotte>Is it normal for the Wayback Machine to just... Drop pages you try to save? Because I've been regularly sending pages to it via the save page from an open directory that gets updated regularly (with older stuff often being deleted, so stuff gets lost), and even when it's supposedly done saving, even days later, they just... Never show up in the Wayback Machine. And resaving a file that's been updated says it's the first archive. Is this normal? Are those pages still being indexed or are they just gone? Is there any way to prevent this?
22:47:46etnguyen03 (etnguyen03) joins
22:47:48<nicolas17>using Save page Now?
22:47:59<StarletCharlotte>Yes
22:48:15<StarletCharlotte>https://web.archive.org/save/
22:49:07<pokechu22>It's an indexing issue affecting everyone. The data still exists, it's just not accessible on web.archive.org
22:49:13<StarletCharlotte>oh thank god
22:49:30<pokechu22>It started a month or so ago IIRC
22:49:51<StarletCharlotte>I was worried because I definitely saved some stuff that's gone now
22:50:28StarletCharlotte quits [Client Quit]
22:50:30<pokechu22>though my understanding is that it's generally available on a temporary index for a day or 2, and then stops being accessible when it drops out of that temporary index
22:51:36<@JAA>Also, #internetarchive for IA/WBM/SPN-related things
22:55:22<azalea_sh_>also wanted to say thx again for the help with the amino archive, it, having been a solo endeavour so far, was a huge pain to get all of this data so im extremely glad that there's a way to get it redundantly stored and also have the media archived despite the scale
22:56:32<azalea_sh_>I have some plans with all of that data that I would never be able to finish myself so this has been a great help
22:57:05<nicolas17>JAA: did you submit the previous amino list to AB btw?
22:57:35<@JAA>Yes, still running
22:57:43<nicolas17>good
22:58:39<nicolas17>9 parts? wow
22:58:48<@JAA>10
22:59:18<nicolas17>and part 0 is 1.3TB already
22:59:22<nicolas17>:pain:
22:59:31<azalea_sh_>That's just because the URLs were sorted
22:59:46<azalea_sh_>pa1 subdomain is ~80% of the storage size
23:00:06<azalea_sh_>but just 8M urls of the >60M URLs total
23:00:15<nicolas17>what was your size estimate?
23:00:22<azalea_sh_>5-7TB total
23:01:09<azalea_sh_>the stickers one i sent earlier is like 200-450 I think
23:01:34<azalea_sh_>I redid calculations constantly with current AB queues and that also seems to match up with previous estimations
23:02:44<azalea_sh_>but yeah it's expected that part 0 will be the largest since that has most of pa1
23:16:54atphoenix_ (atphoenix) joins
23:19:39atphoenix__ quits [Ping timeout: 272 seconds]
23:21:45nexussfan (nexussfan) joins
23:26:25Wohlstand (Wohlstand) joins
23:28:27<@JAA>Yeah, should've shuffled the list. Oh well.
23:28:48<azalea_sh_>Would it have been better if it were shuffled?
23:28:56<azalea_sh_>if so, then sorry!
23:29:23<@JAA>I mean I should've done that when I split it up into parts etc. :-)
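This sketch shows what shuffling before splitting could look like. It is written in Python, and the function name `shuffle_and_split`, the fixed seed and the `pa1.example`/`pa2.example` hostnames are all hypothetical. It does not reproduce the tooling actually used for the AB jobs. The shuffled list is dealt round-robin into parts. A heavy subdomain such as pa1 then spreads across every part instead of filling part 0, as happens with a sorted list.

```python
import random


def shuffle_and_split(urls: list[str], n_parts: int, seed: int = 0) -> list[list[str]]:
    """Shuffle a URL list, then deal it round-robin into n_parts near-equal parts."""
    shuffled = list(urls)  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    # Part k gets items k, k + n_parts, k + 2*n_parts, ...
    return [shuffled[k::n_parts] for k in range(n_parts)]


if __name__ == "__main__":
    urls = [f"https://pa1.example/{i}" for i in range(80)]
    urls += [f"https://pa2.example/{i}" for i in range(20)]
    for k, part in enumerate(shuffle_and_split(urls, 10, seed=42)):
        heavy = sum("pa1.example" in u for u in part)
        print(f"part {k:03d}: {len(part)} URLs, {heavy} from pa1")
```

Part sizes differ by at most one URL. The heavy-host share per part is random rather than guaranteed equal, but it no longer depends on sort order.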
23:30:20<azalea_sh_>The stickers list is also sorted fyi but since that is only one urls file anyway that shouldn't matter too much i think
23:30:56<azalea_sh_>the semi public dumps were also sorted, dunno if you included those parts already i only calced the total urls amount for the first 6 parts
23:31:12<@JAA>Yeah, I'll just rename the stickers file to something more descriptive.
23:31:26<@JAA>I haven't done the semi-public list yet.
23:32:13<azalea_sh_>Aye aye! I'll be gone for a few days, let's see if it'll still be ongoing when I'm back
23:53:00etnguyen03 quits [Client Quit]
23:53:29Island joins
23:53:30HackMii quits [Remote host closed the connection]
23:53:52HackMii (hacktheplanet) joins