00:05:22Twisty quits [Remote host closed the connection]
00:17:47sarge (sarge) joins
00:19:38mr_sarge quits [Ping timeout: 252 seconds]
00:23:50CraftByte quits [Ping timeout: 252 seconds]
00:49:44mr_sarge (sarge) joins
00:51:53sarge quits [Ping timeout: 252 seconds]
00:59:33Mateon2 joins
01:00:08Mateon1 quits [Ping timeout: 252 seconds]
01:00:08Mateon2 is now known as Mateon1
01:00:28<h2ibot>JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=49911&oldid=49863
01:03:20etnguyen03 quits [Ping timeout: 265 seconds]
01:09:44Catdurid quits [Remote host closed the connection]
01:13:05Earendil7 quits [Client Quit]
01:14:14Earendil7 (Earendil7) joins
01:18:52etnguyen03 (etnguyen03) joins
01:37:00railen63 quits [Ping timeout: 252 seconds]
01:54:04<@arkiver>OrIdow6: JAA: do we have a channel for egloos?
01:54:16<@arkiver>yts98: nice! :)
01:54:23<@arkiver>yeah i was planning on it, looks like a fun project
01:55:20etnguyen03 quits [Ping timeout: 252 seconds]
01:55:59dr3gs joins
01:56:01<@JAA>arkiver: No channel yet as far as I know.
01:56:36<Ivan226>Isn't the channel for Twitter just here
01:57:04<@JAA>?
01:57:21<Ivan226>yeah it is
01:57:50<Ivan226>(was asking for like was there a channel for twitter, like how reddit has #shreddit and pixiv has #pixeled)
01:58:24<@JAA>Right
01:59:07<Ivan226>Cuz I was gonna suggest Twitter's channel be Musk'd or something (kinda like how you guys joke about Yahoo)
01:59:19<Ivan226>(or at least, that's what the wiki lore says)
02:02:36Hajdar quits [Client Quit]
02:02:40<@arkiver>yts98: i just saw https://github.com/yts98/lineblog-grab - nice! i need to have a better look at this
02:03:01<@arkiver>are you fine with me cloning that later to the github.com/archiveteam ?
02:03:39Hajdar (Hajdar) joins
02:11:34<@arkiver>hmm
02:11:47<@arkiver>yts98: actually, this is only for API crawling?
02:15:39etnguyen03 (etnguyen03) joins
02:19:43<@arkiver>(oh nvm)
02:25:06<@arkiver>yts98: what is the reason for splitting API and the rest from each other? this can create the problem that some lineblog.me URLs that are better discoverable through the API are not queued to the lineblog.me part, and vice-verse
02:25:09<@arkiver>versa*
02:26:00<@arkiver>and duplicate images
02:26:53<@arkiver>i might join those two again
02:27:01<@arkiver>but overall looking pretty nice already
02:29:50savethestuffyo quits [Quit: WeeChat 3.8]
02:30:56<@arkiver>also i seem to have missed your messages on the 26th :/ sorry about that
02:31:30cbl5257 joins
02:33:15Arachnophine quits [Remote host closed the connection]
02:33:51Arachnophine (Arachnophine) joins
02:34:05graham quits [Client Quit]
02:34:11etnguyen03 quits [Ping timeout: 252 seconds]
02:34:49Arachnophine quits [Read error: Connection reset by peer]
02:35:00Arachnophine (Arachnophine) joins
02:39:10<@arkiver>anyone have a fun idea for a channel for egloos?
02:39:24etnguyen03 (etnguyen03) joins
02:39:35<@arkiver>(eggos could be one)
02:43:48<@arkiver>well
02:44:16<@arkiver>#eggos for egloos!
02:44:21mr_sarge quits [Ping timeout: 265 seconds]
02:45:04<@arkiver>OrIdow6: ^
02:45:15xkey quits [Client Quit]
02:45:15TTN joins
02:48:33<nstrom|m>#eglost
02:49:16<@arkiver>nstrom|m: sorry it's already #eggos
02:49:30<fireonlive>🧇
02:50:31mike joins
02:50:47<h2ibot>SaveThEWhatNow edited Reddit (+13): https://wiki.archiveteam.org/?diff=49912&oldid=49891
02:50:55mike is now known as savethestuffyo
02:54:14<@arkiver>i'll start a bunch of projects the coming days for upcoming shut downs
02:54:48<h2ibot>Nicolas17v2 edited Reddit (+1, typo): https://wiki.archiveteam.org/?diff=49913&oldid=49912
02:55:11<nicolas17>can you give an update of what's going on with reddit/imgur?
02:55:18<@arkiver>yes
02:55:20<@arkiver>imgur paused
02:55:22<@arkiver>reddit coming up
02:57:04<nicolas17>reddit has been paused for 8h, if it's not coming back very soon we should have unpaused imgur...
03:05:24mr_sarge (sarge) joins
03:14:16dr3gs quits [Remote host closed the connection]
03:20:18sarge (sarge) joins
03:21:29mr_sarge quits [Ping timeout: 252 seconds]
03:24:14dumbgoy quits [Ping timeout: 252 seconds]
03:30:18jlwoodwa joins
03:40:08sarge quits [Read error: Connection reset by peer]
03:44:06<yts98>arkiver: feel free to clone my repo. I separated API from the rest because I want them to have different request user-agents (okhttp and Chrome respectively). If user-agent doesn't matter, two parts can be joined.
03:46:47mr_sarge (sarge) joins
03:48:59etnguyen03 quits [Ping timeout: 252 seconds]
03:49:42dumbgoy joins
03:52:44etnguyen03 (etnguyen03) joins
03:53:02mr_sarge quits [Ping timeout: 252 seconds]
03:53:20mr_sarge (sarge) joins
03:54:26dumbgoy quits [Ping timeout: 265 seconds]
03:58:47mr_sarge quits [Ping timeout: 265 seconds]
03:59:53etnguyen03 quits [Client Quit]
04:02:28atphoenix quits [Remote host closed the connection]
04:03:11atphoenix (atphoenix) joins
04:15:33Megame quits [Client Quit]
04:25:14kkkkkkkk joins
04:26:54BigBrain quits [Remote host closed the connection]
04:27:27BigBrain (bigbrain) joins
04:28:16Lord_Nightmare quits [Ping timeout: 265 seconds]
04:29:37kkkkkkkk quits [Remote host closed the connection]
04:34:40Lord_Nightmare (Lord_Nightmare) joins
04:59:42TTN quits [Remote host closed the connection]
05:13:51hitgrr8 joins
05:46:26xkey (xkey) joins
05:50:55Island quits [Read error: Connection reset by peer]
06:22:26YetAnotherArchiver joins
06:26:41@rewby quits [Ping timeout: 265 seconds]
06:29:34<YetAnotherArchiver>tianya.cn is dead
06:29:42<YetAnotherArchiver>wiki: https://en.wikipedia.org/wiki/Tianya_Club
06:29:49<YetAnotherArchiver>https://finance.yahoo.com/news/pioneering-internet-portal-tianya-cn-093000078.html
06:38:55killsushi joins
06:45:49Akatsuki joins
07:20:14BlueMaxima quits [Client Quit]
07:37:56YetAnotherArchiver quits [Remote host closed the connection]
08:15:55beario_ quits [Ping timeout: 265 seconds]
08:15:55beario quits [Ping timeout: 265 seconds]
08:31:31rewby (rewby) joins
08:31:31@ChanServ sets mode: +o rewby
08:37:19YetAnotherArchiver joins
08:40:55YetAnotherArchiver quits [Remote host closed the connection]
08:53:01Aertbei joins
09:07:15sonick (sonick) joins
09:25:00qw3rty quits [Read error: Connection reset by peer]
09:25:12qw3rty joins
10:04:40Ruthalas5 quits [Ping timeout: 265 seconds]
10:13:21Jonboy345 joins
10:13:59AmAnd0A quits [Ping timeout: 252 seconds]
10:16:32AmAnd0A joins
10:17:40Akatsuki quits [Remote host closed the connection]
10:23:54Ruthalas5 (Ruthalas) joins
10:48:28<betamax>JAA / nicolas17: update on the School of Dragons stuff: JumpStart have now announced that *all* of their games are shutting down on June 30
10:48:32<betamax>https://www.jumpstart.com/
10:48:53<betamax>" Effective immediately, we will be discontinuing support for our current games and there will be no further game downloads or content or functionality updates. On June 30, 2023, our servers will be shut down completely. Game progress and in-game items will not be saved, and any user data will be unavailable after that date."
10:50:28<BigBrain>not dumping game progress and in-game item db?
10:51:00<betamax>my friend is looking into reverse-engineering the "School of Dragons" game, but only that one
10:51:30<betamax>wait, does this mean Neopets is shutting down too?
10:57:36beario joins
10:57:44beario_ joins
10:58:03beario_ quits [Client Quit]
11:49:40decky_e quits [Remote host closed the connection]
11:55:10sonick quits [Client Quit]
12:05:47Ruthalas5 quits [Client Quit]
12:06:07Ruthalas5 (Ruthalas) joins
12:06:19mr_sarge (sarge) joins
12:13:39icedice quits [Client Quit]
12:28:49birdjj quits [Quit: The Lounge - https://thelounge.chat]
12:29:28birdjj joins
12:33:38beario_ joins
12:33:53icedice (icedice) joins
12:39:55<flashfire42>noooooooooooooooooooo
12:40:04beario quits [Client Quit]
12:40:08beario_ quits [Client Quit]
12:40:24beario joins
12:40:59Akatsuki joins
12:54:17Unholy23615 (Unholy2361) joins
12:57:09etnguyen03 (etnguyen03) joins
12:57:53Unholy2361 quits [Ping timeout: 252 seconds]
12:57:53Unholy23615 is now known as Unholy2361
12:58:11AmAnd0A quits [Ping timeout: 265 seconds]
12:58:21AmAnd0A joins
13:06:39AmAnd0A quits [Read error: Connection reset by peer]
13:06:58AmAnd0A joins
13:08:04Andyman joins
13:09:01Andyman quits [Remote host closed the connection]
13:14:25mr_sarge quits [Read error: Connection reset by peer]
13:16:51mr_sarge (sarge) joins
13:37:44phaeton joins
13:50:02Alobaidy joins
13:54:22Alobaidy quits [Ping timeout: 252 seconds]
14:05:53<AK>Oh god we need to get those sites run through AB then
14:06:20<AK>If it's not been started in ~5 hours when I finish work I'll go through and find all sites+subdomains and try to grab what we can
14:14:23Graphxne joins
14:53:09spirit joins
14:59:37<nstrom|m>no notice on neopets site so that one's probably staying up. http://www.mathblaster.com/ does have shutdown warning
15:10:16IDK quits [Client Quit]
15:24:12<joepie91|m>> Thank you for playing JumpStart Games various products over the years. Effective immediately, we will be discontinuing support for our current Educational JumpStart and Math Blaster games - this does NOT include Neopets branded games, and there will be no further game downloads or content or functionality updates. On June 30, 2023, our servers will be shut down completely. Game progress and in-game items will not be saved, and any user data
15:24:12<joepie91|m>will be unavailable after that date.
15:27:44spirit quits [Client Quit]
15:38:08jlwoodwa quits [Ping timeout: 252 seconds]
15:41:11Island joins
15:53:02decky_e (decky_e) joins
16:05:48c3manu (c3manu) joins
16:19:11killsushi quits [Ping timeout: 252 seconds]
16:20:58<fireonlive>more like jumpstop
16:22:35dumbgoy joins
16:59:53decky_e quits [Ping timeout: 252 seconds]
17:00:39decky_e (decky_e) joins
17:07:19dr3gs joins
17:08:47icedice quits [Client Quit]
17:17:13infiliotech joins
17:26:55etnguyen03 quits [Ping timeout: 265 seconds]
17:36:09graham joins
17:42:39StrangeFello quits [Remote host closed the connection]
17:43:46icedice (icedice) joins
17:44:19etnguyen03 (etnguyen03) joins
17:45:13graham quits [Client Quit]
17:46:08graham joins
17:48:35bigdata quits [Quit: Leaving]
17:51:34dr3gs quits [Remote host closed the connection]
17:56:47graham quits [Client Quit]
18:10:00graham joins
18:19:07graham quits [Client Quit]
18:29:53spirit joins
18:53:29HiccupJul (HiccupJul) joins
18:53:37<HiccupJul>is there an official dockerfile or VM image for grab-site?
18:57:21railen63 joins
18:57:23<@JAA>Nope, the relevant PR has been going for a while: https://github.com/ArchiveTeam/grab-site/pull/195
18:57:29mattx433 (mattx433) joins
19:15:08jlwoodwa joins
19:15:34Megame (Megame) joins
19:20:01jlwoodwa quits [Ping timeout: 265 seconds]
19:38:01<h2ibot>Rexma edited List of websites excluded from the Wayback Machine/Partial exclusions/Twitter accounts (+40, added new account, and organized alphabetically.): https://wiki.archiveteam.org/?diff=49914&oldid=49718
19:38:02<h2ibot>Ufarwisan edited List of websites excluded from the Wayback Machine (+33, Added moot's blog): https://wiki.archiveteam.org/?diff=49915&oldid=49903
19:38:03<h2ibot>FireonLive edited Deathwatch (+81, clarify PlaceIMG): https://wiki.archiveteam.org/?diff=49916&oldid=49892
19:38:04<h2ibot>Yts98 edited Deathwatch (+186, Add Half Dimension): https://wiki.archiveteam.org/?diff=49917&oldid=49916
19:39:13<fireonlive>oh hey it’s me!
19:39:26<fireonlive>I was a little sad to not be able to be all lowercase haha
19:39:35<fireonlive>but such is the MediaWiki life
19:39:42<HiccupJul>JAA: ah thanks, i should have searched for that
19:58:21iCaotix joins
20:00:04<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=49918&oldid=49915
20:02:39icaotix|m leaves
20:12:24HiccupJul quits [Client Quit]
20:14:05phaeton quits [Remote host closed the connection]
20:16:10<nicolas17>AWS us-east1 is on fire again
20:18:20<nicolas17>figuratively, I should clarify
20:22:38<FireFly>I was about to ask :p
20:24:14<fireonlive>OVH round 2
20:24:15<fireonlive>lol
20:25:34<@JAA>Round 2 was Global Switch in Paris a few weeks ago.
20:27:36railen64 joins
20:27:41<fireonlive>ohhh
20:31:04railen63 quits [Ping timeout: 265 seconds]
20:35:52etnguyen03 quits [Ping timeout: 252 seconds]
20:38:18etnguyen03 (etnguyen03) joins
20:40:34etnguyen03 quits [Client Quit]
20:46:46railen63 joins
20:48:57railen64 quits [Ping timeout: 265 seconds]
20:51:55railen64 joins
20:55:18railen63 quits [Ping timeout: 252 seconds]
21:03:27decky_e quits [Ping timeout: 265 seconds]
21:09:58geezabiscuit quits [Ping timeout: 252 seconds]
21:10:57railen63 joins
21:11:27geezabiscuit (geezabiscuit) joins
21:13:36railen64 quits [Ping timeout: 265 seconds]
21:16:12railen64 joins
21:16:17<@arkiver>rewby: while you're perhaps still around, could we get a target for egloos?
21:16:52<@arkiver>this would be
21:16:57<@arkiver>archiveteam_egloos_
21:16:59<@arkiver>egloos_
21:17:03<@arkiver>Archive Team Egloos:
21:17:16<@arkiver>(this is somewhat important - deadline is in 3 days)
21:17:37<@arkiver>another one, with a deadline at the end of this month is LINE BLOG, which would be
21:18:14<@arkiver>archiveteam_lineblog_
21:18:18<@arkiver>lineblog_
21:18:23<@arkiver>Archive Team LINE BLOG:
21:19:08railen63 quits [Ping timeout: 252 seconds]
21:23:16nicolas17 quits [Ping timeout: 265 seconds]
21:24:13<fireonlive>do we have a channel name for line? if not i suggest #holdtheline after the toto song: https://www.youtube.com/watch?v=htgr3pvBr-I
21:24:22<@arkiver>love it :)
21:24:24<fireonlive>:D
21:26:43nicolas17 joins
21:27:23<@JAA>Some previous suggestions from the March shutdown of LINE LIVE included: noose, lineout, offline, borderline, end-of-line, line-of-horizon, segment
21:28:33nicolas17 quits [Read error: Connection reset by peer]
21:29:03nicolas17 joins
21:29:11<@arkiver>well we're in #holdtheline
21:29:47<@arkiver>let's also make a channel for stack exchange
21:29:48<@JAA>I've been in the others for months. :-P
21:29:50<fireonlive><fingers pointing to eachother.png>
21:29:52<@arkiver>anyone have ideas?
21:30:01<@JAA>But yeah, that one's quite fine as well.
21:30:28<@arkiver>ouch sorry JAA
21:30:38<fireonlive>if it was just stackoverflow my first thought was stackunderflow but sadly it's all of the exchange
21:31:28decky_e (decky_e) joins
21:31:43<@arkiver>well it's becoming less of an "exchange" now that they stopped providing new data dumps
21:31:44<fireonlive>the decision is yours!
21:33:20<fireonlive>a while ago i looked into antonyms of exchange but nothing really bit me
21:33:34<@JAA>Yeah, same.
21:34:28railen69 joins
21:34:35<@JAA>stackhoard
21:34:44Aertbei quits [Remote host closed the connection]
21:34:54<fireonlive>ooh
21:35:06Dango360_ (Dango360) joins
21:35:08<pokechu22>Didn't they rebrand the company from stackexchange to stackoverflow?
21:35:42<fireonlive>stackhoarder?
21:35:59Dango360 quits [Ping timeout: 252 seconds]
21:36:06<nicolas17>stackunderflow
21:36:18<@arkiver>let's wait some time before picking as this is not time sensitive. so some others have a chance too :P
21:36:43<BigBrain>stackdump/stacktrace
21:36:52<@JAA>pokechu22: Huh, indeed, totally missed that.
21:37:10<fireonlive>wait did they? wow
21:37:46railen64 quits [Ping timeout: 265 seconds]
21:40:40<nicolas17>stopped my grabber again because my modem got angry again
21:49:22arct joins
21:51:02<threedeeitguy>StackAvalanche?
21:53:39<myself>shitoverflow
21:54:51<@JAA>toppleexchange
21:57:57<PredatorIWD_>shartoverflow
22:00:04Hajdar quits [Remote host closed the connection]
22:00:14railen63 joins
22:00:27Hajdar (Hajdar) joins
22:01:56railen69 quits [Ping timeout: 265 seconds]
22:03:36hitgrr8 quits [Client Quit]
22:06:04<threedeeitguy>I feel like there is a jenga pun in there somewhere
22:06:16<manu|m>i was about to say
22:06:50<betamax>JAA: sorry for another ping, just wondered if the School of Dragons bucket was in AB yet (and what its ID was)
22:07:44decky_e quits [Ping timeout: 265 seconds]
22:10:57decky_e (decky_e) joins
22:11:16<@JAA>betamax: I've been setting it up. It won't be in AB due to the required dedupe; I'll retrieve it with qwarc instead.
22:15:35decky_e quits [Ping timeout: 252 seconds]
22:17:22decky_e (decky_e) joins
22:29:19railen64 joins
22:32:05railen63 quits [Ping timeout: 252 seconds]
22:35:59<@JAA>I'm relisting the bucket right now to get a complete up-to-date list.
22:37:00<nicolas17>yesterday I got an updated list, on my VPS since it has better latency and bandwidth to AWS
22:37:06<nicolas17>then tried diff -u against the old one
22:37:09<nicolas17>out of memory :D
22:37:53<@JAA>Yeah, sorting + `comm` might work better.
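(Editor's sketch, not from the channel.) The sorting + `comm` idea works because two sorted listings can be compared in a single streaming pass, so neither file has to fit in memory the way it does for `diff -u`. A minimal Python illustration of that merge-style comparison follows; the function name `sorted_diff` is hypothetical, and both inputs are assumed to be already sorted in plain code-point order (for example with `LC_ALL=C sort`).

```python
from typing import Iterable, Iterator, Tuple


def sorted_diff(old: Iterable[str], new: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ('-', line) for lines only in `old` and ('+', line) for lines only in `new`.

    Both inputs must already be sorted the same way. Only one line from each
    input is held at a time, so memory use stays constant.
    """
    old_it, new_it = iter(old), iter(new)
    a = next(old_it, None)
    b = next(new_it, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a < b):
            # `a` sorts before anything left in `new`: it exists only in `old`.
            yield ("-", a)
            a = next(old_it, None)
        elif a is None or b < a:
            # `b` sorts before anything left in `old`: it exists only in `new`.
            yield ("+", b)
            b = next(new_it, None)
        else:
            # Same line in both listings: skip it, like `comm -3` does.
            a = next(old_it, None)
            b = next(new_it, None)
```

Open file objects can be passed directly as `old` and `new`, since iterating a file yields one line at a time.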
22:38:15<@JAA>Lots of ServerDisconnectedError as expected. :-|
22:38:37railen63 joins
22:38:54<nicolas17>'aws s3 ls --recursive' is giving me 4-5k files/sec, ETA 42 minutes
22:39:14lk quits [Ping timeout: 252 seconds]
22:39:56<@JAA>Yeah, I'm getting around 6k/s currently.
22:40:10railen64 quits [Ping timeout: 252 seconds]
22:40:24<nicolas17>despite the errors? that's pretty good then
22:40:54lk (lk) joins
22:41:16<@JAA>Yeah, parallelism goes brrr. :-)
22:41:29railen64 joins
22:44:34railen63 quits [Ping timeout: 252 seconds]
22:45:43BlueMaxima joins
22:51:23<@JAA>The dedupe works well. Quick test downloaded 113 MiB into a 5.3 MB WARC. :-)
22:52:22<nicolas17>wow, what part of the bucket??
22:52:54<@JAA>Just a couple random files with 00000* hashes.
22:53:37<@arkiver>very nice :)
22:53:40<fireonlive>=]
22:55:21<@JAA>It's not fast enough though, took 25 seconds, so I need it to go about 4 times faster.
22:55:48<@JAA>But this was at a concurrency of 1, so... :-)
22:57:01<@JAA>Retesting at 6 gives 3 seconds, but I bet there's some degree of caching involved.
22:57:21<nicolas17>oh you did it by hash, no wonder you got so many to dedup :D
22:57:31decky_e quits [Ping timeout: 265 seconds]
22:57:38<@JAA>Yeah, I kind of have to do it that way because qwarc only dedupes within a single process.
22:57:49<@JAA>So processing them in hash order maximises the dedupe.
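(Editor's sketch, not from the channel.) The point about hash order can be illustrated without qwarc itself: if each object's content hash is already known, sorting by hash puts all copies of a file next to each other, so a deduplicator that only remembers what it has seen within one process still catches every repeat. The code below is a hypothetical planning step, not qwarc's API; `plan_in_hash_order` and the `(hash, key)` input shape are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def plan_in_hash_order(entries):
    """Return [(key, is_duplicate), ...] with entries grouped by content hash.

    `entries` is an iterable of (content_hash, key) pairs. The first key of
    each hash group is marked for a full download; the remaining keys in the
    group are marked as duplicates of it.
    """
    # Sort by hash first, then by key so the output order is deterministic.
    ordered = sorted(entries, key=itemgetter(0, 1))
    plan = []
    for _hash, group in groupby(ordered, key=itemgetter(0)):
        for index, (_h, key) in enumerate(group):
            plan.append((key, index > 0))
    return plan
```

Splitting the resulting plan across several processes at hash-group boundaries would keep all copies of a file in the same process.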
22:57:57decky_e (decky_e) joins
23:01:19railen63 joins
23:03:38railen64 quits [Ping timeout: 252 seconds]
23:04:52<@JAA>By the way, the record holders in that bucket are almost 2k copies of a few files.
23:12:05<nicolas17>mp4? :D
23:13:54<fireonlive>that's... efficient lol
23:14:57<@JAA>Yep, specifically these four: DWADragonsUnity/Android/1.11.0/High/de-DE/Movies/ScientificMethod02.mp4 DWADragonsUnity/Android/1.11.0/High/de-DE/Movies/ScientificMethod04.mp4 DWADragonsUnity/Android/1.11.0/High/de-DE/Movies/BoD_Intro.mp4 DWADragonsUnity/Android/1.11.0/High/de-DE/Movies/ScientificMethod03.mp4
23:15:11<@JAA>Each of those appears 1963 times in the bucket as of my listing the other night.
23:15:29<nicolas17>my total for mp4 was 1247 MiB unique data (92 files) + 514697 MiB duplicate (48357 files)
23:19:00railen64 joins
23:19:56railen63 quits [Remote host closed the connection]
23:21:57<nicolas17>JAA: finished updating my listing, there were some more new files in cudos/activity/kpi like last time, seems that updates daily
23:23:21<nicolas17>nothing else in the actual data we're interested in
23:24:10<@JAA>nicolas17: Have you seen any files vanish so far?
23:24:28<nicolas17>no
23:24:33<nicolas17>I used "comm -3 <(cut -c32- origin.ka.cdn.txt) <(cut -c32- origin.ka.cdn2.txt)"
23:24:53<@JAA>Ok, good.
23:24:58<nicolas17>since aws ls output looks like '2014-11-07 13:40:07 2014500 Content/AAJILZ/LoadScreens/de-DE/AniDWDragonsMElemLsSmokebreath.png'
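(Editor's sketch, not from the channel.) The `cut -c32-` above drops the date, time, and size columns by character position so that only the key is compared. An alternative that does not depend on exact column widths is to split on whitespace at most three times. The helper name `parse_ls_line` is hypothetical, and the sketch assumes each line has the date, time, size, key layout of the sample line above.

```python
def parse_ls_line(line: str):
    """Split one listing line into (key, size_in_bytes).

    Splitting at most three times keeps any spaces inside the key intact.
    Leading whitespace at the start of a key would be lost, which is a
    limitation of this approach compared with fixed-column cutting.
    """
    _date, _time, size, key = line.rstrip("\n").split(None, 3)
    return key, int(size)
```

Comparing only the keys, as nicolas17 does, ignores timestamp or size changes; returning the size as well makes it possible to spot objects that were overwritten with different content lengths.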
23:25:17<@JAA>My relisting's almost done as well, stupid timeouts.
23:25:37<@JAA>I'm curious why you aren't running into that as much.
23:26:13<nicolas17>I think awscli deals with retries on its own, but on the other hand it doesn't do parallel requests, so yeah I'm baffled how your approach is slower
23:26:31<nicolas17>I don't even know if I'm getting errors because awscli hides that :P
23:32:22<fireonlive>if we don't show the errors the customers can't complain!
23:32:28<fireonlive>foreheadtap
23:33:34railen63 joins
23:34:48lk quits [Ping timeout: 252 seconds]
23:35:32railen64 quits [Ping timeout: 252 seconds]
23:35:38lk (lk) joins
23:37:09nicolas17 quits [Client Quit]
23:37:17railen64 joins
23:40:32railen63 quits [Ping timeout: 265 seconds]
23:44:15railen63 joins
23:44:51TTN joins
23:45:26AmAnd0A quits [Ping timeout: 252 seconds]
23:46:20railen64 quits [Ping timeout: 265 seconds]
23:48:16lk quits [Ping timeout: 265 seconds]
23:49:28lk (lk) joins
23:55:47<h2ibot>Yts98 edited Deathwatch (+6): https://wiki.archiveteam.org/?diff=49919&oldid=49917
23:55:48<h2ibot>FireonLive edited LINE BLOG (+20, fireonlive names a project IRC channel!): https://wiki.archiveteam.org/?diff=49920&oldid=49910
23:58:09Chris50106 (Chris5010) joins