00:29:50qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
00:36:55qwertyasdfuiopghjkl quits [Max SendQ exceeded]
00:37:07qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
00:40:45systwi quits [Ping timeout: 272 seconds]
00:46:49<fireonlive>-+rss- Nintendo Network shutdown – The beginning of the end: https://pretendo.network/blog/12-23-23 https://news.ycombinator.com/item?id=38766570
00:46:57<fireonlive>we've maybe seen the nintendo stuff earlier yeah?
00:52:03<Pedrosso>So... What services does Nintendo hold with that network?
00:54:35systwi (systwi) joins
00:58:44<fireonlive>friend was complaining that all the player-made/shared levels for the first game would be unavailable
00:58:59<fireonlive>unsure if there's a way to do anything with that sadly
01:01:50ScenarioPlanet quits [Ping timeout: 240 seconds]
01:03:32<Pedrosso>Was a grab of Super Mario Maker games ever done? What other sorts of things could be of interest for archival?
01:13:08<Pedrosso>courses*
01:31:48RealPerson leaves
01:53:16ScenarioPlanet (ScenarioPlanet) joins
01:58:01ScenarioPlanet quits [Ping timeout: 272 seconds]
02:06:40DLoader quits [Changing host]
02:06:40DLoader joins
02:17:09DogsRNice joins
02:27:48RealPerson joins
02:44:26RealPerson leaves
02:46:16RealPerson joins
03:00:14HP_Archivist quits [Client Quit]
03:04:05RealPerson leaves
03:38:23<@JAA>Assuming OneHallyu stays up, the topic retries should be done in a bit over 2 hours. I'll then run another similar thing for the remaining topics that are being done sequentially since that's so slow. Also one more topic failed with timeouts.
03:38:37<@JAA>Some of these topics have well over 10k pages, pretty insane.
03:40:08<Terbium>another forum bites the dust, forums are disappearing rapidly :(
03:45:46<fireonlive>everyone loves fucking discord these days
03:46:32<Terbium>sadly the case, forums are moving to the free, easy-to-use walled garden known as discord :/
03:47:38<Terbium>wrote a discord archiver recently to archive discord servers into a database, hopefully their API doesn't change too much in the coming months
04:20:13BlueMaxima quits [Read error: Connection reset by peer]
04:39:07<fireonlive>i was using DiscordChatExporter but sadly it doesn't quite support those 'new fangled' 'forum' channels (and threads in normal channels) yet
04:53:21Island quits [Read error: Connection reset by peer]
04:58:47ScenarioPlanet (ScenarioPlanet) joins
05:21:22<Pedrosso>a discord archiver you say?
05:23:20<fireonlive>OwO
05:32:19<Terbium>Yeah, DiscordChatExporter didn't really suit my needs (large scale distributed crawling with database store)
05:32:29<Terbium>Decided to write up a basic crawler
05:32:53<Terbium>Currently doesn't grab attachments, but planned as a feature soon
05:33:05<fireonlive>:)
05:35:14<fireonlive>sounds like you do some fun stuff
05:35:18<fireonlive>:p
05:39:25<Terbium>More like preparing for Discord's inevitable demise :P
05:40:37<fireonlive>true true xP
05:44:14<Pedrosso>How is the crawler you wrote different than DiscordChatExporter?
05:46:03<fireonlive>at the very least i would assume discord hates it more
05:46:05<fireonlive>:p
05:47:43<Terbium>Python based (no .NET thankfully), dumps everything to database as fast as possible, distributed (can use multiple instances to allocate servers/channels to crawl with different accounts/IPs)
05:48:02<Terbium>No attachment downloading right now (can backfill later)
05:48:13<fireonlive>(no .NET thank you so much)
05:48:23<Pedrosso>what's so bad about .NET ?
05:48:28<fireonlive>it's not python ™
05:48:32<Pedrosso>True
05:48:40<Terbium>I saw .NET, I gagged so hard I ended up writing my own crawler
05:48:44<fireonlive>:D
05:48:55<fireonlive>any issues foreseen with discord requiring 'the parameters' soon?
05:49:04<fireonlive>for earlier grabs/crawls that might not have them
05:49:29<Terbium>It's simple enough to rewrite in go or rust, but I don't really care as it's not performance intensive (all IO bound)
05:49:30<fireonlive>(and i guess they'll expire at some point on the attachment urls?)
05:50:10<fireonlive>if we really wanted developers here, we'd just need to make a few posts around the internet saying 'rust would never be able to be up to the task of ArchiveTeam's needs'
05:50:17<Terbium>I believe you can regenerate the links with refresh tokens
05:50:20<fireonlive>and the RETF would descend hell on here
05:50:31<fireonlive>(rust evangelism task force) to prove us wrong
05:51:00<Pedrosso>o.o
05:52:09<Terbium>DiscordChatExporter is great for personal exporting for the average user, just didn't suit my needs. It's not a bad app for the casual archiver
05:52:56<Pedrosso>I feel like this fits in #discard lol
05:53:30<Terbium>I know Sanqui and TheTechRobo were working on Discard for MITM based crawling. I think that stalled
05:54:04<Pedrosso>what does DiscordChatExporter do badly, other than not being able to handle the new features?
05:56:16<Terbium>Mostly scalability
05:57:20<Terbium>Not as easy to use 50 Discord accounts. Multiple accounts are needed due to server cap for each account (unless you leave/rejoin to swap in and out servers)
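The per-account server cap mentioned above can be illustrated with a minimal sketch. This is not Terbium's code; the function name `allocate_servers`, the account labels, and the default cap of 100 are all assumptions made for illustration. It spreads servers over accounts round-robin and refuses to proceed when the accounts cannot hold them all.

```python
from typing import Dict, List


def allocate_servers(
    server_ids: List[str],
    accounts: List[str],
    cap: int = 100,  # assumed per-account server limit, adjust as needed
) -> Dict[str, List[str]]:
    """Assign servers to accounts round-robin without exceeding `cap`."""
    if cap * len(accounts) < len(server_ids):
        raise ValueError("not enough accounts for this many servers")
    allocation: Dict[str, List[str]] = {account: [] for account in accounts}
    for index, server_id in enumerate(server_ids):
        # Round-robin keeps the load even, so no account passes the cap
        # once the total-capacity check above has succeeded.
        account = accounts[index % len(accounts)]
        allocation[account].append(server_id)
    return allocation
```

Calling `allocate_servers(ids, ["acct1", "acct2", "acct3"])` with 250 server IDs would give each account 83 or 84 servers; a leave/rejoin rotation, as mentioned in the chat, would be a different strategy not shown here.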
06:00:23<Pedrosso>I see I see. I think I'd leave this to the non-casual archivers. Hah. How've you been using it so far?
06:01:46<fireonlive>i think if I read it correctly as well Terbium's can do 'follow-up's very easily
06:01:54<fireonlive>i.e. get new messages since last visit
06:02:23<fireonlive>(or maybe has it built in already to continuously do so)
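A rough sketch of what "get new messages since last visit" could look like with a database store. It assumes message IDs grow over time (which holds for Discord's snowflake IDs) and uses a hypothetical SQLite schema; the function names and the `(message_id, content)` tuple shape are illustrative, not taken from the crawler discussed here.

```python
import sqlite3
from typing import Iterable, List, Tuple

Message = Tuple[int, str]  # (message_id, content), a hypothetical shape


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the message store and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "channel_id INTEGER, message_id INTEGER, content TEXT, "
        "PRIMARY KEY (channel_id, message_id))"
    )
    return conn


def last_seen(conn: sqlite3.Connection, channel_id: int) -> int:
    """Return the highest stored message ID for a channel, or 0 if none."""
    row = conn.execute(
        "SELECT MAX(message_id) FROM messages WHERE channel_id = ?",
        (channel_id,),
    ).fetchone()
    return row[0] if row[0] is not None else 0


def store_new(
    conn: sqlite3.Connection, channel_id: int, fetched: Iterable[Message]
) -> List[Message]:
    """Store only messages newer than the last visit and return them."""
    cutoff = last_seen(conn, channel_id)
    new = [m for m in fetched if m[0] > cutoff]
    conn.executemany(
        "INSERT OR IGNORE INTO messages VALUES (?, ?, ?)",
        [(channel_id, message_id, content) for message_id, content in new],
    )
    conn.commit()
    return new
```

In a real crawl the cutoff would more likely be passed to the API so that old messages are never fetched at all; the local filter here only keeps the sketch free of network calls.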
06:02:28<Terbium>Just started, so slowly scaling up (trying to find large lists of discord servers), and throwing 10 accounts and IPs at them
06:03:14<fireonlive>:)
06:04:23<Pedrosso>Does it go recursively too? As in if it finds an invite link does it try it and then go from there?
06:04:43<Pedrosso>hm, but I suppose that doesn't work for servers which you have to manually figure out stuff like roles for viewing channels..
06:05:38<Terbium>Nope, it's very dumb right now, just crawls the servers the account has access to
06:06:00<Terbium>Yeah, the roles stuff causes lots of problems for me
06:06:16<Terbium>Especially "Verify your phone number" and all that nonsense
06:06:41<fireonlive>ah yeah the phone number thing :/
06:07:55<Terbium>Would be nice if we had direct access to their ScyllaDB clusters
06:09:58<Pedrosso>I suppose a dumb bot will get lots of content still.
06:10:09<Pedrosso>Especially since we have very long lists of servers
06:13:45monoxane quits [Quit: estoy fuera]
06:19:45project10 quits [Client Quit]
06:20:01project10 (project10) joins
06:21:54monoxane (monoxane) joins
06:28:13qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
06:31:32project10 quits [Changing host]
06:31:32project10 (project10) joins
06:38:42<Pedrosso>Terbium: Please do keep us up to date with this
06:38:46project10 quits [Max SendQ exceeded]
06:39:05project10 (project10) joins
06:41:07<Terbium>*disappears from AT for another 12 months*
07:12:40IDK (IDK) joins
07:14:33<@JAA>Ok, I should have all OneHallyu topic pages now, I think.
07:15:56<fireonlive>:D
07:19:45<fireonlive>are attachments to be attempted?
07:20:09<@JAA>Do you have an example? I couldn't find any.
07:20:18<fireonlive>oh, i don't
07:20:25<fireonlive>oh! i meant media i guess
07:20:35<fireonlive>i think you said you skipped .. something
07:20:49<@JAA>The only things I saw hosted on OneHallyu itself were avatars. But maybe I just didn't look in the right place.
07:21:03<fireonlive>ah ok :)
07:21:09<fireonlive>faulty memories!
07:21:21<@JAA>I did this with qwarc. qwarc doesn't care about HTML. So no page requisite extraction or similar.
07:22:05<@JAA>qwarc fetches a URL you give it and writes it to WARC. Basically everything else is left as an exercise to the user.
07:23:08DogsRNice quits [Read error: Connection reset by peer]
07:26:22<fireonlive>:)
07:26:33<@JAA>Oh, two topics failed. One is a 'count to a million' forum game, the other just a random small discussion.
07:31:08<@JAA>The former doesn't even have 5k pages, but it's extremely slow.
07:31:27<@arkiver>some inefficient pagination i guess
07:31:52<@JAA>No, there are far larger topics that are faster.
07:32:00<@JAA>Largest I saw had 18k pages.
07:32:07<fireonlive>damn
07:32:27<@JAA>(I didn't systematically check though, so maybe that isn't even the largest one that exists.)
07:33:00<@JAA>Anyway, it's getting grabbed now, whether the server likes it or not. :-)
07:33:10<fireonlive>👀
07:36:13DopefishJustin quits [Ping timeout: 272 seconds]
07:36:46<@JAA>Ah, now the response time is actually decent.
07:36:54<@JAA>https://transfer.archivete.am/inline/spNkP/explanation.png
07:38:13<fireonlive>:D
07:48:27DopefishJustin joins
08:15:36<@JAA>That topic is done as well now, and that should be everything that's accessible. (I saw a small number of 403s.)
08:16:00<@JAA>src extraction is running but will take a little while.
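For readers wondering what a src extraction pass might involve, here is a minimal sketch using only the Python standard library. It is not the tooling used here: it works on an HTML string rather than on WARC records, and the names `SrcExtractor` and `extract_src` are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class SrcExtractor(HTMLParser):
    """Collect the value of every src attribute, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        for name, value in attrs:
            # Skip tags without src and src attributes with no value.
            if name == "src" and value:
                self.urls.append(urljoin(self.base_url, value))


def extract_src(html: str, base_url: str) -> List[str]:
    """Return all src URLs found in `html`, in document order."""
    parser = SrcExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

The resulting list could then be deduplicated and fed to whatever job fetches the page requisites.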
08:18:36<@arkiver>outlinks going to #// ?
08:19:24<@JAA>Possibly later. Just focusing on onsite stuff now since that'll vanish very soon.
08:19:28<@arkiver>got it
08:19:29<@arkiver>sounds good
08:33:47<SketchCow>Merry Christmas, maniacs
08:34:56<fireonlive>thanks sketchy
08:46:51Ruthalas59 quits [Quit: END OF LINE]
08:56:55Ruthalas59 (Ruthalas) joins
09:14:11hitgrr8 joins
09:57:20qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
09:58:45qwertyasdfuiopghjkl quits [Max SendQ exceeded]
09:58:57qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
10:00:05Bleo18260 quits [Client Quit]
10:01:29Bleo18260 joins
10:15:56mv joins
10:16:42mv quits [Remote host closed the connection]
10:21:49IDK quits [Client Quit]
10:33:14<pabs>https://publicwww.com/ — a search engine for stuff in websites' source code.
10:36:45<fireonlive>ooh neat
10:40:19<pabs>paid beyond alexa rank 1mil, higher costs for further down the ranking
10:40:31<pabs>https://publicwww.com/prices.html
10:40:40<pabs>ouch, $499/month for all URLs
10:41:11pabs wonders how that compares to shodan
10:41:33<pabs>er 3mil not 1mil
10:42:09<pabs>and $49/month gets you all URLs, but only 100 searches/day up to 100K rows
10:42:31<pabs>hmm, I think the 1mil was without an account
11:01:59T31M quits [Quit: ZNC - https://znc.in]
11:26:07<qwertyasdfuiopghjkl>JAA: for OneHallyu, did you save the user profiles? (You can do https://onehallyu.com/profile/1--/ https://onehallyu.com/profile/2--/ https://onehallyu.com/profile/3--/ etc and get redirected to the correct name. Looks like there's also different tabs on each profile page that need to be requested separately.)
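The ID-based profile URLs described above could be enumerated with a sketch like the following. The URL pattern comes from the message; the function name and the ID range are assumptions, since the highest profile ID is not stated. The per-profile tabs are not covered because their URL format is not given.

```python
from typing import Iterator


def profile_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield profile URLs for every user ID from first_id to last_id inclusive."""
    for user_id in range(first_id, last_id + 1):
        # The site is reported to redirect these to the URL with the real name.
        yield f"https://onehallyu.com/profile/{user_id}--/"
```

A crawler would need to follow the redirect for each URL and record both the request and the final response.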
12:17:11BornOn420 quits [Remote host closed the connection]
12:18:15BornOn420 (BornOn420) joins
12:18:43BornOn420 quits [Remote host closed the connection]
12:35:05<@JAA>qwertyasdfuiopghjkl: No, only topics.
12:39:07ScenarioPlanet quits [Changing host]
12:39:07ScenarioPlanet (ScenarioPlanet) joins
13:22:39Basis joins
13:23:55systwi quits [Ping timeout: 272 seconds]
13:26:50Arcorann quits [Ping timeout: 240 seconds]
13:31:47jacksonchen666 (jacksonchen666) joins
13:38:41systwi (systwi) joins
14:47:24RealPerson joins
14:51:58<@JAA>I started an AB job for the OneHallyu src values I managed to extract, but it looks like the site is dying now and returning HTTP 522 (Buttflare's code for connection timeout to the upstream server) for a lot of things.
14:52:31<@JAA>So maybe they took the server online and only what remains in Buttflare's cache is still around.
14:52:35<@JAA>offline*
15:02:17T31M joins
15:02:57RealPerson leaves
15:07:56<that_lurker>Could someone grab the upcoming Finnish presidential election candidates' websites? https://lounge.kuhaon.fun/folder/65908e5765e73d9f/FinnishPresidentialElectionCandidates.txt
15:10:55<that_lurker>More info https://en.wikipedia.org/wiki/2024_Finnish_presidential_election
15:23:03RealPerson joins
15:30:17qwertyasdfuiopghjkl quits [Client Quit]
15:32:48qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:34:13qwertyasdfuiopghjkl quits [Max SendQ exceeded]
15:34:30qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:48:58RealPerson leaves
15:54:44<@JAA>that_lurker: #vooterbooter
15:55:05<@JAA>I'll run them later if nobody beats me to it.
15:55:22<that_lurker>ooh did not know there was a channel for this
15:58:47RealPerson joins
16:09:16RealPerson leaves
16:10:22RealPerson joins
16:13:51<that_lurker>also thanks :-)
16:31:43RealPerson leaves
16:32:01systwi quits [Ping timeout: 272 seconds]
16:59:53Scen joins
17:03:41ScenarioPlanet quits [Ping timeout: 272 seconds]
17:04:03systwi (systwi) joins
17:18:08Scen quits [Client Quit]
17:18:44ScenarioPlanet (ScenarioPlanet) joins
17:20:01ScenarioPlanet is now known as Scen
17:20:34Scen is now known as ScenarioPlanet
18:20:50aninternettroll quits [Read error: Connection reset by peer]
18:20:55aninternettroll (aninternettroll) joins
18:29:49jacksonchen666 quits [Ping timeout: 250 seconds]
18:32:37jacksonchen666 (jacksonchen666) joins
19:23:50kiryu quits [Ping timeout: 240 seconds]
20:06:57ScenarioPlanet quits [Client Quit]
20:07:47ScenarioPlanet (ScenarioPlanet) joins
20:15:51Island joins
20:30:55ScenarioPlanet quits [Client Quit]
20:31:10ScenarioPlanet (ScenarioPlanet) joins
20:51:00qwertyasdfuiopghjkl quits [Remote host closed the connection]
21:03:43Ruthalas59 quits [Ping timeout: 272 seconds]
21:04:40magmaus3 (magmaus3) joins
21:09:01DopefishJustin quits [Read error: Connection reset by peer]
21:15:15BlueMaxima joins
21:16:20magmaus3 quits [Ping timeout: 240 seconds]
21:18:47ScenarioPlanet quits [Read error: Connection reset by peer]
21:22:04magmaus3 (magmaus3) joins
21:26:20magmaus3 quits [Ping timeout: 240 seconds]
21:29:58Ruthalas59 (Ruthalas) joins
21:31:19hitgrr8 quits [Client Quit]
21:34:45Ruthalas59 quits [Ping timeout: 272 seconds]
21:43:51magmaus3 (magmaus3) joins
21:59:52RealPerson joins
22:08:25aninternettroll quits [Read error: Connection reset by peer]
22:08:36aninternettroll (aninternettroll) joins
22:16:58RealPerson leaves
22:17:50ThreeHM quits [Ping timeout: 240 seconds]
23:34:24Arcorann (Arcorann) joins
23:53:43DogsRNice joins
23:54:57DopefishJustin joins
23:59:54Xesxen quits [Max SendQ exceeded]
23:59:59Xesxen (Xesxen) joins