| 00:29:50 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 00:36:55 | | qwertyasdfuiopghjkl quits [Max SendQ exceeded] |
| 00:37:07 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 00:40:45 | | systwi quits [Ping timeout: 272 seconds] |
| 00:46:49 | <fireonlive> | -+rss- Nintendo Network shutdown – The beginning of the end: https://pretendo.network/blog/12-23-23 https://news.ycombinator.com/item?id=38766570 |
| 00:46:57 | <fireonlive> | we've maybe seen the nintendo stuff earlier yeah? |
| 00:52:03 | <Pedrosso> | So... What services does Nintendo hold with that network? |
| 00:54:35 | | systwi (systwi) joins |
| 00:58:44 | <fireonlive> | friend was complaining that all the player-made/shared levels for the first game would be unavailable |
| 00:58:59 | <fireonlive> | unsure if there's a way to do anything with that sadly |
| 01:01:50 | | ScenarioPlanet quits [Ping timeout: 240 seconds] |
| 01:03:32 | <Pedrosso> | Was a grab of Super Mario Maker games ever done? What other sorts of things could be of interest for archival? |
| 01:13:08 | <Pedrosso> | courses* |
| 01:31:48 | | RealPerson leaves |
| 01:53:16 | | ScenarioPlanet (ScenarioPlanet) joins |
| 01:58:01 | | ScenarioPlanet quits [Ping timeout: 272 seconds] |
| 02:06:40 | | DLoader quits [Changing host] |
| 02:06:40 | | DLoader joins |
| 02:17:09 | | DogsRNice joins |
| 02:27:48 | | RealPerson joins |
| 02:44:26 | | RealPerson leaves |
| 02:46:16 | | RealPerson joins |
| 03:00:14 | | HP_Archivist quits [Client Quit] |
| 03:04:05 | | RealPerson leaves |
| 03:38:23 | <@JAA> | Assuming OneHallyu stays up, the topic retries should be done in a bit over 2 hours. I'll then run another similar thing for the remaining topics that are being done sequentially since that's so slow. Also one more topic failed with timeouts. |
| 03:38:37 | <@JAA> | Some of these topics have well over 10k pages, pretty insane. |
| 03:40:08 | <Terbium> | another forum bites the dust, forums are disappearing rapidly :( |
| 03:45:46 | <fireonlive> | everyone loves fucking discord these days |
| 03:46:32 | <Terbium> | sadly the case, forums are moving to the free, easy-to-use walled garden known as discord :/ |
| 03:47:38 | <Terbium> | wrote a discord archiver recently to archive discord servers into a database, hopefully their API doesn't change too much in the coming months |
| 04:20:13 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 04:39:07 | <fireonlive> | i was using DiscordChatExporter but sadly it doesn't quite support those 'new fangled' 'forum' channels (and threads in normal channels) yet |
| 04:53:21 | | Island quits [Read error: Connection reset by peer] |
| 04:58:47 | | ScenarioPlanet (ScenarioPlanet) joins |
| 05:21:22 | <Pedrosso> | a discord archiver you say? |
| 05:23:20 | <fireonlive> | OwO |
| 05:32:19 | <Terbium> | Yeah, DiscordChatExporter didn't really suit my needs (large scale distributed crawling with database store) |
| 05:32:29 | <Terbium> | Decided to write up a basic crawler |
| 05:32:53 | <Terbium> | Currently doesn't grab attachments, but planned as a feature soon |
| 05:33:05 | <fireonlive> | :) |
| 05:35:14 | <fireonlive> | sounds like you do some fun stuff |
| 05:35:18 | <fireonlive> | :p |
| 05:39:25 | <Terbium> | More like preparing for Discord's inevitable demise :P |
| 05:40:37 | <fireonlive> | true true xP |
| 05:44:14 | <Pedrosso> | How is the crawler you wrote different than DiscordChatExporter? |
| 05:46:03 | <fireonlive> | at the very least i would assume discord hates it more |
| 05:46:05 | <fireonlive> | :p |
| 05:47:43 | <Terbium> | Python based (no .NET thankfully), dumps everything to database as fast as possible, distributed (can use multiple instances to allocate servers/channels to crawl with different accounts/IPs) |
| 05:48:02 | <Terbium> | No attachment downloading right now (can backfill later) |
| 05:48:13 | <fireonlive> | (no .NET thank you so much) |
| 05:48:23 | <Pedrosso> | what's so bad about .NET ? |
| 05:48:28 | <fireonlive> | it's not python ™ |
| 05:48:32 | <Pedrosso> | True |
| 05:48:40 | <Terbium> | I saw .NET, I gagged so hard I ended up writing my own crawler |
| 05:48:44 | <fireonlive> | :D |
| 05:48:55 | <fireonlive> | any issues foreseen with discord requiring 'the parameters' soon? |
| 05:49:04 | <fireonlive> | for earlier grabs/crawls that might not have them |
| 05:49:29 | <Terbium> | It's simple enough to rewrite in go or rust, but I don't really care as it's not performance intensive (all IO bound) |
| 05:49:30 | <fireonlive> | (and i guess they'll expire at some point on the attachment urls?) |
| 05:50:10 | <fireonlive> | if we really wanted developers here, we'd just need to make a few posts around the internet saying 'rust would never be able to be up to the task of ArchiveTeam's needs' |
| 05:50:17 | <Terbium> | I believe you can regenerate the links with refresh tokens |
| 05:50:20 | <fireonlive> | and the RETF would descend hell on here |
| 05:50:31 | <fireonlive> | (rust evangelism task force) to prove us wrong |
| 05:51:00 | <Pedrosso> | o.o |
| 05:52:09 | <Terbium> | DiscordChatExporter is great for personal exporting for the average user, just didn't suit my needs. It's not a bad app for the casual archiver |
| 05:52:56 | <Pedrosso> | I feel like this fits in #discard lol |
| 05:53:30 | <Terbium> | I know Sanqui and TheTechRobo were working on Discard for MITM-based crawling. I think that stalled |
| 05:54:04 | <Pedrosso> | what does DiscordChatExporter do badly, other than not being able to handle the new features? |
| 05:56:16 | <Terbium> | Mostly scalability |
| 05:57:20 | <Terbium> | Not as easy to use 50 Discord accounts. Multiple accounts are needed due to server cap for each account (unless you leave/rejoin to swap in and out servers) |
| 06:00:23 | <Pedrosso> | I see I see. I think I'd leave this to the non-casual archivers. Hah. How've you been using it so far? |
| 06:01:46 | <fireonlive> | i think if I read it correctly as well Terbium's can do 'follow-up's very easily |
| 06:01:54 | <fireonlive> | i.e. get new messages since last visit |
| 06:02:23 | <fireonlive> | (or maybe has it built in already to continuously do so) |
| 06:02:28 | <Terbium> | Just started, so slowly scaling up (trying to find large lists of discord servers), and throw 10 accounts and IPs at them |
| 06:03:14 | <fireonlive> | :) |
| 06:04:23 | <Pedrosso> | Does it go recursively too? As in if it finds an invite link does it try it and then go from there? |
| 06:04:43 | <Pedrosso> | hm, but I suppose that doesn't work for servers which you have to manually figure out stuff like roles for viewing channels.. |
| 06:05:38 | <Terbium> | Nope, it's very dumb right now, just crawls the servers the account has access to |
| 06:06:00 | <Terbium> | Yeah, the roles stuff causes lots of problems for me |
| 06:06:16 | <Terbium> | Especially "Verify your phone number" and all that nonsense |
| 06:06:41 | <fireonlive> | ah yeah the phone number thing :/ |
| 06:07:55 | <Terbium> | Would be nice if we had direct access to their ScyllaDB clusters |
| 06:09:58 | <Pedrosso> | I suppose a dumb bot will get lots of content still. |
| 06:10:09 | <Pedrosso> | Especially since we have very long lists of servers |
| 06:13:45 | | monoxane quits [Quit: estoy fuera] |
| 06:19:45 | | project10 quits [Client Quit] |
| 06:20:01 | | project10 (project10) joins |
| 06:21:54 | | monoxane (monoxane) joins |
| 06:28:13 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 06:31:32 | | project10 quits [Changing host] |
| 06:31:32 | | project10 (project10) joins |
| 06:38:42 | <Pedrosso> | Terbium: Please do keep us up to date with this |
| 06:38:46 | | project10 quits [Max SendQ exceeded] |
| 06:39:05 | | project10 (project10) joins |
| 06:41:07 | <Terbium> | *disappears from AT for another 12 months* |
| 07:12:40 | | IDK (IDK) joins |
| 07:14:33 | <@JAA> | Ok, I should have all OneHallyu topic pages now, I think. |
| 07:15:56 | <fireonlive> | :D |
| 07:19:45 | <fireonlive> | are attachments to be attempted? |
| 07:20:09 | <@JAA> | Do you have an example? I couldn't find any. |
| 07:20:18 | <fireonlive> | oh, i don't |
| 07:20:25 | <fireonlive> | oh! i meant media i guess |
| 07:20:35 | <fireonlive> | i think you said you skipped .. something |
| 07:20:49 | <@JAA> | The only things I saw hosted on OneHallyu itself were avatars. But maybe I just didn't look in the right place. |
| 07:21:03 | <fireonlive> | ah ok :) |
| 07:21:09 | <fireonlive> | faulty memories! |
| 07:21:21 | <@JAA> | I did this with qwarc. qwarc doesn't care about HTML. So no page requisite extraction or similar. |
| 07:22:05 | <@JAA> | qwarc fetches a URL you give it and writes it to WARC. Basically everything else is left as an exercise to the user. |
| 07:23:08 | | DogsRNice quits [Read error: Connection reset by peer] |
| 07:26:22 | <fireonlive> | :) |
| 07:26:33 | <@JAA> | Oh, two topics failed. One is a 'count to a million' forum game, the other just a random small discussion. |
| 07:31:08 | <@JAA> | The former doesn't even have 5k pages, but it's extremely slow. |
| 07:31:27 | <@arkiver> | some inefficient pagination i guess |
| 07:31:52 | <@JAA> | No, there are far larger topics that are faster. |
| 07:32:00 | <@JAA> | Largest I saw had 18k pages. |
| 07:32:07 | <fireonlive> | damn |
| 07:32:27 | <@JAA> | (I didn't systematically check though, so maybe that isn't even the largest one that exists.) |
| 07:33:00 | <@JAA> | Anyway, it's getting grabbed now, whether the server likes it or not. :-) |
| 07:33:10 | <fireonlive> | 👀 |
| 07:36:13 | | DopefishJustin quits [Ping timeout: 272 seconds] |
| 07:36:46 | <@JAA> | Ah, now the response time is actually decent. |
| 07:36:54 | <@JAA> | https://transfer.archivete.am/inline/spNkP/explanation.png |
| 07:38:13 | <fireonlive> | :D |
| 07:48:27 | | DopefishJustin joins |
| 07:48:27 | | DopefishJustin is now authenticated as DopefishJustin |
| 08:15:36 | <@JAA> | That topic is done as well now, and that should be everything that's accessible. (I saw a small number of 403s.) |
| 08:16:00 | <@JAA> | src extraction is running but will take a little while. |
| 08:18:36 | <@arkiver> | outlinks going to #// ? |
| 08:19:24 | <@JAA> | Possibly later. Just focusing on onsite stuff now since that'll vanish very soon. |
| 08:19:28 | <@arkiver> | got it |
| 08:19:29 | <@arkiver> | sounds good |
| 08:33:47 | <SketchCow> | Merry Christmas, maniacs |
| 08:34:56 | <fireonlive> | thanks sketchy |
| 08:46:51 | | Ruthalas59 quits [Quit: END OF LINE] |
| 08:56:55 | | Ruthalas59 (Ruthalas) joins |
| 09:14:11 | | hitgrr8 joins |
| 09:57:20 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 09:58:45 | | qwertyasdfuiopghjkl quits [Max SendQ exceeded] |
| 09:58:57 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 10:00:05 | | Bleo18260 quits [Client Quit] |
| 10:01:29 | | Bleo18260 joins |
| 10:15:56 | | mv joins |
| 10:16:42 | | mv quits [Remote host closed the connection] |
| 10:21:49 | | IDK quits [Client Quit] |
| 10:33:14 | <pabs> | https://publicwww.com/ — a search engine for stuff in websites' source code. |
| 10:36:45 | <fireonlive> | ooh neat |
| 10:40:19 | <pabs> | paid beyond alexa rank 1mil, higher costs for further down the ranking |
| 10:40:31 | <pabs> | https://publicwww.com/prices.html |
| 10:40:40 | <pabs> | ouch, $499/month for all URLs |
| 10:41:11 | | pabs wonders how that compares to shodan |
| 10:41:33 | <pabs> | er 3mil not 1mil |
| 10:42:09 | <pabs> | and $49/month gets you all URLs, but only 100 searches/day up to 100K rows |
| 10:42:31 | <pabs> | hmm, I think the 1mil was without an account |
| 11:01:59 | | T31M quits [Quit: ZNC - https://znc.in] |
| 11:26:07 | <qwertyasdfuiopghjkl> | JAA: for OneHallyu, did you save the user profiles? (You can do https://onehallyu.com/profile/1--/ https://onehallyu.com/profile/2--/ https://onehallyu.com/profile/3--/ etc and get redirected to the correct name. Looks like there's also different tabs on each profile page that need to be requested separately.) |
| 12:17:11 | | BornOn420 quits [Remote host closed the connection] |
| 12:18:15 | | BornOn420 (BornOn420) joins |
| 12:18:43 | | BornOn420 quits [Remote host closed the connection] |
| 12:35:05 | <@JAA> | qwertyasdfuiopghjkl: No, only topics. |
| 12:39:07 | | ScenarioPlanet quits [Changing host] |
| 12:39:07 | | ScenarioPlanet (ScenarioPlanet) joins |
| 13:22:39 | | Basis joins |
| 13:23:55 | | systwi quits [Ping timeout: 272 seconds] |
| 13:26:50 | | Arcorann quits [Ping timeout: 240 seconds] |
| 13:31:47 | | jacksonchen666 (jacksonchen666) joins |
| 13:38:41 | | systwi (systwi) joins |
| 14:47:24 | | RealPerson joins |
| 14:51:58 | <@JAA> | I started an AB job for the OneHallyu src values I managed to extract, but it looks like the site is dying now and returning HTTP 522 (Buttflare's code for connection timeout to the upstream server) for a lot of things. |
| 14:52:31 | <@JAA> | So maybe they took the server online and only what remains in Buttflare's cache is still around. |
| 14:52:35 | <@JAA> | offline* |
| 15:02:17 | | T31M joins |
| 15:02:57 | | RealPerson leaves |
| 15:03:28 | | T31M is now authenticated as T31M |
| 15:07:56 | <that_lurker> | Could someone grab the upcoming Finnish presidential election candidates' websites? https://lounge.kuhaon.fun/folder/65908e5765e73d9f/FinnishPresidentialElectionCandidates.txt |
| 15:10:55 | <that_lurker> | More info https://en.wikipedia.org/wiki/2024_Finnish_presidential_election |
| 15:23:03 | | RealPerson joins |
| 15:30:17 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 15:32:48 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 15:34:13 | | qwertyasdfuiopghjkl quits [Max SendQ exceeded] |
| 15:34:30 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 15:48:58 | | RealPerson leaves |
| 15:54:44 | <@JAA> | that_lurker: #vooterbooter |
| 15:55:05 | <@JAA> | I'll run them later if nobody beats me to it. |
| 15:55:22 | <that_lurker> | ooh did not know there was a channel for this |
| 15:58:47 | | RealPerson joins |
| 16:09:16 | | RealPerson leaves |
| 16:10:22 | | RealPerson joins |
| 16:13:51 | <that_lurker> | also thanks :-) |
| 16:31:43 | | RealPerson leaves |
| 16:32:01 | | systwi quits [Ping timeout: 272 seconds] |
| 16:59:53 | | Scen joins |
| 17:03:41 | | ScenarioPlanet quits [Ping timeout: 272 seconds] |
| 17:04:03 | | systwi (systwi) joins |
| 17:18:08 | | Scen quits [Client Quit] |
| 17:18:44 | | ScenarioPlanet (ScenarioPlanet) joins |
| 17:20:01 | | ScenarioPlanet is now known as Scen |
| 17:20:17 | | Scen is now authenticated as * |
| 17:20:22 | | Scen is now authenticated as Scen |
| 17:20:34 | | Scen is now known as ScenarioPlanet |
| 17:20:44 | | ScenarioPlanet is now authenticated as * |
| 17:20:44 | | ScenarioPlanet is now authenticated as ScenarioPlanet |
| 18:20:50 | | aninternettroll quits [Read error: Connection reset by peer] |
| 18:20:55 | | aninternettroll (aninternettroll) joins |
| 18:29:49 | | jacksonchen666 quits [Ping timeout: 250 seconds] |
| 18:32:37 | | jacksonchen666 (jacksonchen666) joins |
| 19:23:50 | | kiryu quits [Ping timeout: 240 seconds] |
| 20:06:57 | | ScenarioPlanet quits [Client Quit] |
| 20:07:47 | | ScenarioPlanet (ScenarioPlanet) joins |
| 20:15:51 | | Island joins |
| 20:30:55 | | ScenarioPlanet quits [Client Quit] |
| 20:31:10 | | ScenarioPlanet (ScenarioPlanet) joins |
| 20:51:00 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 21:03:43 | | Ruthalas59 quits [Ping timeout: 272 seconds] |
| 21:04:40 | | magmaus3 (magmaus3) joins |
| 21:09:01 | | DopefishJustin quits [Read error: Connection reset by peer] |
| 21:15:15 | | BlueMaxima joins |
| 21:16:20 | | magmaus3 quits [Ping timeout: 240 seconds] |
| 21:18:47 | | ScenarioPlanet quits [Read error: Connection reset by peer] |
| 21:22:04 | | magmaus3 (magmaus3) joins |
| 21:26:20 | | magmaus3 quits [Ping timeout: 240 seconds] |
| 21:29:58 | | Ruthalas59 (Ruthalas) joins |
| 21:31:19 | | hitgrr8 quits [Client Quit] |
| 21:34:45 | | Ruthalas59 quits [Ping timeout: 272 seconds] |
| 21:43:51 | | magmaus3 (magmaus3) joins |
| 21:59:52 | | RealPerson joins |
| 22:08:25 | | aninternettroll quits [Read error: Connection reset by peer] |
| 22:08:36 | | aninternettroll (aninternettroll) joins |
| 22:16:58 | | RealPerson leaves |
| 22:17:50 | | ThreeHM quits [Ping timeout: 240 seconds] |
| 23:34:24 | | Arcorann (Arcorann) joins |
| 23:53:43 | | DogsRNice joins |
| 23:54:57 | | DopefishJustin joins |
| 23:54:57 | | DopefishJustin is now authenticated as DopefishJustin |
| 23:56:32 | | DLoader is now authenticated as DLoader |
| 23:59:54 | | Xesxen quits [Max SendQ exceeded] |
| 23:59:59 | | Xesxen (Xesxen) joins |