00:29:50 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
00:36:55 | | qwertyasdfuiopghjkl quits [Max SendQ exceeded] |
00:37:07 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
00:40:45 | | systwi quits [Ping timeout: 272 seconds] |
00:46:49 | <fireonlive> | -+rss- Nintendo Network shutdown – The beginning of the end: https://pretendo.network/blog/12-23-23 https://news.ycombinator.com/item?id=38766570 |
00:46:57 | <fireonlive> | we've maybe seen the nintendo stuff earlier yeah? |
00:52:03 | <Pedrosso> | So... What services does Nintendo hold with that network? |
00:54:35 | | systwi (systwi) joins |
00:58:44 | <fireonlive> | friend was complaining that all the player-made/shared levels for the first game would be unavailable |
00:58:59 | <fireonlive> | unsure if there's a way to do anything with that sadly |
01:01:50 | | ScenarioPlanet quits [Ping timeout: 240 seconds] |
01:03:32 | <Pedrosso> | Was a grab of Super Mario Maker games ever done? What other sorts of things could be of interest for archival? |
01:13:08 | <Pedrosso> | courses* |
01:31:48 | | RealPerson leaves |
01:53:16 | | ScenarioPlanet (ScenarioPlanet) joins |
01:58:01 | | ScenarioPlanet quits [Ping timeout: 272 seconds] |
02:06:40 | | DLoader quits [Changing host] |
02:06:40 | | DLoader joins |
02:17:09 | | DogsRNice joins |
02:27:48 | | RealPerson joins |
02:44:26 | | RealPerson leaves |
02:46:16 | | RealPerson joins |
03:00:14 | | HP_Archivist quits [Client Quit] |
03:04:05 | | RealPerson leaves |
03:38:23 | <@JAA> | Assuming OneHallyu stays up, the topic retries should be done in a bit over 2 hours. I'll then run another similar thing for the remaining topics that are being done sequentially since that's so slow. Also one more topic failed with timeouts. |
03:38:37 | <@JAA> | Some of these topics have well over 10k pages, pretty insane. |
03:40:08 | <Terbium> | another forum bites the dust, forums are disappearing rapidly :( |
03:45:46 | <fireonlive> | everyone loves fucking discord these days |
03:46:32 | <Terbium> | sadly the case, forums are moving to the free, easy-to-use walled garden known as discord :/ |
03:47:38 | <Terbium> | wrote a discord archiver recently to archive discord servers into a database, hopefully their API doesn't change too much in the coming months |
04:20:13 | | BlueMaxima quits [Read error: Connection reset by peer] |
04:39:07 | <fireonlive> | i was using DiscordChatExporter but sadly it doesn't quite support those 'new fangled' 'forum' channels (and threads in normal channels) yet |
04:53:21 | | Island quits [Read error: Connection reset by peer] |
04:58:47 | | ScenarioPlanet (ScenarioPlanet) joins |
05:21:22 | <Pedrosso> | a discord archiver you say? |
05:23:20 | <fireonlive> | OwO |
05:32:19 | <Terbium> | Yeah, DiscordChatExporter didn't really suit my needs (large scale distributed crawling with database store) |
05:32:29 | <Terbium> | Decided to write up a basic crawler |
05:32:53 | <Terbium> | Currently doesn't grab attachments, but planned as a feature soon |
05:33:05 | <fireonlive> | :) |
05:35:14 | <fireonlive> | sounds like you do some fun stuff |
05:35:18 | <fireonlive> | :p |
05:39:25 | <Terbium> | More like preparing for Discord's inevitable demise :P |
05:40:37 | <fireonlive> | true true xP |
05:44:14 | <Pedrosso> | How is the crawler you wrote different than DiscordChatExporter? |
05:46:03 | <fireonlive> | at the very least i would assume discord hates it more |
05:46:05 | <fireonlive> | :p |
05:47:43 | <Terbium> | Python based (no .NET thankfully), dumps everything to database as fast as possible, distributed (can use multiple instances to allocate servers/channels to crawl with different accounts/IPs) |
05:48:02 | <Terbium> | No attachment downloading right now (can backfill later) |
05:48:13 | <fireonlive> | (no .NET thank you so much) |
05:48:23 | <Pedrosso> | what's so bad about .NET ? |
05:48:28 | <fireonlive> | it's not python ™ |
05:48:32 | <Pedrosso> | True |
05:48:40 | <Terbium> | I saw .NET, I gagged so hard I ended up writing my own crawler |
05:48:44 | <fireonlive> | :D |
05:48:55 | <fireonlive> | any issues foreseen with discord requiring 'the parameters' soon? |
05:49:04 | <fireonlive> | for earlier grabs/crawls that might not have them |
05:49:29 | <Terbium> | It's simple enough to rewrite in go or rust, but I don't really care as it's not performance intensive (all IO bound) |
05:49:30 | <fireonlive> | (and i guess they'll expire at some point on the attachment urls?) |
05:50:10 | <fireonlive> | if we really wanted developers here, we'd just need to make a few posts around the internet saying 'rust would never be able to be up to the task of ArchiveTeam's needs' |
05:50:17 | <Terbium> | I believe you can regenerate the links with refresh tokens |
05:50:20 | <fireonlive> | and the RETF would descend hell on here |
05:50:31 | <fireonlive> | (rust evangelism task force) to prove us wrong |
05:51:00 | <Pedrosso> | o.o |
05:52:09 | <Terbium> | DiscordChatExporter is great for personal exporting for the average user, just didn't suit my needs. It's not a bad app for the casual archiver |
05:52:56 | <Pedrosso> | I feel like this fits in #discard lol |
05:53:30 | <Terbium> | I know Sanqui and TheTechRobo were working on Discard for MITM based crawling. I think that stalled |
05:54:04 | <Pedrosso> | what does DiscordChatExporter do badly, other than not being able to handle the new features? |
05:56:16 | <Terbium> | Mostly scalability |
05:57:20 | <Terbium> | Not as easy to use 50 Discord accounts. Multiple accounts are needed due to server cap for each account (unless you leave/rejoin to swap in and out servers) |
06:00:23 | <Pedrosso> | I see I see. I think I'd leave this to the non-casual archivers. Hah. How've you been using it so far? |
06:01:46 | <fireonlive> | i think if I read it correctly as well Terbium's can do 'follow-up's very easily |
06:01:54 | <fireonlive> | i.e. get new messages since last visit |
06:02:23 | <fireonlive> | (or maybe has it built in already to continuously do so) |
06:02:28 | <Terbium> | Just started, so slowly scaling up (trying to find large lists of discord servers), and throwing 10 accounts and IPs at them |
06:03:14 | <fireonlive> | :) |
06:04:23 | <Pedrosso> | Does it go recursively too? As in if it finds an invite link does it try it and then go from there? |
06:04:43 | <Pedrosso> | hm, but I suppose that doesn't work for servers which you have to manually figure out stuff like roles for viewing channels.. |
06:05:38 | <Terbium> | Nope, it's very dumb right now, just crawls the servers the account has access to |
06:06:00 | <Terbium> | Yeah, the roles stuff causes lots of problems for me |
06:06:16 | <Terbium> | Especially "Verify your phone number" and all that nonsense |
06:06:41 | <fireonlive> | ah yeah the phone number thing :/ |
06:07:55 | <Terbium> | Would be nice if we had direct access to their ScyllaDB clusters |
06:09:58 | <Pedrosso> | I suppose a dumb bot will get lots of content still. |
06:10:09 | <Pedrosso> | Especially since we have very long lists of servers |
06:13:45 | | monoxane quits [Quit: estoy fuera] |
06:19:45 | | project10 quits [Client Quit] |
06:20:01 | | project10 (project10) joins |
06:21:54 | | monoxane (monoxane) joins |
06:28:13 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
06:31:32 | | project10 quits [Changing host] |
06:31:32 | | project10 (project10) joins |
06:38:42 | <Pedrosso> | Terbium: Please do keep us up to date with this |
06:38:46 | | project10 quits [Max SendQ exceeded] |
06:39:05 | | project10 (project10) joins |
06:41:07 | <Terbium> | *disappears from AT for another 12 months* |
07:12:40 | | IDK (IDK) joins |
07:14:33 | <@JAA> | Ok, I should have all OneHallyu topic pages now, I think. |
07:15:56 | <fireonlive> | :D |
07:19:45 | <fireonlive> | are attachments to be attempted? |
07:20:09 | <@JAA> | Do you have an example? I couldn't find any. |
07:20:18 | <fireonlive> | oh, i don't |
07:20:25 | <fireonlive> | oh! i meant media i guess |
07:20:35 | <fireonlive> | i think you said you skipped .. something |
07:20:49 | <@JAA> | The only things I saw hosted on OneHallyu itself were avatars. But maybe I just didn't look in the right place. |
07:21:03 | <fireonlive> | ah ok :) |
07:21:09 | <fireonlive> | faulty memories! |
07:21:21 | <@JAA> | I did this with qwarc. qwarc doesn't care about HTML. So no page requisite extraction or similar. |
07:22:05 | <@JAA> | qwarc fetches a URL you give it and writes it to WARC. Basically everything else is left as an exercise to the user. |
07:23:08 | | DogsRNice quits [Read error: Connection reset by peer] |
07:26:22 | <fireonlive> | :) |
07:26:33 | <@JAA> | Oh, two topics failed. One is a 'count to a million' forum game, the other just a random small discussion. |
07:31:08 | <@JAA> | The former doesn't even have 5k pages, but it's extremely slow. |
07:31:27 | <@arkiver> | some inefficient pagination i guess |
07:31:52 | <@JAA> | No, there are far larger topics that are faster. |
07:32:00 | <@JAA> | Largest I saw had 18k pages. |
07:32:07 | <fireonlive> | damn |
07:32:27 | <@JAA> | (I didn't systematically check though, so maybe that isn't even the largest one that exists.) |
07:33:00 | <@JAA> | Anyway, it's getting grabbed now, whether the server likes it or not. :-) |
07:33:10 | <fireonlive> | 👀 |
07:36:13 | | DopefishJustin quits [Ping timeout: 272 seconds] |
07:36:46 | <@JAA> | Ah, now the response time is actually decent. |
07:36:54 | <@JAA> | https://transfer.archivete.am/inline/spNkP/explanation.png |
07:38:13 | <fireonlive> | :D |
07:48:27 | | DopefishJustin joins |
07:48:27 | | DopefishJustin is now authenticated as DopefishJustin |
08:15:36 | <@JAA> | That topic is done as well now, and that should be everything that's accessible. (I saw a small number of 403s.) |
08:16:00 | <@JAA> | src extraction is running but will take a little while. |
08:18:36 | <@arkiver> | outlinks going to #// ? |
08:19:24 | <@JAA> | Possibly later. Just focusing on onsite stuff now since that'll vanish very soon. |
08:19:28 | <@arkiver> | got it |
08:19:29 | <@arkiver> | sounds good |
08:33:47 | <SketchCow> | Merry Christmas, maniacs |
08:34:56 | <fireonlive> | thanks sketchy |
08:46:51 | | Ruthalas59 quits [Quit: END OF LINE] |
08:56:55 | | Ruthalas59 (Ruthalas) joins |
09:14:11 | | hitgrr8 joins |
09:57:20 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
09:58:45 | | qwertyasdfuiopghjkl quits [Max SendQ exceeded] |
09:58:57 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
10:00:05 | | Bleo18260 quits [Client Quit] |
10:01:29 | | Bleo18260 joins |
10:15:56 | | mv joins |
10:16:42 | | mv quits [Remote host closed the connection] |
10:21:49 | | IDK quits [Client Quit] |
10:33:14 | <pabs> | https://publicwww.com/ — a search engine for stuff in websites' source code. |
10:36:45 | <fireonlive> | ooh neat |
10:40:19 | <pabs> | paid beyond alexa rank 1mil, higher costs for further down the ranking |
10:40:31 | <pabs> | https://publicwww.com/prices.html |
10:40:40 | <pabs> | ouch, $499/month for all URLs |
10:41:11 | | pabs wonders how that compares to shodan |
10:41:33 | <pabs> | er 3mil not 1mil |
10:42:09 | <pabs> | and $49/month gets you all URLs, but only 100 searches/day up to 100K rows |
10:42:31 | <pabs> | hmm, I think the 1mil was without an account |
11:01:59 | | T31M quits [Quit: ZNC - https://znc.in] |
11:26:07 | <qwertyasdfuiopghjkl> | JAA: for OneHallyu, did you save the user profiles? (You can do https://onehallyu.com/profile/1--/ https://onehallyu.com/profile/2--/ https://onehallyu.com/profile/3--/ etc and get redirected to the correct name. Looks like there's also different tabs on each profile page that need to be requested separately.) |
12:17:11 | | BornOn420 quits [Remote host closed the connection] |
12:18:15 | | BornOn420 (BornOn420) joins |
12:18:43 | | BornOn420 quits [Remote host closed the connection] |
12:35:05 | <@JAA> | qwertyasdfuiopghjkl: No, only topics. |
12:39:07 | | ScenarioPlanet quits [Changing host] |
12:39:07 | | ScenarioPlanet (ScenarioPlanet) joins |
13:22:39 | | Basis joins |
13:23:55 | | systwi quits [Ping timeout: 272 seconds] |
13:26:50 | | Arcorann quits [Ping timeout: 240 seconds] |
13:31:47 | | jacksonchen666 (jacksonchen666) joins |
13:38:41 | | systwi (systwi) joins |
14:47:24 | | RealPerson joins |
14:51:58 | <@JAA> | I started an AB job for the OneHallyu src values I managed to extract, but it looks like the site is dying now and returning HTTP 522 (Buttflare's code for connection timeout to the upstream server) for a lot of things. |
14:52:31 | <@JAA> | So maybe they took the server online and only what remains in Buttflare's cache is still around. |
14:52:35 | <@JAA> | offline* |
15:02:17 | | T31M joins |
15:02:57 | | RealPerson leaves |
15:03:28 | | T31M is now authenticated as T31M |
15:07:56 | <that_lurker> | Could someone grab the upcoming Finnish presidential election candidates websites. https://lounge.kuhaon.fun/folder/65908e5765e73d9f/FinnishPresidentialElectionCandidates.txt |
15:10:55 | <that_lurker> | More info https://en.wikipedia.org/wiki/2024_Finnish_presidential_election |
15:23:03 | | RealPerson joins |
15:30:17 | | qwertyasdfuiopghjkl quits [Client Quit] |
15:32:48 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
15:34:13 | | qwertyasdfuiopghjkl quits [Max SendQ exceeded] |
15:34:30 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
15:48:58 | | RealPerson leaves |
15:54:44 | <@JAA> | that_lurker: #vooterbooter |
15:55:05 | <@JAA> | I'll run them later if nobody beats me to it. |
15:55:22 | <that_lurker> | ooh did not know there was a channel for this |
15:58:47 | | RealPerson joins |
16:09:16 | | RealPerson leaves |
16:10:22 | | RealPerson joins |
16:13:51 | <that_lurker> | also thanks :-) |
16:31:43 | | RealPerson leaves |
16:32:01 | | systwi quits [Ping timeout: 272 seconds] |
16:59:53 | | Scen joins |
17:03:41 | | ScenarioPlanet quits [Ping timeout: 272 seconds] |
17:04:03 | | systwi (systwi) joins |
17:18:08 | | Scen quits [Client Quit] |
17:18:44 | | ScenarioPlanet (ScenarioPlanet) joins |
17:20:01 | | ScenarioPlanet is now known as Scen |
17:20:17 | | Scen is now authenticated as * |
17:20:22 | | Scen is now authenticated as Scen |
17:20:34 | | Scen is now known as ScenarioPlanet |
17:20:44 | | ScenarioPlanet is now authenticated as * |
17:20:44 | | ScenarioPlanet is now authenticated as ScenarioPlanet |
18:20:50 | | aninternettroll quits [Read error: Connection reset by peer] |
18:20:55 | | aninternettroll (aninternettroll) joins |
18:29:49 | | jacksonchen666 quits [Ping timeout: 250 seconds] |
18:32:37 | | jacksonchen666 (jacksonchen666) joins |
19:23:50 | | kiryu quits [Ping timeout: 240 seconds] |
20:06:57 | | ScenarioPlanet quits [Client Quit] |
20:07:47 | | ScenarioPlanet (ScenarioPlanet) joins |
20:15:51 | | Island joins |
20:30:55 | | ScenarioPlanet quits [Client Quit] |
20:31:10 | | ScenarioPlanet (ScenarioPlanet) joins |
20:51:00 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
21:03:43 | | Ruthalas59 quits [Ping timeout: 272 seconds] |
21:04:40 | | magmaus3 (magmaus3) joins |
21:09:01 | | DopefishJustin quits [Read error: Connection reset by peer] |
21:15:15 | | BlueMaxima joins |
21:16:20 | | magmaus3 quits [Ping timeout: 240 seconds] |
21:18:47 | | ScenarioPlanet quits [Read error: Connection reset by peer] |
21:22:04 | | magmaus3 (magmaus3) joins |
21:26:20 | | magmaus3 quits [Ping timeout: 240 seconds] |
21:29:58 | | Ruthalas59 (Ruthalas) joins |
21:31:19 | | hitgrr8 quits [Client Quit] |
21:34:45 | | Ruthalas59 quits [Ping timeout: 272 seconds] |
21:43:51 | | magmaus3 (magmaus3) joins |
21:59:52 | | RealPerson joins |
22:08:25 | | aninternettroll quits [Read error: Connection reset by peer] |
22:08:36 | | aninternettroll (aninternettroll) joins |
22:16:58 | | RealPerson leaves |
22:17:50 | | ThreeHM quits [Ping timeout: 240 seconds] |
23:34:24 | | Arcorann (Arcorann) joins |
23:53:43 | | DogsRNice joins |
23:54:57 | | DopefishJustin joins |
23:54:57 | | DopefishJustin is now authenticated as DopefishJustin |
23:56:32 | | DLoader is now authenticated as DLoader |
23:59:54 | | Xesxen quits [Max SendQ exceeded] |
23:59:59 | | Xesxen (Xesxen) joins |