| 00:02:54 | | NickNick joins |
| 00:20:13 | | sec^nd quits [Remote host closed the connection] |
| 00:21:57 | | sec^nd (second) joins |
| 00:39:37 | | Junior joins |
| 00:40:13 | | Junior quits [Remote host closed the connection] |
| 00:48:36 | | HackMii quits [Remote host closed the connection] |
| 00:49:36 | | HackMii (hacktheplanet) joins |
| 00:55:11 | | sec^nd quits [Ping timeout: 248 seconds] |
| 00:56:06 | | BlueMaxima joins |
| 00:58:35 | | sec^nd (second) joins |
| 01:25:35 | | mut4ntm0nkey quits [Ping timeout: 248 seconds] |
| 01:26:29 | | jacobk quits [Ping timeout: 265 seconds] |
| 01:37:15 | | jacobk joins |
| 01:39:19 | | mut4ntm0nkey (mutantmonkey) joins |
| 01:57:51 | | michaelblob quits [Client Quit] |
| 01:59:22 | | FalconK_ quits [Quit: WeeChat 3.7.1] |
| 02:00:02 | | FalconK (FalconK) joins |
| 02:06:23 | | negativegray joins |
| 02:07:37 | <negativegray> | Hi! I'm looking for a specific fanfic but don't really have the disk space or the internet speed to comb the WARC batches for it, is this the proper channel to ask if someone has it in an archive? |
| 02:12:21 | | NickNick quits [Client Quit] |
| 02:23:57 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 02:34:07 | | onetruth quits [Client Quit] |
| 02:39:26 | | michaelblob (michaelblob) joins |
| 02:48:57 | <TheTechRobo> | negativegray: which archive are you looking at? |
| 02:49:10 | <TheTechRobo> | most official archiveteam stuff is in the wayback machine |
| 02:56:19 | <negativegray> | I don't know how to effectively search there, I'm looking at these: https://archive.org/details/archiveteam_fanfiction |
| 02:58:38 | <TheTechRobo> | From the collection info (https://archive.org/details/archiveteam_fanfiction?tab=about): "Fanfiction.net Safety Download <https://archive.org/details/fanfic_download_2012_01> is a single 2 GB tar file containing epub files, which may be easier to extract." |
| 03:02:55 | | sonick quits [Client Quit] |
| 03:02:56 | <TheTechRobo> | negativegray: ^ |
| 03:03:37 | <negativegray> | I've checked that, it doesn't have the complete thing I'm pretty sure |
| 03:04:40 | <negativegray> | TheTechRobo: or I'm very bad at searching through it |
| 03:07:02 | <TheTechRobo> | negativegray: Assuming it's not in the Wayback Machine (that grab was long before I joined AT so I don't know), you can look through the items' CDX |
| 03:07:10 | <TheTechRobo> | For example, for https://archive.org/download/archiveteam-fanfiction-warc-07, there are several cdx.gz files |
| 03:07:20 | <TheTechRobo> | Not sure which is the "correct" one but they're a lot smaller than the WARC |
| 03:07:33 | <TheTechRobo> | They basically list the WARC's contents, e.g. urls, capture time iirc |
| 03:08:06 | <TheTechRobo> | they also list which WARC contains the resource |
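Besides the WARC file name, a CDX line usually records the compressed record's offset and length inside that WARC. In the common ` CDX N b a m s k r M S V g` header these are the `V` and `S` fields, but treat that header as an assumption and check the file's own first line. With those two numbers you can fetch a single record with an HTTP Range request instead of downloading the whole multi-gigabyte WARC. The Python sketch below assumes that each record is stored as its own gzip member, which is the usual convention for `.warc.gz` files. The function names are hypothetical.

```python
import gzip
import urllib.request


def range_header(offset: int, length: int) -> str:
    """HTTP Range value for `length` bytes starting at `offset` (ranges are inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def decompress_record(blob: bytes) -> bytes:
    """Decompress one gzip member that holds a single WARC record."""
    return gzip.decompress(blob)


def fetch_record(warc_url: str, offset: int, length: int) -> bytes:
    """Download and decompress just one record from a remote .warc.gz (hypothetical helper).

    Assumes the server honours Range requests and that a fresh gzip member
    starts at `offset`.
    """
    req = urllib.request.Request(warc_url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            # A 200 means the server ignored Range and is sending the whole file.
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        return decompress_record(resp.read())
```

The URL would be the item's download path plus the file name from the `g` field, e.g. `https://archive.org/download/archiveteam-fanfiction-warc-07/` followed by that name.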
| 03:08:34 | <negativegray> | TheTechRobo: oooh, thank you! How do I open a cdx? |
| 03:09:10 | <TheTechRobo> | negativegray: It's just text, and there's plenty of documentation. |
| 03:09:13 | <TheTechRobo> | let me see if I can find some. |
| 03:10:27 | <TheTechRobo> | negativegray: Here you go! The first line of CDX is the legend, and it has letters that correspond to what the value is representing. I think it's space separated. |
| 03:10:31 | <TheTechRobo> | Here's the letter list: https://archive.org/web/researcher/cdx_legend.php |
| 03:10:48 | <negativegray> | TheTechRobo: thank you! |
| 03:10:49 | <TheTechRobo> | Not all letters will be present. |
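The legend-driven reading described above can be sketched in a few lines of Python. The first line is split into its legend letters, and every following line is zipped against them, so absent letters simply never appear. The function names are hypothetical. The optional filter checks the original-URL field (`a`), which only helps if you already know or can guess the URL.

```python
import gzip
from typing import Iterable, Iterator


def open_cdx(path: str):
    """Open a CDX file as text, transparently handling .cdx.gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def parse_cdx(lines: Iterable[str], url_substring: str = "") -> Iterator[dict]:
    """Yield one dict per CDX line, keyed by the legend letters in the header.

    Only lines whose original URL (letter 'a') contains `url_substring`
    are yielded; an empty substring matches every line.
    """
    it = iter(lines)
    header = next(it).split()
    if not header or header[0] != "CDX":
        raise ValueError("first line is not a CDX legend")
    letters = header[1:]
    for line in it:
        fields = line.split()
        if len(fields) != len(letters):
            continue  # skip blank or malformed lines
        record = dict(zip(letters, fields))
        if url_substring in record.get("a", ""):
            yield record
```

For example, iterating `parse_cdx(open_cdx(path), "fanfiction.net/s/1888034/")` and printing each record's `b`, `a`, and `g` values would list the capture time, URL, and containing WARC for every chapter of that story present in the file.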
| 03:12:12 | <negativegray> | TheTechRobo: I tried reading the .cdx and it did not help me, even with the legend |
| 03:12:34 | <TheTechRobo> | Hang on, let me download it; my internet is slow |
| 03:12:38 | <TheTechRobo> | I may have to go before it finishes |
| 03:13:46 | <negativegray> | okay! |
| 03:15:06 | <TheTechRobo> | negativegray: while it downloads, what information do you have about the fanfic? |
| 03:15:16 | <TheTechRobo> | do you have the URL? or do you need a full-text search? |
| 03:15:32 | <TheTechRobo> | if the latter, CDX won't work for you |
| 03:18:15 | <negativegray> | TheTechRobo: yeah I need a full text search. I have the author's name and the fanfic's name |
| 03:18:18 | <negativegray> | or title |
| 03:18:28 | <TheTechRobo> | In that case, yeah, CDX probably won't help you. :/ |
| 03:18:36 | <TheTechRobo> | Unless the url contains the title or something. |
| 03:19:12 | <negativegray> | yeah |
| 03:19:15 | <negativegray> | ty though |
| 03:20:11 | <TheTechRobo> | I'm not sure what you can do in that case. Does anybody have the fanfic warcs downloaded? |
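When only the title and author are known, the remaining option is a brute-force text search over the WARCs. Streaming each `.warc.gz` and searching it on the fly avoids the disk space problem, though not the bandwidth one. The Python sketch below assumes the page HTML is stored uncompressed inside each record. Responses saved with HTTP-level `Content-Encoding: gzip` will not match a plain byte search. Accented characters in a Portuguese title may also be stored as HTML entities, so searching for an ASCII-only fragment of the title is safer. The function names are hypothetical.

```python
import gzip
import urllib.request
from typing import BinaryIO


def stream_contains(raw: BinaryIO, needle: bytes, chunk_size: int = 1 << 20) -> int:
    """Count occurrences of `needle` in a gzip stream without storing it.

    gzip.GzipFile reads concatenated gzip members in sequence, so a
    per-record-compressed .warc.gz decompresses as one continuous stream.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    tail = b""
    with gzip.GzipFile(fileobj=raw) as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            count += buf.count(needle)
            # Carry the last len(needle)-1 bytes forward so a match split across
            # two chunks is still found; a full match cannot fit in the carry,
            # so nothing is counted twice.
            tail = buf[-(len(needle) - 1):] if len(needle) > 1 else b""
    return count


def search_remote_warc(url: str, needle: str) -> int:
    """Stream a remote .warc.gz and count matches of a UTF-8 text needle."""
    with urllib.request.urlopen(url) as resp:
        return stream_contains(resp, needle.encode("utf-8"))
```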
| 03:20:18 | <TheTechRobo> | I have to go to bed btw, good night! |
| 03:20:38 | <negativegray> | good night! |
| 03:23:32 | <Doranwen> | negativegray: out of curiosity, what fandom? |
| 03:23:38 | <negativegray> | Harry Potter |
| 03:23:58 | <Doranwen> | Ah, I wouldn't have it. Could ask some friends of mine, though. |
| 03:24:24 | <Doranwen> | We've got a Discord server where we share info on deleted fics we have. |
| 03:25:52 | <negativegray> | oh! |
| 03:25:57 | <negativegray> | That'd be great! |
| 03:26:06 | <negativegray> | It is in portuguese, though |
| 04:04:22 | <negativegray> | okay! I got a URL for the author and the fic! |
| 04:06:07 | <negativegray> | I can only access the first chapter, though |
| 04:11:10 | | eroc1990 quits [Remote host closed the connection] |
| 04:11:35 | | eroc1990 (eroc1990) joins |
| 04:19:02 | | march_happy quits [Ping timeout: 268 seconds] |
| 04:19:20 | | march_happy (march_happy) joins |
| 04:20:35 | <negativegray> | gods, being so close hurts. I managed to get to the Wayback Machine page of the first chapter, but it seems to be the only one in the cache |
| 05:13:06 | | pabs quits [Ping timeout: 276 seconds] |
| 05:22:08 | | pabs (pabs) joins |
| 05:30:20 | | negativegray quits [Remote host closed the connection] |
| 06:50:48 | | pabs quits [Ping timeout: 265 seconds] |
| 06:52:24 | | Island quits [Read error: Connection reset by peer] |
| 06:53:42 | | pabs (pabs) joins |
| 07:20:14 | | hitgrr8 joins |
| 07:34:52 | | sonick (sonick) joins |
| 08:00:41 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 10:01:15 | | DiscantX joins |
| 10:07:15 | | Doomaholic joins |
| 10:10:23 | | DiscantX quits [Client Quit] |
| 10:11:06 | | DiscantX joins |
| 10:15:37 | <schwarzkatz|m> | Can you share a link, |
| 10:16:41 | <schwarzkatz|m> | Damn it, didn’t mean to send so early. |
| 10:16:41 | <schwarzkatz|m> | negativegray: can you share a link please? |
| 10:29:06 | <h2ibot> | Arkiver uploaded File:Buzzvideo-logo.png: https://wiki.archiveteam.org/?title=File%3ABuzzvideo-logo.png |
| 10:30:04 | | sec^nd quits [Remote host closed the connection] |
| 10:30:06 | <h2ibot> | Arkiver uploaded File:Buzzvideo-icon.png: https://wiki.archiveteam.org/?title=File%3ABuzzvideo-icon.png |
| 10:31:57 | | sec^nd (second) joins |
| 11:20:07 | | march_happy quits [Remote host closed the connection] |
| 11:22:47 | | march_happy (march_happy) joins |
| 11:31:27 | | sec^nd quits [Ping timeout: 248 seconds] |
| 11:32:55 | | sec^nd (second) joins |
| 12:07:34 | | le0n quits [Quit: see you later, alligator] |
| 12:09:04 | | le0n (le0n) joins |
| 12:28:18 | <@JAA> | (They left hours ago.) |
| 12:39:42 | <schwarzkatz|m> | ah. is that something only admins see? |
| 12:40:29 | <@JAA> | I have no idea what Matrix does with that information, but on IRC, anyone can see it. |
| 12:41:21 | <schwarzkatz|m> | hm, weird. |
| 12:41:28 | <joepie91|m> | I've been noticing that parts aren't bridging correctly lately |
| 12:41:31 | <joepie91|m> | I suspect a bridge bug |
| 13:08:45 | | Arcorann quits [Ping timeout: 268 seconds] |
| 13:13:19 | | Ketchup901 quits [Ping timeout: 248 seconds] |
| 13:16:02 | | Ketchup901 (Ketchup901) joins |
| 13:57:43 | | spirit joins |
| 14:01:12 | | gazorpazorp quits [Read error: Connection reset by peer] |
| 14:10:00 | | Ketchup901 quits [Remote host closed the connection] |
| 14:10:42 | | Ketchup901 (Ketchup901) joins |
| 14:12:21 | | Ketchup901 quits [Remote host closed the connection] |
| 14:13:22 | | Ketchup901 (Ketchup901) joins |
| 14:22:39 | | sec^nd quits [Ping timeout: 248 seconds] |
| 14:23:12 | | sec^nd (second) joins |
| 14:27:34 | | VerifiedJ quits [Quit: The Lounge - https://thelounge.chat] |
| 14:29:47 | | VerifiedJ (VerifiedJ) joins |
| 14:47:57 | <Frogging101> | Is yt-dlp able to download a YouTube channel that has more videos than the page limit? |
| 15:12:54 | | sonick quits [Client Quit] |
| 15:29:14 | <JTL> | can you provide an example channel? |
| 15:30:34 | <Doranwen> | schwarzkatz|m: They were looking for https://www.fanfiction.net/s/1888034/1/. It's not in the FanficRepack_Redux collection, which a friend of mine suggested looking in. |
| 15:32:51 | | Island joins |
| 15:34:07 | | march_happy quits [Remote host closed the connection] |
| 15:35:08 | | march_happy (march_happy) joins |
| 15:37:55 | | march_happy quits [Remote host closed the connection] |
| 15:38:43 | | march_happy (march_happy) joins |
| 15:45:14 | | HP_Archivist (HP_Archivist) joins |
| 16:01:12 | <@JAA> | Doranwen: Do we have any idea when it was deleted? |
| 16:01:22 | <@JAA> | The WBM snapshot is from 2005. |
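The Wayback Machine's public CDX server can list every captured URL under the story's prefix, which shows whether any chapter other than the first was ever captured. The Python sketch below uses the `url`, `matchType`, `output`, and `collapse` query parameters of `https://web.archive.org/cdx/search/cdx`. The helper names are hypothetical. Captures may exist under `http://` and `www.` variants, and matching on the scheme-less prefix should cover those, but that is an assumption worth checking.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_query(url_prefix: str) -> str:
    """Build a Wayback CDX query returning one row per distinct captured URL."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",
        "output": "json",
        "collapse": "urlkey",  # one row per unique URL instead of one per capture
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_rows(payload: str) -> list[dict]:
    """Turn the JSON response (a header row followed by data rows) into dicts."""
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def list_captures(url_prefix: str) -> list[dict]:
    with urllib.request.urlopen(build_query(url_prefix)) as resp:
        return parse_rows(resp.read().decode("utf-8"))
```

Calling `list_captures("fanfiction.net/s/1888034/")` and printing each row's `original` and `timestamp` would show which chapter URLs, if any besides `/1/`, have captures.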
| 16:06:39 | | mut4ntm0nkey quits [Ping timeout: 248 seconds] |
| 16:06:59 | | mut4ntm0nkey (mutantmonkey) joins |
| 17:24:03 | | qwertyasdfuiopghjkl joins |
| 18:04:05 | | DLoader quits [Ping timeout: 265 seconds] |
| 18:12:55 | | DLoader joins |
| 18:13:59 | | upintheairsheep joins |
| 18:14:32 | <upintheairsheep> | Hello, I would like to learn what tool https://archive.org/details/TikTok?tab=about is scraped by |
| 18:15:07 | <@arkiver> | internal, not related to IA |
| 18:15:13 | <upintheairsheep> | I know a lot about the comment API and the replies API |
| 18:15:42 | <upintheairsheep> | So is the ArchiveTeam not behind it? |
| 18:15:45 | <@arkiver> | no |
| 18:16:38 | | upintheairsheep leaves |
| 18:17:05 | <spirit> | NEXT! |
| 18:17:34 | <@arkiver> | :P |
| 18:18:06 | | upintheairsheep joins |
| 18:18:47 | <upintheairsheep> | To remind you, TikTok is going to remove videos related to tanning after warning from medical experts. https://www.theguardian.com/technology/2022/dec/01/tiktok-to-ban-videos-that-encourage-sunburn-and-tanning-after-alarm-from-medical-experts |
| 18:22:03 | <upintheairsheep> | Tag and videos still seem to be up: https://www.tiktok.com/tag/sunburnchallenge?lang=en |
| 18:22:51 | <upintheairsheep> | Other tags of interest: https://www.tiktok.com/tag/sunburn https://www.tiktok.com/tag/tanning https://www.tiktok.com/tag/sunbathing |
| 18:32:53 | | upintheairsheep quits [Remote host closed the connection] |
| 18:37:38 | | hackbug (hackbug) joins |
| 19:25:48 | | systwi_ (systwi) joins |
| 19:27:51 | | systwi quits [Ping timeout: 276 seconds] |
| 19:34:12 | <TheTechRobo> | How do you reverse engineer the requests that a Steam game makes? I was thinking of a proxy, but as far as I'm aware you can't configure its use. |
| 19:34:25 | <TheTechRobo> | Wireshark's fine but it captures ALL traffic... |
| 19:35:04 | <schwarzkatz|m> | it has powerful filtering tho |
| 19:36:50 | <TheTechRobo> | I don't know how to use it xD |
| 19:37:47 | <TheTechRobo> | I might be able to guess at the domain name, though. Is there a way to do that for wireshark? |
| 19:37:54 | <TheTechRobo> | Or guess at part of the domain name, at least. |
| 19:38:01 | <TheTechRobo> | (I know both the company and game name) |
| 19:38:31 | <schwarzkatz|m> | related documentation: |
| 19:38:31 | <schwarzkatz|m> | https://www.wireshark.org/docs/wsug_html_chunked/ChCapCaptureFilterSection.html |
| 19:38:31 | <schwarzkatz|m> | https://www.tcpdump.org/manpages/pcap-filter.7.html |
| 19:38:35 | <TheTechRobo> | Wireshark also isn't great for HTTP because it just gets the raw TCP data, no? There's likely SSL. |
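Even when the payload is encrypted, the hostname a client connects to normally appears in plaintext in two places: the DNS query and the SNI field of the TLS ClientHello (unless the client uses Encrypted Client Hello). A partial domain guess can therefore be checked against a saved capture with a Wireshark display filter. The sketch below drives `tshark` from Python. The capture file name and the name fragment are placeholders. The field names `dns.qry.name` and `tls.handshake.extensions_server_name` are from current Wireshark releases, while older releases used an `ssl.` prefix. `contains` is case-sensitive, so pass the fragment in lowercase.

```python
import subprocess


def tshark_hostname_cmd(pcap_path: str, fragment: str) -> list[str]:
    """Build a tshark command printing DNS names and TLS SNI values that
    contain `fragment`, read from a saved capture file."""
    display_filter = (
        f'dns.qry.name contains "{fragment}" or '
        f'tls.handshake.extensions_server_name contains "{fragment}"'
    )
    return [
        "tshark",
        "-r", pcap_path,         # read a saved capture instead of live traffic
        "-Y", display_filter,    # display filter (Wireshark syntax, not BPF)
        "-T", "fields",          # print selected fields, tab-separated
        "-e", "ip.dst",
        "-e", "dns.qry.name",
        "-e", "tls.handshake.extensions_server_name",
    ]


def find_hosts(pcap_path: str, fragment: str) -> str:
    """Run tshark (must be installed and on PATH) and return its output."""
    result = subprocess.run(
        tshark_hostname_cmd(pcap_path, fragment),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```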
| 19:39:23 | <schwarzkatz|m> | you'd need to use https://docs.mitmproxy.org/stable/ then I guess :D |
| 19:40:43 | <@JAA> | Depending on how the game validates TLS certs, it might be messy though. |
| 19:41:05 | <@JAA> | If it has its own cert store or hardcoded fingerprints or similar, for example. |
| 19:41:36 | <@JAA> | Then you'll need to either replace that (have fun) or use something like tcpdump/Wireshark and extract the master key (also fun). |
| 19:41:47 | <@JAA> | pre-master key* |
| 19:42:01 | <schwarzkatz|m> | if everything goes through mitmproxy though, why would it be messy? |
| 19:42:30 | <@JAA> | Because the client (game) needs to trust mitmproxy's CA cert for that to work. |
| 19:46:57 | <@JAA> | If it uses the system trust store, that's easy, but if it doesn't, mess. |
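If the game does trust mitmproxy's CA, a small addon script is enough to log every request it makes. The sketch below assumes the addon-script mechanism, run as `mitmdump -s log_urls.py` (the file name is a placeholder). It uses mitmproxy's `response` event hook and the `flow.request` / `flow.response` attributes, and it deliberately imports nothing from mitmproxy so the logic can be tested on its own. Getting the game's traffic into the proxy at all is a separate problem when it ignores proxy settings. mitmproxy's transparent mode is one option there.

```python
import json


class RequestLogger:
    """mitmproxy addon that appends one JSON line per completed HTTP exchange."""

    def __init__(self, path: str = "requests.jsonl"):
        self.path = path

    def response(self, flow) -> None:
        # mitmproxy calls this hook once the server's response has been read.
        entry = {
            "method": flow.request.method,
            "url": flow.request.pretty_url,
            "status": flow.response.status_code,
        }
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")


# mitmproxy loads addon instances from this module-level list.
addons = [RequestLogger()]
```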
| 19:47:42 | <TheTechRobo> | Is there a Linux way to get the traffic from a specific process given its PID? |
| 19:47:45 | <@JAA> | See also: you can't make browsers accept mitmproxy by adding the CA cert to the system trust store. Need to do it separately in the browser. |
| 19:48:07 | <schwarzkatz|m> | that... sucks. I thought it was system wide. |
| 19:50:04 | <@JAA> | TheTechRobo: Maybe some iptables magic would help here, but not sure. |
| 19:53:20 | <@JAA> | Stack Exchange suggests strace, network namespaces, and iptables: https://askubuntu.com/questions/11709/how-can-i-capture-network-traffic-of-a-single-process |
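A lighter first step, short of network namespaces and iptables, is to find out which remote endpoints the process is talking to. On Linux, each of a process's sockets appears as a `socket:[inode]` link under `/proc/<pid>/fd`. `/proc/<pid>/net/tcp` lists TCP connections with the same inode numbers, and this is the mapping that `ss -p` and `netstat -p` rely on. Joining the two gives the remote addresses, which can then go into a Wireshark capture filter. The Python sketch below handles IPv4 TCP only (IPv6 lives in `tcp6`) and is a point-in-time snapshot, so short-lived connections can be missed. It usually needs to run as the process's user or as root. The function names are hypothetical.

```python
import os
import socket
import struct


def decode_addr(hex_addr: str) -> tuple[str, int]:
    """Decode an IPv4 'AABBCCDD:PPPP' entry from /proc/net/tcp.

    The kernel prints the address as a native-endian 32-bit integer, so
    repacking it in native byte order ('=I') restores the network-order bytes.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def socket_inodes(pid: int) -> set[str]:
    """Collect the inode numbers of all sockets held open by `pid`."""
    inodes = set()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if target.startswith("socket:["):
            inodes.add(target[len("socket:["):-1])
    return inodes


def remote_endpoints(tcp_table: str, inodes: set[str]) -> list[tuple[str, int]]:
    """Return remote (ip, port) pairs for rows whose inode is in `inodes`."""
    endpoints = []
    for line in tcp_table.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 9 and fields[9] in inodes:  # column 9 is the inode
            endpoints.append(decode_addr(fields[2]))  # column 2 is rem_address
    return endpoints


def pid_connections(pid: int) -> list[tuple[str, int]]:
    with open(f"/proc/{pid}/net/tcp", encoding="ascii") as fh:
        return remote_endpoints(fh.read(), socket_inodes(pid))
```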
| 20:21:46 | | sudofox joins |
| 20:22:59 | <sudofox> | hiya. i'm looking for some tool recommendations. so i've been trying to archive all static assets from some websites i'm interested in for personal curiosity. i decided to finally give archiving user content from one of them a shot, but it kinda breaks my normal workflow of "try many URLs and git commit whatever i found" due to the sheer # of files |
| 20:23:34 | <sudofox> | i've started using git lfs but the reason i'm using git is mainly to actually see how much progress i've made/new things found each time i try something |
| 20:24:21 | <sudofox> | i'm wondering if there's a better tool to track progress with recovered files -- i'm also committing tooling for guessing filenames at the same time |
| 20:24:49 | <sudofox> | i guess i could use S3 but I still like being able to see what's new with `git status` and so on. |
| 20:25:40 | <sudofox> | also git lfs kinda duplicates objects into .git/lfs so double disk space |
| 20:32:56 | <@JAA> | Yeah, you'll want to get away from 'one file per asset' anyway probably. It just doesn't scale. Eventually, your file system will be sad as well. |
| 20:34:03 | <sudofox> | eh, yeah, you're right -- key-based object storage is probably much better for this stuff |
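For the "what's new since last time" view that `git status` was providing, one option is a content-addressed layout. Each body is stored once under its SHA-256, so there is no second copy like the one Git LFS keeps in `.git/lfs`, and a small SQLite index records which run first found each URL. The sketch below uses a local directory as a stand-in for an object-store bucket, so it still writes one file per unique body and inherits the file-system concern raised above. Replacing the write in `add` with an object-store PUT under the same key would leave the rest unchanged. The class name, table layout, and two-level directory fan-out are hypothetical choices, and the index records only the first body seen for each URL.

```python
import hashlib
import os
import sqlite3


class AssetStore:
    """Content-addressed file store plus a SQLite index of when each URL was found."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.db = sqlite3.connect(os.path.join(root, "index.sqlite"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS hits ("
            " url TEXT PRIMARY KEY, sha256 TEXT NOT NULL, first_run TEXT NOT NULL)"
        )

    def _path(self, digest: str) -> str:
        # Fan out into subdirectories so no single directory grows huge.
        return os.path.join(self.root, digest[:2], digest[2:4], digest)

    def add(self, url: str, body: bytes, run: str) -> bool:
        """Store `body` for `url`; return True if the URL was not seen before."""
        digest = hashlib.sha256(body).hexdigest()
        path = self._path(digest)
        if not os.path.exists(path):  # identical bodies are stored only once
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as fh:
                fh.write(body)
        cur = self.db.execute(
            "INSERT OR IGNORE INTO hits (url, sha256, first_run) VALUES (?, ?, ?)",
            (url, digest, run),
        )
        self.db.commit()
        return cur.rowcount == 1

    def new_in_run(self, run: str) -> list[str]:
        """URLs first discovered in `run`, roughly what `git status` was showing."""
        rows = self.db.execute(
            "SELECT url FROM hits WHERE first_run = ? ORDER BY url", (run,)
        )
        return [r[0] for r in rows]
```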
| 20:34:04 | <@JAA> | One route is WARC, but accessibility isn't exactly great with it. |
| 20:34:45 | <sudofox> | i've been thinking about building a little ceph server in my basement for a while for that purpose (instead of using Amazon) |
| 20:35:12 | <@JAA> | You get extra metadata and a technically more accurate capture that way, too. |
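On the WARC route, each record is a short block of header lines followed by the payload. The sketch below writes a single `resource` record by hand to show that shape, using the WARC 1.1 mandatory fields `WARC-Type`, `WARC-Record-ID`, `WARC-Date`, and `Content-Length`. It is a deliberately minimal example. A real crawl would write `response` records that keep the HTTP headers, which is where the extra metadata and more accurate capture come from. In practice a library such as warcio, or wget's `--warc-file` option, handles this. The function name is hypothetical.

```python
import gzip
import uuid
from datetime import datetime, timezone
from typing import BinaryIO


def write_resource_record(out: BinaryIO, url: str, body: bytes,
                          content_type: str = "application/octet-stream") -> None:
    """Append one WARC/1.1 'resource' record, gzip-compressed as its own member."""
    date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {date}",
        f"WARC-Target-URI: {url}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
    ]
    # Header block, blank line, payload, then the two CRLFs that end a record.
    record = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body + b"\r\n\r\n"
    # One gzip member per record keeps each record individually addressable.
    out.write(gzip.compress(record))
```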
| 20:35:30 | <@JAA> | I suppose that would work as well, yeah. |
| 21:22:05 | | systwi (systwi) joins |
| 21:22:54 | | systwi_ quits [Ping timeout: 276 seconds] |
| 22:07:11 | | sonick (sonick) joins |
| 22:29:57 | <Doranwen> | JAA: No, he never mentioned that. Left his Reddit nick with me but that's all I've got. Oh well, lol. |
| 23:02:39 | | hitgrr8 quits [Client Quit] |
| 23:06:39 | | sudofox quits [Ping timeout: 265 seconds] |
| 23:26:47 | | fishingforsoup_ quits [Quit: Leaving] |
| 23:27:04 | | fishingforsoup joins |
| 23:40:42 | | Arcorann (Arcorann) joins |
| 23:48:24 | | jacksonchen666 (jacksonchen666) joins |
| 23:56:23 | | sudofox joins |