00:00:03HackMii quits [Remote host closed the connection]
00:00:20HackMii (hacktheplanet) joins
00:23:15etnguyen03 quits [Quit: Konversation terminated!]
00:34:32ducky quits [Ping timeout: 260 seconds]
00:36:20ducky (ducky) joins
00:48:29ericgallager quits [Quit: This computer has gone to sleep]
00:58:27ericgallager joins
01:00:51Hackerpcs quits [Quit: Hackerpcs]
01:01:32<ericgallager>has anyone archived Zillow's climate risk scores? https://bsky.app/profile/volts.wtf/post/3m6upiqs67k2c
01:04:54<pokechu22>IIRC zillow is pretty anti-scraping
01:06:56Hackerpcs (Hackerpcs) joins
01:08:42Webuser421296 joins
01:09:01Webuser421296 quits [Client Quit]
01:17:29etnguyen03 (etnguyen03) joins
01:18:01<nicolas17>looking at parti-livestream
01:18:35jason joins
01:19:09<nicolas17>listing taking me 1 second per page ugh
01:19:45Czechball quits [Quit: Quit: Leaving]
01:21:47<nicolas17>200k files and I don't know how far I am
01:23:33<nicolas17>there are duplicated filessss
01:24:03<@JAA>I once listed a bucket that took almost two months and produced a couple hundred GB of compressed JSONL. Yep, that's how it goes. :-)
01:24:39<nicolas17>example (2MB):
01:24:40<nicolas17>c20d4d80a75defb0364ca43f14486f58cd210597 423581_557ce44a-1597-4555-8221-557bec85c2f0.png
01:24:41<nicolas17>c20d4d80a75defb0364ca43f14486f58cd210597 423581_713af7b0-2fbd-43ef-a963-d9821cf2a95a.png
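The duplicate check above can be sketched roughly as follows, assuming the listing has been saved as lines of `<hash> <filename>` like the two pasted here. The function name `find_duplicates` is hypothetical and this is not necessarily how the list was actually inspected.

```python
from collections import defaultdict


def find_duplicates(listing_lines):
    """Group file names by content hash and keep only hashes seen more than once."""
    by_hash = defaultdict(list)
    for line in listing_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        by_hash[digest].append(name)
    return {digest: names for digest, names in by_hash.items() if len(names) > 1}


# The two lines pasted in the channel.
lines = [
    "c20d4d80a75defb0364ca43f14486f58cd210597 423581_557ce44a-1597-4555-8221-557bec85c2f0.png",
    "c20d4d80a75defb0364ca43f14486f58cd210597 423581_713af7b0-2fbd-43ef-a963-d9821cf2a95a.png",
]
dupes = find_duplicates(lines)
for digest, names in dupes.items():
    print(digest, len(names), "copies")
```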
01:25:02@JAA pretends to be surprised.
01:25:19<@JAA>.png sounds like it might be a thumbnail or similar?
01:26:39<nicolas17>https://media.parti.com/423581_713af7b0-2fbd-43ef-a963-d9821cf2a95a.png (bad content-type)
01:28:03<nicolas17>...really curious what that decimal number before the uuid means
01:28:22etnguyen03 quits [Client Quit]
01:29:57<nicolas17>1 million files, 630GiB, still going
01:30:19<@JAA>Yeah, I wouldn't be surprised if this were pretty big, especially if it's HLS with segments.
01:30:43<nicolas17>so far it's all images
01:30:44<nicolas17>???
01:34:41<@JAA>I just found that netcup employs Anubis, by the way. Custom message 'This site is protected by <a href="https://www.anexia.com/">ANEXIA</a>.' and the anime girl image is a slow 403.
01:38:51<nicolas17>1TiB
01:40:24<nexussfan>JAA: custom versions of anubis exist, but AFAIK they all use .within-website
01:42:19<nicolas17>oh god I got to the video streams, this is huge
01:45:19<nicolas17>/channels contains 298 dev-channel-<id> subdirectories which seems... strangely low?
01:47:00<nicolas17>"the web3 creator economy & live stream platform" okay having <300 active users makes more sense now
01:47:42<nicolas17>how come they're using google cloud storage instead of filecoin? :P
01:52:18<mystique_altrosky>cause they need something that actually works
01:57:30etnguyen03 (etnguyen03) joins
02:09:10tzt quits [Remote host closed the connection]
02:09:29tzt (tzt) joins
02:11:09nathang2184 quits [Quit: Ping timeout (120 seconds)]
02:11:28nathang2184 joins
02:19:00nathang2184 quits [Ping timeout: 256 seconds]
02:26:16<nicolas17>list has been running for an hour, 7M files, 5.2TiB, still going
02:26:41beardicus quits [Ping timeout: 272 seconds]
02:26:43<nicolas17>I started from scratch on my VPS which has better latency to google, I'm optimistic it will catch up with my local PC long before it finishes
02:27:50beardicus (beardicus) joins
02:30:17nathang2184 joins
02:30:48Czechball joins
02:31:13Doomaholic (Doomaholic) joins
02:34:21jason quits [Read error: Connection reset by peer]
02:34:46jason joins
02:38:27Wohlstand quits [Quit: Wohlstand]
02:38:27nathang2184 quits [Read error: Connection reset by peer]
02:38:38nathang2184 joins
02:45:40ducky quits [Ping timeout: 260 seconds]
02:46:19nathang2184 quits [Ping timeout: 272 seconds]
02:47:28ducky (ducky) joins
02:48:02sg72 quits [Remote host closed the connection]
02:49:11sg72 joins
02:59:40ducky quits [Ping timeout: 260 seconds]
03:01:35ducky (ducky) joins
03:04:10<nicolas17>21M files, 17.6TiB, still going
03:06:40ducky quits [Ping timeout: 260 seconds]
03:07:50Island quits [Read error: Connection reset by peer]
03:09:11nathang2184 joins
03:11:53ducky (ducky) joins
03:13:33cultpony quits [Ping timeout: 272 seconds]
03:14:16cultpony (cultpony) joins
03:14:49nathang2184 quits [Ping timeout: 272 seconds]
03:15:56<nicolas17>I estimate 60TB but I could be way off
03:16:56ducky quits [Ping timeout: 260 seconds]
03:24:23<h2ibot>BlankEclair edited List of websites excluded from the Wayback Machine/Partial exclusions (+55, …): https://wiki.archiveteam.org/?diff=58207&oldid=58146
03:27:50etnguyen03 quits [Client Quit]
03:30:30nathang2184 joins
03:32:11Lord_Nightmare quits [Quit: ZNC - http://znc.in]
03:33:24ducky (ducky) joins
03:35:49Lord_Nightmare (Lord_Nightmare) joins
03:37:37nathang2184 quits [Ping timeout: 272 seconds]
03:40:27etnguyen03 (etnguyen03) joins
03:44:51nathang2184 joins
03:59:21<nicolas17>ok finished channels/ and it seems there's more directories, so all bets are off now
03:59:36DogDisco joins
04:01:19<@JAA>Welcome to S3 bucket listing.
04:04:19etnguyen03 quits [Remote host closed the connection]
04:13:24ducky quits [Ping timeout: 260 seconds]
04:17:59<nicolas17>https://storage.googleapis.com/parti-livestream/?prefix=ivs 9% on this directory now
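A minimal sketch of tallying one page of such a listing, assuming the endpoint returns an S3-style `ListBucketResult` XML document with `Contents`/`Key`/`Size` elements and a `NextMarker` for the following page. The sample document and its keys are made up for illustration, and no network request is made here.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_listing_page(xml_text):
    """Return ([(key, size_in_bytes), ...], next_marker_or_None) for one page."""
    root = ET.fromstring(xml_text)
    objects = []
    next_marker = None
    for child in root:
        name = local(child.tag)
        if name == "Contents":
            fields = {local(f.tag): (f.text or "") for f in child}
            objects.append((fields["Key"], int(fields["Size"])))
        elif name == "NextMarker":
            next_marker = child.text
    return objects, next_marker


# Hypothetical sample page; real keys and sizes will differ.
SAMPLE = (
    "<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>"
    "<Name>parti-livestream</Name><Prefix>ivs</Prefix>"
    "<NextMarker>ivs/b.ts</NextMarker><IsTruncated>true</IsTruncated>"
    "<Contents><Key>ivs/a.ts</Key><Size>100</Size></Contents>"
    "<Contents><Key>ivs/b.ts</Key><Size>250</Size></Contents>"
    "</ListBucketResult>"
)

objects, marker = parse_listing_page(SAMPLE)
total_bytes = sum(size for _, size in objects)
print(len(objects), "files,", total_bytes, "bytes, next marker:", marker)
```

Under the same assumption, a full listing would repeat the request with the returned marker until no `NextMarker` comes back, adding to the running file and byte counts each time.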
04:23:05<nicolas17>anyway I'm not sure if archiving this is feasible or useful
04:23:54Webuser669590 joins
04:24:04<nicolas17>by the time I'm done running the list, some videos from >30 days ago may have gotten removed, and there may be some new ones
04:24:14Webuser669590 quits [Client Quit]
04:24:15<nicolas17>I think *live* streams are in this same bucket even (so new HLS segment files are being added every 2 seconds)
04:30:38<Guest>nicolas17: ive seen that livestreams have ids assigned to them (maybe sequentially?), so i think it might have something to do with that
04:31:10<nicolas17><nicolas17> ...really curious what that decimal number before the uuid means
04:31:15<nicolas17>I suspect it's the *user* ID
04:31:27<Guest>also possible
04:31:48<nicolas17>because under channels/ there's 300 directories with similar numbers, each of them having multiple videos with a timestamp
04:32:02<nicolas17>so that number is the user/channel ID, not the video ID
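As a rough illustration of that reading of the file names, the sketch below splits a name like the one linked earlier into its leading number and UUID. Treating the number as a channel ID is the guess made above, not something the bucket confirms, and `split_media_name` is a hypothetical helper.

```python
import re
import uuid

# <decimal number>_<uuid>.<extension>, as in the names pasted earlier.
NAME_RE = re.compile(r"^(?P<channel_id>\d+)_(?P<uuid>[0-9a-f-]{36})\.(?P<ext>\w+)$")


def split_media_name(name):
    """Return (presumed channel ID, UUID, extension), or None if the name does not fit."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    try:
        parsed_uuid = uuid.UUID(m["uuid"])
    except ValueError:
        return None  # 36 hex/hyphen characters that are not a valid UUID
    return int(m["channel_id"]), parsed_uuid, m["ext"]


parts = split_media_name("423581_713af7b0-2fbd-43ef-a963-d9821cf2a95a.png")
print(parts)
```

Counting the distinct first elements over the whole listing would give a number comparable to the directory count under channels/.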
04:35:01ducky (ducky) joins
04:36:50SootBector quits [Remote host closed the connection]
04:37:57SootBector (SootBector) joins
04:39:59<Guest>only 300?
04:40:34<nicolas17><nicolas17> "the web3 creator economy & live stream platform" okay having <300 active users makes more sense now
04:40:42<Guest>thats like what kick.com tried to do with twitch
04:41:29<nicolas17>note that if a user hasn't streamed in the last month (so saved streams already expired) and/or doesn't save stream recordings, I won't see the directory as existing at all
04:41:41<Guest>imo theres not much of a point in archiving (especially if theres only 300 channels)
04:41:49<Guest>i thought the bucket was a lot smaller
04:42:24<nicolas17>I don't even know what's in ivs/, maybe it's similar to "clips"? in which case there's a *ton*
04:42:57<Guest>what are the file formats?
04:44:40ducky quits [Ping timeout: 260 seconds]
04:45:20ducky (ducky) joins
04:48:43<nicolas17>HLS
04:48:56<nicolas17>might be they used a different system for past streams
04:49:01<nicolas17>ivs might mean https://aws.amazon.com/ivs/
04:54:37<nicolas17>maybe the last month of streams is in channels/<id>/archive/<timestamp> but older stuff is in ivs? maybe they migrated systems around that time? 2025/10/27 is the most recent timestamp I see in ivs
04:57:40<Guest>parti was founded in 2017 and ivs was created in 2020, unless they changed the architecture since then (the site is really empty for an 8 year old company)
04:59:46<Guest>its kind of pointless whether they used ivs or not though
05:02:07<nicolas17>it seems ivs/ has stream recordings older than ~2025-10-27... whether they changed systems at that date, or they move them there after a month, doesn't really matter
05:03:44<nicolas17>either way my extrapolation so far says ivs/ is 100TB :p
05:05:26<nicolas17>and considering https://tracker.archiveteam.org/twitch/ was 550TB...
05:06:22<Guest>based on https://ivs.rocks/calculator i dont think they would use ivs
05:06:40<nicolas17>ah well that's for streaming
05:06:53<nicolas17>when the stream is over you can shove it into a regular S3 bucket
05:12:11<nicolas17>paying for 100TB of GCP storage is still no laughing matter tbh
05:15:55sec^nd quits [Remote host closed the connection]
05:16:15sec^nd (second) joins
05:23:52ducky quits [Ping timeout: 260 seconds]
05:26:04ducky (ducky) joins
05:31:20ducky quits [Ping timeout: 260 seconds]
05:31:23<nicolas17>Guest: does 1508 users sound more reasonable?
05:34:09jason quits [Ping timeout: 272 seconds]
05:34:35jason joins
05:36:37ducky (ducky) joins
06:09:36nexussfan quits [Quit: Konversation terminated!]
06:31:10jason quits [Read error: Connection reset by peer]
06:31:35jason joins
06:34:03jason quits [Read error: Connection reset by peer]
06:34:18jason joins
06:58:31Wohlstand (Wohlstand) joins