00:01:45nulldata (nulldata) joins
00:03:34<nicolas17>oh this was captured by archiveteam :P
00:18:43<Doranwen>Heh, nothing like finding a topic on your exact issue marked with [SOLVED] - only to discover it was for an earlier version, and the last comment was from someone having the issue on your version, with no solution. /o\
00:34:31fireonlive captures nicolas17
00:43:13<nicolas17>JAA: ugh why is there no decent WARC tooling :(
00:44:12<@JAA>:-(
00:44:37<nicolas17>I want to download a lot of files
00:45:12<nicolas17>if I do it via WBM I get my IP blocked after like 20 of them
00:45:20<nicolas17>I don't want to download the whole warc because it's too big
00:45:35<@JAA>CDX munching time
00:46:44<nicolas17>ideally I'd get the cdx and make range requests in the warc for the files I want... but most are contiguous so I could get massive speed gains if I make a single range request to get multiple files at once...
00:46:50<nicolas17>but that's Work
00:47:22<nicolas17>at least they're .warc.gz so I don't need to mess with the zstd dictionary stuff
00:47:44<@JAA>Boring :-P
00:48:08<nicolas17>but once I have a warc record, then how do I parse it
00:49:36<@JAA>You can use `warc-dump-responses --meta | http-response-bodies` from my little-things, but it's not going to work for everything, and you just get all bodies concatenated(-ish) since it's intended for link extraction etc.
00:54:49<Terbium>I may be executed for this, but you can use warcio :P
00:55:39<@JAA>It's 'fine' for reading as long as you're aware of which data it corrupts (see wiki).
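For the "how do I parse it" question above, here is a minimal stdlib-only sketch. It assumes the bytes fetched by a range request start exactly at a record boundary of a per-record-compressed `.warc.gz` (one gzip member per record) and that the record is an HTTP response. The function name `parse_warc_record` is hypothetical; this is not what `warc-dump-responses` or warcio do internally. The body is returned exactly as stored, so any chunked transfer encoding or content encoding is left untouched.

```python
import zlib


def parse_warc_record(gz_member: bytes):
    """Split one gzip-compressed WARC record into its parts.

    Returns (version_line, warc_headers, http_header_bytes, body_bytes).
    """
    # wbits=31 makes zlib expect a gzip wrapper. The decompressor stops at
    # the end of the first gzip member, so trailing bytes that belong to the
    # next record are ignored (they end up in d.unused_data).
    d = zlib.decompressobj(wbits=31)
    raw = d.decompress(gz_member)

    # WARC headers end at the first blank line.
    warc_head, _, block = raw.partition(b"\r\n\r\n")
    lines = warc_head.decode("utf-8", "replace").split("\r\n")
    version = lines[0]
    warc_headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        warc_headers[name.strip().lower()] = value.strip()

    # Content-Length is the size of the record block, which excludes the
    # two CRLFs that terminate the record.
    block = block[: int(warc_headers["content-length"])]

    # For a response record the block is the raw HTTP response:
    # status line and headers, blank line, then the payload.
    http_head, _, body = block.partition(b"\r\n\r\n")
    return version, warc_headers, http_head, body
```

Checking `warc_headers["warc-type"]` before trusting the HTTP split is a good idea, since other record types carry different block contents.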
01:14:32fangfufu quits [Quit: ZNC 1.8.2+deb3.1 - https://znc.in]
01:18:57Mateon2 joins
01:20:50Mateon1 quits [Ping timeout: 240 seconds]
01:20:50Mateon2 is now known as Mateon1
01:21:43fangfufu joins
01:40:51<fireonlive>laptop opened for the first time: evening :x
01:41:01<fireonlive>and they say i'm chronically online
01:41:25<fireonlive>i guess the phone counts :D
02:07:11<nicolas17>JAA: the cdx points at the *response* record, right?
02:09:40<nicolas17>another mystery to me:
02:10:00<nicolas17>"curl -L https://archive.org/download/archiveteam_testflight_20150309202824/testflight_20150309202824.megawarc.warc.os.cdx.gz | zgrep 7b7ea8ce65967f6c3c329d8f4f4a8a6" why does this have "unk" mime type and "-" response code if the actual WARC record seems to be okay?
02:12:56<nicolas17>ugh
02:13:02<nicolas17>"most are contiguous so I could get massive speed gains if I make a single range request to get multiple files at once"
02:13:04<nicolas17>wrong
02:13:16<nicolas17>they are contiguous in the CDX, I think it's sorted by URL, they are not contiguous in the warc
02:16:39<Terbium>yep, it's not guaranteed the CDX order matches the WARC record order
02:17:42<nicolas17>then I'm not sure if it's worth the trouble to coalesce multiple files into a single HTTP range request
02:18:33<Terbium>you'll just need to preprocess the CDX to sort by offset to obtain what (should) match the WARC file
02:18:49<nicolas17>yeah but
02:19:00<nicolas17>I did that sorting
02:19:05<nicolas17>and I'm seeing big gaps between the files I need
02:19:40<Terbium>the CDX may be incomplete or deliberately omit certain record types
02:19:56Mateon2 joins
02:21:00<Terbium>in the end it's going to be difficult if there are lots of gaps and you need range requests
02:21:50Mateon1 quits [Ping timeout: 240 seconds]
02:21:50Mateon2 is now known as Mateon1
02:22:59<nicolas17>https://transfer.archivete.am/inline/R5Za4/gaps.txt
02:23:38<Terbium>yeah, that's not good, is everything you need in that one megawarc? or is it split up across others?
02:24:36<nicolas17>making 4848 range requests for the exact pieces I need would be slow, downloading the entire warc would also be slow, in theory I could coalesce requests if the gap is "small enough" (1MB maybe?) but I don't know if the programming work involved would really be worth the trouble
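The coalescing idea in the message above (sort by offset, then merge neighbours whose gap is "small enough") can be sketched roughly as follows. Everything here is illustrative: the function names are hypothetical, and `parse_cdx_line` assumes the common 11-field CDX layout in which the last three space-separated fields are compressed record size, offset, and filename. Check the CDX header line before relying on that.

```python
def parse_cdx_line(line):
    """Return (offset, length) from one CDX line.

    ASSUMPTION: the last three fields are size, offset, filename.
    """
    fields = line.split()
    return int(fields[-2]), int(fields[-3])


def coalesce_ranges(records, max_gap=1_000_000):
    """Merge (offset, length) pairs whose gaps are at most max_gap bytes.

    Returns a list of (start, end_inclusive, members), where members is the
    list of original (offset, length) pairs covered by that request.
    """
    merged = []
    for offset, length in sorted(records):
        end = offset + length - 1
        # Gap = number of unwanted bytes between the previous range and this one.
        if merged and offset - merged[-1][1] - 1 <= max_gap:
            start, prev_end, members = merged[-1]
            members.append((offset, length))
            merged[-1] = (start, max(prev_end, end), members)
        else:
            merged.append((offset, end, [(offset, length)]))
    return merged


def range_header(start, end):
    """HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"
```

Each member's bytes sit at `offset - start` inside the coalesced response, so the individual records can be sliced back out after download. Whether a 1 MB gap threshold pays off depends on per-request latency versus throughput, which the chat does not settle.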
02:27:43<Terbium>yeah, I tried from a host with peering close to IA, still getting low download speeds
02:34:38<Terbium>it's a shame IPFS is still in its infancy, would be perfect for distribution of this type of data
02:35:06<anarcat>ipfs is in its infancy?
02:35:15<nicolas17>in terms of adoption, probably?
02:35:16<anarcat>i thought it had been around for decades
02:35:21<anarcat>well maybe not decades
02:35:25<anarcat>at least one :p
02:35:37<anarcat>ah well, 2015 it seems
02:35:45<Terbium>it's only since 2015
02:35:54<Terbium>they are also making major protocol changes every 2-3 months
02:35:55<anarcat>i guess i was an early adopter
02:35:59<anarcat>haha
02:36:01<nicolas17>that's about as old as these warcs
02:36:02<anarcat>oook
02:36:07<Terbium>I've been using IPFS since 2015 when they started
02:36:14<anarcat>yeah here too
02:36:18<anarcat>i think i stopped in 2018 :p
02:36:19<Terbium>it's highly unstable with major protocol changes
02:36:56<Terbium>and large changes in features and behavior. I think it's a great idea, but the sheer number of changes and shifts in direction makes it difficult for end users to adopt
02:37:15<anarcat>incentives are also tricky
02:37:18<anarcat>anyway, really ot
02:37:41<Terbium>we are in -ot :P
02:37:48<anarcat>true
02:43:18<Terbium>might be faster to just ship IA a USB and manually copy over the megawarc :P
02:43:43<Terbium>their egress pipe is probably hammered to hell
02:52:11<@arkiver>nicolas17: concurrently making those requests wouldn't be too slow
02:52:21<@arkiver>the CDX is alphabetically ordered
02:52:26<@arkiver>the WARC is not
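A rough sketch of the concurrent approach arkiver suggests, using only the Python standard library. The names `make_range_fetcher` and `fetch_all` are hypothetical, the worker count is arbitrary, and nothing here reflects any rate limits the server may enforce; treat it as an illustration, not a tuned downloader.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def make_range_fetcher(url):
    """Build a fetch(start, end) function that issues one HTTP range request."""
    def fetch(start, end):
        req = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{end}"}
        )
        with urllib.request.urlopen(req) as resp:
            # 206 means the server honoured the range. A plain 200 would be
            # the whole file, which is not what we want here.
            if resp.status != 206:
                raise RuntimeError(f"range not honoured: HTTP {resp.status}")
            return resp.read()
    return fetch


def fetch_all(ranges, fetch, workers=8):
    """Fetch every (start, end) range concurrently.

    Returns a dict mapping each range tuple to its bytes.
    """
    ranges = list(ranges)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input ranges.
        chunks = pool.map(lambda r: fetch(*r), ranges)
        return dict(zip(ranges, chunks))
```

Passing the fetch function in as a parameter keeps the concurrency logic testable without touching the network; for real use it would be called as `fetch_all(ranges, make_range_fetcher(warc_url))`, where `warc_url` is whichever file is being read.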
02:57:53<Terbium>i'm getting 250-350 KiB/s sequential on that megawarc :/
03:28:19icedice quits [Client Quit]
03:40:17Barto quits [Ping timeout: 272 seconds]
03:43:20<nicolas17>https://data.nicolas17.xyz/testflight-list.html
03:43:29<nicolas17>ETA 11h for one of the megawarcs, this will take a while
04:32:51tzt quits [Ping timeout: 272 seconds]
04:34:22tzt (tzt) joins
05:37:14DogsRNice quits [Read error: Connection reset by peer]
05:57:52<@arkiver>nicolas17: are you doing this sequentially?
05:58:20<nicolas17>arkiver: I'm downloading like 10 warcs simultaneously right now
05:58:30<@arkiver>sounds good, you could up it to more even
06:01:04<fireonlive>in 2 weeks, 2014 will be 10 years ago
06:01:49<@arkiver>:P
06:01:55<fireonlive>:D
06:01:57<@arkiver>fireonlive: is 2014 significant?
06:02:12<fireonlive>nah, just time flying by 😵
06:02:53<nicolas17>arkiver: I'm already easily losing track of what warcs are done and what aren't, and one of the downloads is going fast enough that the script generating the html frequently hits sqlite database locked :P
06:03:00<nicolas17>it's 3am, I'll sleep before making improvements
06:03:59<fireonlive>like lady gaga's "born this way" being released ~12 years ago now x_x
06:04:32<fireonlive>it came out last year, right? :D
06:29:49<@arkiver>nicolas17: alright :P
06:32:00<nicolas17>oh one of the downloads is going at 30MB/s and making me CPU-bound actually
06:52:06HackMii quits [Remote host closed the connection]
06:52:27HackMii (hacktheplanet) joins
07:01:14tbc1887 quits [Read error: Connection reset by peer]
07:13:17<fireonlive>https://dl.fireon.live/irc/89def58ed643e973/This%20goes%20beyond%20ruined%20%F0%9F%98%A3%20%40Whitney%20Houston%20%23iwillalwaysloveyou%20%23remix%20%20%5B7238778258517462315%5D.mp4
07:13:21<fireonlive>AHHHHHHHHHHHHHHHHHHHH
07:26:18BlueMaxima quits [Read error: Connection reset by peer]
07:31:20MetaNova quits [Ping timeout: 240 seconds]
07:37:02MetaNova (MetaNova) joins
07:48:05Arcorann (Arcorann) joins
07:52:02Kap10G joins
07:53:12<Kap10G>var hurrayHeartz = []
07:53:22<Kap10G>jQuery('a.download-pill').each(function( index ) { hurrayHeartz.push(( $(this).attr('href') ));})
07:53:31<Kap10G>also paste jquery in console if needed
07:55:24<fireonlive>does it give me free mana
07:55:37Dango360 quits [Read error: Connection reset by peer]
07:55:51<@arkiver>Kap10G: what is this?
07:56:44<Kap10G>a way to grab all the hrefs on an archive archive
07:56:52<Kap10G>when the file is too big to zip
07:57:01<Kap10G>https://ctxt.io/2/AADQeo2REQ
07:57:04<Kap10G>example
07:57:37<Kap10G>internet archive has really nice rom zips
07:57:50<Kap10G>and some of the torrents are weak on support
07:59:50<Kap10G>paste in the inspector web console
08:00:22<@arkiver>Kap10G: did you talk with anyone here before about this?
08:00:40<@arkiver>this seems kind of odd, to paste this out of nowhere without context, but maybe i'm missing something
08:00:43<Kap10G>i talked with jason scott a long ass time ago he was #textfiles
08:00:50<@arkiver>yeah we know
08:01:01<Kap10G>i dunno just felt like coming in and saying thanks
08:01:04<@arkiver>(we know who jason is, not that you talked with him)
08:01:06<Kap10G>plus i'm a pretty good unix
08:01:12<Kap10G>machinist ;)
08:01:18<Kap10G>just seeing whats out there
08:01:19<@arkiver>i should note jason is barely here nowadays
08:01:23<flashfire42>Nice Guy. Called me an asshole on more than one occasion. Still one of my idols XD
08:01:25<@arkiver>so he may not see this
08:01:31<Kap10G>that's alright
08:05:20<fireonlive>you may email him at jasonscottdoesnotliketobethanked@textfiles.com
08:05:31<Kap10G>cool
08:05:38<flashfire42>lmao
08:06:03<Kap10G>he invited me to a discord
08:06:06<Kap10G>but i forgot what it was
08:06:15<Kap10G>i didn't use that
08:06:25<Kap10G>i'm pretty strictly irc for all that stuff
08:06:33<fireonlive>IRC++
08:06:34<eggdrop>[karma] 'IRC' now has 1 karma!
08:06:43<@arkiver>jason is mostly Discord
08:06:51<fireonlive>Discord--
08:06:51<eggdrop>[karma] 'Discord' now has -1 karma!
08:07:10<Kap10G>i was going to help with rust programming and javascript for emulators in browser or something
08:07:37<Kap10G>it obviously got done already
08:09:48<fireonlive>might have been 'fans of the internet archive' at https://discord.gg/YzgyBkmBby
08:36:41rohvani quits [Ping timeout: 272 seconds]
08:51:09Kap10G_ joins
08:55:03Kap10G quits [Ping timeout: 272 seconds]
09:31:46nulldata quits [Client Quit]
09:32:08nulldata (nulldata) joins
09:43:43Arcorann quits [Ping timeout: 265 seconds]
09:45:51Arcorann (Arcorann) joins
09:48:13Kap10G_ quits [Read error: Connection reset by peer]
09:49:10Kap10G joins
09:56:45Kap10G_ joins
09:56:45Kap10G quits [Read error: Connection reset by peer]
09:58:08c3manu (c3manu) joins
09:58:13c3manu quits [Max SendQ exceeded]
09:58:29c3manu (c3manu) joins
10:00:01Bleo1826 quits [Client Quit]
10:00:55cdreimanu (c3manu) joins
10:00:55cdreimanu quits [Max SendQ exceeded]
10:01:12cdreimanu (c3manu) joins
10:01:21Bleo1826 joins
10:03:01cdreimanu quits [Client Quit]
10:03:18cdreimanu (c3manu) joins
10:03:20c3manu quits [Ping timeout: 240 seconds]
10:03:30cdreimanu is now known as c3manu
10:57:50nicolas17 quits [Remote host closed the connection]
10:58:11nicolas17 joins
11:06:36Barto (Barto) joins
11:17:33nicolas17 quits [Ping timeout: 272 seconds]
11:20:42nicolas17 joins
12:50:18Kap10G_ quits [Remote host closed the connection]
12:50:20Arcorann quits [Ping timeout: 240 seconds]
12:51:43Kap10G joins
12:52:20pseudorizer quits [Ping timeout: 240 seconds]
12:55:04pseudorizer (pseudorizer) joins
13:53:10benjins2 joins
14:43:26icedice (icedice) joins
15:33:22neggles quits [Quit: bye friends - ZNC - https://znc.in]
15:35:31neggles (neggles) joins
15:49:48benjinsm joins
15:50:11benjins2_ joins
15:50:50benjins2 quits [Ping timeout: 240 seconds]
15:51:07benjinsmi joins
15:51:09benjins quits [Ping timeout: 272 seconds]
15:54:20benjinsm quits [Ping timeout: 240 seconds]
16:21:46DogsRNice joins
16:29:41<nicolas17>JAA: https://en.wikipedia.org/wiki/TestFlight#History
16:30:06<nicolas17>people discovering the testflight archive is a "leak" now -.-
16:42:48<@JAA>nicolas17: Yes, CDX only contains response records.
16:44:57<@JAA>Or actually, that might not be entirely true, but it definitely doesn't have request records.
16:46:20<@JAA>And also yes, CDX is sorted by the normalised URL in alphabetical order.
16:47:18<@JAA>I suppose it was a leak back in 2015-ish unless that data was intentionally exposed, but yeah.
17:03:47<audrooku|m>nicolas17: Such is the nature of archival work, unfortunately
17:20:17<@JAA>monohedron: Re -bs, what drives are that?
17:37:58Dango360 (Dango360) joins
17:47:14lumidify_ quits [Quit: leaving]
17:51:56lumidify (lumidify) joins
18:01:03Vokun uploaded an image: (1167KiB) < https://matrix.hackint.org/_matrix/media/v3/download/matrix.org/TxARPyiqQfBIDRoxJETKuVQJ/image.png >
18:02:07<nukke>how do I downvote on irc?
18:04:11<@JAA>IRC++
18:04:13<eggdrop>[karma] 'IRC' now has 2 karma!
18:04:13<@JAA>Discord--
18:04:14<eggdrop>[karma] 'Discord' now has -2 karma!
18:04:16<@JAA>Like this.
18:06:22benjins2__ joins
18:07:31benjins joins
18:08:20benjins2_ quits [Ping timeout: 240 seconds]
18:08:35benjinsmi quits [Ping timeout: 272 seconds]
18:08:54benjinsm joins
18:10:19benjins2 joins
18:10:50benjins2__ quits [Ping timeout: 240 seconds]
18:12:23benjins quits [Ping timeout: 272 seconds]
18:30:24<fireonlive>:D
18:45:42<nicolas17>so many noobs calling the testflight thing a "leak" that Jason Scott had to go clarify things in their discord server
19:08:14<DigitalDragons>curl -o a.html https://google.com/
19:08:18<DigitalDragons>guys i leaked google
19:30:41<fireonlive>😱
19:45:01<Terbium>Discord--
19:45:01<eggdrop>[karma] 'Discord' now has -3 karma!
19:45:24<Terbium>IRC++
19:45:25<eggdrop>[karma] 'IRC' now has 3 karma!
19:52:35<fireonlive>:D
19:52:43<fireonlive>guess who's back, back again... https://nitter.net/malmoeb/status/1736646796993061071
19:58:48<@arkiver>the ??? person?
19:58:55<@arkiver>(checking tweet now)
19:59:05<@arkiver>ah no
19:59:14<fireonlive>conficker in 2023 :D
19:59:40<@JAA>Joaquinito did show up again the other week. :-P
20:18:54<ymgve>I've been going through old amiga disks and like 10% of them are infected with some form of virus
20:19:54<ymgve>won't spread far outside winuae though
20:28:19BlueMaxima joins
20:29:48c3manu quits [Remote host closed the connection]
20:41:40<fireonlive>oh wow lol
21:00:56<DogsRNice>oh i have a dumb story about Conficker. i was like 11 at the time and looking at the news on my wii (didnt have a reliable computer at the time) and i read a story about Conficker and had a nightmare about it causing civilization to end
21:20:51<fireonlive>oof :c
21:30:17snivyy joins
21:40:39<nulldata>https://lounge.nulldata.foo/uploads/1c2deb245698a077/MG_1702935549670.jpg
21:40:45<nulldata>https://lounge.nulldata.foo/uploads/09c26f3b256b3953/MG_1702935554738.jpg
21:43:53snivyy quits [Remote host closed the connection]
21:50:28systwi quits [Read error: Connection reset by peer]
21:59:18systwi (systwi) joins
22:26:04<DogsRNice>lol