| 00:01:45 | | nulldata (nulldata) joins |
| 00:03:34 | <nicolas17> | oh this was captured by archiveteam :P |
| 00:18:43 | <Doranwen> | Heh, nothing like finding a topic on your exact issue marked with [SOLVED] - only to discover it was for an earlier version, and the last comment was from someone having the issue on your version, with no solution. /o\ |
| 00:34:31 | | fireonlive captures nicolas17 |
| 00:43:13 | <nicolas17> | JAA: ugh why is there no decent WARC tooling :( |
| 00:44:12 | <@JAA> | :-( |
| 00:44:37 | <nicolas17> | I want to download a lot of files |
| 00:45:12 | <nicolas17> | if I do it via WBM I get my IP blocked after like 20 of them |
| 00:45:20 | <nicolas17> | I don't want to download the whole warc because it's too big |
| 00:45:35 | <@JAA> | CDX munching time |
| 00:46:44 | <nicolas17> | ideally I'd get the cdx and make range requests in the warc for the files I want... but most are contiguous so I could get massive speed gains if I make a single range request to get multiple files at once... |
| 00:46:50 | <nicolas17> | but that's Work |
| 00:47:22 | <nicolas17> | at least they're .warc.gz so I don't need to mess with the zstd dictionary stuff |
| 00:47:44 | <@JAA> | Boring :-P |
| 00:48:08 | <nicolas17> | but once I have a warc record, then how do I parse it |
| 00:49:36 | <@JAA> | You can use `warc-dump-responses --meta | http-response-bodies` from my little-things, but it's not going to work for everything, and you just get all bodies concatenated(-ish) since it's intended for link extraction etc. |
| 00:54:49 | <Terbium> | I may be executed for this, but you can use warcio :P |
| 00:55:39 | <@JAA> | It's 'fine' for reading as long as you're aware of which data it corrupts (see wiki). |
| 01:14:32 | | fangfufu quits [Quit: ZNC 1.8.2+deb3.1 - https://znc.in] |
| 01:18:57 | | Mateon2 joins |
| 01:20:50 | | Mateon1 quits [Ping timeout: 240 seconds] |
| 01:20:50 | | Mateon2 is now known as Mateon1 |
| 01:21:43 | | fangfufu joins |
| 01:21:49 | | fangfufu is now authenticated as fangfufu |
| 01:40:51 | <fireonlive> | laptop opened for the first time: evening :x |
| 01:41:01 | <fireonlive> | and they say i'm chronically online |
| 01:41:25 | <fireonlive> | i guess the phone counts :D |
| 02:07:11 | <nicolas17> | JAA: the cdx points at the *response* record, right? |
| 02:09:40 | <nicolas17> | another mystery to me: |
| 02:10:00 | <nicolas17> | "curl -L https://archive.org/download/archiveteam_testflight_20150309202824/testflight_20150309202824.megawarc.warc.os.cdx.gz | zgrep 7b7ea8ce65967f6c3c329d8f4f4a8a6" why does this have "unk" mime type and "-" response code if the actual WARC record seems to be okay? |
| 02:12:56 | <nicolas17> | ugh |
| 02:13:02 | <nicolas17> | "most are contiguous so I could get massive speed gains if I make a single range request to get multiple files at once" |
| 02:13:04 | <nicolas17> | wrong |
| 02:13:16 | <nicolas17> | they are contiguous in the CDX, I think it's sorted by URL, they are not contiguous in the warc |
| 02:16:39 | <Terbium> | yep, it's not guaranteed the CDX order matches the WARC record order |
| 02:17:42 | <nicolas17> | then I'm not sure if it's worth the trouble to coalesce multiple files into a single HTTP range request |
| 02:18:33 | <Terbium> | you'll just need to preprocess the CDX to sort by offset to obtain an order that (should) match the WARC file |
| 02:18:49 | <nicolas17> | yeah but |
| 02:19:00 | <nicolas17> | I did that sorting |
| 02:19:05 | <nicolas17> | and I'm seeing big gaps between the files I need |
| 02:19:40 | <Terbium> | the CDX may be incomplete or may deliberately omit certain record types |
| 02:19:56 | | Mateon2 joins |
| 02:21:00 | <Terbium> | in the end it's going to be difficult if there are lots of gaps and you need range requests |
| 02:21:50 | | Mateon1 quits [Ping timeout: 240 seconds] |
| 02:21:50 | | Mateon2 is now known as Mateon1 |
| 02:22:59 | <nicolas17> | https://transfer.archivete.am/inline/R5Za4/gaps.txt |
| 02:23:38 | <Terbium> | yeah, that's not good, is everything you need in that one megawarc? or is it split up across others? |
| 02:24:36 | <nicolas17> | making 4848 range requests for the exact pieces I need would be slow, downloading the entire warc would also be slow, in theory I could coalesce requests if the gap is "small enough" (1MB maybe?) but I don't know if the programming work involved would really be worth the trouble |
| 02:27:43 | <Terbium> | yeah, I tried from a host with peering close to IA, still getting low download speeds |
| 02:34:38 | <Terbium> | it's a shame IPFS is still in its infancy, would be perfect for distribution of this type of data |
| 02:35:06 | <anarcat> | ipfs is in its infancy? |
| 02:35:15 | <nicolas17> | in terms of adoption, probably? |
| 02:35:16 | <anarcat> | i thought it had been around for decades |
| 02:35:21 | <anarcat> | well maybe not decades |
| 02:35:25 | <anarcat> | at least one :p |
| 02:35:37 | <anarcat> | ah well, 2015 it seems |
| 02:35:45 | <Terbium> | it's only since 2015 |
| 02:35:54 | <Terbium> | they are also making major protocol changes every 2-3 months |
| 02:35:55 | <anarcat> | i guess i was an early adopter |
| 02:35:59 | <anarcat> | haha |
| 02:36:01 | <nicolas17> | that's about as old as these warcs |
| 02:36:02 | <anarcat> | oook |
| 02:36:07 | <Terbium> | I've been using IPFS since 2015 when they started |
| 02:36:14 | <anarcat> | yeah here too |
| 02:36:18 | <anarcat> | i think i stopped in 2018 :p |
| 02:36:19 | <Terbium> | it's highly unstable with major protocol changes |
| 02:36:56 | <Terbium> | and large changes in features and behavior. I think it's a great idea, but the sheer number of changes and shifts in direction makes it difficult for end users to adopt |
| 02:37:15 | <anarcat> | incentives are also tricky |
| 02:37:18 | <anarcat> | anyway, really ot |
| 02:37:41 | <Terbium> | we are in -ot :P |
| 02:37:48 | <anarcat> | true |
| 02:43:18 | <Terbium> | might be faster to just ship IA a USB and manually copy over the megawarc :P |
| 02:43:43 | <Terbium> | their egress pipe is probably hammered to hell |
| 02:52:11 | <@arkiver> | nicolas17: concurrently making those requests wouldn't be too slow |
| 02:52:21 | <@arkiver> | the CDX is alphabetically ordered |
| 02:52:26 | <@arkiver> | the WARC is not |
| 02:57:53 | <Terbium> | i'm getting 250-350 KiB/s sequential on that megawarc :/ |
| 03:28:19 | | icedice quits [Client Quit] |
| 03:40:17 | | Barto quits [Ping timeout: 272 seconds] |
| 03:43:20 | <nicolas17> | https://data.nicolas17.xyz/testflight-list.html |
| 03:43:29 | <nicolas17> | ETA 11h for one of the megawarcs, this will take a while |
| 04:32:51 | | tzt quits [Ping timeout: 272 seconds] |
| 04:34:22 | | tzt (tzt) joins |
| 05:37:14 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:57:52 | <@arkiver> | nicolas17: are you doing this sequentially? |
| 05:58:20 | <nicolas17> | arkiver: I'm downloading like 10 warcs simultaneously right now |
| 05:58:30 | <@arkiver> | sounds good, you could up it to more even |
| 06:01:04 | <fireonlive> | in 2 weeks, 2014 will be 10 years ago |
| 06:01:49 | <@arkiver> | :P |
| 06:01:55 | <fireonlive> | :D |
| 06:01:57 | <@arkiver> | fireonlive: is 2014 significant? |
| 06:02:12 | <fireonlive> | nah, just time flying by 😵 |
| 06:02:53 | <nicolas17> | arkiver: I'm already easily losing track of what warcs are done and what aren't, and one of the downloads is going fast enough that the script generating the html frequently hits sqlite database locked :P |
| 06:03:00 | <nicolas17> | it's 3am, I'll sleep before making improvements |
| 06:03:59 | <fireonlive> | like lady gaga's "born this way" being released ~12 years ago now x_x |
| 06:04:32 | <fireonlive> | it came out last year, right? :D |
| 06:29:49 | <@arkiver> | nicolas17: alright :P |
| 06:32:00 | <nicolas17> | oh one of the downloads is going at 30MB/s and making me CPU-bound actually |
| 06:52:06 | | HackMii quits [Remote host closed the connection] |
| 06:52:27 | | HackMii (hacktheplanet) joins |
| 07:01:14 | | tbc1887 quits [Read error: Connection reset by peer] |
| 07:13:17 | <fireonlive> | https://dl.fireon.live/irc/89def58ed643e973/This%20goes%20beyond%20ruined%20%F0%9F%98%A3%20%40Whitney%20Houston%20%23iwillalwaysloveyou%20%23remix%20%20%5B7238778258517462315%5D.mp4 |
| 07:13:21 | <fireonlive> | AHHHHHHHHHHHHHHHHHHHH |
| 07:26:18 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 07:31:20 | | MetaNova quits [Ping timeout: 240 seconds] |
| 07:37:02 | | MetaNova (MetaNova) joins |
| 07:48:05 | | Arcorann (Arcorann) joins |
| 07:52:02 | | Kap10G joins |
| 07:53:12 | <Kap10G> | var hurrayHeartz = [] |
| 07:53:22 | <Kap10G> | jQuery('a.download-pill').each(function () { hurrayHeartz.push(jQuery(this).attr('href')); }) |
| 07:53:31 | <Kap10G> | also paste jquery in console if needed |
| 07:55:24 | <fireonlive> | does it give me free mana |
| 07:55:37 | | Dango360 quits [Read error: Connection reset by peer] |
| 07:55:51 | <@arkiver> | Kap10G: what is this? |
| 07:56:44 | <Kap10G> | a way to grab all the hrefs on an archive archive |
| 07:56:52 | <Kap10G> | when the file is too big to zip |
| 07:57:01 | <Kap10G> | https://ctxt.io/2/AADQeo2REQ |
| 07:57:04 | <Kap10G> | example |
| 07:57:37 | <Kap10G> | internet archive has really nice rom zips |
| 07:57:50 | <Kap10G> | and some of the torrents are weak on support |
| 07:59:50 | <Kap10G> | paste in the inspector web console |
| 08:00:22 | <@arkiver> | Kap10G: did you talk with anyone here before about this? |
| 08:00:40 | <@arkiver> | this seems kind of odd, to paste this out of nowhere without context, but maybe i'm missing something |
| 08:00:43 | <Kap10G> | i talked with jason scott a long ass time ago he was #textfiles |
| 08:00:50 | <@arkiver> | yeah we know |
| 08:01:01 | <Kap10G> | i dunno just felt like coming in and saying thanks |
| 08:01:04 | <@arkiver> | (we know who jason is, not that you talked with him) |
| 08:01:06 | <Kap10G> | plus i'm a pretty good unix |
| 08:01:12 | <Kap10G> | machinist ;) |
| 08:01:18 | <Kap10G> | just seeing whats out there |
| 08:01:19 | <@arkiver> | i should note jason is barely here nowadays |
| 08:01:23 | <flashfire42> | Nice Guy. Called me an asshole on more than one occasion. Still one of my idols XD |
| 08:01:25 | <@arkiver> | so he may not see this |
| 08:01:31 | <Kap10G> | that's alright |
| 08:05:20 | <fireonlive> | you may email him at jasonscottdoesnotliketobethanked@textfiles.com |
| 08:05:31 | <Kap10G> | cool |
| 08:05:38 | <flashfire42> | lmao |
| 08:06:03 | <Kap10G> | he invited me to a discord |
| 08:06:06 | <Kap10G> | but i forgot what it was |
| 08:06:15 | <Kap10G> | i didn't use that |
| 08:06:25 | <Kap10G> | i'm pretty strictly irc for all that stuff |
| 08:06:33 | <fireonlive> | IRC++ |
| 08:06:34 | <eggdrop> | [karma] 'IRC' now has 1 karma! |
| 08:06:43 | <@arkiver> | jason is mostly Discord |
| 08:06:51 | <fireonlive> | Discord-- |
| 08:06:51 | <eggdrop> | [karma] 'Discord' now has -1 karma! |
| 08:07:10 | <Kap10G> | i was going to help with rust programming and javascript for emulators in browser or something |
| 08:07:37 | <Kap10G> | it obviously got done already |
| 08:09:48 | <fireonlive> | might have been 'fans of the internet archive' at https://discord.gg/YzgyBkmBby |
| 08:36:41 | | rohvani quits [Ping timeout: 272 seconds] |
| 08:51:09 | | Kap10G_ joins |
| 08:55:03 | | Kap10G quits [Ping timeout: 272 seconds] |
| 09:31:46 | | nulldata quits [Client Quit] |
| 09:32:08 | | nulldata (nulldata) joins |
| 09:43:43 | | Arcorann quits [Ping timeout: 265 seconds] |
| 09:45:51 | | Arcorann (Arcorann) joins |
| 09:48:13 | | Kap10G_ quits [Read error: Connection reset by peer] |
| 09:49:10 | | Kap10G joins |
| 09:56:45 | | Kap10G_ joins |
| 09:56:45 | | Kap10G quits [Read error: Connection reset by peer] |
| 09:58:08 | | c3manu (c3manu) joins |
| 09:58:13 | | c3manu quits [Max SendQ exceeded] |
| 09:58:29 | | c3manu (c3manu) joins |
| 10:00:01 | | Bleo1826 quits [Client Quit] |
| 10:00:55 | | cdreimanu (c3manu) joins |
| 10:00:55 | | cdreimanu quits [Max SendQ exceeded] |
| 10:01:12 | | cdreimanu (c3manu) joins |
| 10:01:21 | | Bleo1826 joins |
| 10:03:01 | | cdreimanu quits [Client Quit] |
| 10:03:18 | | cdreimanu (c3manu) joins |
| 10:03:20 | | c3manu quits [Ping timeout: 240 seconds] |
| 10:03:30 | | cdreimanu is now known as c3manu |
| 10:57:50 | | nicolas17 quits [Remote host closed the connection] |
| 10:58:11 | | nicolas17 joins |
| 11:06:36 | | Barto (Barto) joins |
| 11:17:33 | | nicolas17 quits [Ping timeout: 272 seconds] |
| 11:20:42 | | nicolas17 joins |
| 12:50:18 | | Kap10G_ quits [Remote host closed the connection] |
| 12:50:20 | | Arcorann quits [Ping timeout: 240 seconds] |
| 12:51:43 | | Kap10G joins |
| 12:52:20 | | pseudorizer quits [Ping timeout: 240 seconds] |
| 12:55:04 | | pseudorizer (pseudorizer) joins |
| 13:53:10 | | benjins2 joins |
| 14:43:26 | | icedice (icedice) joins |
| 15:33:22 | | neggles quits [Quit: bye friends - ZNC - https://znc.in] |
| 15:35:31 | | neggles (neggles) joins |
| 15:49:48 | | benjinsm joins |
| 15:50:11 | | benjins2_ joins |
| 15:50:50 | | benjins2 quits [Ping timeout: 240 seconds] |
| 15:51:07 | | benjinsmi joins |
| 15:51:09 | | benjins quits [Ping timeout: 272 seconds] |
| 15:54:20 | | benjinsm quits [Ping timeout: 240 seconds] |
| 16:21:46 | | DogsRNice joins |
| 16:29:41 | <nicolas17> | JAA: https://en.wikipedia.org/wiki/TestFlight#History |
| 16:30:06 | <nicolas17> | people discovering the testflight archive is a "leak" now -.- |
| 16:42:48 | <@JAA> | nicolas17: Yes, CDX only contains response records. |
| 16:44:57 | <@JAA> | Or actually, that might not be entirely true, but it definitely doesn't have request records. |
| 16:46:20 | <@JAA> | And also yes, CDX is sorted by the normalised URL in alphabetical order. |
| 16:47:18 | <@JAA> | I suppose it was a leak back in 2015-ish unless that data was intentionally exposed, but yeah. |
| 17:03:47 | <audrooku|m> | nicolas17: Such is the nature of archival work, unfortunately |
| 17:20:17 | <@JAA> | monohedron: Re -bs, what drives are those? |
| 17:37:58 | | Dango360 (Dango360) joins |
| 17:47:14 | | lumidify_ quits [Quit: leaving] |
| 17:51:56 | | lumidify (lumidify) joins |
| 18:01:03 | | Vokun uploaded an image: (1167KiB) < https://matrix.hackint.org/_matrix/media/v3/download/matrix.org/TxARPyiqQfBIDRoxJETKuVQJ/image.png > |
| 18:02:07 | <nukke> | how do I downvote on irc? |
| 18:04:11 | <@JAA> | IRC++ |
| 18:04:13 | <eggdrop> | [karma] 'IRC' now has 2 karma! |
| 18:04:13 | <@JAA> | Discord-- |
| 18:04:14 | <eggdrop> | [karma] 'Discord' now has -2 karma! |
| 18:04:16 | <@JAA> | Like this. |
| 18:06:22 | | benjins2__ joins |
| 18:07:31 | | benjins joins |
| 18:08:20 | | benjins2_ quits [Ping timeout: 240 seconds] |
| 18:08:35 | | benjinsmi quits [Ping timeout: 272 seconds] |
| 18:08:54 | | benjinsm joins |
| 18:10:19 | | benjins2 joins |
| 18:10:50 | | benjins2__ quits [Ping timeout: 240 seconds] |
| 18:12:23 | | benjins quits [Ping timeout: 272 seconds] |
| 18:30:24 | <fireonlive> | :D |
| 18:45:42 | <nicolas17> | so many noobs calling the testflight thing a "leak" that Jason Scott had to go clarify things in their discord server |
| 19:08:14 | <DigitalDragons> | curl -o a.html https://google.com/ |
| 19:08:18 | <DigitalDragons> | guys i leaked google |
| 19:30:41 | <fireonlive> | 😱 |
| 19:45:01 | <Terbium> | Discord-- |
| 19:45:01 | <eggdrop> | [karma] 'Discord' now has -3 karma! |
| 19:45:24 | <Terbium> | IRC++ |
| 19:45:25 | <eggdrop> | [karma] 'IRC' now has 3 karma! |
| 19:52:35 | <fireonlive> | :D |
| 19:52:43 | <fireonlive> | guess who's back, back again... https://nitter.net/malmoeb/status/1736646796993061071 |
| 19:58:48 | <@arkiver> | the ??? person? |
| 19:58:55 | <@arkiver> | (checking tweet now) |
| 19:59:05 | <@arkiver> | ah no |
| 19:59:14 | <fireonlive> | conficker in 2023 :D |
| 19:59:40 | <@JAA> | Joaquinito did show up again the other week. :-P |
| 20:18:54 | <ymgve> | I've been going through old amiga disks and like 10% of them are infected with some form of virus |
| 20:19:54 | <ymgve> | won't spread far outside winuae though |
| 20:28:19 | | BlueMaxima joins |
| 20:29:48 | | c3manu quits [Remote host closed the connection] |
| 20:41:40 | <fireonlive> | oh wow lol |
| 21:00:56 | <DogsRNice> | oh i have a dumb story about Conficker. i was like 11 at the time and looking at the news on my wii (didnt have a reliable computer at the time) and i read a story about Conficker and had a nightmare about it causing civilization to end |
| 21:20:51 | <fireonlive> | oof :c |
| 21:30:17 | | snivyy joins |
| 21:40:39 | <nulldata> | https://lounge.nulldata.foo/uploads/1c2deb245698a077/MG_1702935549670.jpg |
| 21:40:45 | <nulldata> | https://lounge.nulldata.foo/uploads/09c26f3b256b3953/MG_1702935554738.jpg |
| 21:43:53 | | snivyy quits [Remote host closed the connection] |
| 21:50:28 | | systwi quits [Read error: Connection reset by peer] |
| 21:59:18 | | systwi (systwi) joins |
| 22:26:04 | <DogsRNice> | lol |
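
Note on the range-request idea discussed around 02:13 to 02:24: a rough sketch of what "sort the CDX by offset, then coalesce requests when the gap is small enough" might look like. This is only an illustration, not anything nicolas17 actually ran. The function names, the `(offset, length)` tuple shape, and the 1 MB default threshold (taken from the "1MB maybe?" guess in the log) are all assumptions; it also assumes the offsets and lengths have already been parsed out of the CDX and refer to compressed bytes in the .warc.gz.

```python
def coalesce_ranges(records, max_gap=1_000_000):
    """Merge (offset, length) pairs into (start, end_exclusive) spans.

    Records are sorted by offset first, since CDX order (by URL) does not
    match the order of records in the WARC. Two neighbouring records end up
    in the same span when the gap between them is at most max_gap bytes.
    """
    spans = []
    for offset, length in sorted(records):
        end = offset + length
        if spans and offset - spans[-1][1] <= max_gap:
            # Close enough to the previous span: extend it.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            # Gap too large: start a new span (a new HTTP request).
            spans.append((offset, end))
    return spans


def range_header(span):
    """Format a span as an HTTP Range header value (end is inclusive)."""
    start, end = span
    return f"bytes={start}-{end - 1}"


if __name__ == "__main__":
    # Hypothetical offsets/lengths, not taken from the real CDX.
    wanted = [(5_000_000, 10), (150, 50), (0, 100)]
    for span in coalesce_ranges(wanted, max_gap=1000):
        print(range_header(span))
```

Whether this is worth it depends on the gap distribution (see gaps.txt above): a larger `max_gap` means fewer requests but more unwanted bytes downloaded, and each span still has to be split back into individual records afterwards.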