00:09:56LegitSi quits [Remote host closed the connection]
00:39:44pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
00:41:35pabs (pabs) joins
00:51:39Hackerpcs quits [Quit: Hackerpcs]
00:53:48Hackerpcs (Hackerpcs) joins
01:02:51yawkat quits [Ping timeout: 255 seconds]
01:03:08gazorpazorp quits [Read error: Connection reset by peer]
01:03:14gazorpazorp (gazorpazorp) joins
01:06:52tzt quits [Client Quit]
01:07:06tzt (tzt) joins
01:11:54yawkat (yawkat) joins
02:10:32lukash79 joins
02:13:32<tech234a>Regarding sweb there is a listing of some sites but I doubt it is complete http://rozcestnik.sweb.cz/
02:18:38<tech234a>Also here is the last capture of the help site that isn't an error message http://web.archive.org/web/20220427152748/http://napoveda-sweb.sweb.cz/
02:19:05<tech234a>Additionally apparently an older URL format was sweb.cz/username
03:03:08march_happy quits [Ping timeout: 265 seconds]
03:03:16march_happy (march_happy) joins
03:04:06ThreeHM quits [Ping timeout: 265 seconds]
03:04:30ThreeHM (ThreeHeadedMonkey) joins
03:09:45march_happy quits [Ping timeout: 255 seconds]
03:10:07march_happy (march_happy) joins
04:18:23<tech234a>YT ended the 4k experiment but it could reappear in the future https://www.theverge.com/2022/10/17/23410072/youtube-4k-premium-feature-test-ends
04:26:58<h2ibot>Wickedplayer494 edited Dota 2 (+141, Dev forums are dead): https://wiki.archiveteam.org/?diff=49096&oldid=48715
04:50:59michaelblob_ (michaelblob) joins
04:54:36michaelblob quits [Ping timeout: 255 seconds]
05:12:45pabs quits [Read error: Connection reset by peer]
05:13:38pabs (pabs) joins
05:24:45march_happy quits [Ping timeout: 255 seconds]
05:25:03march_happy (march_happy) joins
05:27:06<h2ibot>Wickedplayer494 edited SteamDB (+17, /* Vital signs */ xPaw has put some…): https://wiki.archiveteam.org/?diff=49097&oldid=28976
05:32:07<h2ibot>Wickedplayer494 edited Template:Navigation box (-7, Reflecting Vkontakte page move in navbox): https://wiki.archiveteam.org/?diff=49098&oldid=48892
05:34:40DLoader quits [Client Quit]
05:36:08<h2ibot>Wickedplayer494 edited Heroes of Newerth (+80, Website is dead too): https://wiki.archiveteam.org/?diff=49099&oldid=48714
07:46:51BlueMaxima quits [Client Quit]
07:59:24michaelblob (michaelblob) joins
08:03:17michaelblob_ quits [Ping timeout: 265 seconds]
08:13:13DLoader joins
08:23:34DLoader quits [Client Quit]
08:25:45DLoader joins
09:47:18gazorpazorp quits [Remote host closed the connection]
09:47:31gazorpazorp (gazorpazorp) joins
10:09:55qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
10:23:09qwertyasdfuiopghjkl joins
10:48:35qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
10:50:12qwertyasdfuiopghjkl joins
11:20:45Chris5010 (Chris5010) joins
11:24:07mutantm0nkey quits [Remote host closed the connection]
11:24:45mutantm0nkey (mutantmonkey) joins
11:25:20LeGoupil joins
11:30:08tech_exorcist (tech_exorcist) joins
11:39:53<betamax>Has anyone attempted to reverse-engineer Issuu? There are a few tools for downloading the images, but the original text is still searchable through some delightfully encoded .bin files that are fetched from a "layers" server
11:40:06<betamax>e.g: look at this document https://issuu.com/filmhouse/docs/fhmar20_online_eecbde5e3f79e9
11:40:54<betamax>when you go to another page in the document, it downloads a "page_<n>.bin" file
11:41:02<betamax>(here's an example: https://layers.isu.pub/19bf10fbc89b38330de576c9a5d332a8/200221154119/v2/page_4.bin )
11:41:38<betamax>and when you click the "Find Text" button, it downloads *another* .bin file: https://layers.isu.pub/19bf10fbc89b38330de576c9a5d332a8/200221154119/text_v0/text_info.bin
11:41:57<betamax>now the text is visible in those files (once decompressed) if you inspect them
11:42:08<betamax>but what is not visible are the positions of the text on the document
11:42:34<betamax>e.g: if you use the "Find Text" feature of Issuu, it highlights the exact place in the document where the text occurs
11:42:47<betamax>so it must be storing the positional information of the text in that .bin file too
11:43:08<betamax>... why couldn't they just allow PDF downloads for all their documents >:(
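A minimal sketch of poking at those layer files, based only on what's described above: the URL pattern is taken from the example links, and the assumption that the .bin payloads are deflate/zlib/gzip-compressed comes from the "once decompressed" remark (the exact format is not confirmed here).

```python
import urllib.request
import zlib


def layers_url(doc_hash, revision, page):
    """Build a layers.isu.pub page URL following the pattern in the
    example links above (doc_hash and revision come from the document's
    embed data; where exactly they live is an assumption)."""
    return f"https://layers.isu.pub/{doc_hash}/{revision}/v2/page_{page}.bin"


def decompress_bin(data):
    """Try raw deflate, zlib-wrapped deflate, then gzip, in that order.
    (Which of these Issuu actually uses is an assumption.)"""
    for wbits in (-zlib.MAX_WBITS, zlib.MAX_WBITS, zlib.MAX_WBITS | 16):
        try:
            return zlib.decompress(data, wbits)
        except zlib.error:
            pass
    return data  # possibly not compressed at all


def fetch_page(doc_hash, revision, page):
    """Fetch and decompress one page layer, e.g.
    fetch_page("19bf10fbc89b38330de576c9a5d332a8", "200221154119", 4)
    for the page_4.bin example linked above."""
    with urllib.request.urlopen(layers_url(doc_hash, revision, page)) as resp:
        return decompress_bin(resp.read())
```

The remaining hard part, as noted, is the encoding of the text positions inside the decompressed payload, which this sketch does not attempt to parse.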
11:56:19Megame (Megame) joins
12:11:09omglolbah quits [Remote host closed the connection]
12:45:07Megame quits [Client Quit]
12:50:23qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
12:55:13march_happy quits [Ping timeout: 265 seconds]
12:55:58march_happy (march_happy) joins
13:06:04Arcorann quits [Ping timeout: 240 seconds]
13:09:28<tech_exorcist>hello, which items in the https://archive.org/details/archiveteam_youtube collection contain video comments?
13:10:11<tech_exorcist>i just noticed i'm in #down-the-tube too, so is that question more appropriate for that channel?
13:10:14<@arkiver>tech_exorcist: you can check the CDX files in the items and see if any comment records are included
13:12:03<tech_exorcist>there are 31k items in total, and i'm trying to avoid having to check all of them (even though i can do that if necessary)
13:12:46<tech_exorcist>for example, "curl -L -o - https://archive.org/download/archiveteam_youtube_20210720180401_7e76ed14/youtube_20210720180401_7e76ed14.megawarc.warc.os.cdx.gz | zcat | grep -i comment" returns no output
13:13:16<@arkiver>you don't know what comment URLs look like?
13:13:24<tech_exorcist>not really, sorry
13:14:14<@arkiver>it's the URLs like https://www.youtube.com/youtubei/v1/next?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8
13:14:24<tech_exorcist>oh.
13:14:53<tech_exorcist>where is the video ID though?
13:15:04<@arkiver>yeah not in the URL
13:15:15<tech_exorcist>dammit
13:15:25<@arkiver>to serve pages of comments, youtube makes POST requests to that same endpoint
13:15:40<@arkiver>you'll have to download the actual records to check
13:15:46<tech_exorcist>every 50G file?
13:15:58<@arkiver>the comment WARC records at least
13:16:06<@arkiver>(can do a range request)
13:17:12<tech_exorcist>what's a range request? does it mean i can send a request to archive.org to scan through all warcs and see if they contain a specified string?
13:18:24<tech_exorcist>sorry for the dumb questions, i'm more familiar with the tinypic collection since i've looked for stuff in it a few times
13:18:44<@arkiver>i have some code here that gets a single record, finds the zstd dictionary as well and extracts it https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py
13:18:48<@arkiver>especially https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py#L18-L61
13:19:17<@arkiver>i don't have time now to explain range requests and zstd, etc., though
13:19:31<tech_exorcist>sorry
13:19:35<@arkiver>no worries :)
13:19:51<tech_exorcist>oh, a range request in the "give me bytes x to y" sense, i know what that is
13:20:08<@arkiver>but yeah in short - can't know from the comment URL what video, need to download the records.
13:21:04<tech_exorcist>got it
13:24:22<tech_exorcist>so: download cdx files -> find all comment urls and the warcs they're in -> look through all those warcs until the desired video id is found (if it's there)
13:24:34<tech_exorcist>i can do that
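The plan above can be sketched roughly as follows. Assumptions: the CDX files use the common 11-field "CDX N b a m s k r M S V g" layout (compressed record length, offset, and WARC filename in the last three fields), and the comment endpoint is the `youtubei/v1/next` URL arkiver mentioned; the item/WARC names in any real run come from the collection itself.

```python
import gzip
import urllib.request

# Comment pages are served from this endpoint (per the conversation above);
# the video ID is NOT in the URL, only in the record body.
COMMENT_URL = "youtube.com/youtubei/v1/next"


def comment_records(cdx_text):
    """Yield (offset, length, warc_filename) for comment-endpoint captures,
    assuming the 11-field 'CDX N b a m s k r M S V g' line layout."""
    for line in cdx_text.splitlines():
        fields = line.split()
        if len(fields) == 11 and COMMENT_URL in fields[2]:
            yield int(fields[9]), int(fields[8]), fields[10]


def fetch_record(item, warc_name, offset, length):
    """Range-request one gzipped WARC record from an archive.org item
    ('give me bytes x to y'), then gunzip it."""
    req = urllib.request.Request(
        f"https://archive.org/download/{item}/{warc_name}",
        headers={"Range": f"bytes={offset}-{offset + length - 1}"},
    )
    with urllib.request.urlopen(req) as resp:
        return gzip.decompress(resp.read())


def record_mentions_video(record_bytes, video_id):
    """Crude check: search the decompressed record body for the video ID."""
    return video_id.encode() in record_bytes
```

Per-record gzip members in these WARCs make the single-record range request plus `gzip.decompress` work; a dedicated WARC parser would be more robust for inspecting the record contents.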
14:45:41LeGoupil quits [Client Quit]
14:45:52LeGoupil joins
14:57:32qwertyasdfuiopghjkl joins
15:23:21fangfufu quits [Remote host closed the connection]
15:23:29fangfufu joins
15:30:00Stiletto quits [Ping timeout: 255 seconds]
15:56:37qwertyasdfuiopghjkl quits [Client Quit]
15:57:00qwertyasdfuiopghjkl joins
16:17:32katocala quits [Remote host closed the connection]
16:27:25Hackerpcs quits [Client Quit]
16:28:01Hackerpcs (Hackerpcs) joins
16:40:39march_happy quits [Ping timeout: 255 seconds]
17:16:15miana quits [Quit: Connection closed for inactivity]
17:40:37tech_exorcist quits [Remote host closed the connection]
17:41:48tech_exorcist (tech_exorcist) joins
18:37:25LegitSi joins
18:43:42dm4v quits [Ping timeout: 265 seconds]
19:15:14dm4v joins
19:29:16LeGoupil quits [Ping timeout: 240 seconds]
19:30:17tech_exorcist quits [Remote host closed the connection]
19:30:32tech_exorcist (tech_exorcist) joins
20:02:54Forstyhia joins
20:08:08<Forstyhia>Hi. I'm a random stranger here, so please pardon any faux pas etc. I make in how I present this. I was talking on Discord about something that came up with Photobucket and asked to relay it to y'all. The gist of it is that Photobucket may be erasing all unpaid images soon. For a couple months, I've been getting nag emails from them to log into an
20:08:09<Forstyhia>account that was supposed to have been deleted, so I finally did. When I did, I discovered that they now insist everyone with an account have a paid plan, or else the account will be deleted. It doesn't specify the timeframe, but hints heavily that it will be soon; if you select to ignore the nag to pick a paid plan, it says that's only a temporary
20:08:09<Forstyhia>option. It also has most images over a new limit (100? 200?) in an account blocked, where even the owner can't access them from within the account until they've deleted some of the unblocked images. I unfortunately can't provide more information, because I decided to request account deletion after seeing that. (It's no loss to the internet, it was
20:08:10<Forstyhia>just personal family photos.) And essentially I worry that means soon Photobucket images will be gone entirely, not just watermarked. And my friends on Discord wanted me to relay that so the archive team could be aware, just in case you aren't already.
20:15:18DLoader quits [Ping timeout: 255 seconds]
20:15:25DLoader joins
20:33:09Forstyhia quits [Remote host closed the connection]
20:40:08DLoader_ joins
20:41:38DLoader quits [Ping timeout: 265 seconds]
20:41:39DLoader_ is now known as DLoader
21:14:57tech_exorcist quits [Client Quit]
21:31:48DLoader quits [Ping timeout: 255 seconds]
21:34:09DLoader joins
21:39:48CounterTurns joins
21:45:07DLoader_ joins
21:46:53<CounterTurns>I hope this is an okay place to ask this question, please lmk if not: I have a dead tindeck link (http://tindeck.com/listen/mynz) from a tumblr embed. I can find the listen page on wayback, but it's not clear to me how to access the archived mp3 itself. Any advice would be great
21:47:14DLoader__ joins
21:47:16DLoader quits [Ping timeout: 240 seconds]
21:47:25DLoader__ is now known as DLoader
21:50:15DLoader_ quits [Ping timeout: 255 seconds]
21:50:18march_happy (march_happy) joins
21:51:13<@JAA>CounterTurns: https://web.archive.org/web/20180729183128/http://tindeck.com/dl/mynz (Click on 'direct link' to get the MP3 immediately rather than waiting for the countdown.)
21:56:14<CounterTurns>Oh thanks, that makes sense. I have some other links with listen pages where that method doesn't work (e.g. http://tindeck.com/listen/cggq); should I assume those mp3s aren't archived?
22:02:31<@JAA>https://web.archive.org/web/20180731182421/http://tindeck.com/dl/cggq works fine for me.
22:02:44<@JAA>That page is linked on https://web.archive.org/web/20180731182420/http://tindeck.com/listen/cggq by the way ('Download Track' on the right).
22:03:42<@JAA>The project was in late July and early August 2018, so that's the time range you want to check in the WBM.
22:05:11<CounterTurns>Ah, okay I was looking at a November 2018 date on the WBM where I was getting the dead links
22:06:00<CounterTurns>But you're saying if I use that time range for the project there should be a date that has the file? (assuming it was still live on tindeck at that time)
22:06:51<@JAA>Yeah, probably.
22:07:41<@JAA>Look for the archiveteam_tindeck captures in particular. You can see the collection in the calendar view when you hover over a timestamp. It appears above the calendar.
22:08:30<@JAA>E.g. on https://web.archive.org/web/*/http://tindeck.com/listen/cggq 'Tue, 31 Jul 2018 18:24:20 GMT (why: archiveteam, archiveteam_tindeck)'
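The capture lookup JAA walks through can be scripted against the Wayback Machine's CDX API, restricted to the late-July/early-August 2018 project window mentioned above. A sketch (the query parameters are the CDX API's; the default date window and the `/dl/` URL shape are taken from this conversation):

```python
import json
import urllib.parse
import urllib.request


def capture_query(track_id, frm="20180720", to="20180815"):
    """Build a CDX API query for a tindeck download page, limited to the
    archiveteam_tindeck project window (late July / early August 2018)."""
    params = urllib.parse.urlencode({
        "url": f"http://tindeck.com/dl/{track_id}",
        "from": frm,
        "to": to,
        "output": "json",
    })
    return f"https://web.archive.org/cdx/search/cdx?{params}"


def capture_urls(rows):
    """Turn parsed CDX JSON rows (header row first) into playback URLs."""
    return [f"https://web.archive.org/web/{r[1]}/{r[2]}" for r in rows[1:]]


def tindeck_captures(track_id):
    """List WBM playback URLs for one track, e.g. tindeck_captures('cggq')."""
    with urllib.request.urlopen(capture_query(track_id)) as resp:
        return capture_urls(json.load(resp))
```

Each returned URL is the equivalent of the `/web/<timestamp>/` links above; the 'direct link' trick for skipping the countdown still applies once the page loads.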
22:08:49lennier1 quits [Client Quit]
22:09:13lennier1 (lennier1) joins
22:11:34<CounterTurns>Thanks, that's very helpful! Really appreciate you taking the time to walk me through it
22:39:00<lennier1>G4TV is shutting down: https://g4tv.com/blog/g4update
22:51:42CounterTurns quits [Remote host closed the connection]
22:56:46Arcorann (Arcorann) joins
23:05:07BlueMaxima joins
23:35:51Stiletto joins
23:56:47BlueMaxima quits [Read error: Connection reset by peer]