| 00:09:56 | | LegitSi quits [Remote host closed the connection] |
| 00:39:44 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
| 00:41:35 | | pabs (pabs) joins |
| 00:51:39 | | Hackerpcs quits [Quit: Hackerpcs] |
| 00:53:48 | | Hackerpcs (Hackerpcs) joins |
| 01:02:51 | | yawkat quits [Ping timeout: 255 seconds] |
| 01:03:08 | | gazorpazorp quits [Read error: Connection reset by peer] |
| 01:03:14 | | gazorpazorp (gazorpazorp) joins |
| 01:06:52 | | tzt quits [Client Quit] |
| 01:07:06 | | tzt (tzt) joins |
| 01:11:54 | | yawkat (yawkat) joins |
| 02:10:32 | | lukash79 joins |
| 02:13:32 | <tech234a> | Regarding sweb there is a listing of some sites but I doubt it is complete http://rozcestnik.sweb.cz/ |
| 02:18:38 | <tech234a> | Also here is the last capture of the help site that isn't an error message http://web.archive.org/web/20220427152748/http://napoveda-sweb.sweb.cz/ |
| 02:19:05 | <tech234a> | Additionally apparently an older URL format was sweb.cz/username |
| 02:26:44 | | katocala is now authenticated as katocala |
| 03:03:08 | | march_happy quits [Ping timeout: 265 seconds] |
| 03:03:16 | | march_happy (march_happy) joins |
| 03:04:06 | | ThreeHM quits [Ping timeout: 265 seconds] |
| 03:04:30 | | ThreeHM (ThreeHeadedMonkey) joins |
| 03:09:45 | | march_happy quits [Ping timeout: 255 seconds] |
| 03:10:07 | | march_happy (march_happy) joins |
| 04:18:23 | <tech234a> | YT ended the 4k experiment but it could reappear in the future https://www.theverge.com/2022/10/17/23410072/youtube-4k-premium-feature-test-ends |
| 04:26:58 | <h2ibot> | Wickedplayer494 edited Dota 2 (+141, Dev forums are dead): https://wiki.archiveteam.org/?diff=49096&oldid=48715 |
| 04:50:59 | | michaelblob_ (michaelblob) joins |
| 04:54:36 | | michaelblob quits [Ping timeout: 255 seconds] |
| 05:12:45 | | pabs quits [Read error: Connection reset by peer] |
| 05:13:38 | | pabs (pabs) joins |
| 05:24:45 | | march_happy quits [Ping timeout: 255 seconds] |
| 05:25:03 | | march_happy (march_happy) joins |
| 05:27:06 | <h2ibot> | Wickedplayer494 edited SteamDB (+17, /* Vital signs */ xPaw has put some…): https://wiki.archiveteam.org/?diff=49097&oldid=28976 |
| 05:32:07 | <h2ibot> | Wickedplayer494 edited Template:Navigation box (-7, Reflecting Vkontakte page move in navbox): https://wiki.archiveteam.org/?diff=49098&oldid=48892 |
| 05:34:40 | | DLoader quits [Client Quit] |
| 05:36:08 | <h2ibot> | Wickedplayer494 edited Heroes of Newerth (+80, Website is dead too): https://wiki.archiveteam.org/?diff=49099&oldid=48714 |
| 07:46:51 | | BlueMaxima quits [Client Quit] |
| 07:59:24 | | michaelblob (michaelblob) joins |
| 08:03:17 | | michaelblob_ quits [Ping timeout: 265 seconds] |
| 08:13:13 | | DLoader joins |
| 08:23:34 | | DLoader quits [Client Quit] |
| 08:25:45 | | DLoader joins |
| 09:47:18 | | gazorpazorp quits [Remote host closed the connection] |
| 09:47:31 | | gazorpazorp (gazorpazorp) joins |
| 10:09:55 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 10:23:09 | | qwertyasdfuiopghjkl joins |
| 10:48:35 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 10:50:12 | | qwertyasdfuiopghjkl joins |
| 11:20:45 | | Chris5010 (Chris5010) joins |
| 11:24:07 | | mutantm0nkey quits [Remote host closed the connection] |
| 11:24:45 | | mutantm0nkey (mutantmonkey) joins |
| 11:25:20 | | LeGoupil joins |
| 11:30:08 | | tech_exorcist (tech_exorcist) joins |
| 11:39:53 | <betamax> | Has anyone attempted to reverse-engineer Issuu? There are a few tools for downloading the images, but the original text is still searchable through some delightfully encoded .bin files that are fetched from a "layers" server |
| 11:40:06 | <betamax> | e.g: look at this document https://issuu.com/filmhouse/docs/fhmar20_online_eecbde5e3f79e9 |
| 11:40:54 | <betamax> | when you go to another page in the document, it downloads a "page_<n>.bin" file |
| 11:41:02 | <betamax> | (here's an example: https://layers.isu.pub/19bf10fbc89b38330de576c9a5d332a8/200221154119/v2/page_4.bin ) |
| 11:41:38 | <betamax> | and when you click the "Find Text" button, it downloads *another* .bin file: https://layers.isu.pub/19bf10fbc89b38330de576c9a5d332a8/200221154119/text_v0/text_info.bin |
| 11:41:57 | <betamax> | now the text is visible in those files (once decompressed) if you inspect them |
| 11:42:08 | <betamax> | but what is not visible are the positions of the text on the document |
| 11:42:34 | <betamax> | e.g: if you use the "Find Text" feature of Issuu, it highlights the exact place in the document where the text occurs |
| 11:42:47 | <betamax> | so it must be storing the positional information of the text in that .bin file too |
| 11:43:08 | <betamax> | ... why couldn't they just allow PDF downloads for all their documents >:( |
| 11:56:19 | | Megame (Megame) joins |
| 12:11:09 | | omglolbah quits [Remote host closed the connection] |
| 12:45:07 | | Megame quits [Client Quit] |
| 12:50:23 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 12:55:13 | | march_happy quits [Ping timeout: 265 seconds] |
| 12:55:58 | | march_happy (march_happy) joins |
| 13:06:04 | | Arcorann quits [Ping timeout: 240 seconds] |
| 13:09:28 | <tech_exorcist> | hello, which items in the https://archive.org/details/archiveteam_youtube collection contain video comments? |
| 13:10:11 | <tech_exorcist> | i just noticed i'm in #down-the-tube too, so is that question more appropriate for that channel? |
| 13:10:14 | <@arkiver> | tech_exorcist: you can check the CDX files in the items and see if any comment records are included |
| 13:12:03 | <tech_exorcist> | there are 31k items in total, and i'm trying to avoid having to check all of them (even though i can do that if necessary) |
| 13:12:46 | <tech_exorcist> | for example, "curl -L -o - https://archive.org/download/archiveteam_youtube_20210720180401_7e76ed14/youtube_20210720180401_7e76ed14.megawarc.warc.os.cdx.gz | zcat | grep -i comment" returns no output |
| 13:13:16 | <@arkiver> | you don't know what comment URLs look like? |
| 13:13:24 | <tech_exorcist> | not really, sorry |
| 13:14:14 | <@arkiver> | it's the URLs like https://www.youtube.com/youtubei/v1/next?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8 |
| 13:14:24 | <tech_exorcist> | oh. |
| 13:14:53 | <tech_exorcist> | where is the video ID though? |
| 13:15:04 | <@arkiver> | yeah not in the URL |
| 13:15:15 | <tech_exorcist> | dammit |
| 13:15:25 | <@arkiver> | to serve pages of comments, youtube makes POST requests to that same endpoint |
| 13:15:40 | <@arkiver> | you'll have to download the actual records to check |
| 13:15:46 | <tech_exorcist> | every 50G file? |
| 13:15:58 | <@arkiver> | the comment WARC records at least |
| 13:16:06 | <@arkiver> | (can do a range request) |
| 13:17:12 | <tech_exorcist> | what's a range request? does it mean i can send a request to archive.org to scan through all warcs and see if they contain a specified string? |
| 13:18:24 | <tech_exorcist> | sorry for the dumb questions, i'm more familiar with the tinypic collection since i've looked for stuff in it a few times |
| 13:18:44 | <@arkiver> | i have some code here that gets a single record, finds the zstd dictionary as well and extracts it https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py |
| 13:18:48 | <@arkiver> | especially https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py#L18-L61 |
| 13:19:17 | <@arkiver> | i don't have time now to explain range requests and zstd, etc., though |
| 13:19:31 | <tech_exorcist> | sorry |
| 13:19:35 | <@arkiver> | no worries :) |
| 13:19:51 | <tech_exorcist> | oh, a range request in the "give me bytes x to y" sense, i know what that is |
| 13:20:08 | <@arkiver> | but yeah in short - can't know from the comment URL what video, need to download the records. |
| 13:21:04 | <tech_exorcist> | got it |
| 13:24:22 | <tech_exorcist> | so: download cdx files -> find all comment urls and the warcs they're in -> look through all those warcs until the desired video id is found (if it's there) |
| 13:24:34 | <tech_exorcist> | i can do that |
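The first two steps of that plan (filter a CDX index for the comment endpoint arkiver named, then fetch only the matching records with HTTP range requests) can be sketched in Python. This is a minimal sketch under assumptions: it assumes the CDX file's first line is a header naming its fields (IA indexes typically use ` CDX N b a m s k r M S V g`, where `a` is the original URL, `S` the compressed record length, `V` its offset and `g` the WARC filename). The sample lines, the `key=abc` value, the `ITEM` path segment and the filenames are hypothetical. It does not decompress anything: per arkiver, the records are zstd-compressed with a dictionary, which his urls.py linked above handles.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple
import urllib.request

# Endpoint arkiver identified for comment pages. The video ID is NOT in this
# URL (comment pages come from POST requests), so this only narrows the search.
COMMENT_ENDPOINT = "youtube.com/youtubei/v1/next"


def parse_cdx(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per CDX line, keyed by the field letters of the header.

    Assumes the first line is a header such as ' CDX N b a m s k r M S V g'.
    """
    it = iter(lines)
    header = next(it).split()
    if not header or header[0] != "CDX":
        raise ValueError("expected a CDX header line")
    fields = header[1:]
    for line in it:
        parts = line.split()
        if len(parts) == len(fields):  # skip malformed lines
            yield dict(zip(fields, parts))


def comment_records(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Group (offset, length) of comment-endpoint records by WARC filename."""
    by_warc: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for rec in parse_cdx(lines):
        if COMMENT_ENDPOINT in rec["a"]:  # 'a' = original URL
            by_warc[rec["g"]].append((int(rec["V"]), int(rec["S"])))
    return dict(by_warc)


def range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Build (not send) a request for bytes offset..offset+length-1 of a WARC."""
    end = offset + length - 1  # HTTP byte ranges are inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{end}"})


# Hypothetical sample index (key, digests, item and filename are made up).
sample = [
    " CDX N b a m s k r M S V g",
    "com,youtube)/youtubei/v1/next?key=abc 20210720180401 "
    "https://www.youtube.com/youtubei/v1/next?key=abc application/json 200 "
    "DIGESTA - - 5120 1048576 example.megawarc.warc.zst",
    "com,youtube)/watch?v=xyz 20210720180402 "
    "https://www.youtube.com/watch?v=xyz text/html 200 "
    "DIGESTB - - 9000 2000000 example.megawarc.warc.zst",
]

for warc, spans in comment_records(sample).items():
    for offset, length in spans:
        req = range_request(f"https://archive.org/download/ITEM/{warc}", offset, length)
        print(warc, req.get_header("Range"))
```

Grouping by filename just means each WARC is touched once; every fetched range would still need zstd decompression with the right dictionary before the response can be searched for the video ID.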
| 14:45:41 | | LeGoupil quits [Client Quit] |
| 14:45:52 | | LeGoupil joins |
| 14:57:32 | | qwertyasdfuiopghjkl joins |
| 15:23:21 | | fangfufu quits [Remote host closed the connection] |
| 15:23:29 | | fangfufu joins |
| 15:23:30 | | fangfufu is now authenticated as fangfufu |
| 15:30:00 | | Stiletto quits [Ping timeout: 255 seconds] |
| 15:56:37 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 15:57:00 | | qwertyasdfuiopghjkl joins |
| 16:17:32 | | katocala quits [Remote host closed the connection] |
| 16:27:25 | | Hackerpcs quits [Client Quit] |
| 16:28:01 | | Hackerpcs (Hackerpcs) joins |
| 16:40:39 | | march_happy quits [Ping timeout: 255 seconds] |
| 17:16:15 | | miana quits [Quit: Connection closed for inactivity] |
| 17:40:37 | | tech_exorcist quits [Remote host closed the connection] |
| 17:41:48 | | tech_exorcist (tech_exorcist) joins |
| 18:37:25 | | LegitSi joins |
| 18:43:42 | | dm4v quits [Ping timeout: 265 seconds] |
| 19:15:14 | | dm4v joins |
| 19:29:16 | | LeGoupil quits [Ping timeout: 240 seconds] |
| 19:30:17 | | tech_exorcist quits [Remote host closed the connection] |
| 19:30:32 | | tech_exorcist (tech_exorcist) joins |
| 20:02:54 | | Forstyhia joins |
| 20:08:08 | <Forstyhia> | Hi. I'm a random stranger here, so please pardon any faux pas etc. I make in how I present this. I was talking on Discord about something that came up with Photobucket and was asked to relay it to y'all. The gist of it is that Photobucket may be erasing all unpaid images soon. For a couple months, I've been getting nag emails from them to log into an account that was supposed to have been deleted, so I finally did. When I did, I discovered that they now insist everyone with an account have a paid plan, or else the account will be deleted. It doesn't specify the timeframe, but hints heavily that it will be soon; if you select to ignore the nag to pick a paid plan, it says that's only a temporary option. It also blocks most images in an account beyond a new limit (100? 200?), so even the owner can't access them from within the account until they've deleted some of the unblocked images. I unfortunately can't provide more information, because I decided to request account deletion after seeing that. (It's no loss to the internet; it was just personal family photos.) Essentially, I worry this means Photobucket images will soon be gone entirely, not just watermarked. My friends on Discord wanted me to relay that so the Archive Team could be aware, just in case you aren't already. |
| 20:15:18 | | DLoader quits [Ping timeout: 255 seconds] |
| 20:15:25 | | DLoader joins |
| 20:33:09 | | Forstyhia quits [Remote host closed the connection] |
| 20:40:08 | | DLoader_ joins |
| 20:41:38 | | DLoader quits [Ping timeout: 265 seconds] |
| 20:41:39 | | DLoader_ is now known as DLoader |
| 21:14:57 | | tech_exorcist quits [Client Quit] |
| 21:31:48 | | DLoader quits [Ping timeout: 255 seconds] |
| 21:34:09 | | DLoader joins |
| 21:39:48 | | CounterTurns joins |
| 21:45:07 | | DLoader_ joins |
| 21:46:53 | <CounterTurns> | I hope this is an okay place to ask this question, please lmk if not: I have a dead tindeck link (http://tindeck.com/listen/mynz) from a tumblr embed. I can find the listen page on wayback, but it's not clear to me how to access the archived mp3 itself. Any advice would be great |
| 21:47:14 | | DLoader__ joins |
| 21:47:16 | | DLoader quits [Ping timeout: 240 seconds] |
| 21:47:25 | | DLoader__ is now known as DLoader |
| 21:50:15 | | DLoader_ quits [Ping timeout: 255 seconds] |
| 21:50:18 | | march_happy (march_happy) joins |
| 21:51:13 | <@JAA> | CounterTurns: https://web.archive.org/web/20180729183128/http://tindeck.com/dl/mynz (Click on 'direct link' to get the MP3 immediately rather than waiting for the countdown.) |
| 21:56:14 | <CounterTurns> | Oh thanks, that makes sense. I have some other links with listen pages where that method doesn't work (e.g. http://tindeck.com/listen/cggq); should I assume those mp3s aren't archived? |
| 22:02:31 | <@JAA> | https://web.archive.org/web/20180731182421/http://tindeck.com/dl/cggq works fine for me. |
| 22:02:44 | <@JAA> | That page is linked on https://web.archive.org/web/20180731182420/http://tindeck.com/listen/cggq by the way ('Download Track' on the right). |
| 22:03:42 | <@JAA> | The project was in late July and early August 2018, so that's the time range you want to check in the WBM. |
| 22:05:11 | <CounterTurns> | Ah, okay I was looking at a November 2018 date on the WBM where I was getting the dead links |
| 22:06:00 | <CounterTurns> | But you're saying if I use that time range for the project there should be a date that has the file? (assuming it was still live on tindeck at that time) |
| 22:06:51 | <@JAA> | Yeah, probably. |
| 22:07:41 | <@JAA> | Look for the archiveteam_tindeck captures in particular. You can see the collection in the calendar view when you hover over a timestamp. It appears above the calendar. |
| 22:08:30 | <@JAA> | E.g. on https://web.archive.org/web/*/http://tindeck.com/listen/cggq 'Tue, 31 Jul 2018 18:24:20 GMT (why: archiveteam, archiveteam_tindeck)' |
| 22:08:49 | | lennier1 quits [Client Quit] |
| 22:09:13 | | lennier1 (lennier1) joins |
| 22:11:34 | <CounterTurns> | Thanks, that's very helpful! Really appreciate you taking the time to walk me through it |
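JAA's recipe (swap `/listen/` for `/dl/` and look in the late July to early August 2018 window) can also be run against the Wayback Machine's CDX API instead of the calendar view. A minimal sketch, assuming the public endpoint at web.archive.org/cdx/search/cdx and its `url`, `from`, `to` and `output` parameters; the `201807` to `201808` window is an assumption derived from JAA's description of when the project ran, and the helper names are hypothetical. The code only builds URLs. The CDX API's default fields do not include the 'why: archiveteam_tindeck' collection label, so the calendar hover JAA describes is still how to confirm which captures came from the project.

```python
from urllib.parse import urlencode, urlparse

WBM = "https://web.archive.org"


def dl_url(listen_url: str) -> str:
    """Map a tindeck listen page to its download page (/listen/ID -> /dl/ID)."""
    parts = urlparse(listen_url).path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "listen":
        raise ValueError(f"not a tindeck listen URL: {listen_url}")
    return f"http://tindeck.com/dl/{parts[1]}"


def cdx_query(url: str, start: str = "201807", end: str = "201808") -> str:
    """CDX API query listing captures of `url` within the assumed project window."""
    params = {"url": url, "from": start, "to": end, "output": "json"}
    return f"{WBM}/cdx/search/cdx?{urlencode(params)}"


def replay_url(url: str, timestamp: str) -> str:
    """Wayback replay URL for one capture timestamp (YYYYMMDDhhmmss)."""
    return f"{WBM}/web/{timestamp}/{url}"


dl = dl_url("http://tindeck.com/listen/cggq")
print(cdx_query(dl))
print(replay_url(dl, "20180731182421"))
```

With `output=json`, the first row of the result is a header; each following row carries a timestamp that can be passed to `replay_url`, after which the 'direct link' on the replayed page gives the MP3.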
| 22:39:00 | <lennier1> | G4TV is shutting down: https://g4tv.com/blog/g4update |
| 22:51:42 | | CounterTurns quits [Remote host closed the connection] |
| 22:56:46 | | Arcorann (Arcorann) joins |
| 23:05:07 | | BlueMaxima joins |
| 23:35:51 | | Stiletto joins |
| 23:56:47 | | BlueMaxima quits [Read error: Connection reset by peer] |