| 00:00:25 | | jtagcat quits [Client Quit] |
| 00:00:34 | <nicolas17> | we need evidence that it's good, not "I can't think why it would be bad" |
| 00:00:42 | | jtagcat (jtagcat) joins |
| 00:01:07 | <@JAA> | Yep, ideally a comprehensive test suite we can also run continuously in the future on building. |
| 00:01:29 | <@JAA> | But no such test suite for WARCs exists in general, and it's a lot of work. |
| 00:01:57 | <appledash> | as someone who tried to write a crawler that outputs WARC... I decided WARC is just garbage and I wrote my own format |
| 00:02:12 | <fireonlive> | 🤨 |
| 00:02:19 | <fireonlive> | tell us more about this format of yours |
| 00:02:27 | <@JAA> | WARC isn't great, but it's the least terrible format out there. ARC is far worse. |
| 00:02:46 | <@JAA> | The WARC spec has a number of issues though, and implementing it is tricky to get right. |
| 00:03:08 | <appledash> | My format is just a folder full of timestamped uncompressed HTTP request and response payloads, with folders named based on the request URL/path |
| 00:03:16 | <appledash> | It gets the job done for what I need |
| 00:05:01 | | fireonlive blunks |
| 00:05:39 | <fireonlive> | what do you use it for mainly? |
| 00:06:46 | <appledash> | Web scraping, saving contents of web sites as I go that I might want to process later... The main idea is say, I have an image web site I want to download, I write a script which saves the images all in a directory, but I also output that raw data in case later on I find out there was some vital information in every HTML page that I forgot to download (say a description of |
| 00:06:46 | <appledash> | the image, or the author, or something) |
| 00:07:00 | <appledash> | then I can go back and process those files and match them up with the other data I downloaded to augment it |
| 00:07:01 | <fireonlive> | ahh |
| 00:08:56 | <@JAA> | Personal mirrors are fair game for anything. Use wget with link rewriting for all I care. :-P |
| 00:09:24 | <@JAA> | For proper archival, that'd be missing some metadata. It also doesn't scale, and repeated retrievals of the same URL get fun. |
| 00:09:44 | <appledash> | What'd it be missing? |
| 00:10:33 | <nicolas17> | hm |
| 00:10:41 | <nicolas17> | JAA: I just thought of a tool that would be handy to have |
| 00:10:49 | <nicolas17> | dedup a WARC after the fact |
| 00:11:09 | | wickedplayer494 quits [Ping timeout: 265 seconds] |
| 00:11:18 | <@JAA> | HTTP headers, IP, transfer encoding (although that one's debatable) come to mind. |
| 00:11:51 | <@JAA> | nicolas17: Yes, that was a key design part of the thing I've been working on. |
| 00:11:52 | <appledash> | well the request/response data include http headers :p |
| 00:12:01 | <appledash> | everything after the TCP socket |
| 00:12:08 | <@JAA> | Ah, ok. 'Payload' means something specific in HTTP. :-) |
| 00:12:52 | <nicolas17> | afaik qwarc does deduplication between different URLs in one archival task, but if I rerun it next month, it won't deduplicate files that didn't change vs previous archival |
| 00:13:01 | <@JAA> | Actually, RFC 9110 deprecated the word, I guess. But it was the body without encoding prior to that. |
| 00:13:19 | <nicolas17> | archivebot doesn't dedup anything I think? |
| 00:13:24 | <appledash> | It's also moderately annoying that every tool that generates warc files seems to be absurdly complicated for no reason |
| 00:13:40 | <appledash> | I have been a software engineer and sysadmin for 12 years and I still feel like I need a PhD to understand most of these |
| 00:14:15 | <@JAA> | nicolas17: Both correct. In fact, qwarc only dedupes within a single process. When you spread a single archival across multiple processes or restart the process to fix the memory 'leak' (fragmentation), that also leads to duplication. |
| 00:14:35 | <nicolas17> | and I think wget dedups across time, but only if they have the same URL |
| 00:15:14 | <@JAA> | appledash: warcio's interface is reasonable, but unfortunately warcio itself sucks. warcprox would allow you to use whatever HTTP client you want via MITM proxying, which is neat. |
| 00:15:40 | <@JAA> | nicolas17: Correct, and you can also write and load CDXs. wget-at supports URL-agnostic dedupe. |
| 00:15:45 | <appledash> | I remember having some issue with warcprox |
| 00:15:50 | <appledash> | that was my first try I think |
| 00:15:54 | | wickedplayer494 joins |
| 00:16:07 | <@JAA> | I'm not terribly surprised. I have no experience with it myself. |
| 00:16:16 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 00:16:28 | | Arcorann (Arcorann) joins |
| 00:16:33 | <nicolas17> | yeah so it would be nice to have a tool that can run afterwards to replace warc records with dedup pointers |
| 00:16:54 | <@JAA> | Yeah, soon™. :-P |
| 00:17:28 | | Hackerpcs quits [Quit: Hackerpcs] |
| 00:18:35 | <nicolas17> | do you know anything about the HAR format? |
| 00:19:23 | | Hackerpcs (Hackerpcs) joins |
| 00:21:34 | <nicolas17> | browser dev tools can export requests to HAR and it *might* be complete enough to be convertible to WARC but I'm not sure yet |
| 00:25:11 | | Dango360_ quits [Client Quit] |
| 00:25:20 | | Dango360 (Dango360) joins |
| 00:30:05 | <@JAA> | It isn't. |
| 00:30:23 | <@JAA> | It doesn't preserve the headers verbatim, and it doesn't preserve transfer encoding. |
| 00:30:26 | <TheTechRobo> | appledash: do you know what issue you were having? |
| 00:30:29 | | tzt quits [Ping timeout: 265 seconds] |
| 00:30:43 | <fireonlive> | i was using archivebox then JAA went and rained on my parade |
| 00:30:46 | <fireonlive> | 😢 |
| 00:30:55 | <fireonlive> | ☔️ |
| 00:31:03 | <TheTechRobo> | pywarc when |
| 00:31:37 | <fireonlive> | (rightly) |
| 00:31:50 | | tzt (tzt) joins |
| 00:32:59 | <appledash> | I do not remember :( |
| 00:33:01 | <appledash> | It was a while ago |
| 00:39:06 | | Dango360 quits [Read error: Connection reset by peer] |
| 00:41:26 | | tzt quits [Read error: Connection reset by peer] |
| 00:42:41 | | tzt (tzt) joins |
| 00:49:26 | | etnguyen03 quits [Ping timeout: 252 seconds] |
| 00:54:48 | | Dango360 (Dango360) joins |
| 00:55:32 | | etnguyen03 (etnguyen03) joins |
| 01:23:39 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 01:24:19 | | AmAnd0A joins |
| 01:29:25 | | BlueMaxima quits [Client Quit] |
| 01:33:36 | | dumbgoy_ joins |
| 01:36:33 | | Krume quits [Read error: Connection reset by peer] |
| 01:36:42 | | dumbgoy quits [Ping timeout: 265 seconds] |
| 01:41:45 | | Krume (Krume) joins |
| 01:45:55 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 01:46:10 | | AmAnd0A joins |
| 01:54:03 | | bladem quits [Remote host closed the connection] |
| 02:02:48 | | petrichor quits [Client Quit] |
| 02:03:17 | | petrichor (petrichor) joins |
| 02:21:17 | | etnguyen03 quits [Ping timeout: 252 seconds] |
| 02:22:37 | | lennier1 quits [Ping timeout: 265 seconds] |
| 02:23:05 | <h2ibot> | PaulWise edited Mailman2 (+806, add more mailman2 instances, corpit.ru done): https://wiki.archiveteam.org/?diff=50587&oldid=50487 |
| 02:23:18 | | etnguyen03 (etnguyen03) joins |
| 02:24:45 | | lennier1 (lennier1) joins |
| 02:27:06 | <h2ibot> | PaulWise edited Bugzilla (+53, add more bugzilla instances): https://wiki.archiveteam.org/?diff=50588&oldid=50488 |
| 02:37:53 | | wyatt8740 quits [Remote host closed the connection] |
| 02:45:25 | | etnguyen03 quits [Client Quit] |
| 02:48:28 | | wyatt8740 joins |
| 02:51:42 | | parfait (kdqep) joins |
| 02:53:11 | | wyatt8740 quits [Ping timeout: 252 seconds] |
| 02:54:24 | | wyatt8740 joins |
| 02:57:10 | <h2ibot> | PaulWise edited Mailman2 (+67, started some jobs, one instance already gone): https://wiki.archiveteam.org/?diff=50589&oldid=50587 |
| 03:03:13 | | wyatt8740 quits [Ping timeout: 265 seconds] |
| 03:07:30 | | wyatt8740 joins |
| 03:14:05 | | wyatt8740 quits [Ping timeout: 252 seconds] |
| 03:14:23 | | wyatt8740 joins |
| 03:37:28 | | Pichu0102 quits [Remote host closed the connection] |
| 03:54:51 | | krvme joins |
| 03:57:32 | | Krume quits [Ping timeout: 252 seconds] |
| 04:42:34 | | DogsRNice quits [Read error: Connection reset by peer] |
| 04:48:49 | | krvme is now known as Krume |
| 04:49:07 | | Krume is now authenticated as Krume |
| 04:51:55 | | erkinalp joins |
| 05:02:31 | | Unholy2361316618085 (Unholy2361) joins |
| 05:06:17 | | Unholy236131661808 quits [Ping timeout: 252 seconds] |
| 05:06:17 | | Unholy2361316618085 is now known as Unholy236131661808 |
| 05:21:06 | | hitgrr8 joins |
| 05:37:40 | | treora quits [Remote host closed the connection] |
| 05:37:41 | | treora joins |
| 06:16:16 | | sec^nd quits [Ping timeout: 245 seconds] |
| 06:16:26 | | albertlarsan68 quits [Quit: The Lounge - https://thelounge.chat] |
| 06:17:39 | | albertlarsan68 (AlbertLarsan68) joins |
| 06:47:29 | | dumbgoy_ quits [Ping timeout: 265 seconds] |
| 06:54:05 | | ]SaRgE[ quits [Ping timeout: 252 seconds] |
| 07:05:02 | | Unholy236131661808 quits [Remote host closed the connection] |
| 07:05:38 | | nulldata quits [Ping timeout: 252 seconds] |
| 07:06:51 | | Unholy2361316618085 (Unholy2361) joins |
| 07:08:46 | | nulldata (nulldata) joins |
| 07:37:32 | | nulldata quits [Ping timeout: 252 seconds] |
| 07:39:44 | <erkinalp> | about 5.5 days and wowturkey archival still going full blast |
| 07:40:57 | | nulldata (nulldata) joins |
| 07:49:42 | | etnguyen03 (etnguyen03) joins |
| 08:01:16 | | Megame quits [Client Quit] |
| 08:05:47 | | nulldata quits [Ping timeout: 265 seconds] |
| 08:08:46 | | nulldata (nulldata) joins |
| 08:11:32 | | Naruyoko5 quits [Remote host closed the connection] |
| 08:11:54 | | Naruyoko5 joins |
| 08:14:02 | <erkinalp> | in wowturkey archivals, you might have seen "DNS resolution failed: [Errno -2] Name or service not known http://www.reklam_link.com/d/news/433509.jpg" |
| 08:14:09 | <erkinalp> | those are the links wowturkey censors |
| 08:14:27 | <erkinalp> | they replace censored hostnames by reklam_link |
| 08:28:54 | | treora quits [Remote host closed the connection] |
| 08:28:55 | | treora joins |
| 08:34:19 | | etnguyen03 quits [Client Quit] |
| 08:35:49 | | Island quits [Read error: Connection reset by peer] |
| 09:21:06 | | Icyelut (Icyelut) joins |
| 09:30:38 | | treora quits [Remote host closed the connection] |
| 09:30:39 | | treora joins |
| 09:53:55 | | railen63 quits [Remote host closed the connection] |
| 09:57:48 | | railen63 joins |
| 09:59:22 | | parfait quits [Ping timeout: 265 seconds] |
| 10:00:01 | | railen63 quits [Remote host closed the connection] |
| 10:00:16 | | railen63 joins |
| 10:05:19 | | miki_57 joins |
| 10:07:03 | | Earendil7 quits [Client Quit] |
| 10:07:37 | | Earendil7 (Earendil7) joins |
| 10:13:00 | | Exorcism (exorcism) joins |
| 10:18:39 | | Exorcism quits [Client Quit] |
| 10:18:51 | | Exorcism (exorcism) joins |
| 10:21:26 | | Exorcism quits [Client Quit] |
| 10:21:42 | | Exorcism (exorcism) joins |
| 10:22:48 | | Exorcism quits [Client Quit] |
| 10:23:13 | | Exorcism (exorcism) joins |
| 10:28:51 | | bladem (bladem) joins |
| 10:34:01 | | Exorcism quits [Client Quit] |
| 10:39:10 | | Exorcism (exorcism) joins |
| 11:08:10 | | Icyelut|2 (Icyelut) joins |
| 11:12:21 | | Icyelut quits [Ping timeout: 265 seconds] |
| 11:33:52 | | Earendil7 quits [Client Quit] |
| 11:35:18 | | Earendil7 (Earendil7) joins |
| 11:39:28 | | erkinalp quits [Remote host closed the connection] |
| 11:40:46 | | erkinalp joins |
| 12:22:35 | | erkinalp quits [Remote host closed the connection] |
| 12:51:22 | | khaosfox joins |
| 12:54:55 | <khaosfox> | Okay, hi. On the chance that I am right here. I have around 20 or 40TB of archived YouTube channels I'd like to put up on IA, however the videos are sorted in subfolders for the playlist names and I'd like to keep it that way when uploading. I know the web uploader supports folder creation, but I want to use the cli on a headless server, and I can't find any way to do this with the ia cli |
| 12:55:01 | <khaosfox> | utility. And putting hundreds of video files all in one root is extremely stupid. Is there any way to achieve this outcome? |
| 13:01:31 | | erkinalp joins |
| 13:03:08 | <qyxojzh|m> | No way to cd your way through it? |
| 13:03:20 | <qyxojzh|m> | (Never used the IA CLI, sorry) |
| 13:08:42 | <khaosfox> | I specifically need a cli solution. On the chance that I overlooked something or there is another tool or script I'm asking. |
| 13:08:59 | <qyxojzh|m> | <qyxojzh|m> "No way to cd your way through..." <- Is this not possible? |
| 13:09:16 | <qyxojzh|m> | i.e. navigate to or create different directories and then upload to those |
| 13:09:56 | <khaosfox> | I see no way to do this with the ia cli utility. |
| 13:10:08 | <qyxojzh|m> | How odd |
| 13:10:27 | <khaosfox> | It lets me specify an identifier and that's it |
| 13:10:46 | <qyxojzh|m> | Annoying ngl |
| 13:11:19 | <qyxojzh|m> | Would be handy to “directorize” the archive or at least allow uploading directorized archives to it |
| 13:12:03 | <khaosfox> | Yes, but I hope anyone here has any good idea how to maybe handle this |
| 13:13:11 | | erkinalp quits [Ping timeout: 265 seconds] |
| 13:19:44 | | erkinalp joins |
| 13:20:56 | <TheTechRobo> | you can do directories |
| 13:21:09 | <TheTechRobo> | say you have a folder named `a`, then you put a file in it |
| 13:21:38 | <TheTechRobo> | you can do `ia upload <IDENTIFIER> a` and it will upload all files and subdirectories in `a` to the item |
| 13:21:53 | <TheTechRobo> | be sure you're not using a trailing slash, or it will upload everything to the root! |
| 13:21:56 | <qyxojzh|m> | Perfect! |
| 13:22:13 | <qyxojzh|m> | TheTechRobo: How? |
| 13:22:43 | <qyxojzh|m> | Oh so |
| 13:22:43 | <qyxojzh|m> | `a/b` uploads folder `b` |
| 13:22:43 | <qyxojzh|m> | `a/b/` uploads contents of `b` without the folder |
| 13:23:08 | <qyxojzh|m> | * Oh so |
| 13:23:08 | <qyxojzh|m> | `a/b` uploads folder `b` and therefore its contents |
| 13:23:08 | <qyxojzh|m> | `a/b/` uploads contents of `b` without the folder |
| 13:24:42 | <TheTechRobo> | yes |
| 13:25:12 | <TheTechRobo> | don't ask me why |
| 13:26:15 | <qyxojzh|m> | TheTechRobo: Nah it makes sense tbh |
| 13:26:32 | <qyxojzh|m> | `a/b` = target `b` |
| 13:26:32 | <qyxojzh|m> | `a/b/` = target `b/*` |
| 13:27:03 | <qyxojzh|m> | * target `b/*` (but not `b`) |
| 13:27:07 | | dumbgoy_ joins |
| 13:27:29 | <HP_Archivist> | Why is this site excluded from WBM? https://www.11alive.com/article/news/special-reports/ga-trump-investigation/donald-trump-mug-shot-when-it-will-be-released/85-38d22a92-057c-461d-951e-4331f74b8c4d |
| 13:31:17 | <@kaz> | 403 on that from uk |
| 13:33:23 | <HP_Archivist> | kaz: Works on my end |
| 13:33:30 | <@kaz> | are you in the uk |
| 13:33:33 | <HP_Archivist> | No, US |
| 13:33:41 | <@kaz> | ok then |
| 13:33:50 | <TheTechRobo> | Works here from Canada |
| 13:34:03 | <HP_Archivist> | WBM excludes the site though, for some reason |
| 13:34:19 | <TheTechRobo> | I've seen them exclude certain patterns |
| 13:34:31 | <qyxojzh|m> | EU legislation issues, methinks |
| 13:34:36 | <qyxojzh|m> | Try VPNing? |
| 13:37:38 | <erkinalp> | if you see anything saying "reklam_link" in wowturkey archives, those are censored links |
| 13:38:04 | <erkinalp> | wowturkey censors links to certain sites by replacing their hostname by "reklam_link" |
| 13:38:42 | <qyxojzh|m> | Reklam = ad |
| 13:38:48 | <qyxojzh|m> | from French réclame |
| 13:39:12 | <qyxojzh|m> | is that right? |
| 13:40:35 | <erkinalp> | yes |
| 13:40:46 | <erkinalp> | ad_link :) |
| 13:40:49 | <qyxojzh|m> | So yeah, makes sense |
| 13:41:21 | <erkinalp> | status, t.me and a few more are amongst the censored ones |
| 13:42:03 | <qyxojzh|m> | Makes sense tbh |
| 13:42:14 | <qyxojzh|m> | At the same time it opens up some questionable stuff |
| 13:43:42 | <erkinalp> | hmm, what if i open a website called reklam_link.com * |
| 13:43:58 | <erkinalp> | i'd make tons of ad revenue tbh |
| 13:44:09 | <erkinalp> | and it's not only me who thought of doing this |
| 13:45:01 | <qyxojzh|m> | Might be taken, doğru mu? |
| 13:45:55 | <erkinalp> | no |
| 13:46:00 | <nstrom|m> | Underscore isn't valid in a domain name |
| 13:46:07 | <erkinalp> | no such domain registered |
| 13:46:11 | <qyxojzh|m> | Ah so the underscore is the key |
| 13:46:12 | <erkinalp> | ah |
| 13:47:04 | <qyxojzh|m> | No way to register it either |
| 13:47:29 | <erkinalp> | hmm |
| 13:47:31 | <qyxojzh|m> | Ne yazık |
| 13:47:51 | <qyxojzh|m> | * Ne yazık (= what a pity) |
| 13:47:53 | <erkinalp> | register reklam-link.com and rewrite all reklam_link.com to reklam-link.com client side |
| 13:47:55 | <erkinalp> | :joy: |
| 13:48:18 | <qyxojzh|m> | maybe MITM /j |
| 13:58:10 | | sec^nd (second) joins |
| 14:09:09 | | project10 quits [Remote host closed the connection] |
| 14:10:25 | | project10 (project10) joins |
| 14:12:06 | | project10 quits [Remote host closed the connection] |
| 14:13:16 | | project10 (project10) joins |
| 14:15:45 | | project10 quits [Remote host closed the connection] |
| 14:16:37 | | project10 (project10) joins |
| 14:19:53 | | project10 quits [Remote host closed the connection] |
| 14:20:22 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 14:20:50 | | project10 (project10) joins |
| 14:23:37 | | project10 quits [Remote host closed the connection] |
| 14:24:38 | | project10 (project10) joins |
| 14:25:38 | | Arcorann quits [Ping timeout: 252 seconds] |
| 14:28:58 | <@JAA> | transfer is dead due to an incident at Scaleway. |
| 14:32:22 | | toss (toss) joins |
| 14:40:52 | | ymgve quits [Quit: Leaving] |
| 14:49:38 | | klg quits [Quit: brb] |
| 14:51:52 | | klg (klg) joins |
| 14:55:39 | | Exorcism quits [Client Quit] |
| 14:56:44 | | Exorcism (exorcism) joins |
| 14:57:12 | | AmAnd0A joins |
| 15:06:07 | | Exorcism quits [Client Quit] |
| 15:09:44 | <fireonlive> | JAA: hoping it’s not a SBG2 :/ |
| 15:11:01 | | Exorcism (exorcism) joins |
| 15:12:59 | <@HCross> | it's a "our blade chassis is dead" I think |
| 15:14:05 | <fireonlive> | ahj |
| 15:14:11 | <fireonlive> | ahh* |
| 15:14:23 | <qyxojzh|m> | Currently working out if I may invite my darling Aroy, she made an archival tool I think would be greatly useful here |
| 15:20:15 | <fireonlive> | i read that as tracker at first and was much more concerned |
| 15:20:28 | <fireonlive> | 😅 |
| 15:24:29 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 15:24:32 | | AmAnd0A joins |
| 15:33:50 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 15:34:23 | | AmAnd0A joins |
| 15:35:38 | | project10 quits [Remote host closed the connection] |
| 15:36:28 | | project10 (project10) joins |
| 15:59:18 | | Barto quits [Quit: WeeChat 4.0.3] |
| 16:01:03 | | Barto (Barto) joins |
| 16:07:56 | | toss quits [Ping timeout: 252 seconds] |
| 16:08:14 | | miki_57 quits [Client Quit] |
| 16:09:50 | | katocala quits [Remote host closed the connection] |
| 16:10:55 | <HP_Archivist> | What is the *actual* url for this image? https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fdonald-trumps-mugshot-v0-td10trboe5kb1.png%3Fauto%3Dwebp%26s%3D60b1dd8fc794db49169cd0c892f09277ab58faaa |
| 16:11:05 | <HP_Archivist> | WBM doesn't like this link |
| 16:11:24 | <khaosfox> | Okay, that works. Thanks! |
| 16:13:37 | <HP_Archivist> | Thought it was this but leads to an error https://preview.redd.it/donald-trumps-mugshot-v0-td10trboe5kb1.png |
| 16:14:59 | <HP_Archivist> | Hm. Looks like it's this, I guess https://preview.redd.it/donald-trumps-mugshot-v0-td10trboe5kb1.png?auto=webp&s=60b1dd8fc794db49169cd0c892f09277ab58faaa |
| 16:18:30 | <HP_Archivist> | Odd. Still redirects to the scrambled URI https://web.archive.org/web/20230826160557/https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fdonald-trumps-mugshot-v0-td10trboe5kb1.png%3Fauto%3Dwebp%26s%3D60b1dd8fc794db49169cd0c892f09277ab58faaa&rdt=56478 |
| 16:20:55 | <HP_Archivist> | qwertyasdfuiopghjkl: Mind taking a look at this? ^ |
| 16:24:08 | <qwertyasdfuiopghjkl> | HP_Archivist: https://i.redd.it/td10trboe5kb1.png (found with https://addons.mozilla.org/en-US/firefox/addon/image-max-url/ ) |
| 16:24:55 | <@JAA> | fireonlive: A lot of stuff depends on transfer since that's where the zstd dicts are stored, so eventually it would still stall everything. |
| 16:25:21 | <fireonlive> | indeed |
| 16:25:47 | <@JAA> | HP_Archivist: Yeah, the i.redd.it URL is it, but if you just access that directly, you won't get the image. They started doing that bullshit quite recently, like in the last few months. |
| 16:27:09 | <HP_Archivist> | Thanks qwertyasdfuiopghjkl - It still redirects in the browser. And JAA, yeah, I've never had a problem capturing Reddit images from posts before now. What nonsense. |
| 16:27:22 | | toss (toss) joins |
| 16:27:46 | <fireonlive> | last i checked curl on i.reddit got the full image but what a pain |
| 16:28:32 | | toss_ (toss) joins |
| 16:28:37 | <HP_Archivist> | Still not showing in WBM https://web.archive.org/web/20230826162738/https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Ftd10trboe5kb1.png&rdt=49036 |
| 16:29:13 | <HP_Archivist> | What's odd is that when I crawled this actual post last night in SPN, it captured the page but not the image (which is kinda the point of the crawl) |
| 16:31:50 | <HP_Archivist> | Archive.is captured the page and image just fine though |
| 16:32:41 | | toss quits [Ping timeout: 252 seconds] |
| 16:34:54 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 16:35:11 | | AmAnd0A joins |
| 16:40:10 | | treora quits [Remote host closed the connection] |
| 16:40:11 | | treora joins |
| 16:40:26 | <qwertyasdfuiopghjkl> | Maybe you can try saving a (different) page that embeds it as an image, but idk if that would work |
| 16:40:44 | <khaosfox> | quit |
| 16:41:04 | <khaosfox> | sorry wrong terminal window |
| 16:47:29 | | katocala joins |
| 16:47:53 | | katocala is now authenticated as katocala |
| 16:49:31 | | Exorcism quits [Client Quit] |
| 16:50:36 | | Exorcism (exorcism) joins |
| 16:51:43 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 16:55:49 | | Matthww11 quits [Client Quit] |
| 16:56:41 | | AmAnd0A joins |
| 17:00:42 | | Rynav joins |
| 17:00:57 | | Matthww11 joins |
| 17:02:01 | <Rynav> | Hi, would it be against the TOS or perhaps the law to download and host some or maybe all picosong files? |
| 17:04:00 | <Rynav> | Currently working on an app that allows users to filter thru picosong entries, get details, preview and download the file. But downloading and previewing from archive.org itself is extremely slow and sometimes doesn't work at all. |
| 17:04:15 | <Rynav> | Thinking of downloading some files and hosting them on my server |
| 17:04:17 | | Rynav quits [Remote host closed the connection] |
| 17:04:28 | | Rynav joins |
| 17:08:42 | <@JAA> | Rynav: Obviously, virtually all of it is copyrighted content. Whether the artists/copyright holders will care is not a question we can answer. |
| 17:09:50 | <Rynav> | JAA: Well yeah, you are right, I wonder why I hadn't figured it out. Thank you!! |
| 17:11:12 | | Rynav quits [Remote host closed the connection] |
| 17:11:51 | | Rynav joins |
| 17:12:13 | | Rynav quits [Remote host closed the connection] |
| 17:15:00 | <AntoninDelFabbro|m> | I've tried to get a list of URLs for Orange website with wget, but (oh surprise) I got a 403 from Google and failed on http://annuaire-pp.orange.fr/ |
| 17:15:00 | <AntoninDelFabbro|m> | Is there a repository where I can already paste some URLs? |
| 17:57:37 | <pokechu22> | If you've got a file you want to share you can upload it to https://transfer.archivete.am/ |
| 17:59:14 | <imer> | well, you cant, since transfer is currently offline, but that would be the usual place |
| 18:08:58 | <AntoninDelFabbro|m> | alright, nice thank you! |
| 18:38:21 | | icedice (icedice) joins |
| 18:40:58 | | pawbs quits [Quit: My ZNC server died. Probably updating my kernel…] |
| 18:47:56 | <fireonlive> | i’d say bpa.st but the spam filters are a “big oof”. you can use paste.debian.net in the meantime though if you’d like to dump and run |
| 18:48:08 | <fireonlive> | but if you’re around a bit i’d just wait for the transfer |
| 19:04:56 | | fireonlive is now known as fireonfire |
| 19:06:18 | | Exorcism quits [Remote host closed the connection] |
| 19:08:15 | | Exorcism (exorcism) joins |
| 19:09:43 | | hitgrr8 quits [Client Quit] |
| 19:13:50 | | icedice quits [Ping timeout: 252 seconds] |
| 19:14:15 | | fireonfire is now known as fireonlive |
| 19:30:23 | | icedice (icedice) joins |
| 19:32:07 | | PredatorIWD quits [Ping timeout: 265 seconds] |
| 19:56:11 | <@JAA> | transfer is back. |
| 19:59:39 | <AntoninDelFabbro|m> | Nice! Okay, quick question, I have to get URLs from a website (that uses JS…): which tool would you point me to? wget? |
| 20:02:55 | <erkinalp> | wowturkey archival still going strong |
| 20:18:02 | | Letur quits [Ping timeout: 265 seconds] |
| 20:25:48 | | Mateon1 quits [Remote host closed the connection] |
| 20:26:25 | | Mateon1 joins |
| 20:39:17 | | jacksonchen666 (jacksonchen666) joins |
| 20:40:55 | | Island joins |
| 20:44:45 | <appledash> | Hmm, I have an FTP server which seems to be telling me to connect to a LAN IP address whenever I initiate a transfer from it. What'd be the best way to transfer data from it? I'm going to make the assumption that if I just connect to the FTP server's WAN address instead of the LAN address it gives me, it'll work. But is there any way to tell wget to ignore the address the FTP |
| 20:44:45 | <appledash> | server tells me and use a given one? |
| 20:44:56 | <appledash> | The control connection works fine, it just fails to open the data connection |
| 20:52:28 | | Megame (Megame) joins |
| 20:55:52 | <pokechu22> | Maybe active mode would work, where the FTP server opens a connection to your machine? (That's the older mode so it should be fairly well supported) |
| 21:04:34 | <appledash> | I was thinking about that, but there's a catch to it... The FTP server is Russian, and something between me (Canada) and the FTP server is blocking my connection, so I have to proxy through a Russian VPS |
| 21:04:51 | <appledash> | I would need to forward the active mode through the VPS as well I guess |
| 21:11:54 | <pokechu22> | Ah, then yeah, you'd need to do something special to trick that :| |
| 21:26:40 | | erkinalp quits [Ping timeout: 265 seconds] |
| 21:26:49 | | DogsRNice joins |
| 21:41:10 | | Exorcism quits [Client Quit] |
| 22:03:15 | | Miori quits [Remote host closed the connection] |
| 22:14:27 | | mr_sarge (sarge) joins |
| 22:23:45 | | Miori joins |
| 22:41:01 | | PredatorIWD joins |
| 23:01:43 | | imer quits [Quit: Oh no] |
| 23:02:09 | | imer (imer) joins |
| 23:09:47 | | BlueMaxima joins |
| 23:17:24 | | toss_ quits [Client Quit] |
| 23:28:12 | | BPCZ quits [Quit: eh???] |
| 23:29:05 | | BPCZ (BPCZ) joins |
| 23:33:08 | <h2ibot> | Vokunal edited Frequently Asked Questions (+0): https://wiki.archiveteam.org/?diff=50590&oldid=50586 |
| 23:33:09 | <h2ibot> | Cooljeanius edited Twitter (+56, /* External links */ add relevant GitHub repo): https://wiki.archiveteam.org/?diff=50591&oldid=50555 |
| 23:34:36 | | ssssss joins |
| 23:39:27 | | icedice quits [Client Quit] |
| 23:54:51 | | IDK quits [Quit: Connection closed for inactivity] |
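
The on-disk format appledash describes at 00:03:08 (a folder per request URL/path, holding timestamped, uncompressed request and response dumps) could look roughly like the sketch below. This is only an illustration under assumptions: the names `exchange_dir` and `save_exchange`, the `.request`/`.response` suffixes, the timestamp format and the character-sanitising rule are all hypothetical and not taken from appledash's actual tool.

```python
import re
import time
from pathlib import Path
from typing import Optional, Tuple
from urllib.parse import urlsplit


def _safe(segment: str) -> str:
    """Make one URL component usable as a folder name (hypothetical rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", segment)
    # Never let "." or ".." through, or a URL could escape the root folder.
    return "_" if cleaned in ("", ".", "..") else cleaned


def exchange_dir(root: Path, url: str) -> Path:
    """Map a request URL to <root>/<host>/<path segments...>."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s] or ["_root"]
    if parts.query:
        # Keep the query string so different queries get different folders.
        segments[-1] += "?" + parts.query
    return root.joinpath(_safe(parts.netloc), *(_safe(s) for s in segments))


def save_exchange(
    root: Path,
    url: str,
    request: bytes,
    response: bytes,
    when: Optional[float] = None,
) -> Tuple[Path, Path]:
    """Write the raw bytes of one HTTP exchange, named by UTC timestamp.

    `request` and `response` are everything sent and received on the socket,
    headers included, stored uncompressed and unmodified.
    """
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(when))
    folder = exchange_dir(root, url)
    folder.mkdir(parents=True, exist_ok=True)
    req_path = folder / f"{stamp}.request"
    resp_path = folder / f"{stamp}.response"
    req_path.write_bytes(request)
    resp_path.write_bytes(response)
    return req_path, resp_path
```

With one-second timestamps, two retrievals of the same URL within the same second would overwrite each other, and nothing here records the server IP, which touches on the gaps JAA raises at 00:09:24 and 00:11:18.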