00:01:46Arcorann (Arcorann) joins
00:08:41BlueMaxima joins
01:11:40jacobk quits [Ping timeout: 240 seconds]
01:20:40systwi_ quits [Excess Flood]
01:21:01systwi_ joins
01:23:37jacobk joins
01:31:16jacobk quits [Ping timeout: 240 seconds]
01:39:37tzt quits [Ping timeout: 265 seconds]
01:41:01tzt (tzt) joins
01:44:29systwi_ is now known as systwo
01:44:38systwo is now known as systwi_
03:03:43ThreeHM quits [Ping timeout: 265 seconds]
03:04:05ThreeHM (ThreeHeadedMonkey) joins
03:08:48sepro quits [Client Quit]
03:09:21sepro (sepro) joins
03:09:31qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
03:12:54jacobk joins
03:17:36WesleyBidsnipes joins
04:00:43<systwi_>Is this Odysee: https://wiki.archiveteam.org/index.php/Odysee
04:00:53<systwi_>the same as this Odysee?: https://odysee.com/
04:01:26<systwi_>The description, layout and, if I'm not wrong, owner conflict with the AT wiki page.
04:04:13<@JAA>Domain almost certainly changed hands sometime between 2016 and mid-2020.
04:05:18<Frogging101>>Acquired by Google, shutting down
04:05:21<Frogging101>poetry
04:05:22<@JAA>(Based on NS changes and the like)
04:05:36<systwi_>Interesting, I wonder if the acquisition of that specific domain was intentional.
04:06:14<@JAA>I mean, chances are it was just up for sale since the previous site went down.
04:06:23<systwi_>Like Google previously owning either duck.com or duck.co.
04:07:59<@JAA>No record of that in the WBM though.
04:08:14<@JAA>But yeah, same domain, unrelated new site.
04:11:18<systwi_>WTF, it must've been a different, similar domain.
04:11:43<systwi_>I remember it perfectly. It was a website Google owned, and they said they owned it on this very basic page.
04:12:19<systwi_>Near the bottom, in smaller print, it said something along the lines of, "For the DuckDuckGo search engine, click here."
04:13:02<systwi_>Unless it was someone else pretending to be Google, I don't know. I remember this around 2014 or so.
04:14:11<systwi_>Aha, so I'm not losing my mind: https://www.cnet.com/tech/services-and-software/google-owns-duck-com-but-itll-give-rival-duckduckgo-a-shoutout-anyhow/
04:14:12<@JAA>duck.com was owned by Google at one point, yeah.
04:15:04<systwi_>Oh, misread "no record of that in the WBM" as in, "like Google previously owning either duck.com or duck.co" was false.
04:15:39<@JAA>No, that was a continuation of my previous message, odysee.com having been for sale.
04:19:14<@JAA>Countless snapshots of duck.com in the WBM, but they're all just redirects to google.com.
04:19:28<@JAA>At least the ones I checked, anyway.
04:22:52<systwi_>This was the screenshot exactly: https://www.cnet.com/a/img/2ee9a4b02fd00f4f42d94a0dafea80ccd82474d2/2018/07/22/9b699861-12de-4a46-a76f-f23949032fdd/duck-com.jpg
04:24:32<systwi_>Maybe there was some trickery where, if accessed via SPN, it would redirect to www.google.com instead? ¯\_(ツ)_/¯
04:27:01<@JAA>https://web.archive.org/web/20180721151655/http://www.on2.com/
04:27:16<@JAA>It was created 2018-07-21.
04:27:45<@JAA>Well, the redirect to that and the list of duck links, anyway.
04:28:09<@JAA>As also mentioned in the CNET article, by the way. ;-)
04:28:59<systwi_>~_~;
04:37:16tzt quits [Ping timeout: 240 seconds]
05:05:49systwi quits [Changing host]
05:05:49systwi (systwi) joins
05:11:28hogera joins
05:14:01<hogera>hi. please save https://ccp-porn.8964.xyz/ and https://github.com/8964-xyz/ccp-porn
05:14:15<hogera>insults to jinping
05:20:22march_happy (march_happy) joins
05:25:36lennier1 quits [Client Quit]
05:28:07lennier1 (lennier1) joins
05:33:41march_happy quits [Remote host closed the connection]
05:35:28march_happy (march_happy) joins
05:41:57hogera quits [Remote host closed the connection]
05:44:46march_happy quits [Ping timeout: 240 seconds]
05:52:33BlueMaxima quits [Read error: Connection reset by peer]
05:52:49WesleyBidsnipes quits [Client Quit]
05:58:09hogera joins
06:02:54<hogera>we cannot stay much
06:09:41<Arcorann>https://github.com/wsdookadr/femtocrawl <-- this is neat, how does it compare with existing crawlers
06:09:44<@OrIdow6>hogera: Is this urgent?
06:10:40<hogera>i do not know, but not allowed in china
06:11:04hogera quits [Remote host closed the connection]
06:12:52<@OrIdow6>Arcorann: Some of the stuff in https://github.com/wsdookadr/femtocrawl/blob/main/Dockerfile suggests to me it's another headless browser+proxy setup
06:13:30march_happy (march_happy) joins
06:14:32<@OrIdow6>It looks like it makes WARCs by using MITMProxy to record HARs, then converting to WARCs with https://github.com/webrecorder/har2warc , which isn't something I'd rely on for AT integrity standards
06:15:13<@OrIdow6>To say the least
06:17:25<@OrIdow6>I may be wrong on that
06:18:09<Jake>Yeah.... looks a bit... not battle tested at the least?
06:18:28<Jake>https://github.com/wsdookadr/femtocrawl/blob/main/bin/warc_validate.sh
06:18:39<Jake>It might be recording raw WARCs as well... rather than gzipped?
06:20:37<@OrIdow6>Yes, apparently my description of its series of formats is right, see https://wsdookadr.github.io/posts/p8/, search "short diagram that explains"
06:21:22<Jake>yeah can't say I'd recommend any of this.
06:22:31<@OrIdow6>What it looks like it's made for is turning a website into a ZIM file without much work, for which it should be fine
06:22:39<@OrIdow6>Work or background knowledge
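Jake's question about raw versus gzipped WARCs can be checked from the first bytes of a file. The following is a minimal sketch, not something taken from femtocrawl: it assumes only that a gzip stream starts with the magic bytes 1f 8b and that an uncompressed WARC record starts with the text "WARC/". The function name sniff_warc is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip member
WARC_PREFIX = b"WARC/"     # start of an uncompressed WARC record header


def sniff_warc(head: bytes) -> str:
    """Classify the leading bytes of a file as 'gzip', 'raw', or 'unknown'."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(WARC_PREFIX):
        return "raw"
    return "unknown"
```

In practice one would pass the result of reading the first few bytes of the file opened in binary mode. This only tells the two cases apart; it says nothing about whether the records inside are valid.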
06:24:26hogera joins
06:24:38<hogera>bad connection
06:25:05<hogera>will website show on your wayback machine?
06:25:36<@OrIdow6>hogera: If/when we get it, yes
06:26:16<hogera>OrIdow6- thanks sir
06:27:13<hogera>that is 18 + websites
06:32:17<@OrIdow6>OK, saved. And for future purposes I did not look at the images (besides one in the GH repo that seemed not to be NSFW directly)
06:32:30<@OrIdow6>((I got lucky))
06:33:52<hogera>pictures are insult to jinping and not kind to watch
06:34:03<hogera>thanks sir
06:34:59<@OrIdow6>You're welcome
06:35:42<hogera>is ok to send more when i find them?
06:37:37<hogera>when do we find on your wayback machine?
06:38:25<@OrIdow6>It's OK to send more as long as they're political or otherwise historically important. The quality of being porn, by itself, probably isn't enough to make it notable.
06:38:48<@OrIdow6>It may take a few days to show up in the Wayback Machine. (And we don't run the Wayback Machine, we only put things into it.)
06:40:30<hogera>not for porn this is for insult. for porn is gross we do not intend
06:40:59<hogera>ok
06:43:10<hogera>we thanks to sir orldow6
06:43:42<@OrIdow6>You're welcome hogera
06:45:34march_happy quits [Ping timeout: 265 seconds]
06:45:46<Jake>Oh yeah for sure
06:46:11<Jake>they're doing something entirely separate from what we are trying to do. :)
06:46:17march_happy (march_happy) joins
06:48:11<hogera>is we name show in wayback machine next to the website?
06:49:34<Jake>Your name? No. ArchiveBot would be identified under the "about this capture" section.
06:51:59<hogera>is not the ` hogera ` yes?
06:52:15<hogera>for wayback machine
06:53:11<Jake>That name wouldn't appear in the Wayback Machine, but this channel is publicly logged.
06:53:51<hogera>where is log?
06:54:13hogera quits [Remote host closed the connection]
06:54:36<Maakuth|m>oh, hopefully they are not in trouble
06:54:47<Jake>Our logs are available here. https://hackint.logs.kiska.pw/archiveteam-bs/20220830
06:54:51<Jake>oops... disconnected.
06:55:04<Jake>Every IRC participant also keeps a local log.
06:57:26<Jake>(I'm off to bed, feel free to send that to them if they come back.)
06:58:45<Maakuth|m>they said not being able to stay long. at least I don't see any identifying info in the logs, so hopefully the log doesn't cause trouble
06:58:59<Jake>Yeah, I hope not as well.
07:02:44<@OrIdow6>If it's imperative we can probably partially censor the public logs
07:03:45<Maakuth|m>i don't see hostnames or stuff in the log
07:09:26hinata joins
07:09:28hinata quits [Remote host closed the connection]
07:14:15march_happy quits [Remote host closed the connection]
07:18:54march_happy (march_happy) joins
10:40:23tech_exorcist (tech_exorcist) joins
10:54:41<@OrIdow6>Speaking of Heroku, arkiver, did you try to reach out to them/get a reply?
10:56:53<@OrIdow6>If we get nothing from them this does feel like the kind of project that would benefit from a submission form
10:57:09<@OrIdow6>Since discovery seems to be the major issue here
11:33:41<theblazehen|m>Could try trawling certificate transparency logs for *.herokuapp.com
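The filtering half of that idea can be sketched as follows, assuming the DNS names have already been extracted from certificate transparency entries by some other means (no CT log API is called here, and the input format is an assumption). Wildcard names are skipped because they do not identify a specific app. The function name heroku_app_hosts is hypothetical.

```python
from typing import Iterable, List

SUFFIX = ".herokuapp.com"


def heroku_app_hosts(names: Iterable[str]) -> List[str]:
    """Return the distinct, concrete *.herokuapp.com hostnames in `names`."""
    hosts = set()
    for name in names:
        # Normalise case and drop a trailing root dot, if present.
        host = name.strip().lower().rstrip(".")
        if host.startswith("*."):
            continue  # a wildcard entry names no individual app
        if host.endswith(SUFFIX):
            hosts.add(host)
    return sorted(hosts)
```

The resulting list could then be fed to whatever discovery queue the project uses; that step is outside this sketch.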
13:00:37march_happy quits [Remote host closed the connection]
13:03:43march_happy (march_happy) joins
13:08:23Arcorann quits [Read error: Connection reset by peer]
13:08:23gazorpazorp quits [Remote host closed the connection]
13:08:23user_ (gazorpazorp) joins
13:08:23s-crypt6 (s-crypt) joins
13:08:23flashfire422 (flashfire42) joins
13:09:46flashfire42 quits [Ping timeout: 240 seconds]
13:09:46s-crypt quits [Ping timeout: 240 seconds]
13:09:46flashfire422 is now known as flashfire42
13:09:46s-crypt6 is now known as s-crypt
13:11:42Arcorann (Arcorann) joins
13:14:16jacobk quits [Ping timeout: 240 seconds]
13:22:44WesleyBidsnipes joins
14:04:52Arcorann quits [Ping timeout: 240 seconds]
14:29:08tech_exorcist quits [Remote host closed the connection]
14:29:31tech_exorcist (tech_exorcist) joins
14:55:24<programmerq>I was thinking about doing heroku discovery. I know of a handful of apps that are likely on the free plan and forgotten. One in particular is in the wayback machine, but it uses a POST instead of a GET to grab geographic data, and it isn't functional. it's a bummer.
14:55:46<programmerq>https://whyileft.herokuapp.com/ - I'm sure there are tons of dynamic sites like that on heroku that simply won't archive nicely.
15:00:41tech_exorcist quits [Read error: Connection reset by peer]
15:01:34tech_exorcist (tech_exorcist) joins
15:11:14jacobk joins
15:34:04Retrofan joins
15:34:20<Retrofan>Hi
15:37:07<Retrofan>I was wondering if Archivebot could log IRC channels?
15:41:46sec^nd quits [Ping timeout: 240 seconds]
15:42:54qwertyasdfuiopghjkl joins
15:46:34sec^nd (second) joins
15:55:44<theblazehen|m>Would .json.gz be a good choice for "large" chunks of unstructured data?
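For reference, this is roughly what a .json.gz round trip looks like with the Python standard library. It is a sketch only and does not settle whether the format is a good choice for large data; the helper names to_json_gz and from_json_gz are hypothetical.

```python
import gzip
import json


def to_json_gz(obj) -> bytes:
    """Serialise `obj` to JSON and gzip-compress the UTF-8 bytes."""
    return gzip.compress(json.dumps(obj).encode("utf-8"))


def from_json_gz(blob: bytes):
    """Inverse of to_json_gz: decompress, then parse the JSON text."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Both helpers hold the whole document in memory, which may matter for very large chunks; streaming with gzip.open would be the usual alternative.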
16:07:50<@JAA>theblazehen|m: Heroku uses a wildcard cert since 2018.
16:08:29<theblazehen|m>I probably should have checked that before I started downloading all the cert transparency logs
16:08:57<@JAA>Retrofan: Not really, no. It can retrieve other logs from the web obviously. And then there's this: https://wiki.archiveteam.org/index.php/2018-10-13
16:11:46jacobk quits [Ping timeout: 240 seconds]
16:32:20jacobk joins
17:06:46march_happy quits [Ping timeout: 240 seconds]
17:18:12<@JAA>Funny, the other day I mentioned we didn't get spam on the wiki in over a year, today there were two spammers. :-)
17:19:05<@JAA>Er right, no auto-rejected spam in over a year, this was manual, and there were other cases like that before.
17:49:46jacobk quits [Ping timeout: 240 seconds]
17:57:16Retrofan quits [Client Quit]
18:03:46wyatt8740 quits [Ping timeout: 240 seconds]
18:08:17jacobk joins
18:41:29<@arkiver>theblazehen|m: please do
18:41:46<@arkiver>on *.herokuapp.com
19:09:40tzt (tzt) joins
19:34:21lennier1 quits [Client Quit]
19:47:59lennier1 (lennier1) joins
21:02:53eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
21:03:26eroc1990 (eroc1990) joins
21:32:34WesleyBidsnipes quits [Client Quit]
21:36:44WesleyBidsnipes joins
21:40:48WesleyBidsnipes quits [Client Quit]
21:42:17tech_exorcist quits [Client Quit]
21:49:22TheTroll joins
21:50:29march_happy (march_happy) joins
23:14:30BlueMaxima joins
23:14:31BlueMaxima_ joins
23:14:34BlueMaxima quits [Remote host closed the connection]
23:14:36BlueMaxima_ quits [Remote host closed the connection]
23:14:41BlueMaxima joins
23:30:37Lord_Nightmare quits [Quit: ZNC - http://znc.in]
23:41:32le0n_ quits [Ping timeout: 265 seconds]
23:41:53le0n (le0n) joins