| 00:01:46 | | Arcorann (Arcorann) joins |
| 00:08:41 | | BlueMaxima joins |
| 01:11:40 | | jacobk quits [Ping timeout: 240 seconds] |
| 01:20:40 | | systwi_ quits [Excess Flood] |
| 01:21:01 | | systwi_ joins |
| 01:23:37 | | jacobk joins |
| 01:31:16 | | jacobk quits [Ping timeout: 240 seconds] |
| 01:39:37 | | tzt quits [Ping timeout: 265 seconds] |
| 01:41:01 | | tzt (tzt) joins |
| 01:44:29 | | systwi_ is now known as systwo |
| 01:44:38 | | systwo is now known as systwi_ |
| 03:03:43 | | ThreeHM quits [Ping timeout: 265 seconds] |
| 03:04:05 | | ThreeHM (ThreeHeadedMonkey) joins |
| 03:08:48 | | sepro quits [Client Quit] |
| 03:09:21 | | sepro (sepro) joins |
| 03:09:31 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 03:12:54 | | jacobk joins |
| 03:17:36 | | WesleyBidsnipes joins |
| 04:00:43 | <systwi_> | Is this Odysee: https://wiki.archiveteam.org/index.php/Odysee |
| 04:00:53 | <systwi_> | the same as this Odysee?: https://odysee.com/ |
| 04:01:26 | <systwi_> | The description, layout and, if I'm not wrong, owner conflict with the AT wiki page. |
| 04:04:13 | <@JAA> | Domain almost certainly changed hands sometime between 2016 and mid-2020. |
| 04:05:18 | <Frogging101> | >Acquired by Google, shutting down |
| 04:05:21 | <Frogging101> | poetry |
| 04:05:22 | <@JAA> | (Based on NS changes and the like) |
| 04:05:36 | <systwi_> | Interesting, I wonder if the acquisition of that specific domain was intentional. |
| 04:06:14 | <@JAA> | I mean, chances are it was just up for sale since the previous site went down. |
| 04:06:23 | <systwi_> | Like Google previously owning either duck.com or duck.co. |
| 04:07:59 | <@JAA> | No record of that in the WBM though. |
| 04:08:14 | <@JAA> | But yeah, same domain, unrelated new site. |
| 04:11:18 | <systwi_> | WTF, it must've been a different, similar domain. |
| 04:11:43 | <systwi_> | I remember it perfectly. It was a website Google owned, and they said they owned it on this very basic page. |
| 04:12:19 | <systwi_> | Near the bottom, in smaller print, it said something along the lines of, "For the DuckDuckGo search engine, click here." |
| 04:13:02 | <systwi_> | Unless it was someone else pretending to be Google, I don't know. I remember this around 2014 or so. |
| 04:14:11 | <systwi_> | Aha, so I'm not losing my mind: https://www.cnet.com/tech/services-and-software/google-owns-duck-com-but-itll-give-rival-duckduckgo-a-shoutout-anyhow/ |
| 04:14:12 | <@JAA> | duck.com was owned by Google at one point, yeah. |
| 04:15:04 | <systwi_> | Oh, misread "no record of that in the WBM" as in, "like Google previously owning either duck.com or duck.co" was false. |
| 04:15:39 | <@JAA> | No, that was a continuation of my previous message, odysee.com having been for sale. |
| 04:19:14 | <@JAA> | Countless snapshots of duck.com in the WBM, but they're all just redirects to google.com. |
| 04:19:28 | <@JAA> | At least the ones I checked, anyway. |
| 04:22:52 | <systwi_> | This was the screenshot exactly: https://www.cnet.com/a/img/2ee9a4b02fd00f4f42d94a0dafea80ccd82474d2/2018/07/22/9b699861-12de-4a46-a76f-f23949032fdd/duck-com.jpg |
| 04:24:32 | <systwi_> | Maybe there was some trickery where, if accessed via SPN, it would redirect to www.google.com instead? ¯\_(ツ)_/¯ |
| 04:27:01 | <@JAA> | https://web.archive.org/web/20180721151655/http://www.on2.com/ |
| 04:27:16 | <@JAA> | It was created 2018-07-21. |
| 04:27:45 | <@JAA> | Well, the redirect to that and the list of duck links, anyway. |
| 04:28:09 | <@JAA> | As also mentioned in the CNET article, by the way. ;-) |
| 04:28:59 | <systwi_> | ~_~; |
| 04:37:16 | | tzt quits [Ping timeout: 240 seconds] |
| 05:05:49 | | systwi quits [Changing host] |
| 05:05:49 | | systwi (systwi) joins |
| 05:11:28 | | hogera joins |
| 05:14:01 | <hogera> | hi. please save https://ccp-porn.8964.xyz/ and https://github.com/8964-xyz/ccp-porn |
| 05:14:15 | <hogera> | insults to jinping |
| 05:20:22 | | march_happy (march_happy) joins |
| 05:25:36 | | lennier1 quits [Client Quit] |
| 05:28:07 | | lennier1 (lennier1) joins |
| 05:33:41 | | march_happy quits [Remote host closed the connection] |
| 05:35:28 | | march_happy (march_happy) joins |
| 05:41:57 | | hogera quits [Remote host closed the connection] |
| 05:44:46 | | march_happy quits [Ping timeout: 240 seconds] |
| 05:52:33 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 05:52:49 | | WesleyBidsnipes quits [Client Quit] |
| 05:58:09 | | hogera joins |
| 06:02:54 | <hogera> | we cannot stay much |
| 06:09:41 | <Arcorann> | https://github.com/wsdookadr/femtocrawl <-- this is neat, how does it compare with existing crawlers |
| 06:09:44 | <@OrIdow6> | hogera: Is this urgent? |
| 06:10:40 | <hogera> | i do not know, but not allowed in china |
| 06:11:04 | | hogera quits [Remote host closed the connection] |
| 06:12:52 | <@OrIdow6> | Arcorann: Some of the stuff in https://github.com/wsdookadr/femtocrawl/blob/main/Dockerfile suggests to me it's another headless browser+proxy setup |
| 06:13:30 | | march_happy (march_happy) joins |
| 06:14:32 | <@OrIdow6> | It looks like it makes WARCs by using MITMProxy to record HARs, then converting to WARCs with https://github.com/webrecorder/har2warc , which isn't something I'd rely on for AT integrity standards |
| 06:15:13 | <@OrIdow6> | To say the least |
| 06:17:25 | <@OrIdow6> | I may be wrong on that |
| 06:18:09 | <Jake> | Yeah.... looks a bit... not battle tested at the least? |
| 06:18:28 | <Jake> | https://github.com/wsdookadr/femtocrawl/blob/main/bin/warc_validate.sh |
| 06:18:39 | <Jake> | It might be recording raw WARCs as well... rather than gzipped? |
| 06:20:37 | <@OrIdow6> | Yes, apparently my description of its series of formats is right, see https://wsdookadr.github.io/posts/p8/, search "short diagram that explains" |
| 06:21:22 | <Jake> | yeah can't say I'd recommend any of this. |
| 06:22:31 | <@OrIdow6> | What it looks like it's made for is turning a website into a ZIM file without much work, for which it should be fine |
| 06:22:39 | <@OrIdow6> | Work or background knowledge |
| 06:24:26 | | hogera joins |
| 06:24:38 | <hogera> | bad connection |
| 06:25:05 | <hogera> | will website show on your wayback machine? |
| 06:25:36 | <@OrIdow6> | hogera: If/when we get it, yes |
| 06:26:16 | <hogera> | OrIdow6- thanks sir |
| 06:27:13 | <hogera> | that is 18+ websites |
| 06:32:17 | <@OrIdow6> | OK, saved. And for future purposes I did not look at the images (besides one in the GH repo that seemed not to be NSFW directly) |
| 06:32:30 | <@OrIdow6> | ((I got lucky)) |
| 06:33:52 | <hogera> | pictures are insult to jinping and not kind to watch |
| 06:34:03 | <hogera> | thanks sir |
| 06:34:59 | <@OrIdow6> | You're welcome |
| 06:35:42 | <hogera> | is ok to send more when i find them? |
| 06:37:37 | <hogera> | when do we find on your wayback machine? |
| 06:38:25 | <@OrIdow6> | It's OK to send more as long as they're political or otherwise historically important. The quality of being porn, by itself, probably isn't enough to make it notable. |
| 06:38:48 | <@OrIdow6> | It may take a few days to show up in the Wayback Machine. (And we don't run the Wayback Machine, we only put things into it.) |
| 06:40:30 | <hogera> | not for porn this is for insult. for porn is gross we do not intend |
| 06:40:59 | <hogera> | ok |
| 06:43:10 | <hogera> | we thanks to sir orldow6 |
| 06:43:42 | <@OrIdow6> | You're welcome hogera |
| 06:45:34 | | march_happy quits [Ping timeout: 265 seconds] |
| 06:45:46 | <Jake> | Oh yeah for sure |
| 06:46:11 | <Jake> | they're doing something entirely separate from what we are trying to do. :) |
| 06:46:17 | | march_happy (march_happy) joins |
| 06:48:11 | <hogera> | is we name show in wayback machine next to the website? |
| 06:49:34 | <Jake> | Your name? No. ArchiveBot would be identified under the "about this capture" section. |
| 06:51:59 | <hogera> | is not the ` hogera ` yes? |
| 06:52:15 | <hogera> | for wayback machine |
| 06:53:11 | <Jake> | That name wouldn't appear in the Wayback Machine, but this channel is publicly logged. |
| 06:53:51 | <hogera> | where is log? |
| 06:54:13 | | hogera quits [Remote host closed the connection] |
| 06:54:36 | <Maakuth|m> | oh, hopefully they are not in trouble |
| 06:54:47 | <Jake> | Our logs are available here. https://hackint.logs.kiska.pw/archiveteam-bs/20220830 |
| 06:54:51 | <Jake> | oops... disconnected. |
| 06:55:04 | <Jake> | Every IRC participant also keeps a local log. |
| 06:57:26 | <Jake> | (I'm off to bed, feel free to send that to them if they come back.) |
| 06:58:45 | <Maakuth|m> | they said they couldn't stay long. at least I don't see any identifying info in the logs, so hopefully the log doesn't cause trouble |
| 06:58:59 | <Jake> | Yeah, I hope not as well. |
| 07:02:44 | <@OrIdow6> | If it's imperative we can probably partially censor the public logs |
| 07:03:45 | <Maakuth|m> | i don't see hostnames or stuff in the log |
| 07:09:26 | | hinata joins |
| 07:09:28 | | hinata quits [Remote host closed the connection] |
| 07:14:15 | | march_happy quits [Remote host closed the connection] |
| 07:18:54 | | march_happy (march_happy) joins |
| 10:40:23 | | tech_exorcist (tech_exorcist) joins |
| 10:54:41 | <@OrIdow6> | Speaking of Heroku, arkiver, did you try to reach out to them/get a reply? |
| 10:56:53 | <@OrIdow6> | If we get nothing from them this does feel like the kind of project that would benefit from a submission form |
| 10:57:09 | <@OrIdow6> | Since discovery seems to be the major issue here |
| 11:33:41 | <theblazehen|m> | Could try trawling certificate transparency logs for *.herokuapp.com |
| 13:00:37 | | march_happy quits [Remote host closed the connection] |
| 13:03:43 | | march_happy (march_happy) joins |
| 13:08:23 | | Arcorann quits [Read error: Connection reset by peer] |
| 13:08:23 | | gazorpazorp quits [Remote host closed the connection] |
| 13:08:23 | | user_ (gazorpazorp) joins |
| 13:08:23 | | s-crypt6 (s-crypt) joins |
| 13:08:23 | | flashfire422 (flashfire42) joins |
| 13:09:46 | | flashfire42 quits [Ping timeout: 240 seconds] |
| 13:09:46 | | s-crypt quits [Ping timeout: 240 seconds] |
| 13:09:46 | | flashfire422 is now known as flashfire42 |
| 13:09:46 | | s-crypt6 is now known as s-crypt |
| 13:11:42 | | Arcorann (Arcorann) joins |
| 13:14:16 | | jacobk quits [Ping timeout: 240 seconds] |
| 13:22:44 | | WesleyBidsnipes joins |
| 14:04:52 | | Arcorann quits [Ping timeout: 240 seconds] |
| 14:29:08 | | tech_exorcist quits [Remote host closed the connection] |
| 14:29:31 | | tech_exorcist (tech_exorcist) joins |
| 14:55:24 | <programmerq> | I was thinking about doing heroku discovery. I know of a handful of apps that are likely on the free plan and forgotten. One in particular is in the wayback machine, but it uses a POST instead of a GET to grab geographic data, and it isn't functional. it's a bummer. |
| 14:55:46 | <programmerq> | https://whyileft.herokuapp.com/ - I'm sure there are tons of dynamic sites like that on heroku that simply won't archive nicely. |
| 15:00:41 | | tech_exorcist quits [Read error: Connection reset by peer] |
| 15:01:34 | | tech_exorcist (tech_exorcist) joins |
| 15:11:14 | | jacobk joins |
| 15:34:04 | | Retrofan joins |
| 15:34:20 | <Retrofan> | Hi |
| 15:37:07 | <Retrofan> | I was wondering if ArchiveBot could log IRC channels? |
| 15:41:46 | | sec^nd quits [Ping timeout: 240 seconds] |
| 15:42:54 | | qwertyasdfuiopghjkl joins |
| 15:46:34 | | sec^nd (second) joins |
| 15:55:44 | <theblazehen|m> | Would .json.gz be a good choice for "large" chunks of unstructured data? |
| 16:07:50 | <@JAA> | theblazehen|m: Heroku has used a wildcard cert since 2018. |
| 16:08:29 | <theblazehen|m> | I probably should have checked that before I started downloading all the cert transparency logs |
| 16:08:57 | <@JAA> | Retrofan: Not really, no. It can retrieve other logs from the web obviously. And then there's this: https://wiki.archiveteam.org/index.php/2018-10-13 |
| 16:11:46 | | jacobk quits [Ping timeout: 240 seconds] |
| 16:32:20 | | jacobk joins |
| 17:06:46 | | march_happy quits [Ping timeout: 240 seconds] |
| 17:18:12 | <@JAA> | Funny, the other day I mentioned we hadn't gotten spam on the wiki in over a year, today there were two spammers. :-) |
| 17:19:05 | <@JAA> | Er right, no auto-rejected spam in over a year, this was manual, and there were other cases like that before. |
| 17:42:45 | | T31M is now authenticated as T31M |
| 17:49:46 | | jacobk quits [Ping timeout: 240 seconds] |
| 17:57:16 | | Retrofan quits [Client Quit] |
| 18:03:46 | | wyatt8740 quits [Ping timeout: 240 seconds] |
| 18:08:17 | | jacobk joins |
| 18:41:29 | <@arkiver> | theblazehen|m: please do |
| 18:41:46 | <@arkiver> | on *.herokuapp.com |
| 19:09:40 | | tzt (tzt) joins |
| 19:34:21 | | lennier1 quits [Client Quit] |
| 19:47:59 | | lennier1 (lennier1) joins |
| 21:02:53 | | eroc1990 quits [Quit: The Lounge - https://thelounge.chat] |
| 21:03:26 | | eroc1990 (eroc1990) joins |
| 21:32:34 | | WesleyBidsnipes quits [Client Quit] |
| 21:36:44 | | WesleyBidsnipes joins |
| 21:40:48 | | WesleyBidsnipes quits [Client Quit] |
| 21:42:17 | | tech_exorcist quits [Client Quit] |
| 21:49:22 | | TheTroll joins |
| 21:50:29 | | march_happy (march_happy) joins |
| 23:14:30 | | BlueMaxima joins |
| 23:14:31 | | BlueMaxima_ joins |
| 23:14:34 | | BlueMaxima quits [Remote host closed the connection] |
| 23:14:36 | | BlueMaxima_ quits [Remote host closed the connection] |
| 23:14:41 | | BlueMaxima joins |
| 23:30:37 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 23:41:32 | | le0n_ quits [Ping timeout: 265 seconds] |
| 23:41:53 | | le0n (le0n) joins |