00:10:45 | | Earendil7 quits [Ping timeout: 272 seconds] |
00:10:47 | | Earendil7_ (Earendil7) joins |
00:11:47 | | Arcorann (Arcorann) joins |
00:13:00 | | nertzy joins |
00:15:29 | | pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat] |
00:15:36 | <BornOn420> | no, straight from the Netherlands |
00:15:45 | | pedantic-darwin joins |
00:16:40 | <BornOn420> | Unable to find image 'containrrr/watchtower:latest' locally |
00:16:40 | <BornOn420> | docker: Error response from daemon: Head "https://registry-1.docker.io/v2/containrrr/watchtower/manifests/latest": unauthorized: incorrect username or password. |
00:26:05 | | DogsRNice joins |
00:28:51 | <Notrealname1234> | Can we archive LEGO Life? |
00:29:03 | <Notrealname1234> | It's not so popular anyway |
00:29:33 | <Notrealname1234> | If we can get the API calls |
00:34:44 | <nulldata> | Notrealname1234 - Yeah would need to figure out API calls. Slightly harder as I think it's app only - no browser version. |
00:35:07 | | nothere joins |
00:35:08 | <Notrealname1234> | Yeah, it's app only. |
00:35:16 | <Notrealname1234> | Use fiddler? |
00:58:36 | | pedantic-darwin quits [Client Quit] |
00:58:52 | | pedantic-darwin joins |
01:04:56 | | Notrealname1234 quits [Client Quit] |
01:08:48 | <nulldata> | !tell Notrealname123 "Maybe. A lot of apps these days use certificate pinning, which makes it hard to MITM as they don't allow self-signed certs." |
01:08:48 | <eggdrop> | [tell] ok, I'll tell Notrealname123 when they join next |
02:24:25 | <h2ibot> | PaulWise edited SmolNet (+88, add some probably missed finger URLs): https://wiki.archiveteam.org/?diff=52361&oldid=52339 |
04:34:56 | <pabs> | for archiving things that redirect to random URLs from a non-public list, do we have a way to archive the randomisation URL many times to extract the whole list? |
04:35:12 | <pabs> | for eg https://theforest.link/ https://theforest.link/go-for-a-walk |
05:02:11 | <pokechu22> | https://theforest.link/go-for-a-walk?1 https://theforest.link/go-for-a-walk?2 https://theforest.link/go-for-a-walk?3 https://theforest.link/go-for-a-walk?4 etc, but if it's redirecting to a ton of offsite domains that might cause cookie problems for archivebot |
05:06:07 | <pabs> | cookie problems? |
05:07:12 | <pabs> | as in too many cookies in the cookie jar, leading to slowdowns? |
05:10:44 | <pokechu22> | Yeah, due to inefficiencies with how Python's cookie jar implementation works |
05:11:10 | <pokechu22> | something about it not removing entries for expired cookies from the dictionary containing cookies |
05:11:27 | <pokechu22> | it'd probably be fine for an !ao < list job for discovery but not recursion... probably |
05:13:17 | <pabs> | yeah I was just going to do front pages |
05:13:35 | <pabs> | who knows how many links are in it :) |
05:14:12 | <pabs> | hmm, wonder if that python cookie jar thing is fixed in newer python |
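The numbered-query-string idea pokechu22 suggests above could be sketched roughly as follows. This is only an illustration: the helper name `numbered_urls` and the count are assumptions, and how many distinct targets the redirector actually hands out is unknown. The output is a plain list of URLs, one per line, of the kind that could be fed to a list job.

```python
from urllib.parse import urlsplit, urlunsplit


def numbered_urls(base_url: str, count: int) -> list[str]:
    """Return `count` variants of base_url, each with a distinct numeric query string.

    Each variant is a different URL as far as a crawler's deduplication is
    concerned, so every one triggers a fresh request to the redirector.
    """
    parts = urlsplit(base_url)
    return [
        # Replace any existing query with the counter and drop the fragment.
        urlunsplit((parts.scheme, parts.netloc, parts.path, str(i), ""))
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Hypothetical count; pick something larger for a real discovery run.
    for url in numbered_urls("https://theforest.link/go-for-a-walk", 4):
        print(url)
```

Because the redirect target is random, repeats are expected, so the number of requests needed to see the whole list is likely to be well above the list's actual size.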
05:47:25 | | Sophira joins |
05:50:18 | <Sophira> | Hi there. This is a bit of a long shot since I don't think what I'm about to ask about is actually an Archive Team project, but there's a YouTube video from 2008 that I have reason to believe would be contained within the "YouTube Video Crawldata" collections on archive.org, which have "Internet Archive" listed as the contributor. Does anyone know who I'd need to contact to get access to this? |
05:50:55 | <Sophira> | (the downloads on these items are restricted, presumably for bandwidth reasons) |
05:53:02 | <Sophira> | (I can see an email address for the operator in the data, but I don't know if that's something I should be using or not.) |
05:55:50 | <imer> | Sophira: try https://findyoutubevideo.thetechrobo.ca/ |
05:58:22 | | mighty-dob (mighty-dob) joins |
05:58:48 | <Sophira> | Ah, thank you! |
06:03:15 | <Sophira> | Sadly, it doesn't appear to be available in any of the services searched by that link. It says the metadata is available in the Internet Archive but I don't believe that's actually the case. |
06:03:26 | <Sophira> | Thank you for the link though <3 I'll save that. |
06:08:37 | <Sophira> | (In the Wayback Machine, rather.) |
06:30:02 | | HP_Archivist quits [Read error: Connection reset by peer] |
06:36:02 | | DogsRNice quits [Read error: Connection reset by peer] |
06:50:20 | | BlueMaxima quits [Read error: Connection reset by peer] |
07:06:11 | | Unholy23619246453771 (Unholy2361) joins |
07:52:27 | | Earendil7_ quits [Ping timeout: 272 seconds] |
07:52:35 | | Earendil7 (Earendil7) joins |
08:08:43 | | Wohlstand (Wohlstand) joins |
08:09:22 | | aninternettroll quits [Ping timeout: 255 seconds] |
08:24:03 | <mighty-dob> | hi people. sensing the upcoming apocalypse I started to think the same way as you. I am not really a cool hacker but I managed to make a snapshot of the Arduino code database and was looking to make a GitHub archive, I even bought a 10TB NAS for personal archives. but now I found your community and it looks like you already did all the work |
08:31:17 | | aninternettroll (aninternettroll) joins |
08:52:44 | | Island quits [Read error: Connection reset by peer] |
09:00:01 | | Bleo1826007227196 quits [Client Quit] |
09:01:21 | | Bleo1826007227196 joins |
09:04:10 | | Wohlstand quits [Client Quit] |
09:15:25 | | Earendil7 quits [Ping timeout: 272 seconds] |
09:28:43 | | nulldata quits [Ping timeout: 272 seconds] |
09:33:18 | <yarrow> | #archivebot request: please archive https://callchelseaperetti.tumblr.com/archive if you can. Reason: proactive grab. |
09:53:19 | | shgaqnyrjp quits [Remote host closed the connection] |
09:53:22 | | shgaqnyrjp_ (shgaqnyrjp) joins |
09:56:55 | | Ryz quits [Ping timeout: 255 seconds] |
09:58:45 | | Ryz (Ryz) joins |
10:22:12 | <pabs> | mighty-dob: for code archiving, see #gitgud #codearchiver (hackint) #swh (libera) |
10:22:59 | <pabs> | https://wiki.archiveteam.org/index.php/Codearchiver https://www.softwareheritage.org/ |
10:23:34 | <pabs> | mighty-dob: if you've got websites you want on archive.org, ArchiveBot can save them, list sites and reasons here |
10:23:58 | <pabs> | personal archives are also good to have too of course :) |
10:24:17 | <pabs> | also check out https://wiki.archiveteam.org/index.php/Warrior |
10:24:48 | <pabs> | the apocalypse is ongoing, websites die every day https://wiki.archiveteam.org/index.php/Deathwatch |
10:28:42 | | JaffaCakes118 (JaffaCakes118) joins |
10:42:45 | | muklumsum quits [Client Quit] |
10:42:54 | <mighty-dob> | pabs: ty I'll check it |
10:48:37 | <mighty-dob> | what do you think causes websites to close? |
10:50:13 | <pabs> | lots of reasons, usually money or people got bored or some drama |
10:50:45 | | muklumsum joins |
11:02:32 | | yarrow quits [Read error: Connection reset by peer] |
11:05:26 | | yarrow (yarrow) joins |
11:35:55 | | qwertyasdfuiopghjkl2 joins |
11:42:02 | <qwertyasdfuiopghjkl2> | https://www.connectseward.org/connect-seward-services-shutting-down/ "After 27 years of offering free e-mail and website hosting for many businesses, organizations, and individuals in Seward County, Connect Seward County will be shutting down effective June 30th, 2024." |
11:46:31 | <mighty-dob> | https://www.geeksforgeeks.org/ the best C++ self-learning online book I've found. it doesn't close but I've been interested in getting an offline copy |
11:46:54 | <mighty-dob> | *contains a lot of javascript |
11:47:39 | <qwertyasdfuiopghjkl2> | From https://www.google.com/search?q=%22Hosted+by+Connect+Seward+County%22 there seems to be a lot of sites that will be affected by the shutdown, but I'm guessing that search probably won't find all of them. (I don't currently have the time to look into it more) |
11:48:31 | | kiryu__ quits [Ping timeout: 255 seconds] |
12:08:19 | | mighty-dob quits [Ping timeout: 272 seconds] |
13:04:00 | | mighty-dob (mighty-dob) joins |
13:16:45 | | kiryu joins |
13:16:45 | | kiryu is now authenticated as kiryu |
13:16:45 | | kiryu quits [Changing host] |
13:16:45 | | kiryu (kiryu) joins |
13:17:29 | | shgaqnyrjp_ is now known as shgaqnyrjp |
13:31:55 | | nertzy quits [Ping timeout: 272 seconds] |
13:50:54 | | nertzy joins |
13:52:49 | | Arcorann quits [Ping timeout: 272 seconds] |
13:54:25 | | Notrealname1234 (Notrealname1234) joins |
13:58:44 | | Notrealname1234 quits [Client Quit] |
14:34:42 | | nulldata (nulldata) joins |
15:04:54 | | grid joins |
15:25:25 | <myself> | mighty-dob: I'd love to learn more about your Arduino stuff, do you mean you grabbed the Arduino-as-an-organization's own repos for the Arduino-branded IDE and stuff? Or were you able to spider all the libraries and board-support packages? |
15:36:01 | | Guest54 joins |
15:40:43 | | Guest54 quits [Ping timeout: 255 seconds] |
15:55:30 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
15:58:59 | | Lord_Nightmare (Lord_Nightmare) joins |
16:13:10 | | JaffaCakes118 quits [Remote host closed the connection] |
16:30:21 | | JaffaCakes118 (JaffaCakes118) joins |
16:31:48 | | JaffaCakes118_2 (JaffaCakes118) joins |
16:32:12 | | ymgve quits [Quit: Leaving] |
16:33:15 | | JaffaCakes118_2 quits [Read error: Connection reset by peer] |
16:34:13 | | JaffaCakes118 quits [Remote host closed the connection] |
17:12:18 | | superkuh joins |
17:14:38 | | grid quits [Client Quit] |
17:22:57 | <mighty-dob> | myself: I wrote a bash script crawler that walked across the entire API (public JSON file) and downloaded all libraries one-by-one. then I packed them into ~28 zip archives (A..Z) as otherwise moving 30k files across drives was impossible. later I found the openZIM project and decided that it could be handy to pack the Arduino library files into a ZIM package so it's easier to work with, but haven't done that yet |
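The approach mighty-dob describes (walk a public JSON index, then group the downloads by first letter) might look something like the sketch below. This is not their bash script. It assumes the index is a JSON object with a `libraries` array whose entries carry `name` and `url` fields; the sample data, the `example.invalid` URLs, the `bucket_by_initial` helper, and the `other` bucket for names that do not start with A-Z are all made up for illustration. Downloading and zipping are left out.

```python
import json
import string
from collections import defaultdict


def bucket_by_initial(index: dict) -> dict[str, list[str]]:
    """Group library download URLs by the first letter of the library name."""
    buckets: dict[str, set[str]] = defaultdict(set)
    for lib in index.get("libraries", []):
        name = lib.get("name", "")
        url = lib.get("url")
        if not name or not url:
            continue  # skip malformed entries instead of failing the whole run
        first = name[0].upper()
        # Names starting with a digit or symbol go into a catch-all bucket.
        key = first if first in string.ascii_uppercase else "other"
        buckets[key].add(url)  # a set avoids duplicate URLs within a bucket
    return {key: sorted(urls) for key, urls in sorted(buckets.items())}


# Hypothetical sample standing in for the real index file.
SAMPLE_INDEX = json.loads("""
{"libraries": [
  {"name": "AccelStepper",  "url": "https://example.invalid/AccelStepper-1.0.zip"},
  {"name": "arduino-timer", "url": "https://example.invalid/arduino-timer-2.0.zip"},
  {"name": "1Wire",         "url": "https://example.invalid/1Wire-0.1.zip"},
  {"name": "Broken"}
]}
""")

if __name__ == "__main__":
    for letter, urls in bucket_by_initial(SAMPLE_INDEX).items():
        print(letter, len(urls))
```

Each bucket could then be downloaded and written into its own archive, which keeps the number of files moved between drives small.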
17:24:36 | <myself> | niiiiiice. That plus all the board support stuff would be amazing to have reliably archived offline. |
17:29:16 | <mighty-dob> | yep. ZIM format could be great for it. I have the entire Wikipedia and a lot of other useful web resources downloaded in ZIM format on my NAS for worst case scenarios. I can use them offline or share them with other people as it has a webserver to access them |
17:31:53 | <that_lurker> | remember to take a look at the publishing policy of openzim https://openzim.org/wiki/Content_team#Publishing |
17:32:42 | <mighty-dob> | right |
17:32:46 | <that_lurker> | If you want to make them official that is. |
17:33:07 | <that_lurker> | https://openzim.org/wiki/Build_your_ZIM_file |
17:34:10 | <mighty-dob> | perhaps I'll need Arduino permission to publish their database |
17:35:51 | <mighty-dob> | or just share it via torrent |
17:40:31 | <that_lurker> | If it's public data then at least push it to Internet Archive |
17:40:39 | | nertzy quits [Client Quit] |
17:43:04 | <mighty-dob> | I don't know much about your movement yet, didn't figure out how you make archives and how to use them |
17:43:36 | <mighty-dob> | I am just lurking so far in free time |
17:46:20 | | coderobe quits [Quit: Killed (K-Lined)] |
17:52:04 | | BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
19:04:41 | | @arkiver is back from vacation :) |
19:05:38 | <imer> | welcome back! |
19:15:36 | <that_lurker> | did you have a nice vacation |
19:27:58 | | mighty-dob quits [Ping timeout: 255 seconds] |
20:18:49 | | coderobe (coderobe) joins |
20:31:58 | | JayEmbee quits [Quit: WeeChat 2.3] |
20:43:46 | <fireonlive> | welcome back arkiver :3 hope you had a good time |
21:10:17 | | fuzzy8021 is now known as fuzzy80211 |
21:10:24 | | fuzzy80211 is now known as group |
21:10:28 | | group is now known as fuzzy80211 |
21:11:05 | | fuzzy80211 is now authenticated as * |
21:11:05 | | fuzzy80211 is now authenticated as fuzzy80211 |
21:24:04 | | Chris5010 quits [Ping timeout: 255 seconds] |
21:28:46 | | simon8162 quits [Quit: ZNC 1.8.2 - https://znc.in] |
21:30:50 | | simon816 (simon816) joins |
21:56:34 | | JaffaCakes118 (JaffaCakes118) joins |
22:02:04 | | Chris5010 (Chris5010) joins |
22:11:15 | | Wohlstand (Wohlstand) joins |
22:11:56 | | coderobe quits [Client Quit] |
23:01:55 | | knecht4 quits [Ping timeout: 272 seconds] |
23:12:41 | | knecht4 joins |
23:15:13 | | ats quits [Ping timeout: 255 seconds] |
23:17:20 | | wyatt8750 quits [Remote host closed the connection] |
23:17:51 | | wyatt8740 joins |
23:19:01 | | @Sanqui quits [Ping timeout: 272 seconds] |
23:23:27 | | thuban quits [Ping timeout: 272 seconds] |
23:24:17 | | thuban (thuban) joins |
23:36:07 | <@arkiver> | thank you :) |
23:40:19 | <@arkiver> | https://www.wired.com/story/the-fight-against-ai-comes-to-a-foundational-data-set/ |
23:40:55 | <@arkiver> | https://www.businessinsider.com/new-york-times-content-removed-common-crawl-ai-training-dataset-2023-11 |
23:41:04 | <@arkiver> | > The New York Times discovered that Common Crawl, one of the largest AI training datasets, contained millions of URLs linking to its paywalled articles and other copyrighted content. |
23:41:17 | <nicolas17> | >discovered |
23:41:25 | <nicolas17> | it seems kind of obvious that CC would have NYT? |
23:41:32 | <@arkiver> | well sure |
23:42:01 | <@arkiver> | The main problem here is that web archivists behind CC are seen as data collectors for LLM training. |
23:42:21 | <nicolas17> | also doesn't CC only have links, so if you want to train your AI with it, you have to actually download them off the original source again? |
23:43:42 | <@arkiver> | They have WARCs available I believe. |
23:44:13 | <@arkiver> | example https://data.commoncrawl.org/crawl-data/CC-MAIN-2018-17/segments/1524125937193.1/warc/CC-MAIN-20180420081400-20180420101400-00000.warc.gz |
23:44:24 | <nicolas17> | hm I see |
23:45:00 | <@arkiver> | But, the Common Crawl case is an example here. Unfortunately this also affects Archive Team and our WARCs. |
23:45:33 | | BlueMaxima joins |
23:47:19 | <katia> | :\ youtube? |
23:55:47 | | Wohlstand quits [Client Quit] |
23:55:54 | | loug4 quits [Client Quit] |
23:57:16 | <@arkiver> | katia: yes, that is an example. |
23:57:31 | | yarrow quits [Read error: Connection reset by peer] |
23:57:38 | <nicolas17> | is that why youtube warcs were blocked recently? |
23:58:11 | | Sanqui joins |
23:58:13 | | Sanqui is now authenticated as Sanqui |
23:58:13 | | Sanqui quits [Changing host] |
23:58:13 | | Sanqui (Sanqui) joins |
23:58:13 | | @ChanServ sets mode: +o Sanqui |