00:10:45Earendil7 quits [Ping timeout: 272 seconds]
00:10:47Earendil7_ (Earendil7) joins
00:11:47Arcorann (Arcorann) joins
00:13:00nertzy joins
00:15:29pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat]
00:15:36<BornOn420>no, straight from the Netherlands
00:15:45pedantic-darwin joins
00:16:40<BornOn420>Unable to find image 'containrrr/watchtower:latest' locally
00:16:40<BornOn420>docker: Error response from daemon: Head "https://registry-1.docker.io/v2/containrrr/watchtower/manifests/latest": unauthorized: incorrect username or password.
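An `unauthorized` error on a public image like containrrr/watchtower usually points at stale Docker Hub credentials cached in ~/.docker/config.json, not at the image itself; `docker logout` followed by retrying the pull typically clears it. A minimal sketch for spotting cached registry credentials, assuming the default config location:

```python
import json
from pathlib import Path

# Default Docker client config location (assumption: a standard setup).
config_path = Path.home() / ".docker" / "config.json"

try:
    config = json.loads(config_path.read_text())
except FileNotFoundError:
    print("No Docker config found - pulls will be anonymous.")
else:
    # "auths" maps registry hostnames to cached (possibly stale) credentials;
    # an entry for https://index.docker.io/v1/ with bad credentials breaks
    # even public pulls until `docker logout` removes it.
    for registry in config.get("auths", {}):
        print("cached credentials for:", registry)
```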
00:26:05DogsRNice joins
00:28:51<Notrealname1234>Can we archive LEGO Life?
00:29:03<Notrealname1234>It's not so popular anyway
00:29:33<Notrealname1234>If we can get the API calls
00:34:44<nulldata>Notrealname1234 - Yeah, we'd need to figure out the API calls. Slightly harder, as I think it's app-only - no browser version.
00:35:07nothere joins
00:35:08<Notrealname1234>Yeah, it's app only.
00:35:16<Notrealname1234>Use fiddler?
00:58:36pedantic-darwin quits [Client Quit]
00:58:52pedantic-darwin joins
01:04:56Notrealname1234 quits [Client Quit]
01:08:48<nulldata>!tell Notrealname123 "Maybe. A lot of apps these days use certificate pinning, which makes it hard to MITM as they don't allow self-signed certs."
01:08:48<eggdrop>[tell] ok, I'll tell Notrealname123 when they join next
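For the Fiddler idea: any intercepting proxy works the same way, and mitmproxy makes the capture scriptable. A minimal discovery addon sketch, assuming a hypothetical api.legolife.example host (the real endpoints would have to be discovered first, and, as the !tell above notes, certificate pinning may keep the app's traffic from appearing at all):

```python
# Run with: mitmproxy -s log_api.py
from mitmproxy import http

API_HOST = "api.legolife.example"  # hypothetical - the real host is unknown


class LogApiCalls:
    def request(self, flow: http.HTTPFlow) -> None:
        # Log method + URL for matching requests so the API surface
        # can be mapped out before writing an archiving script.
        if API_HOST in flow.request.pretty_host:
            print(flow.request.method, flow.request.pretty_url)


addons = [LogApiCalls()]
```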
02:24:25<h2ibot>PaulWise edited SmolNet (+88, add some probably missed finger URLs): https://wiki.archiveteam.org/?diff=52361&oldid=52339
04:34:56<pabs>for archiving things that redirect to random URLs from a non-public list, do we have a way to archive the randomisation URL many times to extract the whole list?
04:35:12<pabs>for eg https://theforest.link/ https://theforest.link/go-for-a-walk
05:02:11<pokechu22>https://theforest.link/go-for-a-walk?1 https://theforest.link/go-for-a-walk?2 https://theforest.link/go-for-a-walk?3 https://theforest.link/go-for-a-walk?4 etc, but if it's redirecting to a ton of offsite domains that might cause cookie problems for archivebot
05:06:07<pabs>cookie problems?
05:07:12<pabs>as in too many cookies in the cookie jar, leading to slowdowns?
05:10:44<pokechu22>Yeah, due to inefficiencies in how python's cookie jar implementation works
05:11:10<pokechu22>something about it not removing entries for expired cookies from the dictionary containing cookies
05:11:27<pokechu22>it'd probably be fine for an !ao < list job for discovery but not recursion... probably
05:13:17<pabs>yeah I was just going to do front pages
05:13:35<pabs>who knows how many links are in it :)
05:14:12<pabs>hmm, wonder if that python cookie jar thing is fixed in newer python
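A sketch of the repeated-sampling approach for the redirector above: request it directly, don't follow the redirect, and collect Location headers until new ones stop appearing. The resulting list could then feed an !ao < list job as suggested above:

```python
import requests

URL = "https://theforest.link/go-for-a-walk"
targets = set()
misses = 0

# Stop after many consecutive samples that add nothing new; 200 is an
# arbitrary cutoff, not a guarantee the whole list has been extracted.
while misses < 200:
    # allow_redirects=False: we only want the Location header. Each bare
    # requests.get() also starts with an empty cookie jar, sidestepping
    # the expired-cookie buildup mentioned above.
    resp = requests.get(URL, allow_redirects=False, timeout=30)
    target = resp.headers.get("Location")
    if target and target not in targets:
        targets.add(target)
        misses = 0
    else:
        misses += 1

for t in sorted(targets):
    print(t)
```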
05:47:25Sophira joins
05:50:18<Sophira>Hi there. This is a bit of a long shot since I don't think what I'm about to ask about is actually an Archive Team project, but there's a YouTube video from 2008 that I have reason to believe would be contained within the "YouTube Video Crawldata" collections on archive.org, which have "Internet Archive" listed as the contributor. Does anyone know who I'd need to contact to get access to this?
05:50:55<Sophira>(the downloads on these items are restricted, presumably for bandwidth reasons)
05:53:02<Sophira>(I can see an email address for the operator in the data, but I don't know if that's something I should be using or not.)
05:55:50<imer>Sophira: try https://findyoutubevideo.thetechrobo.ca/
05:58:22mighty-dob (mighty-dob) joins
05:58:48<Sophira>Ah, thank you!
06:03:15<Sophira>Sadly, it doesn't appear to be available in any of the services searched by that link. It says the metadata is available in the Internet Archive but I don't believe that's actually the case.
06:03:26<Sophira>Thank you for the link though <3 I'll save that.
06:08:37<Sophira>(In the Wayback Machine, rather.)
06:30:02HP_Archivist quits [Read error: Connection reset by peer]
06:36:02DogsRNice quits [Read error: Connection reset by peer]
06:50:20BlueMaxima quits [Read error: Connection reset by peer]
07:06:11Unholy23619246453771 (Unholy2361) joins
07:52:27Earendil7_ quits [Ping timeout: 272 seconds]
07:52:35Earendil7 (Earendil7) joins
08:08:43Wohlstand (Wohlstand) joins
08:09:22aninternettroll quits [Ping timeout: 255 seconds]
08:24:03<mighty-dob>hi people. sensing the upcoming apocalypse, I started to think the same way as you. I am not really a cool hacker, but I managed to make a snapshot of the Arduino code database and was looking to make a GitHub archive; I even bought a 10TB NAS for personal archives. but now I found your community and it looks like you already did all the work
08:31:17aninternettroll (aninternettroll) joins
08:52:44Island quits [Read error: Connection reset by peer]
09:00:01Bleo1826007227196 quits [Client Quit]
09:01:21Bleo1826007227196 joins
09:04:10Wohlstand quits [Client Quit]
09:15:25Earendil7 quits [Ping timeout: 272 seconds]
09:28:43nulldata quits [Ping timeout: 272 seconds]
09:33:18<yarrow>#archivebot request: please archive https://callchelseaperetti.tumblr.com/archive if you can. Reason: proactive grab.
09:53:19shgaqnyrjp quits [Remote host closed the connection]
09:53:22shgaqnyrjp_ (shgaqnyrjp) joins
09:56:55Ryz quits [Ping timeout: 255 seconds]
09:58:45Ryz (Ryz) joins
10:22:12<pabs>mighty-dob: for code archiving, see #gitgud #codearchiver (hackint) #swh (libera)
10:22:59<pabs>https://wiki.archiveteam.org/index.php/Codearchiver https://www.softwareheritage.org/
10:23:34<pabs>mighty-dob: if you've got websites you want on archive.org, ArchiveBot can save them, list sites and reasons here
10:23:58<pabs>personal archives are also good to have too of course :)
10:24:17<pabs>also check out https://wiki.archiveteam.org/index.php/Warrior
10:24:48<pabs>the apocalypse is ongoing, websites die every day https://wiki.archiveteam.org/index.php/Deathwatch
10:28:42JaffaCakes118 (JaffaCakes118) joins
10:42:45muklumsum quits [Client Quit]
10:42:54<mighty-dob>pabs: ty I'll check it
10:48:37<mighty-dob>what do you think causes websites to close?
10:50:13<pabs>lots of reasons, usually money or people got bored or some drama
10:50:45muklumsum joins
11:02:32yarrow quits [Read error: Connection reset by peer]
11:05:26yarrow (yarrow) joins
11:35:55qwertyasdfuiopghjkl2 joins
11:42:02<qwertyasdfuiopghjkl2>https://www.connectseward.org/connect-seward-services-shutting-down/ "After 27 years of offering free e-mail and website hosting for many businesses, organizations, and individuals in Seward County, Connect Seward County will be shutting down effective June 30th, 2024."
11:46:31<mighty-dob>https://www.geeksforgeeks.org/ is the best C++ self-learning online book I've found. it isn't closing, but I've been interested in getting an offline copy
11:46:54<mighty-dob>*contains a lot of javascript
11:47:39<qwertyasdfuiopghjkl2>From https://www.google.com/search?q=%22Hosted+by+Connect+Seward+County%22 there seem to be a lot of sites that will be affected by the shutdown, but I'm guessing that search probably won't find all of them. (I don't currently have the time to look into it more)
11:48:31kiryu__ quits [Ping timeout: 255 seconds]
12:08:19mighty-dob quits [Ping timeout: 272 seconds]
13:04:00mighty-dob (mighty-dob) joins
13:16:45kiryu joins
13:16:45kiryu quits [Changing host]
13:16:45kiryu (kiryu) joins
13:17:29shgaqnyrjp_ is now known as shgaqnyrjp
13:31:55nertzy quits [Ping timeout: 272 seconds]
13:50:54nertzy joins
13:52:49Arcorann quits [Ping timeout: 272 seconds]
13:54:25Notrealname1234 (Notrealname1234) joins
13:58:44Notrealname1234 quits [Client Quit]
14:34:42nulldata (nulldata) joins
15:04:54grid joins
15:25:25<myself>mighty-dob: I'd love to learn more about your Arduino stuff, do you mean you grabbed the Arduino-as-an-organization's own repos for the Arduino-branded IDE and stuff? Or were you able to spider all the libraries and board-support packages?
15:36:01Guest54 joins
15:40:43Guest54 quits [Ping timeout: 255 seconds]
15:55:30Lord_Nightmare quits [Quit: ZNC - http://znc.in]
15:58:59Lord_Nightmare (Lord_Nightmare) joins
16:13:10JaffaCakes118 quits [Remote host closed the connection]
16:30:21JaffaCakes118 (JaffaCakes118) joins
16:31:48JaffaCakes118_2 (JaffaCakes118) joins
16:32:12ymgve quits [Quit: Leaving]
16:33:15JaffaCakes118_2 quits [Read error: Connection reset by peer]
16:34:13JaffaCakes118 quits [Remote host closed the connection]
17:12:18superkuh joins
17:14:38grid quits [Client Quit]
17:22:57<mighty-dob>myself: I wrote a bash script crawler that walked across the entire API (a public JSON file) and downloaded all libraries one-by-one. then I packed them into ~28 zip archives (A..Z), as otherwise moving 30k files across drives was impossible. later I found the openZIM project and decided it could be handy to pack the Arduino library files into a ZIM package so it's easier to work with, but I haven't done that yet
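For reference, the crawl mighty-dob describes can be compressed into a few lines, since the library manager index is a single public JSON file. A sketch assuming the library_index.json layout the Arduino IDE fetches (entries carrying a direct `url` and an `archiveFileName`; verify the field names against the live file before relying on this):

```python
import json
import string
import urllib.request
from pathlib import Path

# Public index used by the Arduino IDE's library manager (assumed layout).
INDEX_URL = "https://downloads.arduino.cc/libraries/library_index.json"

with urllib.request.urlopen(INDEX_URL) as resp:
    index = json.load(resp)

for lib in index["libraries"]:
    first = lib["name"][0].upper()
    # Bucket by first letter (everything else under "0"), mirroring the
    # A..Z zip split that kept 30k files manageable across drives.
    bucket = first if first in string.ascii_uppercase else "0"
    dest = Path("arduino-libs") / bucket
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / lib["archiveFileName"]
    if not target.exists():
        urllib.request.urlretrieve(lib["url"], target)
```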
17:24:36<myself>niiiiiice. That plus all the board support stuff would be amazing to have reliably archived offline.
17:29:16<mighty-dob>yep. the ZIM format could be great for it. I have the entire Wikipedia and a lot of other useful web resources downloaded in ZIM format on my NAS for worst-case scenarios. I can use them offline or share them with other people, as it has a web server to access it
17:31:53<that_lurker>remember to take a look at the publishing policy of openzim https://openzim.org/wiki/Content_team#Publishing
17:32:42<mighty-dob>right
17:32:46<that_lurker>If you want to make them official that is.
17:33:07<that_lurker>https://openzim.org/wiki/Build_your_ZIM_file
17:34:10<mighty-dob>perhaps I'll need Arduino permission to publish their database
17:35:51<mighty-dob>or just share it via torrent
17:40:31<that_lurker>If it's public data then at least push it to Internet Archive
17:40:39nertzy quits [Client Quit]
17:43:04<mighty-dob>I don't know much about your movement yet; I haven't figured out how you make archives and how to use them
17:43:36<mighty-dob>I am just lurking so far in free time
17:46:20coderobe quits [Quit: Killed (K-Lined)]
17:52:04BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
19:04:41@arkiver is back from vacation :)
19:05:38<imer>welcome back!
19:15:36<that_lurker>did you have a nice vacation
19:27:58mighty-dob quits [Ping timeout: 255 seconds]
20:18:49coderobe (coderobe) joins
20:31:58JayEmbee quits [Quit: WeeChat 2.3]
20:43:46<fireonlive>welcome back arkiver :3 hope you had a good time
21:10:17fuzzy8021 is now known as fuzzy80211
21:10:24fuzzy80211 is now known as group
21:10:28group is now known as fuzzy80211
21:24:04Chris5010 quits [Ping timeout: 255 seconds]
21:28:46simon8162 quits [Quit: ZNC 1.8.2 - https://znc.in]
21:30:50simon816 (simon816) joins
21:56:34JaffaCakes118 (JaffaCakes118) joins
22:02:04Chris5010 (Chris5010) joins
22:11:15Wohlstand (Wohlstand) joins
22:11:56coderobe quits [Client Quit]
23:01:55knecht4 quits [Ping timeout: 272 seconds]
23:12:41knecht4 joins
23:15:13ats quits [Ping timeout: 255 seconds]
23:17:20wyatt8750 quits [Remote host closed the connection]
23:17:51wyatt8740 joins
23:19:01@Sanqui quits [Ping timeout: 272 seconds]
23:23:27thuban quits [Ping timeout: 272 seconds]
23:24:17thuban (thuban) joins
23:36:07<@arkiver>thank you :)
23:40:19<@arkiver>https://www.wired.com/story/the-fight-against-ai-comes-to-a-foundational-data-set/
23:40:55<@arkiver>https://www.businessinsider.com/new-york-times-content-removed-common-crawl-ai-training-dataset-2023-11
23:41:04<@arkiver>> The New York Times discovered that Common Crawl, one of the largest AI training datasets, contained millions of URLs linking to its paywalled articles and other copyrighted content.
23:41:17<nicolas17>>discovered
23:41:25<nicolas17>it seems kind of obvious that CC would have NYT?
23:41:32<@arkiver>well sure
23:42:01<@arkiver>The main problem here is that web archivists behind CC are seen as data collectors for LLM training.
23:42:21<nicolas17>also doesn't CC only have links, so if you want to train your AI with it, you have to actually download them off the original source again?
23:43:42<@arkiver>They have WARCs available I believe.
23:44:13<@arkiver>example https://data.commoncrawl.org/crawl-data/CC-MAIN-2018-17/segments/1524125937193.1/warc/CC-MAIN-20180420081400-20180420101400-00000.warc.gz
23:44:24<nicolas17>hm I see
23:45:00<@arkiver>But the Common Crawl case is just one example here. Unfortunately this also affects Archive Team and our WARCs.
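The Common Crawl WARCs are standard gzipped WARC files, so full page bodies (not just links) are in there; a small sketch that streams response records from the example segment linked above, using the third-party warcio package (`pip install warcio`):

```python
import requests
from warcio.archiveiterator import ArchiveIterator

# The example crawl segment linked above.
WARC_URL = ("https://data.commoncrawl.org/crawl-data/CC-MAIN-2018-17/"
            "segments/1524125937193.1/warc/"
            "CC-MAIN-20180420081400-20180420101400-00000.warc.gz")

# Stream the gzipped WARC; ArchiveIterator decompresses it record by record,
# so the whole (large) file never has to be held in memory.
with requests.get(WARC_URL, stream=True) as resp:
    resp.raise_for_status()
    for record in ArchiveIterator(resp.raw):
        if record.rec_type == "response":
            print(record.rec_headers.get_header("WARC-Target-URI"))
```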
23:45:33BlueMaxima joins
23:47:19<katia>:\ youtube?
23:55:47Wohlstand quits [Client Quit]
23:55:54loug4 quits [Client Quit]
23:57:16<@arkiver>katia: yes, that is an example.
23:57:31yarrow quits [Read error: Connection reset by peer]
23:57:38<nicolas17>is that why youtube warcs were blocked recently?
23:58:11Sanqui joins
23:58:13Sanqui quits [Changing host]
23:58:13Sanqui (Sanqui) joins
23:58:13@ChanServ sets mode: +o Sanqui