00:01:02 | | etnguyen03 (etnguyen03) joins |
00:18:23 | | nine quits [Quit: See ya!] |
00:18:34 | | nine joins |
00:18:35 | | nine is now authenticated as nine |
00:18:35 | | nine quits [Changing host] |
00:18:35 | | nine (nine) joins |
00:36:53 | | CuppyMan joins |
00:51:14 | | etnguyen03 quits [Client Quit] |
01:02:09 | | etnguyen03 (etnguyen03) joins |
01:56:35 | | HP_Archivist quits [Quit: Leaving] |
02:09:49 | | Karlett quits [Quit: Leaving] |
02:38:27 | <@JAA> | It's now redirecting to the new forums. |
02:45:59 | | CuppyMan quits [Client Quit] |
02:53:19 | <@JAA> | DNS changed from CNAME forum-lb-2082650296.us-west-2.elb.amazonaws.com to CNAME www.cyberlink.com → CNAME d3it87pvl2tmgl.cloudfront.net, but the former is the one that's timing out, so they didn't only switch the DNS. |
02:54:23 | <@JAA> | Sad that we didn't hear about it sooner. |
02:55:44 | <@JAA> | The Corsair, DDO, and LOTRO forums are still online, by the way. |
02:59:08 | | etnguyen03 quits [Remote host closed the connection] |
03:03:28 | | hexagonwin quits [Read error: Connection reset by peer] |
03:04:00 | | hexagonwin joins |
03:07:44 | | Karlett2 quits [Quit: Leaving] |
03:09:57 | <steering> | re cyberlink: ok but who's actually buying DVD player software in 2025 :P |
03:18:14 | | LunarianBunny1147 quits [Ping timeout: 258 seconds] |
03:18:14 | | hexagonwin quits [Read error: Connection reset by peer] |
03:19:40 | | hexagonwin joins |
03:20:05 | | LunarianBunny1147 (LunarianBunny1147) joins |
03:21:10 | | hexagonwin quits [Read error: Connection reset by peer] |
03:23:11 | | hexagonwin joins |
03:23:56 | | evergreen2 joins |
03:26:34 | | evergreen quits [Ping timeout: 260 seconds] |
03:26:34 | | evergreen2 is now known as evergreen |
03:26:45 | | Island quits [Read error: Connection reset by peer] |
03:31:47 | | Karlett2 (Karlett2) joins |
03:41:37 | | Karlett2 quits [Ping timeout: 258 seconds] |
04:01:58 | | fangfufu quits [Quit: ZNC 1.9.1+deb2+b3 - https://znc.in] |
04:05:46 | | lemuria_ (lemuria) joins |
04:06:21 | | tmg1|michelson quits [Remote host closed the connection] |
04:06:45 | | fangfufu (fangfufu) joins |
04:08:50 | | lemuria quits [Ping timeout: 258 seconds] |
04:17:09 | | SootBector quits [Ping timeout: 255 seconds] |
04:19:42 | | SootBector (SootBector) joins |
04:25:29 | | cmlow quits [Ping timeout: 260 seconds] |
04:26:54 | | Karlett2 (Karlett2) joins |
04:30:34 | | Karlett2 quits [Read error: Connection reset by peer] |
04:31:39 | | SootBector quits [Remote host closed the connection] |
04:32:05 | | Karlett joins |
04:32:46 | | SootBector (SootBector) joins |
04:33:08 | | Karlett quits [Read error: Connection reset by peer] |
04:35:24 | | pabs quits [Ping timeout: 260 seconds] |
04:51:06 | | pabs (pabs) joins |
05:37:09 | | Karlett joins |
05:37:11 | | Karlett quits [Read error: Connection reset by peer] |
05:37:58 | | Karlett joins |
05:37:59 | | Karlett quits [Read error: Connection reset by peer] |
05:39:49 | | Karlett joins |
05:39:49 | | Karlett quits [Read error: Connection reset by peer] |
05:40:52 | | Karlett joins |
05:40:58 | | Karlett quits [Read error: Connection reset by peer] |
05:41:50 | | Karlett joins |
05:43:07 | | Karlett quits [Read error: Connection reset by peer] |
05:43:43 | | Karlett joins |
05:43:43 | | Karlett quits [Read error: Connection reset by peer] |
05:44:40 | | Karlett joins |
05:45:21 | | Karlett quits [Read error: Connection reset by peer] |
05:46:09 | | Karlett joins |
05:46:11 | | Karlett quits [Read error: Connection reset by peer] |
05:47:12 | | Karlett joins |
05:47:22 | | Karlett quits [Read error: Connection reset by peer] |
05:48:06 | | Karlett joins |
05:48:08 | | Karlett quits [Read error: Connection reset by peer] |
05:50:38 | | Karlett joins |
05:50:38 | | Karlett quits [Read error: Connection reset by peer] |
05:51:37 | | Karlett joins |
05:59:04 | | Karlett quits [Remote host closed the connection] |
05:59:42 | | Karlett joins |
05:59:43 | | Karlett quits [Read error: Connection reset by peer] |
06:00:46 | | Karlett joins |
06:02:34 | | Karlett quits [Remote host closed the connection] |
06:02:51 | | cyanbox joins |
06:24:41 | | Karlett2 (Karlett2) joins |
06:48:31 | | nine quits [Quit: See ya!] |
06:48:42 | | nine joins |
06:48:43 | | nine is now authenticated as nine |
06:48:43 | | nine quits [Changing host] |
06:48:43 | | nine (nine) joins |
07:17:31 | <pabs> | kiska: mannie noticed that this pad got emptied, is there any way to restore it to an earlier revision? https://pad.notkiska.pw/p/archivebot-twitter |
07:18:34 | <pabs> | version 34182 in the timeline was the last good one |
07:19:17 | <pabs> | actually make that 34175 |
07:21:09 | | mannie (nannie) joins |
07:21:34 | | nine quits [Client Quit] |
07:21:47 | | nine joins |
07:21:48 | | nine is now authenticated as nine |
07:21:48 | | nine quits [Changing host] |
07:21:48 | | nine (nine) joins |
07:22:16 | <mannie> | kiska: the twitter etherpad is empty. pabs looked at it and the latest good one is version 34175. Can you take a look at it? |
07:22:42 | | pabs already mentioned it :) |
07:27:49 | | mannie quits [Client Quit] |
07:30:12 | | SootBector quits [Remote host closed the connection] |
07:31:30 | | SootBector (SootBector) joins |
07:39:01 | | SootBector quits [Remote host closed the connection] |
07:40:09 | | SootBector (SootBector) joins |
08:00:49 | | lennier2 joins |
08:03:39 | | lennier2_ quits [Ping timeout: 260 seconds] |
08:03:55 | | Karlett joins |
08:10:15 | | SootBector quits [Remote host closed the connection] |
08:11:22 | | SootBector (SootBector) joins |
08:24:31 | | Karlett2 quits [Ping timeout: 258 seconds] |
08:25:34 | | Karlett quits [Read error: Connection reset by peer] |
08:26:59 | | JTL quits [Ping timeout: 260 seconds] |
08:31:01 | | JTL (JTL) joins |
08:36:46 | | Karlett2 (Karlett2) joins |
08:40:46 | | Karlett joins |
08:41:24 | | Karlett quits [Client Quit] |
08:41:41 | | Karlett joins |
08:46:33 | | Dada joins |
08:57:47 | | Karlett is now authenticated as Karlett |
09:07:33 | | APOLLO03 quits [Quit: .] |
09:08:37 | | APOLLO03 joins |
09:28:13 | | Karlett2 quits [Remote host closed the connection] |
09:28:37 | | Karlett2 (Karlett2) joins |
09:32:50 | <h2ibot> | Hans5958 edited Main Page/Current Projects (+21, Move Peing to medium): https://wiki.archiveteam.org/?diff=57196&oldid=57184 |
09:37:04 | | Webuser094669 joins |
09:37:42 | | Webuser094669 quits [Client Quit] |
09:39:06 | | hexagonwin quits [Read error: Connection reset by peer] |
09:41:03 | | hexagonwin joins |
09:47:24 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
09:56:35 | | a-dude joins |
09:57:50 | | a-dude quits [Remote host closed the connection] |
09:59:29 | | monoxane (monoxane) joins |
09:59:45 | | hexagonwin quits [Read error: Connection reset by peer] |
10:01:45 | | hexagonwin joins |
10:01:49 | | nulldata-alt3 (nulldata) joins |
10:03:49 | | nulldata-alt quits [Ping timeout: 260 seconds] |
10:03:49 | | nulldata-alt3 is now known as nulldata-alt |
10:09:33 | | TheEnbyperor quits [Ping timeout: 258 seconds] |
10:09:39 | | TheEnbyperor_ quits [Ping timeout: 260 seconds] |
10:37:51 | | PredatorIWD25 joins |
10:46:21 | | ericgallager quits [Quit: This computer has gone to sleep] |
10:46:31 | | Webuser923772 joins |
10:54:36 | | TheEnbyperor joins |
10:54:40 | | TheEnbyperor_ (TheEnbyperor) joins |
10:55:48 | <Webuser923772> | Hi guys, how much were you able to archive of the cyberlink forum? I saw that it went offline this morning |
11:00:02 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:43 | | Bleo182600722719623455222 joins |
11:08:02 | <@imer> | Webuser923772: "JAA: I got about half of all threads with qwarc, without attachments.", the archivebot job also saved some pages, likely nowhere near complete though :( |
11:13:02 | <Webuser923772> | I have all the PowerDVD (older versions) threads, both english and german, with all attachments. Unfortunately i didn't know about this community till it was too late for full archival |
11:17:19 | | hexagonwin quits [Read error: Connection reset by peer] |
11:17:52 | <Webuser923772> | Forum-Index » PowerDVD (ältere Versionen) -> 1976 german threads with attachments |
11:17:52 | <Webuser923772> | Forum Index » PowerDVD (previous versions) -> 7490 english threads with attachments |
11:18:18 | <Webuser923772> | If needed i can send this batch that i archived, there are probably some other pages that the spider got too |
11:19:39 | | hexagonwin joins |
11:20:32 | <@imer> | Webuser923772: can you upload it to archive.org? |
11:29:26 | | Webuser923772 quits [Client Quit] |
12:02:40 | | SootBector quits [Remote host closed the connection] |
12:03:52 | | SootBector (SootBector) joins |
12:04:28 | | SootBector quits [Remote host closed the connection] |
12:06:45 | | SootBector (SootBector) joins |
12:15:17 | | SootBector quits [Remote host closed the connection] |
12:16:46 | | SootBector (SootBector) joins |
12:22:50 | | ericgallager joins |
12:29:46 | | Commander001 quits [Remote host closed the connection] |
12:40:10 | | Commander001 joins |
12:45:21 | | gosc joins |
13:01:28 | | redbees quits [Quit: ZNC 1.7.5+deb4 - https://znc.in] |
13:38:13 | | Shard (Shard) joins |
14:25:14 | | zhongfu quits [Ping timeout: 258 seconds] |
14:46:11 | | dabs joins |
14:51:16 | <kiska> | pabs: Rev has been restored |
14:51:53 | <pabs> | kiska++ |
14:51:53 | <eggdrop> | [karma] 'kiska' now has 13 karma! |
14:52:46 | <kiska> | Go through it and actually make sure it restored to that version, cause the api for that is a little... unstable :D |
15:11:44 | <pabs> | looks like it |
15:12:00 | | SootBector quits [Remote host closed the connection] |
15:13:17 | | SootBector (SootBector) joins |
15:22:50 | | dabs quits [Client Quit] |
15:28:24 | | cyanbox quits [Read error: Connection reset by peer] |
15:35:53 | <h2ibot> | Manu edited Mailing Lists (+29, Mailman 3: Add lists.das-labor.org): https://wiki.archiveteam.org/?diff=57197&oldid=57008 |
15:36:30 | | Island joins |
15:39:54 | | Wohlstand (Wohlstand) joins |
15:44:28 | | fionera quits [Remote host closed the connection] |
16:24:27 | | ducky quits [Ping timeout: 258 seconds] |
16:29:09 | | skyrock3t joins |
16:30:34 | | skyrocket quits [Ping timeout: 260 seconds] |
16:36:20 | | Commander001 quits [Ping timeout: 258 seconds] |
16:37:10 | | Commander001 joins |
16:45:09 | | tzt quits [Ping timeout: 260 seconds] |
16:54:48 | <justauser|m> | https://fastcode.io/2025/08/30/the-69-billion-domino-effect-how-vmwares-debt-fueled-acquisition-is-killing-open-source-one-repository-at-a-time/ unsure about the potential for data loss. Docker images will no longer be supported, but they promise not to kill them. |
16:57:10 | | tzt (tzt) joins |
17:01:17 | | Shard quits [Quit: Im doing something rq. Il brb] |
17:32:13 | | SootBector quits [Read error: Connection reset by peer] |
17:33:23 | | SootBector (SootBector) joins |
17:40:07 | | b3nzo joins |
17:42:33 | | kansei quits [Quit: ZNC 1.10.1 - https://znc.in] |
17:47:37 | <@OrIdow6> | b3nzo: It's fairly informal, work on what you want mostly (with the caveat that 'what you want' is often difficult to pull off for technical/other practical reasons) |
17:47:51 | <@OrIdow6> | Lots of people just run machines that do grabbing |
17:50:24 | <b3nzo> | i emailed jason regarding a project im working on, and after a bit of discussion he suggested joining archiveteam and having a look at archiveteam.org for the info, but i couldnt find it |
17:50:47 | <justauser|m> | https://wiki.archiveteam.org/ should work. |
17:51:12 | <justauser|m> | What kind of info? |
17:53:39 | <b3nzo> | im working on a personal web archival project to crawl and archive webpages and then upload them to the wayback |
17:54:10 | <b3nzo> | but IA declined to index my warc files in the wayback |
17:54:40 | <justauser|m> | Are there any special requirements for crawling? Would ArchiveBot work? |
17:56:06 | <b3nzo> | so reached out to jason on how they can be indexed on the wayback, and he said that only authorized "chain of custody" sources are indexed on the wayback, and he said to join archiveteam so i could get warc files from my project indexed |
17:58:23 | <b3nzo> | no, just regular crawling and prioritizing the sites that blocked IA's crawler like reddit |
17:58:51 | <b3nzo> | i just built a pipeline using grab-site |
17:59:42 | <b3nzo> | so ig its quite similar to how ArchiveBot works(dont know much abt ArchiveBot) |
18:12:06 | | gosc quits [Quit: Leaving] |
18:18:48 | <@JAA> | If you support the cause, you're part of AT, more or less. Like others have mentioned, there's no formal membership. However, getting WARCs into the WBM isn't as open (and can't be). |
18:25:01 | | Shard (Shard) joins |
18:28:16 | <@OrIdow6> | b3nzo: Reddit and the like might be problematic, I don't know what our status on that is; but if you have a short list of URLs we might be able to run it on what exists now |
18:32:04 | <@arkiver> | yeah limited numbers of URLs could probably be archived |
18:32:23 | <@arkiver> | maybe via ArchiveBot - i believe there is not a very strong/strict scope at the moment for ArchiveBot |
18:34:49 | <b3nzo> | JAA: yea that was the reason i reached out to Jason, but not sure what he meant by joining AT, ig i have to wait for his reply |
18:35:42 | | ducky (ducky) joins |
18:37:09 | <b3nzo> | OrIdow6: yea, they can be problematic for displaying on the WBM, but its up to the IA team what to display and what not to, they'll just store my WARCs of all domains |
18:38:04 | <@JAA> | Yeah, uploading WARCs is always fine and still useful even if they're not added to the WBM index. |
18:40:40 | <@OrIdow6> | b3nzo: I mean, I don't know how much capacity we have right now to capture Reddit even if you do give us a list of Reddit URLs you want captured |
18:42:36 | <b3nzo> | OrIdow6: i dont have a list of URLs yet, im still working on the pipeline (should be done with it by this week), and then the plan is to launch a chrome extension to capture users' URLs (without IP logging or any cookies), blocking domains like gmail, discord, outlook, etc. and then implement specific scraping rules for specific sites for better url collection (especially for dynamically loaded pages) |
18:44:57 | <@OrIdow6> | b3nzo: How are you generating WARCs from a Chrome extension? We've looked at that before but it's seemed like its API doesn't give a good way to completely accurately capture what goes over the wire (by aggressively normalizing headers, off the top of my head, among other things) |
18:45:34 | <b3nzo> | OrIdow6: what do you mean by "I don't know how much capacity we have", do you mean the warrior? |
18:46:42 | <@OrIdow6> | b3nzo: Warrior or any other system. Mostly availability of clean IP addresses but also whatever resources may be needed to capture Reddit these days |
18:46:47 | <b3nzo> | OrIdow6: nah, im not generating WARCs through extension, the extension just collects URLs from the users activity |
18:47:51 | | xkey quits [Quit: WeeChat 4.7.0] |
18:49:42 | | lemuria_ is now known as lemuria |
18:51:37 | | xkey (xkey) joins |
18:52:29 | <b3nzo> | would there be any legal issues if i publish the warc files on the project site? |
18:53:14 | <b3nzo> | or are corporations more aggressive towards sites which display the archives? |
18:54:28 | | Webuser098077 joins |
18:59:19 | | ducky quits [Ping timeout: 258 seconds] |
18:59:19 | <@arkiver> | do you mean archive.org with "project site"? |
18:59:33 | <@arkiver> | you can upload them to there, if there's a problem with them they may be taken down though |
18:59:51 | <@arkiver> | and they will not land in the Wayback Machine but in a collection with WARCs uploaded by various accounts on IA |
19:03:14 | <b3nzo> | i meant uploading on IA and on my project's site, so basically 2 locations |
19:06:56 | <@arkiver> | regarding legal issues on your own site, you would have to talk with a lawyer about that |
19:08:18 | <b3nzo> | yea, if i upload warcs they end up at archive.org/details/warczone |
19:13:30 | <@arkiver> | yep |
19:13:43 | <@arkiver> | so, feel free to do that! |
19:14:02 | <@arkiver> | maybe not tens of TBs, but sounds like the plan is not too broad |
19:20:44 | | Wohlstand quits [Quit: Wohlstand] |
19:20:59 | | Wohlstand (Wohlstand) joins |
19:31:14 | <b3nzo> | yea, will just upload archives for the meantime until i can get them indexed |
19:32:48 | <b3nzo> | why does IA lock all the indexed archives? even if some of them arent equipped with antibot/yt-dlp |
19:34:40 | <@arkiver> | i can't speak for the crawls by IA themselves, but we have had to restrict direct access to our (Archive Team) WARCs due to scraping for LLM training and due to containing data blocked for access through the Wayback Machine |
19:35:02 | <@arkiver> | many are somewhat positive towards web archiving, but not towards mass scale AI training data collection |
19:35:47 | <@arkiver> | making the WARCs fully available would effectively mean getting archived by us is the same as giving all their public data to AI companies for training |
19:35:53 | <@arkiver> | (yay for LLMs :/ ) |
19:36:37 | <@JAA> | We should put this in an FAQ entry. |
19:36:42 | <@arkiver> | yeah |
19:36:52 | <@arkiver> | (see also recent news articles about IA and Reddit problems) |
19:40:11 | | @arkiver is afk for a while |
19:42:48 | | ducky (ducky) joins |
19:50:49 | <b3nzo> | the AI data problem is just getting bigger, and the IA is getting a lot of hate even though they dont support it |
19:51:50 | <Guest> | i thought ai companies already got all the human generated data 😂 |
19:52:40 | <b3nzo> | they certainly scraped almost all the data |
19:52:58 | <b3nzo> | but they would want more and updated data |
19:56:06 | <Guest> | anthropic got a slap on the wrist for scanning millions of books they purchased to train ai and the courts sided with meta after they torrented over 80tb of media for ai training |
19:56:56 | <Guest> | insane times we live in. most people have probably never seen 80tb worth of content. |
20:02:18 | | Webuser098077 quits [Client Quit] |
20:02:35 | <h2ibot> | Anonymoususer852 edited Frequently Asked Questions (+821, /* We Are Not The Internet Archive */ Added…): https://wiki.archiveteam.org/?diff=57198&oldid=56265 |
20:07:01 | <masterx244|m> | arkiver: sucks for future cases like the imgone warc-eating though. Not possible for most members here anymore to crunch the data to extract stuff |
20:07:35 | <h2ibot> | Anonymoususer852 edited Frequently Asked Questions (+180, "Why can't I download the WARCs for some…): https://wiki.archiveteam.org/?diff=57199&oldid=57198 |
20:10:41 | <b3nzo> | at the end of the day, none of the ai companies will pay a penny towards the charges |
20:11:44 | <masterx244|m> | yeah, fuckturds at the finest level |
21:03:09 | | Wohlstand quits [Client Quit] |
21:13:13 | | SootBector quits [Remote host closed the connection] |
21:14:28 | | SootBector (SootBector) joins |
21:32:05 | | etnguyen03 (etnguyen03) joins |
21:42:14 | | SootBector quits [Remote host closed the connection] |
21:43:22 | | SootBector (SootBector) joins |
21:45:26 | | etnguyen03 quits [Client Quit] |
21:46:50 | | b3nzo quits [Ping timeout: 258 seconds] |
22:05:10 | | dabs joins |
22:08:19 | | beardicus9 quits [Ping timeout: 260 seconds] |
22:08:21 | | SootBect1 (SootBector) joins |
22:08:36 | | SootBector quits [Ping timeout: 255 seconds] |
22:20:11 | | Dada quits [Remote host closed the connection] |
22:39:32 | | etnguyen03 (etnguyen03) joins |
22:40:24 | | Doomaholic quits [Ping timeout: 260 seconds] |
22:46:25 | | Doomaholic (Doomaholic) joins |
23:11:13 | | Wohlstand (Wohlstand) joins |
23:13:25 | | etnguyen03 quits [Client Quit] |
23:18:43 | | CuppyMan joins |
23:19:33 | | nstrom joins |
23:54:31 | | etnguyen03 (etnguyen03) joins |