00:01:13 | | etnguyen03 (etnguyen03) joins |
00:02:02 | | ummmSokar quits [Ping timeout: 258 seconds] |
00:18:08 | | archiveDrill quits [Ping timeout: 258 seconds] |
00:18:50 | | notSokar quits [Client Quit] |
00:18:58 | | Sokar joins |
00:56:08 | | archiveDrill joins |
01:42:56 | | cyanbox joins |
01:50:29 | | twiswist quits [Read error: Connection reset by peer] |
01:51:10 | | twiswist (twiswist) joins |
02:01:34 | | etnguyen03 quits [Client Quit] |
02:02:02 | <Vokun> | Naturally, more than a few people here are in IT, and some probably make a lot of money as well, and others just invest more than they should, whether it is a lot of money to them or not. |
02:06:00 | <nicolas17> | and others are in IT and have access to machines or IP ranges that work is paying for |
02:08:48 | | Island_ quits [Read error: Connection reset by peer] |
02:09:14 | | f_ quits [Ping timeout: 260 seconds] |
02:10:51 | | f_ (funderscore) joins |
02:14:29 | | etnguyen03 (etnguyen03) joins |
02:18:34 | | f_ quits [Ping timeout: 260 seconds] |
02:18:35 | | etnguyen03 quits [Remote host closed the connection] |
02:19:20 | | f_ (funderscore) joins |
03:33:13 | | Hackerpcs quits [Quit: Hackerpcs] |
03:36:13 | <h2ibot> | Cooljeanius edited Goo Blog (+16, use URL template): https://wiki.archiveteam.org/?diff=57354&oldid=57332 |
03:44:07 | | Hackerpcs (Hackerpcs) joins |
04:46:38 | <twiswist> | Does anyone know how to make (normal) wget append to rejected-log or use a different file every time without external help such as interpolating a random filename into the command? Any time I forget to switch out that target file when I run it more than once, it annihilates data that I care a lot about |
04:49:32 | | devkev0 quits [Ping timeout: 258 seconds] |
04:50:18 | | devkev0 joins |
05:14:51 | | archiveDrill7 joins |
05:15:48 | | Wohlstand quits [Quit: Wohlstand] |
05:17:08 | | archiveDrill quits [Ping timeout: 258 seconds] |
05:17:08 | | archiveDrill7 is now known as archiveDrill |
05:35:12 | | archiveDrill quits [Client Quit] |
05:49:09 | | f_ quits [Ping timeout: 260 seconds] |
05:54:13 | | f_ (funderscore) joins |
06:25:20 | | nepeat_ quits [Quit: ZNC - https://znc.in] |
06:28:46 | | nepeat (nepeat) joins |
06:29:20 | | archiveDrill joins |
06:37:34 | | oxtyped joins |
07:30:46 | | Webuser187009 joins |
07:43:28 | <masterx244|m> | since we had it about wplace recently: someone else created a few full snapshots of it and also dumped the raw data here: https://github.com/samuelscheit/wplace-archive/releases |
07:43:28 | <masterx244|m> | (he also has a viewer for that but that's derived data from those datadumps) |
08:02:21 | | HP_Archivist (HP_Archivist) joins |
08:05:04 | | AlsoHP_Archivist quits [Ping timeout: 260 seconds] |
08:27:51 | | medecau (medecau) joins |
08:32:10 | | medecau quits [Client Quit] |
08:41:38 | | flotwig quits [Read error: Connection reset by peer] |
08:42:49 | | flotwig joins |
08:48:37 | | Dada joins |
09:02:03 | | Webuser187009 quits [Client Quit] |
09:14:28 | | Naruyoko5 joins |
09:15:28 | | AlsoHP_Archivist joins |
09:18:15 | | Naruyoko quits [Ping timeout: 258 seconds] |
09:19:44 | | HP_Archivist quits [Ping timeout: 260 seconds] |
09:20:56 | | AlsoHP_Archivist quits [Ping timeout: 258 seconds] |
09:31:43 | | Naruyoko joins |
09:34:44 | | Naruyoko5 quits [Ping timeout: 258 seconds] |
09:36:57 | | nine quits [Quit: See ya!] |
09:37:10 | | nine joins |
09:37:11 | | nine is now authenticated as nine |
09:37:11 | | nine quits [Changing host] |
09:37:11 | | nine (nine) joins |
10:19:10 | | caylin quits [Read error: Connection reset by peer] |
10:19:16 | | caylin9 (caylin) joins |
10:34:43 | <hexagonwin|m> | I just realized that TISTORY (korean weblog service like blogger that I'm trying to archive) is changing its "inactive account" policy (from removing blogs after 5 years of no login to 3 years) very soon, effective Sep 22. I believe it should be archived asap. ( https://notice.tistory.com/2693 , please use translator) |
10:34:48 | <hexagonwin|m> | I've tried to do this back in July but got busy (and unexpectedly had to save androidfilehost first). I have the full list of valid blogs as of early August( https://p.z80.kr/tistory_blogs.txt ), so what's left is writing a crawler based on the document I've written back then( https://p.z80.kr/tistory_archiveteam.html ) and running it. It's not too complex, the blog post content is shown without JS and comment is loaded via xhr json |
10:34:48 | <hexagonwin|m> | request, and pictures in post require some URL modification to get full resolution instead of lowres thumbnail. |
10:34:54 | <hexagonwin|m> | Could someone please help with writing the crawler? I don't know Lua and have tried to study/understand archiveteam's scripts with not much success. There isn't much time left so if it isn't impossible we should at least download all the blogs in that list using something basic like archivebot so that at least the blog post text and lowres images get saved.. (excluding comments, etc) |
10:49:44 | <@arkiver> | hexagonwin|m: i'll look into it |
10:50:13 | <@arkiver> | was this brought up in july? |
10:50:30 | <hexagonwin|m> | yes i've talked about this on this chatroom |
10:50:44 | <@arkiver> | i missed it then |
10:51:48 | | PredatorIWD259 joins |
10:51:48 | <hexagonwin|m> | thanks a lot for looking into this, please let me know if there's anything i can help with. i have multiple internet connections in korea which will probably be faster than foreign connections. so hopefully the crawling process should be ok right after the crawler is ready |
10:52:19 | <@arkiver> | are custom domains allowed |
10:52:20 | <@arkiver> | ? |
10:52:30 | <@arkiver> | hexagonwin|m: we should be fine yeah! |
10:52:39 | <@arkiver> | i'll have something running well before the deadline |
10:53:11 | <hexagonwin|m> | arkiver: yes custom domain is allowed, but the *.tistory.com domain also works and it doesn't redirect to the custom domain |
10:53:34 | <hexagonwin|m> | we should have most if not all the active blogs by just getting everything in the list i shared above |
10:53:57 | | petrichor quits [Quit: ZNC 1.10.1 - https://znc.in] |
10:54:18 | <@arkiver> | is there a link from the *.tistory.com domain to the custom domain if a custom domain exists? |
10:55:02 | | petrichor (petrichor) joins |
10:55:17 | | petrichor quits [Client Quit] |
10:55:24 | | PredatorIWD25 quits [Ping timeout: 260 seconds] |
10:55:24 | | PredatorIWD259 is now known as PredatorIWD25 |
10:56:09 | | petrichor (petrichor) joins |
10:56:16 | <hexagonwin|m> | arkiver: i wasn't sure so i just checked. the blog at https://cdmanii.com/ or https://cdmanii.tistory.com/ is a good example. it seems like it's shown on the window.T.config part as DEFAULT_URL, i'm not sure if it's correct for all cases though. |
10:57:18 | <hexagonwin|m> | (when archiving we should get /m/ for all blogs since the / (desktop) version custom skin is very customizable and might be even broken like https://skyvegatest.tistory.com/ , i think i have this in the document) |
10:58:27 | <@arkiver> | hexagonwin|m: thanks! |
10:58:34 | <@arkiver> | do you have the list of blogs that have already been discovered? |
10:58:48 | | petrichor quits [Client Quit] |
10:58:57 | <masterx244|m> | best to capture both (assuming that the images use the same URLs) since links usually go to desktop version and not every user knows the trick to replace the URL when waybacking. And for the window.T.config: best to find out where that JS object is populated (which request is delivering that data or where in the initial HTML it is buried) |
10:58:58 | <hexagonwin|m> | arkiver: the link i sent above is the list of blogs that have been already discovered |
10:59:40 | <@arkiver> | right, thank you. i was checking the html with your description |
10:59:41 | <hexagonwin|m> | (the link i sent above is the result after running this bash script that checks if a blog address is valid, as in here https://p.z80.kr/tistory_archiveteam.html#org99d7e16 ) |
11:00:02 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
11:00:14 | <@arkiver> | feel free to also just include everything that may not exist (but does have a source proving it may have existed at one point) |
11:00:21 | <@arkiver> | the project would handle stuff that doesn't exist |
11:01:00 | <hexagonwin|m> | masterx244: yeah ideally it would be great to save all mobile page, desktop page, thumbnail image(as in html) and original high res image but would the time be enough for this..? the thumbnail image and desktop page would be "easier" for wayback machine surfing probably but it wouldn't really be ideal for archival purpose i guess.. |
11:01:23 | <h2ibot> | Hans5958 edited Goo Blog (-1, Use #itsgoone for Goo): https://wiki.archiveteam.org/?diff=57355&oldid=57354 |
11:01:27 | <masterx244|m> | arkiver knows the AT tooling well enough to quickly whip something up |
11:01:32 | <hexagonwin|m> | the desktop page can mostly be recreated from the "skin" that can be saved and the mobile page, so no info will be lost that way |
11:01:41 | <hexagonwin|m> | i see, hope it works well :) |
11:01:59 | <masterx244|m> | and if it's warrior'd, deriving mobile URLs from the desktop URL is an easy thing when itemized based on post |
11:02:05 | <@arkiver> | we would always try to get it from all sources, both mobile and desktop, and images from all servers (with writing revisit records to prevent duplicate data in the WARCs) |
11:02:18 | <@arkiver> | the main question is usually if the site can handle it. |
11:02:24 | <h2ibot> | Hans5958 edited Main Page/Current Projects (-1, Use #itsgoone for Goo): https://wiki.archiveteam.org/?diff=57356&oldid=57333 |
11:02:48 | | Bleo182600722719623455222 joins |
11:03:17 | <hexagonwin|m> | i'm not sure if it's the case with tistory but when attempting to archive another service run by the same corporation (daum agora, marked as saved by archiveteam? https://wiki.archiveteam.org/index.php/Daum_Agora ) my friend faced a very aggressive ip ban from them |
11:03:29 | <hexagonwin|m> | this was 2018 or something so maybe not relevant now |
11:04:10 | <masterx244|m> | DPoS is much harder to squishnicate than a single IP crawling |
11:04:24 | <h2ibot> | Hans5958 edited Main Page/Current Warrior Project (-8, Back to Telegram): https://wiki.archiveteam.org/?diff=57357&oldid=57329 |
11:07:23 | | petrichor (petrichor) joins |
11:08:47 | <@arkiver> | hexagonwin|m: for possible other projects, if the "issue" of data going away does not seem to be picked up here, feel free to press stronger on the issue and it'll more likely get picked up |
11:12:18 | <hexagonwin|m> | thanks for the tip :) was kinda nervous it might sound irritating/disturbing |
11:14:17 | <@arkiver> | not at all! |
11:14:31 | <@arkiver> | is there anything else that has a deadline coming up? |
11:15:52 | <hexagonwin|m> | arkiver: i'm not aware of something like that. although it isn't warc and might not be ideal, androidfilehost.com is mostly saved by me |
11:16:47 | | Dada quits [Remote host closed the connection] |
11:17:08 | <hexagonwin|m> | (have some personal stuff going on so can't work on it now, but now i only need to verify the files i downloaded and find a way to upload, will take 2+weeks..) |
11:20:26 | <h2ibot> | Hans5958 edited Frequently Asked Questions (+321, Create headings and some fixes): https://wiki.archiveteam.org/?diff=57358&oldid=57257 |
11:21:26 | <h2ibot> | Hans5958 edited Frequently Asked Questions (-2, Fix wrong bold): https://wiki.archiveteam.org/?diff=57359&oldid=57358 |
11:26:27 | <h2ibot> | Hans5958 edited Template:IA id (+237, Add private information): https://wiki.archiveteam.org/?diff=57360&oldid=54709 |
11:28:27 | <h2ibot> | Hans5958 edited Oshiete! Goo (+6, Use private parameter to indicate private data): https://wiki.archiveteam.org/?diff=57361&oldid=57344 |
11:32:28 | <h2ibot> | Hans5958 edited Template:IA id (+26, Fix syntax and wording): https://wiki.archiveteam.org/?diff=57362&oldid=57360 |
11:34:28 | <h2ibot> | Hans5958 edited Template:IA id (-12, Whoops): https://wiki.archiveteam.org/?diff=57363&oldid=57362 |
11:34:29 | <h2ibot> | Hans5958 edited YouTube (+18, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57364&oldid=57157 |
11:34:30 | <h2ibot> | Hans5958 edited GitHub (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57365&oldid=57193 |
11:34:31 | <h2ibot> | Hans5958 edited Google+ (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57366&oldid=57032 |
11:34:32 | <h2ibot> | Hans5958 edited GeoCities (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57367&oldid=57038 |
11:34:33 | <h2ibot> | Hans5958 edited Reddit (+12, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57368&oldid=57118 |
11:34:34 | <h2ibot> | Hans5958 edited Glitch (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57369&oldid=57345 |
11:34:35 | <h2ibot> | Hans5958 edited Blogger (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57370&oldid=57040 |
11:35:28 | <h2ibot> | Hans5958 edited Google Video (Archive) (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57371&oldid=57026 |
11:35:29 | <h2ibot> | Hans5958 edited Itch.io (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57372&oldid=57342 |
11:35:30 | <h2ibot> | Hans5958 edited Yahoo! Video (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57373&oldid=57027 |
11:35:31 | <h2ibot> | Hans5958 edited Telegram (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57374&oldid=57037 |
11:35:32 | <h2ibot> | Hans5958 edited Retrospring (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57375&oldid=57207 |
11:35:33 | <h2ibot> | Hans5958 edited Microsoft Update (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57376&oldid=57178 |
11:35:34 | <h2ibot> | Hans5958 edited FC2 (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57377&oldid=57315 |
11:35:35 | <h2ibot> | Hans5958 edited Goo.gl (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57378&oldid=57142 |
11:35:36 | <h2ibot> | Hans5958 edited Typepad (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57379&oldid=57343 |
11:35:37 | <h2ibot> | Hans5958 edited URLs (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57380&oldid=57036 |
11:35:38 | <h2ibot> | Hans5958 edited Rumble (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57381&oldid=57035 |
11:35:39 | <h2ibot> | Hans5958 edited FC2WEB (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57382&oldid=57033 |
11:35:40 | <h2ibot> | Hans5958 edited Meta Ad Library (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57383&oldid=57029 |
11:35:41 | <h2ibot> | Hans5958 edited US Government (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57384&oldid=57030 |
11:35:42 | <h2ibot> | Hans5958 edited Ftp-gov (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57385&oldid=57108 |
11:35:43 | <h2ibot> | Hans5958 edited Polar Operational Environmental Satellites (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57386&oldid=57147 |
11:35:44 | <h2ibot> | Hans5958 edited Ge.tt (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57387&oldid=57346 |
11:35:45 | <h2ibot> | Hans5958 edited Posts.cv (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57388&oldid=57148 |
11:35:48 | <emanuele6> | :o |
11:39:45 | <Hans5958> | Sorry guys |
11:40:29 | <h2ibot> | Hans5958 edited Frequently Asked Questions (-69, Fix wrong information regarding WARC access): https://wiki.archiveteam.org/?diff=57389&oldid=57359 |
11:40:30 | <h2ibot> | Hans5958 edited Frequently Asked Questions (+1, Fix anchor): https://wiki.archiveteam.org/?diff=57390&oldid=57389 |
11:41:29 | <h2ibot> | Hans5958 edited Frequently Asked Questions (-117, Remove mention of…): https://wiki.archiveteam.org/?diff=57391&oldid=57390 |
11:43:58 | | tertu quits [Quit: so long...] |
11:44:17 | | tertu (tertu) joins |
11:44:22 | <Hans5958> | I hope someone updates https://twitter.com/at_warrior |
11:44:22 | <eggdrop> | nitter: https://nitter.net/at_warrior |
11:46:30 | <h2ibot> | Cooljeanius edited Oshiete! Goo (-2, Use URL template): https://wiki.archiveteam.org/?diff=57392&oldid=57361 |
11:48:30 | <h2ibot> | Cooljeanius edited Goo (+8, Use URL template): https://wiki.archiveteam.org/?diff=57393&oldid=57353 |
11:48:31 | <h2ibot> | Cooljeanius edited Goo (+2, derp): https://wiki.archiveteam.org/?diff=57394&oldid=57393 |
12:13:34 | | cm quits [Ping timeout: 260 seconds] |
12:16:51 | | cm joins |
12:25:36 | | TheEnbyperor quits [Remote host closed the connection] |
12:25:36 | | TheEnbyperor_ quits [Remote host closed the connection] |
12:35:24 | | TheEnbyperor joins |
12:37:10 | | TheEnbyperor_ (TheEnbyperor) joins |
12:38:31 | | HackMii (hacktheplanet) joins |
12:39:44 | <HackMii> | LLM scrapers seem to be mimicking the archivebot. I see useragent "Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot) Zeno/a07610d warc/v0.8.85" - with version number changes occasionally. |
12:42:06 | <c3manu> | HackMii: sorry, i didn't see this earlier. 'archive.org_bot' is not the Archive Team's bot |
12:42:44 | <c3manu> | i actually don't know what the Internet Archive's crawlers use. |
12:42:57 | <c3manu> | HackMii: what makes you think it's not a legit crawl? |
12:43:54 | | Ointment8862 quits [Ping timeout: 260 seconds] |
12:45:15 | | Commander001 quits [Ping timeout: 258 seconds] |
12:45:33 | | Ointment8862 (Ointment8862) joins |
12:46:10 | <HackMii> | c3manu: Since the URL 404s, it matches the pattern of a slow LLM scraper (lots of random IPs, slowly scraping pages) and attempts to scrape continue even if I 403 the useragent. |
12:46:19 | <HackMii> | *files |
12:48:09 | | SootBector quits [Remote host closed the connection] |
12:49:00 | <HackMii> | Well, I guess that's that, as it's not the archivebot and the 403 is correct. |
12:49:25 | | SootBector (SootBector) joins |
12:50:42 | | SootBector quits [Remote host closed the connection] |
12:51:50 | | SootBector (SootBector) joins |
13:02:34 | | petrichor quits [Ping timeout: 260 seconds] |
13:03:57 | | petrichor (petrichor) joins |
13:17:50 | | Ointment8862 quits [Read error: Connection reset by peer] |
13:22:02 | | Ointment8862 (Ointment8862) joins |
13:26:58 | <cruller> | NHK has issued a new announcement regarding its renewal: https://www.nhk.or.jp/nhkone/release/assets/pdf/250917_003.pdf |
13:28:19 | <cruller> | NHK will migrate nearly all pages to https://www.web.nhk and introduce a "「ご利用にあたって」画面" ("Before using this service" screen) there. IMO, this is not a paywall, but rather something like a ToS agreement screen. If so, full archiving is not necessary at this time. |
13:30:19 | <cruller> | However, some content will be deleted. Could someone please crawl it with ArchiveBot? Here's the list of seeds: https://transfer.archivete.am/lyM3Y/List%20of%20NHK%20websites%20to%20be%20closed%20this%20month.txt |
13:30:20 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/lyM3Y/List%20of%20NHK%20websites%20to%20be%20closed%20this%20month.txt |
13:34:33 | | Commander001 joins |
13:34:39 | | Ointment8862 quits [Ping timeout: 260 seconds] |
13:38:09 | | programmerq quits [Ping timeout: 260 seconds] |
13:40:28 | | Ointment8862 (Ointment8862) joins |
13:45:26 | | Ointment8862 quits [Ping timeout: 258 seconds] |
13:51:53 | | dabs joins |
14:19:10 | | dabs quits [Ping timeout: 258 seconds] |
14:43:01 | <h2ibot> | Hans5958 edited 教えて! goo (-1, Redirected page to [[Oshiete! Goo]]): https://wiki.archiveteam.org/?diff=57395&oldid=57305 |
14:45:31 | <hexagonwin|m> | arkiver: while lurking around google/daum search i've found some blogs that are missing in the list i sent above :/ would it be possible to add new blogs to the queue later on? i'm thinking of scraping search engines with random keywords a bit more. |
14:52:53 | | Island joins |
14:55:03 | <h2ibot> | Hans5958 edited Warrior projects (+28350, Update Warrior projects as of today. Needs…): https://wiki.archiveteam.org/?diff=57396&oldid=47677 |
14:55:35 | <@arkiver> | hexagonwin|m: absolutely! new items can be added any time |
14:56:40 | <hexagonwin|m> | arkiver: i see, that's great to hear. btw, is randomly searching on search engines to find new urls commonly done for archiveteam projects? |
14:57:23 | <hexagonwin|m> | it seems to be pretty effective, but requires very frequent ip rotation so i don't think it can be done in dpos.. |
15:00:04 | <h2ibot> | TriangleDemon edited Oshiete! Goo (-6): https://wiki.archiveteam.org/?diff=57397&oldid=57392 |
15:00:05 | <h2ibot> | TriangleDemon edited Goo (-5): https://wiki.archiveteam.org/?diff=57398&oldid=57394 |
15:03:14 | <@arkiver> | it is not done on a large scale |
15:04:05 | | fionera quits [Remote host closed the connection] |
15:06:25 | | IDK (IDK) joins |
15:10:18 | <Hans5958> | Just finished doing https://wiki.archiveteam.org/index.php/Warrior_projects which hasn't been updated since 2021. If anyone is bored enough to fill the missing dates (I only did years based on the last commit date on GitHub), go ahead and edit it. |
15:11:12 | <Hans5958> | Also I put every completed project as "Archive Posted" and "Qualified Success" even though it could be wrong, so if anyone wants to fix it, go ahead |
15:17:07 | <h2ibot> | Nintendofan885 edited Goo Blog (+77, mention ArchiveBot job): https://wiki.archiveteam.org/?diff=57399&oldid=57355 |
15:17:08 | <h2ibot> | Nintendofan885 edited Goo Blog (+4, oops): https://wiki.archiveteam.org/?diff=57400&oldid=57399 |
15:18:07 | <h2ibot> | Nintendofan885 edited Goo Blog (+3, missed another word): https://wiki.archiveteam.org/?diff=57401&oldid=57400 |
15:20:14 | | petrichor quits [Ping timeout: 260 seconds] |
15:21:07 | <h2ibot> | Hans5958 edited Goo Blog (-226, Some edits here and there): https://wiki.archiveteam.org/?diff=57402&oldid=57401 |
15:22:13 | | petrichor (petrichor) joins |
15:25:34 | | cyanbox quits [Read error: Connection reset by peer] |
15:28:59 | | ducky quits [Ping timeout: 260 seconds] |
15:31:11 | | ducky (ducky) joins |
15:45:23 | | Chris5010 quits [Quit: ] |
15:47:49 | | Naruyoko5 joins |
15:51:44 | | Naruyoko quits [Ping timeout: 260 seconds] |
15:54:12 | | Chris5010 (Chris5010) joins |
16:05:04 | | Dada joins |
16:15:22 | <@arkiver> | Hans5958: that is a huge list! |
16:15:36 | <@arkiver> | we got a lot of projects done over the years... |
16:33:44 | | @imer quits [Ping timeout: 260 seconds] |
16:41:10 | | imer (imer) joins |
16:41:10 | | @ChanServ sets mode: +o imer |
16:46:20 | <h2ibot> | Hans5958 edited Warrior projects (+67, Put some channel to hackint): https://wiki.archiveteam.org/?diff=57403&oldid=57396 |
17:07:01 | | b3nzo joins |
17:07:01 | <eggdrop> | [tell] b3nzo: [2025-09-16T20:12:15Z] <JAA> The log is in the meta WARC. Relatedly, if the job crashed, there won't be a meta WARC, so in that case, you should compress the wpull.log file and include that. |
17:07:16 | | Wake quits [Quit: The Lounge - https://thelounge.chat] |
17:24:56 | | Wake joins |
17:31:16 | | Commander001 quits [Remote host closed the connection] |
17:35:50 | | Commander001 joins |
17:39:51 | <@JAA> | Hmm, I don't see anything about Tistory on the wiki. |
17:46:48 | | SootBector quits [Remote host closed the connection] |
17:47:52 | | SootBector (SootBector) joins |
18:10:57 | <Hans5958> | By the way, it would be nice if I can get the list of IRC channels that haven't been abandoned |
18:11:14 | <Hans5958> | So I can manage stuff on Matrix and the wiki, if I get time |
18:12:05 | <Hans5958> | <JAA> "Hmm, I don't see anything..." <- Re: Tistory, neither on the GitHub org (can't find anything reg. Tistory there) |
18:12:31 | <Hans5958> | That's where I source the Warrior projects |
18:37:13 | <@JAA> | Hans5958: There wasn't a project, but there's apparently a deadline approaching, see earlier discussion. |
18:37:13 | | emanuele6 quits [Read error: Connection reset by peer] |
18:38:01 | <pokechu22> | Ryz has been running sites, but I'm not sure where the list comes from/how complete it is |
18:38:15 | <@JAA> | Reminder for everyone to please add such things to Deathwatch. |
18:39:57 | <Ryz> | As far as I know, Tistory isn't shutting down, it's just one coming from one of the many piles I've been wanting to run through but haven't had the time until I restumbled upon said pile~ |
18:40:25 | <@JAA> | They're not shutting down, but they're purging inactive blogs, see above. |
18:40:32 | <Ryz> | ...Oh :( |
18:41:02 | <@JAA> | Or well, they've always been doing that, but they're shortening the inactivity window, so a bunch of blogs will get purged in a few days. |
18:42:42 | <Ryz> | arkiver, JAA, regarding Tistory blogs, they're a bit sensitive from what I gather from archiving them, they are prone to 429s if maybe there's more than 1 job in the pipeline series |
18:42:58 | <Ryz> | Additionally, there are particular things that make it go forever if not stopped, like calendar stuff |
18:44:01 | <Ryz> | Might explain some of Tistory blogs I encountered when checking through my list that are 'new' when in fact they were there before but got purged or someone took the spot |
18:44:45 | | HackMii quits [Ping timeout: 255 seconds] |
18:46:34 | | HackMii (hacktheplanet) joins |
19:05:59 | | wyatt8740 quits [Ping timeout: 260 seconds] |
19:07:17 | <Ryz> | I could give my list of Tistory stuff if there's going to be Tistory project of sorts coming up |
19:07:25 | <Ryz> | It's not huge but it's something~ |
19:08:43 | | wyatt8740 joins |
19:08:59 | <h2ibot> | Manu edited Distributed recursive crawls (+110, Add mchs.gov.ru): https://wiki.archiveteam.org/?diff=57404&oldid=57306 |
19:09:00 | <h2ibot> | JustAnotherArchivist edited Typepad (+291, Add list of domains): https://wiki.archiveteam.org/?diff=57405&oldid=57379 |
19:10:40 | <h2ibot> | Benizz edited List of websites excluded from the Wayback Machine (+22, Add emma.fr): https://wiki.archiveteam.org/?diff=57406&oldid=57281 |
19:14:41 | <h2ibot> | Pokechu22 edited Mailman/2 (+91, /* Lost */…): https://wiki.archiveteam.org/?diff=57407&oldid=57103 |
19:15:04 | | lennier2_ joins |
19:15:06 | | wyatt8740 quits [Ping timeout: 258 seconds] |
19:15:30 | | IDK quits [Quit: Connection closed for inactivity] |
19:17:47 | | lennier2 quits [Ping timeout: 258 seconds] |
19:19:27 | | wyatt8740 joins |
19:25:27 | | wyatt8740 quits [Ping timeout: 258 seconds] |
19:30:52 | | Dada quits [Remote host closed the connection] |
19:31:35 | | wyatt8740 joins |
19:38:53 | | BornOn420 quits [Quit: Textual IRC Client: www.textualapp.com] |
19:49:22 | | IDK (IDK) joins |
19:51:05 | | BornOn420 (BornOn420) joins |
19:54:22 | | BornOn420_ (BornOn420) joins |
20:00:49 | | BornOn420_ quits [Ping timeout: 260 seconds] |
20:03:57 | <h2ibot> | Manu edited Discourse/archived (+115, Queued community.hedgedoc.org): https://wiki.archiveteam.org/?diff=57408&oldid=57340 |
20:42:00 | | Wohlstand (Wohlstand) joins |
20:43:39 | | Guest quits [Quit: Guest] |
20:45:53 | | milesw joins |
21:07:45 | | etnguyen03 (etnguyen03) joins |
21:23:04 | | Guest joins |
21:26:35 | | Guest quits [Client Quit] |
21:26:50 | | Guest joins |
21:29:21 | | Guest quits [Client Quit] |
21:32:10 | <h2ibot> | JustAnotherArchivist changed the user rights of User:TriangleDemon (Too many edits with incorrect information…) |
21:36:55 | | Guest joins |
21:39:24 | | b3nzo quits [Ping timeout: 260 seconds] |
21:58:42 | | Guest quits [Client Quit] |
21:59:22 | | Guest joins |
22:11:29 | | siinus quits [Ping timeout: 260 seconds] |
22:23:29 | | etnguyen03 quits [Client Quit] |
22:31:19 | | Wohlstand quits [Client Quit] |
22:43:38 | | milesw1 joins |
22:43:39 | | milesw quits [Read error: Connection reset by peer] |
22:43:41 | | etnguyen03 (etnguyen03) joins |
22:50:46 | | Guest quits [Client Quit] |
22:50:56 | | Guest joins |
22:57:12 | | Guest quits [Client Quit] |
22:58:34 | | Guest joins |
23:03:48 | | nicolas17 is now authenticated as nicolas17 |
23:08:29 | | etnguyen03 quits [Client Quit] |
23:21:41 | | emanuele6 (emanuele6) joins |
23:32:12 | <lemuria> | So when a streamer I knew started unlisting their videos left and right, I downloaded them. They're no longer available. Is there a guide out there or just anything to take into consideration before I upload it to the internet archive? Have wanted to do so but my DMCA paranoia is stopping me |
23:32:34 | <lemuria> | I know how to upload and use the ia command somewhat, just need help with the moral/legal/ethical part of it |
23:46:22 | | siinus (siinus) joins |
23:54:38 | <hexagonwin|m> | Ryz: just curious, may i ask what you mean by "calendar stuff"? |
23:54:58 | <@OrIdow6> | lemuria: Honestly there is no guide that I know of, and you'll find different opinions on that, though this room leans heavily towards "upload everything" |
23:55:10 | <nicolas17> | the streamer would have to send a formal DMCA takedown request to internet archive to get you in trouble |
23:55:31 | <nicolas17> | you'll know better than us if they're likely to do that |
23:55:48 | <@OrIdow6> | As for the legal - yeah what nicolas17 said |
23:57:28 | <@OrIdow6> | I personally think that if it's a streamer with like, 5 viewers it shouldn't be uploaded but that's an extreme that probably doesn't apply to you |
23:57:42 | <lemuria> | she's in the 3K-4K follower range |
23:58:07 | <lemuria> | and I guess the main concern here is whether she'll ban me from the community. i'm a member of that streamer's community, been there for like, two to three years |