#archiveteam-bs log for 2025-09-17

00:01:13		etnguyen03 (etnguyen03) joins
00:02:02		ummmSokar quits [Ping timeout: 258 seconds]
00:18:08		archiveDrill quits [Ping timeout: 258 seconds]
00:18:50		notSokar quits [Client Quit]
00:18:58		Sokar joins
00:56:08		archiveDrill joins
01:42:56		cyanbox joins
01:50:29		twiswist quits [Read error: Connection reset by peer]
01:51:10		twiswist (twiswist) joins
02:01:34		etnguyen03 quits [Client Quit]
02:02:02	<Vokun>	Naturally, more than a few people here are in IT, and some probably make a lot of money as well, and others just invest more than they should, whether it is a lot of money to them or not.
02:06:00	<nicolas17>	and others are in IT and have access to machines or IP ranges that work is paying for
02:08:48		Island_ quits [Read error: Connection reset by peer]
02:09:14		f_ quits [Ping timeout: 260 seconds]
02:10:51		f_ (funderscore) joins
02:14:29		etnguyen03 (etnguyen03) joins
02:18:34		f_ quits [Ping timeout: 260 seconds]
02:18:35		etnguyen03 quits [Remote host closed the connection]
02:19:20		f_ (funderscore) joins
03:33:13		Hackerpcs quits [Quit: Hackerpcs]
03:36:13	<h2ibot>	Cooljeanius edited Goo Blog (+16, use URL template): https://wiki.archiveteam.org/?diff=57354&oldid=57332
03:44:07		Hackerpcs (Hackerpcs) joins
04:46:38	<twiswist>	Does anyone know how to make (normal) wget append to rejected-log or use a different file every time without external help such as interpolating a random filename into the command? Any time I forget to switch out that target file when I run it more than once, it annihilates data that I care a lot about
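A minimal sketch of one workaround for the question above. It does exactly the kind of external filename interpolation twiswist hoped to avoid, because nothing in this log confirms that wget has a built-in append mode for `--rejected-log`. The helper names `unique_log_path` and `build_wget_command` and the `rejected-<timestamp>.log` naming scheme are hypothetical, chosen only for illustration.

```python
import os
import time


def unique_log_path(directory=".", prefix="rejected"):
    """Return a log path that does not exist yet (hypothetical naming scheme)."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    candidate = os.path.join(directory, f"{prefix}-{stamp}.log")
    counter = 1
    # If two runs start within the same second, add a numeric suffix.
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{prefix}-{stamp}-{counter}.log")
        counter += 1
    return candidate


def build_wget_command(url, directory="."):
    """Build a wget argument list whose rejected log never reuses a file."""
    log_path = unique_log_path(directory)
    return ["wget", "--recursive", f"--rejected-log={log_path}", url]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_wget_command("https://example.org/")))
```

The existence check and the later file creation by wget are not atomic, so this only guards against the "forgot to change the filename" case described above, not against two runs racing each other.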
04:49:32		devkev0 quits [Ping timeout: 258 seconds]
04:50:18		devkev0 joins
05:14:51		archiveDrill7 joins
05:15:48		Wohlstand quits [Quit: Wohlstand]
05:17:08		archiveDrill quits [Ping timeout: 258 seconds]
05:17:08		archiveDrill7 is now known as archiveDrill
05:35:12		archiveDrill quits [Client Quit]
05:49:09		f_ quits [Ping timeout: 260 seconds]
05:54:13		f_ (funderscore) joins
06:25:20		nepeat_ quits [Quit: ZNC - https://znc.in]
06:28:46		nepeat (nepeat) joins
06:29:20		archiveDrill joins
06:37:34		oxtyped joins
07:30:46		Webuser187009 joins
07:43:28	<masterx244\|m>	since we had it about wplace recently: someone else created a few full snapshots of it and also dumped the raw data here: https://github.com/samuelscheit/wplace-archive/releases
07:43:28	<masterx244\|m>	(he also has a viewer for that but that's derived data from those datadumps)
08:02:21		HP_Archivist (HP_Archivist) joins
08:05:04		AlsoHP_Archivist quits [Ping timeout: 260 seconds]
08:27:51		medecau (medecau) joins
08:32:10		medecau quits [Client Quit]
08:41:38		flotwig quits [Read error: Connection reset by peer]
08:42:49		flotwig joins
08:48:37		Dada joins
09:02:03		Webuser187009 quits [Client Quit]
09:14:28		Naruyoko5 joins
09:15:28		AlsoHP_Archivist joins
09:18:15		Naruyoko quits [Ping timeout: 258 seconds]
09:19:44		HP_Archivist quits [Ping timeout: 260 seconds]
09:20:56		AlsoHP_Archivist quits [Ping timeout: 258 seconds]
09:31:43		Naruyoko joins
09:34:44		Naruyoko5 quits [Ping timeout: 258 seconds]
09:36:57		nine quits [Quit: See ya!]
09:37:10		nine joins
09:37:11		nine is now authenticated as nine
09:37:11		nine quits [Changing host]
09:37:11		nine (nine) joins
10:19:10		caylin quits [Read error: Connection reset by peer]
10:19:16		caylin9 (caylin) joins
10:34:43	<hexagonwin\|m>	I just realized that TISTORY (korean weblog service like blogger that I'm trying to archive) is changing its "inactive account" policy (from removing blogs after 5 years of no login to 3 years) very soon, effective Sep 22. I believe it should be archived asap. ( https://notice.tistory.com/2693 , please use translator)
10:34:48	<hexagonwin\|m>	I've tried to do this back in July but got busy (and unexpectedly had to save androidfilehost first). I have the full list of valid blogs as of early August( https://p.z80.kr/tistory_blogs.txt ), so what's left is writing a crawler based on the document I've written back then( https://p.z80.kr/tistory_archiveteam.html ) and running it. It's not too complex, the blog post content is shown without JS and comment is loaded via xhr json
10:34:48	<hexagonwin\|m>	request, and pictures in post require some URL modification to get full resolution instead of lowres thumbnail.
10:34:54	<hexagonwin\|m>	Could someone please help with writing the crawler? I don't know Lua and have tried to study/understand archiveteam's scripts with not much success. There isn't much time left so if it isn't impossible we should at least download all the blogs in that list using something basic like archivebot so that at least the blog post text and lowres images get saved.. (excluding comments, etc)
10:49:44	<@arkiver>	hexagonwin\|m: i'll look into it
10:50:13	<@arkiver>	was this brought up in july?
10:50:30	<hexagonwin\|m>	yes i've talked about this on this chatroom
10:50:44	<@arkiver>	i missed it then
10:51:48		PredatorIWD259 joins
10:51:48	<hexagonwin\|m>	thanks a lot for looking into this, please let me know if there's anything i can help with. i have multiple internet connections in korea which will probably be faster than foreign connections. so hopefully the crawling process should be ok right after the crawler is ready
10:52:19	<@arkiver>	are custom domains allowed
10:52:20	<@arkiver>	?
10:52:30	<@arkiver>	hexagonwin\|m: we should be fine yeah!
10:52:39	<@arkiver>	i'll have something running well before the deadline
10:53:11	<hexagonwin\|m>	arkiver: yes custom domain is allowed, but the *.tistory.com domain also works and it doesn't redirect to the custom domain
10:53:34	<hexagonwin\|m>	we should have most if not all the active blogs by just getting everything in the list i shared above
10:53:57		petrichor quits [Quit: ZNC 1.10.1 - https://znc.in]
10:54:18	<@arkiver>	is there a link from the *.tistory.com domain to the custom domain if a custom domain exists?
10:55:02		petrichor (petrichor) joins
10:55:17		petrichor quits [Client Quit]
10:55:24		PredatorIWD25 quits [Ping timeout: 260 seconds]
10:55:24		PredatorIWD259 is now known as PredatorIWD25
10:56:09		petrichor (petrichor) joins
10:56:16	<hexagonwin\|m>	arkiver: i wasn't sure so i just checked. the blog at https://cdmanii.com/ or https://cdmanii.tistory.com/ is a good example. it seems like it's shown on the window.T.config part as DEFAULT_URL, i'm not sure if it's correct for all cases though.
10:57:18	<hexagonwin\|m>	(when archiving we should get /m/ for all blogs since the / (desktop) version custom skin is very customizable and might be even broken like https://skyvegatest.tistory.com/ , i think i have this in the document)
10:58:27	<@arkiver>	hexagonwin\|m: thanks!
10:58:34	<@arkiver>	do you have the list of blogs that have already been discovered?
10:58:48		petrichor quits [Client Quit]
10:58:57	<masterx244\|m>	best to capture both (assuming that the images use the same URLs) since links usually go to desktop version and not every user knows the trick to replace the URL when waybacking. And for the window.T.config: best to find out where that JS object is populated (which request is delivering that data or where in the initial HTML it is buried)
10:58:58	<hexagonwin\|m>	arkiver: the link i sent above is the list of blogs that have been already discovered
10:59:40	<@arkiver>	right, thank you. i was checking the html with your description
10:59:41	<hexagonwin\|m>	(the link i sent above is the result after running this bash script that checks if a blog address is valid, as in here https://p.z80.kr/tistory_archiveteam.html#org99d7e16 )
11:00:02		Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:00:14	<@arkiver>	feel free to also just include everything that may not exist (but does have a source proving it may have existed at one point)
11:00:21	<@arkiver>	the project would handle stuff that doesn't exist
11:01:00	<hexagonwin\|m>	masterx244: yeah ideally it would be great to save all mobile page, desktop page, thumbnail image(as in html) and original high res image but would the time be enough for this..? the thumbnail image and desktop page would be "easier" for wayback machine surfing probably but it wouldn't really be ideal for archival purpose i guess..
11:01:23	<h2ibot>	Hans5958 edited Goo Blog (-1, ‎ Use #itsgoone for Goo): https://wiki.archiveteam.org/?diff=57355&oldid=57354
11:01:27	<masterx244\|m>	arkiver knows the AT tooling well enough to quickly whip something up
11:01:32	<hexagonwin\|m>	the desktop page can mostly be recreated from the "skin" that can be saved and the mobile page, so no info will be lost that way
11:01:41	<hexagonwin\|m>	i see, hope it works well :)
11:01:59	<masterx244\|m>	and if it's warrior'd, deriving mobile URLs from the desktop URL is an easy thing when itemized based on post
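The desktop-to-mobile derivation mentioned above could look roughly like this. It is a sketch under the assumption that the mobile page simply mirrors the desktop path under `/m/`; the log only says that `/m/` should be fetched for all blogs, and the function name `to_mobile_url` and the post ID `123` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_mobile_url(desktop_url):
    """Insert the /m/ prefix into a blog URL's path, if it is not there already."""
    parts = urlsplit(desktop_url)
    path = parts.path or "/"
    if path == "/m" or path.startswith("/m/"):
        return desktop_url  # already a mobile URL
    mobile_path = "/m" + path
    return urlunsplit(
        (parts.scheme, parts.netloc, mobile_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Hypothetical post ID, used only to show the shape of the result.
    print(to_mobile_url("https://cdmanii.tistory.com/123"))
```

Custom domains would need separate handling, since this only rewrites the path and leaves the host untouched.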
11:02:05	<@arkiver>	we would always try to get it from all sources, both mobile and desktop, and images from all servers (with writing revisit records to prevent duplicate data in the WARCs)
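The idea behind revisit records can be sketched as a digest lookup: when a payload with the same digest has already been captured, only a reference to the first capture needs to be recorded. This is an illustration of the concept only, not Archive Team's actual tooling; the class name `PayloadDeduplicator` and its return format are hypothetical, it writes no WARC records, and it uses a hex SHA-1 digest for simplicity.

```python
import hashlib


class PayloadDeduplicator:
    """Track payload digests to decide between a full record and a revisit."""

    def __init__(self):
        self._seen = {}  # digest -> URL of the first capture

    def classify(self, url, payload):
        """Return (record_type, reference_url, digest) for a fetched payload."""
        digest = "sha1:" + hashlib.sha1(payload).hexdigest()
        if digest in self._seen:
            # Same bytes were stored before: point back at the first capture.
            return ("revisit", self._seen[digest], digest)
        self._seen[digest] = url
        return ("response", url, digest)


if __name__ == "__main__":
    dedup = PayloadDeduplicator()
    # Hypothetical URLs: the same image served from two different hosts.
    print(dedup.classify("https://img1.example/a.jpg", b"image-bytes"))
    print(dedup.classify("https://img2.example/a.jpg", b"image-bytes"))
```

This is what lets the same image be fetched from several servers, or from both the mobile and desktop pages, without storing its bytes more than once.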
11:02:18	<@arkiver>	the main question is usually if the site can handle it.
11:02:24	<h2ibot>	Hans5958 edited Main Page/Current Projects (-1, ‎ Use #itsgoone for Goo): https://wiki.archiveteam.org/?diff=57356&oldid=57333
11:02:48		Bleo182600722719623455222 joins
11:03:17	<hexagonwin\|m>	i'm not sure if it's the case with tistory but when attempting to archive another service run by the same corporation (daum agora, marked as saved by archiveteam? https://wiki.archiveteam.org/index.php/Daum_Agora ) my friend faced a very aggressive ip ban from them
11:03:29	<hexagonwin\|m>	this was 2018 or something so maybe not relevant now
11:04:10	<masterx244\|m>	DPoS is much harder to squishnicate than a single IP crawling
11:04:24	<h2ibot>	Hans5958 edited Main Page/Current Warrior Project (-8, Back to Telegram): https://wiki.archiveteam.org/?diff=57357&oldid=57329
11:07:23		petrichor (petrichor) joins
11:08:47	<@arkiver>	hexagonwin\|m: for possible other projects, if the "issue" of data going away does not seem to be picked up here, feel free to press stronger on the issue and it'll more likely get picked up
11:12:18	<hexagonwin\|m>	thanks for the tip :) was kinda nervous it might sound irritating/disturbing
11:14:17	<@arkiver>	not at all!
11:14:31	<@arkiver>	is there anything else that has a deadline coming up?
11:15:52	<hexagonwin\|m>	arkiver: i'm not aware of something like that, although it isn't warc and might not be ideal androidfilehost.com is mostly saved by me
11:16:47		Dada quits [Remote host closed the connection]
11:17:08	<hexagonwin\|m>	(have some personal stuff going on so can't work on it now, but now i only need to verify the files i downloaded and find a way to upload, will take 2+weeks..)
11:20:26	<h2ibot>	Hans5958 edited Frequently Asked Questions (+321, Create headings and some fixes): https://wiki.archiveteam.org/?diff=57358&oldid=57257
11:21:26	<h2ibot>	Hans5958 edited Frequently Asked Questions (-2, Fix wrong bold): https://wiki.archiveteam.org/?diff=57359&oldid=57358
11:26:27	<h2ibot>	Hans5958 edited Template:IA id (+237, Add private information): https://wiki.archiveteam.org/?diff=57360&oldid=54709
11:28:27	<h2ibot>	Hans5958 edited Oshiete! Goo (+6, Use private parameter to indicate private data): https://wiki.archiveteam.org/?diff=57361&oldid=57344
11:32:28	<h2ibot>	Hans5958 edited Template:IA id (+26, Fix syntax and wording): https://wiki.archiveteam.org/?diff=57362&oldid=57360
11:34:28	<h2ibot>	Hans5958 edited Template:IA id (-12, Whoops): https://wiki.archiveteam.org/?diff=57363&oldid=57362
11:34:29	<h2ibot>	Hans5958 edited YouTube (+18, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57364&oldid=57157
11:34:30	<h2ibot>	Hans5958 edited GitHub (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57365&oldid=57193
11:34:31	<h2ibot>	Hans5958 edited Google+ (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57366&oldid=57032
11:34:32	<h2ibot>	Hans5958 edited GeoCities (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57367&oldid=57038
11:34:33	<h2ibot>	Hans5958 edited Reddit (+12, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57368&oldid=57118
11:34:34	<h2ibot>	Hans5958 edited Glitch (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57369&oldid=57345
11:34:35	<h2ibot>	Hans5958 edited Blogger (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57370&oldid=57040
11:35:28	<h2ibot>	Hans5958 edited Google Video (Archive) (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57371&oldid=57026
11:35:29	<h2ibot>	Hans5958 edited Itch.io (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57372&oldid=57342
11:35:30	<h2ibot>	Hans5958 edited Yahoo! Video (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57373&oldid=57027
11:35:31	<h2ibot>	Hans5958 edited Telegram (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57374&oldid=57037
11:35:32	<h2ibot>	Hans5958 edited Retrospring (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57375&oldid=57207
11:35:33	<h2ibot>	Hans5958 edited Microsoft Update (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57376&oldid=57178
11:35:34	<h2ibot>	Hans5958 edited FC2 (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57377&oldid=57315
11:35:35	<h2ibot>	Hans5958 edited Goo.gl (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57378&oldid=57142
11:35:36	<h2ibot>	Hans5958 edited Typepad (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57379&oldid=57343
11:35:37	<h2ibot>	Hans5958 edited URLs (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57380&oldid=57036
11:35:38	<h2ibot>	Hans5958 edited Rumble (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57381&oldid=57035
11:35:39	<h2ibot>	Hans5958 edited FC2WEB (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57382&oldid=57033
11:35:40	<h2ibot>	Hans5958 edited Meta Ad Library (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57383&oldid=57029
11:35:41	<h2ibot>	Hans5958 edited US Government (+3, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57384&oldid=57030
11:35:42	<h2ibot>	Hans5958 edited Ftp-gov (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57385&oldid=57108
11:35:43	<h2ibot>	Hans5958 edited Polar Operational Environmental Satellites (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57386&oldid=57147
11:35:44	<h2ibot>	Hans5958 edited Ge.tt (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57387&oldid=57346
11:35:45	<h2ibot>	Hans5958 edited Posts.cv (+6, Use private parameter to indicate restricted…): https://wiki.archiveteam.org/?diff=57388&oldid=57148
11:35:48	<emanuele6>	:o
11:39:45	<Hans5958>	Sorry guys
11:40:29	<h2ibot>	Hans5958 edited Frequently Asked Questions (-69, Fix wrong information regarding WARC access): https://wiki.archiveteam.org/?diff=57389&oldid=57359
11:40:30	<h2ibot>	Hans5958 edited Frequently Asked Questions (+1, Fix anchor): https://wiki.archiveteam.org/?diff=57390&oldid=57389
11:41:29	<h2ibot>	Hans5958 edited Frequently Asked Questions (-117, Remove mention of…): https://wiki.archiveteam.org/?diff=57391&oldid=57390
11:43:58		tertu quits [Quit: so long...]
11:44:17		tertu (tertu) joins
11:44:22	<Hans5958>	I hope someone updates https://twitter.com/at_warrior
11:44:22	<eggdrop>	nitter: https://nitter.net/at_warrior
11:46:30	<h2ibot>	Cooljeanius edited Oshiete! Goo (-2, Use URL template): https://wiki.archiveteam.org/?diff=57392&oldid=57361
11:48:30	<h2ibot>	Cooljeanius edited Goo (+8, Use URL template): https://wiki.archiveteam.org/?diff=57393&oldid=57353
11:48:31	<h2ibot>	Cooljeanius edited Goo (+2, derp): https://wiki.archiveteam.org/?diff=57394&oldid=57393
12:13:34		cm quits [Ping timeout: 260 seconds]
12:16:51		cm joins
12:25:36		TheEnbyperor quits [Remote host closed the connection]
12:25:36		TheEnbyperor_ quits [Remote host closed the connection]
12:35:24		TheEnbyperor joins
12:37:10		TheEnbyperor_ (TheEnbyperor) joins
12:38:31		HackMii (hacktheplanet) joins
12:39:44	<HackMii>	LLM scrapers seem to be mimicking the archivebot. I see useragent "Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot) Zeno/a07610d warc/v0.8.85" - with version number changes occasionally.
12:42:06	<c3manu>	HackMii: sorry, i didn't see this earlier. 'archive.org_bot' is not the Archive Team's bot
12:42:44	<c3manu>	i actually don't know what the Internet Archive's crawlers use.
12:42:57	<c3manu>	HackMii: what makes you think it's not a legit crawl?
12:43:54		Ointment8862 quits [Ping timeout: 260 seconds]
12:45:15		Commander001 quits [Ping timeout: 258 seconds]
12:45:33		Ointment8862 (Ointment8862) joins
12:46:10	<HackMii>	c3manu: Since the URL 404s, it matches the pattern of a slow LLM scraper (lots of random IPs, slowly scraping pages) and attempts to scrape continue even if I 403 the useragent.
12:46:19	<HackMii>	*files
12:48:09		SootBector quits [Remote host closed the connection]
12:49:00	<HackMii>	Well, I guess that's that, as it's not the archivebot and the 403 is correct.
12:49:25		SootBector (SootBector) joins
12:50:42		SootBector quits [Remote host closed the connection]
12:51:50		SootBector (SootBector) joins
13:02:34		petrichor quits [Ping timeout: 260 seconds]
13:03:57		petrichor (petrichor) joins
13:17:50		Ointment8862 quits [Read error: Connection reset by peer]
13:22:02		Ointment8862 (Ointment8862) joins
13:26:58	<cruller>	NHK has issued a new announcement regarding its renewal: https://www.nhk.or.jp/nhkone/release/assets/pdf/250917_003.pdf
13:28:19	<cruller>	NHK will migrate nearly all pages to https://www.web.nhk and introduce "「ご利用にあたって」画面" (roughly, the "Before Using This Service" screen) there. IMO, this is not a paywall, but rather something like a ToS agreement screen. If so, full archiving is not necessary at this time.
13:30:19	<cruller>	However, some content will be deleted. Could someone please crawl it with ArchiveBot? Here's the list of seeds: https://transfer.archivete.am/lyM3Y/List%20of%20NHK%20websites%20to%20be%20closed%20this%20month.txt
13:30:20	<eggdrop>	inline (for browser viewing): https://transfer.archivete.am/inline/lyM3Y/List%20of%20NHK%20websites%20to%20be%20closed%20this%20month.txt
13:34:33		Commander001 joins
13:34:39		Ointment8862 quits [Ping timeout: 260 seconds]
13:38:09		programmerq quits [Ping timeout: 260 seconds]
13:40:28		Ointment8862 (Ointment8862) joins
13:45:26		Ointment8862 quits [Ping timeout: 258 seconds]
13:51:53		dabs joins
14:19:10		dabs quits [Ping timeout: 258 seconds]
14:43:01	<h2ibot>	Hans5958 edited 教えて! goo (-1, Redirected page to [[Oshiete! Goo]]): https://wiki.archiveteam.org/?diff=57395&oldid=57305
14:45:31	<hexagonwin\|m>	arkiver: while lurking around google/daum search i've found some blogs that are missing in the list i sent above :/ would it be possible to add new blogs to the queue later on? i'm thinking of scraping search engines with random keywords a bit more.
14:52:53		Island joins
14:55:03	<h2ibot>	Hans5958 edited Warrior projects (+28350, Update Warrior projects as of today. Needs…): https://wiki.archiveteam.org/?diff=57396&oldid=47677
14:55:35	<@arkiver>	hexagonwin\|m: absolutely! new items can be added any time
14:56:40	<hexagonwin\|m>	arkiver: i see, that's great to hear. btw, is randomly searching on search engines to find new urls commonly done for archiveteam projects?
14:57:23	<hexagonwin\|m>	it seems to be pretty effective, but requires very frequent ip rotation so i don't think it can be done in dpos..
15:00:04	<h2ibot>	TriangleDemon edited Oshiete! Goo (-6): https://wiki.archiveteam.org/?diff=57397&oldid=57392
15:00:05	<h2ibot>	TriangleDemon edited Goo (-5): https://wiki.archiveteam.org/?diff=57398&oldid=57394
15:03:14	<@arkiver>	it is not done on a large scale
15:04:05		fionera quits [Remote host closed the connection]
15:06:25		IDK (IDK) joins
15:10:18	<Hans5958>	Just finished doing https://wiki.archiveteam.org/index.php/Warrior_projects which hasn't been updated since 2021. If anyone is bored enough to fill the missing dates (I only did years based on the last commit date on GitHub), go ahead and edit it.
15:11:12	<Hans5958>	Also I put every completed project as "Archive Posted" and "Qualified Success" even though it could be wrong, so if anyone wants to fix it, go ahead
15:17:07	<h2ibot>	Nintendofan885 edited Goo Blog (+77, mention ArchiveBot job): https://wiki.archiveteam.org/?diff=57399&oldid=57355
15:17:08	<h2ibot>	Nintendofan885 edited Goo Blog (+4, oops): https://wiki.archiveteam.org/?diff=57400&oldid=57399
15:18:07	<h2ibot>	Nintendofan885 edited Goo Blog (+3, missed another word): https://wiki.archiveteam.org/?diff=57401&oldid=57400
15:20:14		petrichor quits [Ping timeout: 260 seconds]
15:21:07	<h2ibot>	Hans5958 edited Goo Blog (-226, Some edits here and there): https://wiki.archiveteam.org/?diff=57402&oldid=57401
15:22:13		petrichor (petrichor) joins
15:25:34		cyanbox quits [Read error: Connection reset by peer]
15:28:59		ducky quits [Ping timeout: 260 seconds]
15:31:11		ducky (ducky) joins
15:45:23		Chris5010 quits [Quit: ]
15:47:49		Naruyoko5 joins
15:51:44		Naruyoko quits [Ping timeout: 260 seconds]
15:54:12		Chris5010 (Chris5010) joins
16:05:04		Dada joins
16:15:22	<@arkiver>	Hans5958: that is a huge list!
16:15:36	<@arkiver>	we got a lot of projects done over the years...
16:33:44		@imer quits [Ping timeout: 260 seconds]
16:41:10		imer (imer) joins
16:41:10		@ChanServ sets mode: +o imer
16:46:20	<h2ibot>	Hans5958 edited Warrior projects (+67, Put some channel to hackint): https://wiki.archiveteam.org/?diff=57403&oldid=57396
17:07:01		b3nzo joins
17:07:01	<eggdrop>	[tell] b3nzo: [2025-09-16T20:12:15Z] <JAA> The log is in the meta WARC. Relatedly, if the job crashed, there won't be a meta WARC, so in that case, you should compress the wpull.log file and include that.
17:07:16		Wake quits [Quit: The Lounge - https://thelounge.chat]
17:24:56		Wake joins
17:31:16		Commander001 quits [Remote host closed the connection]
17:35:50		Commander001 joins
17:39:51	<@JAA>	Hmm, I don't see anything about Tistory on the wiki.
17:46:48		SootBector quits [Remote host closed the connection]
17:47:52		SootBector (SootBector) joins
18:10:57	<Hans5958>	By the way, it would be nice if I can get the list of IRC channels that haven't been abandoned
18:11:14	<Hans5958>	So I can manage stuff on Matrix and the wiki, if I got time around
18:12:05	<Hans5958>	<JAA> "Hmm, I don't see anything..." <- Re: Tistory, neither on the GitHub org (can't find anything reg. Tistory there)
18:12:31	<Hans5958>	That's where I source the Warrior projects
18:37:13	<@JAA>	Hans5958: There wasn't a project, but there's apparently a deadline approaching, see earlier discussion.
18:37:13		emanuele6 quits [Read error: Connection reset by peer]
18:38:01	<pokechu22>	Ryz has been running sites, but I'm not sure where the list comes from/how complete it is
18:38:15	<@JAA>	Reminder for everyone to please add such things to Deathwatch.
18:39:57	<Ryz>	As far as I know, Tistory isn't shutting down, it's just one coming from one of the many piles I've been wanting to run through but haven't had the time until I restumbled upon said pile~
18:40:25	<@JAA>	They're not shutting down, but they're purging inactive blogs, see above.
18:40:32	<Ryz>	...Oh :(
18:41:02	<@JAA>	Or well, they've always been doing that, but they're shortening the inactivity window, so a bunch of blogs will get purged in a few days.
18:42:42	<Ryz>	arkiver, JAA, regarding Tistory blogs, they're a bit sensitive from what I gather from archiving them, they are prone to 429s if maybe there's more than 1 job in the pipeline series
18:42:58	<Ryz>	Additionally, there are particular things that make it go forever if not stopped, like calendar stuff
18:44:01	<Ryz>	Might explain some of Tistory blogs I encountered when checking through my list that are 'new' when in fact they were there before but got purged or someone took the spot
18:44:45		HackMii quits [Ping timeout: 255 seconds]
18:46:34		HackMii (hacktheplanet) joins
19:05:59		wyatt8740 quits [Ping timeout: 260 seconds]
19:07:17	<Ryz>	I could give my list of Tistory stuff if there's going to be Tistory project of sorts coming up
19:07:25	<Ryz>	It's not huge but it's something~
19:08:43		wyatt8740 joins
19:08:59	<h2ibot>	Manu edited Distributed recursive crawls (+110, Add mchs.gov.ru): https://wiki.archiveteam.org/?diff=57404&oldid=57306
19:09:00	<h2ibot>	JustAnotherArchivist edited Typepad (+291, Add list of domains): https://wiki.archiveteam.org/?diff=57405&oldid=57379
19:10:40	<h2ibot>	Benizz edited List of websites excluded from the Wayback Machine (+22, Add emma.fr): https://wiki.archiveteam.org/?diff=57406&oldid=57281
19:14:41	<h2ibot>	Pokechu22 edited Mailman/2 (+91, /* Lost */…): https://wiki.archiveteam.org/?diff=57407&oldid=57103
19:15:04		lennier2_ joins
19:15:06		wyatt8740 quits [Ping timeout: 258 seconds]
19:15:30		IDK quits [Quit: Connection closed for inactivity]
19:17:47		lennier2 quits [Ping timeout: 258 seconds]
19:19:27		wyatt8740 joins
19:25:27		wyatt8740 quits [Ping timeout: 258 seconds]
19:30:52		Dada quits [Remote host closed the connection]
19:31:35		wyatt8740 joins
19:38:53		BornOn420 quits [Quit: Textual IRC Client: www.textualapp.com]
19:49:22		IDK (IDK) joins
19:51:05		BornOn420 (BornOn420) joins
19:54:22		BornOn420_ (BornOn420) joins
20:00:49		BornOn420_ quits [Ping timeout: 260 seconds]
20:03:57	<h2ibot>	Manu edited Discourse/archived (+115, Queued community.hedgedoc.org): https://wiki.archiveteam.org/?diff=57408&oldid=57340
20:42:00		Wohlstand (Wohlstand) joins
20:43:39		Guest quits [Quit: Guest]
20:45:53		milesw joins
21:07:45		etnguyen03 (etnguyen03) joins
21:23:04		Guest joins
21:26:35		Guest quits [Client Quit]
21:26:50		Guest joins
21:29:21		Guest quits [Client Quit]
21:32:10	<h2ibot>	JustAnotherArchivist changed the user rights of User:TriangleDemon (Too many edits with incorrect information…)
21:36:55		Guest joins
21:39:24		b3nzo quits [Ping timeout: 260 seconds]
21:58:42		Guest quits [Client Quit]
21:59:22		Guest joins
22:11:29		siinus quits [Ping timeout: 260 seconds]
22:23:29		etnguyen03 quits [Client Quit]
22:31:19		Wohlstand quits [Client Quit]
22:43:38		milesw1 joins
22:43:39		milesw quits [Read error: Connection reset by peer]
22:43:41		etnguyen03 (etnguyen03) joins
22:50:46		Guest quits [Client Quit]
22:50:56		Guest joins
22:57:12		Guest quits [Client Quit]
22:58:34		Guest joins
23:03:48		nicolas17 is now authenticated as nicolas17
23:08:29		etnguyen03 quits [Client Quit]
23:21:41		emanuele6 (emanuele6) joins
23:32:12	<lemuria>	So when a streamer I knew started unlisting their videos left and right, I downloaded them. They're no longer available. Is there a guide out there or just anything to take into consideration before I upload it to the internet archive? Have wanted to do so but my DMCA paranoia is stopping me
23:32:34	<lemuria>	I know how to upload and use the ia command somewhat, just need help with the moral/legal/ethical part of it
23:46:22		siinus (siinus) joins
23:54:38	<hexagonwin\|m>	Ryz: just curious, may i ask what you mean by "calendar stuff"?
23:54:58	<@OrIdow6>	lemuria: Honestly there is no guide that I know of, and you'll find different opinions on that, though this room leans heavily towards "upload everything"
23:55:10	<nicolas17>	the streamer would have to send a formal DMCA takedown request to internet archive to get you in trouble
23:55:31	<nicolas17>	you'll know better than us if they're likely to do that
23:55:48	<@OrIdow6>	As for the legal - yeah what nicolas17 said
23:57:28	<@OrIdow6>	I personally think that if it's a streamer with like, 5 viewers it shouldn't be uploaded but that's an extreme that probably doesn't apply to you
23:57:42	<lemuria>	she's in the 3K-4K follower range
23:58:07	<lemuria>	and I guess the main concern here is whether she'll ban me from the community. i'm a member of that streamer's community, been there for like, two to three years
