| 00:06:55 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 00:07:14 | | AmAnd0A joins |
| 00:08:30 | | AlsoHP_Archivist quits [Client Quit] |
| 00:08:49 | | HP_Archivist (HP_Archivist) joins |
| 00:14:13 | | AmAnd0A quits [Ping timeout: 258 seconds] |
| 00:14:22 | | AmAnd0A joins |
| 00:16:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 00:18:08 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 00:18:25 | | AmAnd0A joins |
| 00:30:07 | <@JAA> | Everything accessible on the Knowledge Adventure CDN and present as of my initial listing on 2023-06-14 or the relisting about 6 hours ago should now be archived. |
| 00:30:12 | <@JAA> | betamax, nicolas17: ^ |
| 00:43:03 | | lk quits [Ping timeout: 265 seconds] |
| 00:54:30 | | dumbgoy joins |
| 00:55:43 | | icedice quits [Client Quit] |
| 00:56:32 | | dumbgoy quits [Client Quit] |
| 00:57:23 | | dumbgoy joins |
| 01:03:29 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 01:03:49 | | AmAnd0A joins |
| 01:55:08 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 01:55:25 | | AmAnd0A joins |
| 02:22:08 | | killsushi quits [Ping timeout: 265 seconds] |
| 03:19:10 | | dumbgoy quits [Ping timeout: 265 seconds] |
| 03:51:38 | <h2ibot> | FireonLive edited Current Projects (-10, move Tiki to recently finished): https://wiki.archiveteam.org/?diff=50050&oldid=50046 |
| 04:22:28 | | TastyWiener95 quits [Quit: So long, farewell, auf wiedersehen, good night] |
| 04:30:02 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 04:30:44 | | TastyWiener95 (TastyWiener95) joins |
| 04:56:17 | | IDK quits [Client Quit] |
| 05:09:55 | | hitgrr8 joins |
| 05:26:56 | <fireonlive> | Visa to Acquire Pismo for US$ 1 billion in cash: https://www.pismo.io/blog/visa-to-acquire-pismo/ |
| 05:29:04 | <fireonlive> | "Pismo will retain our founders and current management team. The transaction is subject to regulatory approvals and other customary closing conditions and is expected to close by the end of 2023.", website probably not super in danger i guess |
| 05:40:50 | | AmAnd0A quits [Remote host closed the connection] |
| 05:41:03 | | AmAnd0A joins |
| 05:41:16 | | BlueMaxima quits [Client Quit] |
| 06:35:08 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 06:47:26 | | bf_ joins |
| 07:01:20 | | flashfire42|m joins |
| 07:55:04 | | Arcorann (Arcorann) joins |
| 08:05:13 | | IDK (IDK) joins |
| 08:11:05 | <flashfire42|m> | Is there any way to monitor the offload of the targets? I think someone was saying a few were getting full or close to it |
| 08:18:42 | | Doomahol1 quits [Read error: Connection reset by peer] |
| 08:19:32 | | Doomahol1 joins |
| 08:34:11 | | second (second) joins |
| 08:37:56 | | sec^nd quits [Ping timeout: 245 seconds] |
| 08:37:56 | | second is now known as sec^nd |
| 08:46:34 | <imer> | flashfire42|m: nope, ideally targets run at near-full anyways to apply backpressure - if they were empty that just means IA can accept more data and we're archiving too slow ;) |
| 08:47:48 | <flashfire42|m> | Heh I mean yeah but there are some projects currently paused because we were grabbing too much data for IA to keep up |
| 08:49:30 | <imer> | yeah. not quite sure what the status there is. someone else would have to chime in what is going to happen there, if anything |
| 08:50:05 | <imer> | could be a matter of waiting it out until things slow down naturally or there might be improvements on the IA/AT side so things can go faster |
| 08:50:48 | <imer> | a lot of data though, so none of it is easy, I imagine |
| 09:19:27 | <masterx244|m> | IA is a common bottleneck; the S3 upload "loading bays" are the bottleneck pretty often. AT can pull data out faster than it can be ingested there |
| 10:00:01 | | railen63 quits [Remote host closed the connection] |
| 10:00:17 | | railen63 joins |
| 10:01:47 | | SF quits [Ping timeout: 265 seconds] |
| 10:03:57 | | sec^nd quits [Remote host closed the connection] |
| 10:05:20 | | sec^nd (second) joins |
| 10:14:12 | | SF joins |
| 10:50:36 | <betamax> | JAA: that's amazing, thanks so much! |
| 10:51:20 | <betamax> | Would you be able to share your relisting from a day or so ago? My friend is working with others to reverse engineer the server for the game and having the full file listing would be very helpful |
| 10:54:57 | | Chris5010 quits [Ping timeout: 265 seconds] |
| 12:07:31 | | jacksonchen666 quits [Ping timeout: 245 seconds] |
| 12:07:57 | | jacksonchen666 (jacksonchen666) joins |
| 12:10:57 | | justmolamola joins |
| 12:14:04 | | justmolamola quits [Client Quit] |
| 12:20:12 | | sonick quits [Client Quit] |
| 12:23:16 | | justmolamola joins |
| 12:31:59 | | justmolamola quits [Client Quit] |
| 12:32:58 | | justmolamola joins |
| 12:35:16 | | W7RFa6AbNFz quits [Read error: Connection reset by peer] |
| 12:35:39 | | W7RFa6AbNFz joins |
| 12:48:25 | <h2ibot> | OrIdow6 edited Egloos (+649, Account of the grab): https://wiki.archiveteam.org/?diff=50051&oldid=50043 |
| 12:49:41 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 12:50:17 | | AmAnd0A joins |
| 12:57:09 | <@OrIdow6> | No reply from Wysp.ws |
| 13:05:06 | | nulldata quits [Ping timeout: 258 seconds] |
| 13:09:23 | | Chris5010 (Chris5010) joins |
| 13:14:28 | | froschgrosch joins |
| 13:15:21 | <Hans5958> | Are there archives of the leaderboards for past projects? |
| 13:21:06 | | justmolamola quits [Client Quit] |
| 13:26:30 | <Chris5010> | If you know the project name, you can use that in the normal tracker URL: https://tracker.archiveteam.org/[projectName]/. For example, the project for Enjin is done, but the leaderboard is still accessible: https://tracker.archiveteam.org/enjin/ |
| 13:46:10 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 13:49:40 | | Dango360_ (Dango360) joins |
| 13:53:29 | | Dango360 quits [Ping timeout: 252 seconds] |
| 13:54:31 | | froschgrosch quits [Remote host closed the connection] |
| 13:56:20 | | Megame quits [Client Quit] |
| 14:05:04 | | MrRadar_ (MrRadar) joins |
| 14:06:08 | | MrRadar quits [Ping timeout: 252 seconds] |
| 14:31:20 | | HP_Archivist (HP_Archivist) joins |
| 14:32:47 | <h2ibot> | Yts98 edited LINE BLOG (+139, Add link to data): https://wiki.archiveteam.org/?diff=50052&oldid=49955 |
| 15:01:50 | <Hans5958> | Where is the repo to (at least the front end of) tracker.archiveteam.org? |
| 15:03:41 | | BigBrain_ (bigbrain) joins |
| 15:04:36 | | BigBrain quits [Ping timeout: 245 seconds] |
| 15:09:33 | | Dango360_ quits [Client Quit] |
| 15:09:37 | <pokechu22> | https://github.com/ArchiveTeam/universal-tracker I think? |
| 15:09:43 | | Dango360 (Dango360) joins |
| 15:11:35 | | Arcorann quits [Ping timeout: 252 seconds] |
| 15:11:38 | | dumbgoy joins |
| 15:13:43 | <Hans5958> | Really? I'd probably want to contribute some code, but it looks "dead" |
| 15:16:56 | <h2ibot> | Manu edited Deathwatch (+261, Stitcher will shut down end of August): https://wiki.archiveteam.org/?diff=50053&oldid=50047 |
| 15:17:00 | | dumbgoy_ joins |
| 15:17:56 | <h2ibot> | Noxian edited Tumblr (+0, /* See also */ latest version of TumblThree): https://wiki.archiveteam.org/?diff=50054&oldid=49141 |
| 15:17:57 | <h2ibot> | Hans5958 edited Egloos (-12, Little bit of rewording): https://wiki.archiveteam.org/?diff=50055&oldid=50051 |
| 15:17:58 | <h2ibot> | Exorcism edited Tiki (+23): https://wiki.archiveteam.org/?diff=50056&oldid=50049 |
| 15:17:59 | <h2ibot> | Exorcism uploaded File:Tiki logo.png: https://wiki.archiveteam.org/?title=File%3ATiki%20logo.png |
| 15:18:57 | <h2ibot> | Exorcism edited Deathwatch (+0): https://wiki.archiveteam.org/?diff=50058&oldid=50053 |
| 15:19:36 | | sec^nd quits [Ping timeout: 245 seconds] |
| 15:20:23 | | dumbgoy quits [Ping timeout: 252 seconds] |
| 15:22:02 | | Hackerpcs quits [Ping timeout: 252 seconds] |
| 15:22:11 | | froschgrosch joins |
| 15:23:10 | | froschgrosch quits [Remote host closed the connection] |
| 15:23:39 | | Hackerpcs (Hackerpcs) joins |
| 15:41:40 | <@arkiver> | egloos, tiki, and lineblog project are done! |
| 15:42:15 | <@arkiver> | tracker front page is becoming less busy :P |
| 15:43:15 | <yts98> | arkiver: great! now I want to propose a warrior project for Xuite :p https://github.com/yts98/xuite-grab |
| 15:44:13 | <fireonlive> | i read that as xtube which is both incorrect and also long gone (and already done) :c |
| 15:44:24 | <threedeeitguy> | tiki was fun. my first top 10 finish :D |
| 15:47:26 | <fireonlive> | haha yeah first where i was near the top :p |
| 15:49:03 | <h2ibot> | Yts98 edited Current Projects (+0, Move LINE BLOG to recently finished): https://wiki.archiveteam.org/?diff=50059&oldid=50050 |
| 15:50:19 | <rktk> | Just wanted to throw this out as a forum to archive: https://memoriesoffear.jcink.net |
| 15:50:40 | <rktk> | They did a number of translated games, one namely Toilet in Wonderland (which Vinny Vinesauce played on stream) |
| 15:50:42 | <fireonlive> | Hans5958: looks like that's the one yeah |
| 15:50:45 | <rktk> | https://memoriesoffear.jcink.net/index.php?showtopic=56 |
| 15:53:04 | <h2ibot> | Yts98 edited LINE BLOG (+1, Finish the project): https://wiki.archiveteam.org/?diff=50060&oldid=50052 |
| 15:53:09 | <fireonlive> | i imagine everyone is quite busy with a lot of other things (including things outside of archiveteam) so it's not as high priority as other stuff |
| 15:54:02 | <fireonlive> | yts98: :D |
| 15:54:17 | <rktk> | fireonlive, do you mean that forum I linked sorry, or replying to someone else |
| 15:54:32 | <fireonlive> | rktk: oh sorry, replying to Hans5958 |
| 15:54:32 | <rktk> | If there is a recommended way of scraping a forum like that, I have no issue doing it myself |
| 15:54:40 | <rktk> | ah ok fireonlive :) |
| 15:54:42 | <fireonlive> | :) |
| 15:54:58 | <fireonlive> | regarding the https://tracker.archiveteam.org codebase |
| 16:09:45 | <pokechu22> | rktk: Probably archivebot, but it's fairly full currently. That one should be pretty easy to run though since it's small |
| 16:10:04 | <rktk> | pokechu22, could I run an archivebot myself locally? |
| 16:10:08 | <rktk> | or should I just do a wget mirror |
| 16:10:24 | <@arkiver> | yts98: why JSObj? |
| 16:10:47 | <pokechu22> | ArchiveBot isn't designed to be run locally, https://github.com/ArchiveTeam/grab-site is the more usable equivalent |
| 16:10:56 | <pokechu22> | There's also a forum-dl project or something like that that might be usable |
| 16:11:23 | <yts98> | arkiver: to deal with JS objects embedded in the HTML. |
| 16:11:32 | <pokechu22> | wget's also fine, but wouldn't end up on web.archive.org (though anything a random person does probably wouldn't end up there) |
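For a one-off personal grab like the one rktk describes, a WARC-writing wget invocation might look like the following. This is a sketch, not an endorsed recipe: the output filename is a placeholder, and the exact flags should be checked against the local wget version.

```shell
# Mirror a small forum into a WARC, politely.
# --warc-file writes a .warc.gz alongside the normal mirror tree,
# --warc-cdx emits an index, and --wait rate-limits the crawl.
wget --mirror --page-requisites --no-parent \
     --wait=1 --random-wait \
     --warc-file=memoriesoffear --warc-cdx \
     "https://memoriesoffear.jcink.net/"
```

As pokechu22 notes, the resulting WARC would still need to be uploaded somewhere (e.g. as an archive.org item) rather than landing in the Wayback Machine automatically.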
| 16:11:51 | <pokechu22> | Looks like they also have mediafire links so those will need to be put into #mediaonfire |
| 16:12:25 | <yts98> | I found simply replacing single quotes with double quotes may still cause errors |
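yts98's point can be demonstrated: naively swapping quote characters breaks as soon as a string contains an apostrophe, while Python's `ast.literal_eval` can parse many simple JS-style object literals directly. A sketch with made-up example data; real embedded objects may need a proper JS parser.

```python
import ast
import json

# A JS-style object literal as it might appear embedded in HTML
# (hypothetical example data).
js_obj = """{'title': "it's sweet", 'views': 3}"""

# Naive fix: replace single quotes with double quotes.
naive = js_obj.replace("'", '"')
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    # The apostrophe in "it's" now terminates the string early.
    naive_ok = False

# Safer for simple literals: evaluate as a Python literal instead.
parsed = ast.literal_eval(js_obj)
```

`ast.literal_eval` only accepts literal syntax, so it fails safely on anything with function calls or other executable JS.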
| 16:13:18 | <@arkiver> | yts98: on the item types, can you please make them a bit more descriptive? |
| 16:14:00 | <pokechu22> | Looks like there's actually a lot of forums under jcink.net, so that's something to check later |
| 16:15:12 | <@arkiver> | yts98: looks pretty good! |
| 16:17:07 | <rktk> | pokechu22, yeah this is just a random personal grab. and i could save to warc, mainly just as a means of throwing it on archive as an object, rather than web archive |
| 16:17:19 | <rktk> | pokechu22, yeah definitely something worth looking at |
| 16:18:39 | <yts98> | I chose very short item type names because the wiki said "Because the Tracker uses Redis as its database, memory usage is a concern." |
| 16:18:42 | <@arkiver> | let's make a channel for xuite! i'm not sure if this word has a meaning, perhaps we can have a play on words in the language of this word |
| 16:18:57 | <@arkiver> | yts98: ah. well lists are mostly offloaded, so not a huge concern now |
| 16:19:35 | <yts98> | arkiver: watch this video. |
| 16:19:35 | <yts98> | https://vlog.xuite.net/play/Qm9leW9BLTEzODg4Ni5mbHY= |
| 16:19:43 | <threedeeitguy> | There's small website that I wish to regularly save a few pages for (usually 1-2 pages a day). The prompt to save the page would be an email notification from said site. I already have extracting the link sorted. Is there an API equivalent of https://web.archive.org/save ? Saving the page is fairly time critical as once items are sold the page is |
| 16:19:43 | <threedeeitguy> | updated and information is removed. |
| 16:20:03 | <pokechu22> | rktk: I've started an archivebot job anyways, shouldn't take too long |
| 16:20:06 | <yts98> | Xuite's slogan is "My Xuite, So Sweet~" |
| 16:20:11 | <rktk> | hurray! pokechu22 |
| 16:20:15 | <@arkiver> | yts98: i see some stuff there like TODOs on handling malformed JSON responses |
| 16:20:33 | <rktk> | someone should save digitalfaq before all the scam evidence is wiped away |
| 16:20:34 | <pokechu22> | threedeeitguy: Pretty sure web.archive.org/save can be treated as an API endpoint, I remember seeing some docs on that, one sec |
| 16:20:41 | <pokechu22> | digitalfaq? |
| 16:21:19 | <@arkiver> | those malformed responses should be caught in write_to_warc, then not be written to WARC, and either be marked for retrying to retrieve, or the item should be aborted. or in rare cases no write to WARC and let it continue as usual if this is an 'error' that is fine |
| 16:21:19 | <yts98> | arkiver: their API sometimes mixes cp950 with utf8 |
| 16:21:26 | <pokechu22> | https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit |
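The linked document describes Save Page Now 2 (SPN2), which accepts authenticated POSTs to https://web.archive.org/save. A sketch of assembling such a request with the standard library: `build_spn2_request` is a hypothetical helper, the keys are placeholders, and field names should be double-checked against the SPN2 docs.

```python
import urllib.parse

SPN2_ENDPOINT = "https://web.archive.org/save"

def build_spn2_request(page_url, access_key, secret_key,
                       capture_outlinks=False):
    """Assemble endpoint, headers, and form body for an SPN2 capture.

    SPN2 authenticates with an S3-style 'LOW <key>:<secret>' header
    (keys from archive.org account settings) and returns JSON when
    'Accept: application/json' is sent.
    """
    headers = {
        "Accept": "application/json",
        "Authorization": f"LOW {access_key}:{secret_key}",
    }
    fields = {"url": page_url}
    if capture_outlinks:
        fields["capture_outlinks"] = "1"
    return SPN2_ENDPOINT, headers, urllib.parse.urlencode(fields)
```

Actually sending it is then one `urllib.request.Request` POST away, which would suit the time-critical email-triggered captures threedeeitguy describes.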
| 16:21:46 | <@arkiver> | right, i see. so the error is on our side, not on theirs? |
| 16:21:52 | <@arkiver> | yts98: ^ |
| 16:21:52 | <rktk> | pokechu22, digitalfaq.com |
| 16:22:22 | <pokechu22> | What's the deal with scam evidence? |
| 16:22:51 | <pokechu22> | Looks like it was previously saved August 2022: https://archive.fart.website/archivebot/viewer/job/4ialw |
| 16:23:00 | | sec^nd (second) joins |
| 16:23:05 | <pokechu22> | err, no, those are small enough that saving it probably failed |
| 16:23:38 | <yts98> | arkiver: yes. the error is caused in JSON.lua. |
| 16:25:13 | <threedeeitguy> | pokechu22 thanks, il take a look. It may not be suitable anyway. I just tried a page and its far from clean: https://web.archive.org/web/20230629161553/https://www.stationroadsteam.com/3-12-inch-gauge-union-pacific-big-boy-4-8-8-4-stock-code-11379/# |
| 16:25:57 | <@arkiver> | yts98: i see there is still a chance of 'bad data' getting into the WARC, for example I see a check on json["ok"] in get_urls. at this point the data is already in the WARC, which it shouldn't be if there is an indication of an error |
| 16:26:07 | <@JAA> | betamax: Yeah, everything will be on IA once the upload finishes. |
| 16:26:29 | <@arkiver> | so this json["ok"] check should be in write_to_warc, and then again either retried or items aborted (or accepted in rare cases) if the error is there |
| 16:26:43 | <@arkiver> | there may be other checks in get_urls that should move to write_to_warc |
| 16:26:44 | | Matthww1 quits [Ping timeout: 258 seconds] |
| 16:27:36 | | Matthww1 joins |
| 16:29:10 | <yts98> | arkiver: json["ok"] being false is not rare. It happens when an article is password-protected, or a user did not activate one of the blog, album, or vlog services. |
| 16:29:24 | <@arkiver> | alright good |
| 16:30:15 | <yts98> | and then I saw thousands of usernames discovered, but the API will respond with "no such user". |
| 16:32:10 | <yts98> | their username search API even returns illegal usernames, possibly manually altered by the moderator to deactivate some accounts |
| 16:33:10 | <@arkiver> | interesting |
| 16:33:11 | <@arkiver> | so |
| 16:33:12 | <@arkiver> | on images |
| 16:33:20 | <@arkiver> | photo.xuite.net, and such |
| 16:33:52 | <@arkiver> | can different items get to the same images? can they be duplicated between items? i see they are now generally always accepted for immediate archiving |
| 16:35:26 | | BigBrain_ quits [Ping timeout: 245 seconds] |
| 16:36:46 | <yts98> | I've seen some image URLs in API responses of user items, but some of these images belong to an album, so the current script will grab them twice or more. |
| 16:37:27 | | BigBrain_ (bigbrain) joins |
| 16:37:32 | <@arkiver> | are the URLs for a single image unique? |
| 16:38:11 | <@arkiver> | as in, is it always 3.example.com/image.png, or can there also be 2.example.com/image.png, 3.example.com/image?format=png, etc.? |
| 16:40:16 | <@arkiver> | I see the TODO about false positives. yes, this may produce false positives. but archiving is usually done with the thought of "better discovery too much than too little". so if we are sure everything will be discovered with very strict rules, then that is fine |
| 16:40:52 | <yts98> | for photo.xuite.net, the image URLs are unique; |
| 16:40:52 | <yts98> | when images are embedded in blog articles, the service possibly generates another URL that accepts outlinks |
| 16:41:04 | <@arkiver> | but it is often good to keep the rules somewhat relaxed, allow for a possibility of false positives. eliminate these false positives if we find them. and that way perhaps extract/archive more than we initially were under the impression was actually there |
| 16:41:31 | | lk (lk) joins |
| 16:41:36 | <@arkiver> | yts98: "another URL that accepts outlinks" - for an image? what do you mean? |
| 16:44:03 | | BigBrain_ quits [Read error: Connection reset by peer] |
| 16:44:29 | <@arkiver> | yts98: on the video URLs and load balancing. can video URLs to the same video be found in different items? as in, can there be duplicates? (same as what i asked for the photos) |
| 16:45:17 | <@arkiver> | if a certain video will _only_ be discovered from a single item, then good! and then let's get whatever load balancers they use; Wget-AT will prevent writing duplicate data, while still preserving the URLs. |
| 16:46:06 | <@arkiver> | there will only be duplicate data downloaded on the side of the Warrior, but this extra data will be deduplicated away when written to the WARC. if xuite can handle it, then it's good to get this duplicate data. |
| 16:46:40 | <@arkiver> | because this is not only about purely data preservation, but also about URL preservation. we want to try and cover the entire range of possible URLs, so that those can be found through the Wayback Machine. |
| 16:47:54 | <@arkiver> | so. let's say we have 1.example.com/image.png and 2.example.com/image.png both pointing to the same image. we download them _in the same Wget-AT session_, then they will be deduplicated, while both their URLs are preserved (yes, data will be downloaded twice) |
| 16:48:48 | <@arkiver> | if we have separate items for those two URLs to the same image, then it is likely that those separate items end up in different Wget-AT sessions, and are not deduplicated, which wastes bytes |
| 16:49:27 | <@arkiver> | if we're talking about 1 TB or so of duplicated data, that is not a big problem. but if it turns into 10 TB or 100 TB of duplicated data, that is a problem |
| 16:51:25 | <@arkiver> | yts98: i see you store data in _data.txt, what is the use of this? we're actually not really using data.txt anymore. in the past data.txt was used to discover items, but nowadays we use backfeed for that. |
| 16:51:41 | <@arkiver> | there is nothing on the targets currently that will do anything with the _data.txt file. |
| 16:52:03 | <yts98> | I don't remember in which article I saw image URL formats other than 1.share.photo.xuite.net. |
| 16:52:03 | <yts98> | Separating images to new items is a reasonable approach. Let's handle them like cdn-obs in lineblog. |
| 16:52:03 | <yts98> | Video URLs may also be checked in user items. But they may expire if we backfeed them as item. |
| 16:52:03 | <yts98> | I thought WARC revisit could only be used on the same URL. So WARC revisit applies to different URLs when the response body is identical? |
| 16:52:45 | <@arkiver> | yes, on the response body being identical |
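The deduplication arkiver describes hinges on the WARC payload digest: captures whose bodies hash identically can be written as revisit records pointing at the first capture, regardless of URL. A minimal sketch of the digest itself (SHA-1, base32-encoded, per the WARC specification; the revisit-record bookkeeping around it is Wget-AT's job and is not shown):

```python
import base64
import hashlib

def payload_digest(body: bytes) -> str:
    """WARC-Payload-Digest: SHA-1 of the payload, base32-encoded."""
    return "sha1:" + base64.b32encode(hashlib.sha1(body).digest()).decode()

# 1.example.com/image.png and 2.example.com/image.png serving the same
# bytes share a digest, so within one Wget-AT session the second capture
# can become a revisit record instead of a second full response body.
img = b"\x89PNG..."  # placeholder payload
assert payload_digest(img) == payload_digest(img)
```

This is also why duplicates across *different* sessions waste bytes: each session only knows the digests it has seen itself.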
| 16:53:36 | <@arkiver> | i see on expiring video URLs. are the video URLs you get through a user item actually used for playback? or are they "just there" in some data blob, while actually only the video URL on the post page is used for playback? |
| 16:54:43 | <@arkiver> | on FlashVars rules - those are not known yet? |
| 16:55:40 | | joe joins |
| 16:56:17 | | IDK quits [Client Quit] |
| 16:57:51 | <@arkiver> | yts98: well overall looks pretty good, i'll be further checking this later! |
| 16:59:13 | <yts98> | the purpose of data.txt is to inspect the metadata not included in item names, including blog_id and every <embed>. |
| 16:59:13 | <yts98> | I've discovered 5 types of FlashVars rules https://wiki.archiveteam.org/index.php/Xuite#Flash-based_creations , but I'm not sure if I missed more. |
| 17:00:22 | <yts98> | arkiver: thanks for taking a look! I learned very much about archiving practices :) |
| 17:00:37 | <@arkiver> | good to hear :) |
| 17:00:54 | <@arkiver> | alright i'm not sure yet about data.txt, will be having a better look later! |
| 17:01:37 | <@arkiver> | (i only actually looked at the code - not the site yet) |
| 17:07:38 | <yts98> | a possible alternative to data.txt is to create a dummy backfeed that does not actually backfeed the items into the project. |
| 17:08:19 | <@arkiver> | that sounds better yes |
| 17:08:36 | <@arkiver> | but i'm not sure if we actually need it, need to do some experiments as well |
| 17:09:02 | | joe quits [Remote host closed the connection] |
| 17:09:04 | <@arkiver> | if there is something unexpected, can item be simply aborted? |
| 17:09:55 | <@arkiver> | i see for example that when an a: item is queued, it is always written to the data.txt as well, that is not needed i think? |
| 17:20:14 | | killsushi joins |
| 17:22:49 | <fireonlive> | gettyimages acquired unsplash earlier in 2021: https://unsplash.com/blog/unsplash-getty/ and looks like they’re jumping on the “oh fuck AI is going to ruin us” bandwagon way too late https://twitter.com/sindresorhus/status/1674390882399801345 |
| 17:23:12 | <fireonlive> | not sure what he means by “removed their free non-API endpoint” though |
| 17:26:25 | <@arkiver> | yts98: i see very explicit extraction of certain URLs, also from the HTML, line 1096 for example. i think this is already handled by the 'general' URLs extraction happening at line 1966? if not, that might be a better place |
| 17:27:07 | <@JAA> | Next AT project: archive everything that has a free API. |
| 17:27:20 | <@arkiver> | this is again coming from the point of "better extract too much than too little" - if we only allow extraction of very specific URLs in very specific places, there is a great risk of missing something. |
| 17:27:51 | <@arkiver> | hmm |
| 17:28:12 | <@arkiver> | or, is this being extracted specifically here to have the certain referer be different than the current URLs we're working on? |
| 17:28:41 | <@arkiver> | in which case it would be good. later it'd be picked up in the 'general' extraction code, but not queued since it was queued before |
| 17:28:52 | <@arkiver> | current URL* |
| 18:17:31 | <fireonlive> | JAA: yeeeeah :| |
| 18:17:52 | <fireonlive> | 🙃 🔫 |
| 18:18:23 | <fireonlive> | they said AI/ML would destroy the internet |
| 18:18:36 | <fireonlive> | i just didn't think it would be in this way |
| 18:28:21 | | sec^nd quits [Ping timeout: 245 seconds] |
| 18:29:05 | | sec^nd (second) joins |
| 18:36:39 | | sec^nd quits [Remote host closed the connection] |
| 18:37:01 | | sec^nd (second) joins |
| 18:47:23 | | nicolas17 joins |
| 18:49:41 | | spirit quits [Client Quit] |
| 18:58:51 | | nulldata joins |
| 19:38:17 | | spirit joins |
| 19:42:16 | | bf_ quits [Ping timeout: 265 seconds] |
| 20:23:57 | | eroc1990 quits [Quit: The Lounge - https://thelounge.chat] |
| 20:30:41 | <pokechu22> | tinaja.com looks kinda big so I'm not going to put it into archivebot until we have a little bit more space |
| 20:30:59 | <@arkiver> | let's see |
| 20:31:11 | <@arkiver> | interesting site |
| 20:31:38 | <that_lurker> | seems to have a lot of PDFs so might be big |
| 20:32:23 | <@arkiver> | pokechu22: shall we put it in archivebot anyway? |
| 20:32:32 | <vokunal|m> | I was about to ask what you look for to determine whether it looks big or not. At first glance I figured it looks like it's from the 90s, so small |
| 20:33:05 | <pokechu22> | Currently all the AB pipelines are full because hel3/hel4 are low on disk space because of the general upload backlog to my understanding |
| 20:34:17 | <pokechu22> | Probably we could still queue it though |
| 20:36:59 | | thenes quits [Remote host closed the connection] |
| 20:37:20 | | thenes (thenes) joins |
| 20:39:16 | <that_lurker> | actually those PDFs are not that big, so it might be something like 50-60 gigs at most |
| 20:40:25 | <that_lurker> | could be good to queue it as you can just pause it in the event that there is no space right? |
| 20:41:21 | <pokechu22> | Alright, queued it |
| 20:41:47 | <pokechu22> | It'll auto-pause when there's no space (< 5 GB I think) |
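The auto-pause behaviour pokechu22 mentions can be sketched as a simple free-space check. The 5 GB threshold is the figure from the chat ("I think"), and `should_pause` is a hypothetical helper, not ArchiveBot's actual code:

```python
import shutil

PAUSE_THRESHOLD_BYTES = 5 * 1024**3  # "< 5 GB I think"

def should_pause(path: str = ".") -> bool:
    """True when the pipeline's disk is below the free-space threshold."""
    return shutil.disk_usage(path).free < PAUSE_THRESHOLD_BYTES
```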
| 20:42:46 | <that_lurker> | LUL was already started apparently :P |
| 20:43:14 | | eroc1990 (eroc1990) joins |
| 20:48:21 | | sec^nd quits [Ping timeout: 245 seconds] |
| 20:51:24 | | sec^nd (second) joins |
| 21:00:39 | <@arkiver> | pokechu22: general upload backlog to where? |
| 21:00:47 | <@arkiver> | is IA the bottle neck? |
| 21:01:24 | <pokechu22> | I think so? |
| 21:01:31 | <pokechu22> | JAA talked more about it I think |
| 21:01:52 | <pokechu22> | main thing is that if you look at http://archivebot.com/pipelines most machines are full |
| 21:01:57 | <@arkiver> | we need an "ArchiveBot talk" channel |
| 21:02:20 | <@JAA> | arkiver: #down-the-tube and AB used the same rsync target. The former clogged it. |
| 21:02:31 | <@arkiver> | ah |
| 21:02:42 | <@arkiver> | JAA: how about that archivebot talk channel? |
| 21:02:44 | <@JAA> | That comes up every few months or so. It'd be mostly a dead channel probably. |
| 21:03:09 | <@arkiver> | i usually miss messages someone posts to me in #archivebot |
| 21:03:10 | <@arkiver> | oh well |
| 21:03:25 | <@arkiver> | warning to all ^ if I need to really notice the message, don't write to me in #archivebot |
| 21:03:33 | <@JAA> | Make your client log highlights into a separate window. :-) |
| 21:03:55 | <pokechu22> | Relevant messages are at 03:47:37 UTC on June 29 |
| 21:04:23 | <vokunal|m> | This is what I've been using to check. Is this known as a good way to see if they're clogged? https://monitor.archive.org/weathermap/weathermap.html |
| 21:04:57 | <pokechu22> | I don't think the rsync targets would be on there as they're archiveteam infrastructure, but I'm not 100% sure of that |
| 21:05:14 | <vokunal|m> | the switchtc0-200paul has been in the red for around 30+ hours |
| 21:05:17 | <that_lurker> | JAA That is the one thing from znc I would like to have on thelounge |
| 21:05:26 | <@arkiver> | JAA: that would be something i need to figure out and not doing that now |
| 21:05:30 | <fireonlive> | did someone say that archive.org had an issue with (or intentionally?) limited inbound speed? |
| 21:05:38 | <@arkiver> | vokunal|m: no, there can be many reasons |
| 21:05:41 | <fireonlive> | that was oof a while ago though |
| 21:08:07 | <pokechu22> | Oh, it was also mentioned that https://yarus.ru/ was shutting down shortly per https://yarus.ru/post/1989728469 - there's an AB job for it, but there's basically no chance it'll finish completely :| |
| 21:09:52 | | hitgrr8 quits [Client Quit] |
| 21:10:55 | <pokechu22> | ugh, it looks like that site's also JS-based so AB's not going to get anything useful :| (and I think I pushed it too hard and am now getting 403s :|) |
| 21:12:36 | <that_lurker> | no wonder google translate did not work on it :P |
| 21:15:41 | <vokunal|m> | Yeah I was wondering why it wasn't working |
| 21:18:44 | <that_lurker> | Oh and just found out The Lounge has a recent mentions feature |
| 21:19:07 | <that_lurker> | thats convenient |
| 21:19:27 | <fireonlive> | indeed! the @ symbol |
| 21:24:53 | <@arkiver> | pokechu22: checking |
| 21:25:28 | <@arkiver> | pokechu22: are you planning to pull tinaja.com through AB later? |
| 21:25:46 | <pokechu22> | It turns out it was already running in AB since yesterday |
| 21:26:06 | <@arkiver> | oof just seeing yarus in my browser with that loading screen... oof oof |
| 21:26:39 | <@arkiver> | what |
| 21:26:41 | <@arkiver> | June 30? |
| 21:26:49 | <@arkiver> | not again |
| 21:26:50 | <pokechu22> | Several hours ago it was 18 hours |
| 21:26:57 | <pokechu22> | frankly I think it's not possible to get it done |
| 21:27:01 | <pokechu22> | It does have a complete sitemap though |
| 21:27:23 | <@arkiver> | they posted the message you linked today? |
| 21:27:28 | <@arkiver> | for a shutdown tomorrow? |
| 21:28:09 | <pokechu22> | nyuuzyou: ^ |
| 21:28:17 | <pokechu22> | It seems like that's the case though |
| 21:28:26 | <@arkiver> | rewby: are you around? |
| 21:28:36 | <@arkiver> | i'm not sure if we can get a project up in time |
| 21:28:45 | <@arkiver> | but we might need a target for a shutdown tomorrow... announced today :( |
| 21:29:07 | <@rewby|backup> | I'll get you a target if you get a tracker proj and vars in... 30 mins |
| 21:29:14 | <@arkiver> | woah sequential post IDs? |
| 21:29:15 | <@arkiver> | i like it |
| 21:29:16 | <pokechu22> | "У вас будет время сохранить весь свой контент" - "You will have time to save all your content." yeah, sure... |
| 21:29:33 | <pokechu22> | Sequential IDs and a full sitemap as far as I can tell |
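Sequential post IDs make item generation trivial: the ID space can be chunked into fixed-size ranges for the tracker. A sketch; the `post:lo-hi` naming is hypothetical, not the project's actual item format:

```python
def id_range_items(start, end, size=1000, prefix="post:"):
    """Yield tracker-style item names covering [start, end] in chunks
    of `size` sequential IDs each."""
    for lo in range(start, end + 1, size):
        hi = min(lo + size - 1, end)
        yield f"{prefix}{lo}-{hi}"
```

Gaps in the sequence only cost cheap 404s, which fits the usual "better discover too much than too little" approach.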
| 21:29:44 | <pokechu22> | but on the other hand, javascript |
| 21:29:57 | <@JAA> | <dr_evil_air_quotes.gif> |
| 21:30:02 | <@arkiver> | i'm always skeptical about sitemaps |
| 21:30:43 | <@arkiver> | rewby|backup: alright |
| 21:32:54 | <imer> | they seem to have a rate limit (on api. at least), returns a standard nginx 403 |
| 21:32:54 | <imer> | and now that's changed to another 403 page |
| 21:34:20 | <@arkiver> | imer: proper status code? |
| 21:34:27 | <imer> | yep |
| 21:34:30 | <imer> | 403 |
| 21:34:39 | <@arkiver> | good |
| 21:34:51 | <imer> | here's the content of the non-nginx 403: https://transfer.archivete.am/hyaCY/2023-06-29_23-34-40_wmbgyH3GLo.txt |
| 21:34:57 | <imer> | i've censored my ip with XXX |
| 21:35:57 | <pokechu22> | Archivebot is still getting 403s a while after con=6, d=0 (that wasn't using the API and in fact wasn't even trying to retrieve stuff from the API, though) |
| 21:36:08 | <fireonlive> | ok everyone gather around for a picture |
| 21:36:17 | <fireonlive> | an api actually used a proper http status code |
| 21:36:18 | <imer> | block doesn't seem to be shared across domains, but obviously the site won't work |
| 21:36:20 | <fireonlive> | we need to remember this moment |
| 21:37:37 | <@arkiver> | interesting |
| 21:37:42 | <imer> | i'll keep checking if I get unblocked |
| 21:37:43 | <@arkiver> | IDs sequential with a huge sudden gap |
| 21:38:07 | <imer> | response headers: https://transfer.archivete.am/mTei1/2023-06-29_23-37-36_Qp3eqSS4hN.png content-type is proper as well |
| 21:38:35 | <imer> | no ipv6 (why do I even bother checking this) |
| 21:39:17 | <fireonlive> | one day you'll be rewarded |
| 21:39:21 | <fireonlive> | it's like finding a rare coin |
| 21:39:30 | <fireonlive> | the toyota yarus, https://en.wikipedia.org/wiki/Toyota_Yaris |
| 21:39:32 | <fireonlive> | lol |
| 21:40:33 | <imer> | do we have a channel name yet? i'll throw into the hat #norus if not |
| 21:40:52 | <fireonlive> | nop |
| 21:40:55 | <imer> | words i can arrange sentence to |
| 21:40:56 | <fireonlive> | mine was #yaaaaaaaaasus but that's kinda gay |
| 21:40:57 | <fireonlive> | :p |
| 21:41:01 | <@arkiver> | imer: see what i wrote earlier ;) |
| 21:41:02 | <fireonlive> | also not punny enough |
| 21:41:19 | <@arkiver> | #norus it is |
| 21:41:31 | <fireonlive> | arkiver: you were in the tiki channel |
| 21:41:32 | <fireonlive> | :D |
| 21:42:00 | <@arkiver> | HEY EVERYONE! JAA is not in #norus , let's party there. no one tell JAA please!! |
| 21:54:10 | <h2ibot> | JustAnotherArchivist created ЯRUS (+194, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=%D0%AFRUS |
| 21:55:11 | <h2ibot> | JustAnotherArchivist created Yarus.ru (+19, Redirected page to [[ЯRUS]]): https://wiki.archiveteam.org/?title=Yarus.ru |
| 21:57:36 | | killsushi quits [Ping timeout: 265 seconds] |
| 22:04:12 | <h2ibot> | Pcr edited List of websites excluded from the Wayback Machine (+26, Add TH3D): https://wiki.archiveteam.org/?diff=50063&oldid=49985 |
| 22:07:17 | <fireonlive> | :D |
| 22:10:39 | <thuban> | arkiver: wrt noise in #archivebot, if you use weechat, there are some filters at https://wiki.archiveteam.org/index.php/User:Switchnode |
| 22:50:23 | | Unholy236131 (Unholy2361) joins |
| 22:53:35 | | Unholy23613 quits [Ping timeout: 252 seconds] |
| 22:53:35 | | Unholy236131 is now known as Unholy23613 |
| 23:09:18 | | andrew4 (andrew) joins |
| 23:10:38 | | andrew quits [Ping timeout: 252 seconds] |
| 23:10:38 | | andrew4 is now known as andrew |
| 23:52:21 | | Megame (Megame) joins |