00:07:33 | | etnguyen03 quits [Client Quit] |
00:22:09 | | sec^nd quits [Remote host closed the connection] |
00:22:27 | | sec^nd (second) joins |
00:29:06 | | Sluggs quits [Excess Flood] |
00:38:04 | | Sluggs joins |
00:44:46 | | Sluggs quits [Excess Flood] |
00:49:27 | | Sluggs joins |
00:57:37 | | etnguyen03 (etnguyen03) joins |
01:00:27 | | BlueMaxima quits [Read error: Connection reset by peer] |
01:00:37 | | BlueMaxima joins |
01:06:01 | <h2ibot> | PaulWise edited Finding subdomains (+0, more automated securitytrails scraper): https://wiki.archiveteam.org/?diff=54540&oldid=54538 |
01:25:12 | <h2ibot> | PaulWise edited Finding subdomains (+0, even more automated securitytrails scraper): https://wiki.archiveteam.org/?diff=54541&oldid=54540 |
01:49:18 | | etnguyen03 quits [Client Quit] |
01:51:57 | | etnguyen03 (etnguyen03) joins |
02:04:18 | <h2ibot> | PaulWise edited ArchiveBot/Monitoring (+64, free.fr rearchiving): https://wiki.archiveteam.org/?diff=54542&oldid=54537 |
02:12:19 | <h2ibot> | PaulWise edited ArchiveBot/Monitoring (+51, Software related files: README): https://wiki.archiveteam.org/?diff=54543&oldid=54542 |
03:01:49 | | notarobot1 quits [Quit: The Lounge - https://thelounge.chat] |
03:02:12 | | notarobot1 joins |
03:20:11 | | katocala is now authenticated as katocala |
03:26:01 | | BlueMaxima quits [Read error: Connection reset by peer] |
04:12:17 | | Webuser782474 joins |
04:15:03 | <Flashfire42> | how far off is optane 10 being back? |
04:48:13 | | BornOn420 quits [Ping timeout: 276 seconds] |
05:04:55 | | Webuser935245 joins |
05:05:40 | | gust quits [Read error: Connection reset by peer] |
05:06:33 | | Webuser935245 quits [Client Quit] |
05:09:58 | | Webuser782474 quits [Client Quit] |
05:10:40 | | etnguyen03 quits [Remote host closed the connection] |
05:25:46 | | night quits [Quit: goodbye] |
05:27:38 | | lennier2_ quits [Ping timeout: 260 seconds] |
05:28:46 | | lennier2_ joins |
05:58:01 | | night joins |
05:58:01 | | night is now authenticated as night |
06:00:55 | <h2ibot> | Sighsloth1090 edited UK Online Safety Act 2023 (+1260, Cataloged sites from ycombinator list and…): https://wiki.archiveteam.org/?diff=54544&oldid=54536 |
06:08:58 | | night quits [Client Quit] |
06:14:24 | | night joins |
06:14:24 | | night is now authenticated as night |
06:26:59 | <h2ibot> | Hook54321 edited Bluesky (+320, updated since it's a more established social…): https://wiki.archiveteam.org/?diff=54545&oldid=53732 |
06:46:28 | | BornOn420 (BornOn420) joins |
06:48:29 | | i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat] |
06:51:34 | | i_have_n0_idea (i_have_n0_idea) joins |
07:51:38 | | lennier2 joins |
07:54:38 | | lennier2_ quits [Ping timeout: 260 seconds] |
08:02:59 | | DopefishJustin quits [Remote host closed the connection] |
08:06:52 | | Wohlstand quits [Remote host closed the connection] |
08:10:54 | | wyatt8740 quits [Ping timeout: 250 seconds] |
08:13:02 | | wyatt8740 joins |
08:31:22 | | Island quits [Read error: Connection reset by peer] |
08:43:06 | | Island joins |
08:44:26 | <h2ibot> | Exorcism edited 竹白 (+40): https://wiki.archiveteam.org/?diff=54546&oldid=54515 |
09:19:18 | | Island quits [Read error: Connection reset by peer] |
09:23:16 | | APOLLO03 quits [Ping timeout: 250 seconds] |
10:01:30 | | BornOn420 quits [Remote host closed the connection] |
10:02:11 | | BornOn420 (BornOn420) joins |
10:36:12 | | DopefishJustin joins |
10:36:12 | | DopefishJustin is now authenticated as DopefishJustin |
10:37:09 | | DopefishJustin quits [Remote host closed the connection] |
10:42:04 | | DopefishJustin joins |
10:42:04 | | DopefishJustin is now authenticated as DopefishJustin |
10:48:15 | | mete quits [Remote host closed the connection] |
10:49:46 | <h2ibot> | Exorcism edited 竹白 (+103): https://wiki.archiveteam.org/?diff=54547&oldid=54546 |
10:51:33 | | mete joins |
11:16:47 | | pseudorizer quits [Quit: ZNC 1.9.1 - https://znc.in] |
11:17:57 | | APOLLO03 joins |
11:18:09 | | pseudorizer (pseudorizer) joins |
12:00:02 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:47 | | Bleo18260072271962345 joins |
12:04:28 | | benjins3 quits [Ping timeout: 250 seconds] |
12:12:42 | | i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat] |
12:13:08 | | i_have_n0_idea (i_have_n0_idea) joins |
12:26:41 | | simon816 quits [Quit: ZNC 1.9.1 - https://znc.in] |
12:30:50 | | simon816 (simon816) joins |
12:34:24 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:34:56 | | SkilledAlpaca418962 joins |
12:53:12 | | benjins3 joins |
12:54:36 | | etnguyen03 (etnguyen03) joins |
13:09:57 | | etnguyen03 quits [Client Quit] |
13:17:00 | | vitzli (vitzli) joins |
13:20:08 | | vitzli quits [Client Quit] |
13:50:08 | | Webuser952334 joins |
13:50:49 | | Webuser952334 quits [Client Quit] |
14:53:09 | | penguaman joins |
14:54:12 | | kdy quits [Remote host closed the connection] |
14:54:46 | | Mist8kenGAS quits [Ping timeout: 250 seconds] |
14:57:50 | | kdy (kdy) joins |
15:08:36 | | Mist8kenGAS (Mist8kenGAS) joins |
15:21:28 | | corentin quits [Ping timeout: 260 seconds] |
15:25:16 | | corentin joins |
15:28:26 | | notarobot1 quits [Quit: The Lounge - https://thelounge.chat] |
15:29:38 | | notarobot1 joins |
15:40:08 | | corentin quits [Ping timeout: 260 seconds] |
15:42:40 | | corentin joins |
16:04:03 | | corentin quits [Ping timeout: 260 seconds] |
16:17:41 | <h2ibot> | Exorcism edited 竹白 (+380, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54548&oldid=54547 |
16:21:42 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54549&oldid=54548 |
16:25:42 | <h2ibot> | Exorcism edited 竹白 (+1, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54550&oldid=54549 |
16:27:43 | <h2ibot> | Exorcism edited 竹白 (-1, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54551&oldid=54550 |
16:33:25 | | corentin joins |
16:34:44 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54552&oldid=54551 |
16:38:44 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54553&oldid=54552 |
16:40:55 | | Dango360 quits [Read error: Connection reset by peer] |
16:42:45 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54554&oldid=54553 |
16:43:45 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54555&oldid=54554 |
16:45:45 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54556&oldid=54555 |
16:47:54 | | Dango360 (Dango360) joins |
16:57:47 | <h2ibot> | Exorcism edited 竹白 (-179): https://wiki.archiveteam.org/?diff=54557&oldid=54556 |
17:15:50 | | Webuser464891 joins |
17:15:53 | | Webuser464891 quits [Client Quit] |
17:19:18 | | Mist8kenGAS quits [Ping timeout: 260 seconds] |
17:19:42 | | Mist8kenGAS (Mist8kenGAS) joins |
17:21:04 | <yzqzss> | Exorcism: I don't think AB can handle zhubai |
17:21:30 | <yzqzss> | It's simple, but it's still a SPA site
17:39:02 | <Exorcism> | Yeah that's why i removed the archived list |
17:50:19 | | @arkiver quits [Remote host closed the connection] |
17:50:34 | | arkiver (arkiver) joins |
17:50:34 | | @ChanServ sets mode: +o arkiver |
17:54:56 | <Ryz> | Any updates on tackling http://zapytaj.onet.pl/ ? I'm not sure if it's tossed in AB or something |
17:58:36 | <Exorcism> | Ryz: https://irc.digitaldragon.dev/uploads/fb797d126874feec/image.png |
18:02:08 | | Webuser978503 joins |
18:03:35 | <Webuser978503> | test |
18:09:28 | <Webuser978503> | Hello! Plala, a Japanese web hosting service that has been around since the '90s, will be closing at the end of this month. I have been working on a private list of files to be included in my website.
18:09:28 | <Webuser978503> | I will almost certainly not be able to register it with InternetArchive in time. |
18:09:28 | <Webuser978503> | I would like to request a special crawling service, providing a list of URLs for images, HTML, etc.
18:09:28 | <Webuser978503> | Is there a place I can request this? |
18:09:28 | <Webuser978503> | I understand that this IRC is not an archive team. |
18:09:28 | <Webuser978503> | https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions |
18:09:28 | <Webuser978503> | The official announcement that the Plala website will be closed and the URLs of the related news are as follows. All of them are in Japanese. |
18:09:29 | <Webuser978503> | https://www.docomo.ne.jp/info/notice/page/240627_01.html |
18:09:29 | <Webuser978503> | https://www.itmedia.co.jp/news/articles/2503/04/news125.html |
18:10:47 | <Ryz> | Oh no, another free web hosting website shutting down? Uh-oh :( |
18:22:18 | | tzt quits [Ping timeout: 260 seconds] |
18:23:37 | | tzt (tzt) joins |
18:32:49 | <Webuser978503> | It is very unfortunate that the homepage service is closing. I am considering using the InternetArchive API to archive the URLs, but I am aware that this API can only handle a few URLs at a time. Is there any way to register via the spreadsheet as well, since that can only handle registrations in the thousands?
18:33:35 | <pokechu22> | If you have a list of URLs, one url per line, we can do that via #archivebot. You can upload it to transfer.archivete.am |
18:34:04 | <pokechu22> | We can also do recursive crawls of sites using #archivebot |
18:43:00 | <Webuser978503> | I did not know about that web service, thank you! I just tried saving one URL to a text file and uploading it to https://transfer.sh/ with curl, but got "Failed to connect to transfer.sh port 443 after 3448 ms: Could not connect to server". I will look into this some more.
18:43:44 | <nicolas17> | what is transfer.sh? it doesn't seem to exist |
18:44:00 | <nicolas17> | as a domain |
18:44:41 | <nicolas17> | use transfer.archivete.am |
18:49:34 | | ducky quits [Ping timeout: 260 seconds] |
18:49:54 | <pokechu22> | and you can just upload it via HTTP with the "click to browse" link |
18:50:08 | <pokechu22> | that's what I do most of the time |
18:53:22 | <Webuser978503> | Saved a text file with one URL; curl --upload-file ./hello-41611.txt https://transfer.archivete.am/hello-41611.txt succeeded: https://transfer.archivete.am/EieQn/hello-41611.txt
18:53:23 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/hello-41611.txt https://transfer.archivete.am/inline/EieQn/hello-41611.txt |
18:53:24 | <Webuser978503> | The text file contains this one URL: http://www9.plala.or.jp/applepig/tanpen/toraburu.html . This URL is not registered in InternetArchive at this time. I hope it will be registered in InternetArchive in a few hours.
18:54:43 | <pokechu22> | Unfortunately it tends to take a little longer for data from archivebot to reach web.archive.org - you should see it on https://archive.fart.website/archivebot/viewer/job/ksnju in a few hours and on web.archive.org after that |
18:55:12 | <pokechu22> | (but on the other hand, archivebot can do lists of thousands of URLs fairly easily, so the extra time for it to reach web.archive.org is a trade off) |
18:56:13 | <Webuser978503> | The scraping bot I am creating has fetched about 300,000 URLs on plala. I expect these URLs to show up under https://web.archive.org/web/20140710060435/http://www9.plala.or.jp/~ in InternetArchive.
19:01:47 | <Webuser978503> | to pokechu22: Thanks for letting me know! There was a lot I didn't know, so I'll do some research on my own again... the disappearance of 1990s-era homepage space is a sad thing!
19:05:02 | <@rewby> | Webuser978503: What do you mean by "scraper bot"?
19:08:46 | <Webuser978503> | to rewby , I am using my own node.js program and wget to create URLs under the plala.or.jp domain. I already have about 10,000 URLs for www{\d}.plala.or.jp/[^/]+/, so I wget -r to each of those 10,000 in turn to get the image and html file URLs, and save them in my local sqlite. |
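[editor's note] The harvesting workflow Webuser978503 describes (crawl each directory, collect image/HTML URLs, deduplicate into SQLite) can be sketched as below. This is a minimal illustration only: the real setup uses node.js plus `wget -r`, and the table name, regex, and example URLs here are hypothetical.

```python
import re
import sqlite3

# Sketch: extract href/src links from fetched HTML and store them,
# deduplicated, in a local SQLite database (as described above).
LINK_RE = re.compile(r'(?:href|src)="([^"]+)"')

def harvest(conn: sqlite3.Connection, base: str, html: str) -> int:
    """Insert newly discovered URLs from `html`; return how many were new."""
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    before = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
    for link in LINK_RE.findall(html):
        # Resolve relative links against the page's base directory.
        url = link if link.startswith("http") else base.rstrip("/") + "/" + link
        conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
    return after - before
```

Running it twice over the same page inserts nothing the second time, since the primary key deduplicates.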
19:09:28 | <@rewby> | Webuser978503: The captures you're making that way will not appear in web.archive.org |
19:09:41 | <@rewby> | (Which your earlier message implied is your goal) |
19:13:10 | <Webuser978503> | rewby: Yes, that is correct. At this time the scraped data exists only on my PC and has no effect on web.archive.org. So I am looking for a way to get the huge list of URLs I have scraped into web.archive.org in a realistic amount of time.
19:13:17 | <Webuser978503> | If it were only a few thousand, I could register them with the API using a spreadsheet, but by my estimation it is not realistic to register more than 300,000 URLs that way.
19:13:54 | <@rewby> | Webuser978503: If you can feed us a list of urls in a .txt file with the format of "one url per line", we can easily get that done in a few days. |
19:14:04 | <@rewby> | 300k urls is like, a morning of work |
19:14:24 | <@rewby> | As pokechu22 indicated, the bots we have can take care of that easily |
19:15:26 | <nicolas17> | transfer.archivete.am is not an automatic crawler, it's just a way for you to share the list with us |
19:15:55 | <@rewby> | Once our bots have grabbed the page content, it'll make it to web.archive.org. Might have a few days latency between capture and ingest, the IA's not the fastest system. |
19:16:46 | <@rewby> | Yep, transfer.archivete.am is just a file sharing utility. It doesn't do anything on its own. |
19:17:16 | | gust joins |
19:19:42 | <pokechu22> | It's similar to pastebin.com or gist.github.com |
19:21:10 | <Webuser978503> | rewby: Thank you very much! I would like to send you a list of URLs as soon as possible, but where do I upload the file? It is a 7z of about 12 MB!
19:21:35 | <@rewby> | As we said, transfer.archivete.am |
19:21:46 | | egallager joins |
19:21:53 | <@rewby> | You upload a file to it, and it gives you a link you can paste in chat |
19:22:07 | <@rewby> | And someone will take the file and run with it |
19:22:12 | <@rewby> | No need to compress the file |
19:22:40 | <egallager> | What's the archiving status of 538? https://bsky.app/profile/gelliottmorris.com/post/3ljuzixmxak2p |
19:24:20 | <pokechu22> | We've got archivebot jobs running for it currently, and also did it last year (after it was read-only) |
19:24:36 | <egallager> | ok good |
19:25:17 | | DopefishJustin quits [Remote host closed the connection] |
19:25:22 | <pokechu22> | that includes the data from https://data.fivethirtyeight.com/ and https://github.com/fivethirtyeight/data/ and also articles at https://fivethirtyeight.com/?archiveteam (which doesn't redirect as long as there's a query parameter it seems?) |
19:25:24 | <Webuser978503> | Understood. Uploaded: https://transfer.archivete.am/Lg6bN/export-url-list-result-html-only.txt . Can I wait a few days and expect it to be registered at web.archive.org?
19:25:24 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/Lg6bN/export-url-list-result-html-only.txt |
19:25:56 | <pokechu22> | (the articles can also be discovered from https://fivethirtyeight.com/sitemap.xml) |
19:26:30 | <pokechu22> | Webuser978503: Yes, it should end up on web.archive.org in a few days. |
19:26:37 | <nicolas17> | Webuser978503: how are you discovering the pages anyway? |
19:26:37 | | ducky (ducky) joins |
19:28:23 | <Webuser978503> | pokechu22: Thank you very much! Actually, the list of uploaded URLs is incomplete because I made a mistake with the wget command. I am trying to retrieve the complete list again, but it will take about a week. I will upload the URL list again at that time. Do I need to tell this IRC that I uploaded it and request it, or is this completely automated?
19:28:27 | <@rewby> | pokechu22, Webuser978503: There's some certified upload fuckery currently ongoing so expect more like a 7-14 days before the IA finishes processing. We're doing some traffic engineering and letting AB cache on disk for a bit. |
19:28:52 | <@rewby> | Webuser978503: Upload it again and send it here |
19:29:34 | <@rewby> | (The delay is due to us trying to work around the IA's upload system being quite... full...) |
19:29:45 | <Webuser978503> | to nicolas17: plala's homepage service has published a list of URLs it has hosted in the past. I retrieved that page from InternetArchive and scraped it.
19:29:48 | <pokechu22> | If you have a new list, you can share it here. I'm going to try to do a recursive job so that it will attempt to discover more URLs based on your current list though so hopefully everything will be discovered faster |
19:30:09 | <nicolas17> | well we can use that list alone, we do our own recursive crawling anyway |
19:30:16 | <nicolas17> | you don't need to do your own wget -r from it |
19:31:05 | | gust quits [Remote host closed the connection] |
19:31:07 | <Webuser978503> | to rewby , We now understand that scraping takes 7-14 days. Hopefully the scraping will be done in time as plala will be shutting down at the end of this month! |
19:31:19 | <@rewby> | The scraping will likely be faster than that |
19:31:24 | | gust joins |
19:31:27 | <@rewby> | But the upload to the IA will be slow |
19:31:35 | <pokechu22> | IA = web.archive.org
19:31:42 | <pokechu22> | (more specifically, IA = internet archive) |
19:32:21 | <@rewby> | To use an analogy, the capture/picture is taken quite quickly. But we need to then upload data to web.archive.org/develop the picture. And this can take some time. But it's fine if that takes time, since we already captured the moment. |
19:37:03 | <pokechu22> | Webuser978503: What about pages like http://academic3.plala.or.jp/uragaku/ ? |
19:37:14 | <Webuser978503> | rewby, re "Upload it again and send it here": Perhaps a week from now, when the plala URL list has been completed, I will upload the URL text to transfer.archivete.am and send it to this IRC. Do I just reply to you then?
19:37:21 | <pokechu22> | and http://business3.plala.or.jp/aid/ (which redirects to https://www.asahikawainochi.or.jp/) |
19:38:02 | <@rewby> | Webuser978503: No need to ping/reply to me specifically. |
19:38:10 | <@rewby> | Just send it in here and someone will pick it up |
19:39:44 | <Webuser978503> | to pokechu22: I had no idea about "http://academic3.plala.or.jp/uragaku/". I was shocked to learn there is a URL I forgot to check. Thanks for letting me know!
19:42:53 | <Webuser978503> | to rewby: Understood. I have a more technical question: do you mean that after I register a list of URLs on transfer.archivete.am and post the published URL to this IRC, someone else will register it in another system?
19:43:24 | <pokechu22> | Yes |
19:43:40 | <pokechu22> | That is correct |
19:44:23 | | SootBector quits [Remote host closed the connection] |
19:44:46 | | SootBector (SootBector) joins |
19:44:54 | | Island joins |
19:46:11 | <pokechu22> | I also see http://www.t-gesui.hs.plala.or.jp/gesuidoukouhoukatudounituite.html http://chb1018.hs.plala.or.jp/disaster/index.html and other stuff under hs.plala.or.jp |
19:47:23 | <Webuser978503> | to pokechu22: Thank you very much. Sorry for asking such a complicated question. Is there any way to know whether a URL has been registered as a scraping target, or what percentage of the scraping has completed? I checked https://archive.fart.website/archivebot/viewer/ but could not find it.
19:48:14 | <pokechu22> | It will show up on https://archive.fart.website/archivebot/viewer/ eventually, but that also takes several hours |
19:54:44 | | DopefishJustin joins |
19:54:44 | | DopefishJustin is now authenticated as DopefishJustin |
19:54:53 | | DopefishJustin quits [Remote host closed the connection] |
19:54:55 | <pokechu22> | Webuser978503: You can view download progress at http://archivebot.com/?initialFilter=plala |
19:57:01 | <pokechu22> | hmm, actually, that job seems to not have loaded all of the URLs properly |
20:00:54 | <Webuser978503> | oh,... |
20:01:31 | <pokechu22> | I should be able to fix it |
20:09:41 | <Webuser978503> | Thank you. Is there an appropriate place to talk about something like this in the future? Are there other sites that you are looking at? I don't use IRC on a regular basis, so it's hard for me to write multiple lines, and when I reload my browser the messages disappear.
20:12:27 | <pokechu22> | The new job is running: http://archivebot.com/?initialFilter=plala (will end up at https://archive.fart.website/archivebot/viewer/job/1dzgs and web.archive.org eventually) |
20:13:06 | <pokechu22> | This is the correct channel to talk about websites that are shutting down. We have mainly been looking at US government stuff recently |
20:14:31 | <Webuser978503> | Understood. The current turmoil in the U.S. has been reported in Japan. |
20:16:17 | <h2ibot> | Bzc6p edited Indafotó (+260, /* Status */ we may get all images): https://wiki.archiveteam.org/?diff=54558&oldid=54514 |
20:16:18 | <h2ibot> | Pokechu22 edited Deathwatch (+202, /* 2025 */ plala.or.jp): https://wiki.archiveteam.org/?diff=54559&oldid=54539 |
20:18:53 | <Webuser978503> | I'm looking at https://transfer.archivete.am/Yqlqx/www1.plala.or.jp_thru_www17.plala.or.jp_seed_urls.txt. Looking at the sitemap XML, it registers about 40,000 URLs per file. Is this the list of URLs registered based on the list I uploaded? Thanks deeply.
20:18:54 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/Yqlqx/www1.plala.or.jp_thru_www17.plala.or.jp_seed_urls.txt. |
20:19:24 | <pokechu22> | Yes, that is based on the URLs you uploaded; I changed it into an XML sitemap for technical reasons relating to how archivebot works |
20:21:08 | <nicolas17> | pokechu22: I'm interested in details, what's the difference between feeding archivebot a plain list and an XML sitemap? |
20:22:59 | <pokechu22> | If I do an XML sitemap like that, I can do an !a < list job with URLs all over the site without needing to deal with --no-parent breaking recursion (since --no-parent would only affect urls on transfer.archivete.am and not on plala.or.jp). |
20:23:36 | <pokechu22> | That does require creating a valid sitemap though (including converting & to &amp; and apparently also splitting it at 50,000 URLs, per https://www.sitemaps.org/protocol.html)
20:25:10 | <pokechu22> | huh, apparently XML sitemaps have an implicit equivalent of --no-parent: https://www.sitemaps.org/protocol.html#location - I feel like a lot of sites don't handle that properly (and I don't think archivebot cares about that either) |
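[editor's note] The URL-list-to-sitemap conversion pokechu22 describes can be sketched as below: escape XML special characters and split the list at the protocol's 50,000-URL-per-file limit. This is an illustration of the idea, not ArchiveBot's actual tooling.

```python
from xml.sax.saxutils import escape

# Sitemap protocol limit: at most 50,000 URLs per sitemap file.
CHUNK = 50_000

def make_sitemaps(urls: list[str]) -> list[str]:
    """Return sitemap XML documents, each holding at most 50,000 URLs,
    with &, <, > escaped per https://www.sitemaps.org/protocol.html."""
    sitemaps = []
    for i in range(0, len(urls), CHUNK):
        entries = "\n".join(
            f"  <url><loc>{escape(u)}</loc></url>" for u in urls[i:i + CHUNK]
        )
        sitemaps.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>"
        )
    return sitemaps
```

A 50,001-entry list yields two files; a URL containing `&` comes out as `&amp;` in the XML.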
20:26:38 | <Webuser978503> | Thank you very much. I am relieved. I am interested in archiving my site, so I may ask you to register it on archivebot again. So, good-bye. |
20:26:55 | <pokechu22> | Thank you for letting us know :) |
20:28:50 | | phiresky joins |
20:32:04 | <Vokun> | It's remarkable how formal a literal translation from Japanese to English becomes
20:32:35 | | driib9 quits [Quit: The Lounge - https://thelounge.chat] |
20:34:36 | <Vokun> | I took a few Japanese language classes in high school and had some foreign exchange students over, both of us trying to learn from each other, and it seems classes in general teach a form of language that no one actually speaks. I spoke some form of extremely formal Japanese, and they spoke an extremely formal form of English
20:34:41 | <phiresky> | Hey all! Just randomly found out that if you put in a phone number into instagram (plus and then a 4-10 digit number) it will show you random pictures of a certain category. The results do not correspond to account, hashtags, or description, so must be linked to some hidden (AI created probably) metadata. For example: |
20:34:41 | <phiresky> | +3626267274 is watches |
20:34:41 | <phiresky> | +1284662791 is maths |
20:34:41 | <phiresky> | +726267488176 is sports cars |
20:34:41 | <phiresky> | +6747266394749 is old women with big boobs |
20:34:41 | <phiresky> | might be interesting for heuristic scraping, and would be interesting to figure out why this happens. does not seem to work on desktop search, only mobile |
20:34:55 | | driib9 (driib) joins |
20:36:14 | | phiresky quits [Client Quit] |
20:38:49 | <pokechu22> | hmm, the archivebot job is currently only hitting http://www1.plala.or.jp/ and http://www10.plala.or.jp/, and http://www1.plala.or.jp/ seems to be running into errors sometimes. I'm going to shuffle the list and then start it one more time so that it does stuff on all 17 servers more evenly |
20:45:15 | <pokechu22> | the new job is https://archive.fart.website/archivebot/viewer/job/27rjj |
21:03:08 | | etnguyen03 (etnguyen03) joins |
21:04:40 | | Wohlstand (Wohlstand) joins |
21:16:00 | | etnguyen03 quits [Client Quit] |
21:19:13 | | BlueMaxima joins |
21:29:14 | | itachi1706 quits [Quit: Bye :P] |
21:34:50 | | itachi1706 (itachi1706) joins |
21:37:38 | | etnguyen03 (etnguyen03) joins |
21:59:49 | | etnguyen03 quits [Client Quit] |
22:00:24 | | etnguyen03 (etnguyen03) joins |
22:06:33 | | Webuser021842 joins |
22:06:50 | | Webuser021842 quits [Client Quit] |
22:23:13 | | corentin quits [Ping timeout: 260 seconds] |
22:24:21 | | wb joins |
22:26:34 | | corentin joins |
22:27:40 | <wb> | hello, can someone guide me how to find a file from a tindeck archives? https://archive.org/details/archiveteam_tindeck |
22:28:15 | <wb> | i know the original url but i do not know where to look for it in the archives
22:30:48 | | APOLLO03 quits [Ping timeout: 260 seconds] |
22:31:31 | <pokechu22> | For most projects you should just be able to plug the URL in like https://web.archive.org/web/*/https://example.com |
22:40:48 | | Wohlstand quits [Client Quit] |
22:42:13 | <wb> | yes i tried that, but i cant access the file https://web.archive.org/web/20131106033836/http://tindeck.com/listen/llal |
22:44:50 | <nicolas17> | then it probably wasn't archived |
22:46:19 | | etnguyen03 quits [Client Quit] |
22:47:25 | <nicolas17> | I just tried another song where /dl/ *was* archived |
22:47:55 | <nicolas17> | and it just says "Your download will start in 10 seconds. Having trouble? Try this direct link." but the direct link redirects back to /listen/ |
22:48:49 | <nicolas17> | I'm suspicious... was any song actually saved here |
22:50:14 | <wb> | i did the same with some other url, and it actually started to download :| |
22:50:24 | | sediment joins |
22:50:30 | <wb> | but i dont know if i can rely on the web archive |
22:51:10 | <wb> | is there no complete index for a project that i could check? |
22:51:58 | <nicolas17> | https://web.archive.org/web/*/http://tindeck.com/dl/* :P |
22:53:10 | <pokechu22> | Hmm, that's only listed in archivebot and two archive.org crawls, not https://archive.org/details/archiveteam_tindeck
22:53:25 | <pokechu22> | (based on the "about this capture" section) |
22:53:26 | <wb> | yea |
22:54:00 | <nicolas17> | hm maybe there's multiple captures and *some* are broken |
22:54:02 | <pokechu22> | And the archivebot capture got a 404 on it actually |
22:54:55 | <pokechu22> | ah, https://wiki.archiveteam.org/index.php/Tindeck says there were also false 404s when the site was broken so the archivebot job is incomplete :| |
22:56:03 | <nicolas17> | I have had the same problem with files that got saved properly via archivebot, but then someone sent the same URL to savepagenow and that capture doesn't work |
22:57:11 | | sediment quits [Client Quit] |
23:11:59 | | loug83181422 quits [Quit: The Lounge - https://thelounge.chat] |
23:36:47 | | sparky14928 (sparky1492) joins |
23:40:24 | | sparky1492 quits [Ping timeout: 250 seconds] |
23:40:25 | | sparky14928 is now known as sparky1492 |
23:50:05 | | APOLLO03 joins |
23:54:13 | | Riku_V quits [Ping timeout: 260 seconds] |
23:57:06 | | Riku_V (riku) joins |
23:58:14 | <pokechu22> | wb: I checked all of the CDX for https://archive.org/details/archiveteam_tindeck and /llal doesn't appear in it at all, so I guess it wasn't saved :/ |
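[editor's note] The capture check pokechu22 performed can be reproduced against the public Wayback CDX API; below is a sketch that only builds the query string (`url`, `output`, and `limit` are standard CDX parameters). Fetching the resulting URL returns one capture record per line, and an empty body means no captures, as with the tindeck /listen/llal page.

```python
from urllib.parse import urlencode

# Public endpoint of the Wayback Machine's CDX capture index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query(url: str, limit: int = 10) -> str:
    """Build a CDX API query URL listing captures of `url` (JSON output)."""
    return CDX_ENDPOINT + "?" + urlencode(
        {"url": url, "output": "json", "limit": str(limit)}
    )
```

For example, `cdx_query("http://tindeck.com/listen/llal")` produces a URL you can open in a browser to see whether any capture exists.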