00:07:33 | | etnguyen03 quits [Client Quit] |
00:22:09 | | sec^nd quits [Remote host closed the connection] |
00:22:27 | | sec^nd (second) joins |
00:29:06 | | Sluggs quits [Excess Flood] |
00:38:04 | | Sluggs joins |
00:44:46 | | Sluggs quits [Excess Flood] |
00:49:27 | | Sluggs joins |
00:57:37 | | etnguyen03 (etnguyen03) joins |
01:00:27 | | BlueMaxima quits [Read error: Connection reset by peer] |
01:00:37 | | BlueMaxima joins |
01:06:01 | <h2ibot> | PaulWise edited Finding subdomains (+0, more automated securitytrails scraper): https://wiki.archiveteam.org/?diff=54540&oldid=54538 |
01:25:12 | <h2ibot> | PaulWise edited Finding subdomains (+0, even more automated securitytrails scraper): https://wiki.archiveteam.org/?diff=54541&oldid=54540 |
01:49:18 | | etnguyen03 quits [Client Quit] |
01:51:57 | | etnguyen03 (etnguyen03) joins |
02:04:18 | <h2ibot> | PaulWise edited ArchiveBot/Monitoring (+64, free.fr rearchiving): https://wiki.archiveteam.org/?diff=54542&oldid=54537 |
02:12:19 | <h2ibot> | PaulWise edited ArchiveBot/Monitoring (+51, Software related files: README): https://wiki.archiveteam.org/?diff=54543&oldid=54542 |
03:01:49 | | notarobot1 quits [Quit: The Lounge - https://thelounge.chat] |
03:02:12 | | notarobot1 joins |
03:20:11 | | katocala is now authenticated as katocala |
03:26:01 | | BlueMaxima quits [Read error: Connection reset by peer] |
04:12:17 | | Webuser782474 joins |
04:15:03 | <Flashfire42> | how far off is optane 10 being back? |
04:48:13 | | BornOn420 quits [Ping timeout: 276 seconds] |
05:04:55 | | Webuser935245 joins |
05:05:40 | | gust quits [Read error: Connection reset by peer] |
05:06:33 | | Webuser935245 quits [Client Quit] |
05:09:58 | | Webuser782474 quits [Client Quit] |
05:10:40 | | etnguyen03 quits [Remote host closed the connection] |
05:25:46 | | night quits [Quit: goodbye] |
05:27:38 | | lennier2_ quits [Ping timeout: 260 seconds] |
05:28:46 | | lennier2_ joins |
05:58:01 | | night joins |
05:58:01 | | night is now authenticated as night |
06:00:55 | <h2ibot> | Sighsloth1090 edited UK Online Safety Act 2023 (+1260, Cataloged sites from ycombinator list and…): https://wiki.archiveteam.org/?diff=54544&oldid=54536 |
06:08:58 | | night quits [Client Quit] |
06:14:24 | | night joins |
06:14:24 | | night is now authenticated as night |
06:26:59 | <h2ibot> | Hook54321 edited Bluesky (+320, updated since it's a more established social…): https://wiki.archiveteam.org/?diff=54545&oldid=53732 |
06:46:28 | | BornOn420 (BornOn420) joins |
06:48:29 | | i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat] |
06:51:34 | | i_have_n0_idea (i_have_n0_idea) joins |
07:51:38 | | lennier2 joins |
07:54:38 | | lennier2_ quits [Ping timeout: 260 seconds] |
08:02:59 | | DopefishJustin quits [Remote host closed the connection] |
08:06:52 | | Wohlstand quits [Remote host closed the connection] |
08:10:54 | | wyatt8740 quits [Ping timeout: 250 seconds] |
08:13:02 | | wyatt8740 joins |
08:31:22 | | Island quits [Read error: Connection reset by peer] |
08:43:06 | | Island joins |
08:44:26 | <h2ibot> | Exorcism edited 竹白 (+40): https://wiki.archiveteam.org/?diff=54546&oldid=54515 |
09:19:18 | | Island quits [Read error: Connection reset by peer] |
09:23:16 | | APOLLO03 quits [Ping timeout: 250 seconds] |
10:01:30 | | BornOn420 quits [Remote host closed the connection] |
10:02:11 | | BornOn420 (BornOn420) joins |
10:36:12 | | DopefishJustin joins |
10:36:12 | | DopefishJustin is now authenticated as DopefishJustin |
10:37:09 | | DopefishJustin quits [Remote host closed the connection] |
10:42:04 | | DopefishJustin joins |
10:42:04 | | DopefishJustin is now authenticated as DopefishJustin |
10:48:15 | | mete quits [Remote host closed the connection] |
10:49:46 | <h2ibot> | Exorcism edited 竹白 (+103): https://wiki.archiveteam.org/?diff=54547&oldid=54546 |
10:51:33 | | mete joins |
11:16:47 | | pseudorizer quits [Quit: ZNC 1.9.1 - https://znc.in] |
11:17:57 | | APOLLO03 joins |
11:18:09 | | pseudorizer (pseudorizer) joins |
12:00:02 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:47 | | Bleo18260072271962345 joins |
12:04:28 | | benjins3 quits [Ping timeout: 250 seconds] |
12:12:42 | | i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat] |
12:13:08 | | i_have_n0_idea (i_have_n0_idea) joins |
12:26:41 | | simon816 quits [Quit: ZNC 1.9.1 - https://znc.in] |
12:30:50 | | simon816 (simon816) joins |
12:34:24 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:34:56 | | SkilledAlpaca418962 joins |
12:53:12 | | benjins3 joins |
12:54:36 | | etnguyen03 (etnguyen03) joins |
13:09:57 | | etnguyen03 quits [Client Quit] |
13:17:00 | | vitzli (vitzli) joins |
13:20:08 | | vitzli quits [Client Quit] |
13:50:08 | | Webuser952334 joins |
13:50:49 | | Webuser952334 quits [Client Quit] |
14:53:09 | | penguaman joins |
14:54:12 | | kdy quits [Remote host closed the connection] |
14:54:46 | | Mist8kenGAS quits [Ping timeout: 250 seconds] |
14:57:50 | | kdy (kdy) joins |
15:08:36 | | Mist8kenGAS (Mist8kenGAS) joins |
15:21:28 | | corentin quits [Ping timeout: 260 seconds] |
15:25:16 | | corentin joins |
15:28:26 | | notarobot1 quits [Quit: The Lounge - https://thelounge.chat] |
15:29:38 | | notarobot1 joins |
15:40:08 | | corentin quits [Ping timeout: 260 seconds] |
15:42:40 | | corentin joins |
16:04:03 | | corentin quits [Ping timeout: 260 seconds] |
16:17:41 | <h2ibot> | Exorcism edited 竹白 (+380, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54548&oldid=54547 |
16:21:42 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54549&oldid=54548 |
16:25:42 | <h2ibot> | Exorcism edited 竹白 (+1, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54550&oldid=54549 |
16:27:43 | <h2ibot> | Exorcism edited 竹白 (-1, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54551&oldid=54550 |
16:33:25 | | corentin joins |
16:34:44 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54552&oldid=54551 |
16:38:44 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54553&oldid=54552 |
16:40:55 | | Dango360 quits [Read error: Connection reset by peer] |
16:42:45 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54554&oldid=54553 |
16:43:45 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54555&oldid=54554 |
16:45:45 | <h2ibot> | Exorcism edited 竹白 (+0, /* Special Subdomains */): https://wiki.archiveteam.org/?diff=54556&oldid=54555 |
16:47:54 | | Dango360 (Dango360) joins |
16:57:47 | <h2ibot> | Exorcism edited 竹白 (-179): https://wiki.archiveteam.org/?diff=54557&oldid=54556 |
17:15:50 | | Webuser464891 joins |
17:15:53 | | Webuser464891 quits [Client Quit] |
17:19:18 | | Mist8kenGAS quits [Ping timeout: 260 seconds] |
17:19:42 | | Mist8kenGAS (Mist8kenGAS) joins |
17:21:04 | <yzqzss> | Exorcism: I don't think AB can handle zhubai |
17:21:30 | <yzqzss> | It's simple, but it's still a SPA site
17:39:02 | <Exorcism> | Yeah that's why i removed the archived list |
17:50:19 | | @arkiver quits [Remote host closed the connection] |
17:50:34 | | arkiver (arkiver) joins |
17:50:34 | | @ChanServ sets mode: +o arkiver |
17:54:56 | <Ryz> | Any updates on tackling http://zapytaj.onet.pl/ ? I'm not sure if it's tossed in AB or something |
17:58:36 | <Exorcism> | Ryz: https://irc.digitaldragon.dev/uploads/fb797d126874feec/image.png |
18:02:08 | | Webuser978503 joins |
18:03:35 | <Webuser978503> | test |
18:09:28 | <Webuser978503> | Hello! Plala, a Japanese web hosting service that has been around since the '90s, will be closing at the end of this month. I have been working on a private list of files to be included in my website.
18:09:28 | <Webuser978503> | I will almost certainly not be able to register it with InternetArchive in time. |
18:09:28 | <Webuser978503> | I would like to request a special crawling service, providing a list of URLs for images, HTML, etc.
18:09:28 | <Webuser978503> | Is there a place I can request this? |
18:09:28 | <Webuser978503> | I understand that this IRC is not an archive team. |
18:09:28 | <Webuser978503> | https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions |
18:09:28 | <Webuser978503> | The official announcement that the Plala website will be closed and the URLs of the related news are as follows. All of them are in Japanese. |
18:09:29 | <Webuser978503> | https://www.docomo.ne.jp/info/notice/page/240627_01.html |
18:09:29 | <Webuser978503> | https://www.itmedia.co.jp/news/articles/2503/04/news125.html |
18:10:47 | <Ryz> | Oh no, another free web hosting website shutting down? Uh-oh :( |
18:22:18 | | tzt quits [Ping timeout: 260 seconds] |
18:23:37 | | tzt (tzt) joins |
18:32:49 | <Webuser978503> | It is very unfortunate that the homepage service is closing. I am considering using the InternetArchive API to archive the URLs, but I am aware that this API can only handle a few URLs at a time. Is there any way to register via the spreadsheet as well, since that can only handle registrations in the thousands?
18:33:35 | <pokechu22> | If you have a list of URLs, one url per line, we can do that via #archivebot. You can upload it to transfer.archivete.am |
18:34:04 | <pokechu22> | We can also do recursive crawls of sites using #archivebot |
18:43:00 | <Webuser978503> | I did not know about that web service, thank you! I just tried saving one URL to a text file and uploading it to https://transfer.sh/ with curl, but got "Failed to connect to transfer.sh port 443 after 3448 ms: Could not connect to server". I will look into this some more.
18:43:44 | <nicolas17> | what is transfer.sh? it doesn't seem to exist |
18:44:00 | <nicolas17> | as a domain |
18:44:41 | <nicolas17> | use transfer.archivete.am |
18:49:34 | | ducky quits [Ping timeout: 260 seconds] |
18:49:54 | <pokechu22> | and you can just upload it via HTTP with the "click to browse" link |
18:50:08 | <pokechu22> | that's what I do most of the time |
18:53:22 | <Webuser978503> | Saved a text file with one URL; curl --upload-file ./hello-41611.txt https://transfer.archivete.am/hello-41611.txt succeeded: https://transfer.archivete.am/EieQn/hello-41611.txt
18:53:23 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/hello-41611.txt https://transfer.archivete.am/inline/EieQn/hello-41611.txt |
18:53:24 | <Webuser978503> | The text file contains this one URL: http://www9.plala.or.jp/applepig/tanpen/toraburu.html . This URL is not registered in InternetArchive at this time. I hope it will be registered in InternetArchive in a few hours.
18:54:43 | <pokechu22> | Unfortunately it tends to take a little longer for data from archivebot to reach web.archive.org - you should see it on https://archive.fart.website/archivebot/viewer/job/ksnju in a few hours and on web.archive.org after that |
18:55:12 | <pokechu22> | (but on the other hand, archivebot can do lists of thousands of URLs fairly easily, so the extra time for it to reach web.archive.org is a trade off) |
18:56:13 | <Webuser978503> | The scraping bot I am creating has fetched about 300,000 URLs on plala. I expect these URLs to show up under https://web.archive.org/web/20140710060435/http://www9.plala.or.jp/~ in InternetArchive.
19:01:47 | <Webuser978503> | to pokechu22: Thanks for letting me know! There was a lot I didn't know, so I'll do some research on my own again... the disappearance of 1990s-era homepage space is a sad thing!
19:05:02 | <@rewby> | Webuser978503: What do you mean by "scraper bot"?
19:08:46 | <Webuser978503> | to rewby , I am using my own node.js program and wget to create URLs under the plala.or.jp domain. I already have about 10,000 URLs for www{\d}.plala.or.jp/[^/]+/, so I wget -r to each of those 10,000 in turn to get the image and html file URLs, and save them in my local sqlite. |
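[editor's note] The harvesting workflow Webuser978503 describes (crawl each directory, collect image/HTML URLs, deduplicate into SQLite) can be sketched as below. This is a minimal illustration only: the real setup uses node.js plus `wget -r`, and the table name, regex, and example URLs here are hypothetical.

```python
import re
import sqlite3

# Sketch: extract href/src links from fetched HTML and store them,
# deduplicated, in a local SQLite database (as described above).
LINK_RE = re.compile(r'(?:href|src)="([^"]+)"')

def harvest(conn: sqlite3.Connection, base: str, html: str) -> int:
    """Insert newly discovered URLs from `html`; return how many were new."""
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    before = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
    for link in LINK_RE.findall(html):
        # Resolve relative links against the page's base directory.
        url = link if link.startswith("http") else base.rstrip("/") + "/" + link
        conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
    return after - before
```

Running it twice over the same page inserts nothing the second time, since the primary key deduplicates.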
19:09:28 | <@rewby> | Webuser978503: The captures you're making that way will not appear in web.archive.org |
19:09:41 | <@rewby> | (Which your earlier message implied is your goal) |
19:13:10 | <Webuser978503> | rewby: Yes, that is correct. At this time the scraped data exists only on my PC and has no effect on web.archive.org. So I am looking for a way to get the huge list of URLs I have scraped into web.archive.org in a realistic amount of time.
19:13:17 | <Webuser978503> | If it were only a few thousand, I could register them with the API using a spreadsheet, but by my estimation it is not realistic to register more than 300,000 URLs that way.
19:13:54 | <@rewby> | Webuser978503: If you can feed us a list of urls in a .txt file with the format of "one url per line", we can easily get that done in a few days. |
19:14:04 | <@rewby> | 300k urls is like, a morning of work |
19:14:24 | <@rewby> | As pokechu22 indicated, the bots we have can take care of that easily |
19:15:26 | <nicolas17> | transfer.archivete.am is not an automatic crawler, it's just a way for you to share the list with us |
19:15:55 | <@rewby> | Once our bots have grabbed the page content, it'll make it to web.archive.org. Might have a few days latency between capture and ingest, the IA's not the fastest system. |
19:16:46 | <@rewby> | Yep, transfer.archivete.am is just a file sharing utility. It doesn't do anything on its own. |
19:17:16 | | gust joins |
19:19:42 | <pokechu22> | It's similar to pastebin.com or gist.github.com |
19:21:10 | <Webuser978503> | rewby: Thank you very much! I would like to send you a list of URLs as soon as possible, but where do I upload the file? It is a 7z of about 12 MB!
19:21:35 | <@rewby> | As we said, transfer.archivete.am |
19:21:46 | | egallager joins |
19:21:53 | <@rewby> | You upload a file to it, and it gives you a link you can paste in chat |
19:22:07 | <@rewby> | And someone will take the file and run with it |
19:22:12 | <@rewby> | No need to compress the file |
19:22:40 | <egallager> | What's the archiving status of 538? https://bsky.app/profile/gelliottmorris.com/post/3ljuzixmxak2p |
19:24:20 | <pokechu22> | We've got archivebot jobs running for it currently, and also did it last year (after it was read-only) |
19:24:36 | <egallager> | ok good |
19:25:17 | | DopefishJustin quits [Remote host closed the connection] |
19:25:22 | <pokechu22> | that includes the data from https://data.fivethirtyeight.com/ and https://github.com/fivethirtyeight/data/ and also articles at https://fivethirtyeight.com/?archiveteam (which doesn't redirect as long as there's a query parameter it seems?) |
19:25:24 | <Webuser978503> | Understood. Uploaded: https://transfer.archivete.am/Lg6bN/export-url-list-result-html-only.txt . Can I wait a few days and expect it to be registered at web.archive.org?
19:25:24 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/Lg6bN/export-url-list-result-html-only.txt |
19:25:56 | <pokechu22> | (the articles can also be discovered from https://fivethirtyeight.com/sitemap.xml) |
19:26:30 | <pokechu22> | Webuser978503: Yes, it should end up on web.archive.org in a few days. |
19:26:37 | <nicolas17> | Webuser978503: how are you discovering the pages anyway? |
19:26:37 | | ducky (ducky) joins |
19:28:23 | <Webuser978503> | pokechu22: Thank you very much! Actually, the list of uploaded URLs is incomplete because I made a mistake with the wget command. I am trying to retrieve the complete list again, but it will take about a week. I will upload the URL list again at that time. Do I need to tell this IRC that I uploaded it and request it, or is this completely automated?
19:28:27 | <@rewby> | pokechu22, Webuser978503: There's some certified upload fuckery currently ongoing so expect more like a 7-14 days before the IA finishes processing. We're doing some traffic engineering and letting AB cache on disk for a bit. |
19:28:52 | <@rewby> | Webuser978503: Upload it again and send it here |
19:29:34 | <@rewby> | (The delay is due to us trying to work around the IA's upload system being quite... full...) |
19:29:45 | <Webuser978503> | to nicolas17: plala's homepage service has published a list of URLs it has hosted in the past. I retrieved that page from InternetArchive and scraped it.
19:29:48 | <pokechu22> | If you have a new list, you can share it here. I'm going to try to do a recursive job so that it will attempt to discover more URLs based on your current list though so hopefully everything will be discovered faster |
19:30:09 | <nicolas17> | well we can use that list alone, we do our own recursive crawling anyway |
19:30:16 | <nicolas17> | you don't need to do your own wget -r from it |
19:31:05 | | gust quits [Remote host closed the connection] |
19:31:07 | <Webuser978503> | to rewby , We now understand that scraping takes 7-14 days. Hopefully the scraping will be done in time as plala will be shutting down at the end of this month! |
19:31:19 | <@rewby> | The scraping will likely be faster than that |
19:31:24 | | gust joins |
19:31:27 | <@rewby> | But the upload to the IA will be slow |
19:31:35 | <pokechu22> | IA = web.archive.org
19:31:42 | <pokechu22> | (more specifically, IA = internet archive) |
19:32:21 | <@rewby> | To use an analogy, the capture/picture is taken quite quickly. But we need to then upload data to web.archive.org/develop the picture. And this can take some time. But it's fine if that takes time, since we already captured the moment. |
19:37:03 | <pokechu22> | Webuser978503: What about pages like http://academic3.plala.or.jp/uragaku/ ? |
19:37:14 | <Webuser978503> | rewby, re "Upload it again and send it here": Perhaps a week from now, when the plala URL list has been completed, I will upload the URL text to transfer.archivete.am and send it to this IRC. Do I just reply to you then?
19:37:21 | <pokechu22> | and http://business3.plala.or.jp/aid/ (which redirects to https://www.asahikawainochi.or.jp/) |
19:38:02 | <@rewby> | Webuser978503: No need to ping/reply to me specifically. |
19:38:10 | <@rewby> | Just send it in here and someone will pick it up |
19:39:44 | <Webuser978503> | to pokechu22: I had no idea about "http://academic3.plala.or.jp/uragaku/". I was shocked to learn there is a URL I forgot to check. Thanks for letting me know!
19:42:53 | <Webuser978503> | to rewby: Understood. I have a more technical question: do you mean that after I register a list of URLs on transfer.archivete.am and post the published URL to this IRC, someone else will register it in another system?
19:43:24 | <pokechu22> | Yes |
19:43:40 | <pokechu22> | That is correct |
19:44:23 | | SootBector quits [Remote host closed the connection] |
19:44:46 | | SootBector (SootBector) joins |
19:44:54 | | Island joins |
19:46:11 | <pokechu22> | I also see http://www.t-gesui.hs.plala.or.jp/gesuidoukouhoukatudounituite.html http://chb1018.hs.plala.or.jp/disaster/index.html and other stuff under hs.plala.or.jp |
19:47:23 | <Webuser978503> | to pokechu22: Thank you very much. Sorry for asking such a complicated question. Is there any way to know whether a URL has been registered as a scraping target, or what percentage of the scraping has completed? I checked https://archive.fart.website/archivebot/viewer/ but could not find it.
19:48:14 | <pokechu22> | It will show up on https://archive.fart.website/archivebot/viewer/ eventually, but that also takes several hours |
19:54:44 | | DopefishJustin joins |
19:54:44 | | DopefishJustin is now authenticated as DopefishJustin |
19:54:53 | | DopefishJustin quits [Remote host closed the connection] |
19:54:55 | <pokechu22> | Webuser978503: You can view download progress at http://archivebot.com/?initialFilter=plala |
19:57:01 | <pokechu22> | hmm, actually, that job seems to not have loaded all of the URLs properly |
20:00:54 | <Webuser978503> | oh,... |
20:01:31 | <pokechu22> | I should be able to fix it |
20:09:41 | <Webuser978503> | Thank you. Is there an appropriate place to talk about something like this in the future? Are there other sites that you are looking at? I don't use IRC on a regular basis, so it's hard for me to write multiple lines, and when I reload my browser the messages disappear.
20:12:27 | <pokechu22> | The new job is running: http://archivebot.com/?initialFilter=plala (will end up at https://archive.fart.website/archivebot/viewer/job/1dzgs and web.archive.org eventually) |
20:13:06 | <pokechu22> | This is the correct channel to talk about websites that are shutting down. We have mainly been looking at US government stuff recently |
20:14:31 | <Webuser978503> | Understood. The current turmoil in the U.S. has been reported in Japan. |
20:16:17 | <h2ibot> | Bzc6p edited Indafotó (+260, /* Status */ we may get all images): https://wiki.archiveteam.org/?diff=54558&oldid=54514 |
20:16:18 | <h2ibot> | Pokechu22 edited Deathwatch (+202, /* 2025 */ plala.or.jp): https://wiki.archiveteam.org/?diff=54559&oldid=54539 |
20:18:53 | <Webuser978503> | I'm looking at https://transfer.archivete.am/Yqlqx/www1.plala.or.jp_thru_www17.plala.or.jp_seed_urls.txt. Looking at the sitemap XML, it registers about 40,000 URLs per file. Is this the list of URLs registered based on the list I uploaded? Thanks deeply.
20:18:54 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/Yqlqx/www1.plala.or.jp_thru_www17.plala.or.jp_seed_urls.txt. |
20:19:24 | <pokechu22> | Yes, that is based on the URLs you uploaded; I changed it into an XML sitemap for technical reasons relating to how archivebot works |
20:21:08 | <nicolas17> | pokechu22: I'm interested in details, what's the difference between feeding archivebot a plain list and an XML sitemap? |
20:22:59 | <pokechu22> | If I do an XML sitemap like that, I can do an !a < list job with URLs all over the site without needing to deal with --no-parent breaking recursion (since --no-parent would only affect urls on transfer.archivete.am and not on plala.or.jp). |
20:23:36 | <pokechu22> | That does require creating a valid sitemap though (including converting & to &amp; and apparently also splitting it at 50,000 URLs, per https://www.sitemaps.org/protocol.html)
20:25:10 | <pokechu22> | huh, apparently XML sitemaps have an implicit equivalent of --no-parent: https://www.sitemaps.org/protocol.html#location - I feel like a lot of sites don't handle that properly (and I don't think archivebot cares about that either) |
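[editor's note] The URL-list-to-sitemap conversion pokechu22 describes can be sketched as below: escape XML special characters and split the list at the protocol's 50,000-URL-per-file limit. This is an illustration of the idea, not ArchiveBot's actual tooling.

```python
from xml.sax.saxutils import escape

# Sitemap protocol limit: at most 50,000 URLs per sitemap file.
CHUNK = 50_000

def make_sitemaps(urls: list[str]) -> list[str]:
    """Return sitemap XML documents, each holding at most 50,000 URLs,
    with &, <, > escaped per https://www.sitemaps.org/protocol.html."""
    sitemaps = []
    for i in range(0, len(urls), CHUNK):
        entries = "\n".join(
            f"  <url><loc>{escape(u)}</loc></url>" for u in urls[i:i + CHUNK]
        )
        sitemaps.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>"
        )
    return sitemaps
```

A 50,001-entry list yields two files; a URL containing `&` comes out as `&amp;` in the XML.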
20:26:38 | <Webuser978503> | Thank you very much. I am relieved. I am interested in archiving my site, so I may ask you to register it on archivebot again. So, good-bye. |
20:26:55 | <pokechu22> | Thank you for letting us know :) |
20:28:50 | | phiresky joins |
20:32:04 | <Vokun> | It's remarkable how formal a literal translation from Japanese to English becomes
20:32:35 | | driib9 quits [Quit: The Lounge - https://thelounge.chat] |
20:34:36 | <Vokun> | I took a few Japanese language classes in high school and had some foreign exchange students over, both of us trying to learn from each other, and it seems classes in general teach a form of language that no one actually speaks. I spoke some form of extremely formal Japanese, and they spoke an extremely formal form of English
20:34:41 | <phiresky> | Hey all! Just randomly found out that if you put in a phone number into instagram (plus and then a 4-10 digit number) it will show you random pictures of a certain category. The results do not correspond to account, hashtags, or description, so must be linked to some hidden (AI created probably) metadata. For example: |
20:34:41 | <phiresky> | +3626267274 is watches |
20:34:41 | <phiresky> | +1284662791 is maths |
20:34:41 | <phiresky> | +726267488176 is sports cars |
20:34:41 | <phiresky> | +6747266394749 is old women with big boobs |
20:34:41 | <phiresky> | might be interesting for heuristic scraping, and would be interesting to figure out why this happens. does not seem to work on desktop search, only mobile |
20:34:55 | | driib9 (driib) joins |
20:36:14 | | phiresky quits [Client Quit] |
20:38:49 | <pokechu22> | hmm, the archivebot job is currently only hitting http://www1.plala.or.jp/ and http://www10.plala.or.jp/, and http://www1.plala.or.jp/ seems to be running into errors sometimes. I'm going to shuffle the list and then start it one more time so that it does stuff on all 17 servers more evenly |
20:45:15 | <pokechu22> | the new job is https://archive.fart.website/archivebot/viewer/job/27rjj |
21:03:08 | | etnguyen03 (etnguyen03) joins |
21:04:40 | | Wohlstand (Wohlstand) joins |
21:16:00 | | etnguyen03 quits [Client Quit] |
21:19:13 | | BlueMaxima joins |
21:29:14 | | itachi1706 quits [Quit: Bye :P] |
21:34:50 | | itachi1706 (itachi1706) joins |
21:37:38 | | etnguyen03 (etnguyen03) joins |
21:59:49 | | etnguyen03 quits [Client Quit] |
22:00:24 | | etnguyen03 (etnguyen03) joins |
22:06:33 | | Webuser021842 joins |
22:06:50 | | Webuser021842 quits [Client Quit] |
22:23:13 | | corentin quits [Ping timeout: 260 seconds] |
22:24:21 | | wb joins |
22:26:34 | | corentin joins |
22:27:40 | <wb> | hello, can someone guide me how to find a file from a tindeck archives? https://archive.org/details/archiveteam_tindeck |
22:28:15 | <wb> | i know the original url but i do not know where to look for it in the archives
22:30:48 | | APOLLO03 quits [Ping timeout: 260 seconds] |
22:31:31 | <pokechu22> | For most projects you should just be able to plug the URL in like https://web.archive.org/web/*/https://example.com |
22:40:48 | | Wohlstand quits [Client Quit] |
22:42:13 | <wb> | yes i tried that, but i cant access the file https://web.archive.org/web/20131106033836/http://tindeck.com/listen/llal |
22:44:50 | <nicolas17> | then it probably wasn't archived |
22:46:19 | | etnguyen03 quits [Client Quit] |
22:47:25 | <nicolas17> | I just tried another song where /dl/ *was* archived |
22:47:55 | <nicolas17> | and it just says "Your download will start in 10 seconds. Having trouble? Try this direct link." but the direct link redirects back to /listen/ |
22:48:49 | <nicolas17> | I'm suspicious... was any song actually saved here |
22:50:14 | <wb> | i did the same with some other url, and it actually started to download :| |
22:50:24 | | sediment joins |
22:50:30 | <wb> | but i dont know if i can rely on the web archive |
22:51:10 | <wb> | is there no complete index for a project that i could check? |
22:51:58 | <nicolas17> | https://web.archive.org/web/*/http://tindeck.com/dl/* :P |
22:53:10 | <pokechu22> | Hmm, that's only listed in archivebot and two archive.org crawls, not https://archive.org/details/archiveteam_tindeck
22:53:25 | <pokechu22> | (based on the "about this capture" section) |
22:53:26 | <wb> | yea |
22:54:00 | <nicolas17> | hm maybe there's multiple captures and *some* are broken |
22:54:02 | <pokechu22> | And the archivebot capture got a 404 on it actually |
22:54:55 | <pokechu22> | ah, https://wiki.archiveteam.org/index.php/Tindeck says there were also false 404s when the site was broken so the archivebot job is incomplete :| |
22:56:03 | <nicolas17> | I have had the same problem with files that got saved properly via archivebot, but then someone sent the same URL to savepagenow and that capture doesn't work |
22:57:11 | | sediment quits [Client Quit] |
23:11:59 | | loug83181422 quits [Quit: The Lounge - https://thelounge.chat] |
23:36:47 | | sparky14928 (sparky1492) joins |
23:40:24 | | sparky1492 quits [Ping timeout: 250 seconds] |
23:40:25 | | sparky14928 is now known as sparky1492 |
23:50:05 | | APOLLO03 joins |
23:54:13 | | Riku_V quits [Ping timeout: 260 seconds] |
23:57:06 | | Riku_V (riku) joins |
23:58:14 | <pokechu22> | wb: I checked all of the CDX for https://archive.org/details/archiveteam_tindeck and /llal doesn't appear in it at all, so I guess it wasn't saved :/ |
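[editor's note] The capture check pokechu22 performed can be reproduced against the public Wayback CDX API; below is a sketch that only builds the query string (`url`, `output`, and `limit` are standard CDX parameters). Fetching the resulting URL returns one capture record per line, and an empty body means no captures, as with the tindeck /listen/llal page.

```python
from urllib.parse import urlencode

# Public endpoint of the Wayback Machine's CDX capture index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query(url: str, limit: int = 10) -> str:
    """Build a CDX API query URL listing captures of `url` (JSON output)."""
    return CDX_ENDPOINT + "?" + urlencode(
        {"url": url, "output": "json", "limit": str(limit)}
    )
```

For example, `cdx_query("http://tindeck.com/listen/llal")` produces a URL you can open in a browser to see whether any capture exists.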