00:02:07 | | etnguyen03 (etnguyen03) joins |
00:09:55 | | loug83181422 quits [Quit: The Lounge - https://thelounge.chat] |
00:19:25 | <aeg> | has anyone archived the splits.io database? |
00:23:42 | <pabs> | "We will continue with our original shutdown plans. Splits.io's last day online will be March 31, 2025." |
00:24:44 | <aeg> | sure, but has anyone archived the database? |
00:26:59 | | pixel (pixel) joins |
00:29:30 | <pabs> | doesn't seem publicly available? |
00:33:39 | <aeg> | presumably it would need to be scraped through the public api |
00:34:27 | <pabs> | nothing in AB yet https://archive.fart.website/archivebot/viewer/?q=splits.io |
00:35:29 | <pabs> | if you have a URL list, we can run it there. |
00:35:44 | <pabs> | it's also in Deathwatch, so someone might look nearer to the deadline, not sure if they would do the API too
00:37:29 | <aeg> | doesn't archivebot archive only websites, crawler-style? i don't think that would be the right way to archive splits.io, since it is a database-backed site
00:39:18 | <pabs> | it can do either crawls, or individual URLs, or lists of URLs (for eg every user in an API), or crawl a list of sites |
00:40:06 | <pabs> | if the database is public, we can download it with AB. if the data needs API calls, we can do those calls with the URL list option |
00:40:43 | <pabs> | I don't know enough about the site to figure out how to save the data tho |
00:44:10 | <aeg> | i don't know how many users splits.io has. speedrun.com has 2.3 million users; let's say splits.io has somewhere between 100k and 2 million. would archivebot take a list of that many urls? |
00:45:04 | <aeg> | each user would have numerous runs, but when i poked at the api a few weeks ago, it looked like the user endpoint returned all the associated run data |
00:48:26 | <pabs> | yeah, 100-200k is easy |
00:48:39 | <pokechu22> | It depends on if they have any rate-limiting |
00:48:45 | <pabs> | 2mil should be fine too |
00:49:05 | <pabs> | yeah, especially with the deadline |
00:53:20 | | lennier2 quits [Client Quit] |
00:57:51 | | lennier1 (lennier1) joins |
01:01:28 | | th3z0l4_ joins |
01:03:32 | | th3z0l4 quits [Ping timeout: 250 seconds] |
01:06:33 | <aeg> | so then archivebot eventually uploads a warc to archive.org? how do you do quality control on the scrape results? |
01:07:36 | <pabs> | we watch the job for errors etc on http://archivebot.com/ |
01:07:59 | <pabs> | and via other monitoring https://wiki.archiveteam.org/index.php/ArchiveBot/Monitoring |
01:08:16 | <pabs> | and yeah, everything ends up in web.archive.org
01:08:39 | <pabs> | and the warcs are on archive.org |
01:09:30 | <pabs> | folks can then extract data from either the WBM/CDX, or download/parse the warcs https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem |
01:10:22 | <pabs> | https://archive.fart.website/archivebot/viewer/ is handy for finding the warc files for a job |
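For the quality-control question above, one unofficial way to spot-check a finished job is to ask the Wayback CDX API whether the URLs that were fed in now have captures. A minimal sketch, assuming the Python requests library and using a hypothetical splits.io API URL as the sample:

```python
import time
import requests

CDX = "https://web.archive.org/cdx/search/cdx"

def latest_capture(url):
    """Return (timestamp, statuscode) for the most recent capture of url, or None."""
    params = {
        "url": url,
        "output": "json",
        "limit": "-1",                  # negative limit = last N captures only
        "fl": "timestamp,statuscode",
    }
    rows = requests.get(CDX, params=params, timeout=30).json()
    return tuple(rows[1]) if len(rows) > 1 else None   # rows[0] is the header row

urls = ["https://splits.io/api/v4/runners/example"]     # hypothetical sample URL
for url in urls:
    print(url, latest_capture(url))
    time.sleep(1)   # polite delay when checking a long list
```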
01:11:02 | | gust quits [Read error: Connection reset by peer] |
01:11:03 | <aeg> | what's the turnaround time from job initiation to warc availability?
01:11:44 | <nicolas17> | unknown |
01:11:53 | <pokechu22> | Generally less than a day right now |
01:11:57 | <pabs> | usually a few days though, depending on the data size |
01:12:40 | <pabs> | (longer when IA ingestion is slow/stuck) |
01:12:50 | <nicolas17> | this reminds me... |
01:13:12 | <pokechu22> | I generally do the pagination downloads locally, then save both the list of URLs I found and the pagination page contents themselves via archivebot, rather than generating a list of pagination URLs, saving those via archivebot, downloading the WARC, and generating the URL list from that
01:13:16 | <pokechu22> | but it depends on the site |
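As an illustration of that local-pagination workflow, a minimal sketch against a generic paginated JSON API; the endpoint, the page parameter, and the response fields are hypothetical placeholders, not splits.io's actual API:

```python
import requests

API = "https://example.com/api/items"     # hypothetical paginated endpoint

def collect_item_urls():
    """Walk the pagination locally and return the item URLs it links to."""
    urls, page = [], 1
    while True:
        resp = requests.get(API, params={"page": page}, timeout=30).json()
        items = resp.get("items", [])
        if not items:
            break
        urls.extend(item["url"] for item in items)   # hypothetical field name
        page += 1
    return urls

# Save the discovered URLs; this file (plus the pagination pages themselves)
# is what then gets handed to ArchiveBot.
with open("discovered-urls.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(collect_item_urls()) + "\n")
```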
01:14:15 | <pabs> | I'd !a < the pagination URLs :) lazy route |
01:14:27 | <pabs> | not always possible though |
01:14:46 | <pabs> | btw pokechu22 did you document that sitemap trick somewhere? it definitely works? |
01:14:49 | <pokechu22> | Yeah, that's better if it works (but --no-parent and archivebot not liking to extract stuff from json makes that a problem) |
01:14:56 | <pokechu22> | It definitely works, haven't really documented it anywhere though |
01:15:15 | <nicolas17> | URLs that I saved via archivebot on nov 3, nov 5, nov 7, and nov 11 (2024) are *still* not up on IA, I guess the outage backlog is still being uploaded super slowly? |
01:15:29 | <pabs> | which parent gets used for the sitemap trick? |
01:15:48 | <pokechu22> | I think that specific backlog is still on separate storage and hasn't been getting uploaded by the normal means, but I'm not sure of the details |
01:16:22 | <pokechu22> | I'm not sure which parent gets used (it might be the sitemap itself on transfer.archivete.am), but if you have e.g. https://example.com in the list and example.com URLs in the sitemap it's fine |
01:16:59 | <pokechu22> | things would break if you pointed to an existing sitemap on the site that's in a subdirectory (which apparently isn't supposed to cover things not in that subdirectory according to sitemaps.org, but that doesn't stop sites from doing it anyways...) |
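Since the sitemap trick is undocumented, the following is only a guess at the mechanics: a minimal sketch that turns a plain URL list into a standard sitemaps.org file (filenames are hypothetical), which could then be uploaded to transfer.archivete.am and pointed at from an ArchiveBot job.

```python
from xml.sax.saxutils import escape

def write_sitemap(urls, path="splits.io-sitemap.xml"):
    # sitemaps.org caps a single file at 50,000 URLs / 50 MB; split larger lists
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

with open("splits.io-all-user-api-urls.txt", encoding="utf-8") as f:
    write_sitemap([line.strip() for line in f if line.strip()])
```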
01:17:06 | <aeg> | splits.io has a code repository on github also. how would that get archived? |
01:17:18 | <pokechu22> | We can do github stuff in #gitgud |
01:18:18 | | notarobot1 quits [Quit: The Lounge - https://thelounge.chat] |
01:18:57 | | notarobot1 joins |
01:30:17 | <h2ibot> | PaulWise edited ArchiveBot (+474, add more archiving suggestions): https://wiki.archiveteam.org/?diff=55077&oldid=54522 |
01:32:17 | <h2ibot> | PaulWise edited Software Heritage (+50, add bulk repos feature): https://wiki.archiveteam.org/?diff=55078&oldid=50949 |
01:34:12 | <pabs> | pokechu22: the tips section might be a place to put the sitemap trick https://wiki.archiveteam.org/index.php/ArchiveBot#Usage_tips |
01:34:58 | <pabs> | aeg: once you have a list of URLs, put it in a descriptively named file like splits.io-all-user-api-urls.txt and upload it to https://transfer.archivete.am/ |
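To make that concrete, a minimal sketch of producing such a file. The /api/v4/runners/&lt;name&gt; paths follow what the public splits.io API appeared to expose, but treat them as an assumption to verify against the live API, and it also assumes you already have the usernames (for example from paginating the API locally, as described earlier).

```python
def build_url_list(usernames, path="splits.io-all-user-api-urls.txt"):
    """Write one API URL per line for each username (endpoint paths are assumptions)."""
    with open(path, "w", encoding="utf-8") as f:
        for name in usernames:
            f.write(f"https://splits.io/api/v4/runners/{name}\n")
            f.write(f"https://splits.io/api/v4/runners/{name}/runs\n")

build_url_list(["example_runner"])   # hypothetical username
# then upload the file to https://transfer.archivete.am/ and hand the link to an AB operator
```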
01:36:26 | | egallager quits [Client Quit] |
01:39:58 | | etnguyen03 quits [Client Quit] |
01:42:35 | | kuroger quits [Quit: ZNC 1.9.1 - https://znc.in] |
01:46:03 | | chains joins |
01:49:02 | | kuroger (kuroger) joins |
01:52:40 | <pabs> | that_lurker: on the channels from this page that you operate, could you add the page to /topic (#hetzner-firehose for eg) https://wiki.archiveteam.org/index.php/Archiveteam:IRC/Relay |
02:06:04 | | BlueMaxima quits [Read error: Connection reset by peer] |
02:09:22 | | BennyOtt_ joins |
02:09:22 | | chains quits [Client Quit] |
02:09:24 | | BennyOtt quits [Ping timeout: 250 seconds] |
02:10:23 | | BennyOtt_ is now known as BennyOtt |
02:10:29 | | BennyOtt is now authenticated as BennyOtt |
02:11:02 | <aeg> | splits.io also has a twitter (https://twitter.com/splitsio). can that get archived? |
02:11:02 | <eggdrop> | nitter: https://nitter.net/splitsio |
02:14:19 | | etnguyen03 (etnguyen03) joins |
02:26:08 | | notarobot1 quits [Client Quit] |
02:27:25 | | notarobot1 joins |
02:33:20 | | egallager joins |
02:41:04 | <nicolas17> | I think we haven't been able to archive twitter since elon took over |
02:41:14 | <nicolas17> | and added the login requirement |
02:42:17 | | notarobot1 quits [Client Quit] |
02:43:31 | | notarobot1 joins |
02:52:44 | | notarobot1 quits [Client Quit] |
02:54:01 | | notarobot1 joins |
02:57:12 | | etnguyen03 quits [Remote host closed the connection] |
03:06:26 | | lennier1 quits [Ping timeout: 260 seconds] |
03:13:25 | | lennier1 (lennier1) joins |
03:22:36 | <h2ibot> | Vitzli edited Radio Free Europe (+310, /* Estimated size */ Add missing @rferlonline): https://wiki.archiveteam.org/?diff=55079&oldid=55076 |
03:26:40 | | lennier2 joins |
03:30:21 | | lennier1 quits [Ping timeout: 260 seconds] |
03:33:48 | | YooperKirks quits [Quit: Ooops, wrong browser tab.] |
03:37:04 | <DigitalDragons> | No kind of wireguard/etc is acceptable, right? Or is it more specific to public VPN services |
03:56:11 | <TheTechRobo> | DigitalDragons: It wasn't an official stance, but |
03:56:12 | <TheTechRobo> | [#warrior] <@J.AA> TheTechRobo: My view: if you control both ends, and if all worker traffic to the internet (including DNS) exits at the same place, and if that other place's internet connection is clean, and if that other place is exclusively used by you, it should be fine (unless I forgot about another condition). |
03:56:18 | <TheTechRobo> | DigitalDragons: It wasn't an official stance, but |
03:56:18 | <TheTechRobo> | [#warrior] <@J.AA> But it's easy to get the configuration wrong, so I still wouldn't recommend it. |
03:56:30 | <TheTechRobo> | ...thank you, The Lounge |
03:56:39 | <DigitalDragons> | The Lounge-- |
03:56:41 | <eggdrop> | [karma] 'The Lounge' now has -65 karma! |
03:57:08 | <DigitalDragons> | thanks :) |
03:59:24 | <pabs> | aeg: add to https://pad.notkiska.pw/p/archivebot-twitter |
03:59:40 | <pabs> | (mentioned on https://wiki.archiveteam.org/index.php/Twitter) |
04:00:04 | <pabs> | nicolas17: ^ |
04:00:58 | | dendory quits [Quit: The Lounge - https://thelounge.chat] |
04:01:23 | | dendory (dendory) joins |
04:01:25 | | egallager quits [Client Quit] |
04:04:40 | | StarletCharlotte joins |
04:06:38 | <StarletCharlotte> | Hey, are there any programs which can extract files from .WARC files similarly to 7z? |
04:06:48 | <StarletCharlotte> | Preferably ones which support Windows? |
04:08:02 | <StarletCharlotte> | A friend of mine is trying to open a bunch of .warc.gz files from Archive Team, and as it stands they're close to unusable for the average person unless you slowly pull one file at a time through replayweb.page (if that even decides to work).
04:08:41 | <StarletCharlotte> | Which it usually doesn't |
04:10:07 | <StarletCharlotte> | I'm seeing a lot of tools to put things into WARCs and convert to WARC, but not a lot to actually get things out of WARCs efficiently. |
04:13:18 | <StarletCharlotte> | Preferably with a GUI apparently |
04:13:40 | <nicolas17> | hm can't the 7-Zip app literally open warcs? |
04:13:56 | <StarletCharlotte> | Not that I've heard. Where'd you hear that? |
04:15:26 | <aeg> | pabs: i added it to the list. but is anyone actually grabbing those? the last date i see for anything done is 2024-06 |
04:16:24 | <StarletCharlotte> | @nicolas17: Are you talking about this? https://www.tc4shell.com/en/7zip/edecoder/ |
04:17:16 | <pabs> | aeg: not at the moment, we don't have a way to do it. Barto was thinking about setting up an archiving-only Nitter instance, but IIRC it would require registering lots of accounts, so a lot of ongoing work |
04:17:18 | <nicolas17> | some people dealing with the "teraleak" TestFlight warcs said 7zip worked but maybe they had some plugin installed; I never tried it myself |
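Not a GUI, but for anyone comfortable with Python, the warcio library (pip install warcio) can pull responses out of a .warc.gz on any platform, Windows included. A rough sketch; the output naming is deliberately simplistic:

```python
import os
import sys
from urllib.parse import quote
from warcio.archiveiterator import ArchiveIterator

def extract(warc_path, out_dir="extracted"):
    """Dump every HTTP response record in a WARC to a file named after its URL."""
    os.makedirs(out_dir, exist_ok=True)
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI") or "unknown"
            name = quote(url, safe="")[:150]   # flatten the URL into a safe filename
            with open(os.path.join(out_dir, name), "wb") as out:
                out.write(record.content_stream().read())

if __name__ == "__main__":
    extract(sys.argv[1])
```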
04:31:06 | | StarletCharlotte quits [Ping timeout: 250 seconds] |
04:35:10 | <aeg> | may/should i create a page on archiveteam wiki (or somewhere) to document archival of splits.io? |
04:36:05 | | StarletCharlotte joins |
04:36:34 | <Ryz> | Not sure if it's been mentioned, but Twitter support was removed from https://socialblade.com/ on 2025 Mar 14 - https://twitter.com/SocialBlade/status/1900589770671071518 |
04:36:34 | <eggdrop> | nitter: https://nitter.net/SocialBlade/status/1900589770671071518 |
04:37:01 | | Webuser603791 quits [Quit: Ooops, wrong browser tab.] |
04:42:07 | | Shevrolet joins |
04:42:11 | | chains joins |
04:42:51 | | Shevrolet quits [Client Quit] |
04:50:59 | <StarletCharlotte> | @nicolas17: I see... I've been wondering, can those WARCs even be found anywhere anymore? I don't have much from the "Teraleak" (hate that name, sensationalist garbage) anymore. |
04:51:22 | <StarletCharlotte> | I at least saved what was relevant for Omniarchive (Minecraft archival group), but that's about it. |
04:51:31 | <nicolas17> | I saved the torrents https://data.nicolas17.xyz/testflight-torrents/ |
04:51:36 | <nicolas17> | idk if anyone is still seeding them |
04:51:38 | <nicolas17> | haven't checked in a while |
04:52:11 | <StarletCharlotte> | I hope so. |
04:52:52 | <h2ibot> | Vitzli edited Voice of America (+1716, /* Youtube channels */ Add estimated sizes for…): https://wiki.archiveteam.org/?diff=55080&oldid=54985 |
04:53:01 | | Webuser724271 joins |
04:54:44 | <StarletCharlotte> | Seeds "0 (2)" and peers "0 (1)", whatever that means |
05:02:10 | | chains quits [Client Quit] |
05:11:46 | | sparky14920 (sparky1492) joins |
05:15:18 | | sparky1492 quits [Ping timeout: 250 seconds] |
05:15:19 | | sparky14920 is now known as sparky1492 |
05:33:40 | | flotwig quits [Quit: ZNC - http://znc.in] |
05:37:18 | | flotwig joins |
06:00:49 | <that_lurker> | pabs: Sure. Adding an info page about those relay channels has been on my todo :-)
06:02:06 | <that_lurker> | Some of the channels still have eggdrop as op so I cannot change the titles in them |
06:04:19 | <that_lurker> | s/Some/Most |
06:08:38 | <that_lurker> | s/titles/topics |
06:09:05 | | Island quits [Read error: Connection reset by peer] |
06:14:02 | | sec^nd quits [Remote host closed the connection] |
06:14:16 | | sec^nd (second) joins |
06:19:15 | | Wohlstand (Wohlstand) joins |
06:44:31 | | egallager joins |
06:52:32 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
07:21:13 | | ahm2587 quits [Quit: The Lounge - https://thelounge.chat] |
07:21:33 | | ahm2587 joins |
07:34:11 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
07:48:49 | | StarletCharlotte joins |
08:14:01 | | PredatorIWD25 joins |
08:17:56 | | BearFortress quits [Ping timeout: 260 seconds] |
08:21:01 | | Grzesiek11_ joins |
08:21:01 | | Grzesiek11 quits [Read error: Connection reset by peer] |
08:25:09 | | HP_Archivist quits [Read error: Connection reset by peer] |
08:25:31 | | HP_Archivist (HP_Archivist) joins |
08:40:56 | | egallager quits [Client Quit] |
08:47:07 | | BearFortress joins |
10:22:42 | | BornOn420 quits [Remote host closed the connection] |
10:23:16 | | BornOn420 (BornOn420) joins |
11:00:01 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:48 | | Bleo18260072271962345 joins |
11:28:47 | | nine quits [Quit: See ya!] |
11:29:00 | | nine joins |
11:29:00 | | nine is now authenticated as nine |
11:29:00 | | nine quits [Changing host] |
11:29:00 | | nine (nine) joins |
11:30:58 | | th3z0l4_ quits [Read error: Connection reset by peer] |
11:32:07 | | th3z0l4 joins |
11:33:43 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
11:34:13 | | SkilledAlpaca418962 joins |
11:56:46 | | FiTheArchiver joins |
12:18:07 | | vitzli (vitzli) joins |
12:24:17 | <h2ibot> | Vitzli edited Voice of America (+147, Add video links repository URL): https://wiki.archiveteam.org/?diff=55081&oldid=55080 |
12:25:17 | <h2ibot> | Vitzli edited Radio Free Europe (+114, Add video links repository URL): https://wiki.archiveteam.org/?diff=55082&oldid=55079 |
12:26:18 | <h2ibot> | Vitzli edited Radio Free Asia (+147, Add video links repository URL): https://wiki.archiveteam.org/?diff=55083&oldid=55065 |
12:29:03 | | vitzli quits [Client Quit] |
12:32:53 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
12:35:05 | | PAARCLiCKS quits [Quit: Ping timeout (120 seconds)] |
12:35:06 | | Miori quits [Quit: Ping timeout (120 seconds)] |
12:35:53 | | PredatorIWD25 joins |
12:36:44 | | SootBector quits [Remote host closed the connection] |
12:37:06 | | SootBector (SootBector) joins |
12:37:24 | | Miori joins |
12:37:46 | | PAARCLiCKS (s4n1ty) joins |
12:47:18 | | Wohlstand quits [Quit: Wohlstand] |
13:06:04 | | beastbg8 (beastbg8) joins |
13:18:21 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
13:48:39 | | vitzli (vitzli) joins |
13:51:33 | | VoynichCR (VoynichCR) joins |
14:00:30 | | StarletCharlotte joins |
14:02:38 | <h2ibot> | Dango360 edited Discourse/uncategorized (+41, Added dcs.community): https://wiki.archiveteam.org/?diff=55084&oldid=54449 |
14:05:01 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
14:38:23 | <PredatorIWD25> | Given the new https://blog.cloudflare.com/ai-labyrinth/, it might be a good time to revisit the possibility of getting AT whitelisted by Cloudflare?
14:41:05 | <VoynichCR> | does the ai labyrinth have good content?
14:44:45 | <h2ibot> | VoynichCr edited WikiTeam (-1, logo): https://wiki.archiveteam.org/?diff=55085&oldid=55075 |
14:47:12 | | SootBector quits [Client Quit] |
14:48:46 | <h2ibot> | VoynichCr uploaded File:Wikiteam3.png (https://github.com/saveweb/wikiteam3/): https://wiki.archiveteam.org/?title=File%3AWikiteam3.png |
14:48:47 | <h2ibot> | VoynichCr edited WikiTeam (+45, /* WikiTeam3 */ image): https://wiki.archiveteam.org/?diff=55087&oldid=55085 |
14:51:46 | <h2ibot> | VoynichCr uploaded File:Wikiteam 2025.png (https://github.com/WikiTeam/wikiteam): https://wiki.archiveteam.org/?title=File%3AWikiteam%202025.png |
14:52:43 | | SootBector (SootBector) joins |
14:52:46 | <h2ibot> | VoynichCr edited WikiTeam (+28, image): https://wiki.archiveteam.org/?diff=55089&oldid=55087 |
14:58:47 | <h2ibot> | VoynichCr uploaded File:Wikibot.png (https://wikibot.digitaldragon.dev/): https://wiki.archiveteam.org/?title=File%3AWikibot.png |
14:59:47 | <h2ibot> | VoynichCr edited Wikibot (+2, image): https://wiki.archiveteam.org/?diff=55091&oldid=54992 |
15:01:48 | <h2ibot> | VoynichCr edited Software Heritage (+58): https://wiki.archiveteam.org/?diff=55092&oldid=55078 |
15:03:48 | <h2ibot> | VoynichCr uploaded File:Software Heritage.png (https://www.softwareheritage.org/): https://wiki.archiveteam.org/?title=File%3ASoftware%20Heritage.png |
15:04:38 | | ThreeHM quits [Ping timeout: 250 seconds] |
15:04:48 | <h2ibot> | VoynichCr edited Software Heritage (+326, infobox): https://wiki.archiveteam.org/?diff=55094&oldid=55092 |
15:10:31 | | vitzli quits [Client Quit] |
15:26:25 | | Ashurbinary joins |
15:31:14 | | sparky14920 (sparky1492) joins |
15:34:51 | | sparky1492 quits [Ping timeout: 260 seconds] |
15:34:51 | | sparky14920 is now known as sparky1492 |
15:40:55 | | ThreeHM (ThreeHeadedMonkey) joins |
15:49:43 | | sparky14922 (sparky1492) joins |
15:50:04 | | Webuser299214 quits [Quit: Ooops, wrong browser tab.] |
15:53:10 | | sparky1492 quits [Ping timeout: 250 seconds] |
15:53:11 | | sparky14922 is now known as sparky1492 |
16:29:42 | | us3rrr joins |
16:33:02 | | onetruth quits [Ping timeout: 250 seconds] |
16:37:15 | | szczot3k quits [Remote host closed the connection] |
16:37:24 | | szczot3k (szczot3k) joins |
16:43:15 | | szczot3k quits [Remote host closed the connection] |
16:43:58 | | szczot3k (szczot3k) joins |
16:51:33 | | gust joins |
16:55:33 | | egallager joins |
16:56:38 | | sparky14926 (sparky1492) joins |
16:59:29 | | StarletCharlotte joins |
16:59:54 | | sparky1492 quits [Ping timeout: 250 seconds] |
16:59:55 | | sparky14926 is now known as sparky1492 |
17:05:16 | | StarletCharlotte quits [Ping timeout: 260 seconds] |
17:05:26 | | onetruth joins |
17:09:21 | | us3rrr quits [Ping timeout: 260 seconds] |
17:21:02 | <nstrom|m> | https://ip4.me shutdown notice, owner passed away |
17:25:29 | | StarletCharlotte joins |
17:36:02 | | StarletCharlotte quits [Remote host closed the connection] |
17:36:20 | | StarletCharlotte joins |
17:41:31 | | Island joins |
18:20:01 | | lennier2_ joins |
18:22:51 | | lennier2 quits [Ping timeout: 260 seconds] |
18:25:21 | | Hackerpcs quits [Quit: Hackerpcs] |
18:27:00 | | StarletCharlotte quits [Ping timeout: 250 seconds] |
18:27:29 | <@JAA> | I believe we've covered Kevin Loch's things already, yeah. |
18:30:39 | | Hackerpcs (Hackerpcs) joins |
18:30:50 | | StarletCharlotte joins |
18:51:46 | <xkey> | > Internet Archive Europe is a project by the Dutch non-profit research library Stichting Internet Archive. https://www.internetarchive.eu/ |
18:51:50 | <xkey> | was this already discussed? |
19:14:14 | | Juest quits [Ping timeout: 250 seconds] |
19:17:03 | | itrooz (itrooz) joins |
19:18:18 | <itrooz> | Hey! I was trying to download https://archive.org/details/archiveteam_github_20180704020939 but the download seems to be restricted (as are the other github archives). Does anyone know why?
19:19:13 | <nicolas17> | arkiver: ^ is there a public answer for why WARCs are restricted? |
19:19:41 | <nicolas17> | I don't want to keep giving speculation and misremembering stuff that was said in other channels months ago :p |
19:19:52 | <nicolas17> | public/official answer* |
19:19:52 | | FiTheArchiver quits [Ping timeout: 250 seconds] |
19:24:05 | <@arkiver> | multiple reasons (usually one is more at play than the other in different cases). generally, data is accessible through the Wayback Machine for regular viewing. there may be problems with some of the data, which can then be handled by blocking viewing through the Wayback Machine. if the original WARCs are available for public download, we would either have to take the specific record out of the WARC, or make the entire WARC unavailable. |
19:24:05 | <@arkiver> | just letting the Wayback Machine (and their decision teams) handle decisions on what is and isn't public is easier
19:25:03 | <@arkiver> | second is the LLM training stuff we see lately. Archive Team's archives are a huge, juicy pile of data that big AI companies could make lots of money from. but that is not what web archives are for. if web archives are used commercially in that way, support will go away very fast and this will significantly hurt our ability to archive
19:26:13 | <@arkiver> | in short - we just "archive stuff", and the responsibility for what to make public, how, and when, is given to someone else... which gives us quite some freedom, and a lot less headaches around data access, what can and cannot be public, exclusion requests, etc. |
19:26:43 | <@arkiver> | of course, we put some trust in the other party to do the right thing when it comes to these decisions. |
19:27:21 | | gust quits [Client Quit] |
19:29:39 | <TheTechRobo> | yeah, I suspect providers will be happier to allow us to scrape if the WARCs aren't public, exactly because of AI. |
19:39:08 | | Juest (Juest) joins |
19:49:30 | | Wohlstand (Wohlstand) joins |
19:52:27 | | VoynichCR quits [Quit: Ooops, wrong browser tab.] |
20:04:59 | | gust joins |
20:17:46 | | lennier2_ quits [Ping timeout: 260 seconds] |
20:18:52 | | lennier2_ joins |
20:33:54 | <itrooz> | Oh, I see. I was considering creating a project to index/view GitHub issues/discussions of projects that made them private (that sometimes happens, and important knowledge can be lost when it does), and that's why I was looking around for WARCs
20:35:45 | <itrooz> | I'll look through the wayback machine apis, I can probably crawl through the pages manually if the rate limit allows it |
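A rough sketch of that approach, assuming the data has to come back out of the Wayback Machine rather than the restricted WARCs: enumerate captures under a repo's issues path with the CDX API, then fetch the raw bodies with the id_ replay modifier. The repo name is hypothetical, and the sleep is only a guess at a polite rate.

```python
import time
import requests

CDX = "https://web.archive.org/cdx/search/cdx"

def issue_captures(repo):
    """List (timestamp, original URL) for archived issue pages under github.com/<repo>/issues/."""
    params = {
        "url": f"github.com/{repo}/issues/",
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",
        "collapse": "urlkey",          # one capture per distinct URL
    }
    rows = requests.get(CDX, params=params, timeout=60).json()
    return rows[1:]                    # skip the header row

for timestamp, original in issue_captures("example/example-repo"):
    raw = requests.get(f"https://web.archive.org/web/{timestamp}id_/{original}", timeout=60)
    print(original, raw.status_code, len(raw.content))   # parse the issue HTML/JSON here instead
    time.sleep(2)                                        # stay well within Wayback rate limits
```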
20:53:00 | | sparky14929 (sparky1492) joins |
20:56:30 | | sparky1492 quits [Ping timeout: 250 seconds] |
20:56:31 | | sparky14929 is now known as sparky1492 |
21:03:47 | <aeg> | so if i provide the url list for archivebot to scrape the splits.io database... does this mean that the resulting warc won't be available to the public? |
21:12:18 | <nicolas17> | I think archivebot warcs are public |
21:31:12 | | Ashurbinary quits [Remote host closed the connection] |
21:32:59 | | StarletCharlotte quits [Remote host closed the connection] |
21:39:05 | | sparky14924 (sparky1492) joins |
21:39:50 | | sparky14925 (sparky1492) joins |
21:42:26 | | sparky1492 quits [Ping timeout: 250 seconds] |
21:42:27 | | sparky14925 is now known as sparky1492 |
21:43:44 | | sparky14924 quits [Ping timeout: 250 seconds] |
21:50:33 | | etnguyen03 (etnguyen03) joins |
21:57:35 | | BlueMaxima joins |
21:59:31 | <pokechu22> | Yeah, archivebot warcs are public |
22:05:41 | | Wohlstand quits [Ping timeout: 260 seconds] |
22:23:59 | | etnguyen03 quits [Client Quit] |
22:31:25 | | klaffty joins |
22:35:22 | | BlueMaxima quits [Client Quit] |
22:55:25 | | chains joins |
22:58:09 | | matoro joins |
23:10:43 | | Ketchup902 quits [Remote host closed the connection] |
23:10:55 | | Ketchup901 (Ketchup901) joins |
23:17:38 | | flotwig quits [Quit: ZNC - http://znc.in] |
23:18:42 | | flotwig joins |
23:28:13 | | chains quits [Client Quit] |