00:23:56sneezey quits [Ping timeout: 258 seconds]
00:25:25Arcorann (Arcorann) joins
00:37:30sneezey joins
01:03:08dm4v quits [Read error: Connection reset by peer]
01:04:09dm4v joins
01:04:12dm4v quits [Changing host]
01:04:12dm4v (dm4v) joins
01:16:01HackMii quits [Remote host closed the connection]
01:17:03HackMii (hacktheplanet) joins
01:22:36HackMii quits [Remote host closed the connection]
01:22:57HackMii (hacktheplanet) joins
01:24:50Nulo joins
01:25:35<Nulo>hi, does archive team somehow archive websites? I'm trying to download a website but web.archive.org is down and archive.ph's zip files are broken (404)
01:30:11<@OrIdow6>The Internet Archive is undergoing a planned shutdown today
01:31:13<@OrIdow6>But to directly answer the question Nulo, no, for the most part ArchiveTeam usually just sends data to archive.org or to other places, and does not store it long-term itself
01:32:33<Nulo>OrIdow6, okay, thanks
01:33:35<@OrIdow6>No problem
01:40:54<@dxrt>So https://logbot.info which is shutting down has uploaded all the logs to https://archive.logbot.info/. I've run it through AB and will upload the files in an item soon™
01:41:04<@OrIdow6>Japanese game/microblogging site Gree going down the 24th (https://jp.apps.gree.net/ja/static/page/20210201_pcnotices), in the process of trying to determine the specifics
01:56:19HP_Archivist quits [Ping timeout: 258 seconds]
01:59:31HP_Archivist (HP_Archivist) joins
02:27:45HP_Archivist quits [Ping timeout: 258 seconds]
02:31:02jspiros joins
02:32:37HP_Archivist (HP_Archivist) joins
02:40:01HP_Archivist quits [Ping timeout: 258 seconds]
03:53:24HP_Archivist (HP_Archivist) joins
03:53:37Nulo quits [Ping timeout: 258 seconds]
03:57:40qw3rty__ joins
04:00:31HackMii quits [Remote host closed the connection]
04:00:56HackMii (hacktheplanet) joins
04:01:17qw3rty_ quits [Ping timeout: 258 seconds]
04:03:35HP_Archivist quits [Ping timeout: 258 seconds]
04:20:40DogsRNice quits [Read error: Connection reset by peer]
04:36:16save_fn joins
04:36:42<save_fn>Now that the old Freenode is gone, as far as I know, was much saved?
04:36:53<save_fn>I wanted to save some data but I was too late
04:37:03<Doranwen>there was some discussion earlier about it, but I wasn't paying close attention to it
04:37:31<Doranwen>some people saved something, from what I gathered
04:37:48<save_fn>They shut it down right when I was starting to try to save some things
04:40:40Iki1 joins
04:41:07<thuban>we have public channel info, and nick info for a subset of users
04:41:43<save_fn>I had a script that was going to save INFO for every channel
04:41:55Iki quits [Ping timeout: 258 seconds]
04:41:55<save_fn> /msg ChanServ INFO <#channel>
04:42:22<save_fn>thuban, I'm glad you were able to get that!
04:43:11<thuban>don't thank me, pekster did most of the work
04:43:26<thuban>should be up on ia sometime in the near future
04:45:54<save_fn>thanks
05:36:23HP_Archivist (HP_Archivist) joins
05:45:55IDK quits [Ping timeout: 244 seconds]
05:49:45HP_Archivist quits [Read error: Connection reset by peer]
05:50:12HP_Archivist (HP_Archivist) joins
06:02:59noteness_ (noteness) joins
06:04:01noteness quits [Quit: Ping Timeout: 419 seconds]
06:17:08<pekster>Yea, I've got raw files for chanserv info for a combination of ALIS queries I ran on 6/11, plus a pile of extra channels thuban handed me. It's very raw and needs some dedupe, but I've got chanserv info for some thousands of channels, plus nickserv info for anyone that still had an account from the founder & access lists.
06:17:40<pekster>The final portion of my nickserv queries completed about an hour before the final servers started being taken offline.
06:41:01C4K3 joins
07:12:18IDK joins
07:15:44sonick quits [Client Quit]
07:21:32sonick (sonick) joins
07:22:03Hackerpcs quits [Client Quit]
07:22:43Hackerpcs (Hackerpcs) joins
07:22:44Atom quits [Read error: Connection reset by peer]
07:22:56Atom joins
07:26:57<s-crypt>ive been in and out for the past few months. is there a reason that we started to use transfer.archivete.am instead of transfer.notkiska.pw?
07:29:25<Jake>just some different servers for each, I believe. the archivete.am one is a bit faster for me too.
07:29:38<Jake>(Ki ska's may have been filling up too?)
08:32:23<Doranwen>I think speed will depend on where you are
08:33:05<Doranwen>iirc the archivete.am is in North America and the other's in Europe, so depending on someone's location…
08:41:29<@HCross>Doranwen: transfer.archivete.am is everywhere :)
08:41:38<Doranwen>lol, well then
08:41:41<Doranwen>that works
08:41:54<Doranwen>I just remember the comment about it being faster for me now that it wasn't going across the ocean
08:42:09<@HCross>it does cross the Atlantic, but indirectly
08:42:17<s-crypt>what infrastructure does the .am use to be 'everywhere'?
08:42:31<@HCross>so you make a POST request to a server in the US, and it forwards it onto Europe
09:15:21yanmaani1 quits [Remote host closed the connection]
09:15:52yanmaani1 (yanmaani) joins
09:43:31BlueMaxima quits [Client Quit]
10:00:05mary quits [Ping timeout: 258 seconds]
10:01:17mary joins
10:04:14<@JAA>save_fn: I archived the CNET forums, at least the forum and thread pages. https://archive.org/details/www.cnet.com_forums_202101 All in the WBM as well.
10:10:27EdSavoie quits [Ping timeout: 244 seconds]
10:15:25fuzzy8021 quits [Read error: Connection reset by peer]
10:15:48fuzzy8021 (fuzzy8021) joins
10:30:45mary quits [Ping timeout: 258 seconds]
10:31:43mary joins
10:31:58noteness_ quits [Remote host closed the connection]
10:32:17noteness (noteness) joins
10:43:24HP_Archivist quits [Ping timeout: 258 seconds]
11:23:00nick4 joins
11:23:07<nick4>hey i'm having some error archiving a website using grab-site i think it's with http authentication using cookies
11:25:20<nick4>paste.ee/p/NybXF
11:48:07<nick4>can someone help?
11:48:23bruh joins
11:48:37<bruh>Hey guys
11:48:40<bruh>New here
11:48:42<bruh>What happens to the data downloaded and already uploaded? Does it get automatically deleted from my system?
11:56:19bruh quits [Remote host closed the connection]
11:56:54bruh joins
11:58:40<@EggplantN>What are you running bruh
11:58:48<@EggplantN>The warrior?
11:59:45<bruh>yeah
11:59:46<bruh>the vm
11:59:50<bruh>in virtualbox
12:00:51<@EggplantN>it does remove data uploaded to the targets
12:01:22<bruh>perfect
12:01:57<bruh>and can i change the 60 gig storage before importing the vm
12:02:09<@EggplantN>depending on what project yes.
12:02:20<@EggplantN>but keep in mind some projects can have items quite large
12:02:52<bruh>I don't understand. The storage limit is of the vm, right?
12:03:09<bruh>How do i change the limit to a 100 gigs, for example?
12:27:26bruh quits [Remote host closed the connection]
12:27:29<nick4>how to solve this error?
12:27:54<nick4>https://paste.ee/p/oNxd2
12:32:11<Jake>nick4: If I remember correctly, that's the SQLAlchemy issue, try `pip install 'SQLAlchemy<1.4'` More info here: https://github.com/ArchiveTeam/wpull/issues/463
12:34:24<nick4>Requirement already satisfied
12:40:33<Jake>Ah, I'm sorry, I read it wrong. Is there more output than the paste above?
12:44:21<rewby>Oh that warning. Is that at the end of the grab?
12:44:38<rewby>My grab-sites sometimes do this when ending the grab because the connection to the gs-server doesn't end cleanly
12:44:43<rewby>but I don't think it affects the warcs in any way
12:45:56<nick4>i can't use the cookie feature
12:47:38<nick4>https://paste.ee/p/WulEQ
12:47:52<nick4>full output
12:49:52<nick4>it works if i just grab the website without cookies but it's kinda useless cuz the website requires login
12:56:46<nick4>ok i got it working
12:57:25<nick4>it was an error with the cookie file
12:59:35<nick4>how to extract/view WARC files?
13:02:22paul2520 (paul2520) joins
13:02:53<Jake>I was about to ask if the cookie file was in the right format and correct. Glad to hear that was it. I've used replayweb.page in the past to view some WARCs, there are a lot of others as well: https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem
13:05:24<nick4>so there's no way to convert to html and host it ?
13:15:24paul2520 quits [Remote host closed the connection]
13:31:06Mateon1 quits [Ping timeout: 250 seconds]
13:37:18<@EggplantN>nick4 yes
13:37:20<@EggplantN>pyweb
13:37:35<@EggplantN>*pywb
13:37:36<@EggplantN>https://github.com/webrecorder/pywb
13:38:08Jack_Thompson joins
13:39:08<@EggplantN>Yeah arkiver / OrIdow6 who wants to take brilliant.org project lol #archiveteam
13:39:33<@OrIdow6>So many sites shutting down
13:39:37<@OrIdow6>I am going to go crazy
13:40:55<@OrIdow6>At least it's in a few weeks
13:41:35<@EggplantN>I know :(
13:41:45<@EggplantN>Quite a few, we're around if we can help at all :D
13:45:34<rewby>Wait, brilliant is shutting down?
13:46:54<russss>just their forums it seems
13:47:02<rewby>Ah okay
13:47:12<russss>there comes a time in every website's life when they have to shut down their forums for no reason
13:47:51<rewby>So we have until july second to grab it?
13:47:55yanmaani1 is now known as yanmaani
13:49:25<rewby>OrIdow6: I was planning to learn how to do warrior projects some time in july, if that might be helpful to you eventually.
13:53:06Mateon1 joins
13:55:12nick4 quits [Ping timeout: 244 seconds]
13:57:31nick4 joins
13:57:55<nick4>when i'm running grab-site it clicks on logout and then the cookie become useless
13:58:03<nick4>is there a way to fix it?
13:58:36<@OrIdow6>And I appreciate it EggplantN
13:59:24<@OrIdow6>rewby: Yes
13:59:55<rewby>I'm probably gonna pull my hair out trying to understand how wget-lua works
14:00:00<rewby>But that's part of the fun
14:00:25<@OrIdow6>You don't need to understand how wget-lua itself works
14:00:31<@OrIdow6>Well, you eventually do
14:00:37<rewby>Well yes, but I'll need to understand the hooks
14:00:38<@OrIdow6>But to start no
14:00:40<@OrIdow6>Yes
14:00:49<rewby>And I can't find any docs on this
14:00:58<@EggplantN>Add an ignore for the logout page nick4
14:00:59<@OrIdow6>https://github.com/archiveteam/wget-lua/wiki/Wget-with-Lua-hooks
14:00:59<rewby>So I'll probably end up reading the code to figure out what everything does
14:01:05<rewby>Oh am I just blink
14:01:07<rewby>*blind
14:01:30<@OrIdow6>It's somewhat outdated but it has the essentials
14:01:50<rewby>Coo.
14:01:53<rewby>*Cool
14:02:19<rewby>I'll probably ask the tracker admins for a testing tracker and do a small project that really doesn't need warrior just on my own to try and learn the tech
14:02:52<rewby>Unless there's a particular project you want me to do
14:05:01<@OrIdow6>That would be a question for arkiver, I'm not in charge
14:05:04<nick4>guys, how can i avoid being logged out while archiving?
14:05:18<nick4>the bot clicks on logout button
14:06:13<@OrIdow6>nick4: Add an ignore for the URL the logout button leads to?
14:06:31<nick4>--import-ignores this?
14:06:46<rewby>OrIdow6: Fair enough. I'll ask arkiver if he wants my help and if there's a project he thinks I could do.
14:08:34<nick4>i don't understand how this works "--import-ignores: Copy this file to DIR/ignores before the crawl begins."
14:12:09<@OrIdow6>arkiver: So I've put what I have at https://github.com/OrIdow6/wikdot-grab, set as private (to prevent people from trying to run it again), and added you
14:12:49<@OrIdow6>Comments (besides the fact that it's obviously unfinished):
14:13:59<nick4>grab-site --import-ignores 'ignored url' ...?
14:14:09<@OrIdow6>I'm setting item by sending it the list, from pipeline.py (through a JSON-encoded environment var), of start URLs, and set_new_item watches for those
14:15:47<@OrIdow6>It keeps track of requests to ajax-module-connector.php through a parameter the site proper uses called callbackIndex
14:16:17<@OrIdow6>Cookie is only being sent as an explicit header on those API calls
14:17:58<@OrIdow6>Again, this obviously isn't final, but all that needs to be changed is cleanup, add user items, decide what to do with outlinks, and add tags and possibly recent changes (both of which would just be copies of how it handles version)
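The hand-off described above, a list of start URLs passed from pipeline.py through a JSON-encoded environment variable, might look roughly like the sketch below. The variable name `START_URLS`, the example URLs, and both helper functions are hypothetical, and in the real project the reading side would be a Lua hook rather than Python; this only illustrates the encoding round trip.

```python
import json


def encode_start_urls(start_urls, env):
    """Return a copy of env with the start URLs stored as one JSON string."""
    new_env = dict(env)
    new_env["START_URLS"] = json.dumps(start_urls)  # hypothetical variable name
    return new_env


def decode_start_urls(env):
    """Recover the list on the receiving side; empty list if the variable is unset."""
    return json.loads(env.get("START_URLS", "[]"))


# Hypothetical start URLs, including one custom domain.
start_urls = ["http://example.wikidot.com/", "http://wiki.example.org/"]
child_env = encode_start_urls(start_urls, {})
decoded = decode_start_urls(child_env)

# The receiving side can then check whether a URL begins a new item.
is_start = "http://wiki.example.org/" in decoded
print(decoded, is_start)
```

Using JSON rather than a delimiter-joined string avoids having to pick a separator that can never appear in a URL.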
14:18:23<@OrIdow6>nick4: I assume it means as opposed to during the grab
14:18:56<@OrIdow6>Anyhow, think I will be able to do all that today (may be tomorrow in Europe)
14:22:54<rewby>nick4: You want to put the ignored url into a text file, and then --import-ignores <filename>
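A minimal sketch of rewby's suggestion, assuming the ignores file holds one regular expression per line and that the patterns are Python-style regexes. The file name `ignores.txt` and the "normal" forum URL are illustrative and not taken from the log.

```python
import re
from pathlib import Path

# Hypothetical file name; any path can be handed to --import-ignores.
ignores_path = Path("ignores.txt")

# One regex per line (assumed format). "?" and "." are escaped so they match literally.
patterns = [r"^https://thesource2\.to/forum/index\.php\?logout"]
ignores_path.write_text("\n".join(patterns) + "\n", encoding="utf-8")

# Sanity check before crawling: the pattern should match the logout URL
# but leave an ordinary forum page alone.
logout_url = "https://thesource2.to/forum/index.php?logout/&t=16238"
normal_url = "https://thesource2.to/forum/index.php?threads/example.1/"  # hypothetical

lines = ignores_path.read_text(encoding="utf-8").splitlines()
compiled = [re.compile(line) for line in lines if line]
logout_ignored = any(c.search(logout_url) for c in compiled)
normal_ignored = any(c.search(normal_url) for c in compiled)
print(logout_ignored, normal_ignored)
```

The crawl would then be started with something like `grab-site --import-ignores ignores.txt <URL>`.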
14:25:51<nick4>in this case the link is https://thesource2.to/forum/index.php?logout/&t=16238
14:26:10<nick4>can i use ?logout/* ?
14:26:22<nick4>the numbers change
14:27:00<save_fn>JAA, NICE about CNET. Why haven't I heard about this project until now!
14:28:08<save_fn>pekster, Do you think it was intentional that the services data was not kept when Freenode switched to a different IRCd?
14:31:54<sknebel>sounded like turning it into a completely different free-for-all was intentional, yes. e.g. they dropped the entire concept of on-topic "projects" registering their space on Freenode etc too
14:32:51<sknebel>(although maybe they pushed that because they couldn't figure out how to migrate the DB? who knows with freenode)
14:33:34<save_fn>Neither did Libera take a copy of the database
14:35:35<russss>they did also switch IRC services but they probably could have migrated if they cared about it, which they clearly didn't
14:40:30<@EggplantN>rewby i can create a testing tracker for you
14:43:13<save_fn>Does anyone have methods or software to extract cached websites from old computers browser caches?
14:43:17<rewby>EggplantN: I won't have time for a week or so still.
14:56:58<nick4>why does grab-site only archive about 50% of the website instead of the whole thing?
15:06:56<nick4>how can i ignore a dynamic link?
15:08:48<@arkiver>OrIdow6: why are you using start_urls?
15:08:59<@arkiver>cant you just detect when you are on the main page of a wiki?
15:09:01<@arkiver>or a user
15:12:15<@arkiver>i feel like this is making it overly complicated, and it's not needed to get it to do what you want it to do
15:18:21<@arkiver>OrIdow6: what do the cookies do?
15:34:10Arcorann quits [Ping timeout: 250 seconds]
15:48:53<@arkiver>rewby: not exactly sure at this moment, but will keep it in mind
15:48:56<@arkiver>will likely ping later
15:49:02<@arkiver>(or ping me in case of anything(
15:49:04<@arkiver>)
15:49:16<rewby>Aight.
15:54:10<@arkiver>if you have ideas, let me know :)
15:59:25paul2520 (paul2520) joins
16:00:48systwi quits [Ping timeout: 258 seconds]
16:12:09rtlrelay is now known as superkuh
16:12:11Matthww quits [Quit: The Lounge - https://thelounge.chat]
16:13:02<superkuh>I miss sci-hub so much. It had been an integral part of my life, for what, a decade+? It's like losing a limb. Suddenly so much of the academic sphere is completely inaccessible to me behind paywalls.
16:15:36<nick4>i'm trying to grab a site with ignores but it's not ignoring, i've used regex and still nothing, could someone help?
16:27:16<thuban>nick4: post the url(s) you're trying to ignore and the regex you've tried
16:28:10<thuban>as for your previous question, that depends; the most common issue is that parts of the site won't work without javascript
16:30:05Matthww joins
16:36:45paul2520 quits [Remote host closed the connection]
16:41:51<nick4>thuban https://thesource2.to/forum/index.php?logout/&t=162385285
16:42:26<nick4>but everything after the logout/& changes every time the site loads
16:46:41paul2520 (paul2520) joins
16:47:03<paul2520>superkuh I got diconnected, so sorry if I missed something but isn't Sci-Hub still around? https://sci-hub.st/
16:47:07<paul2520>disconnected*
16:47:24<thuban>paul2520: https://www.reddit.com/r/scihub/comments/lofj0r/announcement_scihub_has_been_paused_no_new/
16:47:32<paul2520>thanks
16:47:33<thuban>nick4: and your regex?
16:47:52<nick4>^https://thesource2.to/forum/index.php?logout.*$
16:48:33<thuban>? is a special character in this dialect of regular expressions (as in most dialects); you need to escape it
16:48:38<thuban>\?
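To see why the unescaped `?` matters, here is a small check using Python's `re` module, on the assumption that grab-site's ignore patterns follow Python regex syntax. The URL and both patterns are the ones quoted in the conversation; the variable names are illustrative.

```python
import re

url = "https://thesource2.to/forum/index.php?logout/&t=162385285"

# Unescaped: "php?" means "ph" followed by an optional "p",
# so the literal "?" in the URL is never matched.
unescaped = re.compile(r"^https://thesource2.to/forum/index.php?logout.*$")

# Escaped: "\?" matches the literal question mark.
escaped = re.compile(r"^https://thesource2.to/forum/index.php\?logout.*$")

unescaped_hit = unescaped.search(url) is not None
escaped_hit = escaped.search(url) is not None

# What the unescaped pattern actually matches: a URL with no "?" at all.
odd_hit = unescaped.search("https://thesource2.to/forum/index.phplogout") is not None

print(unescaped_hit, escaped_hit, odd_hit)
```

The dots in the pattern are also unescaped, which is harmless here because `.` matches a literal dot along with any other character.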
16:48:51paul2520 quits [Remote host closed the connection]
16:49:09<nick4>it's in english
16:49:47<thuban>i'm not sure what you mean by that
16:50:25<nick4>i got to go, ill talk better after
16:53:24<wizards>anyone have an idea how to download the actual application data from blackberry world?
16:54:31<@JAA>nick4: I believe the 'HTTP session did not complete.' warning is safe to ignore. Happens all the time on ArchiveBot, and I haven't seen any issues from it.
16:55:00nick4 quits [Ping timeout: 244 seconds]
16:57:03<@JAA>save_fn: Yeah, many of the smaller projects aren't really documented well. This one is though: https://wiki.archiveteam.org/index.php/CNET_Forums
17:09:35mutantmonkey quits [Remote host closed the connection]
17:09:51mutantmonkey (mutantmonkey) joins
17:25:16<Jake>(Related to Brilliant.org, owners of the site seem responsive to archiving. https://www.reddit.com/r/DataHoarder/comments/o0qrey/brilliantorg_shutting_down_community/h1zerf6/ )
17:29:51<@JAA>Nice
17:37:28Matthww quits [Client Quit]
17:46:24lunik1 quits [Quit: :x]
17:54:11Matthww joins
18:10:49paul2520 (paul2520) joins
18:21:16paul2520 quits [Remote host closed the connection]
18:25:23AnotherIki joins
18:29:09Iki1 quits [Ping timeout: 258 seconds]
18:53:13spirit joins
19:03:08noteness quits [Remote host closed the connection]
19:03:26noteness (noteness) joins
19:09:46HP_Archivist (HP_Archivist) joins
19:10:05lunik1 joins
19:17:06AlsoHP_Archivist joins
19:17:06HP_Archivist quits [Read error: Connection reset by peer]
19:17:50AlsoHP_Archivist quits [Read error: Connection reset by peer]
19:18:17AlsoHP_Archivist joins
19:34:19nick4 joins
19:34:54<nick4>i'm back
19:35:48<Jake>You got a few comments while you were gone: from J AA "I believe the 'HTTP session did not complete.' warning is safe to ignore. Happens all the time on ArchiveBot, and I haven't seen any issues from it."
19:36:57<Jake>also I believe the issue with your regex was the non-escaped question mark as thuba n said, try this: ^https://thesource2.to/forum/index.php\?logout.*$
19:37:32<nick4>ok, ill try, thanks
19:52:42<nick4>so far it's working
19:54:42<Jake>Great.
19:55:21<thuban>fwiw i prefer not to have my nick mangled when i'm mentioned--i don't mind extraneous pings and i value being able to grep for it reliably
19:59:06<Jake>apologies! not sure who is fine with it and who isn't. i'll try to remember for next time.
19:59:45<@EggplantN>okay thuba n
20:00:45<thuban>no need to apologize! it's just an unfortunate preference collision and i don't have a good solution
20:03:14<thuban>if anti-ping munging were always performed in the same way, i would just pay attention to the munged version as well, but there doesn't seem to be a consistent convention
20:28:14<@JAA>thuban: t\W*h\W*u\W*b\W*a\W*n should probably catch most of them.
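A quick way to see what this pattern does and does not catch, sketched with Python's `re` module. The sample strings are made up for illustration; only the pattern comes from the log.

```python
import re

# JAA's pattern: allow any run of non-word characters between the letters.
nick_pattern = re.compile(r"t\W*h\W*u\W*b\W*a\W*n")

# Illustrative samples: plain nick, space-munged, punctuation-munged, underscore-munged.
samples = ["thuban", "thuba n", "t-h-u-b-a-n", "thu_ban"]
results = {s: bool(nick_pattern.search(s)) for s in samples}
print(results)
```

The underscore case is missed because `_` counts as a word character, so `\W*` cannot consume it; that fits the "most of them" caveat.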
20:38:05<spirit>first time i hear of nick mangling
20:38:27DogsRNice (Webuser299) joins
20:45:12Doranwen also prefers no mangling
20:45:24<Doranwen>but most people seem to
20:45:26Doranwen shrugs
20:45:35<Doranwen>I'm involved a whole lot less than thuban is, though
20:45:43<Doranwen>so it doesn't matter that much
20:46:00IDK quits [Remote host closed the connection]
20:50:42Guest69 joins
20:57:49<@EggplantN>*Dora*
20:59:25save_fn quits [Ping timeout: 258 seconds]
21:05:45Guest69 quits [Client Quit]
21:09:43spirit quits [Client Quit]
21:25:25Matthww6 joins
21:26:15Matthww quits [Ping timeout: 258 seconds]
21:26:15Matthww6 is now known as Matthww
21:31:03Matthww8 joins
21:32:32Matthww quits [Ping timeout: 250 seconds]
21:32:32Matthww8 is now known as Matthww
21:43:14systwi (systwi) joins
21:45:22Matthww5 joins
21:45:48Matthww quits [Ping timeout: 258 seconds]
21:45:48Matthww5 is now known as Matthww
21:49:44<Doranwen>EggplantN: lol, you really must be talking to someone else ;)
21:50:02Doranwen hates "Dora", and picked Doranwen before Dora the Explorer ever existed
21:50:09<Doranwen>Doran is fine, though :)
21:53:56Matthww6 joins
21:56:09Matthww quits [Ping timeout: 258 seconds]
21:56:09Matthww6 is now known as Matthww
22:04:28Larsenv quits [Quit: ZNC 1.8.2+deb1+focal2 - https://znc.in]
22:05:25sec^nd quits [Ping timeout: 255 seconds]
22:05:57sec^nd (second) joins
22:11:18lunik1 quits [Read error: Connection reset by peer]
22:14:23Mateon2 joins
22:14:23Mateon1 quits [Read error: Connection reset by peer]
22:14:31Mateon2 is now known as Mateon1
22:15:44nick4 quits [Remote host closed the connection]
22:21:11<@OrIdow6>arkiver: It would be more complicated than that, because of custom domains
22:21:42<@OrIdow6>And possibly outlinks, depending on how those end up being handled
22:22:27<@OrIdow6>I also thought this was generalizable to other projects, or item types if need be
22:23:00<@OrIdow6>I don't know what the cookie is used for, but it's required, else ajax-module-connector gives an error
22:25:17Mateon1 quits [Ping timeout: 258 seconds]
22:26:56Mateon1 joins
22:48:18<@OrIdow6>And I'm setting it with headers= because of custom domains, didn't mention that before
23:02:16lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)]
23:02:46lennier1 (lennier1) joins
23:15:10Matthww4 joins
23:17:02Matthww quits [Ping timeout: 258 seconds]
23:17:02Matthww4 is now known as Matthww
23:23:24berndj quits [Client Quit]
23:34:55Larsenv (Larsenv) joins
23:35:11Arcorann (Arcorann) joins
23:43:28lunik1 joins
23:45:17BlueMaxima joins
23:49:37AnotherIki quits [Ping timeout: 258 seconds]
23:56:21<Jake>The Brilliant guy released a data dump: https://ds055uzetaobb.cloudfront.net/exported_user_content.zip 9.1GB compressed.