00:00:48dm4v quits [Read error: Connection reset by peer]
00:03:09dm4v joins
00:03:12dm4v quits [Changing host]
00:03:12dm4v (dm4v) joins
00:15:28leo60228 (leo60228) joins
00:19:56leo60228 quits [Ping timeout: 258 seconds]
01:01:15dm4v quits [Read error: Connection reset by peer]
01:02:31dm4v joins
01:02:33dm4v quits [Changing host]
01:02:33dm4v (dm4v) joins
01:39:56Wayward (wayward) joins
02:17:13Krownest (Krownest) joins
02:20:18Krownest1 quits [Ping timeout: 258 seconds]
02:20:58Jake0 (Jake) joins
02:21:27britmob quits [Ping timeout: 258 seconds]
02:21:28Jake quits [Ping timeout: 250 seconds]
02:21:28Jake0 is now known as Jake
02:53:41HP_Archivist (HP_Archivist) joins
03:12:52AlsoHP_Archivist joins
03:16:16HP_Archivist quits [Ping timeout: 258 seconds]
03:31:56superkuh_ quits [Remote host closed the connection]
03:32:11superkuh_ joins
03:33:57wizards quits [Remote host closed the connection]
03:44:42<@JAA>I looked a bit into Windows Community. Ugly. Comments are loaded via GET XHR, pagination works via a custom 'token' header. Obviously definitely won't work in the WBM.
03:45:47OrIdow6 (OrIdow6) joins
03:45:47@ChanServ sets mode: +o OrIdow6
03:46:08<@JAA>But I'll try to get a copy of all conversations later.
03:46:52<@JAA>(Assuming it doesn't vanish in the meantime. They haven't given a concrete deadline.)
03:49:26OrIdow6^2 (OrIdow6) joins
03:49:26@ChanServ sets mode: +o OrIdow6^2
03:50:21qw3rty_ joins
03:52:02@OrIdow6 quits [Ping timeout: 250 seconds]
03:54:13qw3rty__ quits [Ping timeout: 258 seconds]
03:59:36@OrIdow6^2 is now known as @OrIdow6
04:17:51nertzy quits [Read error: Connection reset by peer]
04:18:53nertzy joins
04:52:50AlsoHP_Archivist quits [Client Quit]
04:53:07HP_Archivist (HP_Archivist) joins
05:11:51Video joins
05:14:44Megame (Megame) joins
05:46:32HP_Archivist quits [Ping timeout: 258 seconds]
06:38:52HP_Archivist (HP_Archivist) joins
06:39:30<@OrIdow6>Video: You can obviously share it here or on the wiki
06:40:30<@OrIdow6>Keep in mind that, unless it's not possible, ArchiveTeam likes to do full capture to WARCs instead of just scraping - so if you know of an instance of this forum software that is going down, likely a different script would be made for ArchiveTeam use
06:44:06<@OrIdow6>Especially if it's an API as opposed to the regular interface
06:44:43<@OrIdow6>But knowledge of structure, and components that discover content as opposed to saving it, are important
06:54:40<Video>OrIdow6: does the fact that flarum forums tend to have that "endless scrolling" for threads/posts change anything
07:00:49fuzzy8021 quits [Read error: Connection reset by peer]
07:01:15fuzzy8021 (fuzzy8021) joins
07:07:24<@OrIdow6>Video: Depends how it's implemented
07:07:50<@OrIdow6>E.g. GET with offset is fine; POST with JS-generated session key thing is not
07:09:58<@OrIdow6>"Fine" in the sense that WARCS will both capture and play back well
07:11:17<Video>for the API: when querying a thread, the api stores a list of ids for each post in the JSON response, which you have to throw into a separate request (i believe it was like /api/posts) and specify each post id in one continuous string
07:20:12<@OrIdow6>Could work, sounds like it depends on the /api/posts requests - if it's POST it won't play back (rather, it will, but just in a broken way); or if it's GET and the order is randomized or it has a timestamp; but basically if it's JUST a transformed version of another response it would play back
07:20:20<@OrIdow6>Deterministically
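A toy model of the point above, assuming a replay tool that looks up archived responses by exact method and URL (real replay software can be more lenient, so this is only a sketch). The `record` and `replay` helpers and the example URLs are hypothetical; they only illustrate why a GET with a stable offset plays back while a request carrying a changing timestamp does not.

```python
# Toy archive: maps (method, url) to a recorded response body.
# This is an illustration only, not how any real WARC replay tool is built.

def record(archive: dict, method: str, url: str, body: str) -> None:
    """Store a response under its exact request method and URL."""
    archive[(method.upper(), url)] = body


def replay(archive: dict, method: str, url: str):
    """Return the recorded body for an identical request, or None."""
    return archive.get((method.upper(), url))


archive = {}
# Hypothetical forum URL with a plain offset parameter.
record(archive, "GET", "https://forum.example/api/posts?offset=20", "page 2")

# The same request at playback time finds the recorded response.
same = replay(archive, "GET", "https://forum.example/api/posts?offset=20")

# A request that adds a per-visit timestamp no longer matches anything.
busted = replay(archive, "GET", "https://forum.example/api/posts?offset=20&_=1690000000")
```

Here `same` is the recorded body and `busted` is `None`, which is the "plays back, but in a broken way" case in miniature.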
07:21:22<@OrIdow6>I won't discourage you from writing your own script - even on "no chance" things ArchiveTeam (and especially me, due to what I've ended up sort of specializing in) likes to use WARC because there's already a bunch of infrastructure in place for handling it
07:22:00<@OrIdow6>That's to say, don't think the ArchiveTeam way is necessarily the only adequate way
07:22:36<@OrIdow6>And obviously even when you're working with an API there's a benefit to capturing the response headers etc
07:26:02FalconK quits [Quit: WeeChat 3.2]
07:26:44FalconK (FalconK) joins
07:38:28Krownest quits [Ping timeout: 258 seconds]
07:57:18nertzy quits [Ping timeout: 250 seconds]
08:00:18<Video>OrIdow6: apologies for the delay - flarum's api does do things in GET requests. you can see an example of an API response at https://discuss.flarum.org/api/discussions/27852
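A minimal sketch of the two-step lookup Video describes: read the post ids out of a discussion response, then build the follow-up GET URLs with the ids joined into one string. It assumes the ids sit under `data.relationships.posts.data` (JSON:API style) and that `/api/posts` accepts a `filter[id]` parameter; both are assumptions to check against a real response such as the one linked above. The function names, `forum.example`, and the batch size are hypothetical.

```python
from urllib.parse import urlencode


def post_ids(discussion: dict) -> list:
    """Collect post ids from a discussion response (assumed JSON:API layout)."""
    related = discussion["data"]["relationships"]["posts"]["data"]
    return [item["id"] for item in related]


def post_batch_urls(base: str, ids: list, batch_size: int = 20) -> list:
    """Build one GET URL per batch, with ids joined by commas in a single string."""
    urls = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # Keep brackets and commas literal so the URL stays readable.
        query = urlencode({"filter[id]": ",".join(chunk)}, safe="[],")
        urls.append(f"{base}/api/posts?{query}")
    return urls


# Hand-written stand-in for a discussion response, not real data.
sample = {
    "data": {
        "relationships": {
            "posts": {"data": [{"type": "posts", "id": "1"},
                               {"type": "posts", "id": "2"},
                               {"type": "posts", "id": "3"}]}
        }
    }
}
urls = post_batch_urls("https://forum.example", post_ids(sample), batch_size=2)
```

Because the URLs are derived only from the ids in the earlier response, the same discussion always yields the same requests, which is the property that matters for playback.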
08:07:51qwertyasdfuiopghjkl joins
08:25:02HP_Archivist quits [Ping timeout: 250 seconds]
08:37:48<@OrIdow6>Video: Trying it in a normal page (not API), seems fine
08:38:07<@OrIdow6>But anyhow, is there a specific site running this software going down? Or this is just general?
08:38:20<Video>this is just general afaik
08:38:50<Video>in a community i'm a part of, the previous owner completely nuked its forum, which ran this software, and there was no archive of it
08:43:09jamesatjaminit (jamesatjaminit) joins
08:45:25<@OrIdow6>Oh
08:50:15<Video>i wrote the script to ensure this could never happen again
08:51:40<@OrIdow6>Nothing to prevent you from running it yourself, or getting other people here to help run it, but as an overall project that makes it low-priority
08:51:59<@OrIdow6>Someday hopefully we will have a project to scrape all forums on the web at #msgbored
08:52:47<@OrIdow6>Also, looks like the site works without Javascript, so if you want to get an instance into the Wayback Machine, albeit in an older-looking form (and perhaps with reduced functionality), #archivebot should work
08:54:12<@OrIdow6>Anyway always possible someone (including me, who knows) might want to work on it anyway
08:54:25<@OrIdow6>But overall low-priority
09:00:16jamesatjaminit quits [Client Quit]
09:41:18Krownest (Krownest) joins
09:47:18mutantm0nkey (mutantmonkey) joins
09:49:57mutantmnky quits [Ping timeout: 258 seconds]
09:55:54qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
10:49:46Stilett0 quits [Ping timeout: 250 seconds]
11:16:30Dj-Wawa quits [Quit: Dj-Wawa]
11:16:55Dj-Wawa joins
11:25:31Dj-Wawa quits [Client Quit]
11:26:12Dj-Wawa joins
11:54:40<luckcolors>hey people. I'm trying to archive a login-walled google site i have access to
11:54:49<luckcolors>i'm using grab-site
11:55:27<luckcolors>I've made a netscape cookies jar with my session cookies and i've fed it to wpull via the flag
11:55:42<luckcolors>and it is using the cookies i can see it in the warc recording
11:56:21<luckcolors>but google still thinks i'm not logged in as on the first request it immediately redirects to the login page
11:56:58<luckcolors>any gotchas?
11:57:10<luckcolors>i've already tried changing the User-agent
12:09:27<ArchivalEfforts>luckcolors How did you create the cookie file? I recently had issues when the extension I used prefixed some cookies with "#HttpOnly_".
12:09:30<ArchivalEfforts>After I removed that it worked.
12:13:13<luckcolors>I actually switched 3 different extensions
12:13:55<luckcolors>since this one is broken with the "containers" feature https://addons.mozilla.org/it/firefox/addon/cookies-txt/
12:14:46<luckcolors>this one creates a .txt file which for some reason wpull can't parse properly
12:14:47<luckcolors>https://addons.mozilla.org/it/firefox/addon/cookie-quick-manager/
12:16:15<luckcolors>Then i've used this one which seems to work: https://addons.mozilla.org/it/firefox/addon/export-cookies-txt/
12:16:38<luckcolors>But yeah now that i look at it it does add the "#HttpOnly_"
12:16:48<luckcolors>I'll try removing them and see if it works
12:27:06<luckcolors>ArchivalEfforts: thank you man. you saved me from going insane
12:27:07<luckcolors>XD
12:27:16<luckcolors>it's actually working
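The fix discussed above can be sketched as a small cleanup step, assuming the export is a Netscape-format cookies.txt where some cookie lines carry a leading "#HttpOnly_" marker that makes a parser treat them as comments. The function name and the sample cookie values are hypothetical.

```python
HTTPONLY_PREFIX = "#HttpOnly_"


def strip_httponly_prefix(cookies_txt: str) -> str:
    """Remove the '#HttpOnly_' marker so those lines parse as normal cookies."""
    cleaned = []
    for line in cookies_txt.splitlines():
        if line.startswith(HTTPONLY_PREFIX):
            # Drop only the marker; the domain and remaining fields are kept.
            line = line[len(HTTPONLY_PREFIX):]
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


# Made-up example content, not real session cookies.
sample = (
    "# Netscape HTTP Cookie File\n"
    "#HttpOnly_.google.com\tTRUE\t/\tTRUE\t0\tSID\tabc\n"
    ".google.com\tTRUE\t/\tFALSE\t0\tPREF\txyz\n"
)
fixed = strip_httponly_prefix(sample)
```

Genuine comment lines such as the header are left alone, since only the exact "#HttpOnly_" prefix is removed.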
12:27:43<luckcolors>now to limit it so it doesn't actually try to change account
12:28:29<luckcolors>how can i regex /?authuser=1
12:28:44<ArchivalEfforts>Glad I could help, took me a while to figure that out when I ran into it
12:29:27<ArchivalEfforts>Sorry, can't help with regex
12:29:39<luckcolors>no worries
12:33:53BlueMaxima joins
12:43:11wizards joins
12:45:08<@OrIdow6>I think the only char you need to escape there is the ?
12:48:13<luckcolors>i'm just using this as regex
12:48:14<luckcolors>authuser=\d
12:48:18<luckcolors>should suffice
12:48:44<luckcolors>i kinda forgot to only ignore that particular page EG.
12:48:46<luckcolors>^https?://sites\.google\.com/site
12:48:57<luckcolors>i don't want to append the trailing / right?
12:49:12<luckcolors>else it will not fetch any path longer than that
12:49:56<luckcolors>like this will ignore anything under /site ^https?://sites\.google\.com/site/
12:52:31<luckcolors>ok no this is not how exact matches are made
12:52:51luckcolors opens dusty regex book
12:59:03<luckcolors>ended up using ^https?://sites\.google\.com/site$ which i think is not ideal since it won't ignore potential url params
12:59:08<luckcolors>but works so
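A quick way to check the ignore patterns from this exchange, assuming they are applied as Python regexes searched against the full URL (worth confirming for the tool in use). `EXACT_OR_QUERY` is a suggested variant that is not from the log: it also matches the bare /site URL when a query string follows. The `is_ignored` helper and the test URLs are hypothetical.

```python
import re

# Patterns quoted in the log.
AUTHUSER = re.compile(r"authuser=\d")
EXACT = re.compile(r"^https?://sites\.google\.com/site$")
# Suggested variant: bare /site, optionally followed by a query string.
EXACT_OR_QUERY = re.compile(r"^https?://sites\.google\.com/site(?:\?.*)?$")


def is_ignored(url: str, patterns) -> bool:
    """True if any pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)


patterns = [AUTHUSER, EXACT_OR_QUERY]
checks = {
    "https://sites.google.com/site": is_ignored("https://sites.google.com/site", patterns),
    "https://sites.google.com/site?x=1": is_ignored("https://sites.google.com/site?x=1", patterns),
    "https://sites.google.com/site/foo": is_ignored("https://sites.google.com/site/foo", patterns),
    "https://sites.google.com/site/foo/?authuser=1": is_ignored(
        "https://sites.google.com/site/foo/?authuser=1", patterns),
}
```

With these patterns, deeper paths under /site/ are still fetched, while the bare /site page (with or without parameters) and any account-switching URL are skipped.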
13:21:07rewby quits [Remote host closed the connection]
13:21:22rewby (rewby) joins
13:34:12Video quits [Ping timeout: 258 seconds]
14:15:59sdomi quits [Ping timeout: 258 seconds]
14:17:23britmob256 joins
14:20:00britmob joins
16:05:37Arcorann quits [Ping timeout: 258 seconds]
16:09:26CookMePlox joins
16:12:28<CookMePlox>Hey folks! Does anyone have suggestions for getting Internet Archive to respond to an email request? I've convinced the owner of runescape.com to ask for their site to get un-blacklisted from the wayback machine, but they sent something to info@archive.org and haven't heard back in 6 months
16:15:01<h2ibot>Anput uploaded File:My folder.png: https://wiki.archiveteam.org/?title=File%3AMy%20folder.png
16:34:49lennier1 quits [Client Quit]
16:45:06Krownest quits [Ping timeout: 258 seconds]
16:51:52<thuban>http://www.rrrrthats5rs.com/ :<
17:00:24Krownest (Krownest) joins
17:10:03Matthww8 quits [Quit: Ping timeout (120 seconds)]
17:10:26Matthww8 joins
17:13:28CookMePlox quits [Remote host closed the connection]
17:18:06BlueMaxima quits [Read error: Connection reset by peer]
17:20:25<@HCross>arkiver: - see above from CookMePlox who isn't here any more
17:51:36HP_Archivist (HP_Archivist) joins
17:54:17lennier1 (lennier1) joins
17:58:26Stiletto joins
18:02:40grafck quits [Ping timeout: 250 seconds]
18:13:37Daloader joins
18:44:38Daloader quits [Client Quit]
18:58:08<@arkiver>HCross: checking
18:58:17<@arkiver>ouch
18:58:35<@arkiver>HCross: thanks, will ping internally
19:19:12driib quits [Client Quit]
19:19:27driib (driib) joins
19:22:08Minkafighter2 quits [Quit: The Lounge - https://thelounge.chat]
19:22:39Minkafighter2 joins
19:48:56JackeithWelley quits [Remote host closed the connection]
20:09:45jonboy3452 joins
20:09:45Jonboy3451 quits [Read error: Connection reset by peer]
20:35:38gazorpazorp quits [Ping timeout: 250 seconds]
20:42:00HP_Archivist quits [Client Quit]
20:56:08average_student joins
21:01:48HP_Archivist (HP_Archivist) joins
21:22:53Megame quits [Client Quit]
21:30:46gazorpazorp joins
21:36:28Matthww88 joins
21:37:12Matthww8 quits [Ping timeout: 258 seconds]
21:37:12Matthww88 is now known as Matthww8
21:46:12Video joins
22:02:29leo60228 (leo60228) joins
22:12:28leo60228 quits [Ping timeout: 258 seconds]
22:19:50qwertyasdfuiopghjkl joins
22:23:47leo60228 (leo60228) joins
22:27:53<Stiletto><@JAA> Per the notice on https://3dwarehouse.sketchup.com/: 'Heads up! After July 27, SketchUp 2017 models will no longer be available for download on 3D Warehouse unless it was originally uploaded in that format.'
22:28:19<Stiletto>not sure if anyone was actually looking more into this, but it seems they have extended the deadline to August 11.
22:53:53qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:57:32qwertyasdfuiopghjkl joins
23:23:51<Jake>(I was a little bit, but the API was VERY slow, like 3 minutes per request slow...)
23:23:59<Jake>Glad to see they added some more time, I'll take a look again tonight.
23:42:27lennier1 quits [Client Quit]
23:42:51lennier1 (lennier1) joins
23:43:36qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]