| 00:00:48 | | dm4v quits [Read error: Connection reset by peer] |
| 00:03:09 | | dm4v joins |
| 00:03:11 | | dm4v is now authenticated as dm4v |
| 00:03:12 | | dm4v quits [Changing host] |
| 00:03:12 | | dm4v (dm4v) joins |
| 00:15:28 | | leo60228 (leo60228) joins |
| 00:19:56 | | leo60228 quits [Ping timeout: 258 seconds] |
| 01:01:15 | | dm4v quits [Read error: Connection reset by peer] |
| 01:02:31 | | dm4v joins |
| 01:02:33 | | dm4v is now authenticated as dm4v |
| 01:02:33 | | dm4v quits [Changing host] |
| 01:02:33 | | dm4v (dm4v) joins |
| 01:39:56 | | Wayward (wayward) joins |
| 02:17:13 | | Krownest (Krownest) joins |
| 02:20:18 | | Krownest1 quits [Ping timeout: 258 seconds] |
| 02:20:58 | | Jake0 (Jake) joins |
| 02:21:27 | | britmob quits [Ping timeout: 258 seconds] |
| 02:21:28 | | Jake quits [Ping timeout: 250 seconds] |
| 02:21:28 | | Jake0 is now known as Jake |
| 02:53:41 | | HP_Archivist (HP_Archivist) joins |
| 03:12:52 | | AlsoHP_Archivist joins |
| 03:16:16 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 03:31:56 | | superkuh_ quits [Remote host closed the connection] |
| 03:32:11 | | superkuh_ joins |
| 03:33:57 | | wizards quits [Remote host closed the connection] |
| 03:44:42 | <@JAA> | I looked a bit into Windows Community. Ugly. Comments are loaded via GET XHR, pagination works via a custom 'token' header. Obviously definitely won't work in the WBM. |
| 03:45:47 | | OrIdow6 (OrIdow6) joins |
| 03:45:47 | | @ChanServ sets mode: +o OrIdow6 |
| 03:46:08 | <@JAA> | But I'll try to get a copy of all conversations later. |
| 03:46:52 | <@JAA> | (Assuming it doesn't vanish in the meantime. They haven't given a concrete deadline.) |
| 03:49:26 | | OrIdow6^2 (OrIdow6) joins |
| 03:49:26 | | @ChanServ sets mode: +o OrIdow6^2 |
| 03:50:21 | | qw3rty_ joins |
| 03:52:02 | | @OrIdow6 quits [Ping timeout: 250 seconds] |
| 03:54:13 | | qw3rty__ quits [Ping timeout: 258 seconds] |
| 03:59:36 | | @OrIdow6^2 is now known as @OrIdow6 |
| 04:17:51 | | nertzy quits [Read error: Connection reset by peer] |
| 04:18:53 | | nertzy joins |
| 04:18:54 | | nertzy is now authenticated as nertzy |
| 04:52:50 | | AlsoHP_Archivist quits [Client Quit] |
| 04:53:07 | | HP_Archivist (HP_Archivist) joins |
| 05:11:51 | | Video joins |
| 05:14:44 | | Megame (Megame) joins |
| 05:46:32 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 06:38:52 | | HP_Archivist (HP_Archivist) joins |
| 06:39:30 | <@OrIdow6> | Video: You can obviously share it here or on the wiki |
| 06:40:30 | <@OrIdow6> | Keep in mind that, unless it's not possible, ArchiveTeam likes to do full capture to WARCs instead of just scraping - so if you know of an instance of this forum software that is going down, a different script would likely be made for ArchiveTeam use |
| 06:44:06 | <@OrIdow6> | Especially if it's an API as opposed to the regular interface |
| 06:44:43 | <@OrIdow6> | But knowledge of structure, and components that discover content as opposed to saving it, are important |
| 06:54:40 | <Video> | OrIdow6: does the fact that flarum forums tend to have that "endless scrolling" for threads/posts change anything |
| 07:00:49 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 07:01:15 | | fuzzy8021 (fuzzy8021) joins |
| 07:07:24 | <@OrIdow6> | Video: Depends how it's implemented |
| 07:07:50 | <@OrIdow6> | E.g. GET with offset is fine; POST with JS-generated session key thing is not |
| 07:09:58 | <@OrIdow6> | "Fine" in the sense that WARCs will both capture and play back well |
| 07:11:17 | <Video> | for the API: when querying a thread, the api stores a list of ids for each post in the JSON response, which you have to throw into a separate request (i believe it was like /api/posts) and specify each post id in one continuous string |
| 07:20:12 | <@OrIdow6> | Could work, sounds like it depends on the /api/posts requests - if it's POST it won't play back (rather, it will, but just in a broken way); or if it's GET and the order is randomized or it has a timestamp; but basically if it's JUST a transformed version of another response it would play back |
| 07:20:20 | <@OrIdow6> | Deterministically |
| 07:21:22 | <@OrIdow6> | I won't discourage you from writing your own script - even on "no chance" things ArchiveTeam (and especially me, due to what I've ended up sort of specializing in) likes to use WARC because there's already a bunch of infrastructure in place for handling it |
| 07:22:00 | <@OrIdow6> | That's to say, don't think the ArchiveTeam way is necessarily the only adequate way |
| 07:22:36 | <@OrIdow6> | And obviously even when you're working with an API there's a benefit to capturing the response headers etc |
| 07:26:02 | | FalconK quits [Quit: WeeChat 3.2] |
| 07:26:44 | | FalconK (FalconK) joins |
| 07:38:28 | | Krownest quits [Ping timeout: 258 seconds] |
| 07:57:18 | | nertzy quits [Ping timeout: 250 seconds] |
| 08:00:18 | <Video> | OrIdow6: apologies for the delay - flarum's api does do things in GET requests. you can see an example of an API response at https://discuss.flarum.org/api/discussions/27852 |
| 08:07:51 | | qwertyasdfuiopghjkl joins |
| 08:25:02 | | HP_Archivist quits [Ping timeout: 250 seconds] |
| 08:37:48 | <@OrIdow6> | Video: Trying it in a normal page (not API), seems fine |
| 08:38:07 | <@OrIdow6> | But anyhow, is there a specific site running this software going down? Or this is just general? |
| 08:38:20 | <Video> | this is just general afaik |
| 08:38:50 | <Video> | in a community i'm a part of, the previous owner completely nuked a forum with this software and there was no archive of it |
| 08:38:56 | <Video> | nuked its forum** |
| 08:43:09 | | jamesatjaminit (jamesatjaminit) joins |
| 08:45:25 | <@OrIdow6> | Oh |
| 08:50:15 | <Video> | i wrote the script to ensure this could never happen again |
| 08:51:40 | <@OrIdow6> | Nothing to prevent you from running it yourself, or getting other people here to help run it, but as an overall project that makes it low-priority |
| 08:51:59 | <@OrIdow6> | Someday hopefully we will have a project to scrape all forums on the web at #msgbored |
| 08:52:47 | <@OrIdow6> | Also, looks like the site works without Javascript, so if you want to get an instance into the Wayback Machine, albeit in an older-looking form (and perhaps with reduced functionality), #archivebot should work |
| 08:54:12 | <@OrIdow6> | Anyway always possible someone (including me, who knows) might want to work on it anyway |
| 08:54:25 | <@OrIdow6> | But overall low-prioeirt |
| 08:54:31 | <@OrIdow6> | *priority |
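
A minimal sketch of the Flarum API flow Video describes above: read the post IDs listed in a discussion response, then build GET requests to /api/posts with the IDs joined into one string. The `filter[id]` query parameter, the `data.relationships.posts.data` layout, and the batch size are assumptions not confirmed in this log; the function names are hypothetical. Because each URL is derived only from the earlier response (no timestamp, no random order), this is the kind of request that should play back deterministically from a WARC.

```python
from urllib.parse import urlencode


def post_ids(discussion):
    """Collect post IDs from a discussion response.

    Assumes a JSON:API-style body where the IDs sit under
    data.relationships.posts.data (an assumption, not confirmed above).
    """
    refs = discussion["data"]["relationships"]["posts"]["data"]
    return [ref["id"] for ref in refs]


def batched_post_urls(base_url, ids, batch_size=20):
    """Build one GET URL per batch of post IDs.

    The IDs are joined with commas into a single string, as described
    in the chat. The 'filter[id]' parameter name is an assumption.
    """
    urls = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        query = urlencode({"filter[id]": ",".join(chunk)})
        urls.append(f"{base_url}/api/posts?{query}")
    return urls


if __name__ == "__main__":
    # Hand-written sample shaped like the assumed response, not real data.
    sample = {
        "data": {
            "relationships": {
                "posts": {
                    "data": [
                        {"type": "posts", "id": "1"},
                        {"type": "posts", "id": "2"},
                        {"type": "posts", "id": "3"},
                    ]
                }
            }
        }
    }
    for url in batched_post_urls("https://discuss.flarum.org", post_ids(sample), 2):
        print(url)
```

The same input always yields the same URLs in the same order, which is what matters for playback.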
| 09:00:16 | | jamesatjaminit quits [Client Quit] |
| 09:41:18 | | Krownest (Krownest) joins |
| 09:47:18 | | mutantm0nkey (mutantmonkey) joins |
| 09:49:57 | | mutantmnky quits [Ping timeout: 258 seconds] |
| 09:55:54 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 10:49:46 | | Stilett0 quits [Ping timeout: 250 seconds] |
| 11:16:30 | | Dj-Wawa quits [Quit: Dj-Wawa] |
| 11:16:55 | | Dj-Wawa joins |
| 11:16:55 | | Dj-Wawa is now authenticated as Dj-Wawa |
| 11:25:31 | | Dj-Wawa quits [Client Quit] |
| 11:26:12 | | Dj-Wawa joins |
| 11:26:12 | | Dj-Wawa is now authenticated as Dj-Wawa |
| 11:54:40 | <luckcolors> | hey people. I'm trying to archive a login-walled google site i have access to |
| 11:54:49 | <luckcolors> | i'm using grab-site |
| 11:55:27 | <luckcolors> | I've made a netscape cookies jar with my session cookies and i've fed it to wpull via the flag |
| 11:55:42 | <luckcolors> | and it is using the cookies i can see it in the warc recording |
| 11:56:21 | <luckcolors> | but google still thinks i'm not logged in as on the first request it immediately redirects to the login page |
| 11:56:58 | <luckcolors> | any gotchas? |
| 11:57:10 | <luckcolors> | i've already tried changing the User-agent |
| 12:09:27 | <ArchivalEfforts> | luckcolors How did you create the cookie file? I recently had issues when the extension I used prefixed some cookies with "#HttpOnly_". |
| 12:09:30 | <ArchivalEfforts> | After I removed that it worked. |
| 12:13:13 | <luckcolors> | I actually switched 3 different extensions |
| 12:13:55 | <luckcolors> | since this one is broken with the "containers" feature https://addons.mozilla.org/it/firefox/addon/cookies-txt/ |
| 12:14:46 | <luckcolors> | this one creates a .txt file which for some reason wpull can't parse properly |
| 12:14:47 | <luckcolors> | https://addons.mozilla.org/it/firefox/addon/cookie-quick-manager/ |
| 12:16:15 | <luckcolors> | Then i've used this one which seems to work: https://addons.mozilla.org/it/firefox/addon/export-cookies-txt/ |
| 12:16:38 | <luckcolors> | But yeah now that i look at it it does add the "#HttpOnly_" |
| 12:16:48 | <luckcolors> | I'll try removing them and see if it works |
| 12:27:06 | <luckcolors> | ArchivalEfforts: thank you man. you saved me from going insane |
| 12:27:07 | <luckcolors> | XD |
| 12:27:16 | <luckcolors> | it's actually working |
| 12:27:43 | <luckcolors> | now to limit it so it doesn't actually try to change account |
| 12:28:29 | <luckcolors> | how can i regex /?authuser=1 |
| 12:28:44 | <ArchivalEfforts> | Glad I could help, took me a while to figure that out when I ran into it |
| 12:29:27 | <ArchivalEfforts> | Sorry, can't help with regex |
| 12:29:39 | <luckcolors> | no worries |
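
The cookie-file fix discussed above (removing the "#HttpOnly_" prefix that some export extensions add) can be sketched as a small filter. This is an illustration only: the function name is hypothetical, and it assumes the usual tab-separated Netscape cookies.txt layout. Ordinary comment lines are left untouched.

```python
HTTPONLY_PREFIX = "#HttpOnly_"


def strip_httponly_prefix(cookies_txt):
    """Return the cookie file text with any '#HttpOnly_' line prefix removed.

    A parser that treats every line starting with '#' as a comment would
    otherwise skip those cookies entirely.
    """
    cleaned = []
    for line in cookies_txt.splitlines():
        if line.startswith(HTTPONLY_PREFIX):
            line = line[len(HTTPONLY_PREFIX):]
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    # Made-up sample values, not real session cookies.
    sample = (
        "# Netscape HTTP Cookie File\n"
        "#HttpOnly_.google.com\tTRUE\t/\tTRUE\t0\tSID\tabc\n"
        ".google.com\tTRUE\t/\tFALSE\t0\tNID\txyz\n"
    )
    print(strip_httponly_prefix(sample), end="")
```

Writing the result to a new file and passing that file to the crawler keeps the original export around for comparison.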
| 12:33:53 | | BlueMaxima joins |
| 12:43:11 | | wizards joins |
| 12:45:08 | <@OrIdow6> | I think the only char you need to escape there is the ? |
| 12:48:13 | <luckcolors> | i'm just using this as regex |
| 12:48:14 | <luckcolors> | authuser=\d |
| 12:48:18 | <luckcolors> | should suffice |
| 12:48:44 | <luckcolors> | i kinda forgot to only ignore that particular page EG. |
| 12:48:46 | <luckcolors> | ^https?://sites\.google\.com/site |
| 12:48:57 | <luckcolors> | i don't want to append the trailing / right? |
| 12:49:12 | <luckcolors> | else it will not fetch any path longer than that |
| 12:49:56 | <luckcolors> | like this will ignore anything under /site ^https?://sites\.google\.com/site/ |
| 12:52:31 | <luckcolors> | ok no this is not how exact matches are made |
| 12:52:51 | | luckcolors opens dusty regex book |
| 12:59:03 | <luckcolors> | ended up using ^https?://sites\.google\.com/site$ which i think is not ideal since it won't ignore potential url params |
| 12:59:08 | <luckcolors> | but works so |
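
A rough sketch of the ignore patterns discussed above, checked with Python's `re.search`. It assumes the crawler applies ignores with search semantics (a match anywhere in the URL), which fits the observation that the unanchored prefix pattern ignored everything under /site. The optional `(?:\?.*)?` group is one possible way to also cover URL params on the exact /site page; the helper name and the test URLs are made up.

```python
import re

IGNORE_PATTERNS = [
    # Exactly the /site page, with or without a query string.
    re.compile(r"^https?://sites\.google\.com/site(?:\?.*)?$"),
    # Any URL that carries an authuser=<digits> query parameter.
    re.compile(r"[?&]authuser=\d+"),
]


def is_ignored(url, patterns=IGNORE_PATTERNS):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)


if __name__ == "__main__":
    for url in [
        "https://sites.google.com/site",
        "https://sites.google.com/site?foo=bar",
        "https://sites.google.com/site/example/home",
        "https://sites.google.com/site/example/home?authuser=1",
    ]:
        print(is_ignored(url), url)
```

Only the bare /site page and authuser URLs are ignored here; deeper paths such as /site/example/home are still fetched.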
| 13:21:07 | | rewby quits [Remote host closed the connection] |
| 13:21:22 | | rewby (rewby) joins |
| 13:34:12 | | Video quits [Ping timeout: 258 seconds] |
| 14:15:59 | | sdomi quits [Ping timeout: 258 seconds] |
| 14:17:23 | | britmob256 joins |
| 14:20:00 | | britmob joins |
| 16:05:37 | | Arcorann quits [Ping timeout: 258 seconds] |
| 16:09:26 | | CookMePlox joins |
| 16:12:28 | <CookMePlox> | Hey folks! Does anyone have suggestions for getting Internet Archive to respond to an email request? I've convinced the owner of runescape.com to ask for their site to get un-blacklisted from the wayback machine, but they sent something to info@archive.org and haven't heard back in 6 months |
| 16:13:46 | | bradp is now authenticated as bradp |
| 16:15:01 | <h2ibot> | Anput uploaded File:My folder.png: https://wiki.archiveteam.org/?title=File%3AMy%20folder.png |
| 16:34:49 | | lennier1 quits [Client Quit] |
| 16:45:06 | | Krownest quits [Ping timeout: 258 seconds] |
| 16:51:52 | <thuban> | http://www.rrrrthats5rs.com/ :< |
| 17:00:24 | | Krownest (Krownest) joins |
| 17:10:03 | | Matthww8 quits [Quit: Ping timeout (120 seconds)] |
| 17:10:26 | | Matthww8 joins |
| 17:13:28 | | CookMePlox quits [Remote host closed the connection] |
| 17:18:06 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 17:20:25 | <@HCross> | arkiver: - see above from CookMePlox who isn't here any more |
| 17:51:36 | | HP_Archivist (HP_Archivist) joins |
| 17:54:17 | | lennier1 (lennier1) joins |
| 17:58:26 | | Stiletto joins |
| 18:02:40 | | grafck quits [Ping timeout: 250 seconds] |
| 18:13:37 | | Daloader joins |
| 18:44:38 | | Daloader quits [Client Quit] |
| 18:58:08 | <@arkiver> | HCross: checking |
| 18:58:17 | <@arkiver> | ouch |
| 18:58:35 | <@arkiver> | HCross: thanks, will ping internally |
| 19:19:12 | | driib quits [Client Quit] |
| 19:19:27 | | driib (driib) joins |
| 19:22:08 | | Minkafighter2 quits [Quit: The Lounge - https://thelounge.chat] |
| 19:22:39 | | Minkafighter2 joins |
| 19:48:56 | | JackeithWelley quits [Remote host closed the connection] |
| 20:09:45 | | jonboy3452 joins |
| 20:09:45 | | Jonboy3451 quits [Read error: Connection reset by peer] |
| 20:35:38 | | gazorpazorp quits [Ping timeout: 250 seconds] |
| 20:42:00 | | HP_Archivist quits [Client Quit] |
| 20:56:08 | | average_student joins |
| 21:01:48 | | HP_Archivist (HP_Archivist) joins |
| 21:22:53 | | Megame quits [Client Quit] |
| 21:30:46 | | gazorpazorp joins |
| 21:36:28 | | Matthww88 joins |
| 21:37:12 | | Matthww8 quits [Ping timeout: 258 seconds] |
| 21:37:12 | | Matthww88 is now known as Matthww8 |
| 21:46:12 | | Video joins |
| 22:02:29 | | leo60228 (leo60228) joins |
| 22:12:28 | | leo60228 quits [Ping timeout: 258 seconds] |
| 22:19:50 | | qwertyasdfuiopghjkl joins |
| 22:23:47 | | leo60228 (leo60228) joins |
| 22:27:53 | <Stiletto> | <@JAA> Per the notice on https://3dwarehouse.sketchup.com/: 'Heads up! After July 27, SketchUp 2017 models will no longer be available for download on 3D Warehouse unless it was originally uploaded in that format.' |
| 22:28:19 | <Stiletto> | not sure if anyone was actually looking more into this, but it seems they have extended the deadline to August 11. |
| 22:53:53 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 22:57:32 | | qwertyasdfuiopghjkl joins |
| 23:23:51 | <Jake> | (I was a little bit, but the API was VERY slow, like 3 minutes per request slow...) |
| 23:23:59 | <Jake> | Glad to see they added some more time, I'll take a look again tonight. |
| 23:42:27 | | lennier1 quits [Client Quit] |
| 23:42:51 | | lennier1 (lennier1) joins |
| 23:43:36 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |