00:01:50 | <pabs> | and that means offsite links probably need doing in a separate job with -u firefox or similar |
00:02:14 | <pabs> | AB needs some URL to UA regex functionality :) |
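pabs is proposing a per-URL user-agent option for ArchiveBot; the log does not say AB has one today. Below is a minimal sketch of what such a lookup could look like. It assumes a hypothetical ordered table that maps a regex to a user-agent alias, where the first match wins. The `firefox` alias mirrors AB's `-u firefox` option. The `example.org` pattern, the `default` alias, and the function name `pick_user_agent` are made up for illustration.

```python
import re

# Hypothetical rule table: (compiled URL regex, user-agent alias). First match wins.
# "firefox" mirrors AB's -u firefox; the pattern and the aliases are illustrative only.
UA_RULES = [
    (re.compile(r"^https?://([^/]+\.)?example\.org/"), "firefox"),
]


def pick_user_agent(url: str, rules=UA_RULES, default: str = "default") -> str:
    """Return the alias of the first rule whose regex matches the URL, else the default."""
    for pattern, alias in rules:
        if pattern.search(url):
            return alias
    return default
```

The table is an ordered list rather than a dict, so rule precedence stays explicit and a narrow pattern can sit ahead of a broad one.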
00:05:21 | | janos777 quits [Ping timeout: 260 seconds] |
00:09:42 | | janos777 joins |
00:14:41 | | janos777 quits [Ping timeout: 260 seconds] |
00:26:47 | | etnguyen03 (etnguyen03) joins |
00:34:31 | | pabs quits [Ping timeout: 260 seconds] |
00:41:23 | | BitsNBytesNBagels quits [Client Quit] |
00:42:23 | | pabs (pabs) joins |
00:55:09 | | etnguyen03 quits [Client Quit] |
01:07:48 | | ericgallager quits [Quit: This computer has gone to sleep] |
01:24:53 | | DogsRNice_ joins |
01:26:26 | | ericgallager joins |
01:27:20 | | DogsRNice quits [Ping timeout: 258 seconds] |
01:35:03 | | dabs quits [Read error: Connection reset by peer] |
01:35:54 | | DogsRNice joins |
01:38:41 | | DogsRNice_ quits [Ping timeout: 260 seconds] |
01:47:59 | | ericgallager quits [Client Quit] |
01:48:08 | | DogsRNice_ joins |
01:50:20 | | DogsRNice quits [Ping timeout: 258 seconds] |
01:52:04 | | DogsRNice joins |
01:54:10 | | DogsRNice_ quits [Ping timeout: 258 seconds] |
01:56:13 | | cyanbox joins |
01:56:46 | | DopefishJustin quits [Ping timeout: 260 seconds] |
01:59:07 | | DogsRNice quits [Read error: Connection reset by peer] |
02:02:18 | | DopefishJustin joins |
02:02:18 | | DopefishJustin is now authenticated as DopefishJustin |
02:20:41 | | ericgallager joins |
02:24:46 | | katocala quits [Ping timeout: 260 seconds] |
02:25:07 | | katocala joins |
02:43:30 | | ericgallager quits [Client Quit] |
02:44:21 | | NeonGlitch (NeonGlitch) joins |
02:45:39 | <mind_combatant> | would it be worth considering/possible to archive amazon product pages? i've on multiple occasions had product pages linked either on a list of mine or that i'd ordered before either disappear completely or be replaced with an entirely different product when i went to check on them for one reason or another. |
03:20:55 | | Webuser298098 quits [Quit: Ooops, wrong browser tab.] |
03:22:52 | | NeonGlitch quits [Client Quit] |
03:30:10 | | Megame quits [Quit: Leaving] |
03:43:10 | <pabs> | if you have an example page we can try it in #archivebot and #jseater and you can try it in web.archive.org/save |
03:43:23 | <pabs> | IIRC it doesn't work in AB though |
03:47:22 | <pabs> | https://www.morpheus-research.com/backblaze/ https://news.ycombinator.com/item?id=43802675 |
03:50:13 | | Webuser012683 joins |
04:06:05 | | SootBector quits [Remote host closed the connection] |
04:11:28 | | SootBector (SootBector) joins |
04:14:54 | <wickedplayer494> | Well that explains a heck of a lot of why BLZE's chart deflated to the pittance it's been trading at most of its public life |
04:22:01 | | cmlow quits [Ping timeout: 260 seconds] |
04:38:46 | <mind_combatant> | an example of an item that disappeared: www.amazon.com/dp/B083J229RJ/?coliid=I18Y7V6X9MXSBH&colid=3FEA242LHQKWQ&psc=0&ref_=cm_sw_r_apann_lstpd_M02PAJ67GM8MBTES9HGT&language=en-US |
04:38:46 | <mind_combatant> | an example of a product that seems to be in the process of changing (surprised i came across this one, the title is still the old product while the images are for an unrelated screen protector it seems): https://www.amazon.com/dp/B00N1BXIEA?ref=ppx_pop_mob_ap_share |
04:38:46 | <mind_combatant> | another one that's disappeared: https://www.amazon.com/dp/B077PMSQPZ?ref=ppx_pop_mob_ap_share |
04:38:46 | <mind_combatant> | another one in the process of being changed: https://www.amazon.com/dp/B010U1DNE8?ref=ppx_pop_mob_ap_share |
04:38:46 | <mind_combatant> | i've got more examples of completely deleted ones, and a lot of the ones that were fully changed before seem to be deleted now, but this seems to be happening at scale, given how much i can find just in my own history. |
04:39:40 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
04:55:08 | | PredatorIWD25 joins |
04:57:33 | <@hook54321> | currently scraping comments from ovarit (AB didn't get them), not getting recorded into a WARC as i don't know how to code |
04:58:26 | <@Fusl> | hook54321: https://github.com/internetarchive/warcprox |
05:01:19 | <@hook54321> | thanks, will try to rig that into the python script. doubt it'll make it playable in WBM though |
05:03:08 | <@Fusl> | maybe not but still worth scraping through warcprox and then dumping the raw warc into IA just in case someone wants to play around with it |
05:05:11 | <@hook54321> | yeah. i really don't know what i'm doing, i played around with the network console to see how it's getting the comments, replicated the request with curl and then chatgpt gave me a script which... works but is probably ugly |
05:12:36 | <@hook54321> | yeah, don't think i'm gonna be able to get that to work tonight |
05:12:36 | <@hook54321> | "Fetching post ID 154... |
05:12:36 | <@hook54321> | 💥 Error fetching post 154: HTTPSConnectionPool(host='ovarit.com', port=443): Max retries exceeded with url: /graphql (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))" |
05:27:38 | | Webuser907855 joins |
05:28:04 | | Webuser012683 quits [Client Quit] |
05:28:15 | | Webuser907855 quits [Client Quit] |
05:40:44 | <pabs> | mind_combatant: the non-404 ones seem to work with -u firefox |
05:40:52 | <pabs> | (in AB) |
05:44:49 | <pabs> | hook54321: if you are using the Python requests lib, add verify=False to the get() calls |
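pabs's suggestion is for the Python requests library. There, `verify=False` on `get()` skips certificate checks, and a `proxies={"http": ..., "https": ...}` argument routes traffic through a proxy such as warcprox.

The sketch below is a minimal standard-library version of the same setup. It uses `urllib` in place of requests and makes these assumptions:

- warcprox is listening on `localhost:8000`, its usual default. Adjust this to match your own invocation.
- The `/graphql` endpoint comes from the error message above. The query and the `id` variable are hypothetical.

If the script was already going through warcprox, the "unable to get local issuer certificate" error would be expected, because warcprox re-signs HTTPS traffic with its own CA. That is a guess; the log does not confirm it. Turning verification off is only reasonable for this kind of local capture proxy.

```python
import json
import ssl
import urllib.request

# Assumption: warcprox runs locally on its default port.
WARCPROX = "http://localhost:8000"
# Endpoint path from the error message in the log; the query sent to it is hypothetical.
GRAPHQL_URL = "https://ovarit.com/graphql"


def make_unverified_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks (the urllib analogue of verify=False)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before verify_mode can be CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def make_opener(proxy: str = WARCPROX) -> urllib.request.OpenerDirector:
    """Opener that sends HTTP and HTTPS through the proxy without verifying certificates."""
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
        urllib.request.HTTPSHandler(context=make_unverified_context()),
    )


def build_graphql_request(query: str, variables: dict) -> urllib.request.Request:
    """POST request carrying a GraphQL query and its variables as a JSON body."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (not executed here; needs network access and a running warcprox):
# opener = make_opener()
# with opener.open(build_graphql_request("query { ... }", {"id": 154})) as resp:
#     print(json.load(resp))
```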
06:10:46 | | Webuser738386 joins |
06:13:18 | | Island joins |
06:13:19 | | Webuser738386 quits [Client Quit] |
06:29:33 | | Webuser823470 joins |
07:16:35 | | Webuser823470 quits [Client Quit] |
07:23:02 | | Webuser685269 joins |
07:29:37 | | Webuser831171 quits [Quit: Ooops, wrong browser tab.] |
07:46:39 | | Island quits [Read error: Connection reset by peer] |
07:50:31 | | sec^nd quits [Remote host closed the connection] |
07:50:53 | | sec^nd (second) joins |
08:15:56 | | ducky quits [Ping timeout: 260 seconds] |
08:21:07 | | ducky (ducky) joins |
08:23:31 | | lflare quits [Quit: Bye] |
08:24:14 | | lflare (lflare) joins |
08:29:41 | | ducky quits [Remote host closed the connection] |
08:29:47 | | ducky (ducky) joins |
08:49:57 | <c3manu> | pabs++ |
08:49:58 | <eggdrop> | [karma] 'pabs' now has 83 karma! |
08:50:09 | <c3manu> | it works! \o/ thank you so much |
09:39:50 | | Medaka joins |
09:49:38 | <Medaka> | Hello. I have created URL lists for pr.fc2.com and vote.fc2.com. |
09:49:56 | <Medaka> | Both services are scheduled to shut down on June 2, 2025. |
09:49:57 | <Medaka> | Please help archive them. |
09:50:22 | <Medaka> | https://transfer.archivete.am/inline/B1oRr/pr_fc2_urls.txt |
09:50:29 | <Medaka> | https://transfer.archivete.am/inline/pX8Tz/vote_fc2_urls.txt |
10:01:45 | | Webuser692884 joins |
10:02:04 | | NatTheCat quits [Quit: Ping timeout (120 seconds)] |
10:02:27 | | NatTheCat (NatTheCat) joins |
10:05:42 | | BearFortress quits [] |
10:06:35 | | Webuser685269 quits [Client Quit] |
10:10:41 | | Webuser768109 joins |
10:13:55 | <c3manu> | Medaka: thanks for those lists! i know nothing about the context (yet); are those URLs not discoverable from their respective index URLs? |
10:15:55 | <c3manu> | ..and i’m assuming those are going to be pretty big right? |
10:18:53 | <c3manu> | at least pr.fc2.com has a sitemap, so that might just not be complete then? https://pr.fc2.com/sitemap.xml |
10:19:47 | <c3manu> | i’m gonna wait to see what pabs can say about this, since they queued a bunch of https://piyo.fc2.com/ URLs yesterday and probably know more ;) |
10:49:44 | | Webuser768109 quits [Client Quit] |
11:00:01 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
11:00:29 | | BearFortress joins |
11:02:44 | | Bleo18260072271962345 joins |
11:08:21 | | Medaka quits [Read error: Connection reset by peer] |
11:08:39 | | Medaka joins |
11:12:44 | <Medaka> | c3manu: Thank you for your reply! Ah, I didn't realize there was a sitemap. |
11:12:44 | <Medaka> | There are indeed several index URLs there that I had missed. |
11:23:11 | | @Sanqui quits [Ping timeout: 260 seconds] |
11:23:42 | <c3manu> | Medaka: the sitemap seems way shorter than your list though, so i don't know what's going on there |
11:41:37 | | BlankEclair quits [Remote host closed the connection] |
11:47:35 | | lflare quits [Quit: Bye] |
11:51:58 | | BlankEclair (BlankEclair) joins |
11:57:53 | | Sanqui joins |
11:57:53 | | Sanqui is now authenticated as Sanqui |
11:57:53 | | Sanqui quits [Changing host] |
11:57:53 | | Sanqui (Sanqui) joins |
11:57:53 | | @ChanServ sets mode: +o Sanqui |
12:01:09 | | lflare (lflare) joins |
12:07:25 | <Medaka> | c3manu: I mainly collected only the pr.fc2.com/{username}/ (user profile) pages. The sitemap includes a small number of pr.fc2.com/{username}/ pages, as well as some pages that link to multiple {username} profiles. That's probably why my list looks much larger. (I missed several pages like /contents/birthday/ and /contents/age/, etc.) |
12:08:29 | <Medaka> | Also, it seems that even the URLs listed in the sitemap are missing some entries. I will try to create a more complete URL list. |
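Following c3manu's pointer to the sitemap, here is a minimal sketch for extracting the `<loc>` entries from a sitemap and checking them against a hand-built URL list. It assumes the standard sitemaps.org XML namespace. It handles both a plain `urlset` and a `sitemapindex`, which lists further sitemap files to fetch. The profile check encodes Medaka's description of profile pages as `pr.fc2.com/{username}/`. The exclusion set, which contains only `contents`, is a hypothetical starting point rather than a verified list of non-profile paths.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SM_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Profile pages as described in the log: pr.fc2.com/{username}/
PROFILE_RE = re.compile(r"^https?://pr\.fc2\.com/([^/?#]+)/$")
# Hypothetical: first path segments known not to be usernames.
NON_PROFILE_SEGMENTS = {"contents"}


def sitemap_locs(xml_text: str) -> tuple[str, list[str]]:
    """Return (root kind, <loc> URLs); kind is 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    kind = root.tag.split("}")[-1]  # strip the "{namespace}" prefix
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SM_NS) if el.text]
    return kind, locs


def missing_from_list(sitemap_urls: list[str], url_list: list[str]) -> list[str]:
    """Sitemap URLs that the hand-built list does not contain, sorted."""
    return sorted(set(sitemap_urls) - set(url_list))


def is_profile_url(url: str) -> bool:
    """Heuristic: a single path segment with a trailing slash, not in the exclusion set."""
    m = PROFILE_RE.match(url)
    return bool(m) and m.group(1) not in NON_PROFILE_SEGMENTS
```

Feeding the output of `missing_from_list` back into the URL list would pick up index pages such as `/contents/birthday/`, which a profile-only list skips.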
12:23:58 | | etnguyen03 (etnguyen03) joins |
12:28:22 | <pabs> | Medaka++ |
12:28:22 | <eggdrop> | [karma] 'Medaka' now has 1 karma! |
12:28:26 | <pabs> | thanks for working on this |
12:29:54 | <pabs> | there are some other domains that seem fc2 related in the existing jobs, might want to process the ab2f job log from https://ab2f.archivingyoursh.it/ |
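Here is a rough sketch for pulling fc2-related hostnames out of a saved job log. The ab2f log format is not described here, so the sketch assumes only that the log is plain text with URLs somewhere on each line. Treating any hostname that contains `fc2` as related is a crude heuristic, and the results will need manual review.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Loose URL matcher: stops at whitespace, quotes, and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def fc2_hosts(log_lines) -> Counter:
    """Count hostnames containing 'fc2' across all URLs found in the log lines."""
    counts = Counter()
    for line in log_lines:
        for url in URL_RE.findall(line):
            host = urlsplit(url).hostname  # lowercased, port stripped; None if absent
            if host and "fc2" in host:
                counts[host] += 1
    return counts
```

The counts make it easy to spot which related domains appear most often and may deserve their own URL lists.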
12:37:37 | | ducky_ (ducky) joins |
12:37:37 | | ducky quits [Remote host closed the connection] |
12:37:54 | | ducky_ quits [Remote host closed the connection] |
12:53:56 | | makeworld quits [Read error: Connection reset by peer] |
12:54:42 | | makeworld joins |
12:59:35 | <Medaka> | pabs: Thank you. |
12:59:46 | <Medaka> | However, when I try to access https://ab2f.archivingyoursh.it/, I get a 403 Forbidden error. |
13:00:16 | <pabs> | katia: ^ ab2f broke again :) |
13:00:31 | <pabs> | some more about ab2f and related things on https://wiki.archiveteam.org/index.php/ArchiveBot/Monitoring |
13:01:48 | <Medaka> | Is that different from https://archive.fart.website/archivebot/viewer/? |
13:03:27 | <pabs> | yeah, it's realtime, same as archivebot.com |
13:03:46 | <pabs> | the viewer only indexes jobs after they are uploaded to IA and complete |
13:06:08 | | janos777 joins |
13:07:04 | <Medaka> | I see |
13:21:28 | | etnguyen03 quits [Client Quit] |
13:24:01 | <katia> | pabs: sorry. Works again I think? If it was a 404 |
13:24:32 | <pabs> | looks good |
13:29:30 | | Webuser007948 joins |
13:29:50 | | Webuser007948 quits [Client Quit] |
13:35:11 | | cmlow joins |
13:42:01 | | BennyOtt quits [Ping timeout: 260 seconds] |
13:45:07 | | BennyOtt (BennyOtt) joins |
13:51:43 | | ducky (ducky) joins |
13:56:36 | | ducky quits [Ping timeout: 260 seconds] |
14:14:28 | | BornOn420 quits [Remote host closed the connection] |
14:15:01 | | BornOn420 (BornOn420) joins |
14:28:50 | | ducky (ducky) joins |
14:33:21 | | ducky quits [Ping timeout: 260 seconds] |
14:54:26 | | cyanbox quits [Read error: Connection reset by peer] |
14:55:43 | | ducky (ducky) joins |
15:11:34 | | pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat] |
15:11:37 | | Megame (Megame) joins |
15:48:16 | | pedantic-darwin joins |
15:58:08 | | grill (grill) joins |
16:18:13 | | Hackerpcs quits [Quit: Hackerpcs] |
16:32:52 | | SootBector quits [Remote host closed the connection] |
16:34:03 | | SootBector (SootBector) joins |
16:35:47 | | DopefishJustin quits [Remote host closed the connection] |
16:39:09 | | Hackerpcs (Hackerpcs) joins |
16:39:28 | | DopefishJustin joins |
16:39:28 | | DopefishJustin is now authenticated as DopefishJustin |
16:58:54 | | etnguyen03 (etnguyen03) joins |
17:05:46 | | Webuser846077 joins |
17:05:56 | | Webuser846077 quits [Client Quit] |
17:26:28 | | nicolas17 joins |
17:26:49 | | ericgallager joins |
17:29:53 | | grill quits [Ping timeout: 258 seconds] |
17:34:33 | | etnguyen03 quits [Client Quit] |
17:52:07 | | tzt quits [Ping timeout: 258 seconds] |
18:08:13 | <h2ibot> | Manu edited Mailman/2 (+31, /* Queued lists.erlangen.ccc.de */): https://wiki.archiveteam.org/?diff=55508&oldid=55097 |
18:14:13 | | tzt (tzt) joins |
18:16:11 | | eroc19906 quits [Ping timeout: 260 seconds] |
18:16:50 | | grill (grill) joins |
18:28:26 | | janos777 quits [Ping timeout: 260 seconds] |
18:30:15 | | ericgallager quits [Client Quit] |
18:38:13 | | janos777 joins |
18:43:22 | | ericgallager joins |
18:44:19 | <h2ibot> | Manu edited Mailman/2 (+31, /* Queued lists.ccc-mannheim.de */): https://wiki.archiveteam.org/?diff=55509&oldid=55508 |
18:50:01 | | janos777 quits [Ping timeout: 260 seconds] |
18:52:01 | | FiTheArchiver joins |
18:52:03 | | FiTheArchiver quits [Remote host closed the connection] |
19:02:43 | | ericgallager quits [Client Quit] |
19:02:44 | | janos777 joins |
19:37:29 | | janos777 quits [Read error: Connection reset by peer] |
19:56:15 | | ericgallager joins |
19:57:58 | | janos777 joins |
20:29:17 | | Webuser381280 joins |
20:30:53 | | janos778 joins |
20:35:01 | | janos777 quits [Ping timeout: 260 seconds] |
20:38:31 | | grill quits [Ping timeout: 260 seconds] |
20:43:16 | | janos778 quits [Read error: Connection reset by peer] |
20:50:39 | | eroc1990 (eroc1990) joins |
21:13:57 | | dabs joins |
21:21:04 | | Island joins |
21:24:36 | | nicolas17 quits [Quit: Konversation terminated!] |
21:24:47 | | nicolas17 joins |
21:25:51 | | nicolas17 quits [Client Quit] |
21:30:25 | | nicolas17 joins |
21:42:58 | | murb quits [Quit: gone] |
22:00:07 | | murb (murb) joins |
22:22:03 | | ericgallager quits [Client Quit] |
23:04:56 | | ThreeHM quits [Ping timeout: 260 seconds] |
23:06:32 | | ThreeHM (ThreeHeadedMonkey) joins |
23:06:56 | | NeonGlitch (NeonGlitch) joins |
23:07:33 | | NeonGlitch quits [Client Quit] |
23:08:06 | | NeonGlitch (NeonGlitch) joins |
23:08:23 | | etnguyen03 (etnguyen03) joins |
23:30:39 | | etnguyen03 quits [Client Quit] |
23:48:41 | | tzt quits [Read error: Connection reset by peer] |
23:49:12 | | tzt (tzt) joins |
23:54:56 | | ericgallager joins |
23:55:12 | | ericgallager quits [Remote host closed the connection] |