| 00:08:02 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 00:11:17 | | Lord_Nightmare (Lord_Nightmare) joins |
| 00:14:01 | | Wohlstand quits [Quit: Wohlstand] |
| 00:27:09 | | ^ quits [Ping timeout: 272 seconds] |
| 00:30:26 | | SootBector quits [Remote host closed the connection] |
| 00:31:38 | | SootBector (SootBector) joins |
| 00:42:00 | | ichdasich quits [Remote host closed the connection] |
| 00:46:58 | | tekulvw (tekulvw) joins |
| 00:49:17 | | etnguyen03 quits [Client Quit] |
| 00:51:46 | | tekulvw quits [Ping timeout: 268 seconds] |
| 00:54:47 | | ^ (^) joins |
| 00:56:55 | | Barto quits [Ping timeout: 272 seconds] |
| 01:03:43 | | Barto (Barto) joins |
| 01:09:35 | | archiveDrill quits [Ping timeout: 272 seconds] |
| 01:29:13 | | etnguyen03 (etnguyen03) joins |
| 01:29:33 | | tekulvw (tekulvw) joins |
| 01:34:19 | | tekulvw quits [Ping timeout: 268 seconds] |
| 01:36:40 | | archiveDrill joins |
| 01:39:54 | | tekulvw (tekulvw) joins |
| 01:45:03 | | tekulvw quits [Ping timeout: 272 seconds] |
| 01:48:24 | | tekulvw (tekulvw) joins |
| 01:48:31 | <nstrom|m> | ty imer, see that now. hard to keep track of chat in telegrab because of bot activity |
| 01:48:56 | <@imer> | #telegrab-chat ? :) |
| 01:53:17 | | tekulvw quits [Ping timeout: 272 seconds] |
| 01:58:29 | | TunaLobster quits [Quit: So long and thanks for all the fish] |
| 01:59:22 | | TunaLobster joins |
| 01:59:45 | | Webuser378569 joins |
| 01:59:54 | | tekulvw (tekulvw) joins |
| 02:01:18 | | Webuser378569 quits [Client Quit] |
| 02:35:43 | | dxrt_ is now known as dxrt |
| 02:35:43 | | dxrt is now authenticated as dxrt |
| 02:35:43 | | dxrt quits [Changing host] |
| 02:35:43 | | dxrt (dxrt) joins |
| 02:35:43 | | @ChanServ sets mode: +o dxrt |
| 02:37:21 | <datechnoman> | Omg I didnt know that that channel existed! >.< |
| 02:45:19 | | Hackerpcs quits [Remote host closed the connection] |
| 02:46:13 | | Hackerpcs (Hackerpcs) joins |
| 02:46:26 | | chunkynutz609 joins |
| 02:47:09 | | chunkynutz60 quits [Read error: Connection reset by peer] |
| 02:47:09 | | chunkynutz609 is now known as chunkynutz60 |
| 02:59:47 | | tekulvw quits [Ping timeout: 272 seconds] |
| 03:07:01 | | tekulvw (tekulvw) joins |
| 03:11:45 | | tekulvw quits [Ping timeout: 268 seconds] |
| 03:23:03 | | tekulvw (tekulvw) joins |
| 03:27:39 | | tekulvw quits [Ping timeout: 272 seconds] |
| 03:41:59 | | etnguyen03 quits [Client Quit] |
| 03:42:58 | | etnguyen03 (etnguyen03) joins |
| 03:50:46 | | DogsRNice_ quits [Read error: Connection reset by peer] |
| 03:51:30 | | tekulvw (tekulvw) joins |
| 03:56:09 | | tekulvw quits [Ping timeout: 268 seconds] |
| 04:02:25 | | tekulvw (tekulvw) joins |
| 04:07:22 | | Island quits [Read error: Connection reset by peer] |
| 04:07:27 | | etnguyen03 quits [Remote host closed the connection] |
| 04:11:48 | | tekulvw quits [Remote host closed the connection] |
| 04:12:06 | | tekulvw (tekulvw) joins |
| 04:19:35 | | tekulvw quits [Ping timeout: 272 seconds] |
| 04:21:06 | | tekulvw (tekulvw) joins |
| 04:25:45 | | tekulvw quits [Ping timeout: 268 seconds] |
| 04:27:01 | | tekulvw (tekulvw) joins |
| 04:32:15 | | tekulvw quits [Ping timeout: 272 seconds] |
| 04:35:50 | | fireatseaparks quits [Quit: Textual IRC Client: www.textualapp.com] |
| 04:36:18 | | fireatseaparks (fireatseaparks) joins |
| 04:38:36 | | tekulvw (tekulvw) joins |
| 04:57:49 | | tekulvw quits [Ping timeout: 268 seconds] |
| 05:02:07 | | tekulvw (tekulvw) joins |
| 05:03:10 | | Webuser760137 joins |
| 05:04:36 | | n9nes quits [Ping timeout: 268 seconds] |
| 05:06:07 | | n9nes joins |
| 05:06:39 | | Webuser760137 quits [Client Quit] |
| 05:12:09 | | tekulvw quits [Ping timeout: 272 seconds] |
| 05:14:49 | | tekulvw (tekulvw) joins |
| 05:21:52 | | tekulvw quits [Ping timeout: 268 seconds] |
| 05:25:53 | | Webuser463246 joins |
| 05:26:04 | | tekulvw (tekulvw) joins |
| 05:36:51 | | tekulvw quits [Remote host closed the connection] |
| 05:37:09 | | tekulvw (tekulvw) joins |
| 05:50:00 | | Webuser463246 quits [Client Quit] |
| 05:52:06 | | tekulvw quits [Ping timeout: 268 seconds] |
| 05:54:36 | | sec^nd quits [Remote host closed the connection] |
| 05:55:07 | | sec^nd (second) joins |
| 05:55:14 | | tekulvw (tekulvw) joins |
| 05:57:51 | | nexussfan quits [Quit: Konversation terminated!] |
| 06:00:17 | | tekulvw quits [Ping timeout: 272 seconds] |
| 06:04:17 | | tekulvw (tekulvw) joins |
| 06:09:21 | | tekulvw quits [Ping timeout: 268 seconds] |
| 06:30:00 | | tekulvw (tekulvw) joins |
| 06:33:12 | <@arkiver> | datechnoman: me neither :P |
| 06:33:24 | | nine quits [Quit: See ya!] |
| 06:33:38 | | nine joins |
| 06:33:38 | | nine is now authenticated as nine |
| 06:33:38 | | nine quits [Changing host] |
| 06:33:38 | | nine (nine) joins |
| 06:33:53 | <datechnoman> | I feel so stupid. The amount of times I thought to myself we needed a channel for #telegrab... it was there all along... |
| 06:46:31 | <datechnoman> | Hoping we can keep images in the mix as they are a gold mine |
| 07:04:51 | | tekulvw quits [Ping timeout: 268 seconds] |
| 07:04:58 | <pabs> | justauser: I haven't yet mailed them about the git, IIRC that isn't affected |
| 07:23:27 | | tekulvw (tekulvw) joins |
| 07:28:19 | | tekulvw quits [Ping timeout: 272 seconds] |
| 07:45:53 | | runxiyu quits [Quit: ZNC 1.8.2+deb3.1+deb12u1 - https://znc.in] |
| 08:00:35 | | tekulvw (tekulvw) joins |
| 08:05:17 | | tekulvw quits [Ping timeout: 268 seconds] |
| 08:08:51 | <DlugasnyPL> | When you are grabbing Telegram content, you surely have file duplicates (for example, pictures shared between channels and users). Do you have any mechanism that removes duplicate content (hash checks, etc.)? Or are you simply storing the complete stream? |
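One common way to avoid storing the same file twice is to hash each response payload. When a hash repeats, the crawler writes a WARC `revisit` record that points at the first capture instead of another full `response` record. The sketch below is not taken from the telegrab project. `DedupIndex` and `payload_digest` are hypothetical names, and the index is kept in memory for brevity. It uses the `sha1:` + base32 digest format commonly seen in `WARC-Payload-Digest` headers.

```python
from __future__ import annotations

import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """Return a WARC-style payload digest: 'sha1:' + base32 of the SHA-1 hash."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


class DedupIndex:
    """Hypothetical in-memory index: payload digest -> first (url, date) seen."""

    def __init__(self) -> None:
        self._seen: dict[str, tuple[str, str]] = {}

    def check(self, url: str, date: str, payload: bytes) -> tuple[str, tuple[str, str] | None]:
        """Return ('response', None) for new content, or ('revisit', original) for a duplicate."""
        digest = payload_digest(payload)
        original = self._seen.get(digest)
        if original is None:
            # First time we see these bytes: store a full response record.
            self._seen[digest] = (url, date)
            return "response", None
        # Same bytes already archived: a revisit record can reference the original capture.
        return "revisit", original
```

For example, the same picture shared in two channels would produce one `response` and then one `revisit` pointing back at the first URL and date. A real pipeline would need a persistent index shared across workers, which this sketch does not attempt.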
| 08:18:57 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 08:44:32 | | JTL quits [Remote host closed the connection] |
| 08:45:14 | | JTL (JTL) joins |
| 09:16:32 | | TheEnbyperor quits [Read error: Connection reset by peer] |
| 09:16:33 | | TheEnbyperor_ is now known as TheEnbyperor |
| 09:17:12 | | linuxgemini3 (linuxgemini) joins |
| 09:18:40 | | linuxgemini quits [Ping timeout: 268 seconds] |
| 09:18:41 | | linuxgemini3 is now known as linuxgemini |
| 09:26:55 | | TheEnbyperor_ (TheEnbyperor) joins |
| 09:43:09 | | tekulvw (tekulvw) joins |
| 09:48:17 | | tekulvw quits [Ping timeout: 272 seconds] |
| 10:16:41 | | Webuser955198 joins |
| 10:17:03 | <Webuser955198> | ? |
| 10:17:12 | <Webuser955198> | 不得,,? (Chinese, roughly: "not allowed,,?") |
| 10:18:24 | | Webuser955198 quits [Client Quit] |
| 11:02:34 | | Sluggs quits [Excess Flood] |
| 11:03:48 | | evergreen5 quits [Quit: Bye] |
| 11:04:14 | | evergreen56 joins |
| 11:07:41 | | Sluggs (Sluggs) joins |
| 11:11:24 | | Sluggs quits [Excess Flood] |
| 11:18:49 | | Sluggs (Sluggs) joins |
| 11:27:53 | | Sluggs quits [Excess Flood] |
| 11:28:07 | | sec^nd quits [Ping timeout: 245 seconds] |
| 11:28:31 | | sec^nd (second) joins |
| 11:33:29 | | Sluggs (Sluggs) joins |
| 11:35:15 | <h2ibot> | Cooljeanius edited Substack (+23, +cat): https://wiki.archiveteam.org/?diff=60512&oldid=60496 |
| 11:40:15 | <h2ibot> | Cooljeanius edited Medium (+93, copyedit (wikify, add category)): https://wiki.archiveteam.org/?diff=60513&oldid=60479 |
| 11:41:05 | <DlugasnyPL> | I have compared WARCs generated by wget-at and Browsertrix (which I'm currently using to archive pages). I know the difference between the two tools. What do you think about creating a small team (a subgroup of ArchiveTeam) that would record pages using Browsertrix, or any other crawler that can properly execute JavaScript and the other site components that wget-at cannot handle? |
| 11:41:05 | <DlugasnyPL> | I'm talking about WARCs that users can download and browse, with the page looking almost 90-100% the same as the real one. What do you think? |
| 11:44:16 | <h2ibot> | Cooljeanius created Scribd (+196, Created page with "{{stub}} '''Scribd''' is a…): https://wiki.archiveteam.org/?oldid=60514 |
| 11:44:17 | <h2ibot> | Cooljeanius edited ArchiveBot/Monitoring (+4, /* Ideas */ Wikify ([[Scribd]])): https://wiki.archiveteam.org/?diff=60515&oldid=58718 |
| 11:45:16 | <h2ibot> | Cooljeanius edited Academia.edu (+49, mention [[Scribd]]): https://wiki.archiveteam.org/?diff=60516&oldid=60028 |
| 12:00:01 | | Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:01:18 | <h2ibot> | Manu edited Discourse/archived (+96, Queued forums.whonix.org): https://wiki.archiveteam.org/?diff=60517&oldid=60488 |
| 12:02:41 | | Bleo1826007227196234552220 joins |
| 12:03:18 | <h2ibot> | Manu edited Discourse/archived (+100, Queued community.ntppool.org): https://wiki.archiveteam.org/?diff=60518&oldid=60517 |
| 12:09:07 | | DlugasnyPL quits [Read error: Connection reset by peer] |
| 12:09:10 | | DlugasnyPL joins |
| 12:43:32 | | oxtyped_ joins |
| 12:43:34 | | tekulvw (tekulvw) joins |
| 12:43:43 | | oxtyped quits [Ping timeout: 272 seconds] |
| 12:43:46 | | oxtyped_ is now known as oxtyped |
| 12:48:20 | | tekulvw quits [Ping timeout: 268 seconds] |
| 13:11:46 | | Arcorann quits [Ping timeout: 268 seconds] |
| 13:27:54 | | PredatorIWD257 joins |
| 13:31:51 | | PredatorIWD25 quits [Ping timeout: 272 seconds] |
| 13:31:51 | | PredatorIWD257 is now known as PredatorIWD25 |
| 13:32:44 | | benjins3 quits [Ping timeout: 268 seconds] |
| 13:33:05 | | FiTheArchiver joins |
| 13:35:13 | | FiTheArchiver quits [Client Quit] |
| 13:42:36 | | useretail quits [Remote host closed the connection] |
| 13:43:37 | | useretail joins |
| 14:21:40 | | Dada joins |
| 14:33:12 | <cruller> | To quote https://github.com/iipc/warcaroo/blob/main/cdp/README.md, "In the future, we might switch to the webdriver-bidi protocol once it covers everything we need." |
| 14:34:09 | <cruller> | Is such a future possible? (Of course, I hope so.) |
| 14:37:33 | | Dada quits [Remote host closed the connection] |
| 14:38:45 | | Dada joins |
| 15:07:04 | | benjins3 joins |
| 15:11:29 | | Island joins |
| 15:12:33 | | knecht quits [Ping timeout: 272 seconds] |
| 15:19:08 | | knecht (knecht) joins |
| 15:19:31 | | Cuphead2527480 (Cuphead2527480) joins |
| 15:30:28 | | nulldata-alt1 (nulldata) joins |
| 15:49:30 | | tekulvw (tekulvw) joins |
| 15:54:34 | | tekulvw quits [Ping timeout: 268 seconds] |
| 16:02:20 | | tekulvw (tekulvw) joins |
| 16:06:54 | | tekulvw quits [Ping timeout: 268 seconds] |
| 16:11:54 | <justauser> | Yakov, klea: Can't reproduce. blog.archive.today redirects me to https://archive-is.tumblr.com/, which is still good. |
| 16:12:13 | <klea> | Yeah, it also redirects me to that. |
| 16:12:26 | <klea> | I suppose that's what Yakov meant by them no longer being on Tumblr. |
| 16:14:21 | <justauser> | They are, last post today and certainly nothing was wiped. |
| 16:17:55 | <klea> | Oh ok. |
| 16:19:06 | | sec^nd quits [Remote host closed the connection] |
| 16:19:31 | | sec^nd (second) joins |
| 16:21:00 | <h2ibot> | Klea edited RTVE Play (-97): https://wiki.archiveteam.org/?diff=60519&oldid=58295 |
| 16:38:26 | | CYBERDEV quits [Quit: Leaving] |
| 16:41:30 | | ducky quits [Remote host closed the connection] |
| 16:41:45 | | ducky (ducky) joins |
| 16:42:22 | | CYBERDEV joins |
| 16:57:09 | | kansei (kansei) joins |
| 16:58:42 | | kansei- quits [Ping timeout: 268 seconds] |
| 17:00:13 | | APOLLO03 quits [Ping timeout: 272 seconds] |
| 17:01:18 | | DlugasnyPL quits [Quit: Leaving] |
| 17:04:39 | | @Sanqui quits [Ping timeout: 272 seconds] |
| 17:07:33 | | Sanqui joins |
| 17:07:38 | | Sanqui is now authenticated as Sanqui |
| 17:07:38 | | Sanqui quits [Changing host] |
| 17:07:38 | | Sanqui (Sanqui) joins |
| 17:07:38 | | @ChanServ sets mode: +o Sanqui |
| 17:21:14 | <hexagonwin> | i'm developing a python+warcio based crawler for some web service (needs auth). does anyone know of a good way to keep track of downloaded URLs? |
| 17:22:02 | <hexagonwin> | i'd prefer to have it default to not re-visiting URLs as the contents are (mostly) unchanging, and there are same URLs being repeated in many different pages. |
| 17:22:16 | <justauser> | Warrior tracker uses Bloom filter. |
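A Bloom filter answers "have I probably seen this URL?" in a fixed amount of memory. It can report false positives (a new URL wrongly treated as seen, so it gets skipped) but never false negatives. The following is a minimal sketch of the general technique, not the Warrior tracker's actual code. The class name, sizing defaults and double-hashing scheme are assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter for URL de-duplication (hypothetical helper)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.001) -> None:
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = max(1, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

Usage in a crawl loop might look like `if url not in seen: seen.add(url)` followed by the fetch. The trade-off is that at the chosen `fp_rate`, roughly that fraction of genuinely new URLs would be skipped, which may or may not be acceptable for a given project.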
| 17:22:24 | <justauser> | AB uses SQLite DB. |
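An exact, persistent alternative is a SQLite table keyed by URL, which survives restarts and has no false positives. This is a minimal sketch under stated assumptions: ArchiveBot's real database schema is not reproduced here, and `SeenUrls`, the `seen` table and the default file name are hypothetical.

```python
import sqlite3


class SeenUrls:
    """Persistent set of already-downloaded URLs backed by SQLite (hypothetical helper)."""

    def __init__(self, path: str = "seen_urls.sqlite3") -> None:
        self.conn = sqlite3.connect(path)
        # PRIMARY KEY gives an index on url, so membership checks stay fast.
        self.conn.execute("CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)")
        self.conn.commit()

    def __contains__(self, url: str) -> bool:
        row = self.conn.execute("SELECT 1 FROM seen WHERE url = ?", (url,)).fetchone()
        return row is not None

    def add(self, url: str) -> None:
        # "with conn" commits on success; INSERT OR IGNORE makes repeated adds harmless.
        with self.conn:
            self.conn.execute("INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,))

    def close(self) -> None:
        self.conn.close()
```

A typical loop would skip a URL if `url in seen`, otherwise fetch it, write the WARC record, and only then call `seen.add(url)`, so a crash mid-download leads to a retry rather than a silently missing page. Committing once per URL is simple but slow; batching commits is a reasonable refinement.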
| 17:22:42 | <hexagonwin> | i'll take a look, thanks |
| 17:22:50 | <justauser> | And warcio is discouraged IIRC. |
| 17:22:56 | <hexagonwin> | oh is it? |
| 17:23:09 | <justauser> | Writing WARCs: No. Has long-standing bugs regarding correct preservation of data as sent by the server.[6][7] |
| 17:23:14 | <hexagonwin> | oops |
| 17:23:37 | <hexagonwin> | i couldn't find documentation on using wpull as a library (and not a command line util) so i was using warcio instead :/ |
| 17:24:23 | <justauser> | Maybe warc-for-humans is considered production-grade now? /cc DigitalDragons |
| 17:27:13 | | tekulvw (tekulvw) joins |
| 17:32:00 | | tekulvw quits [Ping timeout: 268 seconds] |
| 17:33:05 | | Ryz quits [Quit: Ping timeout (120 seconds)] |
| 17:42:37 | | @dxrt quits [Quit: ZNC - http://znc.sourceforge.net] |
| 17:44:18 | | croissant quits [Quit: Leaving] |
| 17:46:27 | | pokechu22 quits [Ping timeout: 272 seconds] |
| 17:47:19 | | tekulvw (tekulvw) joins |
| 17:47:49 | | dxrt joins |
| 17:48:35 | | dxrt is now authenticated as dxrt |
| 17:48:35 | | dxrt quits [Changing host] |
| 17:48:35 | | dxrt (dxrt) joins |
| 17:48:35 | | @ChanServ sets mode: +o dxrt |
| 17:51:21 | | klea gives out the link to the big warc table: https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem#Information |
| 17:51:48 | | pokechu22 (pokechu22) joins |
| 17:52:09 | | tekulvw quits [Ping timeout: 272 seconds] |
| 17:53:35 | <hexagonwin> | Only wget-at, wpull, warcprox and qwarc seem to be usable. wget-at wouldn't be ideal for integrating into a Python script, I can't find documentation for wpull or qwarc, and warcprox hardly looks ideal since it's a proxy. |
| 17:58:46 | | croissant joins |
| 18:00:07 | <hexagonwin> | so according to https://github.com/webrecorder/warcio/issues/128 , this seems to be the commit that causes non-standard WARCs in warcio: https://github.com/webrecorder/warcio/pull/45 |
| 18:00:46 | <hexagonwin> | Would it be okay to use it if this is reverted, since I can use other libraries for reading the WARC? |
| 18:01:01 | | cm quits [Ping timeout: 272 seconds] |
| 18:01:50 | | Dada quits [Remote host closed the connection] |
| 18:03:24 | | Dada joins |
| 18:03:59 | | cm joins |
| 18:07:46 | | nulldata-alt1 quits [Client Quit] |
| 18:14:03 | <justauser> | https://transfer.archivete.am/p0BMm/pomf.lain.la_final.txt |
| 18:14:03 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/p0BMm/pomf.lain.la_final.txt |
| 18:15:28 | | nulldata-alt1 (nulldata) joins |
| 18:32:30 | | HP_Archivist (HP_Archivist) joins |
| 18:35:12 | | nulldata-alt1 quits [Client Quit] |
| 18:35:29 | | nulldata-alt1 (nulldata) joins |
| 18:39:02 | | Cuphead2527480 quits [Quit: Connection closed for inactivity] |
| 18:40:39 | | kansei- (kansei) joins |
| 18:42:11 | | kansei quits [Ping timeout: 272 seconds] |
| 18:47:12 | | Ryz (Ryz) joins |
| 18:56:33 | | nulldata-alt1 quits [Client Quit] |
| 18:56:50 | | nulldata-alt1 (nulldata) joins |
| 19:04:15 | | irisfreckles13 joins |
| 19:04:27 | | tekulvw (tekulvw) joins |
| 19:09:26 | | tekulvw quits [Ping timeout: 268 seconds] |
| 19:16:23 | | kansei (kansei) joins |
| 19:17:01 | | kansei- quits [Ping timeout: 272 seconds] |
| 19:18:50 | | nulldata-alt1 quits [Client Quit] |
| 19:19:07 | | nulldata-alt1 (nulldata) joins |