| 00:23:23 | | wickedplayer494 quits [Remote host closed the connection] |
| 00:27:18 | | damianwebster quits [Remote host closed the connection] |
| 00:28:08 | | wickedplayer494 joins |
| 00:30:02 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 00:34:04 | | britmob quits [Ping timeout: 258 seconds] |
| 00:41:44 | | britmob joins |
| 00:44:27 | <namspc> | Anyone in here doing anything where being able to prove the time of the grab is important? I'd be happy to help. |
| 00:45:04 | <@JAA> | Not really. Proving integrity would be important, but that's impossible. |
| 00:57:56 | | Mateon2 joins |
| 00:59:30 | | Mateon1 quits [Ping timeout: 250 seconds] |
| 00:59:30 | | Mateon2 is now known as Mateon1 |
| 01:01:55 | <namspc> | JAA, You mean proving that what was grabbed is what was being served? |
| 01:02:03 | <namspc> | And you haven't modified it/etc? |
| 01:02:06 | <@JAA> | Yeah |
| 01:02:20 | <namspc> | Mm, it's not impossible per se. But it is more difficult. |
| 01:02:29 | <namspc> | In general the best way to 'prove' that is with witnesses. |
| 01:02:42 | <namspc> | Or, more concretely, you have multiple people grab it and sign the grab. |
| 01:02:43 | <@JAA> | Nope, it's impossible without the origin server cooperating in some way. |
| 01:02:58 | <namspc> | But obviously witnesses can lie. |
| 01:03:05 | | dm4v_ joins |
| 01:03:06 | <@JAA> | Yeah, we often don't have enough time to grab everything once, so that doesn't work. |
| 01:03:10 | <namspc> | Yeah. |
| 01:03:16 | <namspc> | I'm just talking about the general case there. |
| 01:03:22 | <namspc> | For ArchiveTeam's specific use case, not helpful. |
| 01:03:24 | | dm4v quits [Ping timeout: 250 seconds] |
| 01:03:24 | | dm4v_ is now known as dm4v |
| 01:03:24 | | dm4v is now authenticated as dm4v |
| 01:03:24 | | dm4v quits [Changing host] |
| 01:03:24 | | dm4v (dm4v) joins |
| 01:03:52 | <namspc> | One thing that timestamping can do is at least show that you haven't tampered with it past X date, which can be useful in establishing that "well if there was tampering it had to have occurred within this timeframe". |
| 01:04:11 | <namspc> | Which can be useful approximate proof that you haven't been messing with it. |
| 01:04:34 | <namspc> | Nobody can go back and slip things in years after a grab, etc. |
| 01:07:40 | <namspc> | And, it hadn't occurred to me before but I could actually just run a bot that goes through items on archive.org and timestamps them, then upload the proof collections to archive.org itself. |
| 01:08:15 | <namspc> | Like I was initially thinking if I wanted to do that I had to ask someone at the archive to help, but I actually don't. So I should probably do that. |
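A timestamping bot like the one namspc describes needs a single value per item to timestamp. The sketch below assumes the per-file SHA-1 hashes for an item are already in hand, since archive.org publishes them for each item. It folds them into one SHA-256 digest over a sorted `sha1  name` manifest. That manifest format is borrowed from `sha1sum` output purely for illustration, and `manifest_digest` is a hypothetical helper. Submitting the digest to a timestamping service, such as OpenTimestamps or an RFC 3161 authority, is not shown.

```python
import hashlib
from typing import Dict


def manifest_digest(file_hashes: Dict[str, str]) -> str:
    """Return one SHA-256 hex digest committing to every (name, sha1) pair.

    file_hashes maps file name -> hex SHA-1. Entries are sorted by name so
    the result does not depend on dictionary order.
    """
    lines = [f"{sha1.lower()}  {name}\n" for name, sha1 in sorted(file_hashes.items())]
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()
```

Changing, adding, or removing any file changes the digest, so a single timestamp on it dates the whole file list at once.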
| 01:09:14 | <namspc> | Is there an easy machine readable way to get the hashes of collection items, like an API or something? |
| 01:10:56 | <namspc> | In principle they have to exist since torrents are being offered. |
| 01:12:21 | <thuban> | they're in the _files.xml metadata file (click 'show all' under 'download options' on the details page). |
| 01:12:48 | <thuban> | there's an api: https://archive.org/services/docs/api/metadata.html |
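The metadata API thuban links returns the same file list as `_files.xml`, but as JSON. The sketch below assumes the record at `https://archive.org/metadata/<identifier>` has a `files` list whose entries carry `name` and, for most files, `sha1` keys. Files listed without a hash are skipped. `fetch_item_metadata` and `sha1_by_file` are hypothetical helper names.

```python
import json
import urllib.request

METADATA_URL = "https://archive.org/metadata/{identifier}"


def fetch_item_metadata(identifier: str) -> dict:
    """Download and parse the JSON metadata record for one archive.org item."""
    with urllib.request.urlopen(METADATA_URL.format(identifier=identifier), timeout=60) as resp:
        return json.load(resp)


def sha1_by_file(metadata: dict) -> dict:
    """Map each file name to its SHA-1, skipping entries that have no sha1 key."""
    return {f["name"]: f["sha1"] for f in metadata.get("files", []) if "sha1" in f}
```

Keeping the parsing separate from the download makes it possible to process records that were saved earlier or fetched in bulk.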
| 01:13:35 | <namspc> | Hm, is SHA-1 still secure for files? I know it's generally deprecated for passwords, but those are short. |
| 01:15:32 | <@JAA> | Even MD5 and MD4 are fine. I'm not even sure whether MD2 is broken yet in that regard. |
| 01:15:43 | <@JAA> | Preimage attacks are hard. |
| 01:15:47 | | namspc nods |
| 01:15:52 | <namspc> | Sounds perfect then, thanks. |
| 01:18:18 | <namspc> | https://www.goanywhere.com/blog/2017/03/02/still-using-sha-1-to-secure-file-transfers-its-time-to-say-goodbye |
| 01:18:19 | <namspc> | Hm. |
| 01:19:03 | <namspc> | https://www.computerworld.com/article/3173616/the-sha1-hash-function-is-now-completely-unsafe.html |
| 01:19:26 | <@JAA> | Yes, a lot of people don't understand the difference between collision and preimage resistance. |
| 01:19:42 | <@JAA> | Anyway, this discussion belongs in -ot. |
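A toy experiment makes JAA's distinction between collision and preimage resistance concrete. The sketch below truncates SHA-1 to 16 bits so that generic attacks finish quickly. The names `tiny_hash`, `find_collision`, and `find_preimage` are illustrative. By the birthday bound, finding any colliding pair costs roughly 2^(n/2) hashes. Hitting one fixed output, such as an existing file's hash, costs roughly 2^n. This does not attack SHA-1 itself. For the full 160-bit output, those generic costs are about 2^80 and 2^160. The practical SHA-1 break published in 2017 (SHAttered) was a collision, not a preimage.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple

BITS = 16  # deliberately tiny output size so both searches finish quickly


def tiny_hash(data: bytes, bits: int = BITS) -> int:
    """Toy hash: the top `bits` bits of SHA-1. Not secure; illustration only."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - bits)


def find_collision(bits: int = BITS) -> Tuple[bytes, bytes, int]:
    """Birthday search for ANY two distinct inputs with equal output.

    Expected cost is about 2**(bits/2) hashes; by pigeonhole it never
    needs more than 2**bits + 1.
    """
    seen: Dict[int, bytes] = {}
    for i in count():
        msg = b"c%d" % i
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_preimage(target: int, bits: int = BITS) -> Tuple[bytes, int]:
    """Brute-force search for an input hitting one FIXED output.

    Expected cost is about 2**bits hashes, because the attacker does not
    get to choose the target.
    """
    for i in count():
        msg = b"p%d" % i
        if tiny_hash(msg, bits) == target:
            return msg, i + 1
```

`find_collision()` typically reports a few hundred tries. `find_preimage(tiny_hash(b"contents of an archived file"))` typically needs tens of thousands.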
| 01:27:52 | | Arcorann (Arcorann) joins |
| 01:41:24 | <Somebody2> | namspc: Check out "Internet Archive Census" on the archiveteam wiki -- that's where the previous efforts at this were documented. |
| 01:41:37 | <Somebody2> | and I'm DELIGHTED you are making another effort at it. |
| 01:58:23 | <namspc> | Somebody2, I'm in OT now, we can discuss it there. |
| 02:16:58 | | summerisle quits [Quit: In my vision, I was on the veranda of a vast estate, a palazzo of some fantastic proportion.] |
| 02:17:05 | | summerisle (summerisle) joins |
| 02:30:36 | | Arcorann quits [Ping timeout: 258 seconds] |
| 02:48:27 | | pawbs quits [Quit: My ZNC server died. Probably updating my kernel...] |
| 02:53:06 | | pawbs joins |
| 03:03:25 | | etnguyen03 quits [Client Quit] |
| 03:21:13 | | Sylirana quits [Ping timeout: 244 seconds] |
| 03:21:48 | | qw3rty_ joins |
| 03:22:12 | | Sylirana (Sylirana) joins |
| 03:25:25 | | qw3rty__ quits [Ping timeout: 258 seconds] |
| 04:04:16 | | DogsRNice quits [Read error: Connection reset by peer] |
| 04:16:58 | | benjinsmith joins |
| 04:18:42 | | benjins quits [Ping timeout: 258 seconds] |
| 04:28:02 | | Iki1 joins |
| 04:31:21 | | Iki quits [Ping timeout: 258 seconds] |
| 04:36:22 | | HP_Archivist (HP_Archivist) joins |
| 04:36:58 | | Lord_Nightmare quits [Client Quit] |
| 04:37:25 | | Lord_Nightmare (Lord_Nightmare) joins |
| 04:39:12 | | tzt quits [Ping timeout: 250 seconds] |
| 04:40:06 | | tzt joins |
| 04:42:28 | | aleph quits [Ping timeout: 258 seconds] |
| 04:44:00 | | LordThanatos quits [Ping timeout: 258 seconds] |
| 04:44:09 | | aleph joins |
| 04:45:03 | | pcr leaves |
| 04:47:36 | | LordThanatos joins |
| 05:29:44 | | BlueMaxima quits [Client Quit] |
| 05:31:37 | | Eighty quits [Quit: leaving] |
| 05:39:04 | | Eighty (Eighty) joins |
| 05:45:56 | | dm4v quits [Read error: Connection reset by peer] |
| 06:07:27 | | pcr joins |
| 06:51:39 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 07:49:31 | | sec^nd quits [Remote host closed the connection] |
| 07:49:56 | | sec^nd (second) joins |
| 08:16:48 | | Atom-- joins |
| 08:17:36 | | Atom quits [Ping timeout: 250 seconds] |
| 08:56:53 | | bobbyb quits [Remote host closed the connection] |
| 09:36:14 | <@OrIdow6> | arkiver: Get a reply on that wiki thing? |
| 10:52:46 | | Iki1 quits [Ping timeout: 258 seconds] |
| 11:15:41 | | Iki1 joins |
| 11:46:49 | | @OrIdow6 quits [Ping timeout: 258 seconds] |
| 11:50:25 | | OrIdow6 (OrIdow6) joins |
| 11:50:25 | | @ChanServ sets mode: +o OrIdow6 |
| 12:03:41 | | kiskaWeebChat quits [Ping timeout: 258 seconds] |
| 12:21:04 | | benjinsmith is now known as benjins |
| 12:21:06 | | benjins is now authenticated as benjins |
| 12:24:29 | <@arkiver> | OrIdow6: one - they said there is not much time left, there are 1.2 million sites, mostly without traffic |
| 12:24:34 | <@arkiver> | replied |
| 12:24:38 | <@arkiver> | nothing more |
| 12:32:38 | | BlueMaxima joins |
| 13:08:05 | | BlueMaxima quits [Client Quit] |
| 13:11:42 | | marked quits [Quit: Ping timeout (120 seconds)] |
| 13:14:18 | | marked joins |
| 13:24:59 | | driib joins |
| 15:09:16 | | monoxane quits [Ping timeout: 250 seconds] |
| 15:14:42 | | monoxane (monoxane) joins |
| 15:18:48 | | ThreeHeadedMonkey quits [Ping timeout: 250 seconds] |
| 15:32:23 | | ThreeHeadedMonkey (ThreeHeadedMonkey) joins |
| 15:56:50 | | HackMii (hacktheplanet) joins |
| 15:58:26 | | HackMii_ quits [Remote host closed the connection] |
| 16:05:57 | | dewdrop quits [Ping timeout: 258 seconds] |
| 16:07:27 | | dewdrop (dewdrop) joins |
| 16:35:54 | | godane (godane) joins |
| 16:45:02 | | ddd joins |
| 16:57:59 | | Daloader__ joins |
| 17:05:40 | | HP_Archivist (HP_Archivist) joins |
| 17:18:30 | | LeGoupil joins |
| 17:33:46 | | DogsRNice (Webuser299) joins |
| 17:54:49 | | Vukky (Vukky) joins |
| 17:58:06 | | Vukky quits [Client Quit] |
| 18:06:03 | | spirit joins |
| 18:44:39 | | Wayward quits [Ping timeout: 258 seconds] |
| 18:48:44 | | NIC007a83 joins |
| 18:49:37 | | Ryz quits [Remote host closed the connection] |
| 18:51:09 | | Ryz (Ryz) joins |
| 19:22:37 | | ThreeHeadedMonkey is now known as ThreeHM |
| 19:29:30 | <JensRex> | Reposting a question/topic from #archivebot: |
| 19:29:36 | <JensRex> | <JensRex> So, I'd like to have this site archived, because as far as I can tell it's not in IA: http://odensebilleder.dk/billedliste.asp |
| 19:29:46 | <JensRex> | <JensRex> But... |
| 19:29:51 | <JensRex> | <JensRex> It has to set some stupid session cookie from this page first: http://odensebilleder.dk/billedstart.asp |
| 19:29:57 | <JensRex> | <JensRex> Just pressing "Søg" (search) returns all images in the archive. |
| 19:30:02 | <JensRex> | <JensRex> Don't know if that's something archivebot is able to handle. |
| 19:30:21 | <JensRex> | <JensRex> This site was not made by competent people. |
| 19:30:34 | <JensRex> | <JensRex> It’s ~17k historic images of the city Odense. The third largest in Denmark. These people are barely able to write parseable html, who knows what their backup policy is like. |
| 19:34:37 | | ddd quits [Remote host closed the connection] |
| 19:59:32 | | ddd joins |
| 20:23:52 | <Sylirana> | JensRex, I can't speak for archivebot or adding the site to IA, but if you just want the images, it's as simple as iterating over image.service.museum.odense.dk/$ID and there doesn't seem to be any cookie checking there. |
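Sylirana's suggestion amounts to a plain loop over IDs. The sketch below uses only the Python standard library. It assumes, without confirmation in the log, that a missing ID comes back as HTTP 404. `image_url` and `fetch_image` are hypothetical helpers. The optional `prefix` covers set paths like the `FKM/FGV` one Sylirana mentions.

```python
import urllib.error
import urllib.request
from typing import Optional

BASE = "http://image.service.museum.odense.dk/"


def image_url(image_id: int, prefix: str = "") -> str:
    """URL for one numeric ID, optionally inside a set such as 'FKM/FGV'."""
    prefix = prefix.strip("/")
    return f"{BASE}{prefix}/{image_id}" if prefix else f"{BASE}{image_id}"


def fetch_image(image_id: int, prefix: str = "") -> Optional[bytes]:
    """Return the image bytes, or None on HTTP 404 (assumed to mean 'no such ID')."""
    try:
        with urllib.request.urlopen(image_url(image_id, prefix), timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

A caller would loop over something like `range(0, 20000)`, skip `None` results, and sleep between requests to avoid hammering the host.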
| 20:24:35 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 20:25:03 | | HP_Archivist (HP_Archivist) joins |
| 20:26:52 | <Sylirana> | Something I've noticed: IDs below 128 exist on http://image.service.museum.odense.dk/, but not anywhere on http://odensebilleder.dk/ for some reason. |
| 20:28:16 | | ddd quits [Remote host closed the connection] |
| 20:29:04 | <JensRex> | Yea, I considered crawling it for myself over the weekend. |
| 20:30:23 | <thuban> | unfortunately, the pages (http://odensebilleder.dk/billedvis.asp?billednr=$ID) contain valuable metadata and do have the cookie check. i don't think archivebot can accept cookie configuration at present (see https://github.com/ArchiveTeam/ArchiveBot/issues/416). grab-site can, but grab-site warcs don't get automatically ingested into the wayback machine. |
| 20:31:11 | <thuban> | JAA, do we have anybody whitelisted for wbm who could run this via grab-site? |
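A cookie-aware client can handle the session step for a private copy of the detail pages. The sketch below assumes, based on JensRex's description, that a single GET of `billedstart.asp` is enough to receive the session cookie. The search form may also need submitting. The code saves raw HTML, not WARCs, so it would not help get the pages into the Wayback Machine. `make_session` and `fetch_detail_pages` are hypothetical names.

```python
import http.cookiejar
import urllib.request
from typing import Iterable, Iterator, Tuple

START_URL = "http://odensebilleder.dk/billedstart.asp"
PAGE_URL = "http://odensebilleder.dk/billedvis.asp?billednr={id}"


def make_session() -> Tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Build an opener whose cookie jar persists across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_detail_pages(
    ids: Iterable[int], opener: urllib.request.OpenerDirector
) -> Iterator[Tuple[int, bytes]]:
    """Hit the start page once so the server can set its session cookie,
    then yield (id, raw HTML bytes) for each detail page."""
    with opener.open(START_URL, timeout=30) as resp:
        resp.read()
    for image_id in ids:
        with opener.open(PAGE_URL.format(id=image_id), timeout=30) as resp:
            yield image_id, resp.read()
```

Because the cookie jar is shared by every request the opener makes, the detail-page requests automatically send back whatever cookie the start page set.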
| 20:33:08 | <thuban> | Sylirana: that's funny, i don't see any ids below 128. got an example? |
| 20:33:34 | <Sylirana> | http://image.service.museum.odense.dk/ has a bit of information about the image host... Apparently there seem to be different sets there which are accessed through something like http://image.service.museum.odense.dk/FKM/FGV/$ID . But I haven't been able to find another set like "FKM/FGV" anywhere. |
| 20:33:41 | <Sylirana> | http://image.service.museum.odense.dk/100 |
| 20:40:56 | | dm4v joins |
| 20:40:58 | | dm4v is now authenticated as dm4v |
| 20:40:58 | | dm4v quits [Changing host] |
| 20:40:58 | | dm4v (dm4v) joins |
| 20:46:32 | <thuban> | huh! 91-108 exist and seem to be archaeological. there are other gaps in the ids returned by this search, some of which don't seem to have a corresponding image, some of which do but likewise depict other topics (eg 156 / 944). i wonder whether there are other galleries hiding somewhere... if there's a link somewhere on odensebysmuseer.dk i haven't turned it up |
| 20:52:18 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 20:53:45 | <Sylirana> | thuban, I can't get 108, are you sure that exists? The first "set" I've found is 88-89 and the next one 91-107. |
| 20:55:56 | | Lilpea quits [Ping timeout: 250 seconds] |
| 20:56:05 | <thuban> | whoops, yes, typo. |
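Sylirana summarizes the existing IDs as ranges such as 88-89 and 91-107. Once a probe has produced the list of IDs that returned an image, that summary is a small grouping step. `contiguous_runs` is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def contiguous_runs(ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse integer IDs into sorted (first, last) runs of consecutive values."""
    runs: List[List[int]] = []
    for i in sorted(set(ids)):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i  # extend the current run
        else:
            runs.append([i, i])  # start a new run
    return [(first, last) for first, last in runs]
```

For example, `contiguous_runs([88, 89, *range(91, 108)])` gives `[(88, 89), (91, 107)]`, matching the sets found above.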
| 20:58:59 | | NIC007a83 quits [Client Quit] |
| 21:02:50 | | KRG` joins |
| 21:03:25 | | KRG quits [Ping timeout: 258 seconds] |
| 21:03:35 | | Lilpea joins |
| 21:11:17 | | LeGoupil quits [Client Quit] |
| 21:22:20 | | BlueMaxima joins |
| 21:33:27 | | nertzy quits [Client Quit] |
| 21:41:22 | | Iki1 quits [Ping timeout: 258 seconds] |
| 21:51:36 | | HP_Archivist (HP_Archivist) joins |
| 21:56:56 | | nertzy (nertzy) joins |
| 21:58:21 | | godane1 joins |
| 21:58:28 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 21:58:39 | | godane quits [Read error: Connection reset by peer] |
| 21:58:55 | | HP_Archivist (HP_Archivist) joins |
| 21:59:04 | | BlueMaxima quits [Client Quit] |
| 21:59:37 | | godane2 joins |
| 22:03:06 | | godane1 quits [Ping timeout: 250 seconds] |
| 22:03:09 | | Aoede quits [Quit: ZNC - https://znc.in] |
| 22:03:29 | | Aoede (Aoede) joins |
| 22:28:45 | | MaxG joins |
| 22:35:02 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 22:38:03 | <Sylirana> | Other things which are a bit odd about the site: Instead of simply changing the ID parameter, sometimes "flg" is used to go to the next/previous page (and the ID stays the same). And if you change the "antal" parameter to something like 20000, there will be a button for the "next" page on the last image which just leads to an invalid page. I mean it works, but it would also work without those two parameters. |
| 22:38:03 | | jtagcat quits [Quit: Bye!] |
| 22:40:36 | | jtagcat (jtagcat) joins |
| 22:43:48 | <Sylirana> | As for the images, from 0-20000, there are already over 18k image files out of which only 7954 are also on the "main" site, so there are most likely a lot more images than on the main site. |
| 22:56:16 | | godane2 quits [Read error: Connection reset by peer] |
| 22:58:32 | | Iki1 joins |
| 23:18:30 | | Daloader__ quits [Ping timeout: 250 seconds] |
| 23:29:10 | | minus leaves [WeeChat 2.8] |