Help Cataloging A Personal Library/Archive?

midwinter · January 25, 2007 12:06AM

Hi all.

I have recently figured out that my filing system for my personal library/archive doesn't work. How do I know this? I couldn't figure out which of my four tubs of file folders an article was in.

Here's the skinny:

I'm an academic and I do a large amount of work with sources from the 19th century. Lots of these are photocopies. Some are my notes. I need to figure out a way to catalog what I have and store it so I can retrieve it when I need it. For my larger documents, I use clipless binders with a title on the spine indicating what's in them.

But I've got hundreds of other, loose, bits in hanging folders. I want to change that.

I'm wondering if anyone has any experience trying to organize/catalog something like this. I do NOT want to use an electronic database. I do NOT want to scan anything. I just want to catalog, index and file. At the moment, I'm thinking of just doing this in 3-ring binders with plastic sleeves to hold the documents, and then some kind of coding system (the binder number + a document number) and a plain Excel sheet indexing everything.

The problem is that this needs to be scalable; ideally, I'll keep this system going for many years and will add to it, at times, rapidly. Does anyone have any advice? Experience?

Thanks in advance.
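The binder-plus-document-number scheme from the post above could be sketched roughly as follows. This is only an illustration under assumptions: the code format (`B03-017`), the column names, and the sample binder placements are all made up for the example, and a plain CSV file stands in for the Excel sheet. Codes are only ever appended within a binder, which is one way to keep the scheme growing without renumbering.

```python
import csv
import io

# Hypothetical column layout for the index sheet (not from the thread).
FIELDS = ["code", "author", "title", "year", "notes"]


def make_code(binder, doc):
    """Build a shelf code such as 'B03-017' (binder 3, document 17)."""
    return f"B{binder:02d}-{doc:03d}"


def next_code(rows, binder):
    """Return the next unused code in a binder, so items are only appended."""
    prefix = f"B{binder:02d}-"
    used = [int(r["code"].split("-")[1]) for r in rows if r["code"].startswith(prefix)]
    return make_code(binder, max(used, default=0) + 1)


def add_entry(rows, binder, author, title, year="", notes=""):
    """Assign the next code in the binder and record the document."""
    entry = {
        "code": next_code(rows, binder),
        "author": author,
        "title": title,
        "year": year,
        "notes": notes,
    }
    rows.append(entry)
    return entry


def find(rows, text):
    """Case-insensitive search over author and title."""
    needle = text.lower()
    return [r for r in rows if needle in r["title"].lower() or needle in r["author"].lower()]


def to_csv(rows):
    """Serialize the index as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative entries; the binder assignments are invented for the example.
index = []
add_entry(index, 1, "W. R. Greg", "Why Are Women Redundant", notes="photocopy")
add_entry(index, 1, "Unknown clergyman", "Diary", year="1848", notes="handwritten")
add_entry(index, 2, "Various", "Letters", notes="handwritten")

for row in find(index, "redundant"):
    print(row["code"], row["title"])
```

The code written on the sleeve is then the only thing that ties the sheet to the shelf, so the sheet can be sorted or filtered any way you like without touching the binders.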

trick fall · January 25, 2007 12:36AM

You have my sympathies. I probably don't have much advice to offer, but I do have to maintain archive files for my accounting office, and what we do is use bankers boxes with hanging file folders. The boxes are numbered and labeled with their contents. The boxes are created by type and then filed alphabetically. There is a database that matches the boxes. I used to work for a union and I was able to quickly pull files from the forties and fifties using this system. When using binders, I find it preferable, if possible, to photocopy everything to standard size, but plastic sleeves could work. My two biggest concerns with using binders are space and cost. Of course I am in the process of trying to get as much of this material stored as PDFs as possible, which is a whole other set of problems!

midwinter · January 25, 2007 12:41AM

Thanks, TF. It looks like what I'm going to have to do is create my own cataloging system. I would, at some point, turn these into PDFs, but frankly it's easier just to deal with the hard copies right now--and for the foreseeable future.

dmz · January 25, 2007 12:54AM

To hell with that midwinter -- GIGO!! Get some clean scans and blow your stuff through either OmniPage Pro or ABBYY FineReader.

In fact you could go a long, long way towards your goal by turning a copy of Acrobat Professional loose on your scans (Pro will do OCR). It wouldn't be 100% (it would catch 90% of the verbiage), but you could still create Acrobat 'indexes' of your docs, and search them on the fly.

Don't be a ninny!!

It's either that or Excel Hell.

midwinter · January 25, 2007 1:03AM

DMZ: I understand, but OCR isn't really an option. I need to retain page numbers on documents that were printed. I have lots of documents that are hand-written (e.g. the diary of a clergyman from 1848, handwritten letters). And getting a clean scan? Impossible for large chunks of the time because I'm working either with original documents that were in bad shape to begin with or with microfiche/film xeroxes.

This isn't really about having everything available at the click of a mouse. It's about being able to find some essay when I find myself, like today, trying to find something for a footnote. There's also a lot of memory of the shape of pages and where information is on a page that would be lost if they were OCR'd, if that makes sense (I often remember that something I want to remember was printed on a certain column where the page looked a certain way).

addabox · January 25, 2007 1:11AM

Quote:

Originally Posted by midwinter

DMZ: I understand, but OCR isn't really an option. I need to retain page numbers on documents that were printed. I have lots of documents that are hand-written (e.g. the diary of a clergyman from 1848, handwritten letters). And getting a clean scan? Impossible for large chunks of the time because I'm working either with original documents that were in bad shape to begin with or with microfiche/film xeroxes.

This isn't really about having everything available at the click of a mouse. It's about being able to find some essay when I find myself, like today, trying to find something for a footnote. There's also a lot of memory of the shape of pages and where information is on a page that would be lost if they were OCR'd, if that makes sense (I often remember that something I want to remember was printed on a certain column where the page looked a certain way).

Aha, this is where you and that Baker essay on card catalogues intersect.

midwinter · January 25, 2007 1:14AM

Quote:

Originally Posted by addabox

Aha, this is where you and that Baker essay on card catalogues intersect.

Hehe. Well, only if I'm burning them!

dmz · January 25, 2007 1:51AM

Quote:

Originally Posted by midwinter

DMZ: I understand, but OCR isn't really an option. I need to retain page numbers on documents that were printed. I have lots of documents that are hand-written (e.g. the diary of a clergyman from 1848, handwritten letters). And getting a clean scan? Impossible for large chunks of the time because I'm working either with original documents that were in bad shape to begin with or with microfiche/film xeroxes.

This isn't really about having everything available at the click of a mouse. It's about being able to find some essay when I find myself, like today, trying to find something for a footnote. There's also a lot of memory of the shape of pages and where information is on a page that would be lost if they were OCR'd, if that makes sense (I often remember that something I want to remember was printed on a certain column where the page looked a certain way).

hmmmm... Acrobat Pro (especially) would keep things nailed down on the page where they originated, and it's possible to number pages -- 'paste' down headers, etc. Acrobat makes a point of keeping the xy coordinates of objects the same -- to a fault. Also, don't be too afraid of the microfiche representations -- a lot of the programs now boast working from screen captures and faxes -- even digital cameras. Abbyy's documentation seems to be keen on that -- 'just take a picture....'

But for handwritten docs or things that you can't scan at all, or large jobs you couldn't pawn off on the University copy shop, I dunno.

Wasn't there something.. Delicious library??...

Edit: check this out:

http://www.delicious-monster.com/

midwinter · January 25, 2007 2:00AM

Quote:

Originally Posted by dmz

hmmmm... Acrobat Pro (especially) would keep things nailed down on the page where they originated, and it's possible to number pages -- 'paste' down headers, etc. Acrobat makes a point of keeping the xy coordinates of objects the same -- to a fault. Also, don't be too afraid of the microfiche representations -- a lot of the programs now boast working from screen captures and faxes -- even digital cameras. Abbyy's documentation seems to be keen on that -- 'just take a picture....'

But for handwritten docs or things that you can't scan at all, or large jobs you couldn't pawn off on the University copy shop, I dunno.

Wasn't there something.. Delicious library??...

Believe me. I understand how all of this works and how cool it all is. But most of my documents are scanned with a 2-page layout in landscape. They have marginal notes and underlining that fuck up OCR. Lots of them are mid-90s fax quality. But again, I don't really want PDF. It would take me a year to scan it all. It would take me forever to tag it all in Yojimbo (which I use as my research junk drawer). Right now I just want to catalog, file and index so I know where it all is the next time I'm looking for WR Greg's essay "Why Are Women Redundant" and can't find it.

trick fall · January 25, 2007 7:43AM

I'm sure there is also a certain level of enjoyment going through all of those bits of paper. Just keep in mind you will need shelving for all those binders.

splinemodel · January 25, 2007 7:54AM

Quote:

Originally Posted by midwinter

Believe me. I understand how all of this works and how cool it all is. . . . .

I'm not sure about that. I bought a Fujitsu ScanSnap Pro a few months ago, and it has been life-altering. It scans full bleed, duplex, and it's sheet-fed. It eats through stacks of paper faster than I can shred them, and dumps everything into PDF straight away. I'm not sure if it does OCR or not.

If you're still hell-bent on non-digital organization, I would agree that three-ring binders are the way to go. I keep all of my documents in three-ring binders in addition to the scanned digital copies. The only other suitable option is hanging folders in a file cabinet, but they aren't as portable.

dmz · January 25, 2007 8:34AM

Quote:

Originally Posted by midwinter

Believe me. I understand how all of this works and how cool it all is. But most of my documents are scanned with a 2-page layout in landscape. They have marginal notes and underlining that fuck up OCR. Lots of them are mid-90s fax quality. But again, I don't really want PDF. It would take me a year to scan it all. It would take me forever to tag it all in Yojimbo (which I use as my research junk drawer). Right now I just want to catalog, file and index so I know where it all is the next time I'm looking for WR Greg's essay "Why Are Women Redundant" and can't find it.

You can tell the OCR to look for facing pages -- and the fax quality isn't all that big a deal. Maybe the thing isn't to try to go all one way or the other; you could scan what will represent well and then manually archive the stuff that was too funky. Divide and conquer?

What about delicious?

groverat · January 25, 2007 9:47AM

Divide and conquer is definitely my suggestion as well. Scan (or get someone else to scan) what is suitable for scanning and physically store the rest.

Your organization can (should?) be all digital. Even if you keep the physical documents, your reference guide to where they are can (should?) be digital, for easy updating and easy searching. After that it is simply a matter of pointing you in the right direction (digital or physical storage).

I would say 90% of my stuff is electronic, but I still have binders with plastic sleeves for a lot of stuff that might not lend itself well to OCR. I try to avoid that where possible, because in-document search mechanisms (Spotlight on Mac and the Windows equivalents) are invaluable for information searching.

And if you are planning on building up a wealth of stuff until you die, digital is ideal (as you know I am sure).

I would also suggest talking to local print shops that might have digitization services. Call or e-mail and describe what you have and maybe you could pay someone to make it all nice and sexy for you.

midwinter · January 25, 2007 11:41AM

Quote:

Originally Posted by dmz

You can tell the OCR to look for facing pages -- and the fax quality isn't all that big a deal. Maybe the thing isn't to try to go all one way or the other; you could scan what will represent well and then manually archive the stuff that was too funky. Divide and conquer?

What about delicious?

I do keep a digital archive of some things, mostly in Yojimbo. And I do keep digital versions of my notes, all of which are synced up among my three Macs and kept in RTF so they won't be inaccessible 10 years from now.

Delicious, in my experience, is too slow on my computers and is in the end designed to do something else.

dmz · January 25, 2007 12:06PM

Quote:

Originally Posted by midwinter

I do keep a digital archive of some things, mostly in Yojimbo. And I do keep digital versions of my notes, all of which are synced up among my three Macs and kept in RTF so they won't be inaccessible 10 years from now.

Delicious, in my experience, is too slow on my computers and is in the end designed to do something else.

You sir, are a bona fide Luddite and an Adobephobe.

Shame. Shaaaame!

midwinter · January 25, 2007 12:56PM

Quote:

Originally Posted by dmz

You sir, are a bona fide Luddite and an Adobephobe.

Shame. Shaaaame!

Um, I'm an English professor! What did you expect!?!

And I am NOT a Luddite. I have never even BEEN to Manchester in 1826, nor would I have smashed any looms when I was there.

But yes, there are some things I do not like to use technology for.

kickaha · January 25, 2007 1:08PM

My wife is in a similar boat, and did the OCR/scanning route for a while - too much work for too little benefit.

You may consider getting BibDesk. It's a LaTeX/BibTeX reference manager that you could use to manage all the keywords, etc, and then have a single entry for which file folder it's in. Heck, number the folders sequentially - they don't have to have any sense to them at all, other than just partitioning down the number of documents you need to flip through.

The entry format is completely extensible, so you can add whatever metadata you want.
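The idea in the post above (arbitrary sequential folder numbers, with all the meaning carried by keywords) can be sketched in a few lines. This is plain Python rather than BibDesk or BibTeX, and the class name, folder numbers, and keywords are hypothetical, chosen only to show the lookup.

```python
from collections import defaultdict


class FolderIndex:
    """Keyword index pointing at arbitrarily numbered folders (illustrative only)."""

    def __init__(self):
        self.entries = []                    # one dict per document
        self.by_keyword = defaultdict(set)   # keyword -> positions in self.entries

    def add(self, folder, title, keywords):
        """Record a document, the folder it lives in, and its keywords."""
        position = len(self.entries)
        self.entries.append({"folder": folder, "title": title})
        for word in keywords:
            self.by_keyword[word.lower()].add(position)

    def lookup(self, *keywords):
        """Return (folder, title) pairs for documents tagged with every keyword."""
        if not keywords:
            return []
        sets = [self.by_keyword.get(word.lower(), set()) for word in keywords]
        hits = set.intersection(*sets)
        return sorted((self.entries[i]["folder"], self.entries[i]["title"]) for i in hits)


# Folder numbers carry no meaning; they only say where to look.
idx = FolderIndex()
idx.add(12, "Why Are Women Redundant", ["essay", "women", "Greg"])
idx.add(12, "Clergyman's diary", ["diary", "1848", "handwritten"])
idx.add(40, "Letters", ["letters", "handwritten"])

print(idx.lookup("handwritten"))
print(idx.lookup("essay", "greg"))
```

Because the folder number is just a pointer, a folder can hold unrelated items, and adding material never forces a reshuffle of what is already filed.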

dmz · January 25, 2007 1:12PM

Quote:

Originally Posted by Kickaha

My wife is in a similar boat, and did the OCR/scanning route for a while - too much work for too little benefit.

I do think that the OCR route would only be to 'bomb in' your documents; if you caught 80-90% of the words that would probably go a long way towards locating something. But if you try to do any correction to the OCR, especially if it is in the 80-90% range -- it's death by a thousand typos.

midwinter · January 25, 2007 1:17PM

Quote:

Originally Posted by dmz

I do think that the OCR route would only be to 'bomb in' your documents; if you caught 80-90% of the words that would probably go a long way towards locating something. But if you try to do any correction to the OCR, especially if it is in the 80-90% range -- it's death by a thousand typos.

I'm very, very reluctant to do anything that is guaranteed to introduce error into this.

midwinter · January 25, 2007 1:18PM

Quote:

Originally Posted by Kickaha

My wife is in a similar boat, and did the OCR/scanning route for a while - too much work for too little benefit.

You may consider getting BibDesk. It's a LaTeX/BibTeX reference manager that you could use to manage all the keywords, etc, and then have a single entry for which file folder it's in. Heck, number the folders sequentially - they don't have to have any sense to them at all, other than just partitioning down the number of documents you need to flip through.

The entry format is completely extensible, so you can add whatever metadata you want.

I played around with it for a while, and I felt like I was trying to use a combine to mow my yard. Right now, I can maintain an Excel list with some tagging information and combine that with Yojimbo (to incrementally scan and store as PDF).

dmz · January 25, 2007 1:19PM

Quote:

Originally Posted by midwinter

I'm very, very reluctant to do anything that is guaranteed to introduce error into this.

Dammit midwinter they're just documents! There's nothing outside of the text!
