Help Cataloging A Personal Library/Archive?
Hi all.
I have recently figured out that my filing system for my personal library/archive doesn't work. How do I know this? I couldn't figure out which of my four tubs of file folders an article was in.
Here's the skinny:
I'm an academic and I do a large amount of work with sources from the 19th century. Lots of these are photocopies. Some are my notes. I need to figure out a way to catalog what I have and store it so I can retrieve it when I need it. For my larger documents, I use clipless binders with a title on the spine indicating what's in them.
But I've got hundreds of other, loose, bits in hanging folders. I want to change that.
I'm wondering if anyone has any experience trying to organize/catalog something like this. I do NOT want to use an electronic database. I do NOT want to scan anything. I just want to catalog, index and file. At the moment, I'm thinking of just doing this in 3-ring binders with plastic sleeves to hold the documents, plus some kind of coding system (a binder number + a document number) and a plain Excel sheet indexing everything.
The problem is that this needs to be scalable; ideally, I'll keep this system going for many years and will add to it, at times, rapidly. Does anyone have any advice? Experience?
Thanks in advance.
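For what it's worth, the binder-plus-document code described above could be sketched roughly like this in Python. Everything specific here is an assumption rather than anything from the post: the `B03-017` label format, the 50-sleeves-per-binder capacity, and the column names are all hypothetical. Documents get a running number in the order they are filed, so new material always goes at the end and nothing ever has to be renumbered. The output is a plain CSV that Excel opens directly.

```python
import csv
import io

SLEEVES_PER_BINDER = 50  # assumed capacity; adjust to the binders actually used


def shelf_mark(sequence_number, per_binder=SLEEVES_PER_BINDER):
    """Turn a running filing number (1, 2, 3, ...) into a binder+document code."""
    if sequence_number < 1:
        raise ValueError("sequence numbers start at 1")
    binder = (sequence_number - 1) // per_binder + 1
    slot = (sequence_number - 1) % per_binder + 1
    return f"B{binder:02d}-{slot:03d}"


def build_index(documents):
    """Assign codes in filing order; each document is an (author, title) pair."""
    rows = []
    for n, (author, title) in enumerate(documents, start=1):
        rows.append({"code": shelf_mark(n), "author": author, "title": title})
    return rows


def to_csv(rows):
    """Render the index as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["code", "author", "title"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two sample items taken from the thread; the order is arbitrary.
sample = [
    ("Greg, W. R.", "Why Are Women Redundant"),
    ("Unnamed clergyman", "Diary, 1848"),
]
print(to_csv(build_index(sample)))
```

Because the code only records where a sheet physically sits, growth just means buying another binder; subject, author, and keyword all live in the index columns, not in the shelf order.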
Comments
In fact you could go a long, long way towards your goal by turning a copy of Acrobat Professional loose on your scans (Pro will do OCR). It wouldn't be 100% (it would catch 90% of the verbiage), but you could still create Acrobat 'indexes' of your docs and search them on the fly.
Don't be a ninny!!
It's either that or Excel Hell.
DMZ: I understand, but OCR isn't really an option. I need to retain page numbers on documents that were printed. I have lots of documents that are hand-written (e.g. the diary of a clergyman from 1848, handwritten letters). And getting a clean scan? Impossible for large chunks of the time because I'm working either with original documents that were in bad shape to begin with or with microfiche/film xeroxes.
This isn't really about having everything available at the click of a mouse. It's about being able to find some essay when I find myself, like today, trying to find something for a footnote. There's also a lot of memory of the shape of pages and where information is on a page that would be lost if they were OCR'd, if that makes sense (I often remember that something I want to remember was printed on a certain column where the page looked a certain way).
Aha, this is where you and that Baker essay on card catalogues intersect.
Hehe. Well, only if I'm burning them!
hmmmm... Acrobat Pro (especially) would keep things nailed down on the page where they originated, and it's possible to number pages -- 'paste' down headers, etc. Acrobat makes a point of keeping the xy coordinates of objects the same -- to a fault. Also, don't be too afraid of the microfiche representations -- a lot of the programs now boast working from screen captures and faxes -- even digital cameras. ABBYY's documentation seems to be keen on that -- 'just take a picture....'
But for handwritten docs or things that you can't scan at all, or large jobs you couldn't pawn off on the University copy shop, I dunno.
Wasn't there something... Delicious Library??
Edit: check this out:
http://www.delicious-monster.com/
Believe me. I understand how all of this works and how cool it all is. But most of my documents are scanned with a 2-page layout in landscape. They have marginal notes and underlining that fuck up OCR. Lots of them are mid-90s fax quality. But again, I don't really want PDF. It would take me a year to scan it all. It would take me forever to tag it all in Yojimbo (which I use as my research junk drawer). Right now I just want to catalog, file and index so I know where it all is the next time I'm looking for WR Greg's essay "Why Are Women Redundant" and can't find it.
Believe me. I understand how all of this works and how cool it all is. . . . .
I'm not sure about that. I bought a Fujitsu ScanSnap Pro a few months ago, and it has been life-altering. It scans full bleed, duplex, and it's sheet-fed. It eats through stacks of paper faster than I can shred them, and dumps everything into PDF straight away. I'm not sure if it does OCR or not.
If you're still hell-bent on non-digital organization, I would agree that three-ring binders are the way to go. I keep all of my documents in three-ring binders in addition to the scanned digital copies. The only other suitable option is hanging folders in a file cabinet, but they aren't as portable.
You can tell the OCR to look for facing pages -- and the fax quality isn't all that big a deal. Maybe the thing isn't to try to go all one way or the other; you could scan what will represent well and then manually archive the stuff that was too funky. Divide and conquer?
What about Delicious Library?
Your organization can (should?) be all digital. Even if you keep the physical documents, your reference guide to where they are can (should?) be digital, for easy updating and easy searching. After that it is simply a matter of the guide pointing you in the right direction (digital or physical storage).
I would say 90% of my stuff is electronic, but I still have binders with plastic sleeves for a lot of stuff that might not lend itself well to OCR. I try to avoid that where possible, because in-document search mechanisms (Spotlight on Mac and the Windows equivalents) are invaluable for information searching.
And if you are planning on building up a wealth of stuff until you die, digital is ideal (as you know I am sure).
I would also suggest talking to local print shops that might have digitization services. Call or e-mail and describe what you have and maybe you could pay someone to make it all nice and sexy for you.
I do keep a digital archive of some things, mostly in Yojimbo. And I do keep digital versions of my notes, all of which are synced up among my three Macs and kept in RTF so they won't be inaccessible 10 years from now.
Delicious, in my experience, is too slow on my computers and is in the end designed to do something else.
You, sir, are a bona fide Luddite and an Adobephobe.
Shame. Shaaaame!
Um, I'm an English professor! What did you expect!?!
And I am NOT a Luddite. I have never even BEEN to Manchester in 1826, nor would I have smashed any looms when I was there.
But yes, there are some things I do not like to use technology for.
You may consider getting BibDesk. It's a LaTeX/BibTeX reference manager that you could use to manage all the keywords, etc, and then have a single entry for which file folder it's in. Heck, number the folders sequentially - they don't have to have any sense to them at all, other than just partitioning down the number of documents you need to flip through.
The entry format is completely extensible, so you can add whatever metadata you want.
My wife is in a similar boat, and did the OCR/scanning route for a while - too much work for too little benefit.
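The numbered-folder idea doesn't depend on BibDesk itself. Here is a minimal Python sketch of it, with made-up field names (`folder`, `keywords`) and made-up folder numbers; none of this is BibDesk's actual format. It is only meant to show why the folder numbers need no meaning of their own: the index does the finding, and the number just says which folder to pull.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    title: str
    folder: int  # arbitrary sequential folder number, carries no meaning
    keywords: set = field(default_factory=set)


def find(entries, *terms):
    """Return entries whose title or keywords contain every search term."""
    wanted = [t.lower() for t in terms]
    hits = []
    for e in entries:
        # Search title and keywords together, case-insensitively.
        haystack = e.title.lower() + " " + " ".join(k.lower() for k in sorted(e.keywords))
        if all(t in haystack for t in wanted):
            hits.append(e)
    return hits


# Hypothetical catalog; the folder numbers are invented for illustration.
catalog = [
    Entry("Why Are Women Redundant", folder=12, keywords={"Greg", "essay"}),
    Entry("Clergyman's diary, 1848", folder=3, keywords={"diary", "handwritten"}),
]

for hit in find(catalog, "greg"):
    print(f"folder {hit.folder}: {hit.title}")
```

The same lookup works as a filter on a spreadsheet column; the point is only that one "which folder" field per item is enough.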
I do think that the OCR route would only be to 'bomb in' your documents; if you caught 80-90% of the words, that would probably go a long way towards locating something. But if you try to do any correction to the OCR, especially if it is in the 80-90% range -- it's death by a thousand typos.
I'm very, very reluctant to do anything that is guaranteed to introduce error into this.
I played around with BibDesk for a while, and I felt like I was trying to use a combine to mow my yard. Right now, I can maintain an Excel list with some tagging information and combine that with Yojimbo to incrementally scan and store things as PDF.
Dammit midwinter they're just documents! There's nothing outside of the text!