database, i guess.

Posted in Mac Software (edited January 2014)
I am not a member of any Mac user group, and I am a few thousand miles from the nearest Mac dealer (which is in Shanghai). Please excuse my ignorance if there is an easy answer to this question.



What's the best way to do a keyword search in an archive containing thousands of articles across multiple document formats? I mean not just searching file names, but searching within the documents for given keywords or phrases. I have a few gigabytes of articles that I need to access regularly and am having a hard time managing them.



Thanks in advance!

Comments

  • Reply 1 of 12
    Devonthink might be worth a look (although much depends on what formats exactly the data is in):



    Devonthink
  • Reply 2 of 12
    torifile (Posts: 4,024, member)
    Perl and grep. I once searched my entire hard drive for references to a rogue file that was preventing my system from logging in. I'll see if I can dig up some sample code for you.
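
    In the spirit of the grep approach above, here is a minimal sketch of a recursive, case-insensitive phrase search over an article folder. It assumes the articles can be read as plain text (.txt, .html, .rtf; RTF and HTML markup is searched raw), and the extension list and the function name `search_archive` are hypothetical. Binary .doc files are skipped because they would need a separate converter, which is not shown.

```python
import re
from pathlib import Path

# Hypothetical set of extensions treated as readable text. Binary Word (.doc)
# files are deliberately skipped: they need a converter before text search.
TEXT_EXTENSIONS = {".txt", ".html", ".htm", ".rtf"}


def search_archive(root, phrase, extensions=TEXT_EXTENSIONS):
    """Yield (path, line_number, line) for every line that contains `phrase`
    (case-insensitive) in files under `root` whose extension is in `extensions`."""
    # re.escape makes the phrase match literally, like `grep -F -i`
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        # errors="replace" keeps an oddly encoded file from aborting the scan
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if pattern.search(line):
                    yield path, lineno, line.rstrip("\n")
```

    Usage would be something like `for path, n, line in search_archive("/path/to/articles", "some phrase"): print(f"{path}:{n}: {line}")`. This rescans every file on each query, which is fine for occasional searches; a dedicated indexer is what makes repeated searches over gigabytes fast.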



    edit: I just realized that I may have misunderstood what you were looking for. Are you looking for some sort of organizational system? I've thought about programming a database with fields like author names, abstracts, keywords and a numbering scheme for my thousands of articles (hard copies) so I could easily find what I was looking for. I never got around to it. :/ If you have something like AppleWorks, you can use its database module and do the cataloging yourself. It would take some time, but with creative cutting and pasting, it shouldn't be too bad.
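
    The catalog described above (author names, abstracts, keywords and a numbering scheme) can be sketched with Python's built-in sqlite3 module instead of AppleWorks. This is a minimal illustration under assumed names: the `articles`/`keywords` tables and the helpers `open_catalog`, `add_article` and `find` are hypothetical, and the catalog number is simply SQLite's auto-assigned row id.

```python
import sqlite3

# Illustrative schema: one row per article, plus a separate keyword table so an
# article can carry any number of keywords. Names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    catalog_no INTEGER PRIMARY KEY,   -- the numbering scheme (auto-assigned)
    title      TEXT NOT NULL,
    author     TEXT,
    abstract   TEXT
);
CREATE TABLE IF NOT EXISTS keywords (
    catalog_no INTEGER REFERENCES articles(catalog_no),
    keyword    TEXT NOT NULL
);
"""


def open_catalog(path=":memory:"):
    """Open (or create) a catalog database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_article(conn, title, author, abstract, keywords):
    """Insert one article and its keywords; return the assigned catalog number."""
    cur = conn.execute(
        "INSERT INTO articles (title, author, abstract) VALUES (?, ?, ?)",
        (title, author, abstract),
    )
    catalog_no = cur.lastrowid
    conn.executemany(
        "INSERT INTO keywords (catalog_no, keyword) VALUES (?, ?)",
        [(catalog_no, k.lower()) for k in keywords],
    )
    conn.commit()
    return catalog_no


def find(conn, term):
    """Return (catalog_no, title) for articles whose keywords, author or
    abstract contain `term` (case-insensitive substring match)."""
    like = f"%{term.lower()}%"
    rows = conn.execute(
        """
        SELECT DISTINCT a.catalog_no, a.title
        FROM articles a
        LEFT JOIN keywords k ON k.catalog_no = a.catalog_no
        WHERE k.keyword LIKE ?
           OR lower(a.author) LIKE ?
           OR lower(a.abstract) LIKE ?
        ORDER BY a.catalog_no
        """,
        (like, like, like),
    )
    return rows.fetchall()
```

    Keeping keywords in their own table (rather than one comma-separated field) is the design choice that lets a single article be found under any of its keywords without string parsing.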
  • Reply 3 of 12
    torifile (Posts: 4,024, member)
    Quote:

    Originally posted by staphbaby

    Devonthink might be worth a look (although much depends on what formats exactly the data is in):



    Devonthink




    Shit, man. Nice find! I'm going to give this app a nice tryout. I've been looking for something like this...
  • Reply 4 of 12
    goodgod, man- Devonthink is quite possibly the answer here. have you used it much?



    to clarify a bit, i often read up to a dozen dense articles a day. sometimes months later, while working on one project or another (usually a co-authored document) i often recall having read this-or-that idea/figure/test from a document in my archive, but haven't a clue where to find it for re-checking and sourcing. i'll *know* I have it, but be unable to find it. i need a way to search within these documents without actually opening them. it seems like this would be impossible, but then again, i've said that many times about many things, and sure enough, there's almost always a WAY.



    the files i need searched are .doc, .rtf, .html, mostly, but it would be great if other media formats could be included in the search: .jpg (just the title would do), .rm (titles), .mov (titles), .wmv (titles), etc.



    i don't have hundreds of movie/(other media) files to search through, but it's getting there. it would be great if i could search through the "comments" fields that many movie files have, or even be able to attach some sort of abstract to them (and other media files) for searching months or years later when searching titles alone won't differentiate each file enough. this may be unrealistic, wishful thinking, however. most important is massive document library searching.
  • Reply 5 of 12
    this is kind of funny- almost nothing ever matches what i need, but this Devonthink is seemingly right on:



    "Or are you a newsmaniac? <yes> A scientist? <yes> A software developer? <a wannabe, yes> Do you collect newsbits wherever you go? <hell yes> DEVONthink can store your text or code fragments with just one mouse click via its Mac OS X service, and help you working with them with its intelligent classification function, lightning-fast search and even faster concordance. <jeezus- 'intelligent classification function' seems to be just what i need> Developers can also store their documentations, help files and references and look up articles faster than ever before."



    if i can get some user reviews, i'm practically sold.
  • Reply 6 of 12
    staphbaby (Posts: 353, member)
    Quote:

    Originally posted by orchidstrat

    goodgod, man- Devonthink is quite possibly the answer here. have you used it much?



    No, although I've been tracking its progress with interest, because it's rapidly approaching the kind of usefulness I'm willing to pay for (and I'm rapidly approaching the amount of information where I might actually need it!)



    It's still very much in aggressive feature-adding mode at the moment; the Enterprise Edition won't be launched until later this year.



    Quote:

    to clarify a bit, i often read up to a dozen dense articles a day. sometimes months later, while working on one project or another (usually a co-authored document) i often recall having read this-or-that idea/figure/test from a document in my archive, but haven't a clue where to find it for re-checking and sourcing. i'll *know* I have it, but be unable to find it. i need a way to search within these documents without actually opening them. it seems like this would be impossible, but then again, i've said that many times about many things, and sure enough, there's almost always a WAY.



    the files i need searched are .doc, .rtf, .html, mostly, but it would be great if other media formats could be included in the search: .jpg (just the title would do), .rm (titles), .mov (titles), .wmv (titles), etc.



    Well, Devonthink supports all of these (or so says their supported file type and features page). As for metadata, they don't explicitly say anywhere... it's probably worth a feature suggestion to the developer tho'



    Quote:

    i don't have hundreds of movie/(other media) files to search through, but it's getting there. it would be great if i could search through the "comments" fields that many movie files have, or even be able to attach some sort of abstract to them (and other media files) for searching months or years later when searching titles alone won't differentiate each file enough. this may be unrealistic, wishful thinking, however. most important is massive document library searching.



    Well, there's a downloadable demo, so you may as well give it a try (let us know how it goes; it sounds like you'll give it a real crash test).



    Btw Torifile: the reason I found this is because of Devon's excellent freeware including AntiWordService (which allows Cocoa apps to open Word documents, albeit in a seriously unformatted state) and Word Service, which implements in the Services Menu about 40-odd useful little macros for reformatting text.
  • Reply 7 of 12
    staphbaby (Posts: 353, member)
    Well, having just had a little play, I told it to import my History Honours folder, with its 300-odd files (TIFFs, PDFs, RTFs and TXT files, as well as some FileMaker files and the odd Word document).



    It did whatever import does in about a minute (on my iBook 800), and from there it gave me pretty much instant search of the content of the 100-odd text/RTF files... this doesn't give much indication as to how it will scale, I know, but it's certainly promising.



    The built-in browser is also very very cool indeed. Hooray for WebKit.
  • Reply 8 of 12
    torifile (Posts: 4,024, member)
    Quote:

    Originally posted by staphbaby

    Btw Torifile: the reason I found this is because of Devon's excellent freeware including AntiWordService (which allows Cocoa apps to open Word documents, albeit in a seriously unformatted state) and Word Service, which implements in the Services Menu about 40-odd useful little macros for reformatting text.



    You know, all Cocoa apps get this for free with Panther, right? Not to take anything away from this app, I'm just pointing that out. I hope their education price is good. If so, I'm in.
  • Reply 9 of 12
    staphbaby (Posts: 353, member)
    Quote:

    Originally posted by torifile

    You know, all Cocoa apps get this for free with Panther, right? Not to take anything away from this app, I'm just pointing that out. I hope their education price is good. If so, I'm in.



    Well, the Word file import, yes, but it also works in 10.1 and 10.2, which is where I was using it, so for all those Jaguar users (are there any left at AI anyway?) it's still excellent anti-Microsoft goodness.



    The biggest problem with WordService was that it added so many items to the Service Menu that it brought out the Carbon service menu bug (the services menu would not show up in Carbon apps if it had more than a certain number of items). Having word counts in TextEdit is worth its weight in gold, though. Well, maybe not. But it is cool.



    I'm pretty impressed so far in my playing around, so I've just emailed them to check out the edu price tonight. If people care I'll post it once I've got it.
  • Reply 10 of 12
    alcimedes (Posts: 5,486, member)
    isn't this thing only $75? i mean, this isn't Photoshop pricing or anything up at $600. if it really does all the stuff it says it does, then anything less than $200 is a great deal.
  • Reply 11 of 12
    torifile (Posts: 4,024, member)
    devonthink is amazing. I don't know how it does what it does but WOW. I wish I had a little more control over keywords and attaching descriptions, but damn it's a nice program.



    I also found out about their ed pricing: not the greatest discount, only $5 off, but it's something.... That makes their personal edition $35. I'm seriously considering it... If you haven't tried this program and you have a lot of information to keep track of, give it a go.



    I really like the way they've leveraged all the good stuff in OS X: Quartz, QuickTime, WebKit, .doc reading. A very impressive show of the power of OS X. It should win some kind of Apple Design Award.



    staphbaby, thanks for the heads up.
  • Reply 12 of 12
    midwinter (Posts: 10,060, member)
    I've been using Devonthink for a while now, and it's pretty darned good. The only serious problems I have with it are these: 1) there is no easy way to sync the DT database between two computers, and 2) the interface is a little confusing at times.



    You might also want to try MacJournal, which is free.