New Apple document details Spotlight search queries

AppleInsider · November 3, 2004 9:42AM

Apple provides technical details of its forthcoming Spotlight technology in new public developer documentation.

Apple Computer on Tuesday published to its developer Web site an in-depth technical document on its forthcoming meta-data driven search technology. The document, titled "Working with Spotlight," covers the Spotlight Store—a file system-level database that holds all of the meta-data attributes about the files, as well as an index of their contents—and provides tips on examining meta-data.

More interesting, however, is a discussion of Spotlight search queries and their presence throughout the next-generation Mac operating system.

"The ability to create queries, and get a list of files in response to those queries, is what allows Spotlight to transcend the typical behavior of a file system and enables you to build a totally new category of applications," the document reads, in part.

"One of the best ways to find examples of complex queries is to use the Finder. Build a query using the Finder's Find feature and then save it. Then, navigate to the Saved Searches folder in your Home folder. You'll see the saved search as a Smart Folder. Get Info about the folder and you'll see the query nicely listed for you to examine."

According to the document, Tiger will also ship with several meta-data importers for a variety of common file formats as well as all the important file formats used by Apple's applications such as iTunes and the Address Book. A partial list of file formats includes: JPEG, PNG, TIFF, and GIF images; MP3 and AAC audio files; QuickTime movies; PDF files; Microsoft Word and Excel documents; iChat transcripts; email messages; Address Book contacts; and iCal calendar files.

Over this past weekend, Apple provided its developers with the first new build of Tiger since its World Wide Developers Conference in June.


buonrotto · November 3, 2004 10:28AM

Actually, one interesting tidbit that maybe I only just realized is this:

Quote:

Queries can be run either in one-shot mode (shown above) or as live queries that work with run loops. [emphasis mine] Live queries are useful when you have a need to monitor the file system over time. As new files are saved that match the query, your code can be called allowing you to act on the new information.

So apps don't require you to manually tell them what to look for. I imagine that you could have an app that, when you opened a file, for example, "Murphy Account," could look up any related documents (those with some shared metadata info) like spreadsheets, timesheets, expense reports, correspondence, etc. on the fly and keep an updated list of links or related articles to one side. So I guess this means that any app can act sort of like a smart folder in itself, for whatever that's worth.
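To make the one-shot versus live distinction concrete, here is a minimal toy model in Python. It is not Spotlight's API: `ToyStore`, `query_once`, `query_live`, and `save` are hypothetical names invented for illustration, and the "store" is just an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, str]
Predicate = Callable[[Metadata], bool]
Callback = Callable[[str], None]


@dataclass
class ToyStore:
    """In-memory stand-in for a metadata store (NOT the real Spotlight Store)."""

    files: Dict[str, Metadata] = field(default_factory=dict)
    watchers: List[Tuple[Predicate, Callback]] = field(default_factory=list)

    def query_once(self, predicate: Predicate) -> List[str]:
        """One-shot mode: return the paths that match right now."""
        return sorted(p for p, md in self.files.items() if predicate(md))

    def query_live(self, predicate: Predicate, callback: Callback) -> None:
        """Live mode: remember the query and call back on future matches."""
        self.watchers.append((predicate, callback))

    def save(self, path: str, metadata: Metadata) -> None:
        """Record a saved file and notify every live query it matches."""
        self.files[path] = metadata
        for predicate, callback in self.watchers:
            if predicate(metadata):
                callback(path)
```

In the "Murphy Account" scenario, an app would run `query_once` when the document opens to fill its sidebar, then register the same predicate with `query_live` so that files saved later are appended as they arrive.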

Also, it sounds like the metadata plugins for third parties will be kept in a folder in the root library. Does this mean an app would require an installer to put this in the right place, or can it be done in the background on first-run? That might be a security issue.

Finally: for you geeks, Spotlight works in the CLI too.

louzer · November 3, 2004 11:51AM

Quote:

Originally posted by BuonRotto

Also, it sounds like the metadata plugins for third parties will be kept in a folder in the root library. Does this mean an app would require an installer to put this in the right place, or can it be done in the background on first-run? That might be a security issue.

Actually, according to the article:

Quote:

Once the meta-data plug-in is built and has been tested, you make it available for Spotlight's use by putting it into one of the following directories:

~/Library/MDImporters

/Library/MDImporters

So it can go into your home folder as well.
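As a rough illustration of the two install locations quoted above, this Python sketch lists importer bundles found in them. The directory names come from the quoted document; the `.mdimporter` bundle suffix and the helper names are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def default_importer_dirs() -> List[Path]:
    """The per-user and system-wide directories named in the Apple document."""
    return [
        Path(os.path.expanduser("~/Library/MDImporters")),
        Path("/Library/MDImporters"),
    ]


def find_importers(directories: Optional[Iterable[Path]] = None,
                   suffix: str = ".mdimporter") -> List[Path]:
    """Return importer bundles in the given directories, skipping missing ones.

    The ".mdimporter" suffix is an assumption about how bundles are named.
    """
    if directories is None:
        directories = default_importer_dirs()
    found: List[Path] = []
    for directory in directories:
        if not directory.is_dir():
            continue  # e.g. the user has never installed a per-user importer
        found.extend(p for p in directory.iterdir() if p.suffix == suffix)
    return sorted(found)
```

Because the home-folder location needs no administrator rights, an app could in principle copy its importer there on first run rather than shipping an installer; whether that is wise is the security question raised above.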

Now, for what I'm sure is a stupid question. What would you search for in a graphic file like a JPG or PNG?

kickaha · November 3, 2004 12:05PM

How about keywords associated with that file by the user?

I worked on a project a couple of years ago to automatically track objects in video. One extension we planned but never did was to have automatic detection of 'interesting' objects such as human faces... I can see doing that with images as well, and automatically tagging photos with the people's names that appear in them.

buonrotto · November 3, 2004 12:28PM

EXIF data also.

But the next HUGE breakthrough in search will be with images and their visual content, as Kickaha said. We have face recognition technology now, and I can imagine how, with some training, this would be incredibly useful once it can be done economically and reliably.

booga · November 3, 2004 12:41PM

Quote:

Originally posted by Louzer

Now, for what I'm sure is a stupid question. What would you search for in a graphic file like a JPG or PNG?

For TIFF you can have huge numbers of tags to search for (thus the name Tagged Image File Format). PNG is based roughly around the same concept. JPG is more limited, but you could still presumably do searches for images of a certain size, bit depth, etc.

So a query like "show me all the black and white versions of those thumbnails" would be possible.

swhitty · November 3, 2004 9:56PM

i have tiger installed and think the search feature is useful... but i don't think there is a way to run queries from the command line. would be nice if the OS actually was built on a database and wasn't just a DB service running on top of the filesystem.

buonrotto · November 3, 2004 10:13PM

Quote:

Originally posted by swhitty

i have tiger installed and think the search feature is useful... but i don't think there is a way to run queries from the command line.

Maybe it's not fully finished in that area yet. Here's what the Apple document says about it:

Quote:

There's one more thing about Spotlight that should be mentioned. Since the core of Spotlight lives at the very lowest levels of the operating system, it is only natural that there are some command-line tools for power-users to work with file system meta-data and perform queries.

The first command is mdls. Just as the traditional Unix ls command will list all of the files in a directory, mdls will list all of the meta-data attributes for a file. [snip]

You can also run queries from the command line using the mdfind tool. [snip]

Not only are these command-line tools useful for the power-user, but they can also be put to good effect in a shell script.
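As a sketch of how the quoted tools could be scripted, the Python below builds an `mdfind` command line and runs it only if the tool is present. The `-onlyin` flag is the one found in the `mdfind` that ships with later Mac OS X releases; the pre-release build discussed here may differ, and the helper names are made up for the example.

```python
import shutil
import subprocess
from typing import List, Optional


def mdfind_command(query: str, only_in: Optional[str] = None) -> List[str]:
    """Build the argument list for an mdfind invocation without running it."""
    command = ["mdfind"]
    if only_in is not None:
        # Assumed flag: restricts the search to one directory tree.
        command += ["-onlyin", only_in]
    command.append(query)
    return command


def run_mdfind(query: str, only_in: Optional[str] = None) -> List[str]:
    """Run mdfind and return matching paths, or [] where the tool is missing."""
    if shutil.which("mdfind") is None:
        return []
    result = subprocess.run(
        mdfind_command(query, only_in),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

Passing the query as a single list element avoids shell quoting problems, which matters because query expressions contain quotes and operators.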

Quote:

would be nice if the OS actually was built on a database and wasn't just a DB service running on top of the filesystem.

You figure, while I assume that a filesystem-level DB would have better performance, the amount of development time required to get the thing running might preclude such a solution, especially if the performance gains are relatively trivial. Also, Apple had already done a ton of work with HFS, knew the filesystem and had extended it quite a bit already. I suppose that HFS had earned its stripes, so why take the risk of getting bogged down like what apparently has happened with Longhorn? Maybe adding this as a layer on top of the filesystem allows it to be more flexible and extensible, ultimately replaceable due to its modularity. I suspect the most important reasons were that they didn't have to reinvent the wheel in places, could divvy up the work better this way, and could get a product out faster -- all business/administrative reasons in the end.

swhitty · November 3, 2004 11:08PM

of course building it as a service that runs alongside HFS+ is the only viable option for the medium term, it's probably the quickest in the long term also - hell even WinFS is built upon NTFS.

but at least with Longhorn everything is wrapped up in .NET even the console...just adds the extra abstraction for developers. this approach is not for apple though with its UNIX roots.

WinFS and Spotlight are looking rather similar though...it's just how the surrounding OS uses it.

incidentally what DB runs spotlight?

buonrotto · November 4, 2004 8:10AM

I thought they rolled their own code to make the Spotlight Store DB. Could be wrong.

tim1724 · November 4, 2004 9:28PM

Quote:

Originally posted by swhitty

i have tiger installed and think the search feature is useful... but i don't think there is a way to run queries from the command line.

Actually, there is. It's a bit cumbersome to use (the search keys are really long to type) but someone could easily write a more usable wrapper around it. Note that Apple has posted quite a bit of information about it already; look at the "Command-line integration" section of http://developer.apple.com/macosx/tiger/spotlight.html, especially the "mdfind" command. (And if you have Tiger installed, then you can just look at the mdfind(1) man page.)

I've used it and it's pretty cool, although typing out queries sucks given the syntax. I need to write a wrapper with "find" syntax.
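A wrapper along those lines might start out like this Python sketch, which translates a tiny subset of find-style options into a query expression. It assumes the `kMDItemFSName` attribute, the `==` and `&&` operators, and the trailing `c` case-insensitivity modifier as documented for released versions of Spotlight; the function names and the escaping rule are illustrative only.

```python
from typing import List


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def find_to_query(args: List[str]) -> str:
    """Translate find-style arguments (-name, -iname) into a query string.

    Multiple tests are ANDed together, mirroring find's default behaviour.
    """
    clauses = []
    i = 0
    while i < len(args):
        option = args[i]
        if option not in ("-name", "-iname"):
            raise ValueError(f"unsupported option: {option}")
        if i + 1 >= len(args):
            raise ValueError(f"missing pattern after {option}")
        modifier = "c" if option == "-iname" else ""  # c = ignore case
        clauses.append(f"kMDItemFSName == {quote(args[i + 1])}{modifier}")
        i += 2
    return " && ".join(clauses)
```

For example, `find_to_query(["-iname", "*.pdf"])` yields `kMDItemFSName == "*.pdf"c`, which could then be handed to mdfind as a single argument.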

Quote:

would be nice if the OS actually was built on a database and wasn't just a DB service running on top of the filesystem.

That would be a huge architecture change, and would likely break a lot of UNIX stuff unless they put a huge amount of thought into how they do it. Maybe someday, but Apple has its hands full with a lot of other stuff in Tiger.

the article

dahacouk · November 5, 2004 11:26AM

There has been very little (OK absolutely none) discussion on adding metadata to a file. I hope we are not going to be limited to adding metadata according to EXIF, ID3, etc schemas. I want far more flexibility than that!

So what I want to know is how you add additional metadata tags to a file. So, for instance, for an image I want to add:

person = Fred

person = John

person = Jane

Because Fred, John and Jane all appear in the image.

You don't want to say:

keyword = Fred

keyword = John

keyword = Jane

As that is just not specific enough.

Please tell me there is an API or some method for doing this?

Cheers Daniel
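The difference between a dedicated "person" key and a generic "keyword" key can be shown with a small Python model of multi-valued attributes. This is purely a data-structure sketch: the key names, helper functions, and file names are hypothetical, and nothing here implies that Spotlight exposes such an API.

```python
from typing import Dict, List

Attributes = Dict[str, List[str]]


def add_value(attrs: Attributes, key: str, value: str) -> None:
    """Append a value to a multi-valued attribute, ignoring duplicates."""
    values = attrs.setdefault(key, [])
    if value not in values:
        values.append(value)


def matches(attrs: Attributes, key: str, value: str) -> bool:
    """True if the given key holds the given value for this file."""
    return value in attrs.get(key, [])


# A photo in which Fred, John and Jane appear.
photo: Attributes = {}
for name in ("Fred", "John", "Jane"):
    add_value(photo, "person", name)

# A memo that merely mentions Fred as a topic.
memo: Attributes = {}
add_value(memo, "keyword", "Fred")
```

A search for `person = Fred` matches only the photo, while a search for `keyword = Fred` matches only the memo, which is the specificity being asked for.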

amorya · November 5, 2004 12:34PM

Quote:

Originally posted by dahacouk

Please tell me there is an API or some method for doing this?

Cheers Daniel

I don't think it's possible. I may be wrong - I'd kinda like to do that as well.

Amorya

karl kuehn · November 5, 2004 2:12PM

Quote:

Originally posted by BuonRotto

You figure, while I assume that a filesystem-level DB would have better performance, the amount of development time required to get the thing running might preclude such a solution, especially if the performance gains are relatively trivial.

This is dead wrong. For sequential reads, or large reads, databases are always slower than filesystems. Always. Databases are superior only when it comes to searching for data that has been indexed... and to be effective the indexes usually add 2 to 3 times the storage requirements (i.e., anything that goes into a database typically takes up 3 to 4 times as much space as it would in a filesystem).

The rumors that the next version of Windows would have a filesystem based in a database were simply because people misunderstood the marketing-speak from Redmond. Initially their system was going to be much more tied to the meta-database and there was to be a client-server relationship of these databases to network servers... possibly even peer-to-peer, but all of the talk about that has disappeared under the rug as it became clear that they were going to be massively late and features started getting the axe.

Apple is doing the exact right thing: keep the existing filesystem and add a database of meta-information next to it. This is just a souped up version of the indexing that is already available in 10.3. The only big thing here is that they are allowing a plug-in system to mine the data out of different file types.

Quote:

Originally posted by swhitty

incidentally what DB runs spotlight?

The early word was that it would be based on SQLite, same as CoreData. This was the rumor after WWDC.

dahacouk · November 5, 2004 2:16PM

Quote:

Originally posted by Amorya

I don't think it's possible. I may be wrong - I'd kinda like to do that as well.

Amorya

Well, that just sucks (am I allowed to say that? I guess I'll find out...)

But, really, we've got to be able to add metadata to files other than just sticking to the de facto schemas.

Spotlight is a separate database and is not constrained to EXIF or ID3 or whatever, so all that developers need to do is build a kind of generalised interface for "Meta-data Importers".

Cheers Daniel

dahacouk · November 5, 2004 2:23PM

Quote:

http://developer.apple.com/macosx/tiger/spotlight.html

Notice that these keys are abstract rather than the name of a key in a particular format. This is because different file formats might express the same meta-data using different terms. The normalization of terms into a single namespace simplifies creating constrained searches. Tiger will ship with a large number of keys defined to handle a variety of meta-data types.

This idea of normalising the namespace is very interesting. Does that mean that developers creating "Meta-data Importers" will be restricted to using a pre-defined and unchangeable set of keys?

Is this going to be a bottleneck?

Can they define their own?

Is there a list somewhere?

Why am I so concerned? ;-)

Cheers Daniel
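The normalisation described in the quoted passage can be pictured as a per-format lookup table. The Python sketch below maps ID3v2 frame names and PDF document-info keys onto two abstract keys; the mapping itself is an assumption made for illustration and is not taken from Apple's importers, and values are kept as plain strings for simplicity.

```python
from typing import Dict

# Format-specific key -> abstract key. Illustrative mapping only.
KEY_MAP: Dict[str, Dict[str, str]] = {
    "id3": {"TIT2": "kMDItemTitle", "TPE1": "kMDItemAuthors"},
    "pdf": {"Title": "kMDItemTitle", "Author": "kMDItemAuthors"},
}


def normalize(file_format: str, raw: Dict[str, str]) -> Dict[str, str]:
    """Rename format-specific keys to abstract ones, dropping unknown keys."""
    mapping = KEY_MAP.get(file_format, {})
    return {mapping[key]: value for key, value in raw.items() if key in mapping}
```

With this table, an MP3 tagged `TPE1 = "Jane"` and a PDF with `Author = "Jane"` both end up with `kMDItemAuthors = "Jane"`, so a single constrained search covers both formats. Keys missing from the table are dropped here, which is exactly the bottleneck the questions above are worried about.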

kickaha · November 5, 2004 2:38PM

Quote:

Originally posted by Karl Kuehn

The rumors that the next version of Windows would have a filesystem based in a database were simply because people misunderstood the marketing-speak from Redmond.

Actually, the original plan was to *replace* the filesystem with an actual honest to god database. The DB would be the only way to access files, not just a way to access an index of them. Name/folder would be another set of metadata in the DB.

Then they said they wanted a DB on top of NTFS. Then they said they wanted a DB for local file searches. Now even that is in jeopardy of being cut.

karl kuehn · November 5, 2004 2:56PM

Quote:

Originally posted by Kickaha

Actually, the original plan was to *replace* the filesystem with an actual honest to god database. The DB would be the only way to access files, not just a way to access an index of them. Name/folder would be another set of metadata in the DB.

Then they said they wanted a DB on top of NTFS. Then they said they wanted a DB for local file searches. Now even that is in jeopardy of being cut.

I really don't think that this is the case. When the very first rumors of this whole "database filesystem" came out I did a lot of reading up on it. All of the ZDNet level reporting was talking about what you are talking about... or even something approaching Newton's file soup method. But every single tech-level document was talking about something very close to Spotlight... but with a network structure.

I really think that this whole thing is because someone in marketing at MS got the wrong idea and a bunch of buzzwords and the rumor got out of hand (this is not uncommon for companies run by marketing departments).

How do you argue around both the size and speed hit of putting everything into a database?

kickaha · November 5, 2004 3:10PM

Quote:

Originally posted by Karl Kuehn

I really don't think that this is the case. When the very first rumors of this whole "database filesystem" came out I did a lot of reading up on it. All of the ZDNet level reporting was talking about what you are talking about... or even something approaching Newton's file soup method. But every single tech-level document was talking about something very close to Spotlight... but with a network structure.

I really think that this whole thing is because someone in marketing at MS got the wrong idea and a bunch of buzzwords and the rumor got out of hand (this is not uncommon for companies run by marketing departments).

How do you argue around both the size and speed hit of putting everything into a database?

That was the question everyone was asking MS...

Remember, WinFS dates back conceptually to Cairo, oh, around '93 or so. It wasn't until '98/99 that they realized that a full database FS just wasn't going to be really feasible yet.

mpmoriarty · November 5, 2004 10:16PM

Well someone told me that you'll be able to enter data into the "comments" field in a file's Get Info window. Although this isn't exactly like defining a new type of meta data tag for your system, at least you'll be able to add file specific information to your files.

But come on let's get serious. Apple is trying to create a system that is user friendly and will appeal to the largest audience. Not everybody wants to be able to create custom meta data tags. Even more so, those that want it probably wouldn't use it. Plus think of all the time you would have to spend entering all this meta data into your files.

It's often better to create a system that works with common types of meta data that all files use and is able to be extracted automatically.

This is sorta like iTunes in a way. iTunes has all the types of meta data tags already defined and set up. How often do you need to create different types of tags for your music in iTunes? Usually, if you need to add any additional information to organize your music, you add it in the comments tag.

Also here's a question referring to something somebody said about the Newton. I never saw or used one so I was wondering...

What do you mean about the Newton having a "file soup" file system?

Mike

tuttle · November 5, 2004 10:24PM

"What do you mean about the Newton having a "file soup" file system?"

http://en.wikipedia.org/wiki/Apple_Newton

"Data in Newton was stored in object-oriented databases known as soups. One of the revolutionary aspects of Newton was that soups were available to all programs; and programs could operate cross-soup; meaning that the calendar could refer to names in the address book; a note in the notepad could be converted to an appointment, and so forth; and the soups could be programmer-extended - a new address book enhancement could be built on the data from the existing address book."

I never did any Newton coding, so maybe someone with experience can give a better answer.
