Hi everybody, I have a question. I usually use curl and wget to download a single file, or files whose names are consecutive numbers, for example 1.jpg, 2.jpg, etc.
But how can I download all the .jpg files at once?
Comments
Note: the following uses shell brace expansion; it works in bash and zsh, though a plain POSIX sh may not support it.
wget 'http://most-of-the-url-goes-here/'{1,2,3,4,5,6,7,8,9}.jpg
That will be the equivalent of doing:
wget 'http://most-of-the-url-goes-here/'1.jpg 'http://most-of-the-url-goes-here/'2.jpg [etcetera]
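You can sanity-check the expansion with echo before letting wget loose; it prints the URLs the shell will generate without downloading anything:

```shell
# Print the brace-expanded URLs; the URL here is just a placeholder.
echo 'http://most-of-the-url-goes-here/'{1,2,3}.jpg
```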
I put the URL part in quotes because sometimes there are question marks or other characters in the URL which the shell wants to treat specially, but shouldn't. I tend to throw --continue on there too. For curl it'll be mostly the same, except you'll probably want to put '-O' before the http part (with unquoted whitespace between them).
Alternatively, you could do this:
for f in 0 1 2 3 4 5 6 7 8 9; do wget 'http://most-of-the-url-goes-here/'"$f".jpg; done
That will execute several independent wget calls (each after the previous finishes). This will be unnoticeably slower than the first version, because wget will keep the connection alive in the first version but will close the connection between each grab in the second.
Also noteworthy: the ranges I put in ({1,2,3,4...} and '0 1 2 3 4...') aren't restricted to numbers. You can put any string in there, but if a string contains spaces you should quote it, e.g.:
wget 'http://url/'{1,"this is my string",2,"lesserString"}.jpg
Originally posted by nico
Thanks for the reply, but my problem is that the file names aren't numbers; they are "names", made of letters. All the files begin with the letter "a", though.
So, what can I do?
Bye
Nico
It shouldn't be too hard to write a script to do this, but I don't have time to do it at the moment. If you're still having trouble with this on Thursday (June 10th) night, let me know and I'll give the script thing a shot for you.
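For what it's worth, here's a rough sketch of what such a script could look like: scrape the .jpg links out of some page that links to the images, then hand the list to wget -i. The sample page and the /tmp paths below are made up for the demo; the real input would be whatever page on the site actually links to the pictures.

```shell
# Made-up sample page standing in for a real page that links to the images.
cat > /tmp/page.html <<'EOF'
<a href="a-first.jpg">x</a> <a href="a-second.jpg">y</a> <a href="other.png">z</a>
EOF

# Extract the .jpg hrefs, strip the href="..." wrapper, and prepend the
# directory URL so the list contains absolute URLs.
grep -o 'href="[^"]*\.jpg"' /tmp/page.html \
  | sed 's/^href="//; s/"$//' \
  | sed 's|^|http://www.fantagiochi.com/xIm/DVD/a/|' > /tmp/list.txt

cat /tmp/list.txt
# To actually download the list:  wget -i /tmp/list.txt
```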
curl can do ranges on its own; the #1 in --output is replaced by whatever the [a-z] pattern matched (quoted so the shell doesn't touch it):
curl --output ~/Documents/Downloads/'#1.jpg' 'http://my.favorite.porn.site/pics/[a-z].jpg'
Sorry, but that doesn't work. Just so you know, the site is http://www.fantagiochi.com/xIm/DVD/a/ and in the directory "a" there are a lot of jpgs. If I try to list the directory, Safari returns the error: "Directory Listing Denied. This Virtual Directory does not allow contents to be listed." So I assume that browsing the directory isn't allowed. But if I try the direct link http://www.fantagiochi.com/xIm/DVD/a...ondo-front.jpg it works.
Thanks
Nico
Originally posted by nico
Sorry, but that doesn't work. Just so you know, the site is http://www.fantagiochi.com/xIm/DVD/a/ and in the directory "a" there are a lot of jpgs. If I try to list the directory, Safari returns the error: "Directory Listing Denied. This Virtual Directory does not allow contents to be listed." So I assume that browsing the directory isn't allowed. But if I try the direct link http://www.fantagiochi.com/xIm/DVD/a...ondo-front.jpg it works.
thanks
Nico
If you do not know the names of the jpg files that you wish to retrieve, it will not be possible to fetch them directly.
Or am I missing something here?
If there are links to the images from html pages then you could use a tool like Interarchy to mirror the whole site...
Yeah, there's no way to just say "Gimme everything in this directory, now! GIMME!" You have to know the individual file URLs ahead of time. Since they've disallowed your directory listing, your *only* recourse is to grab an app like SiteSucker, point it at a known webpage that you think will lead you to all the files you want, and let it crawl the site.
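wget can do that kind of crawl too, if its recursive options suit you better than a GUI app. This is only a sketch (echoed rather than run, so nothing is downloaded), and it assumes some page under the site root actually links to the images:

```shell
# -r crawls linked pages, -l 2 limits the depth, -np refuses to ascend above
# the start URL, -nd drops the directory structure, and -A keeps only files
# whose names match the pattern. Remove the leading echo to actually run it.
echo wget -r -l 2 -np -nd -A 'a*.jpg' 'http://www.fantagiochi.com/'
```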