Tannin
Storage? I am Storage!
No, not the post, the program.
I need something to do a specific, quite well-defined task. Easiest to explain the "why" first, then I'll try the "what".
WHY
I take a lot of photographs, sometimes several thousand in a week. Some are taken in both raw and JPG, some JPG only. They all end up with a filename in the form:
010203-123456-XXXXX.XXX This represents:
2001
February
3rd
-
timestamp
-
(0 or more extra letters which differentiate between files with identical timestamps)
.
jpg or cr2 (= Canon raw file)
So it's a pretty straightforward naming scheme. There is a naming scheme for folders too, but that's not relevant here. For simplicity, I'll just pretend that the top-level folder is called HOME.
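As a rough illustration, the naming scheme above could be captured with a regular expression. This is only a sketch in Python: the names `NAME_RE` and `parse_name` are made up for the example, the timestamp is assumed to be six digits, and the hyphen before the extra letters is assumed to be absent when there are no extra letters.

```python
import re

# Assumed pattern: YYMMDD-HHMMSS, an optional "-letters" suffix, then .jpg or .cr2.
NAME_RE = re.compile(
    r"^(?P<date>\d{6})-(?P<time>\d{6})(?:-(?P<extra>[A-Za-z]*))?\.(?P<ext>jpg|cr2)$",
    re.IGNORECASE,
)


def parse_name(filename):
    """Return the parts of a photo filename as a dict, or None if it does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    parts = m.groupdict()
    parts["extra"] = parts["extra"] or ""  # no suffix at all -> empty string
    parts["ext"] = parts["ext"].lower()    # treat .JPG and .jpg alike
    return parts
```

Anything that does not fit the scheme comes back as `None`, which makes stray files easy to spot.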
OK, so I start with a lot of files. First thing I do is sort out the raw files and put them in their own sub-folder, HOME/RAW. Then I flick through the JPGs, sorting the good from the bad. I never delete anything (well, practically never). I find it more time-efficient to move stuff. This is a multiple-pass process. Let's say I start with 1000 JPGs in HOME/. (There will be maybe 500 matching raw files in the HOME/RAW folder as well. We can ignore them for now.)
I look at every image, moving those that are clearly the worst ones into HOME/X1. When I've got rid of ~50% of the files, I rename HOME/X1 to HOME/X9 - these are the files I will probably never want to access again - but I keep them because (a) storage is cheap, (b) sometimes I make mistakes and I can work faster if I know that a mistake can always be reversed if need be, and (c) because sometimes (not often) I want some information contained in an otherwise worthless photograph - for example, a road sign so I can find a particular place again, a different view of a bird or plant that I can't identify from the more artistically worthwhile shots, stuff like that.
I usually take a break at this point, and come back to this folder a few days or weeks later, when I can see the images afresh and make better decisions. Then I trawl all the way through again, once more moving the worst ~50% into HOME/X1. When I'm done, this is renamed to HOME/X8.
Lather, rinse and repeat. From a starting point with 1000 files in HOME, I end up with something like the file structure below:
HOME - 54 files
HOME/X5 - 20 files
HOME/X6 - 42 files
HOME/X7 - 104 files
HOME/X8 - 273 files
HOME/X9 - 507 files
HOME/RAW - 500 files
Does this sound complicated and cumbersome? Not really. It's pretty much second nature, and I can do it in my sleep - except for the human-being-level decisions about which images to keep and which to get rid of.
In reality, it's a bit more complicated than it appears above (though still second nature) because the file structure above is what ends up on my file server, while the sorting actually takes place on my Thinkpad, so there is some copying and moving and deleting between machines involved. (Essentially, it is file synchronisation, only backwards - I need to make the many-files folder the same as the few-files folder, where file synchronisation software is designed to do the reverse.) I do it by hand. No big deal, the sequence is:
1: Count the files in the top level folder on the server
2: Count the number of files in both levels on the Thinkpad (e.g., HOME and HOME/X6).
3: If the two counts are different, there is an error. Sort it out. Otherwise:
4: Delete all the files in the HOME folder on the server (leaving the sub-folders intact)
5: Move HOME/X6 to the server
6: Copy the files in HOME/ to the server
7: Count the files again to be sure there is no error.
Takes longer to write down like this than it does to actually do it. This part isn't my problem (though an automated way to do this would be worth considering, if anyone knows of one). And, in reality, I rather enjoy the few minutes it takes to do this bit. There is a real satisfaction in watching my laptop drive get a little bit emptier after several hours' hard work sorting images.
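For what it's worth, the seven manual steps could be sketched roughly as follows. This is a Python sketch under assumptions, not a tested tool: `push_pass` and `top_level_files` are hypothetical names, both HOME folders are assumed to be reachable as ordinary paths (e.g. a mapped network drive), and the reject folder is assumed not to exist on the server yet. Like the manual procedure, it compares counts only, not filenames.

```python
import shutil
from pathlib import Path


def top_level_files(folder):
    """List the plain files directly inside folder, ignoring sub-folders."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file())


def push_pass(laptop_home, server_home, new_folder):
    """Mirror one finished sorting pass from the laptop to the server.

    new_folder is the name of the freshly renamed reject folder, e.g. "X6".
    """
    laptop_home, server_home = Path(laptop_home), Path(server_home)
    keepers = top_level_files(laptop_home)
    rejects = top_level_files(laptop_home / new_folder)
    on_server = top_level_files(server_home)

    # Steps 1-3: the counts must agree before anything is deleted.
    if len(on_server) != len(keepers) + len(rejects):
        raise RuntimeError("file counts differ; sort it out by hand first")

    # Step 4: delete the server's top-level files, leaving sub-folders intact.
    for f in on_server:
        f.unlink()

    # Step 5: move the reject folder to the server.
    shutil.move(str(laptop_home / new_folder), str(server_home / new_folder))

    # Step 6: copy the keepers to the server.
    for f in keepers:
        shutil.copy2(f, server_home / f.name)

    # Step 7: count again to be sure there is no error.
    after = len(top_level_files(server_home)) + len(
        top_level_files(server_home / new_folder)
    )
    if after != len(keepers) + len(rejects):
        raise RuntimeError("file counts differ after the transfer")
    return after
```

Because step 4 deletes files, it would be sensible to try this on a scratch copy of a folder first.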
Now all this sorting takes place with the JPGs. The raw files are all archived off to the server right away, because I don't have the space to store them on the laptop. Mostly, when I want to do something with an image like post-process it and prepare it for further use, I work from the JPG. I only resort to the raw file when I get that rare combination of an image which is not good enough to use direct from JPG, but is good enough to use if I do the raw conversion myself and tweak it. Bad white balance is the most common reason, bad exposure less common - if it's too far out to use the JPG, the raw is unlikely to be presentation-quality anyway. But I do want particular raw files from time to time, so I keep them.
Now we get to the real problem. I would like to have the raw files sorted in exactly the same way as the JPGs. Remember, every raw file has an equivalent JPG, but not every JPG has a matching RAW. The program logic should be simple enough:
* Read filename from the HOME/RAW folder
* Look in the HOME/X9 folder for a *.JPG match. If found, move the raw file to HOME/RAW/X9
* Look in the HOME/X8 folder for *.JPG. If found ....
* etc
* repeat
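The logic above might look something like this. It is a minimal sketch in Python, offered as an illustration rather than a finished program: `sort_raws` is a hypothetical name, the sorted folders are assumed to be the sub-folders of HOME whose names start with "X", and filenames are matched case-insensitively on everything before the extension. Instead of searching each folder per raw file, it builds one lookup table of where every JPG lives.

```python
import shutil
from pathlib import Path


def sort_raws(home, dry_run=True):
    """Move each raw file in home/RAW into home/RAW/<Xn> when its JPG is in home/<Xn>.

    Returns a list of (raw_name, folder_name) pairs. With dry_run=True nothing is moved.
    """
    home = Path(home)
    raw_dir = home / "RAW"

    # Map lower-cased base name -> folder name, from the JPGs in each X folder.
    where = {}
    for folder in sorted(home.iterdir()):
        if folder.is_dir() and folder.name.upper().startswith("X"):
            for jpg in folder.iterdir():
                if jpg.is_file() and jpg.suffix.lower() == ".jpg":
                    where[jpg.stem.lower()] = folder.name

    moved = []
    for raw in sorted(raw_dir.iterdir()):
        if not raw.is_file() or raw.suffix.lower() != ".cr2":
            continue
        target = where.get(raw.stem.lower())
        if target is None:
            continue  # JPG is not in any X folder (still a keeper), so the raw stays put
        moved.append((raw.name, target))
        if not dry_run:
            dest = raw_dir / target
            dest.mkdir(exist_ok=True)
            shutil.move(str(raw), str(dest / raw.name))
    return moved
```

Calling `sort_raws("HOME")` only reports what would be moved; `sort_raws("HOME", dry_run=False)` actually moves the files.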
Is there an off-the-shelf program that can do this for me?
OR, can I figure out a way to write it myself? Bear in mind that the only programming language I'm current in is PHP. You can run PHP on Win XP (as opposed to on a web server) but I'd have to do a fair bit of messing about with the install/config/security setup, and it's using a screwdriver to hammer nails.
I do not want to have to learn any new skills!* If I'm going to write it in any language other than PHP, it has to be a language I can pick up and be productive in right away. I simply don't have time to learn something completely new. I used to be reasonably fluent in BASIC, Fortran, Pascal, Z-80 assembler, and Modula-2, but all those were a long, long time ago. C or any of the C-like languages are definitely out of the question! Has to be something I can pick up in a few hours.
So ..... any ideas?
* One exception: if the best answer turns out to be something essentially involving some fairly serious regular expressions and not too much else, I'll regard developing my (small) skills in Perl or POSIX regexes as a useful thing, and worth spending time on.