Odyssey: Converting 1800+ CSV files to HTML via Excel 2004

Posted:
in Mac Software edited January 2014
Howdy,



I have 1800+ CSV contact files that I must convert to HTML files in Excel. The orginal source files vary widely in length and structure, but I have been able to structure them using TextWrangler and a few other programs.



To date, only by using Excel's conversion engine can I automatically get the superior consistency and error-tolerance I need to process what are poorly-formed and inconsistent CSV files into well-formed HTML output. With it, I can then will run each file through a really nifty program called Anthracite by Metafy in order to export the tables into arrays. These arrays will then import into Filemaker or some other database for final normalization.



But of course I have hit the limit of operator knowledge. I'm an analyst, not a programmer. I've been working on this stuff for weeks.The learning curve has been steep, not the least of which is due to the fact I've had to use a bunch of different new tools to get to where I'm at now. Upstream there was a bunch of preprocessing that had to happen just to convert my old files into this format and get rid of the unnecessary tags and other information, but that's another story. There may be other ways to do this, but I am focused on doing it this way, and have invested huge hours to get it 95% done. It's the final 5% that's killing me.



I have cobbled together a little Applescript that will automate the process of opening the CSV file in Excel 2004, then converting and saving it as an HTML file. It works beautifully, as simple as it is. However, the AppleScript as written will only convert one file at a time. I need a way to have this script loop through the chosen directory, choose each successive file, convert it, and go back to the next in line in the same folder. I'm sure its a simple fix, but I can't find any reference or example to help me chunk through this directory one file at a time. I've been scouring for reference for this but just can't fnd it.



Here's the entirety of my script, to date. I hope those knowledgeable folks out there will understand just getting to this point is harder than it looks for a non-scripter like me. And in fact, I got this far only by scouring news groups and picking up tips here and code left there by extremely talented and generous people whose prior work allowed me to finally write this script for Excel 2004 using OS X 10.4.7:





set the_file to "Users:mynameesktop:TESTfolder:Uniquecontactdata 1.csv"

tell application "Microsoft Excel"

\tactivate

\topen the_file

\tsave active workbook in the_file & "_new.html" as HTML

\tclose active workbook without saving

end tell



I need to revise the AppleScript in order to have it run through, alter, re-name and save-as all the files to HTML.



It may be important to note that I have 1800+files, sequentially numbered "1.csv" through "2200.csv". Because there are numbers missing from this sequence, my gut tells me I also need some type of "auto skip if file not present" function and/or auto error-handling.



Can anyone help me in this quest? Any help of any sort would greatfully appreciated.



Thanks !



Rob @ Decision Informatics LLC

Comments

  • Reply 1 of 8
    mr. memr. me Posts: 3,221member
    My question is that if you are using TextWrangler to reform all of your CSV files, why not do the whole job in TextWrangler?
  • Reply 2 of 8
    Mr Me



    Thank you for your reply. As it is now, I have 1500 files that are regularly formed, 200 that have a "lot in common" and about 100 that are just almost mush. I considered proceeding with TW, but the files lacked pervasive and strong regular expressions so that it was becoming really tedious to take them farther with TW. The files are also not regularly structured in terms of content, which is highly variable, which has created a bunch of complications in data structure. Someone more talented and experienced might be able to deal with this better but I am not that guy. Excel overlooks a lot of the oddities and variances of the input files, and outputs an html file that I can then convert easily to an array using tools I understand better, tweak some more in that form, then import into Filemaker.



    If I can get the repetitive seek file and process step in Applescript knocked, I'll be done with this project in an hour or two. Any suggestions?
  • Reply 3 of 8
    Quote:
    Originally Posted by DecisionInformaticsLLC


    If I can get the repetitive seek file and process step in Applescript knocked, I'll be done with this project in an hour or two. Any suggestions?



    So it sounds like you're just looking for a way to loop through the directory of files? If so, this might help ... this is an old Applescript I used to use to get a directory's contents into a list in Tex-Edit Plus, a text editor. Maybe you can use this format to have the Finder loop through the folder, and then put the Excel instructions inside the loop (replace the Tex-Edit Plus instructions I have with your own Excel instructions), using the loop variable name of theFile for the file name of course. Hope this helps!


    HTML Code:
    tell application "Finder"
    set theFolder to folder "your:foldername:here" --put your folder path in the quotes
    set theNumber to (the number of items in theFolder)
    repeat with i from 1 to theNumber
    set theFile to the name of item i of theFolder
    tell application "Tex-Edit Plus" -- change this to Excel
    make new paragraph at end of window 1 with data theFile -- do the Excel stuff here
    end tell
    end repeat
    end tell
  • Reply 4 of 8
    Quote:
    Originally Posted by naknek


    So it sounds like you're just looking for a way to loop through the directory of files? If so, this might help ... this is an old Applescript I used to use to get a directory's contents into a list in Tex-Edit Plus, a text editor. Maybe you can use this format to have the Finder loop through the folder, and then put the Excel instructions inside the loop (replace the Tex-Edit Plus instructions I have with your own Excel instructions), using the loop variable name of theFile for the file name of course. Hope this helps!


    HTML Code:
    tell application "Finder"
    set theFolder to folder "your:foldername:here" --put your folder path in the quotes
    set theNumber to (the number of items in theFolder)
    repeat with i from 1 to theNumber
    set theFile to the name of item i of theFolder
    tell application "Tex-Edit Plus" -- change this to Excel
    make new paragraph at end of window 1 with data theFile -- do the Excel stuff here
    end tell
    end repeat
    end tell




    Naknek, thank you for your very helpful post. I have gotten past the first couple of hurdles in the form of errors finding the folder. Now I've hit a snag with the Finder's inability to get file 1 of the total files in the folder. I've learned about several points that could impact this tho I am unsure of their relevance.



    1. It appears that the collective wisdom and recommendation is to be certain to enable the assistive devices checkbox in Universal Access preference window. Supposedly Applescript is not fully functional otherwise in areas requiring GUI scripting and others. Not sure how or if this impacts this script.

    2. I couldn't get the "set the_Folder to folder {folderpath}" to run initially. Had to put a line break between "folder" and {folderpath} to do so. Surprised at this req, but perhaps I'm just more clueless that I thought.

    3. Now I am getting an Applescript error that "can't get item 1 of 4" of total in the_Folder. My research so far indicates three possibilities:

    a. There are known problems using the term "item", not sure what these are yet.

    b. Another script example shows that there might be the need to set the_File as an alias, for reasons I don't yet understand.

    c. Yet another script shows some variance in whether to use "set doFile to the name of file bar of foo" OR "set doFile to file bar of foo". if you know what I mean.

    d. Finally, another script calls for files "without invisibles" which could directly impact whether item 1 is an invisible, or the first file I actually want to process.



    I'm a little overwhelmed, but game to try to figure this out. I'll check back later with an update.



    My code at present follows below.



    Thanks!



    Rob @ DecisionInformatics



    tell application "Finder"

    set the_Folder to folder

    "Users:iamtheuseresktop:Testcontactlists:" --put your folder path in the quotes

    set the_Number to 4

    repeat with I from 1 to the_Number

    set the_file to the name of item I of the_Number

    tell application "Microsoft Excel"

    activate

    open the_file

    save active workbook in the_file & "_new.html" as HTML

    close active workbook without saving

    end tell

    end repeat

    end tell
  • Reply 5 of 8
    You need to change this line:

    set the_file to the name of item I of the_Number



    to this:

    set the_file to the name of item I of the_Folder



    Also, you can literally use this line:

    set the_Number to (the number of items in the_Folder)



    You don't have to put in 4 or whatever, just put in that line exactly as it appears with the parentheses and all. That way I think the Finder will automatically calculate how many items in the folder. (At least it always worked that way for me, but I haven't used this script for a couple years so some of it might be outdated).
  • Reply 6 of 8
    Morning Naknek,



    Thank you for your continued patience and support on this.



    The transposed target was a glazed-eyes problem. Thanks for catching it. The other revs you suggested are noted below, as are comments regarding current error: "Can't get every item of folder." Have read in other groups that "item" is a problem word, but don't understand why or how to fix it. Concerned that the .DS_Store file in the folder may be gumming things up, and have tried to append "without invisibles" in appropriate places to no effect. Wondering if it needs a top-level file type filter so it only counts CSV but haven't tracked that down yet. Also wondering whether there is a difference between using "the number" versus "count."



    FWIW I am running OS X 10.4.7 with Applescript 1.10.7 Script Editor 2.1.1 (81) and Excel 2004 11.2.5 on a Powerbook G4 1.5ghz.



    This is the current script SSS.Naknek092206.scpt forevermore named in your honor:



    tell application "Finder"

    \tset the_Folder to folder

    \t"Users:iamtheuseresktop:testcopy2" --SE requires this on a separate line for some reason

    \tset the_Number to (the number of items in the_Folder) --the word "number" in the parenthetical clause becomes highlighted by a blue box and SE/AS returns "Applescript Error: Can't get every item of folder

    \trepeat with i from 1 to the_Number

    \t\tset theFile to the name of item i of the_Folder

    \t\ttell application "Microsoft Excel" --automatically fills in with this appname when ME2004 is open

    \t\t\tactivate

    \t\t\topen the_file

    \t\t\tsave active workbook in the_file & "_new.html" as HTML

    \t\t\tclose active workbook without saving

    \t\tend tell

    \tend repeat

    end tell





    Let me know what you think.



    Rob at DecisionInformatics (at) gmail (dot) com
  • Reply 7 of 8
    I'm not an AppleScript person but I think you might be able to do this with Automator.



    Try using the "Ask For Finder Items" and "Run AppleScript" actions.



    I think Office has some Automator actions too, so you might be able to do it without an AppleScript.
  • Reply 8 of 8
    Quote:
    Originally Posted by DecisionInformaticsLLC


    Morning Naknek,



    Concerned that the .DS_Store file in the folder may be gumming things up, and have tried to append "without invisibles" in appropriate places to no effect. Wondering if it needs a top-level file type filter so it only counts CSV but haven't tracked that down yet. Also wondering whether there is a difference between using "the number" versus "count."



    OK, I'm sure some of these errors are due to my old code (originally used this under OS 9). Try using "file" in place of "item" and also it sounds like you should skip the parenthetical count I was suggesting and just enter in the actual number like you did before.



    Hope this helps ... my Applescript knowledge is definitely rusty (haven't used it much since I started learning PHP a few years ago).
Sign In or Register to comment.