compare text file app - global
hi,
i just had a tab-delimited address list run through the postal service's address-check software.
what i received from the program was a 'new' list of known good addresses.
i need to know if there's a piece of software that can compare the two docs and then output which items do not exist in both files... with this info i'll be able to delete the 'bad' addresses from my database.
i've tried bbedit's compare doc function, but it looks like it compares the docs line number by line number and not globally..
at the very least, if the program said "found 'John' in file A 10x and 9x in file B", or "Found 'Peterson' in file A but not in file B".. i would have a good place to start.
thanks..
Comments
If they are in the same order, then there's an app that comes with Apple's developer tools called FileMerge that's quite good - although I've only ever used it for code...
Amorya
Originally posted by LGnome
at the very least, if the program said "found 'John' in file A 10x and 9x in file B", or "Found 'Peterson' in file A but not in file B".. i would have a good place to start.
Grep does this. Open up Terminal, cd to wherever the two files are, and type "grep John file1". It will spit out all lines in file1 in which 'John' is found.
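To get the side-by-side counts asked about above, grep's -c option prints a count instead of the lines themselves. Here is a minimal sketch, assuming two plain-text files; the helper name count_in_both is made up for illustration. Note that -c counts matching lines, not individual occurrences, and -F treats the word as plain text rather than a pattern.

```shell
# Hypothetical helper: report how many lines in each file contain a word.
# grep -c counts matching lines (not individual occurrences);
# -F treats the search word as a fixed string rather than a regex.
count_in_both() {
    word=$1
    file_a=$2
    file_b=$3
    # grep exits non-zero when nothing matches, but it still prints 0.
    count_a=$(grep -c -F -e "$word" "$file_a" || true)
    count_b=$(grep -c -F -e "$word" "$file_b" || true)
    printf "Found '%s' in %s %sx and in %s %sx\n" \
        "$word" "$file_a" "$count_a" "$file_b" "$count_b"
}
```

After pasting the function into the Terminal, something like "count_in_both John file1 file2" should print one summary line for that word.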
What's the exact format of your tab-delimited files? Each address on a different line, with components of the address separated by tabs? Like this?
Jane Doe<tab>123 Some Street<tab>SomeCity, NY 10001
John Smith<tab>345 Another Way<tab>AnotherCity, CA 99991
Sort function:
http://www.ncl.ac.uk/ucs/unix/unixhelp/sort.html
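Once both files are sorted, the standard comm utility can list the lines that appear in only one of them. The following is a rough sketch rather than a tested recipe for your data: the function name only_in_first is hypothetical, it assumes mktemp is available (it is on Mac OS X and Linux), and like any whole-line comparison it only treats lines as equal when they are identical byte for byte.

```shell
# Hypothetical helper: print lines that occur in the first file but not the second.
# comm needs both inputs sorted the same way, so both are sorted under LC_ALL=C.
only_in_first() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # sort -u also collapses duplicate lines within each file.
    LC_ALL=C sort -u "$1" > "$tmp_a"
    LC_ALL=C sort -u "$2" > "$tmp_b"
    # -2 hides lines unique to the second file, -3 hides lines common to both.
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Called as "only_in_first BeforeAddresses AfterAddresses", it should print the addresses that the checked list no longer contains.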
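#!/usr/bin/perl
use strict;
use warnings;

# Usage: comparelines.pl BEFORE AFTER OUTPUT
# Writes every line of BEFORE that has no exact match in AFTER to OUTPUT.
my ($inputfile1, $inputfile2, $outputfile) = @ARGV;
die "Usage: $0 before after output\n" unless defined $outputfile;

# Remember every line of the first file.
my %input1lines;
open(my $input1, '<', $inputfile1) or die "Can't open $inputfile1: $!\n";
while (<$input1>) {
    chomp;
    $input1lines{$_} = 0;
}
close $input1;

# Remember every line of the second file.
my %input2lines;
open(my $input2, '<', $inputfile2) or die "Can't open $inputfile2: $!\n";
while (<$input2>) {
    chomp;
    $input2lines{$_} = 0;
}
close $input2;

# Append each first-file line missing from the second file to the output,
# and echo it to the screen. Hash keys come back in no particular order.
open(my $output, '>>', $outputfile) or die "Can't open output file $outputfile: $!\n";
foreach (keys %input1lines) {
    unless (exists $input2lines{$_}) {
        print $output "$_\n";
        print STDOUT "No match for $_\n";
    }
}
close $output;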
First, copy the above code into a text file (in BBEdit, for example) and save it with some name. I called it "comparelines.pl". Next you have to make it executable. Open the Terminal, cd to the folder in which you saved the program, and type:
chmod u+x comparelines.pl
Finally, put your "before" and "after" address files into the same folder. Two points of caution: first, make sure all your files have Unix line endings, including the program - you can verify this in BBEdit under Save As/Options, or in SubEthaEdit under Format/Line Endings. Second, this will only match lines if they are exactly alike. So if the address-checking software rearranged the formats or expanded abbreviations, for example, you'd need a more complicated program to find the matches (probably by matching only names, instead of the whole address).
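If you would rather check line endings from the Terminal than from an editor, the standard tr utility can do it. This is only a sketch: the three helper names below are made up, and it assumes the files are plain text where a carriage return never appears except as part of a line ending.

```shell
# Hypothetical helpers for checking and normalising line endings.

# Print the number of carriage-return bytes in a file (0 means Unix endings).
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Convert Windows (CRLF) endings to Unix (LF) by dropping every CR.
crlf_to_lf() {
    tr -d '\r' < "$1"
}

# Convert classic Mac (CR-only) endings to Unix (LF).
cr_to_lf() {
    tr '\r' '\n' < "$1"
}
```

Both converters write to the screen, so redirect the result into a new file (for example "cr_to_lf BeforeAddresses > BeforeAddresses.unix") rather than back onto the file being read.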
You run the program by typing its name in the Terminal, along with three file names - the before and after files, and whatever name you want to give the output. Like so:
./comparelines.pl BeforeAddresses AfterAddresses OutputFile
That's it. It'll spit out on screen any non-matching lines, as well as writing them to the output file.
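If the address checker did reformat the addresses, matching on names alone might look something like the sketch below. It assumes the name is the first tab-separated field on each line, which is a guess about the file layout, and the function name unmatched_names is hypothetical. It uses awk rather than Perl only to keep it short.

```shell
# Hypothetical helper: print lines of the "before" file whose first tab-separated
# field (assumed to be the name) never appears as a first field in the "after" file.
unmatched_names() {
    before=$1
    after=$2
    awk -F '\t' '
        # While reading the "after" file (first argument), remember each name.
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        # While reading the "before" file, print lines whose name was never seen.
        !($1 in seen)
    ' "$after" "$before"
}
```

Run as "unmatched_names BeforeAddresses AfterAddresses". Two different people with the same name would count as a match, so the output is a starting point to review rather than a final delete list.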
I love Perl. "Make the easy things easy and the hard things possible." This is definitely an example of making the easy things easy.
Originally posted by Towel
I timed myself; it took eight minutes, and that included typing up sample data.
YIZERS!! ask and ye shall receive..
super cool.. well thanks for all the help.. i'm going to try this out Monday when i get back to work.. i'll let you all know the progress..