How to edit a **LARGE** data file ?
I have a large text file containing about 1.5 million of lines, tabulated into 8 columns of integers. I can't open this file in Excel or OpenOffice. I simply want to erase the first two columns and the last three ones. How can I do that ? I suspect I could do it with the Terminal app, using some UNIX commands, but I don't know which ones. Any idea ?
The data is typicaly of this type :
1 31 182 207782 251702 497161 155968 177129
2 37 242 179448 251159 456757 482649 51000
3 33 263 58414 183974 286273 394303 359364
4 150 435 62905 383035 150833 39433 65673
5 59 226 126530 17941 368627 127627 20542
6 71 298 113108 307803 62563 184296 540070
As I said, simply want to remove the first two columns, and the last three columns of data, keeping only the third, fourth and fifth columns. I can't do this in Excel since the limit is about 65000 lines or so. My file contains 1.5 million of lines !
The data is typicaly of this type :
1 31 182 207782 251702 497161 155968 177129
2 37 242 179448 251159 456757 482649 51000
3 33 263 58414 183974 286273 394303 359364
4 150 435 62905 383035 150833 39433 65673
5 59 226 126530 17941 368627 127627 20542
6 71 298 113108 307803 62563 184296 540070
As I said, simply want to remove the first two columns, and the last three columns of data, keeping only the third, fourth and fifth columns. I can't do this in Excel since the limit is about 65000 lines or so. My file contains 1.5 million of lines !
Comments
Hexfiend
http://ridiculousfish.com/hexfiend/
or 0xED
http://www.suavetech.com/0xed/0xed.html
You can use a hex editor to do this if it's just plain formatting.
Hexfiend
http://ridiculousfish.com/hexfiend/
or 0xED
http://www.suavetech.com/0xed/0xed.html
Hmmm, this is interesting. Thanks for the suggestion. But how do I use the HexEditor to remove the first and second columns, for example ? This isn't obvious to me.
Hmmm, this is interesting. Thanks for the suggestion. But how do I use the HexEditor to remove the first and second columns, for example ? This isn't obvious to me.
Ah I see what you're trying to do, The Hex editor won't be good for doing that. You can use a pretty basic script loop that will do this though:
import sys
import os
import string
if (len(sys.argv)<3):
>print "error"
>sys.exit()
filein = open(sys.argv[1],'r')
fileout = open(sys.argv[2],'w')
for line in filein:
>lnsplit = line.split(" ")
>output = " ".join(lnsplit[2:-3])
>fileout.write(output+"\
")
filein.close()
fileout.close()
Paste that into Textedit - you will need to indent the code correctly by replacing the > at the beginning of the lines with a tab. Save it as columns.py and don't append .txt on the end.
Open up the terminal and type:
python <drag columns.py in here> <drag data file here> <drag data file in again but change the name>
When you drag the files in, you will have something like:
python /downloads/columns.py /documents/data.txt /documents/data_modified.txt
Then hit return and wait for the output.
cat path_to_data | awk '{ print $3 "\" $4 "\" $5 }' > path_to_putput
Here is a one-liner that you could also use:
cat path_to_data | awk '{ print $3 "\" $4 "\" $5 }' > path_to_putput
OH GOD ! This worked like a charm ! Simply perfect ! This is what I was looking for. Thanks a LOT !
>man sed
I was going to call for sed
>man sed
You *sadist*!
3.2380553818938\t-0.1145133041870\t0.00386
3.2380553818938\t-0.1145133041870\t0.00386
3.2380553818938\t-0.1145133041870\t0.00386
3.2380553818938\t-0.1145133041870\t0.00386
3.2380553818938\t-0.1145133041870\t0.00386
Then can we compute sin(column1)*cos(column2) and make a new file with the output as a new column ?
How can we merge the data from two files with several columns and the same number of lines ?
Okay, this brings me to some other question with UNIX in the Terminal app. Is it possible to use the output file and compute some maths like sinus and cosinus ? For example, suppose the data looks like this (columns 1 and 2 are now angles in degrees) :
Then can we compute sin(column1)*cos(column2) and make a new file with the output as a new column ?
This too can be tackled by awk:
cat path_to_input | awk '{ print $1 "\" $2 "\" $3 "\" sin($1*atan2(0, -1)/180)*cos($2*atan2(0, -1)/180) }' > path_to_output
This is about as complicated as I would want to go with awk, it does extend further than even this. I would usually go with a small perl or python program at this point because it is easier to understand 3 months later.
This too can be tackled by awk:
cat path_to_input | awk '{ print $1 "\" $2 "\" $3 "\" sin($1*atan2(0, -1)/180)*cos($2*atan2(0, -1)/180) }' > path_to_output
Thanks a lot again !
And do you know how to add a column from another file, using awk ? I mean suppose you have a file "File1" with three columns, and want to add a fourth column from the data in column 6 of file "File2" ? Is this possible ? (I'm not talking about a simple concatenation of both files, since this operation wont add a fourth column).
Suppose we have a large data file containing three columns of numbers, and we want to keep one line of data after each three lines. How can we do that ? More explicitely, we have something like this :
12 32 45
11 33 55
22 21 25
45 43 67
48 43 56
33 47 78
32 45 98
32 47 88
21 23 67
... etc ...
and we want to keep these lines only :
12 32 45
45 43 67
32 45 98
... etc ...
Is there a way, using the Terminal app only ?