convert string to dataframe
I have a large text file, around 450 MB. I have read it in, and the outcome is a string.
    import pandas as pd
    import numpy as np
    import re

    def readInChunks(fileObj, chunkSize=2048):
        # Yield the file content in pieces of at most chunkSize characters.
        while True:
            data = fileObj.read(chunkSize)
            if not data:
                break
            yield data

    # Collect every chunk in a list.
    result = []
    f = open("textfile.txt")
    for chunk in readInChunks(f):
        result.append(chunk)
    f.close()
What I get is the file content as one big string; let's call it result.
result[0] starts like this:
Alin Deutsch, Mary F. Fernandez, 1998
Alin Deutsch, Daniela Florescu, 1998
Alin Deutsch, Alon Y. Levy, 1998
Now I want to convert this string to a DataFrame in the following way:
        c1            c2                 c3
    r1  Alin Deutsch  Mary F. Fernandez  1998
    r2  Alin Deutsch  Daniela Florescu   1998
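A minimal sketch of the conversion described above, assuming the text is comma-separated with a space after each comma, as in the sample rows. The variable name text is an assumption; with the code above it could be built with "".join(result), since a 2048-character chunk can end in the middle of a line. The column names c1, c2, c3 come from the desired output.

```python
import io

import pandas as pd

# Sample rows copied from the question; in practice this would be the
# joined file content, e.g. "".join(result).
text = (
    "Alin Deutsch, Mary F. Fernandez, 1998\n"
    "Alin Deutsch, Daniela Florescu, 1998\n"
    "Alin Deutsch, Alon Y. Levy, 1998\n"
)

# io.StringIO lets read_csv treat the string like a file.
# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=["c1", "c2", "c3"],
    skipinitialspace=True,
)

print(df)
```

This only shows the parsing step on a small string; it does not by itself address the memory use of a 450 MB file.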
python pandas
Have you tried the pandas read_csv() method to read the whole dataset?
– anotherone
Nov 9 at 11:08
Yes, but it consumes all the RAM and the system goes into a "not responding" state.
– Talha Anwar
Nov 9 at 11:10
With the chunksize option enabled? pandas.pydata.org/pandas-docs/stable/io.html#io-chunking
– anotherone
Nov 9 at 11:12
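A rough sketch of what the chunksize suggestion could look like. The temporary file stands in for the 450 MB file, and the column names and the tiny chunksize of 2 are assumptions chosen only to keep the example small.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the large file: a small temporary file with the sample rows.
sample = (
    "Alin Deutsch, Mary F. Fernandez, 1998\n"
    "Alin Deutsch, Daniela Florescu, 1998\n"
    "Alin Deutsch, Alon Y. Levy, 1998\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
reader = pd.read_csv(
    path,
    header=None,
    names=["c1", "c2", "c3"],
    skipinitialspace=True,
    chunksize=2,  # rows per chunk; a real run would use a much larger value
)

rows_per_chunk = []
for chunk in reader:
    # Filter or aggregate each chunk here so only the needed part stays in memory.
    rows_per_chunk.append(len(chunk))

os.remove(path)
print(rows_per_chunk)  # [2, 1]
```

Concatenating every chunk back into one DataFrame would bring the memory problem back, so this helps most when each chunk can be reduced before it is kept.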
Also see this answer: stackoverflow.com/questions/25962114/…
– anotherone
Nov 9 at 11:14
Thanks, all of you. I have tried your suggestions, but I get an error: "Error tokenizing data. C error: Expected 2 fields in line 118, saw 3". That led me to use read_fwf, but the problem is that it is not splitting the data into columns, even though I am passing a delimiter.
– Talha Anwar
Nov 9 at 12:26
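One possible reading of that error, offered as an assumption since the file itself is not shown: the first rows have two fields, so the parser expects two columns, and line 118 is the first row with three. Passing header=None together with an explicit list of three column names is a common workaround; shorter rows are then padded with NaN. The two-field row below is hypothetical data made up to reproduce the situation. read_fwf is intended for fixed-width columns, which may be why it did not split on the commas.

```python
import io

import pandas as pd

# Hypothetical data: the first row has two fields, the second has three.
text = (
    "Alin Deutsch, 1998\n"
    "Alin Deutsch, Mary F. Fernandez, 1998\n"
)

# Naming all three columns up front tells the parser how many fields to
# expect, so it no longer infers the column count from the first row.
df = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=["c1", "c2", "c3"],
    skipinitialspace=True,
)

print(df)
```

In the short row the year ends up in c2 and c3 is NaN, so such rows would still need to be cleaned up afterwards.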
asked Nov 9 at 11:05
Talha Anwar