Apply TfidfVectorizer in every row of dataframe that is a list of lists
I have a pandas dataframe with 2 columns, and I want to use sklearn's TfidfVectorizer for text classification on one of them. However, each cell in this column is a list of lists, and TfidfVectorizer expects raw text as input. In this question they provide a solution for a single list of lists, but how can I apply it to every row of my dataframe, where each row contains a list of lists? Thank you in advance.
Input:
0 [[this, is, the], [first, row], [of, dataframe]]
1 [[that, is, the], [second], [row, of, dataframe]]
2 [[etc], [etc, etc]]
Wanted Output:
0 ['this is the', 'first row', 'of dataframe']
1 ['that is the', 'second', 'row of dataframe']
2 ['etc', 'etc etc']
python list dataframe tfidfvectorizer
Could you add some sample input?
– Daniel Mesejo
Nov 8 at 10:06
I updated the question Daniel
– joasa
Nov 8 at 10:15
edited Nov 8 at 10:14
asked Nov 8 at 10:04
joasa
1 Answer
accepted
You could use apply:

    import pandas as pd

    df = pd.DataFrame(data=[[[['this', 'is', 'the'], ['first', 'row'], ['of', 'dataframe']]],
                            [[['that', 'is', 'the'], ['second'], ['row', 'of', 'dataframe']]]],
                      columns=['paragraphs'])

    # Join each inner list of tokens into a single string.
    df['result'] = df['paragraphs'].apply(lambda xs: [' '.join(x) for x in xs])
    print(df['result'])

Output

    0     [this is the, first row, of dataframe]
    1    [that is the, second, row of dataframe]
    Name: result, dtype: object

Further, if you want to apply the vectorizer in conjunction with the above function, you could do something like this:

    from sklearn.feature_extraction.text import TfidfVectorizer

    def vectorize(xs, vectorizer=TfidfVectorizer(min_df=1, stop_words="english")):
        # The vectorizer is refit on every call, so each row gets its own vocabulary.
        text = [' '.join(x) for x in xs]
        return vectorizer.fit_transform(text)

    df['vectors'] = df['paragraphs'].apply(vectorize)
    print(df['vectors'].values)
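Since the stated goal is text classification, it may be more convenient to end up with one feature matrix for the whole column rather than one matrix per cell. The following is only a sketch under that assumption: it treats each dataframe row as a single document, and the names `docs` and `X` are illustrative, not from the original post. It also uses the default TfidfVectorizer settings instead of `stop_words="english"`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({
    "paragraphs": [
        [["this", "is", "the"], ["first", "row"], ["of", "dataframe"]],
        [["that", "is", "the"], ["second"], ["row", "of", "dataframe"]],
    ]
})

# Assumption: one document per row, so flatten every inner list into one string.
docs = df["paragraphs"].apply(lambda xs: " ".join(word for x in xs for word in x))

# Fit a single vectorizer on all rows so they share one vocabulary.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix with one row per dataframe row

print(X.shape)
print(sorted(vectorizer.vocabulary_))
```

Because every row is encoded against the same vocabulary, `X` can be passed directly to a scikit-learn classifier together with the label column.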
Is this result normal?
[<10x17 sparse matrix of type '<class 'numpy.float64'>'
 with 19 stored elements in Compressed Sparse Row format>
 <644x855 sparse matrix of type '<class 'numpy.float64'>'
 with 3092 stored elements in Compressed Sparse Row format>
– joasa
Nov 8 at 10:40
@joasa That is because vectorizer.fit_transform returns a sparse matrix; by applying it to each cell you get a column of sparse matrices.
– Daniel Mesejo
Nov 8 at 10:43
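To see what one of those sparse matrices holds, it can be converted to a dense table. This is a minimal sketch on the first sample row only; the variable names (`sentences`, `matrix`, `terms`, `dense`) are illustrative, and it uses default TfidfVectorizer settings rather than those in the answer.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["this is the", "first row", "of dataframe"]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(sentences)  # sparse, one row per sentence

# vocabulary_ maps each term to its column index; order the terms by that index.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# toarray() materialises the sparse matrix as a regular NumPy array.
dense = pd.DataFrame(matrix.toarray(), columns=terms)
print(dense.round(2))
```

Converting to dense is fine for inspecting small examples, but for matrices the size of the 644x855 one in the comment above it is usually better to keep the sparse form.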
answered Nov 8 at 10:19
Daniel Mesejo