Apply TfidfVectorizer in every row of dataframe that is a list of lists

I have a pandas dataframe with 2 columns, and I want to use sklearn's TfidfVectorizer for text classification on one of them. However, each cell in this column is a list of lists, and TfidfVectorizer expects raw text as input. In this question they provide a solution for the case of a single list of lists, but how can I apply it to every row of my dataframe, where each row contains a list of lists? Thank you in advance.
Input:
0 [[this, is, the], [first, row], [of, dataframe]]
1 [[that, is, the], [second], [row, of, dataframe]]
2 [[etc], [etc, etc]]
Wanted Output:
0 ['this is the', 'first row', 'of dataframe']
1 ['that is the', 'second', 'row of dataframe']
2 ['etc', 'etc etc']
python list dataframe tfidfvectorizer
Could you add some sample input?
– Daniel Mesejo
Nov 8 at 10:06
I updated the question Daniel
– joasa
Nov 8 at 10:15
edited Nov 8 at 10:14
asked Nov 8 at 10:04


joasa
1 Answer
accepted
You could use apply:
import pandas as pd
df = pd.DataFrame(data=[[[['this', 'is', 'the'], ['first', 'row'], ['of', 'dataframe']]],
                        [[['that', 'is', 'the'], ['second'], ['row', 'of', 'dataframe']]]],
                  columns=['paragraphs'])
# Join each inner list of tokens into a single string
df['result'] = df['paragraphs'].apply(lambda xs: [' '.join(x) for x in xs])
print(df['result'])
Output
0 [this is the, first row, of dataframe]
1 [that is the, second, row of dataframe]
Name: result, dtype: object
Further, if you want to apply the vectorizer together with the above function, you could do something like this:
from sklearn.feature_extraction.text import TfidfVectorizer
def vectorize(xs, vectorizer=TfidfVectorizer(min_df=1, stop_words="english")):
    # Join the tokens of each inner list, then fit and transform this row's strings
    text = [' '.join(x) for x in xs]
    return vectorizer.fit_transform(text)
df['vectors'] = df['paragraphs'].apply(vectorize)
print(df['vectors'].values)
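To make it easier to see what each cell ends up holding, here is a minimal sketch that returns the fitted vocabulary alongside each row's matrix. The helper name `vectorize_row` is hypothetical, and the sketch assumes a fresh `TfidfVectorizer` with default settings per row (no stop-word filtering), which differs from the snippet above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({
    "paragraphs": [
        [["this", "is", "the"], ["first", "row"], ["of", "dataframe"]],
        [["that", "is", "the"], ["second"], ["row", "of", "dataframe"]],
    ]
})


def vectorize_row(token_lists):
    """Hypothetical helper: fit a fresh vectorizer on the strings of one row."""
    sentences = [" ".join(tokens) for tokens in token_lists]
    vectorizer = TfidfVectorizer()  # assumption: default settings, no stop-word list
    matrix = vectorizer.fit_transform(sentences)
    # vocabulary_ maps each term to its column index; keep just the sorted terms
    return matrix, sorted(vectorizer.vocabulary_)


results = df["paragraphs"].apply(vectorize_row)

for matrix, vocabulary in results:
    # Each row gets its own sparse matrix and its own vocabulary
    print(matrix.shape, vocabulary)
```

Because every row is fitted separately, the columns of one row's matrix do not line up with the columns of another's, so the matrices cannot be stacked into a single feature matrix as they are.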
Is this result normal?
[<10x17 sparse matrix of type '<class 'numpy.float64'>'
with 19 stored elements in Compressed Sparse Row format>
<644x855 sparse matrix of type '<class 'numpy.float64'>'
with 3092 stored elements in Compressed Sparse Row format>
– joasa
Nov 8 at 10:40
@joasa That is because vectorizer.fit_transform returns a sparse matrix; by applying it to each cell, you get a column of sparse matrices.
– Daniel Mesejo
Nov 8 at 10:43
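If a single feature matrix for the whole column is what the classifier needs, one possible alternative is to flatten each row into one document and fit a single vectorizer on all rows. This is only a sketch under that assumption (one dataframe row equals one sample); the names `documents` and `tfidf` are illustrative.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({
    "paragraphs": [
        [["this", "is", "the"], ["first", "row"], ["of", "dataframe"]],
        [["that", "is", "the"], ["second"], ["row", "of", "dataframe"]],
    ]
})

# Assumption: one dataframe row is one sample, so flatten the nested
# token lists of each row into a single space-separated string.
documents = df["paragraphs"].apply(
    lambda token_lists: " ".join(" ".join(tokens) for tokens in token_lists)
)

# Fit once on the whole column so every row shares the same vocabulary.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(documents)

# One sparse matrix: rows are dataframe rows, columns are vocabulary terms.
print(tfidf.shape)
```

The result here is one sparse matrix with a row per dataframe row, which can be passed directly to most scikit-learn estimators.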
answered Nov 8 at 10:19


Daniel Mesejo