python - Classification fit raises ValueError: setting an array element with a sequence

Question:

I want to predict whether a user clicks on a link or not, using logistic regression. I have a lot of data to start with. With 23 examples I do not get this exception, but when I try 3 million examples I do.

The following is my code, adapted from the example on the scikit-learn website:

data = [line.strip() for line in open('dataforSVM.txt')]
pod=[];
listData=[];
y=[];
for i in range(0,len(data)):
    splitData=data[i].split(',' );
    tempPod=[];
    for j in range(0,len(splitData)-1):
        if isFloat(splitData[j]):
            tempPod.append(float(splitData[j]));
    y.append(float(splitData[j]));
    pod.append(tempPod)
X=pod;
Y=y;
h = .02 # step size in the mesh
logreg = linear_model.LogisticRegression(C=1.0, class_weight='auto', dual=False, fit_intercept=True,
                                         intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
Z=logreg.predict(X)
print Z
acc = accuracy_score(Y, Z)
print acc

I get this error:

Traceback (most recent call last):
  File "D:/Users/jures/Desktop/logisticRegression.py", line 45, in <module>
    logreg.fit(X, Y)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 668, in fit
    X = atleast2d_or_csr(X, dtype=np.float64, order="C")
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 134, in atleast2d_or_csr
    "tocsr", force_all_finite)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 111, in _atleast2d_or_sparse
    force_all_finite=force_all_finite)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 91, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

Answer:

Your problem can be reproduced by using the following content for your data file:

1,1,0
A,3,1
5,5,0

Because of the if isFloat(splitData[j]) check, any value that is not a float is silently left out of X. You therefore end up with a 2D list pod in which some rows have fewer entries than others, and such a ragged list cannot be converted into a float array, which results in the error. You should clean up your data and then get rid of that if.
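To make the ragged rows visible, here is a minimal sketch that mirrors the feature-building loop from the question on the three-line sample above. The question does not show its isFloat helper, so is_float below is an assumed stand-in, and build_features is a hypothetical name used only for illustration.

```python
def is_float(text):
    """Assumed stand-in for the question's isFloat helper (not shown there)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def build_features(lines):
    """Mirror the question's loop: keep only the float-like fields of each row."""
    pod = []
    for line in lines:
        split_data = line.strip().split(',')
        temp_pod = []
        # Same range as the question: every field except the last one.
        for j in range(0, len(split_data) - 1):
            if is_float(split_data[j]):
                temp_pod.append(float(split_data[j]))
        pod.append(temp_pod)
    return pod


sample = ["1,1,0", "A,3,1", "5,5,0"]
pod = build_features(sample)
row_lengths = [len(row) for row in pod]
print(pod)          # [[1.0, 1.0], [3.0], [5.0, 5.0]]
print(row_lengths)  # [2, 1, 2]
```

The middle row ends up with one feature instead of two, and that mismatch is what the array conversion inside fit cannot handle.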

Furthermore, your y looks wrong to me. With y.append(float(splitData[j])); you use the last value that j took in your for loop. That loop does not stop at the last element of the row, though, but at the second-to-last element. So the last element in each of your data rows (which is usually the label) is discarded, and the second-to-last value is stored as the label instead. You probably want j+1 there.
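One way to put both suggestions together is sketched below, under some assumptions: the last field of every row is the label, every valid row has the same number of fields as the first valid row, and rows that do not parse are skipped whole rather than partially kept. The name load_rows and the skip-and-report policy are illustrative choices, not something the question prescribes.

```python
def load_rows(lines):
    """Split each row into features (all but the last field) and a label (last field).

    Rows containing a non-numeric field, or whose field count differs from the
    first valid row, are skipped entirely so that X stays rectangular.
    Returns (X, y, skipped), where skipped holds 1-based line numbers.
    """
    X, y, skipped = [], [], []
    width = None
    for line_number, line in enumerate(lines, start=1):
        fields = line.strip().split(',')
        try:
            values = [float(field) for field in fields]
        except ValueError:
            skipped.append(line_number)
            continue
        if width is None:
            width = len(values)
        if len(values) != width:
            skipped.append(line_number)
            continue
        X.append(values[:-1])   # features
        y.append(values[-1])    # label is the last field, not the second-to-last
    return X, y, skipped


sample = ["1,1,0", "A,3,1", "5,5,0"]
X, y, skipped = load_rows(sample)
print(X)        # [[1.0, 1.0], [5.0, 5.0]]
print(y)        # [0.0, 0.0]
print(skipped)  # [2]
```

With X and y built this way, every row has the same length, so they can be passed to logreg.fit(X, y) as in the traceback. Whether skipping bad rows is acceptable depends on the data; inspecting the skipped line numbers first is probably wise.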
