python - Document classification in Spark MLlib

- March 15, 2015

I want to classify documents according to whether they belong to sports, entertainment, or politics. I have created a bag of words whose output looks something like this:

(1, 'saurashtra') (1, 'saumyajit') (1, 'satyendra')

I want to implement the Naive Bayes algorithm for classification using Spark MLlib. My question is: how do I convert this output into something Naive Bayes can use as input for classification, such as an RDD? Or is there a trick to convert the HTML files directly into something that MLlib's Naive Bayes can use?

For text classification, you need to:

build a word dictionary
convert each document to a vector using the dictionary
label the document vectors:

doc_vec1 -> label1

doc_vec2 -> label2

...

This sample is pretty straightforward.
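As a rough illustration of the three steps listed above, here is a minimal pure-Python sketch that starts from (count, word) pairs shaped like the bag-of-words output in the question. The toy documents, the `label_index` mapping, and the helper `to_vector` are all hypothetical names made up for this example, not part of any library. It only shows the data preparation; it does not train a model.

```python
# Hypothetical toy corpus: each document is a label plus a bag of words
# given as (count, word) pairs, the same shape as the output in the question.
docs = [
    ("sports", [(2, "match"), (1, "team")]),
    ("politics", [(1, "election"), (1, "minister")]),
    ("entertainment", [(1, "film"), (1, "award"), (1, "team")]),
]

# Step 1: build the word dictionary, mapping each distinct word to a column index.
vocab = sorted({word for _, bag in docs for _, word in bag})
word_index = {word: i for i, word in enumerate(vocab)}

# Numeric class ids (assumed mapping; any consistent numbering would do).
label_index = {"sports": 0.0, "entertainment": 1.0, "politics": 2.0}

def to_vector(bag):
    """Step 2: turn one bag of (count, word) pairs into a dense count vector."""
    vec = [0.0] * len(word_index)
    for count, word in bag:
        if word in word_index:  # words outside the dictionary are ignored
            vec[word_index[word]] += float(count)
    return vec

# Step 3: pair each document vector with its numeric label.
labeled = [(label_index[label], to_vector(bag)) for label, bag in docs]

for label, vec in labeled:
    print(label, vec)
```

In Spark MLlib, each (label, vector) pair would typically be wrapped in a `LabeledPoint`, and the resulting RDD passed to `NaiveBayes.train`. For large vocabularies, a sparse vector or hashed features is usually a better fit than the dense list used here.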
