#data-science
2021-12-22
danielglauser 17:12:42

I've got a simple dataset of three columns: (A) a name, (B) a value, and (C) a count. The values (column B) come from a limited set, but the names (column A) are all over the place. There are about 30K rows in total, and we have values for 6K of them. I'd like to see if I can train a classifier to categorize the rest. If I were to try to build something, can anyone recommend a library to start with?
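
To make the setup concrete, here is a minimal sketch of the task as described: train on the rows that already have a value, then predict a value for the rest. It is not a library recommendation and uses only the Python standard library. The sample rows, the labels, and the function names (`char_ngrams`, `train`, `predict`) are hypothetical. Treating names as bags of character trigrams, and using the count column as a training weight, are assumptions rather than anything stated in the question.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=3):
    """Split a name into overlapping character n-grams (lowercased, space-padded)."""
    padded = f" {name.lower()} "
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


def train(rows):
    """Fit a multinomial naive Bayes model on (name, value, count) rows.

    Assumption: each row is weighted by its count column.
    """
    gram_counts = defaultdict(Counter)  # value -> n-gram -> weighted count
    label_counts = Counter()            # value -> weighted number of rows
    vocab = set()
    for name, value, count in rows:
        label_counts[value] += count
        for gram in char_ngrams(name):
            gram_counts[value][gram] += count
            vocab.add(gram)
    return gram_counts, label_counts, vocab


def predict(model, name):
    """Return the most likely value for a name, using add-one smoothing."""
    gram_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in label_counts.items():
        denom = sum(gram_counts[label].values()) + len(vocab)
        score = math.log(n_label / total)
        for gram in char_ngrams(name):
            score += math.log((gram_counts[label][gram] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical data: (A) name, (B) value or None if unknown, (C) count.
rows = [
    ("Acme Hardware Store", "retail", 12),
    ("Main Street Hardware", "retail", 7),
    ("Corner Grocery Store", "retail", 5),
    ("City Dental Clinic", "health", 9),
    ("Riverside Medical Clinic", "health", 4),
    ("Family Dental Care", "health", 6),
    ("Lakeside Hardware", None, 3),
    ("Northside Dental", None, 2),
]

labeled = [row for row in rows if row[1] is not None]
unlabeled = [row for row in rows if row[1] is None]

model = train(labeled)
predictions = {name: predict(model, name) for name, _, _ in unlabeled}
print(predictions)
```

With real data it would make sense to hold out part of the 6K labeled rows to check accuracy before trusting predictions on the remaining 24K.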