https://clojurians.slack.com/archives/C8NUSGWG6/p1708715413365429
Hello, friends! I used to process data with Python Pandas but have been using TMD (and tablecloth) for a few months now. It has worked really well so far, without any problems. I would like to thank @chris441 and the scicloj community for creating such a wonderful library. Now I’m facing a very minor problem: “NA” is treated as nil when a dataset is saved to and read back from CSV. I want to read the original string as is, without it being changed to nil. For example, Pandas’ read_csv function has a keep_default_na parameter. If keep_default_na is False and na_values is not specified, no strings are parsed as NaN. Is there a similar option for TMD? I couldn’t find one.
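For reference, the Pandas behavior described above can be sketched as follows. This is only a minimal illustration: the inline CSV text and the column names `code` and `count` are made up for the example and do not come from the original dataset.

```python
import io

import pandas as pd

# Hypothetical sample data: the first column holds the literal string "NA".
CSV_TEXT = "code,count\nNA,1\nUS,2\n"

# Default behavior: "NA" is in Pandas' default list of NA markers,
# so it is parsed as a missing value (NaN).
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# With keep_default_na=False and no na_values given,
# no strings are parsed as NaN, so "NA" stays a plain string.
raw_df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

print(default_df["code"].isna().tolist())  # [True, False]
print(raw_df["code"].tolist())             # ['NA', 'US']
```

The second call is the behavior being asked for on the TMD side: the value read back is the same string that was written.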
Good question. Looks like the short answer is 'no': https://github.com/techascent/tech.ml.dataset/blob/daaf47037a597902897cbc947af23fe832cb3c6b/src/tech/v3/dataset/io/column_parsers.clj#L184 You may need to pre- or post-process the data for now. I threw together a quick patch: https://github.com/techascent/tech.ml.dataset/pull/399 It probably needs some thought before it lands, but at least it has a test in it. HTH
That patch is now released. The option is as yet undocumented but, as Harold noted, it is tested, and you can derive how to use it from the test.
Was/is there a reason this option is not respected by the fixed type parser as well? E.g. if we want to read in a CSV without promoting values, keeping all types as strings, while still keeping any NA values in the column(s)?
Wondering if it is worth opening a PR to support the option in the fixed type parser.
Responded in zulip: https://clojurians.zulipchat.com/#narrow/channel/236259-tech.2Eml.2Edataset.2Edev/topic/.3Adisable-na-as-missing.3F.20option.20for.20FixedTypeParser
👍