data-science

2024-02-23T05:35:22.549399Z

Hello, friends! I used to process data with Python Pandas but have been using TMD (and tablecloth) for a few months now, and it has worked really well so far without any problems. I would like to thank @chris441 and the scicloj community for creating such a wonderful library. Now I’m facing a very minor problem: “NA” is treated as nil when saving a dataset as CSV and reading it back. I want to read the original string as is, without changing it to nil. For example, Pandas’ read_csv function has a keep_default_na parameter: if keep_default_na is False and na_values is not specified, no strings will be parsed as NaN. Is there a similar option for TMD? I couldn’t find one.
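For reference, the Pandas behavior described above can be sketched roughly as follows. This is a minimal illustration, assuming pandas is installed; the sample CSV text and the column names are made up for the example and are not from any real dataset.

```python
import io

import pandas as pd

# Hypothetical sample data: the "code" column contains the literal string "NA".
csv_text = "name,code\nAlice,NA\nBob,US\n"

# Default behavior: "NA" is one of pandas' default missing-value markers,
# so it is parsed as NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False and no na_values given, no strings are
# parsed as NaN, so "NA" survives as an ordinary string.
literal_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default_df["code"].isna().tolist())
print(literal_df["code"].tolist())
```

The first read marks the "NA" cell as missing, while the second keeps it as the string "NA", which is the behavior being asked about for TMD.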

Harold 2024-02-23T17:16:35.419609Z

Good question. Looks like the short answer is 'no': https://github.com/techascent/tech.ml.dataset/blob/daaf47037a597902897cbc947af23fe832cb3c6b/src/tech/v3/dataset/io/column_parsers.clj#L184 You may need to pre- or post-process the data for now. I threw together a quick patch: https://github.com/techascent/tech.ml.dataset/pull/399 It probably needs some thought before it lands, but at least it has a test in it. hth

2024-02-24T01:45:04.820419Z

@hhausman Thank you for the reply and the lightning patch. 👍

chrisn 2024-02-24T15:04:50.445939Z

That patch is now released. The option is as yet undocumented, but as Harold noted, it is tested, and you can work out how to use it from the tests.

2025-07-16T14:52:41.028309Z

Was/is there a reason this option is not respected for the fixed type as well? E.g., if we want to read in a CSV without promoting values, keeping all types as strings, while still keeping any NA values in the column(s)?

2025-07-16T15:10:01.206159Z

Wondering if it is worth opening a PR to support the option on the fixed type parser.

2025-07-16T16:25:30.021269Z

👍