Mark Wardle 20:06:13

Announcing a new healthcare-related project... v0.1.54 - "interoperable outcomes research tools". A Clojure-based 'Swiss Army knife' library and command-line tool for the OMOP CDM (common data model). Many healthcare enterprises extract data from their operational clinical systems and aggregate it into a 'standard' CDM to permit downstream analytics. The OMOP CDM is one fairly widely used example, and includes a standardised vocabulary to help maintain the semantics of healthcare data across and between systems. The official OMOP tooling is a combination of R, markdown and some Java; it doesn't work with SQLite and isn't easy to use in non-R-based data pipelines. iort replaces this with an easy-to-use library and command-line interface, built in Clojure, and a future release will add an HTTP server. Still at an early stage of development, and feedback is very welcome.
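For a flavour of what the CDM looks like, here is a vastly simplified sketch of two OMOP-style tables built in SQLite from Python. This is illustrative only - the real CDM defines dozens of tables and far more columns, and this is not iort's code:

```python
import sqlite3

# A vastly simplified sketch of two OMOP CDM-style tables in SQLite.
# The real CDM has many more tables and columns than shown here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id            INTEGER PRIMARY KEY,
    gender_concept_id    INTEGER,
    year_of_birth        INTEGER
);
CREATE TABLE drug_exposure (
    drug_exposure_id         INTEGER PRIMARY KEY,
    person_id                INTEGER REFERENCES person(person_id),
    drug_concept_id          INTEGER,  -- a standard-vocabulary concept, not a local code
    drug_exposure_start_date TEXT
);
""")
```

The key idea is that `drug_concept_id` refers to the CDM's standardised vocabulary, so analytics downstream never see each source system's local drug codes.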

🎉 14

Sounds interesting. Do you know of any datasets in the mental health space in this format?

Mark Wardle 19:06:46

Thanks! No, I'm sorry, I don't know of any published datasets, although it is likely we will use it internally for this purpose.


You will? Would you mind sharing what you’re working on?

Mark Wardle 07:06:32

We have multiple PASs and multiple EPRs, so building a sane intermediary representation that analytics can work with is attractive. Once the basics are working, I can see us taking an extract from our ePMA system, for example, which will be working across our acute and mental health facilities, and so adding treatment information across our sites. The old approach has been for analysts to build their analyses against the fairly raw, untransformed data, which means you just end up reinforcing the operational silos that already exist. At the same time, I have a running specialist EPR I wrote in Java many years ago that I'm slowly re-engineering into Clojure piece-by-piece. People are always wanting data extracts for analytics, usually de-identified. I currently export zip files containing CSV files tailored to a specific use, but I want to use the same tooling (`iort`) to dynamically generate a SQLite-based OMOP CDM extract, combined with data from other sources downstream. It's not dissimilar to the approach taken by KCH and GSTT. I already have a fair number (~14) of Clojure-based tools for processing data, so I have the ability to transform as part of the wider ETL from a range of systems. That's the plan anyway.

Mark Wardle 07:06:53

I've done a fair amount of real-world health data work previously, so I know there are problems and pitfalls, but we don't do enough to monitor, improve, and spot issues with routine clinical interventions/outcomes. I recorded a podcast with Vaughn Vernon recently on this ...

Mark Wardle 07:06:48

Sadly there is a school of thought that the best way to do this is for everyone to use the same system (e.g. everyone installs Epic), but in my mind healthcare professionals - AND patients - need tools for thought, and tools for decision making, and most EPRs do some things well, but some things very badly, or not at all. Therefore, there is a need for iteration and improvement and disruption!


Wow, so you're a neurologist and a Clojure programmer! Sounds like you've got some great ideas about decision making. I'm going to be honest, this is an area that is very important to me. I had a severe, life-threatening reaction (akathisia) to a neuroleptic drug. When I went on Twitter to campaign about it and raise awareness, the idea that some drugs could be harmful to such a degree that their use should be massively reduced was repeatedly attacked by the mainstream. They liked to assert that my (and other patients') anecdotes were not evidence, and that the evidence supports use of the drugs without the consent of the patient. But the evidence mainly comes from pharma-funded RCTs... so how do you get good data? And who is going to do the analysis? I'd be interested in doing some Bayesian analysis on the use of neuroleptics and antidepressants etc., but I'd need population data and I don't have any research funding (I'm just a dev with schizophrenia), so it's a stalemate. Am I wrong in understanding that you're working in this space? I haven't listened to the podcast, but did I misunderstand, or is one of the things you're doing making anonymised records that could be used for analysis of the use of drugs (and other interventions) out in the general population?

Mark Wardle 09:06:53

Thanks Dan - no, these datasets wouldn't be published publicly, sorry. Ben Goldacre has done some work in this area with TREs - because the reality is that you can't de-identify when you know lots about a patient. You only need a handful of different sources of information to permit fairly easy re-identification when you release such datasets into the wild, unless you take very active steps to scrub or blur that data. Sadly, most of the treatments I use can have catastrophic side effects, and so it's a much more nuanced 'balancing' act, weighing up the potential pros and cons and understanding those in context; data and good information systems should improve that process and reduce the uncertainty. The advent of precision medicine, and better genetics, might also add some interest into the mix - e.g. I already avoid using drugs like azathioprine if I know a patient will not metabolise it well, through checking their TPMT levels... this will, I am sure, increase in the future. There are some general population datasets around - notably relating to prescribing - again, see Ben Goldacre's work.
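The re-identification point can be shown with a toy linkage sketch in Python (all data below is invented): a few quasi-identifiers shared between a 'de-identified' extract and a public dataset are enough to join the two.

```python
# Entirely invented data: a 'de-identified' clinical extract that still carries
# quasi-identifiers, and a hypothetical public dataset with names attached.
clinical = [
    {"postcode": "CF14", "birth_year": 1980, "sex": "F", "diagnosis": "MS"},
    {"postcode": "CF14", "birth_year": 1975, "sex": "M", "diagnosis": "epilepsy"},
]
public = [
    {"name": "A. Jones", "postcode": "CF14", "birth_year": 1980, "sex": "F"},
    {"name": "B. Evans", "postcode": "CF24", "birth_year": 1975, "sex": "M"},
]

def link(clinical, public):
    """Join the two datasets on the quasi-identifiers (postcode, birth_year, sex)."""
    index = {(p["postcode"], p["birth_year"], p["sex"]): p["name"] for p in public}
    matches = []
    for row in clinical:
        key = (row["postcode"], row["birth_year"], row["sex"])
        if key in index:
            # The 'de-identified' diagnosis is now attached to a name.
            matches.append({**row, "name": index[key]})
    return matches
```

With real data the quasi-identifiers are richer (dates of admission, rare diagnoses, small-area geography), which is why scrubbing or blurring - or keeping the data inside a TRE - is needed.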


" I already avoid using drugs like azathioprine if I know a patient will not metabolise it well through checking their TPMT levels” Oh my, that sounds encouraging. I’ve heared that a drug like Olanzapine (one of the neuroleptics that caused my ADR) is metabolized at a rate that can vary between individuals by a factor of 20 - but nothing is done to check that - it’s just given out routinely at standard doses. I thought nothing can be done…. What is TPMT?


Thanks for the link to OpenPrescribing - this sounds interesting - is this all aggregate data? Is it impossible (or too big) to use to investigate causal relationships? I get what you're saying about re-identification - what does that mean in practice: that only trusted people can work with that data, or that no one can?


@U013CFKNP2R Very cool, thanks for sharing. Will try to have a look 😀

👍 1
Mark Wardle 11:06:54

Hi @U051H1KL1 - see TPMT (thiopurine methyltransferase; http://gloshospitals.nhs.uk) - this kind of approach will be increasingly important in the future, tailoring therapies to the known genotype and other 'omics'. Generally for causal relationships we want as much data as possible... not sure there is 'too big' nowadays 🙂 You should think of trusted research environments (TREs) as smoke cupboards in which trusted people can work. E.g. there is SAIL in Wales, through which I can access pretty much all GP data, but de-identified. There are important limitations, but this work goes on all of the time - e.g. the EMA will raise an alert about the safety of a drug and then the drug company will usually fund work to check usage - we've done this recently to look at the safety and effectiveness of certain drugs used in multiple sclerosis. Drug safety is considered very carefully, and isn't a black-and-white 'this is safe, this is unsafe' issue, but instead nuanced. When there are rare diseases, or rare complications, you need big data and good systems to spot signals... but you have to remember the potential costs of not using drugs - hopefully as a clinician I use them appropriately and cause more benefit than harm, but I cannot always avoid the latter. Think of chemotherapy in cancer - we're using drugs (and other interventions like surgery, or procedures) with known risks. That's why medicine is hard and I had 18 years of training! But data and information systems should be helping us with shared decision making and better decisions.

Mark Wardle 11:06:48

The approach advocated by Ben Goldacre is to write analytics scripts and submit them to a TRE, so that the team can supervise the use of such sensitive data, and you simply get back the results rather than having access to the raw data in an unsupervised fashion. The SAIL databank in Wales uses a virtual desktop approach and very limited data ingress and egress, akin to working in a smoke cupboard. Things like differential privacy, and new approaches like homomorphic encryption - in which we can share encrypted data and your analysis code runs against that encrypted data without the need for decryption - may help.
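The smoke-cupboard model controls who can see the data; differential privacy instead controls what released outputs can leak, by adding calibrated noise to aggregate results. A minimal sketch in Python (illustrative only - the function, records and field names are invented, and real TREs use vetted implementations, not hand-rolled ones):

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Return a differentially private count: true count plus Laplace(1/epsilon) noise.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace noise via the inverse CDF: u ~ Uniform(-0.5, 0.5).
    u = random.random() - 0.5
    scale = 1.0 / epsilon
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical records; the field name "drug" is invented for illustration.
records = [{"drug": "olanzapine"}, {"drug": "sertraline"}, {"drug": "olanzapine"}]
noisy = dp_count(records, lambda r: r["drug"] == "olanzapine", epsilon=1.0)
```

Smaller `epsilon` means more noise and stronger privacy; the analyst only ever sees the noisy aggregate, never individual rows.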


Ah, that makes sense from a security perspective - you write the analytics script, it can be reviewed and checked that it's not doing anything suspect that might result in re-identification, and only the results of the script run are returned. Is that it?

👍 1
Mark Wardle 11:06:49

Yes, exactly. So going back to the original announcement... which is trying to work on a slightly different problem, in that we have healthcare data in multiple different systems, all in different formats, so to make any sense of that you need to do transformation and mapping - e.g. understanding drug codes etc. The OMOP CDM is a potentially useful schema for healthcare for such purposes, allowing you to build analytics steps downstream knowing that the hard work to do that standardisation has been done upfront... and not left for you to do badly and slightly differently to everyone else doing slightly similar, but slightly different, things!
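The mapping step being described can be sketched in Python (illustrative only: the source systems, local codes and concept ids below are all invented, not real OMOP vocabulary entries):

```python
# Toy illustration of source-code -> standard-concept mapping, the kind of
# normalisation a CDM pushes upstream of analytics. All codes/ids are invented.
LOCAL_TO_STANDARD = {
    ("system_a", "OLZ10"): 9001,       # hypothetical concept id: olanzapine 10mg
    ("system_b", "olanz-10mg"): 9001,  # a different local code, same standard concept
    ("system_a", "SRT50"): 9002,       # hypothetical concept id: sertraline 50mg
}

def standardise(source_system, local_code):
    """Map a (source system, local code) pair to one standard concept id, or None."""
    return LOCAL_TO_STANDARD.get((source_system, local_code))
```

Once every source system's codes resolve to the same standard concepts, a downstream query like "everyone exposed to olanzapine" is written once, against one id, regardless of where each record came from.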


Yep, that makes a lot of sense. The last thing you'd want as someone wanting to do analysis is to have to handle loads of slightly different representations from different systems in different formats, especially if everyone else needs to do exactly the same steps too and may interpret the records differently.

Mark Wardle 11:06:37

Yes. So OHDSI provide a bunch of R tooling... but I can bake this into the output from an operational clinical system, and build the schema in DuckDB, SQLite, or PostgreSQL from my CLI or in a pipeline more easily using this Clojure tooling... at least that's the hope. The OHDSI R tooling didn't work with SQLite for me, for example.

❤️ 1

btw, as a neurologist, what do you think of Stasha Gominack's work on the relationship between vitamin D, sleep, the gut microbiome and neurological disorders? Is she a quack?


apologies for hijacking your announcement thread 😬

❤️ 1
Mark Wardle 14:06:54

Never heard of her, and I'd fall back on looking at the published literature, and double-check conflicts of interest 🙂


Ha, conflicts of interest seem very hard to check well. I saw an article in the Guardian recently promoting antidepressants as "not being addictive", an opinion piece by Carmine something. No conflicts of interest reported. But then I dug into it and his research won a NARSAD scholarship, which was a prize linked to a weird-looking Brain and Behaviour org, which supposedly was funded by "two generous families" (I assume the Sacklers are one of them). How do you go about checking COI?


Is that a conflict of interest? It seems pretty obvious linking it to her


I thought conflicts of interest were those thorny, hidden-away things that you have to dig at? Am I wrong?

Mark Wardle 14:06:44

A conflict of interest simply means you are potentially or actually biased because you are conflicted... If I'm recommending you buy 'doodahs' and I benefit financially from you buying 'doodahs', then there's rather little doubt that I have a conflict of interest. Indeed, that's the most egregious type. More subtle conflicts arise from companies sponsoring professionals, or providing educational resources.


If I wanted to fund a piece of independent research (e.g. in the area of neuroleptic use) how would I go about it? How much would it cost me?


No worries, I have a lead on this now 🙂

👍 1
Mark Wardle 07:06:34

Lots of charities fund research and act as a go-between, with dedicated research commissioning staff - e.g. donations might fund a PhD.

Mark Wardle 07:06:50

But that person can't work alone, so will be supervised - the key is the wider team/dept/lead.

❤️ 1