Machine Learning in Clojure with libpython‑clj: Unlocking Causal Insights Using Microsoft’s EconML [Series 3] - Flexiana

Jiri Knesl

Posted on 22nd December 2025


“Beyond A/B testing: causal inference meets functional programming.”

In the first two parts of this series, we kept our core in Clojure while still using the best of Python.

╰┈➤ Series 1: In the first part, we explored how libpython-clj lets you use Python’s machine learning libraries right from the JVM, without needing to jump between languages.

╰┈➤ Series 2: In the second part, we explored Bayesian Networks. They are great for building models that actually make sense to humans, not just machines. That is a big deal in fields like healthcare or finance, where you cannot rely on black-box answers. Clarity is important. 

In Series 3, we focus on causal inference using EconML from Microsoft Research.

╰┈➤ Predictive models answer “what might happen?”

╰┈➤ Causal inference asks “why did it happen?” and “what if we change the action?” 

That difference is essential when you need decisions you can trust, not just guesswork.

A simple A/B test might say a campaign increased conversions by 5%. EconML goes a layer deeper. It might show that students saw a 20% increase, while retirees saw no change. So you do not get just an average. You get heterogeneous treatment effects across segments. That is what you need to act with confidence.

If you work in Clojure, the process does not really change. You write the same clean and functional code. When you need Python’s causal tools, just use libpython-clj. You run your models, whether they use observational or experimental data. Then send the results right back into your JVM apps, without leaving Clojure.

Where this approach really shines:

╰┈➤ Dynamic pricing: Set prices by segment, not with a one-size-fits-all approach.

╰┈➤ Marketing: Focus on people who actually benefit, skip the rest.

╰┈➤ Healthcare: See how treatments work for different patient groups.

╰┈➤ Policy: Compare what works and break it down by demographic.

No need to make big promises. You get smarter decisions with models that separate signals from noise and show what’s making a difference for different people.

An infographic that shows “Prediction: Discount → +5% sales” vs “Causality: Students +20%, retirees no effect.”

🔗 Internal links:

[Series 1: ML in Clojure with libpython‑clj]

[Series 2: Bayesian Networks with libpython‑clj]

🌐 External link: Microsoft Research EconML – This page provides the official library overview with docs, papers, and examples.

The Problem with Traditional ML

Most traditional ML models are built on the idea that:

╰┈➤ One pattern fits the whole population

╰┈➤ One prediction applies to everyone

╰┈➤ One model captures the “average” relationship

This is the homogeneity assumption. It works only when people behave similarly, which they do not.

A/B testing compares two groups and reports one average result.

But that average hides:

╰┈➤ Who loved it

╰┈➤ Who didn’t care

╰┈➤ Who reacted negatively

This variation is called heterogeneous treatment effects, and it is the same blind spot traditional ML has.

“Traditional ML and A/B testing both look at averages. But averages hide differences. People don’t respond the same way, so relying on one number leads to decisions that don’t match what your audience actually needs.”

What should we monitor?

╰┈➤ Segment differences: Don’t stop at the average. Break results down by audience groups to see who benefits and who doesn’t.

╰┈➤ Adverse effects: A “winning” variant can still hurt certain groups. Look for where performance drops. 

╰┈➤ Context matters: Timing, demographics, past behavior, and geography all shape how people respond.

The Contrast Explained:

| Metric | Overall Impact | Key Takeaway |
| --- | --- | --- |
| Aggregate Lift (+5% overall) | Shows mild success, but hides differences between groups. | The average view is misleading; do not assume one strategy fits all. |
| Segmented Lift: Students (+20%) | Robust response. | Action: Invest more in promotions for this group. |
| Segmented Lift: Bargain Hunters (+12%) | Strong positive effect. | Action: Keep a moderate investment and try new offers. |
| Segmented Lift: Retirees (0%) | No impact. | Action: Stop spending here and move the budget elsewhere. |
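
To make the contrast concrete, here is a minimal Python sketch of how a +5% aggregate lift can coexist with the segment figures in the table. The counts are hypothetical, chosen only so that the lifts (read as percentage-point differences in conversion rate) match the table; the names `COUNTS`, `lift`, `segment_lifts`, and `aggregate_lift` are illustrative and not part of EconML.

```python
# Hypothetical counts per segment, invented for illustration only:
# (control_n, control_conversions, treated_n, treated_conversions)
COUNTS = {
    "students": (100, 10, 100, 30),
    "bargain_hunters": (250, 25, 250, 55),
    "retirees": (650, 65, 650, 65),
}


def lift(control_n, control_conv, treated_n, treated_conv):
    """Difference in conversion rate (treated minus control), in percentage points."""
    return 100.0 * (treated_conv / treated_n - control_conv / control_n)


def segment_lifts(counts):
    """Lift computed separately inside each segment."""
    return {segment: lift(*c) for segment, c in counts.items()}


def aggregate_lift(counts):
    """Lift computed after pooling every segment together, as a plain A/B readout does."""
    totals = [sum(column) for column in zip(*counts.values())]
    return lift(*totals)


if __name__ == "__main__":
    print("aggregate:", round(aggregate_lift(COUNTS), 2))
    for segment, value in segment_lifts(COUNTS).items():
        print(segment, round(value, 2))
```

In this toy data retirees make up most of the audience, so their zero response pulls the pooled number down even though students respond strongly.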

Traditional ML predicts outcomes from inputs. Useful, but it does not tell you why things happen. It will not tell you what changes will occur if you take a different action. That is the gap causal inference fills.

Causal methods aim to separate correlation from cause. They work with real‑world logs and customer behavior (observational data). They also work with controlled trials, such as A/B tests (experimental data). You can use them even when a clean experiment is not feasible.

Here is where it helps:

╰┈➤ Dynamic pricing: Adjust prices based on how different segments actually respond, not just who looks similar.

╰┈➤ Churn reduction: Treatment effect estimates show which actions reduce cancellations, and for whom.

╰┈➤ Policy evaluation: Compare new vs old programs and see effects across demographics (heterogeneous effects).

And that is why it matters: with causal inference, you move from “what’s likely” to “what works,” using observational data or experiments when you have them. 

Introducing EconML

EconML is a Microsoft Research library. It estimates causal effects using machine learning. Most ML predicts outcomes. EconML asks why an outcome happened, and what would change if you took a different action.

The core method is called Double Machine Learning (DML). It trains two models, not one:

╰┈➤ Treatment model: Predicts the treatment from customer features. For a yes/no treatment, this is the propensity score, i.e., the probability of receiving it (e.g., how likely a customer is to receive a discount).

╰┈➤ Outcome model: Predicts the outcome from the same features (like whether the customer makes a purchase).

By combining these, EconML helps separate correlation from causation. That makes the insights more dependable than simple averages.
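
For readers who prefer notation, the two-model idea is often written as a partially linear model. The sketch below is the textbook form, not a quote from the EconML source; the symbols g, f, ε, η and the tilde notation are assumptions introduced here, with Y the outcome, T the treatment, and X the customer features.

```latex
\begin{aligned}
Y &= \tau(X)\,T + g(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid X, T] = 0 \\
T &= f(X) + \eta, \qquad \mathbb{E}[\eta \mid X] = 0 \\
\tilde{Y} &= Y - \mathbb{E}[Y \mid X], \qquad \tilde{T} = T - \mathbb{E}[T \mid X] \\
\tilde{Y} &= \tau(X)\,\tilde{T} + \varepsilon
\end{aligned}
```

The outcome model estimates E[Y | X] and the treatment model estimates E[T | X]. Subtracting both leaves the last line, where the only remaining link between the two residuals is the effect τ(X).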

Diagram of the DML workflow — inputs → propensity score → outcome model → causal effect estimate.

╰┈➤ Marketing campaign effectiveness: Identify which groups benefit from a promotion. Students may respond well, while retirees show no response. Spend your money where it works and skip the rest.

╰┈➤ Dynamic pricing: Set prices based on how different customers respond. Younger customers typically seek deals, while loyal customers are less price-sensitive. Do not just price for the average; match prices to the people.

╰┈➤ Medical treatment outcomes: Figure out how the impact of treatment changes with age, gender, or medical background. These details help doctors tailor care to each patient.

╰┈➤ SaaS churn reduction: Identify which actions actually keep people from canceling, and who benefits from them. Focus on what makes a difference, and drop what does not.

╰┈➤ Policy impact in economics: Compare new and old programs for different groups. Focus on policies that have a meaningful positive impact.

Microsoft Research shares real examples and case studies using the EconML toolkit.

Diagram of segment‑level effects for each use case: marketing campaign effectiveness, dynamic pricing, medical treatment outcomes, SaaS churn reduction, and policy impact in economics.

🌐 External link: Microsoft Research EconML – Case Studies – This page shows how EconML is used in real cases. It’s the official source from Microsoft Research.

Why EconML + Clojure via libpython‑clj

You do not have to leave the JVM just to use Python’s machine learning tools. With libpython-clj, you can use Python libraries like EconML right from your Clojure code. Reuse your old machine learning scripts, call familiar Python functions, and stay in your Clojure environment. It is all connected. There is no need to jump between languages or platforms.

╰┈➤ Faster experimentation: Test new ideas quickly without jumping between different tech stacks.

╰┈➤ Expressive functional code: You get the simplicity of Clojure and the power of Python’s machine learning tools.

╰┈➤ JVM ecosystem integration: Your results move straight into enterprise systems without any awkward workaround code.

╰┈➤ Lower barriers: You can build on what you have, with no need to start over from scratch.

Let’s say you want to figure out what happens when you send out newsletters.

Dataset structure:

╰┈➤ X = the things you know about your customers (features)

╰┈➤ T = whether they got a newsletter (treatment)

╰┈➤ Y = whether they bought something, or how much revenue you made (outcome)

EconML calculates τ(X), which indicates how much additional revenue you gain from sending a newsletter, broken down by customer type. Instead of just giving you one big average, you actually see how different groups react. For example,

╰┈➤ Dormant customers suddenly spend 30% more.

╰┈➤ VIP buyers spend 2% less when you send them a newsletter. (Negative effect)

╰┈➤ Bargain hunters go up by 12%, but only in certain situations. (Conditional effect)

So, now you know exactly who likes your emails and who does not.
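
For readers who want the notation behind τ(X), one common way to write it uses potential outcomes: Y(1) is the revenue if the newsletter is sent and Y(0) the revenue if it is not. These two symbols are assumptions introduced here, not taken from the text above.

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right],
\qquad
\mathrm{ATE} = \mathbb{E}\left[\, \tau(X) \,\right]
```

The single number an A/B test reports corresponds to the ATE, while the per-group figures above correspond to τ(x) evaluated for each customer type.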

Table Comparing A/B Test vs EconML Uplift

| Segment | A/B Test Result (Average) | EconML Uplift (Segment‑level) | Key Takeaway |
| --- | --- | --- | --- |
| Aggregate | +5% overall lift | | Average hides subgroup variation |
| Students | Not visible on average | +20% | Strong positive effect → invest more |
| Bargain Hunters | Not visible on average | +12% | Moderate effect → keep testing offers |
| Retirees | Not visible on average | 0% | No effect → stop spending here |

Applying EconML in Practice

After EconML gives you the uplift scores for each customer, you have what you need to make wise choices. Here is how it usually goes:

1️⃣ Rank everyone on your mailing list by their predicted lift.

2️⃣ Choose the top half of the list, keeping only those with an uplift above zero, and send your newsletters to them.

3️⃣ Skip the bottom half to save yourself time and avoid any negative impact.

4️⃣ Customize the newsletter for each group. Give each segment the version that best fits them, so you are not just blasting the same thing to everyone.

Flowchart of the decision layer.
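
The ranking and selection steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the uplift scores have already been estimated; the function name `select_targets`, the `top_fraction` parameter, and the customer IDs and scores are all hypothetical.

```python
def select_targets(uplift_by_customer, top_fraction=0.5):
    """Rank customers by predicted uplift, keep the top fraction,
    and drop anyone whose predicted uplift is not above zero."""
    ranked = sorted(uplift_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = int(len(ranked) * top_fraction)
    return [customer for customer, uplift in ranked[:cutoff] if uplift > 0]


# Hypothetical uplift scores (expected extra revenue share per customer).
scores = {"c1": 0.30, "c2": -0.02, "c3": 0.12, "c4": 0.00, "c5": 0.05, "c6": -0.10}
targets = select_targets(scores)

if __name__ == "__main__":
    print(targets)
```

Filtering on `uplift > 0` after the cutoff means the send list can be shorter than half the audience, which matches the goal of avoiding customers who react negatively.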

╰┈➤ Control A/B test: +$50k revenue

╰┈➤ EconML targeting: +$65k revenue

The uplift comes from sending fewer emails that do not matter: less spam, better delivery, and more revenue.

Bar chart comparing outcomes.

Technical Deep Dive: EconML Under the Hood

Residualization is how EconML makes causal estimates more reliable.

╰┈➤ It works by predicting both the outcome and the treatment from the customer features, then subtracting those predictions. What remains (the residuals) is the variation the features cannot explain.

╰┈➤ The causal effect is then estimated by relating the outcome residuals to the treatment residuals.

╰┈➤ To avoid overfitting, EconML uses cross-fitting: the data is split into folds, each model trains on some folds, and its predictions are made on the fold it has not seen.

All of this helps cut through the noise and get to the real causal signals.  
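
As a rough illustration of residualization with cross-fitting, here is a self-contained Python sketch. It is not EconML's implementation: it assumes one feature, a constant true effect of 2.0, simple linear first-stage models, and synthetic data generated on the spot. All names (`fit_line`, `cross_fit_residuals`, `naive_effect`, `dml_effect`) are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def cross_fit_residuals(x, t, y, n_folds=2):
    """Fit the treatment and outcome models on the training folds and
    compute residuals only on the held-out fold."""
    n = len(x)
    res_t, res_y = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        x_train = [x[i] for i in train]
        a_t, b_t = fit_line(x_train, [t[i] for i in train])  # treatment model
        a_y, b_y = fit_line(x_train, [y[i] for i in train])  # outcome model
        for i in held_out:
            res_t[i] = t[i] - (a_t + b_t * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    return res_t, res_y


# Synthetic, confounded data: x drives both the treatment and the outcome.
rng = random.Random(0)
n = 2000
x = [rng.uniform(0, 10) for _ in range(n)]
t = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * ti + 3.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]

# Naive estimate: regress y on t and ignore x (picks up the confounding).
naive_effect = fit_line(t, y)[1]

# Residualized estimate: regress outcome residuals on treatment residuals.
res_t, res_y = cross_fit_residuals(x, t, y)
dml_effect = sum(rt * ry for rt, ry in zip(res_t, res_y)) / sum(rt * rt for rt in res_t)

if __name__ == "__main__":
    print("naive:", round(naive_effect, 2), "residualized:", round(dml_effect, 2))
```

The naive slope lands well above the true effect because the feature pushes treatment and outcome in the same direction. The residualized estimate should land close to 2.0.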

EconML also supports causal forests, which are ensembles of decision trees designed to capture heterogeneous treatment effects.

╰┈➤ They split the data into subgroups and estimate effects for each branch.

╰┈➤ This helps discover new customer segments that respond differently to interventions.

Example: “Customers under 35 who browse frequently and have not bought in 60 days increase spend by 22% when shown Instagram ads.”
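
To show the idea of splitting on the treatment effect rather than on the outcome, here is a deliberately simplified Python sketch of a single split. It assumes a randomized yes/no treatment, a single feature (age), and toy data built so that only customers under 35 respond. A real causal forest grows many trees and uses a more careful criterion, so treat `effect` and `best_split` as hypothetical helpers, not EconML functions.

```python
def mean(values):
    return sum(values) / len(values)


def effect(rows):
    """Treated-minus-control difference in mean outcome; None if an arm is empty."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)


def best_split(rows):
    """Try every age threshold and keep the one where the two sides
    differ most in estimated treatment effect."""
    best = None
    for threshold in sorted({age for age, _, _ in rows}):
        left = [r for r in rows if r[0] < threshold]
        right = [r for r in rows if r[0] >= threshold]
        e_left, e_right = effect(left), effect(right)
        if e_left is None or e_right is None:
            continue
        gap = abs(e_left - e_right)
        if best is None or gap > best[0]:
            best = (gap, threshold, e_left, e_right)
    return best


# Toy rows: (age, treated, spend). Baseline spend is 100 for everyone;
# treated customers under 35 spend 122, everyone else is unaffected.
rows = []
for age in range(20, 60):
    rows.append((age, 0, 100.0))
    rows.append((age, 1, 122.0 if age < 35 else 100.0))

gap, threshold, e_left, e_right = best_split(rows)

if __name__ == "__main__":
    print("split at age", threshold, "| effect below:", e_left, "| effect above:", e_right)
```

A tree that split on spend alone would not necessarily find this boundary. Scoring candidate splits by the gap in treatment effect is what points the search at the responsive segment.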

Tree diagram of a causal forest. Branches by age, browsing behavior, and purchase history, with treatment effects at each leaf.

Flexiana’s Role

Flexiana has been building Clojure solutions for 9 years. Our team brings deep expertise in functional programming, machine learning integration, and global software delivery. Our projects span healthcare, fintech, SaaS, and enterprise systems.

╰┈➤ To empower teams with causal inference tools.

╰┈➤ To make advanced ML accessible to Clojure developers.

Flexiana’s focus is clear: Help organizations use causal ML without leaving their Clojure stack.

🔗 Internal link: Flexiana’s About page 

🌐 External link: Flexiana GitHub

FAQs (People Also Ask)

What is EconML?

EconML is a Python library that helps you figure out the cause-and-effect of your actions. It uses observational or experimental data and applies machine learning to econometric models. The goal is to determine why an intervention (or “treatment”) led to a specific outcome. It is about moving past simple prediction to understand individualized treatment effects (ITE).

How is EconML different from A/B testing?

They have different goals. A/B testing tells you the average effect: it answers whether a change works for everyone overall. EconML focuses on the heterogeneous effects: it tells you who is most (or least) impacted by that change.

Plus, EconML can use data you already have (observational data) to target people better, saving you the time and cost of running a separate experiment for every targeting idea.

Can EconML work with observational data?

Yes, that’s what it is built for. Observational data is messy. EconML uses techniques such as Double Machine Learning (DML) to account for the many variables that can skew your results. This helps it address common issues such as selection bias, yielding more reliable causal estimates from non-experimental data.

Why combine Clojure with Python’s ML libraries?

It is about using the best tool for each job. Python has the best ML libraries (scikit-learn, EconML, TensorFlow) for building the models. Clojure, running on the Java Virtual Machine (JVM), provides a robust, concurrent, and highly stable production environment for running models at scale. You get Python’s data science ecosystem with the JVM’s rock-solid backend.

What is a causal forest?

Think of a causal forest as a special kind of random forest. In regular random forests, the tree splits are based on predicting an outcome. In a causal forest (such as CausalForestDML), tree splits are based on maximizing the difference in the treatment effect between groups. This enables the algorithm to identify and highlight the specific customer traits (features) that drive the uplift variation.

Conclusion: The Future of ML in Clojure

EconML changes how we use machine learning. Predictive models tell us what might happen. EconML helps explain why it happens and what changes if we act differently. That is useful when you need decisions based on cause and effect rather than averages.

With Clojure and libpython‑clj, you get a clean, functional way to build models while reusing Python’s ML libraries. It is simple to keep your JVM stack while still leveraging proven tools.

╰┈➤ Expressive code: Your code stays straightforward and easy to follow.

╰┈➤ Python interop: You can use existing ML libraries without leaving the JVM.

╰┈➤ Enterprise fit: You can send those causal insights straight into production systems, with no extra steps.

Together, Clojure and EconML make machine learning more than just predictions. You can test faster, ship better, and actually trust what your models tell you.

Explore EconML with Flexiana. Let’s build causal ML solutions together.

🔗 Internal link: Contact Flexiana page 

🌐 External link: libpython‑clj GitHub repo