Machine Learning in Clojure with libpython‑clj: Using Bayesian Networks for Smarter, Interpretable AI [Series 2] - Flexiana
Jiri Knesl

Posted on 11th December 2025


This is the second part of our Machine Learning in Clojure with libpython‑clj series. If you missed Series 1, we covered how Clojure can use Python’s ML libraries through libpython‑clj. In this article, we focus on Bayesian Networks. We will show how to train them, run queries against them, and use them in Clojure.

Machine learning is not only for Python. With libpython‑clj, Clojure teams can use PyMC, scikit‑learn, and pgmpy. They can keep the JVM and the functional style they prefer. The aim is simple: make ML in Clojure clear, practical, and ready for production.

Bayesian Networks are good when you need clarity. They model uncertainty, use domain knowledge, and answer “what if?” questions without guesswork. They are a natural fit for:

➤ Small to medium datasets.

➤ Compliance‑heavy work in healthcare, finance, and logistics.

➤ When explainability is required.

This walkthrough demonstrates how to build a simple BN in Python, then run it from Clojure using libpython‑clj. The process is straightforward.

Flexiana is a Clojure consultancy. We help teams connect Clojure and ML in real projects. We share code, write about patterns, and ship systems that are easy to reason about. If you need support with interop, pipelines, or interpretable models, we are a solid partner.

“Bayesian reasoning helps teams make better calls in logistics, healthcare, and fintech. With Clojure’s REPL and Python’s ML tools, you move faster and stay confident.” – Flexiana ML Team

A Bayesian Network is a directed acyclic graph. Each node is a random variable. Each edge shows how one variable depends on another. The graph encodes conditional probabilities describing how events influence one another.

Bayesian Networks are not just about guessing what will happen next. They show why something is likely or not, based on how the graph’s parts connect. When you are dealing with uncertainty, these networks give you both a prediction and a peek behind the curtain.

Neural Networks (NNs) really stand out for image recognition, speech, and other messy, unstructured data. They work best with huge datasets and find hidden patterns.

Bayesian Networks fit different needs:

➤ When datasets are smaller, but domain knowledge is strong

➤ When decisions must be explained to stakeholders

➤ When uncertainty needs to be modelled clearly

In short, Neural Networks find hidden patterns, while Bayesian Networks deliver clarity, reasoning, and trust.

You can actually see how Bayesian Networks think. Every variable and every link is there in the graph. If something happens, you can trace the entire path back and explain it. That is a big deal in sectors like healthcare and finance, where you must explain to regulators or your team exactly why a decision was made. Neural Networks usually keep that logic hidden, which is not good enough when you need transparency.

Bayesian Networks handle uncertainty directly. They do not just return one answer but give you probabilities, so you know what might happen, what probably won’t, and how sure the model is. That helps when your data is messy or incomplete. Neural Networks, on the other hand, typically select a single outcome, which can be misleading when the situation is unclear.

Bayesian Networks can include domain knowledge. You can turn relationships and rules from the real world into edges and probability tables. This keeps the model grounded, especially when you do not have a ton of data. Neural Networks, on the other hand, require large datasets to learn patterns and are not well-suited to encoding expert rules.

PyMC Labs recently pointed out that “Bayesian modelling is really shaping business decisions these days. Probabilistic forecasting models are proliferating across retail, finance, energy, and beyond. Businesses are increasingly adopting these approaches.”

Flexiana focuses on interpretable ML for compliance‑heavy work. Our projects lean on clear reasoning and stable interoperability. Pairing Clojure’s functional style with BN reasoning helps teams build systems that are practical and explainable.

“What if?” analysis: If a shipment is delayed, what is the churn risk? BNs model scenarios and return clear probabilities.

Small or medium datasets: Encode expert knowledge and known relationships directly. Useful when data is limited.

Compliance‑heavy industries: Interpretable reasoning shows why an outcome is likely. Fits healthcare, finance, and logistics.

Decision support: Flexiana has used BNs for logistics and healthcare clients. The focus is on clear reasoning and compliance‑friendly workflows.

High‑dimensional inputs: Images and audio involve thousands of features. BNs struggle at this scale.

Unstructured data: Text, images, and raw audio need feature extraction. Deep learning handles this better.

Arbitrary function approximation: Neural Networks capture complex, nonlinear patterns. BNs are built for probabilistic reasoning, not every function shape. 

Here’s a basic example with pgmpy, a Python library for probabilistic graphical models. It builds a small Bayesian Network with two nodes and one dependency.


What this does:

➤ Builds a BN with two variables: Rain and Traffic.

➤ Shows that rain raises the chance of traffic

➤ Checks that the model and probabilities are valid

For more details, check the official documentation:

pgmpy Documentation

PyMC Documentation

Train in Python: Build and check the BN with pgmpy or PyMC.

Load via libpython‑clj: Import the Python model into Clojure.

Wrap inference in Clojure: Write small functions for queries and “what if?” checks.

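A minimal sketch of that flow, assuming pgmpy is installed in the Python environment libpython‑clj picks up. The keyword-argument call style follows libpython‑clj2’s `require-python` conventions; adjust names to your setup.

```clojure
(ns bn-demo.core
  (:require [libpython-clj2.require :refer [require-python]]
            [libpython-clj2.python :as py]))

(py/initialize!)  ;; point this at your Python env if needed

(require-python '[pgmpy.models :as models]
                '[pgmpy.factors.discrete :as discrete]
                '[pgmpy.inference :as inference])

;; Mirror the Python example: Rain -> Traffic
(def model (models/BayesianNetwork [["Rain" "Traffic"]]))

(def cpd-rain
  (discrete/TabularCPD :variable "Rain" :variable_card 2
                       :values [[0.8] [0.2]]))

(def cpd-traffic
  (discrete/TabularCPD :variable "Traffic" :variable_card 2
                       :values [[0.9 0.4] [0.1 0.6]]
                       :evidence ["Rain"] :evidence_card [2]))

(py/py. model add_cpds cpd-rain cpd-traffic)
(py/py. model check_model)

;; "What if?" query: P(Traffic | Rain = yes)
(def infer (inference/VariableElimination model))
(println (py/py. infer query :variables ["Traffic"] :evidence {"Rain" 1}))
```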

Enterprise JVM focus: Flexiana’s tutorials show how to bridge Python ML with Clojure in enterprise JVM stacks. The focus is clear interop, stable deployment, and explainable models.

Utility functions for boxed math: Avoid unnecessary boxing. Use primitives where possible and keep Python calls lean.

Batch calls for efficiency: Run queries in groups to cut overhead.

Caching strategies: Cache fixed CPDs and reuse common results. Memoize repeated “what if?” checks.
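The caching idea above can be sketched with plain `memoize`. Here `run-query` is a hypothetical wrapper around the pgmpy inference call, not part of any library:

```clojure
;; Memoize repeated "what if?" checks: identical evidence maps hit the cache
;; instead of crossing the Clojure->Python boundary again.
;; `run-query` is a hypothetical wrapper around pgmpy's VariableElimination.
(def churn-risk
  (memoize
   (fn [evidence]
     (run-query "Churn" evidence))))

;; First call runs the Python inference; identical repeats are near-free.
(churn-risk {"ShipmentDelayed" 1})
```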

Bayesian networks are great for running “what if?” modeling in logistics. If a shipment is delayed, these models help you assess the risk of supplier bottlenecks, missed deliveries, or customer loss. Managers do not have to guess; they receive precise numbers and can plan accordingly.

Use case: You can map out how a delay might lead to customer churn, or spot risks up and down the supply chain.

Output: You can hand these probabilities to your ops or finance teams and actually explain what’s behind them.

Value: You get smarter backup plans and fewer surprises.

BNs connect symptoms, test results, and conditions, showing how each piece of information changes the probabilities of a diagnosis. The reasoning is not hidden: clinicians can see not just what the model predicts, but how sure it is.

Use case: If you plug in a set of symptoms, you get a clear picture of which conditions are most likely. 

Output: The logic stays out in the open, so anyone reviewing the case can follow every step. 

Value: This kind of transparency is good when you need to explain decisions or meet strict compliance rules.

BNs do not just spot fraud: they break down what made a transaction suspicious in the first place. You get to see exactly which factors raised the red flag, not just a vague alert. That kind of clarity makes audits and regulator checks much smoother.

Use case: They scan transaction patterns and highlight the risky ones.

Output: Real reasons for every alert, not just a score.

Value: You end up with detection you can actually trust and clearer investigations.

Faster orchestration and deployment: For starters, you can move fast. The REPL lets you test ideas and make changes quickly, so updates are faster without waiting around for long builds.

Seamless JVM integration: Clojure integrates nicely with the JVM. You can plug ML models right into your current systems. No need for extra layers or awkward workarounds.

Lower barrier for Clojure teams: If your team already knows Clojure, you do not have to rebuild your stack or retrain everyone to bring machine learning into the picture. You can use the tools you know while still taking advantage of ML.

➤ Structure: A clear DAG with three variables; Cancer has two parents.

➤ Nodes: Random variables with named states.

➤ Edges: Smoking and Pollution feed into Cancer.

➤ CPDs: Tables capture prior knowledge and uncertainty.

➤ Inference: Use variable elimination for “what if?” queries.

This example uses libpython‑clj to import pgmpy, set up the network, and run queries in Clojure.

Interop flow:

➤ Build: Mirror the Python BN and CPDs with libpython‑clj.

➤ Infer: Call pgmpy’s VariableElimination from Clojure.

➤ Return: Get Python objects, then print or convert to Clojure data.

Label conversion: Map states to values, e.g., {"no" 0.90, "yes" 0.10}.
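One way to sketch that conversion, assuming `result` is the `DiscreteFactor` returned by pgmpy’s query and that libpython‑clj2’s attribute access and `->jvm` copying behave as documented:

```clojure
(require '[libpython-clj2.python :as py])

;; Turn a single-variable pgmpy DiscreteFactor into a Clojure map
;; of state name -> probability, e.g. {"no" 0.9, "yes" 0.1}.
(defn factor->map [result var-name]
  (let [states (get (py/py.- result state_names) var-name)
        probs  (py/->jvm (py/py.- result values))]
    (zipmap states (map double probs))))

;; Usage: (factor->map result "Cancer")
```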

Engagement tip: Log queries and results to show stakeholders how each node shapes outcomes.

Not at all. Neural Networks learn patterns from large amounts of data, while Bayesian Networks focus on cause and effect, mapping out probabilities and showing how they reach their conclusions.

Honestly, they are not built for that. Bayesian Networks are helpful for small or medium datasets, especially when you need expert input. If you have large, unstructured datasets, deep learning usually performs better.

Clojure runs on the JVM, so it integrates well with enterprise environments. Plus, if you need something from Python, you can call those libraries; there is no need to pick one or the other.

It makes things easy. Clojure can import Python ML libraries directly: train your model in Python, then query it from Clojure. No need to build complicated bridges between the two.

Healthcare, finance, and logistics rely on Bayesian models. They want explainable results, clear probabilities, and the ability to test out “what if?” scenarios quickly.

Machine learning in Clojure is practical for real teams. With libpython‑clj, you can use Python’s ML libraries while staying in the JVM stack you already trust. That means faster iteration, smoother deployment, and less friction for Clojure developers.

Bayesian Networks add clear value. They do not just predict; they show the reasoning. This matters in healthcare, finance, and logistics, where decisions carry weight. BNs handle uncertainty and map cause‑and‑effect, so managers and auditors can see why a result makes sense.

If your team is exploring ML in Clojure, now is a good time to try it. Share your thoughts, compare notes, and check the sample code in our GitHub repo. 

If you want help, Flexiana’s consulting team can guide design, deployment, and integration so it fits your stack.

That wraps up Series 2 and our exploration of Bayesian Networks in Clojure with libpython-clj.

In Series 3, we unlock causal insights using Microsoft’s EconML to go beyond prediction and start understanding why things happen.
👉 Continue with Series 3: https://flexiana.com/news/2025/12/machine-learning-in-clojure-with-libpython-clj-unlocking-causal-insights-using-microsofts-econml-series-3