Abstract
New methods are needed to monitor environmental treaties, like the Montreal Protocol, by reviewing large, complex customs datasets. This paper introduces a framework using unsupervised machine learning to systematically detect suspicious trade patterns and highlight activities for review. Our methodology, applied to 100,000 trade records, combines several ML techniques. Unsupervised Clustering (K-Means) discovers natural trade archetypes based on shipment value and weight. Anomaly Detection (Isolation Forest and IQR) identifies rare "mega-trades" and shipments with commercially unusual price-per-kilogram values. This is supplemented by Heuristic Flagging to find tactics like vague shipment descriptions. These layers are combined into a priority score, which successfully identified 1,351 price outliers and 1,288 high-priority shipments for customs review. A key finding is that high-priority commodities show a different and more valuable value-to-weight ratio than general goods. This was validated using Explainable AI (SHAP), which confirmed vague descriptions and high value as the most significant risk predictors. The model's sensitivity was validated by its detection of a massive spike in "mega-trades" in early 2021, correlating directly with the real-world regulatory impact of the US AIM Act. This work presents a repeatable unsupervised learning pipeline to turn raw trade data into prioritized, usable intelligence for regulatory groups.
Supplementary materials
Title
Figure 2: Anomaly Detection Quadrant Analysis of related Ozone Depleting Substance.
Description
The scatter plot on Figure 2 visualizes trade data to identify patterns and high-risk items by plotting the "Log of Primary
Value" (Y-axis) against the "Log of Net Weight" (X-axis) for various trade shipments. This analysis helps identify the value to-weight ratio of goods. The data points are grouped into four distinct clusters, each represented by a different color: Cluster 0 (purple), Cluster 1 (blue-grey), Cluster 2 (green), and Cluster 3 (yellow). Two trendlines are overlaid on the plot: a dashed green line representing the "Total Data Trendline" (the average for all data) and a pink dash-dot line representing the "High Risk Trendline," which shows a different value-to-weight relationship.
Actions



![Author ORCID: We display the ORCID iD icon alongside authors names on our website to acknowledge that the ORCiD has been authenticated when entered by the user. To view the users ORCiD record click the icon. [opens in a new tab]](https://www.cambridge.org/engage/assets/public/coe/logo/orcid.png)