Abstract
Comprehensive two-dimensional gas chromatography (GCxGC) is widely used to analyse complex chemical samples. DSTL would like to use GCxGC data to attribute samples to a particular region or cultivar. However, the nature of the data means that several difficulties must be overcome before this is possible. In this report, we investigate methods for overcoming these difficulties and then classify the data. We distinguish blank runs from seed samples very successfully, but have only limited success when classifying between seeds. The most promising method is k-nearest neighbours (k-NN) classification using the Wasserstein distance. However, this method remains quite sensitive to noise introduced by the solvent in the sample. We therefore suggest obtaining more blank runs, so that the ‘ground truth’ behaviour of the solvent is better understood and its effect can be removed from the seed data.