Trustworthy Machine Learning with Logging, Anomaly Detection, and Blockchain
This research aims to improve the accountability and trustworthiness of machine learning models, which are increasingly prevalent in the operation and security of modern software platforms.
We define a machine learning model as accountable if:
1. It is free from tampering, such as poisoning attacks.
2. It has immutable provenance records detailing its history, such as how it was trained.
To address problem 1, we need a mechanism to detect whether a given machine learning model has been tampered with (e.g., poisoned), based on its provenance records. In other words, we need a function $f: H \to \mathbb{B}$, where $H$ is the set of provenance records and $\mathbb{B} = \{0, 1\}$ is the set of Boolean values. Different ways to build $f$ exist. However, in this project, we propose to apply machine learning techniques to build $f$. To train this model, we need a labeled dataset $D = \{(h_i, y_i)\}$, where the label $y_i \in \mathbb{B}$ denotes whether the model has been tampered with (i.e., poisoned). We also need to find the ML algorithm that can generate $f$ from the dataset $D$. The dataset can be created by instrumenting a machine learning training pipeline and repeatedly training the model under different forms of poisoning attacks. Anomaly detection can be used to build the model $f$.
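As a minimal sketch of how an anomaly-detection-based $f$ could work, assume each provenance record is summarized as a numeric feature vector; the feature names and the z-score baseline below are illustrative assumptions, not the project's chosen design:

```python
import numpy as np

# Illustrative provenance features per training run (names are assumptions):
# [final training loss, mean gradient norm, estimated label-flip rate]
clean_runs = np.array([
    [0.31, 1.02, 0.00],
    [0.29, 0.98, 0.01],
    [0.33, 1.05, 0.00],
    [0.30, 0.99, 0.02],
    [0.28, 1.01, 0.01],
])

# Baseline statistics learned from provenance of known-clean training runs.
mu = clean_runs.mean(axis=0)
sigma = clean_runs.std(axis=0)

def f(record, threshold=3.0):
    """Tampering detector f: flag a run whose provenance deviates from the
    clean baseline by more than `threshold` standard deviations on any feature."""
    z = np.abs((record - mu) / sigma)
    return bool((z > threshold).any())

poisoned_run = np.array([0.85, 3.40, 0.35])  # synthetic poisoned profile
print(f(clean_runs[0]), f(poisoned_run))     # prints: False True
```

In practice, a richer detector (e.g., an isolation forest or autoencoder trained on many instrumented runs) would replace this z-score baseline, but the interface is the same: $f$ maps a provenance record to a Boolean tamper verdict.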
In the context of problem 1, we also need to determine what kind of historical data (i.e., logging data) we should extract from the training process of a machine learning model, so that we can train the tampering-detection mechanism effectively.
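One candidate answer is to log per-epoch training statistics and condense them into a fixed-length feature vector. The sketch below assumes a hypothetical log schema (the field names and chosen features are illustrative, not prescribed by this project):

```python
import numpy as np

# Hypothetical per-epoch training log; field names are assumptions.
log = [
    {"epoch": 1, "loss": 1.20, "grad_norm": 2.1},
    {"epoch": 2, "loss": 0.80, "grad_norm": 1.6},
    {"epoch": 3, "loss": 0.55, "grad_norm": 1.2},
    {"epoch": 4, "loss": 0.40, "grad_norm": 0.9},
]

def extract_features(log):
    """Summarize a training log into a fixed-length provenance feature vector."""
    losses = np.array([e["loss"] for e in log])
    grads = np.array([e["grad_norm"] for e in log])
    return np.array([
        losses[-1],              # final loss
        losses[0] - losses[-1],  # total loss decrease over training
        np.diff(losses).max(),   # largest epoch-to-epoch loss change
        grads.mean(),            # average gradient norm
    ])

print(extract_features(log))  # [ 0.4   0.8  -0.15  1.45]
```

Features like these are what the labeled dataset $D$ would contain; which statistics are actually most discriminative for detecting poisoning is precisely the research question here.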
To address problem 2, we will rely on blockchain technology to store the historical data of machine learning models. Blockchain is used for its immutability and decentralized operation. The research in this part will focus on finding an optimal decision as to which part of the data should be stored on-chain and which part off-chain.
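A common pattern for such an on-chain/off-chain split is to anchor only a cryptographic digest of each provenance record on-chain while the bulky record itself lives off-chain; the record fields below are illustrative assumptions:

```python
import hashlib
import json

# A provenance record kept off-chain (fields are illustrative).
record = {
    "model_id": "m-001",
    "dataset_sha256": "abc123",
    "hyperparameters": {"lr": 0.01, "epochs": 50},
    "trained_at": "2024-01-01T00:00:00Z",
}

def digest(rec):
    """Canonical serialization so the same record always hashes identically."""
    payload = json.dumps(rec, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()

# Only this small digest would be written to the blockchain.
on_chain_digest = digest(record)

def verify(off_chain_record, anchored_digest):
    """Re-hash the off-chain record and compare with the on-chain digest."""
    return digest(off_chain_record) == anchored_digest

print(verify(record, on_chain_digest))    # True: record matches the anchor
tampered = dict(record, hyperparameters={"lr": 0.5, "epochs": 50})
print(verify(tampered, on_chain_digest))  # False: tampering detected
```

This keeps on-chain storage cost constant per record while still letting anyone detect after-the-fact modification of the off-chain history; deciding exactly where to draw this boundary is the open question this part of the project addresses.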