A couple of days ago, my colleague Will and I finished a medical imaging competition, Hubmap - Hacking the Kidney, and we learnt a lot about image segmentation problems.

What was this competition about? The goal of this research competition was to build a robust detector for glomeruli, a type of functional tissue unit in the kidney, on very large images. It is a binary image segmentation task: every pixel either belongs to a glomerulus or it does not.

What do glomeruli look like in pictures? Here are some examples from the competition. On the left you see the masks marking the area occupied by the glomeruli, and on the right the actual image:

The images above are just magnified tiles extracted from the huge competition images (TIFF files), which have sizes such as 31278 x 25794 pixels and 3 channels (red, green, blue):

Each training image would come with a corresponding image mask (encoded in a CSV file) which could also be displayed visually. The corresponding mask for the image displayed above is this one:

Training Strategy #1 - Cut Tiles and Cross-Validation

You cannot train directly on raw images as large as the ones provided in the Hubmap competition, so you need to "cut" them into tiles. The same applies to the masks. This was the first problem you had to solve in the competition.

In other words, you would have to create a grid and then generate a dataset out of it.
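As a rough sketch of what such a grid could look like (this is not our competition code), the hypothetical helper `tile_origins` below computes the top-left corner of every tile for a given image size. The tile size, the stride and the choice to shift the last tile inwards so that the image border is still covered are all assumptions made for illustration.

```python
def tile_origins(height, width, tile_size, stride=None):
    """Return the (top, left) corner of every tile in a grid over the image.

    Assumption: the last tile in each direction is shifted inwards so that
    it ends exactly at the image border instead of running past it.
    """
    stride = stride or tile_size  # no overlap unless a smaller stride is given

    def starts(length):
        if length <= tile_size:
            return [0]  # image smaller than one tile: a single (padded) tile
        points = list(range(0, length - tile_size + 1, stride))
        if points[-1] != length - tile_size:
            points.append(length - tile_size)  # cover the remaining border
        return points

    return [(top, left) for top in starts(height) for left in starts(width)]


# The same corners are used to crop both the image and its mask, e.g.
# image[top:top + 512, left:left + 512] and mask[top:top + 512, left:left + 512].
origins = tile_origins(31278, 25794, tile_size=512)
print(len(origins), "tiles of 512 x 512")
```

Because the image and the mask are cropped at identical coordinates, every tile stays aligned with its mask tile.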

My initial approach was to create a dataset based on a grid and then use scikit-learn's GroupKFold cross-validation with 4 folds. GroupKFold keeps all samples that share a group in the same fold, so a group never appears in both the training and the validation set of a split. Here the groups were the source images from which the tiles were extracted, which means that tiles from the same image never ended up on both sides of a split.
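Here is a minimal sketch of such a split with scikit-learn. The tile table is made up for illustration (8 hypothetical source images with 3 tiles each); only `GroupKFold` and its `split` method come from the library.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: 8 source images, 3 tiles cut from each of them.
# groups[i] is the index of the source image that tile i was cut from.
groups = np.repeat(np.arange(8), 3)
tiles = np.zeros((len(groups), 1))  # placeholder for the real tile data

folds = list(GroupKFold(n_splits=4).split(tiles, groups=groups))

for fold, (train_idx, val_idx) in enumerate(folds):
    train_images = sorted(set(groups[train_idx].tolist()))
    val_images = sorted(set(groups[val_idx].tolist()))
    print(f"fold {fold}: train images {train_images}, validation images {val_images}")
```

Each fold then trains one model on its training tiles and validates on tiles from images the model has never seen.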

Then I used PyTorch and models from the excellent open-source Segmentation Models PyTorch library, available here, to train the 4 folds:

https://github.com/qubvel/segmentation_models.pytorch

After this training, I had four models, one per fold. Initially, they all used the Unet architecture with one of the EfficientNet encoders provided by the Segmentation Models PyTorch library.

Unet architecture

This training approach gave decent results, but then I came across another training strategy that seemed to give slightly better ones.

We used plain PyTorch code to implement the first training strategy.

Training Strategy #2 - Random Sampling with a Probability Density Function

The main idea in this approach is to select tiles randomly from the huge medical images using two strategies:

  1. Sampling tiles via centre points in the proximity of every glomerulus - this ensures that each glomerulus is seen at least once per training epoch.
  2. Sampling random tiles based on region probabilities (e.g., medulla, cortex, other). This is where the probability density function is used.
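To make the two sampling strategies more concrete, here is a small sketch in plain Python. It is not the deepflash2 implementation and not the code from the notebook: the function `sample_tile_centres`, the coarse `region_grid`, the `region_weights` and the `jitter` parameter are all hypothetical names and simplifications. The region weights play the role of the probability density function.

```python
import random


def sample_tile_centres(glomeruli_centres, region_grid, region_weights,
                        cell_size, n_random, jitter, seed=0):
    """Return (y, x) tile centres for one epoch (illustrative only)."""
    rng = random.Random(seed)
    centres = []

    # Strategy 1: one centre close to every glomerulus, with a small random
    # offset, so each glomerulus is seen at least once per epoch.
    for y, x in glomeruli_centres:
        centres.append((y + rng.randint(-jitter, jitter),
                        x + rng.randint(-jitter, jitter)))

    # Strategy 2: pick coarse grid cells with a probability proportional to
    # the weight of their region, then a uniform point inside the cell.
    cells = [(r, c) for r, row in enumerate(region_grid) for c in range(len(row))]
    weights = [region_weights[region_grid[r][c]] for r, c in cells]
    for r, c in rng.choices(cells, weights=weights, k=n_random):
        centres.append((r * cell_size + rng.randrange(cell_size),
                        c * cell_size + rng.randrange(cell_size)))
    return centres


# Made-up example: a 2 x 2 region map in which cortex is sampled most often.
region_grid = [["cortex", "medulla"], ["other", "cortex"]]
region_weights = {"cortex": 0.7, "medulla": 0.2, "other": 0.1}
centres = sample_tile_centres([(50, 60), (150, 140)], region_grid,
                              region_weights, cell_size=100, n_random=6, jitter=10)
print(centres)
```

A real implementation would also have to keep the tiles inside the image bounds, which this sketch leaves out.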

This brilliant approach was developed by Matthias Griebel and is thoroughly explained in this Kaggle notebook:

https://www.kaggle.com/matjes/hubmap-deepflash2-judge-price

This strategy brought slightly better evaluation results on our Kaggle submissions. 

I would also like to mention a small library, DeepFlash2, created by Matthias Griebel, which provides datasets to extract random tiles using a probability density function.

We used the fast.ai version 2 library for this training strategy, which is very convenient to use and generally provides good training results.

Beyond Training - Inference Strategy #1 - Averaging multiple models

Generally, you will not get very far by using just one single model in Kaggle competitions. Even if your public score is quite good with a single model, it might well be overfitting to the public test data. When the final score is unveiled, you may then be in for a big disappointment: your model does not generalize very well to a larger dataset.

So you should always train multiple models and combine them. We combined models trained with different tile sizes, e.g. 512 x 512, 640 x 640 and 768 x 768, and we also used different architectures, e.g. Unet, FPN, Linknet, PAN and MAnet. The simplest possible combination strategy in a segmentation competition like this one is to collect, for every pixel, the predictions of all models as probabilities between 0 and 1 and then average them. If the average is >= 0.5, the pixel is part of a glomerulus; otherwise it is not.

For example, you have these predictions for a single point:

model 1: 0.1

model 2: 0.7

model 3: 0.9

model 4: 0.4

The average is (0.1 + 0.7 + 0.9 + 0.4) / 4 = 0.525, which is at or above the 0.5 threshold, so this pixel is classified as part of a glomerulus after all.
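The same calculation, applied to whole probability maps, can be sketched in a few lines of plain Python. The function names are made up for this example, and the maps are tiny nested lists; with real tiles you would do the same thing on arrays or tensors.

```python
def average_pixel(probabilities):
    """Mean of the per-model probabilities for a single pixel."""
    return sum(probabilities) / len(probabilities)


def average_ensemble(prob_maps, threshold=0.5):
    """Average several probability maps and threshold the result.

    prob_maps: one 2D list of probabilities (0..1) per model, all equally sized.
    Returns a 2D list of booleans, True where the pixel is part of a glomerulus.
    """
    return [
        [average_pixel(pixel_probs) >= threshold for pixel_probs in zip(*rows)]
        for rows in zip(*prob_maps)  # the same row from every model
    ]


# Four models predicting a 1 x 2 image; the first pixel is the example above.
model_maps = [[[0.1, 0.2]], [[0.7, 0.3]], [[0.9, 0.1]], [[0.4, 0.6]]]
print(average_pixel([0.1, 0.7, 0.9, 0.4]))  # approximately 0.525
print(average_ensemble(model_maps))
```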

Inference Strategy #2 - Using a voting system

The second inference strategy we used was a simple voting system. Each model first makes a binary prediction (0 or 1) for every pixel, and we sum these votes per pixel. If the sum reaches a certain threshold, the pixel is considered part of a glomerulus; otherwise it is not.

For example, you have these predictions for a single point:

model 1: 1

model 2: 1

model 3: 0

model 4: 1

The sum is 3, so with a threshold of 2 votes this pixel would be classified as part of a glomerulus.
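Here is a minimal sketch of such a vote in plain Python. The function name `vote_ensemble` is made up, and treating the threshold as "at least `min_votes` votes" is an assumption about how the threshold is applied.

```python
def vote_ensemble(binary_masks, min_votes):
    """Combine binary masks by counting votes per pixel.

    binary_masks: one 2D list of 0/1 predictions per model, all equally sized.
    min_votes: assumed rule, a pixel is positive if at least this many models
               voted 1 for it.
    """
    return [
        [sum(pixel_votes) >= min_votes for pixel_votes in zip(*rows)]
        for rows in zip(*binary_masks)  # the same row from every model
    ]


# Four models voting on a 1 x 2 image; the first pixel is the example above.
masks = [[[1, 0]], [[1, 0]], [[0, 1]], [[1, 0]]]
print(vote_ensemble(masks, min_votes=2))
```

Compared with averaging, voting discards how confident each model was, which makes it less sensitive to a single model that outputs extreme probabilities.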

Where have we trained - JarvisCloud

The free GPU and TPU time on Kaggle is a bit limited, especially if you want to train multiple models at a high resolution. So we trained some of our models on JarvisCloud, a small cloud provider which offers access to some good Nvidia cards, such as the RTX6000 and the A100. This cloud provider makes it quite easy to access modern GPU cards at a reasonable price, and the UI is also quite simple to use. Also, my experience with their support was excellent.

Takeaways from this Competition

These days there are great models already available for segmentation tasks, like the ones in the previously mentioned Segmentation Models library. But a single model trained at a single resolution will typically not generalize that well; you should use some form of ensemble to improve your results.

Also, you need reasonable access to either GPUs or TPUs to train your computer vision models, so it is really good to have access to a cloud provider. Some models tend to be hard to train on a single GPU, especially if you try higher resolutions.

Where did we land in this competition? Not quite at the top, but also not too far from it:

Thanks to ...

Kaggle, which is really such a great resource for learning about machine and deep learning. It is also a great platform to exchange ideas.

JarvisCloud, such a handy platform to access GPUs at a really fair price.