This file holds functions to create a plant pathology dataset.

Building DataBlock

get_datablock

get_datablock(path:Path, df:DataFrame, presize:int, resize:int, val_fold:int=4)

Build DataBlock from df where images are in path/images.

Args:
    path: Path to data dir containing 'images' subdir
    df: DataFrame created from train CSV with folds
    presize: Size to presize images to before augmentation
    resize: Size of images after augmentation
    val_fold: Fold to use for validation set

path, df = load_data(TEST_DATA_PATH, with_folds=True); df.head()
   image_id  healthy  multiple_diseases  rust  scab  fold
0   Train_0        0                  0     1     0     0
1   Train_1        1                  0     0     0     1
2   Train_2        1                  0     0     0     2
3   Train_3        0                  0     1     0     3
4   Train_4        1                  0     0     0     4
data = get_datablock(path, df, 128, 64)
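
The DataFrame above stores each image's label as one-hot columns (healthy, multiple_diseases, rust, scab). As a minimal sketch of how such a row could be turned into a single class name, the snippet below uses a hypothetical helper, decode_label, which is not part of the documented API. How get_datablock actually reads its labels is not shown here, so treat this as an illustration only. The rows are copied from the df.head() output.

```python
# Hypothetical helper for illustration; not part of the documented API.
LABEL_COLS = ["healthy", "multiple_diseases", "rust", "scab"]


def decode_label(row: dict) -> str:
    """Return the name of the single label column set to 1 in a one-hot row."""
    active = [col for col in LABEL_COLS if row[col] == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active label, got {active}")
    return active[0]


# The five rows shown by df.head(), written as plain dicts.
rows = [
    {"image_id": "Train_0", "healthy": 0, "multiple_diseases": 0, "rust": 1, "scab": 0, "fold": 0},
    {"image_id": "Train_1", "healthy": 1, "multiple_diseases": 0, "rust": 0, "scab": 0, "fold": 1},
    {"image_id": "Train_2", "healthy": 1, "multiple_diseases": 0, "rust": 0, "scab": 0, "fold": 2},
    {"image_id": "Train_3", "healthy": 0, "multiple_diseases": 0, "rust": 1, "scab": 0, "fold": 3},
    {"image_id": "Train_4", "healthy": 1, "multiple_diseases": 0, "rust": 0, "scab": 0, "fold": 4},
]

labels = [decode_label(row) for row in rows]
print(labels)  # ['rust', 'healthy', 'healthy', 'rust', 'healthy']
```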

Building DataLoaders

get_dls

get_dls(path:Path, df:DataFrame, presize:Union[tuple, int]=(682, 1024), resize:int=256, val_fold:int=4, bs:int=256)

Get DataLoaders from df and path.

Args:
    path: Path to data dir containing 'images' subdir
    df: DataFrame created from train CSV with folds
    presize: Size to presize images to before augmentation
    resize: Size of images after augmentation
    val_fold: Fold to use for validation set
    bs: Batch size

dls = get_dls(path, df, bs=3)
dls.show_batch()

All-in-One Function to Load DataLoaders

get_dls_all_in_1

get_dls_all_in_1(data_path:Path, pseudo_labels_path:str=None, presize:Union[tuple, int]=(682, 1024), resize:int=256, val_fold:int=4, bs:int=256)

Get DataLoaders built from train CSV at data_path, optionally with pseudo-labels added.

Args:
    data_path: Path to data directory
    pseudo_labels_path: Path to CSV containing pseudo-labels
    presize: Size to presize images to before augmentation
    resize: Size of images after augmentation
    val_fold: Fold to use for validation set
    bs: Batch size

Setting val_fold to an invalid value (i.e. anything outside 0 to 4) puts all the data into the train set.

dls = get_dls_all_in_1(TEST_DATA_PATH, val_fold=-1, bs=3)
len(dls.train_ds), len(dls.valid_ds)
(5, 0)
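
The (5, 0) result follows from how a fold-based split behaves. The sketch below assumes rows whose fold equals val_fold go to the validation set and all others go to the train set. split_by_fold is a hypothetical stand-in, not the library's actual splitter. With the five test rows (folds 0 to 4), a val_fold that matches no row leaves the validation set empty.

```python
from typing import List, Tuple


def split_by_fold(folds: List[int], val_fold: int) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, valid) by comparing each fold to val_fold.

    Hypothetical illustration of the assumed behavior, not the actual
    implementation.
    """
    train_idx = [i for i, fold in enumerate(folds) if fold != val_fold]
    valid_idx = [i for i, fold in enumerate(folds) if fold == val_fold]
    return train_idx, valid_idx


# Fold column of the five test rows shown earlier.
folds = [0, 1, 2, 3, 4]

# Default val_fold=4: one row is held out for validation.
train_default, valid_default = split_by_fold(folds, val_fold=4)
print(len(train_default), len(valid_default))  # 4 1

# val_fold=-1 matches no row, so every row lands in the train set.
train_all, valid_all = split_by_fold(folds, val_fold=-1)
print(len(train_all), len(valid_all))  # 5 0
```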