Some of our tests need access to the competition's data, the path to which is stored in our config module.

Load Data

This reads the training CSV into a pandas DataFrame. You can optionally load the version of the CSV that already includes the cross-validation (CV) fold assignments, if you've created it. You can also load the training data with pseudo-labeled examples appended, if you've created a CSV of pseudo-labels.

path, df = load_data(TEST_DATA_PATH, with_folds=False)
df.head()
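To make the behavior described above concrete, here is a minimal sketch of what a loader like this could look like. It is not the actual `load_data` implementation: the name `load_data_sketch`, the file names `train.csv` and `train_folds.csv`, and the `pseudo_labels_path` parameter are all assumptions made for illustration. Only the `with_folds` flag and the `(path, df)` return value come from the call shown above.

```python
from pathlib import Path

import pandas as pd


def load_data_sketch(data_path, with_folds=False, pseudo_labels_path=None):
    """Hypothetical stand-in for `load_data`; file names are assumptions."""
    data_path = Path(data_path)
    # Assumed file names: "train.csv" for the raw training data and
    # "train_folds.csv" for the copy that already has a CV fold column.
    csv_name = "train_folds.csv" if with_folds else "train.csv"
    df = pd.read_csv(data_path / csv_name)

    if pseudo_labels_path is not None:
        # Append the pseudo-labeled rows after the original training rows.
        pseudo_df = pd.read_csv(pseudo_labels_path)
        df = pd.concat([df, pseudo_df], ignore_index=True)

    return data_path, df
```

Returning the path alongside the DataFrame mirrors the `path, df = ...` unpacking used in the test above, so callers can locate the image files relative to the same directory.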

Average Predictions

Let's test this by confirming that if every prediction is 0.0, the average of the predictions is also 0.0.

len(all_zeros_prediction_dfs)

5

all_zeros_prediction_dfs[0]

averaged_preds_df = average_preds(all_zeros_prediction_dfs); averaged_preds_df

assert np.all(averaged_preds_df == 0.)    # Average of a bunch of 0's is 0
test_eq(averaged_preds_df.shape, (5, 4))  # 5 examples, 4 classes
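The all-zeros check only confirms the trivial case, so a sketch with non-zero values may help show what averaging across folds means here. The function below is an assumed implementation, not the actual `average_preds`: it supposes each fold's DataFrame has an `image_id` column plus one probability column per class, and it returns a DataFrame indexed by `image_id`. The name `average_preds_sketch` and the sample values are made up for illustration.

```python
import pandas as pd

CLASSES = ["healthy", "multiple_diseases", "rust", "scab"]


def average_preds_sketch(pred_dfs):
    """Average class probabilities across folds, one row per image_id."""
    # Stack every fold's predictions, then take the per-image mean of each class.
    stacked = pd.concat(pred_dfs, ignore_index=True)
    return stacked.groupby("image_id", sort=False)[CLASSES].mean()


# Two made-up fold predictions for the same two images.
fold_a = pd.DataFrame({"image_id": ["Test_0", "Test_1"],
                       "healthy": [0.2, 0.8], "multiple_diseases": [0.0, 0.0],
                       "rust": [0.4, 0.2], "scab": [0.4, 0.0]})
fold_b = pd.DataFrame({"image_id": ["Test_0", "Test_1"],
                       "healthy": [0.4, 0.6], "multiple_diseases": [0.0, 0.0],
                       "rust": [0.2, 0.2], "scab": [0.4, 0.2]})

averaged = average_preds_sketch([fold_a, fold_b])
```

Grouping by `image_id` rather than averaging row by row means the folds do not need to list the images in the same order.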

Save Averaged Preds

Utility function that loads and averages all test-set prediction CSVs matching the naming pattern "predictions_fold_[0-4].csv", the default naming scheme when the training script is run with 5-fold cross-validation.
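As a rough illustration of the load-and-average step, the sketch below globs a directory for the fold CSVs and averages them per image. It is an assumption about how such a utility could work, not the actual `get_averaged_preds`: the name `get_averaged_preds_sketch`, the `preds_dir` and `pattern` parameters, and the expectation that each CSV has an `image_id` column plus numeric class columns are all hypothetical. It also only returns the averaged DataFrame and does not write anything to disk.

```python
from pathlib import Path

import pandas as pd


def get_averaged_preds_sketch(preds_dir, pattern="predictions_fold_[0-4].csv"):
    """Load every fold CSV matching `pattern` and average them per image_id."""
    csv_paths = sorted(Path(preds_dir).glob(pattern))
    if not csv_paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {preds_dir}")

    # Assumes each CSV has an `image_id` column and one numeric column per class.
    fold_dfs = [pd.read_csv(p) for p in csv_paths]
    stacked = pd.concat(fold_dfs, ignore_index=True)
    return stacked.groupby("image_id", sort=False).mean()
```

The `[0-4]` character class in the glob pattern restricts the match to folds 0 through 4, so stray files such as a hypothetical `predictions_fold_7.csv` are ignored.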

	image_id	healthy	multiple_diseases	rust	scab
0	Train_0	0	0	0	1
1	Train_1	0	1	0	0
2	Train_2	1	0	0	0
3	Train_3	0	0	1	0
4	Train_4	1	0	0	0

Utils

Load Data

`load_data`

Average Predictions

`average_preds`

Save Averaged Preds

`get_averaged_preds`

	image_id	healthy	multiple_diseases	rust	scab
0	Test_0	0.0	0.0	0.0	0.0
1	Test_1	0.0	0.0	0.0	0.0
2	Test_2	0.0	0.0	0.0	0.0
3	Test_3	0.0	0.0	0.0	0.0
4	Test_4	0.0	0.0	0.0	0.0
