
Fender6969 OP t1_jcrnppi wrote

Thanks for the response. I think hardcoding things might make the most sense. Ignoring testing the actual data for a minute, let us say I have an ML pipeline with the following units:

  1. Data Engineering: method that queries data, performs further aggregation in Pandas/PySpark
    1. Unit test: hardcode an input to pass into this function and leverage Pytest/unittest to check for the exact output
  2. Model Training: method that engineers features and passes data into Sklearn pipeline, which scales/encodes data and trains ML model
    1. Unit test: check for successful predictions on training data to a degree of accuracy based on your evaluation metric
  3. Model Serving: first method that performs ETL for prediction data and second method that loads Sklearn pipeline object to serve prediction
    1. Unit test:
      1. Module 1: same as Data Engineering
      2. Module 2: check for successful predictions
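
As a rough illustration of the Data Engineering test in item 1, here is a minimal Pytest-style sketch. The function `aggregate_sales` and the column names `customer_id` and `amount` are hypothetical stand-ins for whatever the real aggregation step does; only the pattern (hardcoded input, exact expected output) comes from the list above.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def aggregate_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical aggregation step: total amount per customer."""
    return (
        df.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .sort_values("customer_id")
        .reset_index(drop=True)
    )


def test_aggregate_sales_exact_output():
    # Hardcoded input: small enough to verify the expected result by hand.
    raw = pd.DataFrame({"customer_id": [2, 1, 1], "amount": [5.0, 10.0, 2.5]})
    # Hardcoded expected output, compared exactly (values, dtypes, order).
    expected = pd.DataFrame({"customer_id": [1, 2], "amount": [12.5, 5.0]})
    assert_frame_equal(aggregate_sales(raw), expected)
```

Keeping the query itself out of the function under test (passing in a DataFrame instead) is what lets the test run in CI without a database connection.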

Do the above unit tests make sense to add in a CI pipeline?

1

TheGuywithTehHat t1_jcrsjlo wrote

Most of that makes sense. The only thing I would be concerned about is the model training test.

Firstly, a unit test should test the smallest possible unit. You should have many unit tests to test your model, and you should focus on those tests being as simple as possible. Nearly every function you write should have its own unit test, and no unit test should test more than one function.

Secondly, there is an important difference between verification and validation testing. Verification testing shouldn't test for any particular accuracy threshold or anything like that. It should at most verify things like "model.fit() causes the model to change" or "a linear regression model that is all zeroes produces an output of zero." Verification testing is what you put on your CI pipeline to sanity check your code before it gets merged to master. Validation testing, however, should test model accuracy. It should go on your CD pipeline, and should validate that the model you're trying to push to production isn't low quality.
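
To make the two verification examples concrete, here is a small sketch using scikit-learn's `LinearRegression`. The test names and toy data are made up for illustration, and setting `coef_` and `intercept_` by hand is just one assumed way to build an "all zeroes" model. The last function, `validate_model`, is a hypothetical validation gate with an arbitrary threshold, included only to show how it differs from the verification tests.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def test_fit_changes_model():
    # Verification: fitting on different targets must change the weights.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    model = LinearRegression()
    model.fit(X, 2.0 * X.ravel())
    coef_first = model.coef_.copy()
    model.fit(X, -3.0 * X.ravel())
    assert not np.allclose(coef_first, model.coef_)


def test_zero_weights_predict_zero():
    # Verification: a linear model whose weights are all zero outputs zero.
    model = LinearRegression()
    # Assumption: set the learned attributes directly instead of calling fit().
    model.coef_ = np.zeros(3)
    model.intercept_ = 0.0
    X = np.random.default_rng(0).normal(size=(5, 3))
    assert np.allclose(model.predict(X), 0.0)


def validate_model(model, X_holdout, y_holdout, min_r2=0.8):
    # Validation (CD side): hypothetical quality gate; min_r2 is an
    # arbitrary illustrative threshold, not a recommended value.
    return model.score(X_holdout, y_holdout) >= min_r2
```

Neither verification test mentions accuracy, so they stay fast and deterministic enough for CI, while something like `validate_model` would run against a real holdout set before a deploy.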

2

Fender6969 OP t1_jcrw759 wrote

This makes perfect sense, thank you. I’m going to think through this further. If you have any suggestions for verification/sanity testing for any of the components listed above, please let me know.

1