UP my solution

clmrie · clmrie · commit 2d3f617ca9a8 · 2025-01-02T15:33:40.000+01:00
diff --git a/numpy_questions.py b/numpy_questions.py
@@ -1,3 +1,52 @@
+"""Assignment - making a sklearn estimator and cv splitter.
+
+The goal of this assignment is to implement by yourself:
+
+- a scikit-learn estimator for the KNearestNeighbors for classification
+  tasks and check that it is working properly.
+- a scikit-learn CV splitter where the splits are based on a Pandas
+  DatetimeIndex.
+
+Detailed instructions for question 1:
+The nearest neighbor classifier predicts for a point X_i the target y_k of
+the training sample X_k which is the closest to X_i. We measure proximity with
+the Euclidean distance. The model will be evaluated with accuracy (the
+fraction of samples correctly classified). You need to implement the `fit`,
+`predict` and `score` methods for this class. The code you write should pass
+the test we implemented. You can run the tests by calling at the root of the
+repo `pytest test_sklearn_questions.py`. Note that to be fully valid, a
+scikit-learn estimator needs to check that the inputs given to `fit` and
+`predict` are correct using the `check_*` functions imported in the file.
+You can find more information on how they should be used in the following doc:
+https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator.
+Make sure to use them to pass `test_nearest_neighbor_check_estimator`.
+
+
+Detailed instructions for question 2:
+The data to split should have its index or one column in
+datetime format. The aim is to split the data into train and test
+sets such that, for each pair of successive months, we learn on the first
+and predict on the following. For example, if you have data distributed from
+November 2020 to March 2021, you have 4 splits. The first split
+will allow to learn on November data and predict on December data, the
+second split to learn on December and predict on January, etc.
+
+We also ask you to respect the pep8 convention: https://pep8.org. This will be
+enforced with `flake8`. You can check that there are no flake8 errors by
+calling `flake8` at the root of the repo.
+
+Finally, you need to write docstrings for the methods you code and for the
+class. The docstring will be checked using `pydocstyle` that you can also
+call at the root of the repo.
+
+Hints
+-----
+- You can use the function:
+
+from sklearn.metrics.pairwise import pairwise_distances
+
+to compute distances between 2 sets of samples.
+"""
 import numpy as np
 
 
@@ -24,7 +73,6 @@ def max_index(X):
         raise ValueError("Input must be a numpy array.")
     if X.ndim != 2:
         raise ValueError("Input must be a 2D numpy array.")
-    
     # Find the index of the maximum element
     max_pos = np.unravel_index(np.argmax(X), X.shape)
     return max_pos
@@ -42,6 +90,7 @@ def wallis_product(n_terms):
         Number of steps in the Wallis product. Note that `n_terms=0` will
         consider the product to be `1`.
 
+
     Returns
     -------
     pi : float
@@ -54,5 +103,4 @@ def wallis_product(n_terms):
     for n in range(1, n_terms + 1):
         term = (4 * n**2) / (4 * n**2 - 1)
         product *= term
-    
     return 2 * product
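
For readers who want to see what the loop in the last hunk computes, here is a minimal standalone sketch. It assumes the function starts from `product = 1.0`, which the diff context does not show, and it leaves out the docstring and any input checks the real file may have. The printed error column illustrates that the Wallis product approaches pi from below and only slowly.

```python
import math


def wallis_product(n_terms):
    """Approximate pi with the first `n_terms` factors of the Wallis product."""
    # Assumption: the product starts at 1.0, so n_terms=0 returns 2.0.
    product = 1.0
    for n in range(1, n_terms + 1):
        term = (4 * n**2) / (4 * n**2 - 1)
        product *= term
    # The infinite product converges to pi / 2, hence the factor 2.
    return 2 * product


# Print the approximation and its absolute error for a few term counts.
for n_terms in (0, 1, 10, 1000):
    approx = wallis_product(n_terms)
    print(n_terms, approx, abs(math.pi - approx))
```

Because every factor is greater than 1, each extra term moves the estimate up towards pi, so a large `n_terms` is needed for more than a few correct digits.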