+"""Assignment - making a sklearn estimator and cv splitter.
+
+The goal of this assignment is to implement by yourself:
+
+- a scikit-learn estimator for the KNearestNeighbors for classification
+  tasks and check that it is working properly.
+- a scikit-learn CV splitter where the splits are based on a Pandas
+  DatetimeIndex.
+
+Detailed instructions for question 1:
+The nearest neighbor classifier predicts for a point X_i the target y_k of
+the training sample X_k which is the closest to X_i. We measure proximity with
+the Euclidean distance. The model will be evaluated with the accuracy (the
+fraction of samples correctly classified). You need to implement the `fit`,
+`predict` and `score` methods for this class. The code you write should pass
+the tests we implemented. You can run the tests by calling at the root of the
+repo `pytest test_sklearn_questions.py`. Note that to be fully valid, a
+scikit-learn estimator needs to check that the inputs given to `fit` and
+`predict` are correct using the `check_*` functions imported in the file.
+You can find more information on how they should be used in the following doc:
+https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator.
+Make sure to use them to pass `test_nearest_neighbor_check_estimator`.
+
+
+Detailed instructions for question 2:
+The data to split should contain the index or one column in
+datetime format. The aim is then to split the data between train and test
+sets such that, for each pair of successive months, we learn on the first
+and predict on the following. For example, if you have data distributed from
+November 2020 to March 2021, you have 4 splits. The first split
+will allow to learn on November data and predict on December data, the
+second split to learn on December and predict on January, etc.
+
+We also ask you to respect the pep8 convention: https://pep8.org. This will be
+enforced with `flake8`. You can check that there are no flake8 errors by
+calling `flake8` at the root of the repo.
+
+Finally, you need to write docstrings for the methods you code and for the
+class. The docstrings will be checked using `pydocstyle` that you can also
+call at the root of the repo.
+
+Hints
+-----
+- You can use the function:
+
+    from sklearn.metrics.pairwise import pairwise_distances
+
+  to compute distances between 2 sets of samples.
+"""
 import numpy as np

@@ -24,7 +73,6 @@ def max_index(X):
         raise ValueError("Input must be a numpy array.")
     if X.ndim != 2:
         raise ValueError("Input must be a 2D numpy array.")
-
     # Find the index of the maximum element
     max_pos = np.unravel_index(np.argmax(X), X.shape)
     return max_pos
@@ -42,6 +90,7 @@ def wallis_product(n_terms):
         Number of steps in the Wallis product. Note that `n_terms=0` will
         consider the product to be `1`.

+
     Returns
     -------
     pi : float
@@ -54,5 +103,4 @@ def wallis_product(n_terms):
     for n in range(1, n_terms + 1):
         term = (4 * n ** 2) / (4 * n ** 2 - 1)
         product *= term
-
     return 2 * product
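
The nearest neighbor rule and the accuracy score described in the docstring for question 1 can be illustrated with a minimal plain-Python sketch. This is not the scikit-learn estimator the assignment asks for: the function names `nearest_neighbor_predict` and `accuracy` are hypothetical, the standard-library `math.dist` stands in for `pairwise_distances`, and no input validation with the `check_*` functions is done.

```python
import math


def nearest_neighbor_predict(X_train, y_train, X_test):
    """Return, for each test point, the label of its closest training point."""
    predictions = []
    for x in X_test:
        # Euclidean distance from x to every training sample
        distances = [math.dist(x, x_k) for x_k in X_train]
        # Index of the closest training sample (the first one wins on ties)
        closest = min(range(len(distances)), key=distances.__getitem__)
        predictions.append(y_train[closest])
    return predictions


def accuracy(y_true, y_pred):
    """Return the fraction of samples whose prediction matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy data, made up for illustration only
X_train = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
y_train = ["a", "a", "b"]
X_test = [(0.2, 0.1), (4.0, 4.5)]

y_pred = nearest_neighbor_predict(X_train, y_train, X_test)
score = accuracy(["a", "b"], y_pred)
```

In an actual estimator, the loop over test points would typically be replaced by one call to `pairwise_distances` followed by an `argmin` along the training axis.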
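
The month-pair splitting described for question 2 can be sketched on a plain list of dates. This only illustrates the index logic under stated assumptions: the helper name `month_pair_splits` is hypothetical, it works on `datetime.date` objects rather than a Pandas DatetimeIndex, and it treats "successive" as successive among the months present in the data, so a gap month would still be paired with its neighbors.

```python
from datetime import date


def month_pair_splits(dates):
    """Yield (train_idx, test_idx) for each pair of successive months."""
    # Group positional indices by (year, month)
    by_month = {}
    for i, d in enumerate(dates):
        by_month.setdefault((d.year, d.month), []).append(i)
    # Sort the months chronologically, then pair each one with the next
    months = sorted(by_month)
    for first, second in zip(months, months[1:]):
        yield by_month[first], by_month[second]


# Two samples per month from November 2020 to March 2021 (made-up data)
year_months = [(2020, 11), (2020, 12), (2021, 1), (2021, 2), (2021, 3)]
dates = [date(y, m, day) for (y, m) in year_months for day in (1, 15)]

splits = list(month_pair_splits(dates))
```

With five months of data this yields four splits, matching the example in the docstring: the first trains on the November samples and tests on the December ones.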