Commit 2d3f617

UP my solution
1 parent 5ba6964 commit 2d3f617

1 file changed: +50 -2 lines changed

numpy_questions.py

@@ -1,3 +1,52 @@
+"""Assignment - making a sklearn estimator and cv splitter.
+
+The goal of this assignment is to implement by yourself:
+
+- a scikit-learn estimator for the KNearestNeighbors for classification
+  tasks and check that it is working properly.
+- a scikit-learn CV splitter where the splits are based on a Pandas
+  DateTimeIndex.
+
+Detailed instructions for question 1:
+The nearest neighbor classifier predicts for a point X_i the target y_k of
+the training sample X_k which is the closest to X_i. We measure proximity with
+the Euclidean distance. The model will be evaluated with the accuracy (average
+number of samples correctly classified). You need to implement the `fit`,
+`predict` and `score` methods for this class. The code you write should pass
+the tests we implemented. You can run the tests by calling at the root of the
+repo `pytest test_sklearn_questions.py`. Note that to be fully valid, a
+scikit-learn estimator needs to check that the inputs given to `fit` and
+`predict` are correct using the `check_*` functions imported in the file.
+You can find more information on how they should be used in the following doc:
+https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator.
+Make sure to use them to pass `test_nearest_neighbor_check_estimator`.
+
+
+Detailed instructions for question 2:
+The data to split should contain the index or one column in
+datetime format. Then the aim is to split the data between train and test
+sets where, for each pair of successive months, we learn on the first and
+predict on the following. For example if you have data distributed from
+November 2020 to March 2021, you have 4 splits. The first split
+will learn on November data and predict on December data, the
+second split will learn on December and predict on January, etc.
+
+We also ask you to respect the pep8 convention: https://pep8.org. This will be
+enforced with `flake8`. You can check that there are no flake8 errors by
+calling `flake8` at the root of the repo.
+
+Finally, you need to write docstrings for the methods you code and for the
+class. The docstring will be checked using `pydocstyle` that you can also
+call at the root of the repo.
+
+Hints
+-----
+- You can use the function:
+
+    from sklearn.metrics.pairwise import pairwise_distances
+
+to compute distances between 2 sets of samples.
+"""
 import numpy as np
 
 
@@ -24,7 +73,6 @@ def max_index(X):
         raise ValueError("Input must be a numpy array.")
     if X.ndim != 2:
         raise ValueError("Input must be a 2D numpy array.")
-
     # Find the index of the maximum element
     max_pos = np.unravel_index(np.argmax(X), X.shape)
     return max_pos
@@ -42,6 +90,7 @@ def wallis_product(n_terms):
         Number of steps in the Wallis product. Note that `n_terms=0` will
         consider the product to be `1`.
 
+
     Returns
     -------
     pi : float
@@ -54,5 +103,4 @@ def wallis_product(n_terms):
     for n in range(1, n_terms + 1):
         term = (4 * n**2) / (4 * n**2 - 1)
         product *= term
-
     return 2 * product
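
The docstring added in this commit describes the question 1 rule: each point receives the target of its closest training sample under the Euclidean distance, and the model is scored by accuracy. A minimal sketch of that rule follows, assuming plain NumPy arrays. The helper name `one_nn_predict` and the toy data are hypothetical, and this is not the estimator class the assignment asks for (it has no `fit`, `score`, or `check_*` input validation). The hinted `pairwise_distances` could replace the manual distance computation.

```python
import numpy as np


def one_nn_predict(X_train, y_train, X_test):
    """Return, for each test row, the label of its closest training row."""
    # Squared Euclidean distances, shape (n_test, n_train), via broadcasting.
    diffs = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Index of the nearest training sample for every test row.
    nearest = np.argmin(sq_dists, axis=1)
    return y_train[nearest]


# Hypothetical toy data, for illustration only.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
X_test = np.array([[0.2, 0.1], [4.0, 4.5]])
y_test = np.array([0, 1])

y_pred = one_nn_predict(X_train, y_train, X_test)
# Accuracy: fraction of test samples whose prediction matches the target.
accuracy = np.mean(y_pred == y_test)
```

Taking the argmin of squared distances selects the same neighbor as the true Euclidean distance, since the square root is monotonic.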

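
The question 2 instructions in the docstring describe a split in which each month is used for training and the following month for testing. The sketch below illustrates that idea with NumPy `datetime64` values only. The generator name `monthly_splits` is hypothetical, it is not a scikit-learn splitter class, and it assumes every month in the range has data (a missing month would simply pair the two months around the gap).

```python
import numpy as np


def monthly_splits(dates):
    """Yield (train_idx, test_idx) for each pair of successive months."""
    # Truncate every date to its month, e.g. 2020-11-17 -> 2020-11.
    months = np.asarray(dates, dtype="datetime64[D]").astype("datetime64[M]")
    unique_months = np.unique(months)  # sorted, one entry per month present
    for train_month, test_month in zip(unique_months[:-1], unique_months[1:]):
        train_idx = np.flatnonzero(months == train_month)
        test_idx = np.flatnonzero(months == test_month)
        yield train_idx, test_idx


# Daily data from November 2020 through March 2021, as in the docstring.
dates = np.arange("2020-11-01", "2021-04-01", dtype="datetime64[D]")
splits = list(monthly_splits(dates))
n_splits = len(splits)
```

With five months of data, the generator produces four splits, matching the example in the docstring: the first trains on November and tests on December, the second trains on December and tests on January, and so on.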