
Commit ab6f8f4 (parent 8e610d1): Update README.md

1 file changed: +54, -0 lines


README.md

# Data-Sampling-using-Python

In Python, you can sample data with libraries such as NumPy, pandas, and scikit-learn. The examples below show simple random sampling with NumPy and pandas, and stratified sampling with scikit-learn.

### Random Sampling:

**Using NumPy:**
```python
import numpy as np

# Assuming you have a dataset 'data'
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Draw a simple random sample without replacement (no value is picked twice)
sample_size = 5
rng = np.random.default_rng(seed=42)  # seeded generator for reproducible samples
random_sample = rng.choice(data, size=sample_size, replace=False)

print("Random Sample:", random_sample)
```
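
Real datasets usually have several columns per observation. The sketch below assumes a hypothetical 2-D NumPy array `X` with one row per observation. It samples row indices instead of values, so each sampled row stays intact and the same indices can be reused to select matching labels.

```python
import numpy as np

# Hypothetical 2-D dataset: 10 observations (rows) with 2 features each
X = np.arange(20).reshape(10, 2)

rng = np.random.default_rng(seed=0)

# Sample 4 distinct row indices, then index the rows, so each sampled row stays intact
row_idx = rng.choice(X.shape[0], size=4, replace=False)
X_sample = X[row_idx]

print("Sampled row indices:", row_idx)
print("Sampled rows:\n", X_sample)
```

Sampling indices rather than rows is a design choice: if you also have a label array `y`, then `y[row_idx]` gives the labels that belong to the sampled rows.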

**Using pandas:**
```python
import pandas as pd

# Assuming you have a DataFrame 'df' with a column 'column_name'
df = pd.DataFrame({'column_name': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Draw a simple random sample without replacement (random_state makes it reproducible)
sample_size = 5
random_sample = df['column_name'].sample(n=sample_size, replace=False, random_state=42)

print("Random Sample:", random_sample.tolist())
```
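
`sample` also works on a whole DataFrame, which keeps each row's columns together, and it accepts `frac` (a fraction of the rows) instead of `n`. A minimal sketch, assuming a hypothetical DataFrame with a `feature` and a `label` column:

```python
import pandas as pd

# Hypothetical DataFrame with several columns per row
df = pd.DataFrame({
    "feature": range(10),
    "label": ["a", "b"] * 5,
})

# Sample 30% of the rows without replacement; whole rows are kept together
row_sample = df.sample(frac=0.3, random_state=42)

print(row_sample)
```

Passing `replace=True` instead draws rows with replacement, so the same row can appear more than once in the sample.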

### Stratified Sampling:

**Using scikit-learn:**
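```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assuming you have features 'X' and labels 'y' (two classes, four samples each)
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Perform stratified sampling: hold out 25% of the samples (2 of 8) as a test set.
# With stratify=y, the test set needs at least one sample per class (2 classes here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print("Stratified X_train:", X_train)
print("Stratified y_train:", y_train)
print("Stratified X_test:", X_test)
print("Stratified y_test:", y_test)
```

In the stratified sampling example above, `stratify=y` makes `train_test_split` preserve the class proportions of the target variable `y` (here 50% class 0 and 50% class 1) in both the training and test sets. Each class needs at least two samples, and both the training and test sets must be at least as large as the number of classes; otherwise `train_test_split` raises a `ValueError`.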
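
To check what `stratify` does, you can compare class proportions before and after the split. The sketch below uses hypothetical imbalanced labels (75% class 0, 25% class 1); `class_proportions` is an illustrative helper, not a scikit-learn function.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 15 samples of class 0, 5 of class 1 (75% / 25%)
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

def class_proportions(labels):
    """Fraction of samples in each class (labels must be non-negative integers)."""
    return np.bincount(labels) / len(labels)

print("Full data:", class_proportions(y))
print("Train set:", class_proportions(y_train))
print("Test set: ", class_proportions(y_test))
```

With 20 samples and `test_size=0.2`, the test set gets 4 samples, and stratification assigns 3 of them to class 0 and 1 to class 1, matching the 75% / 25% split of the full data.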

These are basic examples; adapt the code to your own dataset and requirements. The right sampling method and its parameters (sample size, sampling with or without replacement, which variable to stratify on) depend on the characteristics of your data and on the goal of your analysis or machine learning task.
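
If you need a stratified sample of a pandas DataFrame rather than a train/test split, one option is `DataFrameGroupBy.sample` (available since pandas 1.1), which samples the same fraction within each group. A minimal sketch, assuming a hypothetical column `group` that defines the strata:

```python
import pandas as pd

# Hypothetical DataFrame: 'group' defines the strata (60% A, 40% B)
df = pd.DataFrame({
    "value": range(10),
    "group": ["A"] * 6 + ["B"] * 4,
})

# Sample 50% of the rows within each group, so group proportions are preserved
stratified_sample = df.groupby("group").sample(frac=0.5, random_state=42)

print(stratified_sample)
print(stratified_sample["group"].value_counts(normalize=True))
```

Here group A contributes 3 of its 6 rows and group B 2 of its 4, so the sample keeps the 60% / 40% mix of the full DataFrame.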