{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d7d8810c",
   "metadata": {},
   "source": [
    "# Python and NumPy Basics for Machine Learning\n",
    "\n",
    "This notebook covers the essential Python and NumPy concepts needed for machine learning, extracted from our course slides with additional practical examples."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9554e58f",
   "metadata": {},
   "source": [
    "## 1. Python Basics Review\n",
    "\n",
    "Let's start with fundamental Python concepts that we'll use throughout the course."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75fb2de5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variables and Data Types\n",
    "# Numbers\n",
    "x = 42          # integer\n",
    "y = 3.14        # float\n",
    "z = 2 + 3j      # complex number\n",
    "\n",
    "print(f\"Integer: {x}, type: {type(x)}\")\n",
    "print(f\"Float: {y}, type: {type(y)}\")\n",
    "print(f\"Complex: {z}, type: {type(z)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46544990",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Collections - fundamental data structures\n",
    "my_list = [1, 2, 3, 4]      # mutable (can be changed)\n",
    "my_tuple = (1, 2, 3, 4)     # immutable (cannot be changed)\n",
    "my_dict = {'a': 1, 'b': 2}  # key-value pairs\n",
    "\n",
    "print(f\"List: {my_list}\")\n",
    "print(f\"Tuple: {my_tuple}\")\n",
    "print(f\"Dictionary: {my_dict}\")\n",
    "\n",
    "# Demonstrate mutability\n",
    "my_list[0] = 10  # This works\n",
    "print(f\"Modified list: {my_list}\")\n",
    "\n",
    "# my_tuple[0] = 10  # This would cause an error!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "823bb277",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Control Flow - loops and conditionals\n",
    "print(\"Even numbers from 0 to 8:\")\n",
    "for i in range(5):\n",
    "    if i % 2 == 0:\n",
    "        print(f\"{i} is even\")\n",
    "    else:\n",
    "        print(f\"{i} is odd\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e936848",
   "metadata": {},
   "outputs": [],
   "source": [
    "# List comprehensions - a Pythonic way to create lists\n",
    "numbers = [1, 2, 3, 4, 5]\n",
    "squared = [x**2 for x in numbers]\n",
    "\n",
    "print(f\"Original: {numbers}\")\n",
    "print(f\"Squared: {squared}\")\n",
    "\n",
    "# More complex example: filter even numbers and square them\n",
    "even_squared = [x**2 for x in numbers if x % 2 == 0]\n",
    "print(f\"Even numbers squared: {even_squared}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd85176f",
   "metadata": {},
   "source": [
    "## 2. Introduction to NumPy\n",
    "\n",
    "NumPy is the foundation of scientific computing in Python. It provides efficient operations on arrays of numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d69b80f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import NumPy (standard convention)\n",
    "import numpy as np\n",
    "\n",
    "# Check NumPy version\n",
    "print(f\"NumPy version: {np.__version__}\")\n",
    "\n",
    "# Why NumPy? Performance comparison\n",
    "python_list = [1, 2, 3, 4]\n",
    "numpy_array = np.array([1, 2, 3, 4])\n",
    "\n",
    "print(f\"Python list: {python_list}\")\n",
    "print(f\"NumPy array: {numpy_array}\")\n",
    "print(f\"Array type: {type(numpy_array)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e456ac0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Vectorized operations - the power of NumPy\n",
    "python_list = [1, 2, 3, 4]\n",
    "numpy_array = np.array([1, 2, 3, 4])\n",
    "\n",
    "# With Python lists, you need a loop\n",
    "python_result = []\n",
    "for x in python_list:\n",
    "    python_result.append(x * 2)\n",
    "print(f\"Python way: {python_result}\")\n",
    "\n",
    "# With NumPy, apply operation to entire array at once\n",
    "numpy_result = numpy_array * 2\n",
    "print(f\"NumPy way: {numpy_result}\")\n",
    "\n",
    "# This is much faster for large arrays!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e61deb9",
   "metadata": {},
   "source": [
    "## 3. Creating NumPy Arrays\n",
    "\n",
    "There are many ways to create NumPy arrays depending on your needs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e60f6310",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Different ways to create arrays\n",
    "a = np.array([1, 2, 3, 4])          # From a list\n",
    "b = np.zeros(5)                      # Array of zeros\n",
    "c = np.ones((2, 3))                  # 2x3 array of ones\n",
    "d = np.arange(0, 10, 2)              # [0, 2, 4, 6, 8] - start, stop, step\n",
    "e = np.linspace(0, 1, 5)             # 5 evenly spaced points from 0 to 1\n",
    "f = np.random.random((3, 3))         # Random 3x3 matrix\n",
    "\n",
    "print(\"From list:\", a)\n",
    "print(\"Zeros:\", b)\n",
    "print(\"Ones (2x3):\")\n",
    "print(c)\n",
    "print(\"Range with step:\", d)\n",
    "print(\"Linspace:\", e)\n",
    "print(\"Random 3x3:\")\n",
    "print(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24229810",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Array properties - important to understand your data\n",
    "array_2d = np.array([[1, 2, 3], [4, 5, 6]])\n",
    "\n",
    "print(f\"Array:\")\n",
    "print(array_2d)\n",
    "print(f\"Shape: {array_2d.shape}\")           # Dimensions: (rows, columns)\n",
    "print(f\"Data type: {array_2d.dtype}\")       # Type of elements\n",
    "print(f\"Number of dimensions: {array_2d.ndim}\")  # 1D, 2D, 3D, etc.\n",
    "print(f\"Total elements: {array_2d.size}\")   # Total number of elements\n",
    "print(f\"Memory usage: {array_2d.nbytes} bytes\")  # Memory consumption"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40ae27de",
   "metadata": {},
   "source": [
    "## 4. Basic Array Operations\n",
    "\n",
    "NumPy allows element-wise operations and mathematical functions on entire arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c60ff920",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic arithmetic operations\n",
    "a = np.array([1, 2, 3, 4])\n",
    "b = np.array([5, 6, 7, 8])\n",
    "\n",
    "print(f\"a = {a}\")\n",
    "print(f\"b = {b}\")\n",
    "print(f\"a + b = {a + b}\")     # Element-wise addition\n",
    "print(f\"a - b = {a - b}\")     # Element-wise subtraction\n",
    "print(f\"a * b = {a * b}\")     # Element-wise multiplication (NOT matrix multiplication)\n",
    "print(f\"a / b = {a / b}\")     # Element-wise division\n",
    "print(f\"a ** 2 = {a ** 2}\")   # Element-wise power"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e1e55c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Mathematical functions\n",
    "a = np.array([1, 4, 9, 16])\n",
    "\n",
    "print(f\"Original array: {a}\")\n",
    "print(f\"Square root: {np.sqrt(a)}\")\n",
    "print(f\"Exponential: {np.exp(a)}\")\n",
    "print(f\"Natural log: {np.log(a)}\")\n",
    "print(f\"Sine: {np.sin(a)}\")\n",
    "print(f\"Cosine: {np.cos(a)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5233378",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Statistical operations\n",
    "data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n",
    "\n",
    "print(f\"Data: {data}\")\n",
    "print(f\"Sum: {np.sum(data)}\")\n",
    "print(f\"Mean: {np.mean(data)}\")\n",
    "print(f\"Standard deviation: {np.std(data)}\")\n",
    "print(f\"Minimum: {np.min(data)}\")\n",
    "print(f\"Maximum: {np.max(data)}\")\n",
    "print(f\"Median: {np.median(data)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fc0af85",
   "metadata": {},
   "source": [
    "## 5. Array Indexing and Slicing\n",
    "\n",
    "Accessing and modifying array elements is crucial for data manipulation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dea63f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1D array indexing\n",
    "arr = np.array([10, 20, 30, 40, 50])\n",
    "\n",
    "print(f\"Array: {arr}\")\n",
    "print(f\"First element (index 0): {arr[0]}\")\n",
    "print(f\"Last element (index -1): {arr[-1]}\")\n",
    "print(f\"Second to fourth (index 1:4): {arr[1:4]}\")\n",
    "print(f\"Every other element: {arr[::2]}\")\n",
    "print(f\"Reverse array: {arr[::-1]}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f17b163",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2D array indexing\n",
    "matrix = np.array([[1, 2, 3], \n",
    "                   [4, 5, 6], \n",
    "                   [7, 8, 9]])\n",
    "\n",
    "print(\"Matrix:\")\n",
    "print(matrix)\n",
    "print(f\"Element at row 0, column 1: {matrix[0, 1]}\")\n",
    "print(f\"First row: {matrix[0, :]}\")\n",
    "print(f\"First column: {matrix[:, 0]}\")\n",
    "print(f\"2x2 submatrix (top-left):\")\n",
    "print(matrix[:2, :2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c032d42",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Boolean indexing - very powerful for data filtering\n",
    "data = np.array([1, 5, 3, 8, 2, 9, 4])\n",
    "\n",
    "print(f\"Original data: {data}\")\n",
    "\n",
    "# Create boolean mask\n",
    "mask = data > 5\n",
    "print(f\"Mask (elements > 5): {mask}\")\n",
    "\n",
    "# Apply mask to get elements\n",
    "large_values = data[mask]\n",
    "print(f\"Values > 5: {large_values}\")\n",
    "\n",
    "# Can do it in one line\n",
    "small_values = data[data <= 3]\n",
    "print(f\"Values <= 3: {small_values}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7409d5ed",
   "metadata": {},
   "source": [
    "## 6. Array Reshaping and Broadcasting\n",
    "\n",
    "Understanding shapes and how arrays interact is crucial for machine learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0dfb45d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reshaping arrays\n",
    "original = np.arange(12)  # [0, 1, 2, ..., 11]\n",
    "print(f\"Original array: {original}\")\n",
    "print(f\"Shape: {original.shape}\")\n",
    "\n",
    "# Reshape to 2D\n",
    "matrix_3x4 = original.reshape(3, 4)\n",
    "print(f\"\\nReshaped to 3x4:\")\n",
    "print(matrix_3x4)\n",
    "\n",
    "# Reshape to different dimensions\n",
    "matrix_2x6 = original.reshape(2, 6)\n",
    "print(f\"\\nReshaped to 2x6:\")\n",
    "print(matrix_2x6)\n",
    "\n",
    "# Flatten back to 1D\n",
    "flattened = matrix_3x4.flatten()\n",
    "print(f\"\\nFlattened: {flattened}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c59423f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Broadcasting - performing operations on arrays of different shapes\n",
    "\n",
    "# Scalar with array\n",
    "arr = np.array([1, 2, 3, 4])\n",
    "result = arr + 10  # Adds 10 to each element\n",
    "print(f\"Array: {arr}\")\n",
    "print(f\"Array + 10: {result}\")\n",
    "\n",
    "# Array with smaller array\n",
    "matrix = np.array([[1, 2, 3], \n",
    "                   [4, 5, 6]])\n",
    "vector = np.array([10, 20, 30])\n",
    "\n",
    "print(f\"\\nMatrix shape: {matrix.shape}\")\n",
    "print(matrix)\n",
    "print(f\"\\nVector shape: {vector.shape}\")\n",
    "print(vector)\n",
    "\n",
    "# Broadcasting: vector is added to each row of matrix\n",
    "broadcast_result = matrix + vector\n",
    "print(f\"\\nResult of matrix + vector:\")\n",
    "print(broadcast_result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2209244b",
   "metadata": {},
   "source": [
    "## 7. Working with Multi-dimensional Arrays\n",
    "\n",
    "Real-world data often comes in higher dimensions (images, time series, etc.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c408064",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating and working with 3D arrays\n",
    "# Think of this as a stack of 2D matrices\n",
    "array_3d = np.random.randint(0, 10, size=(2, 3, 4))  # 2 matrices of 3x4\n",
    "\n",
    "print(f\"3D array shape: {array_3d.shape}\")\n",
    "print(f\"3D array:\")\n",
    "print(array_3d)\n",
    "\n",
    "# Access different parts\n",
    "print(f\"\\nFirst matrix (index 0):\")\n",
    "print(array_3d[0])\n",
    "\n",
    "print(f\"\\nElement at position [1, 2, 3]: {array_3d[1, 2, 3]}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23bcc3b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Operations along specific axes\n",
    "matrix = np.array([[1, 2, 3], \n",
    "                   [4, 5, 6], \n",
    "                   [7, 8, 9]])\n",
    "\n",
    "print(\"Matrix:\")\n",
    "print(matrix)\n",
    "\n",
    "# Sum along different axes\n",
    "print(f\"\\nSum of all elements: {np.sum(matrix)}\")\n",
    "print(f\"Sum along axis 0 (columns): {np.sum(matrix, axis=0)}\")\n",
    "print(f\"Sum along axis 1 (rows): {np.sum(matrix, axis=1)}\")\n",
    "\n",
    "# Mean along axes\n",
    "print(f\"\\nMean along axis 0 (columns): {np.mean(matrix, axis=0)}\")\n",
    "print(f\"Mean along axis 1 (rows): {np.mean(matrix, axis=1)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f2dd5dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Array concatenation and splitting\n",
    "arr1 = np.array([1, 2, 3])\n",
    "arr2 = np.array([4, 5, 6])\n",
    "\n",
    "# Concatenate along different axes\n",
    "concat_horizontal = np.concatenate([arr1, arr2])\n",
    "print(f\"Horizontal concatenation: {concat_horizontal}\")\n",
    "\n",
    "# For 2D arrays\n",
    "mat1 = np.array([[1, 2], [3, 4]])\n",
    "mat2 = np.array([[5, 6], [7, 8]])\n",
    "\n",
    "# Stack vertically (along rows)\n",
    "vertical_stack = np.vstack([mat1, mat2])\n",
    "print(f\"\\nVertical stack:\")\n",
    "print(vertical_stack)\n",
    "\n",
    "# Stack horizontally (along columns)\n",
    "horizontal_stack = np.hstack([mat1, mat2])\n",
    "print(f\"\\nHorizontal stack:\")\n",
    "print(horizontal_stack)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d30feefb",
   "metadata": {},
   "source": [
    "## 8. Linear Algebra with NumPy\n",
    "\n",
    "Essential operations for machine learning algorithms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a1db220",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Vector operations\n",
    "a = np.array([2, 4, 6])\n",
    "b = np.array([1, 3, 5])\n",
    "\n",
    "print(f\"Vector a: {a}\")\n",
    "print(f\"Vector b: {b}\")\n",
    "\n",
    "# Dot product (very important in ML)\n",
    "dot_product = np.dot(a, b)\n",
    "print(f\"Dot product: {dot_product}\")\n",
    "\n",
    "# Vector magnitude (length)\n",
    "magnitude_a = np.linalg.norm(a)\n",
    "magnitude_b = np.linalg.norm(b)\n",
    "print(f\"Magnitude of a: {magnitude_a:.2f}\")\n",
    "print(f\"Magnitude of b: {magnitude_b:.2f}\")\n",
    "\n",
    "# Unit vector (normalized)\n",
    "unit_vector_a = a / magnitude_a\n",
    "print(f\"Unit vector a: {unit_vector_a}\")\n",
    "print(f\"Magnitude of unit vector: {np.linalg.norm(unit_vector_a):.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae094ffb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Matrix operations\n",
    "A = np.array([[1, 2], \n",
    "              [3, 4], \n",
    "              [5, 6]])\n",
    "B = np.array([[7, 8], \n",
    "              [9, 10]])\n",
    "\n",
    "print(\"Matrix A (3x2):\")\n",
    "print(A)\n",
    "print(\"\\nMatrix B (2x2):\")\n",
    "print(B)\n",
    "\n",
    "# Matrix multiplication (different from element-wise multiplication)\n",
    "matrix_mult = np.dot(A, B)  # or A @ B\n",
    "print(f\"\\nMatrix multiplication A @ B:\")\n",
    "print(matrix_mult)\n",
    "\n",
    "# Transpose\n",
    "A_transpose = A.T\n",
    "print(f\"\\nTranspose of A:\")\n",
    "print(A_transpose)\n",
    "print(f\"Shape changed from {A.shape} to {A_transpose.shape}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37aefbf1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example: Simple linear regression setup\n",
    "# This demonstrates how linear algebra is used in ML\n",
    "\n",
    "# Generate sample data\n",
    "np.random.seed(42)  # For reproducible results\n",
    "n_samples, n_features = 100, 3\n",
    "\n",
    "# Feature matrix X (each row is a data point)\n",
    "X = np.random.randn(n_samples, n_features)\n",
    "\n",
    "# True weights (what we want to learn)\n",
    "true_weights = np.array([1.5, -2.0, 0.5])\n",
    "\n",
    "# Generate target values with some noise\n",
    "noise = np.random.randn(n_samples) * 0.1\n",
    "y = X @ true_weights + noise  # @ is matrix multiplication\n",
    "\n",
    "print(f\"Data shape: {X.shape}\")\n",
    "print(f\"Target shape: {y.shape}\")\n",
    "print(f\"True weights: {true_weights}\")\n",
    "\n",
    "# Add bias term (intercept)\n",
    "X_with_bias = np.column_stack([np.ones(n_samples), X])\n",
    "print(f\"X with bias shape: {X_with_bias.shape}\")\n",
    "\n",
    "# Analytical solution: w = (X^T X)^(-1) X^T y\n",
    "XTX_inv = np.linalg.inv(X_with_bias.T @ X_with_bias)\n",
    "estimated_weights = XTX_inv @ X_with_bias.T @ y\n",
    "\n",
    "print(f\"\\nEstimated weights (with bias): {estimated_weights}\")\n",
    "print(f\"True weights (with bias=0): [0, {true_weights[0]}, {true_weights[1]}, {true_weights[2]}]\")\n",
    "print(f\"Error: {np.abs(estimated_weights[1:] - true_weights)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "801f8b05",
   "metadata": {},
   "source": [
    "## 9. Practical Examples for Machine Learning\n",
    "\n",
    "Common data preprocessing tasks using NumPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b6483f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data normalization - important preprocessing step\n",
    "# Generate sample dataset\n",
    "np.random.seed(42)\n",
    "data = np.random.randn(50, 3) * [10, 100, 0.1] + [50, 500, 5]\n",
    "\n",
    "print(\"Original data statistics:\")\n",
    "print(f\"Mean: {data.mean(axis=0)}\")\n",
    "print(f\"Std: {data.std(axis=0)}\")\n",
    "print(f\"Min: {data.min(axis=0)}\")\n",
    "print(f\"Max: {data.max(axis=0)}\")\n",
    "\n",
    "# Z-score normalization (zero mean, unit variance)\n",
    "data_zscore = (data - data.mean(axis=0)) / data.std(axis=0)\n",
    "\n",
    "print(\"\\nAfter Z-score normalization:\")\n",
    "print(f\"Mean: {data_zscore.mean(axis=0)}\")\n",
    "print(f\"Std: {data_zscore.std(axis=0)}\")\n",
    "\n",
    "# Min-Max normalization (scale to [0, 1])\n",
    "data_minmax = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))\n",
    "\n",
    "print(\"\\nAfter Min-Max normalization:\")\n",
    "print(f\"Min: {data_minmax.min(axis=0)}\")\n",
    "print(f\"Max: {data_minmax.max(axis=0)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a5c481e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train-test split implementation\n",
    "def train_test_split_numpy(X, y, test_size=0.2, random_state=None):\n",
    "    \"\"\"Simple train-test split using NumPy\"\"\"\n",
    "    if random_state:\n",
    "        np.random.seed(random_state)\n",
    "    \n",
    "    n_samples = len(X)\n",
    "    n_test = int(n_samples * test_size)\n",
    "    \n",
    "    # Random permutation of indices\n",
    "    indices = np.random.permutation(n_samples)\n",
    "    \n",
    "    # Split indices\n",
    "    test_idx = indices[:n_test]\n",
    "    train_idx = indices[n_test:]\n",
    "    \n",
    "    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]\n",
    "\n",
    "# Example usage\n",
    "X = np.random.randn(100, 3)\n",
    "y = np.random.randint(0, 2, 100)\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split_numpy(X, y, test_size=0.2, random_state=42)\n",
    "\n",
    "print(f\"Original data: {len(X)} samples\")\n",
    "print(f\"Training set: {len(X_train)} samples\")\n",
    "print(f\"Test set: {len(X_test)} samples\")\n",
    "print(f\"Test ratio: {len(X_test) / len(X):.1%}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "327aee79",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Computing basic statistics for analysis\n",
    "# Generate sample dataset\n",
    "np.random.seed(42)\n",
    "dataset = np.random.randn(200, 4)\n",
    "\n",
    "print(\"Dataset shape:\", dataset.shape)\n",
    "print(\"\\nBasic statistics:\")\n",
    "print(f\"Mean of each feature: {dataset.mean(axis=0)}\")\n",
    "print(f\"Standard deviation: {dataset.std(axis=0)}\")\n",
    "print(f\"Variance: {dataset.var(axis=0)}\")\n",
    "print(f\"Minimum values: {dataset.min(axis=0)}\")\n",
    "print(f\"Maximum values: {dataset.max(axis=0)}\")\n",
    "\n",
    "# Correlation matrix between features\n",
    "correlation_matrix = np.corrcoef(dataset.T)  # Transpose for feature correlations\n",
    "print(f\"\\nCorrelation matrix:\")\n",
    "print(correlation_matrix)\n",
    "\n",
    "# Find highly correlated features (correlation > 0.5)\n",
    "high_corr_mask = np.abs(correlation_matrix) > 0.5\n",
    "# Remove diagonal (feature with itself)\n",
    "np.fill_diagonal(high_corr_mask, False)\n",
    "\n",
    "high_corr_pairs = np.where(high_corr_mask)\n",
    "if len(high_corr_pairs[0]) > 0:\n",
    "    print(f\"\\nHighly correlated feature pairs:\")\n",
    "    for i, j in zip(high_corr_pairs[0], high_corr_pairs[1]):\n",
    "        print(f\"Features {i} and {j}: correlation = {correlation_matrix[i, j]:.3f}\")\n",
    "else:\n",
    "    print(\"\\nNo highly correlated features found.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b77f26e1",
   "metadata": {},
   "source": [
    "## 10. Summary and Next Steps\n",
    "\n",
    "You now have the essential Python and NumPy skills needed for machine learning!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c432506",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quick review: What we've covered\n",
    "print(\"Python and NumPy Basics - Summary:\")\n",
    "print(\"\\n1. Python fundamentals:\")\n",
    "print(\"   - Variables and data types\")\n",
    "print(\"   - Lists, tuples, dictionaries\")\n",
    "print(\"   - Control flow and list comprehensions\")\n",
    "\n",
    "print(\"\\n2. NumPy essentials:\")\n",
    "print(\"   - Array creation and properties\")\n",
    "print(\"   - Vectorized operations\")\n",
    "print(\"   - Indexing and slicing\")\n",
    "print(\"   - Broadcasting\")\n",
    "print(\"   - Linear algebra operations\")\n",
    "\n",
    "print(\"\\n3. ML preprocessing:\")\n",
    "print(\"   - Data normalization\")\n",
    "print(\"   - Train-test splitting\")\n",
    "print(\"   - Statistical analysis\")\n",
    "\n",
    "print(\"\\nYou're ready for machine learning algorithms!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e80be967",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test your understanding - try these exercises:\n",
    "\n",
    "# Exercise 1: Create a 5x5 matrix of random numbers and find:\n",
    "# - The sum of each row\n",
    "# - The maximum value in each column\n",
    "# - All values greater than 0.5\n",
    "\n",
    "print(\"Exercise 1:\")\n",
    "matrix = np.random.random((5, 5))\n",
    "print(\"Random 5x5 matrix:\")\n",
    "print(matrix)\n",
    "print(f\"Sum of each row: {matrix.sum(axis=1)}\")\n",
    "print(f\"Max of each column: {matrix.max(axis=0)}\")\n",
    "print(f\"Number of values > 0.5: {np.sum(matrix > 0.5)}\")\n",
    "\n",
    "# Exercise 2: Normalize a dataset and verify the result\n",
    "print(\"\\nExercise 2:\")\n",
    "data = np.random.randn(100, 3) * [5, 10, 2] + [10, 50, 5]\n",
    "normalized = (data - data.mean(axis=0)) / data.std(axis=0)\n",
    "print(f\"Original mean: {data.mean(axis=0)}\")\n",
    "print(f\"Normalized mean: {normalized.mean(axis=0)}\")\n",
    "print(f\"Normalized std: {normalized.std(axis=0)}\")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}