## Data Foundations for AI

### NumPy: Numerical Computing

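```python
import numpy as np

# Embeddings are NumPy arrays (the values here are illustrative)
embedding1 = np.array([0.1, 0.5, -0.3, 0.8])
embedding2 = np.array([0.4, 0.2, -0.1, 0.9])

# Cosine similarity: dot product divided by the product of the vector norms
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

sim = cosine_similarity(embedding1, embedding2)

# Row-wise softmax; subtracting the row max keeps the exponentials stable
def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Matrix operations for attention (random placeholder queries and keys)
d_k = 4
queries, keys = np.random.rand(3, d_k), np.random.rand(3, d_k)
attention_scores = np.matmul(queries, keys.T) / np.sqrt(d_k)
attention_weights = softmax(attention_scores)
```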
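When comparing many embeddings at once, it can be convenient to compute every pairwise cosine similarity in a single matrix product instead of calling a function per pair. The following is a minimal sketch under stated assumptions: the helper name `cosine_similarity_matrix`, the toy vectors, and the `1e-12` guard against zero-length rows are illustrative choices, not part of the lesson.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of cosine similarities between rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows (an assumption about the input).
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Three hypothetical 4-dimensional embeddings; the third is a scaled copy of the first.
embeddings = np.array([
    [0.1, 0.5, -0.3, 0.8],
    [0.9, -0.2, 0.4, 0.1],
    [0.2, 1.0, -0.6, 1.6],
])
similarities = cosine_similarity_matrix(embeddings)

# Most similar *other* row for each embedding: mask the diagonal, then take argmax.
masked = similarities.copy()
np.fill_diagonal(masked, -np.inf)
nearest = masked.argmax(axis=1)
print(similarities.round(3))
print(nearest)
```

Normalizing the rows first means the matrix product directly yields cosine values, and because cosine similarity ignores vector length, the scaled copy scores approximately 1.0 against its original.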

### Pandas: Data Preparation

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load and explore training data
df = pd.read_csv("training_data.csv")
print(df.describe())
print(df.isnull().sum())

# Clean and prepare
df = df.dropna(subset=["text", "label"])
df["text"] = df["text"].str.lower().str.strip()
df["text_length"] = df["text"].str.len()

# Split for training
train, test = train_test_split(df, test_size=0.2, random_state=42)
```
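The split above is purely random, so with imbalanced labels the test set may end up with a different class ratio than the training set. One possible way to keep the ratio, sketched here with Pandas only, is to sample the hold-out rows within each label. The toy DataFrame is a hypothetical stand-in for `training_data.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for training_data.csv.
df = pd.DataFrame({
    "text": [f"example {i}" for i in range(20)],
    "label": ["pos"] * 15 + ["neg"] * 5,
})

# Sample 20% of the rows within each label so both splits keep the class ratio.
test = df.groupby("label").sample(frac=0.2, random_state=42)
train = df.drop(test.index)

print(train["label"].value_counts(normalize=True))
print(test["label"].value_counts(normalize=True))
```

With scikit-learn, passing `stratify=df["label"]` to `train_test_split` is another way to get a label-preserving split.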

### Common AI Data Tasks

- **Tokenization stats:** Analyze token distributions with Pandas
- **Embedding analysis:** Compute similarities with NumPy
- **Dataset balancing:** Sample or augment underrepresented classes
- **Feature engineering:** Create numeric features from text
- **Evaluation:** Calculate metrics across test sets
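As a rough illustration of two of these tasks, the sketch below computes per-label token counts and then balances the classes by oversampling. It rests on several assumptions: the toy data is invented, whitespace splitting stands in for a real tokenizer, and random oversampling with replacement is only one of several balancing strategies.

```python
import pandas as pd

# Hypothetical toy dataset with an imbalanced label column.
df = pd.DataFrame({
    "text": ["good movie", "really great acting and plot", "bad", "fine film",
             "loved every minute of it", "terrible pacing"],
    "label": ["pos", "pos", "neg", "pos", "pos", "neg"],
})

# Tokenization stats: whitespace tokens are a stand-in for a real tokenizer.
df["token_count"] = df["text"].str.split().str.len()
token_stats = df.groupby("label")["token_count"].describe()
print(token_stats)

# Dataset balancing: oversample each class (with replacement) up to the largest class size.
target = df["label"].value_counts().max()
balanced = (
    df.groupby("label")
      .sample(n=target, replace=True, random_state=0)
      .reset_index(drop=True)
)
print(balanced["label"].value_counts())
```

Oversampling duplicates rows, so it is usually safer to apply it to the training split only; duplicating rows before splitting can leak copies of the same example into the test set.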

NumPy & Pandas for AI Data

Key Takeaways

Frequently Asked Questions

Is the "Python AI & ML Libraries" course free?

How long does the "Python AI & ML Libraries" course take?

What will I learn in this course?

Do I need prior experience for this course?

Do I get a certificate after completing this course?
