The essence of a dataframe.

A minimalist Python dataframe library — lightweight, fast, backed by numpy arrays.

from tafra import Tafra
import numpy as np

t = Tafra({
    'group': np.array(['a', 'a', 'b', 'b']),
    'value': np.array([10, 20, 30, 40])
})

result = t.group_by(
    ['group'], {'value': np.sum}
) # 2 rows: a→30, b→70

Just Arrays

t['col'] returns the numpy array. No wrappers, no copies, no surprises, no new API to learn.
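The no-copy claim can be illustrated without tafra itself. The sketch below uses a hypothetical stand-in class, `ColumnStore`, which is not part of tafra and says nothing about how tafra is implemented. It only shows what it means for column access to hand back the stored ndarray rather than a wrapper or a copy.

```python
import numpy as np


class ColumnStore:
    """Hypothetical stand-in (not tafra): maps column names to ndarrays."""

    def __init__(self, data):
        # Keep references to the arrays as given; nothing is copied here.
        self._data = dict(data)

    def __getitem__(self, name):
        # Return the stored ndarray itself, not a view object or a copy.
        return self._data[name]


x = np.arange(5)
store = ColumnStore({'x': x})

col = store['x']
same_object = col is x                    # identical Python object
shares_memory = np.shares_memory(col, x)  # same underlying buffer

# Because no copy was made, an in-place write is visible through both names.
col[0] = 99
first_value = int(x[0])
```

Since the returned object is a plain ndarray, every numpy function accepts it with no conversion step.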

SQL on NumPy

GroupBy, Transform, Inner/Left/Cross Join — SQL semantics backed by vectorized numpy.
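For readers unfamiliar with how these SQL operations map onto vectorized array code, here is a minimal sketch in plain numpy. It is not tafra's implementation, and the variable names are illustrative. It only shows the semantics: a group-by returns one row per key, a transform returns one row per input row, and an inner join returns one row per matching pair.

```python
import numpy as np

group = np.array(['a', 'a', 'b', 'b'])
value = np.array([10, 20, 30, 40])

# GROUP BY: one output row per distinct key.
# `inverse` maps every input row to the index of its key in `keys`.
keys, inverse = np.unique(group, return_inverse=True)
sums = np.bincount(inverse, weights=value).astype(value.dtype)

# Transform: the same aggregate, broadcast back to the original row count.
group_total = sums[inverse]

# INNER JOIN on an integer key: every (left, right) pair with equal keys.
# The pairwise comparison uses O(n * m) memory, so this suits small inputs only.
left_id = np.array([1, 2, 3])
right_id = np.array([2, 1, 1])
left_rows, right_rows = np.nonzero(left_id[:, None] == right_id[None, :])
```

Here `left_rows` and `right_rows` are index arrays, so joined columns are obtained by fancy indexing, for example `left_id[left_rows]`.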

Built for Numerics

Designed for scientific Python. Native numba JIT, multiprocessing-ready partitioning, and zero-adapter interop with any library that speaks numpy.
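As a rough illustration of what partitioning for parallel work involves, the sketch below splits row indices by group key using plain numpy. It does not use tafra's own partitioning API, and `partition_sum` is a hypothetical per-partition function. Each chunk of row indices is independent, so the same list could be handed to a worker pool instead of the dict comprehension used here.

```python
import numpy as np

group = np.array(['b', 'a', 'b', 'a', 'c'])
value = np.array([1, 2, 3, 4, 5])

# Stable sort keeps the original row order within each group.
order = np.argsort(group, kind='stable')
sorted_keys = group[order]

# A partition boundary falls wherever the sorted key changes.
boundaries = np.flatnonzero(sorted_keys[1:] != sorted_keys[:-1]) + 1
chunks = np.split(order, boundaries)  # one array of row indices per group


def partition_sum(rows):
    """Hypothetical per-partition work: sum the values of one group."""
    return int(value[rows].sum())


# Run serially here; each call touches only its own rows.
totals = {str(group[rows[0]]): partition_sum(rows) for rows in chunks}
```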

Speed Where It Matters

Lower is better.

Construction: 100k rows, 5 cols (ms)

tafra     0.01
polars    0.03
pandas    2.83

Column Access: per call (µs)

tafra     0.09
polars    0.56
pandas    10.7

Numba JIT: 1M rows (ms)

tafra     6.88
polars    6.90
pandas    6.95

Transform: 1M rows, 1k groups (ms)

tafra+C   8.24
polars    9.76
pandas    25.44
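A per-call figure like the column-access numbers above is typically obtained by timing many calls and dividing by the count. The sketch below shows that method with the standard-library `timeit` module, using a plain dict of arrays as a stand-in for a dataframe. It is not the project's benchmark code and does not reproduce the numbers listed. The lambda adds its own call overhead, so the result is an upper bound on the lookup cost.

```python
import timeit

import numpy as np

# Stand-in data: 5 columns of 100k rows held in a plain dict (not tafra).
columns = {f'c{i}': np.arange(100_000) for i in range(5)}

n_calls = 100_000
total_seconds = timeit.timeit(lambda: columns['c0'], number=n_calls)

# Convert the total to microseconds per call.
per_call_us = total_seconds / n_calls * 1e6
print(f'dict column lookup: {per_call_us:.3f} µs per call')
```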

Zero Overhead

from tafra import Tafra
import numpy as np

t = Tafra({
    'x': np.arange(1_000_000)
})

# Direct numpy array
arr = t['x']  # the actual ndarray
np.sum(arr)   # numpy works directly

# Pass straight to numba
from numba import jit

@jit
def square_plus_one(x):
    return x ** 2 + 1

result = square_plus_one(t['x'])

From Anywhere

# From a dict of arrays
t = Tafra({
    'a': np.array([1, 2, 3]),
    'b': np.array([4, 5, 6]),
})

# From a pandas DataFrame
t = Tafra.from_dataframe(df)

# From a SQL cursor
t = Tafra.read_sql(query, conn)

# From CSV
t = read_csv('data.csv')

# Back to pandas
df = t.to_pandas()