The essence of a dataframe.

A minimalist Python dataframe library — lightweight, fast, backed by numpy arrays.

from tafra import Tafra
import numpy as np

t = Tafra({
    'group': np.array(['a', 'a', 'b', 'b']),
    'value': np.array([10, 20, 30, 40])
})

result = t.group_by(
    ['group'], {'value': np.sum}
) # 2 rows: a→30, b→70
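Here is a plain-numpy sketch of what the `group_by` call computes. `group_sum` is a hypothetical helper written for illustration. It reproduces the result of the example above but is not tafra's internal implementation.

```python
import numpy as np

def group_sum(keys, values):
    """Hypothetical helper: group rows by `keys` and sum `values` per group."""
    # Sorted unique keys, plus each row's index into that unique list
    uniq, inverse = np.unique(keys, return_inverse=True)
    # Accumulate each row's value into its group's bucket (keeps the integer dtype)
    sums = np.zeros(len(uniq), dtype=values.dtype)
    np.add.at(sums, inverse, values)
    return {'group': uniq, 'value': sums}

out = group_sum(np.array(['a', 'a', 'b', 'b']), np.array([10, 20, 30, 40]))
print(out['group'], out['value'])  # ['a' 'b'] [30 70]
```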

Just Arrays

t['col'] returns the underlying numpy array itself: no wrappers, no copies, no surprises, no new API to learn.
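The minimal sketch below illustrates the "no wrappers, no copies" idea with a dict-of-arrays container. `MiniFrame` is hypothetical, not tafra's class. Tafra's own constructor may validate or convert its inputs, so this sketch does not show whether the stored array is the caller's original object. It only shows what returning the stored array, rather than a wrapper, means for the caller.

```python
import numpy as np

class MiniFrame:
    """Hypothetical dict-of-arrays container, for illustration only."""

    def __init__(self, data):
        # Keep references to the given arrays; nothing is copied here
        self._data = dict(data)

    def __getitem__(self, name):
        # Return the stored ndarray object itself, not a wrapper or a copy
        return self._data[name]

x = np.arange(5)
f = MiniFrame({'x': x})
print(f['x'] is x)   # True: the very same ndarray object
f['x'][0] = 99       # an in-place edit through the container...
print(x[0])          # 99: ...is visible through the original name
```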

SQL on NumPy

GroupBy, Transform, Inner/Left/Cross Join — SQL semantics backed by vectorized numpy.
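To make the join semantics concrete, here is an inner join written with `np.argsort` and `np.searchsorted`. `inner_join_unique_right` is a hypothetical helper that assumes the right-hand keys are unique (a many-to-one join). Tafra's own join methods may handle duplicate keys and work differently.

```python
import numpy as np

def inner_join_unique_right(left_key, left_val, right_key, right_val):
    """Hypothetical many-to-one inner join; assumes right_key has no duplicates."""
    if len(right_key) == 0:
        # Nothing can match an empty right side
        return left_key[:0], left_val[:0], right_val[:0]
    order = np.argsort(right_key)
    sorted_keys = right_key[order]
    # Where each left key would land among the sorted right keys
    pos = np.minimum(np.searchsorted(sorted_keys, left_key), len(sorted_keys) - 1)
    # Keep only left rows whose key is actually present on the right
    matched = sorted_keys[pos] == left_key
    right_idx = order[pos[matched]]
    return left_key[matched], left_val[matched], right_val[right_idx]

keys, lv, rv = inner_join_unique_right(
    np.array([1, 2, 3, 2]), np.array([10, 20, 30, 40]),
    np.array([2, 1]), np.array(['two', 'one']),
)
print(keys, lv, rv)  # [1 2 2] [10 20 40] ['one' 'two' 'two']
```

The left row with key 3 has no partner and is dropped, as SQL's INNER JOIN requires. A left join would keep it and fill the right-hand column with a missing-value marker.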

Built for Numerics

Designed for scientific Python: columns pass directly into numba JIT functions, partitioning is multiprocessing-ready, and any library that speaks numpy works with them without an adapter.
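Here is a sketch of partitioning a dict of equal-length columns into one independent piece per group so each piece can be handed to a worker. `partition` and `group_total` are hypothetical helpers, not tafra's API. The plain `map` keeps the example deterministic.

```python
import numpy as np

def partition(data, key):
    """Hypothetical: split a dict of equal-length arrays into one dict per key value."""
    keys = data[key]
    parts = []
    for k in np.unique(keys):
        # Boolean-mask indexing returns a copy, so each partition is independent
        mask = keys == k
        parts.append({name: col[mask] for name, col in data.items()})
    return parts

def group_total(part):
    # Work function run once per partition; returns plain Python values
    return str(part['group'][0]), float(part['value'].sum())

data = {
    'group': np.array(['a', 'b', 'a', 'b']),
    'value': np.array([1.0, 2.0, 3.0, 4.0]),
}
results = list(map(group_total, partition(data, 'group')))
print(results)  # [('a', 4.0), ('b', 6.0)]
```

Each partition is an ordinary dict of numpy arrays, so it pickles cleanly. For process-level parallelism, `map` could be swapped for `multiprocessing.Pool().map`, provided the work function is defined at module top level.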

Speed Where It Matters

Lower is better.

| Benchmark | tafra | polars | pandas |
|---|---|---|---|
| Construction: 100k rows, 5 columns (ms) | 0.01 | 0.03 | 2.83 |
| Column access: per call (µs) | 0.09 | 0.56 | 10.7 |
| Numba JIT: 1M rows (ms) | 6.88 | 6.90 | 6.95 |
| Transform: 1M rows, 1k groups (ms) | 8.24 (tafra+C) | 9.76 | 25.44 |
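In the usual sense, as in pandas' `groupby(...).transform`, a transform computes a per-group aggregate and repeats it on every row, so the output has the same length as the input. Here is a plain-numpy sketch of a group mean. `transform_mean` is a hypothetical helper and says nothing about how tafra's C-accelerated path works.

```python
import numpy as np

def transform_mean(keys, values):
    """Hypothetical: per-group mean broadcast back to every input row."""
    uniq, inverse = np.unique(keys, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    # Index the per-group means by each row's group id: one value per row
    return (sums / counts)[inverse]

out = transform_mean(np.array([0, 1, 0, 1]), np.array([1.0, 2.0, 3.0, 4.0]))
print(out)  # [2. 3. 2. 3.]
```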
Zero Overhead
from tafra import Tafra
import numpy as np

t = Tafra({
    'x': np.arange(1_000_000)
})

# Direct numpy array
arr = t['x']  # the actual ndarray
np.sum(arr)   # numpy works directly

# Pass straight to numba
from numba import jit
@jit
def square_plus_one(x):
    return x ** 2 + 1
result = square_plus_one(t['x'])
From Anywhere
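import numpy as np
import pandas as pd
from tafra import Tafra

# From a dict of arrays
t = Tafra({
    'a': np.array([1, 2, 3]),
    'b': np.array([4, 5, 6]),
})

# From a pandas DataFrame
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
t = Tafra.from_dataframe(df)

# From a SQL cursor: `query` is a SELECT string, `cur` an open database cursor
t = Tafra.read_sql(query, cur)

# From CSV
t = Tafra.read_csv('data.csv')

# Back to pandas
df = t.to_pandas()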
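CSV carries no type information, so any CSV reader has to guess each column's dtype. The sketch below shows one simple strategy: try `int`, then `float`, then fall back to strings. `read_csv_arrays` is hypothetical and reads from an in-memory string. `Tafra.read_csv` may guess types differently.

```python
import csv
import io
import numpy as np

def read_csv_arrays(text):
    """Hypothetical: parse CSV text into a dict of numpy arrays, guessing dtypes."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    out = {}
    for i, name in enumerate(header):
        raw = [r[i] for r in body]
        # Try the narrowest type first; the first one that parses every value wins
        for cast in (int, float):
            try:
                out[name] = np.array([cast(v) for v in raw])
                break
            except ValueError:
                continue
        else:
            # Neither int nor float parsed every value: keep the column as strings
            out[name] = np.array(raw)
    return out

cols = read_csv_arrays("a,b,c\n1,2.5,x\n2,3.0,y\n")
print({k: v.dtype.kind for k, v in cols.items()})  # {'a': 'i', 'b': 'f', 'c': 'U'}
```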