The essence of a dataframe.
A minimalist Python dataframe library — lightweight, fast, backed by numpy arrays.
from tafra import Tafra
import numpy as np
t = Tafra({
    'group': np.array(['a', 'a', 'b', 'b']),
    'value': np.array([10, 20, 30, 40])
})

result = t.group_by(
    ['group'], {'value': np.sum}
)  # 2 rows: a→30, b→70
Just Arrays
t['col'] returns the numpy array. No wrappers, no copies, no surprises, no new API to learn.
SQL on NumPy
GroupBy, Transform, Inner/Left/Cross Join — SQL semantics backed by vectorized numpy.
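To make the SQL analogy concrete, here is a minimal sketch in plain numpy of what group-by, transform, and inner-join semantics look like when expressed as vectorized array operations. It illustrates the semantics only and is not tafra's implementation; the `labels_key` and `labels_name` arrays are hypothetical lookup data added for the join.

```python
import numpy as np

group = np.array(['a', 'a', 'b', 'b'])
value = np.array([10, 20, 30, 40])

# Map each row to an integer group id; `keys` holds the distinct groups in sorted order.
keys, group_ids = np.unique(group, return_inverse=True)

# GROUP BY: one output row per group, like SELECT group, SUM(value) ... GROUP BY group.
group_sums = np.bincount(group_ids, weights=value, minlength=len(keys)).astype(value.dtype)

# TRANSFORM: the same aggregate, broadcast back so the output keeps the original row count.
transformed = group_sums[group_ids]

# INNER JOIN: pair rows whose keys are equal. The comparison matrix is
# len(left) x len(right), so this form only suits small inputs.
labels_key = np.array(['a', 'b', 'c'])                # hypothetical right-hand table
labels_name = np.array(['alpha', 'beta', 'gamma'])
left_idx, right_idx = np.nonzero(group[:, None] == labels_key[None, :])
joined_value = value[left_idx]
joined_name = labels_name[right_idx]
```

Group-by collapses the four rows to two (a and b), transform keeps all four rows, and the join drops the unmatched `'c'` row from the right-hand side.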
Built for Numerics
Designed for scientific Python. Native numba JIT, multiprocessing-ready partitioning, and zero-adapter interop with any library that speaks numpy.
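As a rough illustration of what partitioning for parallel work can look like, the sketch below splits a dict of equal-length numpy arrays into one chunk per group. The helper `partition_by` is hypothetical and written in plain numpy; it is not tafra's API, only an example of the kind of per-group split that could be handed to worker processes.

```python
import numpy as np

def partition_by(columns, key):
    """Split a dict of equal-length arrays into one dict per distinct value of `key`."""
    # Stable sort keeps the original row order within each group.
    order = np.argsort(columns[key], kind='stable')
    sorted_keys = columns[key][order]
    # Positions where the sorted key changes mark the partition boundaries.
    boundaries = np.flatnonzero(sorted_keys[1:] != sorted_keys[:-1]) + 1
    parts = {name: np.split(arr[order], boundaries) for name, arr in columns.items()}
    n_parts = len(boundaries) + 1
    return [{name: parts[name][i] for name in columns} for i in range(n_parts)]

columns = {
    'group': np.array(['b', 'a', 'b', 'a']),
    'value': np.array([30, 10, 40, 20]),
}
chunks = partition_by(columns, 'group')

# Each chunk is independent, so the same function could be mapped over
# `chunks` by a process pool instead of this list comprehension.
totals = [int(chunk['value'].sum()) for chunk in chunks]
```

Because every chunk holds plain ndarrays, it can be pickled and sent to a worker without any adapter layer.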
Speed Where It Matters
Lower is better. The charts compare tafra, polars, and pandas on four benchmarks:
Construction: 100k rows, 5 cols (ms)
Column Access: per call (µs)
Numba JIT: 1M rows (ms)
Transform: 1M rows, 1k groups (ms), with tafra shown as tafra+C
Zero Overhead
from tafra import Tafra
import numpy as np
t = Tafra({
    'x': np.arange(1_000_000)
})

# Direct numpy array
arr = t['x']  # the actual ndarray
np.sum(arr)  # numpy works directly

# Pass straight to numba
from numba import jit

@jit
def square_plus_one(x):
    return x ** 2 + 1

result = square_plus_one(t['x'])
From Anywhere
# From a dict of arrays
t = Tafra({
    'a': np.array([1, 2, 3]),
    'b': np.array([4, 5, 6]),
})
# From a pandas DataFrame
t = Tafra.from_dataframe(df)
# From a SQL cursor
t = Tafra.read_sql(query, cursor)
# From CSV
t = Tafra.read_csv('data.csv')
# Back to pandas
df = t.to_pandas()