Column Operations

Tafra uses dict-like access for columns and provides methods for selecting, renaming, updating, and deleting columns.

Accessing Columns

By name (__getitem__ with str)

Returns the raw numpy.ndarray for the column:

from tafra import Tafra
import numpy as np

t = Tafra({
    'x': np.array([1, 2, 3]),
    'y': np.array([4.0, 5.0, 6.0]),
    'name': np.array(['a', 'b', 'c']),
})

arr = t['x']
print(type(arr))
Output
<class 'numpy.ndarray'>

By integer index

Returns a single-row Tafra:

row = t[0]
print(row.rows)
Output
1

By slice

Returns a sliced Tafra:

first_two = t[0:2]
print(first_two.rows)
Output
2

By boolean array

Filters rows where the condition is True:

mask = t['x'] > 1
filtered = t[mask]   # keeps rows where mask is True: x=[2, 3], y=[5.0, 6.0], name=['b', 'c']
print(filtered['x'])
Output
[2 3]
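The per-column behavior matches plain numpy boolean indexing; a numpy-only sketch of the same filtering, applying one mask to every column array:

```python
import numpy as np

# Plain-numpy equivalent of boolean-mask filtering: apply the same
# boolean mask to every column array in a dict of columns.
data = {
    'x': np.array([1, 2, 3]),
    'y': np.array([4.0, 5.0, 6.0]),
    'name': np.array(['a', 'b', 'c']),
}
mask = data['x'] > 1                              # array([False, True, True])
filtered = {k: v[mask] for k, v in data.items()}
print(filtered['x'])                              # [2 3]
print(filtered['name'])                           # ['b' 'c']
```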

By list of column names

Returns a Tafra with only the listed columns (like select):

subset = t[['x', 'name']]
print(subset.columns)
Output
('x', 'name')

Setting Columns

Assign an array, list, or scalar to a column name. The value is validated for length and converted to an ndarray:

t['z'] = np.array([7, 8, 9])     # new column
t['x'] = np.array([10, 20, 30])  # overwrite existing column
t['flag'] = True                 # scalar broadcast to all rows
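The scalar case behaves like broadcasting the value across every row; in plain numpy terms (an illustrative sketch, not Tafra's internal code):

```python
import numpy as np

# Broadcasting a scalar to a full column of length `rows` --
# an illustrative equivalent of `t['flag'] = True` on a 3-row Tafra.
rows = 3
flag = np.full(rows, True)
print(flag)          # [ True  True  True]
print(flag.dtype)    # bool
```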

Dict-like Interface

keys(), values(), items()

print(t.keys())
print(t.values())
print(t.items())
Output
dict_keys(['x', 'y', 'name', 'z', 'flag'])
dict_values([array([10, 20, 30]), array([4., 5., 6.]), ...])
dict_items([('x', array([10, 20, 30])), ...])

get()

Returns the column array, or a default if the column does not exist:

arr = t.get('x')             # array([10, 20, 30])
arr = t.get('missing', None) # None

Selecting Columns

select() returns a new Tafra with only the specified columns. This does not copy the underlying data -- call .copy() if you need independent arrays.

sub = t.select(['x', 'y'])
print(sub.columns)
Output
('x', 'y')
# With copy
sub_copy = t.select(['x', 'y']).copy()
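The no-copy behavior can be checked with numpy's np.shares_memory. A sketch of the difference using plain arrays (analogous to the arrays select() and .copy() hand back):

```python
import numpy as np

# select() returns the same underlying ndarray objects;
# .copy() allocates independent ones.
original = np.array([1, 2, 3])
view = original                  # analogous to t.select(['x'])['x']
independent = original.copy()    # analogous to t.select(['x']).copy()['x']

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```

Mutating a shared array is visible through every reference to it, which is why .copy() matters when you plan to modify the result.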

Updating from Another Tafra

update() merges columns from another Tafra into this one. Both must have the same row count. Returns a new Tafra.

other = Tafra({'w': np.array([100, 200, 300])})
t2 = t.update(other)
print('w' in t2.keys())
Output
True

Use update_inplace() for the in-place version:

t.update_inplace(other)

Updating Dtypes

update_dtypes() casts columns to new dtypes. Returns a new Tafra.

t2 = t.update_dtypes({'x': 'float64'})
print(t2['x'].dtype)
Output
float64

Use update_dtypes_inplace() for the in-place version:

t.update_dtypes_inplace({'x': 'float64'})

The 'str' label converts to StringDType(na_object=None), which supports None values:

t.update_dtypes_inplace({'x': 'str'})
t['x'][0] = None  # works -- nullable string column

Dtype metadata

_dtypes tracks the user's declared intent for each column's type and is the source of truth for dtype validation in joins and unions. Use update_dtypes_inplace() to change a column's type -- it updates both the metadata and the underlying array. If you assign directly to _data, call _coalesce_dtypes() afterwards to resync the metadata.
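The resync step amounts to rebuilding the dtype map from the arrays themselves. A hypothetical sketch of the idea (names and details are illustrative, not Tafra's actual internals):

```python
import numpy as np

# Hypothetical sketch: after mutating a column dict directly,
# rebuild the dtype metadata from the arrays so the two stay in sync.
data = {'x': np.array([1, 2, 3])}
dtypes = {name: arr.dtype.name for name, arr in data.items()}

data['x'] = data['x'].astype('float64')                        # direct mutation
dtypes = {name: arr.dtype.name for name, arr in data.items()}  # resync
print(dtypes['x'])   # float64
```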

Renaming Columns

rename() takes a dict mapping old names to new names. Returns a new Tafra.

t2 = t.rename({'x': 'x_val', 'y': 'y_val'})
print(t2.columns)
Output
('x_val', 'y_val', 'name', ...)

Use rename_inplace() for the in-place version.

Deleting Columns

delete() removes columns by name. Returns a new Tafra.

t2 = t.delete(['z', 'flag'])
print('z' in t2.keys())
Output
False

Use delete_inplace() for the in-place version:

t.delete_inplace(['z', 'flag'])

Both accept a single string or a list of strings.
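Accepting either form usually comes down to a small normalization step; a sketch (not Tafra's actual code):

```python
from typing import Iterable, List, Union

def as_name_list(names: Union[str, Iterable[str]]) -> List[str]:
    """Normalize a single column name or an iterable of names to a list."""
    return [names] if isinstance(names, str) else list(names)

print(as_name_list('z'))            # ['z']
print(as_name_list(['z', 'flag']))  # ['z', 'flag']
```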

Row Iteration

iterrows()

Yields each row as a single-row Tafra. Convenient but slow for large data.

for row in t.iterrows():
    print(row['x'], row['y'])

itertuples()

Yields rows as NamedTuple instances. Faster than iterrows().

for row in t.itertuples():
    print(row.x, row.y)

# As plain tuples (no named fields)
for row in t.itertuples(name=None):
    print(row)
Output
(10, 4.0, 'a', ...)

itercols()

Yields (column_name, ndarray) tuples:

for name, arr in t.itercols():
    print(f'{name}: {arr.dtype}, len={len(arr)}')

Mapping Functions

row_map(fn) -- map over rows

results = list(t.row_map(lambda row: row['x'] * 2))

tuple_map(fn) -- map over named tuples (faster)

results = list(t.tuple_map(lambda row: row.x * 2))

col_map(fn) -- map over columns

means = list(t.select(['x', 'y']).col_map(np.mean))

key_map(fn) -- map over columns with names

named_means = dict(t.select(['x', 'y']).key_map(np.mean))
print(named_means)
Output
{'x': 20.0, 'y': 5.0}

Properties

Property  Type                Description
columns   Tuple[str, ...]     Column names
rows      int                 Number of rows
data      Dict[str, ndarray]  Underlying data dict (read-only)
dtypes    Dict[str, str]      Column dtype strings (read-only)
shape     Tuple[int, int]     (rows, n_columns)
size      int                 rows * n_columns
ndim      int                 Always 2

Other Operations

Method                         Description
head(n=5)                      First n rows
tail(n=5)                      Last n rows
sort(columns, reverse=False)   Sort by one or more columns
sample(n, seed=None)           Random sample of n rows
copy(order='C')                Deep copy
drop_duplicates(columns=None)  Remove duplicate rows
value_counts(column)           Count unique values
describe()                     Summary statistics for numeric columns
shift(n=1)                     Shift rows (lag/lead)
coalesce(column, fills)        Fill None/NaN from fallback values
pipe(fn)                       Apply a function, return result (also t >> fn)
union(other)                   Append rows (like SQL UNION)
to_csv(path)                   Write to CSV
to_pandas()                    Convert to pandas.DataFrame
to_records()                   Iterator of row tuples
to_html()                      HTML table string
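As one example from the table, coalesce-style filling for a float column can be expressed in plain numpy with np.where (assumed semantics: keep the column's value where present, otherwise take the fallback's):

```python
import numpy as np

# Sketch of coalesce semantics for a float column: keep the column's
# value where it is not NaN, otherwise take the fallback value.
col = np.array([1.0, np.nan, np.nan])
fallback = np.array([9.0, 8.0, np.nan])
result = np.where(np.isnan(col), fallback, col)
print(result)   # [ 1.  8. nan]
```

Positions missing in both the column and the fallback stay NaN, which is why coalesce accepts multiple fills to try in turn.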