Column Operations
Tafra uses dict-like access for columns and provides methods for selecting, renaming, updating, and deleting columns.
Accessing Columns
By name (__getitem__ with str)
Returns the raw numpy.ndarray for the column:
from tafra import Tafra
import numpy as np
t = Tafra({
    'x': np.array([1, 2, 3]),
    'y': np.array([4.0, 5.0, 6.0]),
    'name': np.array(['a', 'b', 'c']),
})
arr = t['x']
print(type(arr))
By integer index
Returns a single-row Tafra.
By slice
Returns a Tafra containing the rows in the slice.
By boolean array
Returns a Tafra containing only the rows where the array is True.
By list of column names
Returns a Tafra with only the listed columns (like select):
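The four indexing forms above all apply one operation to every column. The sketch below models that with a plain dict of NumPy arrays. It does not call Tafra itself, and the helper name `take_rows` is hypothetical, so read it as an illustration of the described semantics rather than the library's implementation.

```python
import numpy as np

# A plain dict of equal-length arrays stands in for a Tafra's columns.
data = {
    'x': np.array([1, 2, 3]),
    'y': np.array([4.0, 5.0, 6.0]),
    'name': np.array(['a', 'b', 'c']),
}

def take_rows(columns, index):
    """Hypothetical helper: apply the same row index to every column."""
    if isinstance(index, int):
        index = [index]  # keep a length-1 array instead of a scalar
    return {name: values[index] for name, values in columns.items()}

single = take_rows(data, 0)               # integer -> one row
window = take_rows(data, slice(1, 3))     # slice -> rows 1 and 2
masked = take_rows(data, data['x'] > 1)   # boolean array -> matching rows
subset = {name: data[name] for name in ['x', 'y']}  # list of names -> columns

print(single['x'], window['name'], masked['y'], list(subset))
```

In this model the row-oriented forms keep every column and change the row count, while the list-of-names form keeps every row and changes the column set.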
Setting Columns
Assign an array, list, or scalar to a column name. The value is validated for
length and converted to an ndarray:
t['z'] = np.array([7, 8, 9])     # new column
t['x'] = np.array([10, 20, 30])  # overwrite existing column
t['flag'] = True                 # scalar broadcast to all rows
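A rough model of the validation described above, assuming that a scalar is repeated once per row and that a length mismatch raises an error. The exact exception Tafra raises is not stated here, so `ValueError` is an assumption, and `as_column` is a hypothetical helper rather than part of Tafra.

```python
import numpy as np

def as_column(value, rows):
    """Hypothetical helper: coerce a value to a 1-D array of length `rows`."""
    arr = np.asarray(value)
    if arr.ndim == 0:
        # Scalar: repeat it once per row.
        return np.full(rows, arr)
    if len(arr) != rows:
        # Assumed error type; the source only says the length is validated.
        raise ValueError(f'expected {rows} values, got {len(arr)}')
    return arr

print(as_column([7, 8, 9], 3))   # list converted to an ndarray
print(as_column(True, 3))        # scalar broadcast to all rows
```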
Dict-like Interface
keys(), values(), items()
get()
Returns the column array, or a default if the column does not exist:
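Since these methods follow the built-in dict protocol, a plain dict gives a feel for the shape of the results. The sketch uses a dict of lists as a stand-in and does not call Tafra. With a real Tafra the values would be NumPy arrays.

```python
# A plain dict stands in for a Tafra here; lists stand in for column arrays.
columns = {'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]}

print(list(columns.keys()))            # column names
print(list(columns.values()))          # column data, in the same order
for name, values in columns.items():   # (name, data) pairs
    print(name, values)

# get() falls back to the default when the column is missing.
print(columns.get('x'))
print(columns.get('missing', 'no such column'))
```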
Selecting Columns
select() returns a new Tafra with only the specified columns. This does
not copy the underlying data -- call .copy() if you need independent
arrays.
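The no-copy behaviour matters when arrays are modified in place. The sketch below models it with a plain dict of NumPy arrays rather than Tafra itself: a selection reuses the same array objects, so an in-place write is visible through both, while an explicit copy is independent.

```python
import numpy as np

data = {'x': np.array([1, 2, 3]), 'y': np.array([4.0, 5.0, 6.0])}

# Model of a selection that shares arrays with its source.
selected = {name: data[name] for name in ['x']}
# Model of a selection followed by a copy.
independent = {name: values.copy() for name, values in selected.items()}

data['x'][0] = 99  # in-place write to the original array

print(selected['x'][0])      # 99, shares memory with data['x']
print(independent['x'][0])   # 1, unaffected by the write
```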
Updating from Another Tafra
update() merges columns from another Tafra into this one. Both must have
the same row count. Returns a new Tafra.
Use update_inplace() for the in-place version:
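A minimal model of the two variants, using plain dicts of lists in place of Tafras. `merge_columns` is a hypothetical helper. Raising `ValueError` on a row-count mismatch and letting the incoming columns overwrite same-named ones are both assumptions; the text above only says the row counts must match.

```python
def merge_columns(left, right):
    """Hypothetical helper: return a new mapping with right's columns merged in."""
    left_rows = len(next(iter(left.values())))
    right_rows = len(next(iter(right.values())))
    if left_rows != right_rows:
        raise ValueError(f'row counts differ: {left_rows} != {right_rows}')
    return {**left, **right}  # assumed: columns in `right` win on a name clash

a = {'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]}
b = {'y': [0.4, 0.5, 0.6], 'z': [7, 8, 9]}

merged = merge_columns(a, b)   # new mapping; `a` is untouched at this point
print(sorted(merged), a['y'])

a.update(b)                    # in-place variant: `a` itself changes
print(a['y'])
```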
Updating Dtypes
update_dtypes() casts columns to new dtypes. Returns a new Tafra.
Use update_dtypes_inplace() for the in-place version:
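At the array level, a cast of this kind corresponds to NumPy's `astype`, which returns a new array and leaves the original untouched. The sketch shows only that NumPy step. How Tafra records the new dtype in its metadata is not modelled.

```python
import numpy as np

x = np.array([1, 2, 3])
x_float = x.astype('float64')   # new array with the requested dtype

print(x.dtype.kind, x_float.dtype)   # original stays an integer array
print(x_float)
```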
The 'str' dtype label maps to NumPy's StringDType(na_object=None), which can
hold None values.
Dtype metadata
_dtypes tracks the user's declared intent for each column's type. It is
the source of truth for dtype validation in joins and unions. Use
update_dtypes_inplace() to change a column's type; it updates both
the metadata and the underlying array. If you assign directly to _data,
you must call _coalesce_dtypes() to resync.
Renaming Columns
rename() takes a dict mapping old names to new names. Returns a new Tafra.
Use rename_inplace() for the in-place version.
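The old-to-new mapping can be pictured with a plain dict of lists. This is a sketch of the idea, not Tafra's code, and `rename_columns` is a hypothetical helper. Names missing from the mapping are kept as they are.

```python
def rename_columns(columns, renames):
    """Hypothetical helper: return a new mapping with keys renamed."""
    return {renames.get(name, name): values for name, values in columns.items()}

columns = {'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]}
renamed = rename_columns(columns, {'x': 'a'})

print(list(renamed))   # ['a', 'y']
print(list(columns))   # ['x', 'y'], the input is unchanged
```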
Deleting Columns
delete() removes columns by name. Returns a new Tafra.
Use delete_inplace() for the in-place version.
Both accept a single string or a list of strings.
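Accepting either a string or a list usually comes down to normalising the argument first. A minimal sketch on a plain dict of lists, with `delete_columns` as a hypothetical helper that is not Tafra's implementation:

```python
def delete_columns(columns, names):
    """Hypothetical helper: drop one column name or a list of names."""
    if isinstance(names, str):
        names = [names]   # treat a single string as a one-item list
    return {name: values for name, values in columns.items() if name not in names}

columns = {'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0], 'name': ['a', 'b', 'c']}

print(list(delete_columns(columns, 'x')))            # ['y', 'name']
print(list(delete_columns(columns, ['x', 'name'])))  # ['y']
```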
Row Iteration
iterrows()
Yields each row as a single-row Tafra. Convenient but slow for large data.
itertuples()
Yields rows as NamedTuple instances. Faster than iterrows().
for row in t.itertuples():
    print(row.x, row.y)

# As plain tuples (no named fields)
for row in t.itertuples(name=None):
    print(row)
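One way to picture why tuple iteration is cheap: each row is just the i-th element of every column, which `zip` produces directly. The sketch below is a model using `collections.namedtuple` on plain lists, not Tafra's actual implementation, and the type name `Row` is arbitrary.

```python
from collections import namedtuple

columns = {'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]}

# Named rows: one lightweight tuple per row, fields named after the columns.
Row = namedtuple('Row', columns.keys())
named_rows = [Row(*values) for values in zip(*columns.values())]

# Plain rows: the same data without field names.
plain_rows = list(zip(*columns.values()))

print(named_rows[0].x, named_rows[0].y)
print(plain_rows[0])
```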
itercols()
Yields (column_name, ndarray) tuples.
Mapping Functions
row_map(fn) -- map over rows
tuple_map(fn) -- map over named tuples (faster)
col_map(fn) -- map over columns
key_map(fn) -- map over columns with names
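The difference between these variants can be sketched on a plain dict of lists. This is a model under the assumption that mapping over columns gives one result per column, that the named variant pairs each result with its column name, and that the row-wise variants give one result per row, as the short descriptions above suggest. It does not call Tafra.

```python
columns = {'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]}

# Model of mapping over columns: one result per column, names dropped.
col_results = [max(values) for values in columns.values()]

# Model of mapping over columns with names: results stay paired with names.
key_results = [(name, max(values)) for name, values in columns.items()]

# Model of mapping over rows as tuples: one result per row.
row_results = [sum(row) for row in zip(*columns.values())]

print(col_results, key_results, row_results)
```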
Properties
| Property | Type | Description |
|---|---|---|
| columns | Tuple[str, ...] | Column names |
| rows | int | Number of rows |
| data | Dict[str, ndarray] | Underlying data dict (read-only) |
| dtypes | Dict[str, str] | Column dtype strings (read-only) |
| shape | Tuple[int, int] | (rows, n_columns) |
| size | int | rows * n_columns |
| ndim | int | Always 2 |
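As a quick check of how shape and size relate, here is the arithmetic for a three-row, three-column table, written in plain Python on a dict of lists rather than with Tafra:

```python
columns = {'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0], 'name': ['a', 'b', 'c']}

rows = len(columns['x'])        # number of rows
n_columns = len(columns)        # number of columns
shape = (rows, n_columns)       # (rows, n_columns)
size = rows * n_columns         # total number of values

print(shape, size)  # (3, 3) 9
```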
Other Operations
| Method | Description |
|---|---|
| head(n=5) | First n rows |
| tail(n=5) | Last n rows |
| sort(columns, reverse=False) | Sort by one or more columns |
| sample(n, seed=None) | Random sample of n rows |
| copy(order='C') | Deep copy |
| drop_duplicates(columns=None) | Remove duplicate rows |
| value_counts(column) | Count unique values |
| describe() | Summary statistics for numeric columns |
| shift(n=1) | Shift rows (lag/lead) |
| coalesce(column, fills) | Fill None/NaN from fallback values |
| pipe(fn) | Apply a function, return result (also t >> fn) |
| union(other) | Append rows (like SQL UNION) |
| to_csv(path) | Write to CSV |
| to_pandas() | Convert to pandas.DataFrame |
| to_records() | Iterator of row tuples |
| to_html() | HTML table string |
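The `t >> fn` spelling of `pipe` relies on Python's right-shift operator hook. A minimal sketch of that pattern, using a hypothetical `Table` class and `count_rows` function that are not part of Tafra:

```python
class Table:
    """Hypothetical stand-in used only to illustrate the operator pattern."""

    def __init__(self, columns):
        self.columns = columns

    def pipe(self, fn):
        # Apply fn to this object and return whatever fn returns.
        return fn(self)

    def __rshift__(self, fn):
        # `table >> fn` forwards to pipe(fn).
        return self.pipe(fn)

def count_rows(table):
    """Hypothetical function to pipe: length of the first column."""
    return len(next(iter(table.columns.values())))

t = Table({'x': [1, 2, 3]})
print(t.pipe(count_rows), t >> count_rows)
```

Because `pipe` returns whatever the function returns, calls written this way can be chained left to right when each function returns another table-like object.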