Column Operations

Tafra uses dict-like access for columns and provides methods for selecting, renaming, updating, and deleting columns.

Accessing Columns

By name (__getitem__ with str)

Returns the raw numpy.ndarray for the column:

from tafra import Tafra
import numpy as np

t = Tafra({
    'x': np.array([1, 2, 3]),
    'y': np.array([4.0, 5.0, 6.0]),
    'name': np.array(['a', 'b', 'c']),
})

arr = t['x']
print(type(arr))
Output
<class 'numpy.ndarray'>

By integer index

Returns a single-row Tafra:

row = t[0]
print(row.rows)
Output
1

By slice

Returns a sliced Tafra:

first_two = t[0:2]
print(first_two.rows)
Output
2

By boolean array

Filters rows where the condition is True:

mask = t['x'] > 1
filtered = t[mask]   # keeps rows where mask is True: x=[2, 3], y=[5.0, 6.0], name=['b', 'c']
print(filtered['x'])
Output
[2 3]
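The per-column behavior matches plain numpy boolean indexing; a numpy-only sketch of the same filtering, applying one mask to every column array:

```python
import numpy as np

# Plain-numpy equivalent of boolean-mask filtering: apply the same
# boolean mask to every column array in a dict of columns.
data = {
    'x': np.array([1, 2, 3]),
    'y': np.array([4.0, 5.0, 6.0]),
    'name': np.array(['a', 'b', 'c']),
}
mask = data['x'] > 1                              # array([False, True, True])
filtered = {k: v[mask] for k, v in data.items()}
print(filtered['x'])                              # [2 3]
print(filtered['name'])                           # ['b' 'c']
```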

By list of column names

Returns a Tafra with only the listed columns (like select):

subset = t[['x', 'name']]
print(subset.columns)
Output
('x', 'name')

Setting Columns

Assign an array, list, or scalar to a column name. The value is validated for length and converted to an ndarray:

t['z'] = np.array([7, 8, 9])     # new column
t['x'] = np.array([10, 20, 30])  # overwrite existing column
t['flag'] = True                 # scalar broadcast to all rows
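The scalar case behaves like broadcasting the value across every row; in plain numpy terms (an illustrative sketch, not Tafra's internal code):

```python
import numpy as np

# Broadcasting a scalar to a full column of length `rows` --
# an illustrative equivalent of `t['flag'] = True` on a 3-row Tafra.
rows = 3
flag = np.full(rows, True)
print(flag)          # [ True  True  True]
print(flag.dtype)    # bool
```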

Dict-like Interface

keys(), values(), items()

print(t.keys())
print(t.values())
print(t.items())
Output
dict_keys(['x', 'y', 'name', 'z', 'flag'])
dict_values([array([10, 20, 30]), array([4., 5., 6.]), ...])
dict_items([('x', array([10, 20, 30])), ...])

get()

Returns the column array, or a default if the column does not exist:

arr = t.get('x')             # array([10, 20, 30])
arr = t.get('missing', None) # None

Selecting Columns

select() returns a new Tafra with only the specified columns. This does not copy the underlying data -- call .copy() if you need independent arrays.

sub = t.select(['x', 'y'])
print(sub.columns)
Output
('x', 'y')
# With copy
sub_copy = t.select(['x', 'y']).copy()
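The no-copy behavior can be checked with numpy's np.shares_memory. A sketch of the difference using plain arrays (analogous to the arrays select() and .copy() hand back):

```python
import numpy as np

# select() returns the same underlying ndarray objects;
# .copy() allocates independent ones.
original = np.array([1, 2, 3])
view = original                  # analogous to t.select(['x'])['x']
independent = original.copy()    # analogous to t.select(['x']).copy()['x']

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```

Mutating a shared array is visible through every reference to it, which is why .copy() matters when you plan to modify the result.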

Updating from Another Tafra

update() merges columns from another Tafra into this one. Both must have the same row count. Returns a new Tafra.

other = Tafra({'w': np.array([100, 200, 300])})
t2 = t.update(other)
print('w' in t2.keys())
Output
True

Use update_inplace() for the in-place version:

t.update_inplace(other)

Updating Dtypes

update_dtypes() casts columns to new dtypes. Returns a new Tafra.

t2 = t.update_dtypes({'x': 'float64'})
print(t2['x'].dtype)
Output
float64

Use update_dtypes_inplace() for the in-place version:

t.update_dtypes_inplace({'x': 'float64'})

The 'str' label converts to StringDType(na_object=None), which supports None values:

t.update_dtypes_inplace({'x': 'str'})
t['x'][0] = None  # works -- nullable string column

Dtype metadata

_dtypes tracks the user's declared intent for each column's type and is the source of truth for dtype validation in joins and unions. Use update_dtypes_inplace() to change a column's type -- it updates both the metadata and the underlying array. If you assign directly to _data, call _coalesce_dtypes() afterwards to resync the metadata.
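The resync step amounts to rebuilding the dtype map from the arrays themselves. A hypothetical sketch of the idea (names and details are illustrative, not Tafra's actual internals):

```python
import numpy as np

# Hypothetical sketch: after mutating a column dict directly,
# rebuild the dtype metadata from the arrays so the two stay in sync.
data = {'x': np.array([1, 2, 3])}
dtypes = {name: arr.dtype.name for name, arr in data.items()}

data['x'] = data['x'].astype('float64')                        # direct mutation
dtypes = {name: arr.dtype.name for name, arr in data.items()}  # resync
print(dtypes['x'])   # float64
```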

Renaming Columns

rename() takes a dict mapping old names to new names. Returns a new Tafra.

t2 = t.rename({'x': 'x_val', 'y': 'y_val'})
print(t2.columns)
Output
('x_val', 'y_val', 'name', ...)

Use rename_inplace() for the in-place version.

Deleting Columns

delete() removes columns by name. Returns a new Tafra.

t2 = t.delete(['z', 'flag'])
print('z' in t2.keys())
Output
False

Use delete_inplace() for the in-place version:

t.delete_inplace(['z', 'flag'])

Both accept a single string or a list of strings.
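Accepting either form usually comes down to a small normalization step; a sketch (not Tafra's actual code):

```python
from typing import Iterable, List, Union

def as_name_list(names: Union[str, Iterable[str]]) -> List[str]:
    """Normalize a single column name or an iterable of names to a list."""
    return [names] if isinstance(names, str) else list(names)

print(as_name_list('z'))            # ['z']
print(as_name_list(['z', 'flag']))  # ['z', 'flag']
```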

Row Iteration

iterrows()

Yields each row as a single-row Tafra. Convenient but slow for large data.

for row in t.iterrows():
    print(row['x'], row['y'])

itertuples()

Yields rows as NamedTuple instances. Faster than iterrows().

for row in t.itertuples():
    print(row.x, row.y)

# As plain tuples (no named fields)
for row in t.itertuples(name=None):
    print(row)
Output
(10, 4.0, 'a', ...)

itercols()

Yields (column_name, ndarray) tuples:

for name, arr in t.itercols():
    print(f'{name}: {arr.dtype}, len={len(arr)}')

Mapping Functions

row_map(fn) -- map over rows

results = list(t.row_map(lambda row: row['x'] * 2))

tuple_map(fn) -- map over named tuples (faster)

results = list(t.tuple_map(lambda row: row.x * 2))

col_map(fn) -- map over columns

means = list(t.select(['x', 'y']).col_map(np.mean))

key_map(fn) -- map over columns with names

named_means = dict(t.select(['x', 'y']).key_map(np.mean))
print(named_means)
Output
{'x': 20.0, 'y': 5.0}

Properties

Property  Type                Description
columns   Tuple[str, ...]     Column names
rows      int                 Number of rows
data      Dict[str, ndarray]  Underlying data dict (read-only)
dtypes    Dict[str, str]      Column dtype strings (read-only)
shape     Tuple[int, int]     (rows, n_columns)
size      int                 rows * n_columns
ndim      int                 Always 2

Other Operations

Method                         Description
head(n=5)                      First n rows
tail(n=5)                      Last n rows
sort(columns, reverse=False)   Sort by one or more columns
sample(n, seed=None)           Random sample of n rows
copy(order='C')                Deep copy
drop_duplicates(columns=None)  Remove duplicate rows
value_counts(column)           Count unique values
describe()                     Summary statistics for numeric columns
shift(n=1)                     Shift rows (lag/lead)
coalesce(column, fills)        Fill None/NaN from fallback values
pipe(fn)                       Apply a function, return result (also t >> fn)
union(other)                   Append rows (like SQL UNION)
to_csv(path)                   Write to CSV
to_pandas()                    Convert to pandas.DataFrame
to_records()                   Iterator of row tuples
to_html()                      HTML table string
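As one example from the table, coalesce-style filling for a float column can be expressed in plain numpy with np.where (assumed semantics: keep the column's value where present, otherwise take the fallback's):

```python
import numpy as np

# Sketch of coalesce semantics for a float column: keep the column's
# value where it is not NaN, otherwise take the fallback value.
col = np.array([1.0, np.nan, np.nan])
fallback = np.array([9.0, 8.0, np.nan])
result = np.where(np.isnan(col), fallback, col)
print(result)   # [ 1.  8. nan]
```

Positions missing in both the column and the fallback stay NaN, which is why coalesce accepts multiple fills to try in turn.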