pyspark.pandas.DataFrame.itertuples

DataFrame.itertuples(index=True, name='PandasOnSpark')

Iterate over DataFrame rows as namedtuples.

Parameters

index (bool, default True): If True, return the index as the first element of the tuple.
name (str or None, default 'PandasOnSpark'): The name of the returned namedtuples, or None to return regular tuples.

Returns

iterator: An iterator that yields one namedtuple per row of the DataFrame. The first field is the index when index is True; the remaining fields are the column values.

See also

DataFrame.iterrows: Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.items: Iterate over (column name, Series) pairs.

Notes

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore.
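
The renaming described above matches what Python's `collections.namedtuple` does when called with `rename=True`. The sketch below uses only the standard library and assumes `itertuples` builds its row type in the same way; the column names are hypothetical and chosen to trigger each renaming rule.

```python
from collections import namedtuple

# Hypothetical column names: one contains a space, one starts with an
# underscore, and "x" appears twice.
fields = ["Index", "num legs", "_hidden", "x", "x"]

# rename=True replaces each invalid or duplicate name with "_<position>".
Row = namedtuple("PandasOnSpark", fields, rename=True)

row = Row("dog", 4, True, 1.0, 2.0)
print(Row._fields)  # ('Index', '_1', '_2', 'x', '_4')
print(row._1)       # 4, the value of the column originally named "num legs"
```

Because renamed fields are identified only by position, it may be easier to rename such columns before iterating if attribute access matters.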

Examples

>>> df = ps.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2

>>> for row in df.itertuples():
...     print(row)
...
PandasOnSpark(Index='dog', num_legs=4, num_wings=0)
PandasOnSpark(Index='hawk', num_legs=2, num_wings=2)

Setting the index parameter to False omits the index from each tuple:

>>> for row in df.itertuples(index=False):
...     print(row)
...
PandasOnSpark(num_legs=4, num_wings=0)
PandasOnSpark(num_legs=2, num_wings=2)

Setting the name parameter gives the yielded namedtuples a custom name:

>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)
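
Each yielded row can be read like any standard-library namedtuple. The sketch below builds an equivalent row by hand with `collections.namedtuple` (a stand-in, used here so the snippet runs without a Spark session) and shows access by attribute, by position, and as a dict.

```python
from collections import namedtuple

# Stand-in for a row yielded by df.itertuples(name='Animal').
Animal = namedtuple("Animal", ["Index", "num_legs", "num_wings"])
row = Animal(Index="dog", num_legs=4, num_wings=0)

by_name = row.num_legs    # attribute access by column name
by_position = row[0]      # positional access; the index comes first
as_dict = row._asdict()   # mapping of field name to value

# With name=None the rows are regular tuples, so only positional
# access and unpacking are available.
plain = tuple(row)
label, legs, wings = plain
```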