pyspark.sql.functions.hash
- pyspark.sql.functions.hash(*cols)
Calculates the hash code of the given columns and returns the result as an int column.
New in version 2.0.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- cols : Column or str
  one or more columns to compute on.
- Returns
Column
hash value as int column.
Examples
>>> from pyspark.sql.functions import hash
>>> df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
Hash for one column
>>> df.select(hash('c1').alias('hash')).show()
+----------+
|      hash|
+----------+
|-757602832|
+----------+
Two or more columns
>>> df.select(hash('c1', 'c2').alias('hash')).show()
+---------+
|     hash|
+---------+
|599895104|
+---------+
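To make the int values above less opaque, here is a minimal pure-Python sketch of how they can be computed. It assumes that Spark's `hash` is 32-bit Murmur3 (x86_32 variant) with seed 42. It also assumes two further details of Spark's behavior. First, each column's result is used as the seed for the next column. Second, the trailing bytes of a string that do not fill a 4-byte block are each sign-extended and mixed as a separate block, rather than using the reference Murmur3 tail handling. The sketch covers only string values encoded as UTF-8, because Spark hashes other data types (integers, longs, doubles, and so on) differently. `spark_style_hash` and `murmur3_bytes_spark` are hypothetical helpers, not part of PySpark. Under these assumptions the sketch yields the same values as the two examples above.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    # Rotate a 32-bit unsigned value left by r bits.
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k1(k1: int) -> int:
    # Scramble one 32-bit block before it is folded into the running hash.
    k1 = (k1 * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK32


def _mix_h1(h1: int, k1: int) -> int:
    # Fold a scrambled block into the running hash.
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK32


def _fmix(h1: int, length: int) -> int:
    # Murmur3 finalization ("avalanche") step.
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def _to_int32(x: int) -> int:
    # Reinterpret an unsigned 32-bit value as a signed (Java-style) int.
    return x - (1 << 32) if x & 0x80000000 else x


def murmur3_bytes_spark(data: bytes, seed: int) -> int:
    """Hypothetical helper: Murmur3 x86_32 over bytes with Spark-style tail handling (assumed)."""
    h1 = seed & MASK32
    aligned = len(data) - len(data) % 4
    # Full 4-byte blocks are read as little-endian words.
    for i in range(0, aligned, 4):
        h1 = _mix_h1(h1, _mix_k1(int.from_bytes(data[i:i + 4], "little")))
    # Assumption: each trailing byte is sign-extended and mixed as its own block.
    for b in data[aligned:]:
        signed_byte = b - 256 if b >= 128 else b
        h1 = _mix_h1(h1, _mix_k1(signed_byte & MASK32))
    return _to_int32(_fmix(h1, len(data)))


def spark_style_hash(*values: Optional[str], seed: int = 42) -> int:
    """Hypothetical helper: hash string values left to right, chaining the seed."""
    h = seed
    for v in values:
        if v is None:
            continue  # assumed: a null value leaves the running hash unchanged
        h = murmur3_bytes_spark(v.encode("utf-8"), h)
    return h


print(spark_style_hash("ABC"))         # -757602832, as in the one-column example
print(spark_style_hash("ABC", "DEF"))  # 599895104, as in the two-column example
```

Because each column's result seeds the next column, argument order matters. `hash('c1', 'c2')` and `hash('c2', 'c1')` will generally produce different values.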