You can use a case-insensitive regular expression:
```scala
val df = sc.parallelize(Seq(
  (1L, "Fortinet"), (2L, "foRtinet"), (3L, "foo")
)).toDF("k", "v")

df.where($"v".rlike("(?i)^fortinet$")).show
// +---+--------+
// |  k|       v|
// +---+--------+
// |  1|Fortinet|
// |  2|foRtinet|
// +---+--------+
```
or simple equality with lower / upper:
```scala
import org.apache.spark.sql.functions.{lower, upper}

df.where(lower($"v") === "fortinet")
// +---+--------+
// |  k|       v|
// +---+--------+
// |  1|Fortinet|
// |  2|foRtinet|
// +---+--------+

df.where(upper($"v") === "FORTINET")
// +---+--------+
// |  k|       v|
// +---+--------+
// |  1|Fortinet|
// |  2|foRtinet|
// +---+--------+
```
For simple filters I would prefer rlike, although performance should be roughly the same. For join conditions, equality is a much better choice: an equality predicate lets Spark plan a hash or sort-merge join, while a non-equality condition like rlike forces a far more expensive nested-loop join. See How can we join two Spark SQL frames using the SQL-esque "LIKE" criterion? for details.
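For illustration, here is a minimal sketch of a case-insensitive join that stays an equi-join by normalizing both sides with lower (the second DataFrame lookup and its columns name / category are hypothetical):

```scala
import org.apache.spark.sql.functions.lower

// Hypothetical lookup table, just for illustration.
val lookup = sc.parallelize(Seq(
  ("FORTINET", "vendor")
)).toDF("name", "category")

// Lowercasing both join keys keeps this an equality condition,
// so Spark can still use a hash or sort-merge join.
df.join(lookup, lower(df("v")) === lower(lookup("name"))).show
```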