Description
When running a config check against a Parquet file, the following error occurs:
```
root@lubuntu:/home/jyoti/Spark# /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --num-executors 10 --executor-cores 2 data-validator-assembly-20220111T034941.jar --config config.yaml
22/01/11 11:50:53 WARN Utils: Your hostname, lubuntu resolves to a loopback address: 127.0.1.1; using 192.168.195.131 instead (on interface ens33)
22/01/11 11:50:53 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/01/11 11:50:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/11 11:50:59 INFO Main$: Logging configured!
22/01/11 11:51:00 INFO Main$: Data Validator
22/01/11 11:51:01 INFO ConfigParser$: Parsing `config.yaml`
22/01/11 11:51:01 INFO ConfigParser$: Attempting to load `config.yaml` from file system
Exception in thread "main" java.lang.ExceptionInInitializerError
    at com.target.data_validator.validator.RowBased.<init>(RowBased.scala:11)
    at com.target.data_validator.validator.NullCheck.<init>(NullCheck.scala:12)
    at com.target.data_validator.validator.NullCheck$.fromJson(NullCheck.scala:37)
    at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$decoders$2.apply(JsonDecoders.scala:16)
    at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$decoders$2.apply(JsonDecoders.scala:16)
    at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$2.apply(JsonDecoders.scala:32)
    at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$2.apply(JsonDecoders.scala:32)
    at scala.Option.map(Option.scala:230)
    at com.target.data_validator.validator.JsonDecoders$$anon$7.com$target$data_validator$validator$JsonDecoders$$anon$$getDecoder(JsonDecoders.scala:32)
    at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$apply$3.apply(JsonDecoders.scala:27)
    at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$apply$3.apply(JsonDecoders.scala:27)
    at cats.syntax.EitherOps$.flatMap$extension(either.scala:149)
    at com.target.data_validator.validator.JsonDecoders$$anon$7.apply(JsonDecoders.scala:27)
    at io.circe.SeqDecoder.apply(SeqDecoder.scala:17)
    at io.circe.Decoder$class.tryDecode(Decoder.scala:36)
    at io.circe.SeqDecoder.tryDecode(SeqDecoder.scala:6)
    at com.target.data_validator.ConfigParser$anon$importedDecoder$macro$15$1$$anon$6.apply(ConfigParser.scala:21)
    at io.circe.generic.decoding.DerivedDecoder$$anon$1.apply(DerivedDecoder.scala:13)
    at io.circe.Decoder$$anon$28.apply(Decoder.scala:178)
    at io.circe.Decoder$$anon$28.apply(Decoder.scala:178)
    at io.circe.SeqDecoder.apply(SeqDecoder.scala:17)
    at io.circe.Decoder$class.tryDecode(Decoder.scala:36)
    at io.circe.SeqDecoder.tryDecode(SeqDecoder.scala:6)
    at com.target.data_validator.ConfigParser$anon$importedDecoder$macro$81$1$$anon$10.apply(ConfigParser.scala:28)
    at io.circe.generic.decoding.DerivedDecoder$$anon$1.apply(DerivedDecoder.scala:13)
    at io.circe.Json.as(Json.scala:106)
    at com.target.data_validator.ConfigParser$.configFromJson(ConfigParser.scala:28)
    at com.target.data_validator.ConfigParser$$anonfun$parse$1.apply(ConfigParser.scala:65)
    at com.target.data_validator.ConfigParser$$anonfun$parse$1.apply(ConfigParser.scala:65)
    at cats.syntax.EitherOps$.flatMap$extension(either.scala:149)
    at com.target.data_validator.ConfigParser$.parse(ConfigParser.scala:65)
    at com.target.data_validator.ConfigParser$.parseFile(ConfigParser.scala:60)
    at com.target.data_validator.Main$.loadConfigRun(Main.scala:23)
    at com.target.data_validator.Main$.main(Main.scala:171)
    at com.target.data_validator.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: requirement failed: Literal must have a corresponding value to bigint, but class Integer found.
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:219)
    at org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:296)
    at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:144)
    at com.target.data_validator.validator.ValidatorBase$.<init>(ValidatorBase.scala:139)
    at com.target.data_validator.validator.ValidatorBase$.<clinit>(ValidatorBase.scala)
    ... 47 more
```
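For what it's worth, the `Caused by` section points at a static initializer (`ValidatorBase$.<clinit>`, via `ValidatorBase.scala:139`) that appears to build a literal with `Literal.create`. On Spark 3.x, `Literal` validates that the runtime class of the value matches the Catalyst data type, so a boxed `Integer` paired with `LongType` (`bigint`) fails the `requirement failed` check. A minimal sketch of the pattern, assuming that line constructs a zero literal roughly like this:

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.LongType

object LiteralSketch {
  // Throws on Spark 3.x with "Literal must have a corresponding value to
  // bigint, but class Integer found": the Int 0 boxes to java.lang.Integer,
  // which does not match LongType.
  // val bad = Literal.create(0, LongType)

  // Passes validation: the value's runtime class matches the Catalyst type.
  val ok = Literal.create(0L, LongType)
}
```

If that is the offending line, switching the value to an explicit `Long` (as above) would be the obvious fix.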
I ran the spark-submit job as follows:

```
spark-submit --num-executors 10 --executor-cores 2 data-validator-assembly-20220111T034941.jar --config config.yaml
```
The config.yaml file has the following content:

```yaml
numKeyCols: 2
numErrorsToReport: 742
tables:
  - parquetFile: /home/jyoti/Spark/userdata1.parquet
    checks:
      - type: nullCheck
        column: salary
```
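Note that the stack trace fails inside `ConfigParser$.parse`, i.e. while the config is still being decoded, so the Parquet file itself is never opened; the same error should reproduce with any table entry that declares a row-based check. If there is any doubt about the input, the file's schema can still be inspected with standard Spark reader calls (nothing data-validator-specific), e.g. in `spark-shell`:

```scala
// In spark-shell, where the `spark` session is provided by the shell:
val df = spark.read.parquet("/home/jyoti/Spark/userdata1.parquet")
df.printSchema() // shows each column's name, type, and nullability
```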
I got userdata1.parquet from the following GitHub link:
https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet
Environment Details:
- Latest source code: data-validator-0.13.0
- Lubuntu 18.04 LTS x64 on VMware Player
- 4 CPU cores and 2 GB RAM
Java version:

```
jyoti@lubuntu:~$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
```
lsb_release output:

```
jyoti@lubuntu:~$ lsb_release -a 2>/dev/null
Distributor ID: Ubuntu
Description:    Ubuntu 18.04 LTS
Release:        18.04
Codename:       bionic
```
uname -s:

```
jyoti@lubuntu:~$ uname -s
Linux
```
sbt -version:

```
root@lubuntu:/home/jyoti/Spark# sbt -version
downloading sbt launcher 1.6.1
[info] [launcher] getting org.scala-sbt sbt 1.6.1 (this may take some time)...
[info] [launcher] getting Scala 2.12.15 (for sbt)...
sbt version in this project: 1.6.1
sbt script version: 1.6.1
```
Please let me know if you need anything else.