Description
When char-dist-features and header features are used together for the domain "dbpedia", the feature count becomes large (400+). Training a RandomForestClassifier with Spark then fails with the error:
Cause: org.codehaus.janino.JaninoRuntimeException: Code of method "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
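This appears to be a known Spark issue: the code Spark generates for comparing rows grows with the number of columns until a single method exceeds the JVM's 64 KB limit. It is not clear whether there is an easy fix for this problem: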
https://issues.apache.org/jira/browse/SPARK-16845
http://stackoverflow.com/questions/40044779/find-mean-and-corr-of-10-000-columns-in-pyspark-dataframe
https://issues.apache.org/jira/browse/SPARK-17092
SparkTestSpec currently reproduces this error.
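
To give a feel for why the column count matters, here is a toy model in Python. It is not Spark's code generator: the function `compare_source`, the `Row` type, and the shape of the emitted Java text are all made up for illustration. It also measures source text length, which is only a rough proxy for the bytecode size that the JVM actually limits. The one thing it shows is that a single `compare` method which handles every column inline grows roughly linearly with the number of columns.

```python
# Toy model only: this is NOT Spark's code generator, and the emitted
# Java text is hypothetical. Source length is used as a rough proxy
# for generated method size.

def compare_source(num_columns):
    """Build the text of one compare() method that checks columns in order."""
    lines = ["public int compare(Row a, Row b) {"]
    for i in range(num_columns):
        # Every column adds its own block of comparison code to the same method.
        lines.append(f"  double a{i} = a.getDouble({i});")
        lines.append(f"  double b{i} = b.getDouble({i});")
        lines.append(f"  if (a{i} != b{i}) return a{i} < b{i} ? -1 : 1;")
    lines.append("  return 0;")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    for n in (10, 100, 400):
        print(n, "columns ->", len(compare_source(n)), "characters of source")
```

Under this assumption, a dataset with 400+ feature columns yields a method about 40 times larger than one with 10 columns, which matches the direction of the failure described above even though the real generated code looks different.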