@@ -211,7 +211,8 @@ object JavaTypeInference {
           c == classOf[java.lang.Double] ||
           c == classOf[java.lang.Float] ||
           c == classOf[java.lang.Byte] ||
-          c == classOf[java.lang.Boolean] =>
+          c == classOf[java.lang.Boolean] ||
+          c == classOf[java.lang.String] =>
         StaticInvoke(
           c,
           ObjectType(c),

@@ -235,9 +236,6 @@ object JavaTypeInference {
         path :: Nil,
         returnNullable = false)

-      case c if c == classOf[java.lang.String] =>
-        Invoke(path, "toString", ObjectType(classOf[String]))
Contributor commented:
ScalaReflection does the same thing; do we have a problem there too?

AFAIK the path should be a string type column, and it's always safe to call UTF8String.toString. My gut feeling is that we are missing an Upcast somewhere in JavaTypeInference.

HeartSaVioR (Contributor Author) commented on Feb 22, 2019:

> AFAIK the path should be a string type column

The sample code in the JIRA issue tried to bind an IntegerType column to a String field in a Java bean, which appears to break that expectation. (I guess ScalaReflection would not encounter this case.)

Spark doesn't throw an error for this case, though. Actually, Spark shows undefined behavior: compilation failures in the generated code, and possibly even runtime exceptions.
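To make the failure mode concrete, here is a hypothetical stand-alone sketch, not Spark's actual generated code, of what goes wrong when a reader assumes the column value can be treated as character data:

```java
// Hypothetical sketch, not Spark's generated code: the pre-fix deserializer
// effectively assumed the column value was string-like and called toString()
// on it. An int-typed column breaks that assumption at runtime here
// (in Spark's generated Java, the same mismatch can also fail to compile).
public class MismatchDemo {
    static String unsafeRead(Object columnValue) {
        // Pre-fix assumption, in spirit: the value is character data.
        return ((CharSequence) columnValue).toString();
    }

    public static void main(String[] args) {
        System.out.println(unsafeRead("ok"));  // fine for a string column
        try {
            unsafeRead(42);  // an int column violates the assumption
        } catch (ClassCastException e) {
            System.out.println("int column: " + e.getClass().getSimpleName());
        }
    }
}
```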

Contributor replied:

> The sample code tried to bind IntegerType column to String field in Java bean

In Scala, we can also do this, and Spark will add an Upcast; e.g. spark.range(1).as[String].collect works fine.

I did a quick search, and JavaTypeInference has no Upcast. We should fix it and follow ScalaReflection.

HeartSaVioR (Contributor Author) replied:

Ah, OK. I'll check and address it. It may end up as a separate PR if it doesn't fix the new test.

HeartSaVioR (Contributor Author) replied:

Yeah, your suggestion seems to work nicely! I left a comment asking which approach to choose; please compare both approaches and comment. Thanks!


case c if c == classOf[java.math.BigDecimal] =>
Invoke(path, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal]))

@@ -115,6 +115,37 @@ public void testBeanWithMapFieldsDeserialization() {
Assert.assertEquals(records, MAP_RECORDS);
}

private static final List<RecordSpark22000> RECORDS_SPARK_22000 = new ArrayList<>();

static {
RECORDS_SPARK_22000.add(new RecordSpark22000("1", "[email protected]", 2, 11));
RECORDS_SPARK_22000.add(new RecordSpark22000("2", "[email protected]", 3, 12));
RECORDS_SPARK_22000.add(new RecordSpark22000("3", "[email protected]", 4, 13));
RECORDS_SPARK_22000.add(new RecordSpark22000("4", "[email protected]", 5, 14));
}

@Test
public void testSpark22000() {
    // Here we try to convert the type of the 'ref' field from integer to string.
    // Before SPARK-22000, Spark called toString() against a variable whose type might be
    // primitive. With SPARK-22000, it calls String.valueOf(), which ultimately calls
    // toString() but handles boxing first if the type is primitive.
Encoder<RecordSpark22000> encoder = Encoders.bean(RecordSpark22000.class);

Dataset<RecordSpark22000> dataset = spark
.read()
.format("csv")
.option("header", "true")
.option("mode", "DROPMALFORMED")
.schema("ref int, userId string, x int, y int")
Member commented:
Can you add tests for more types other than int?

.load("src/test/resources/test-data/spark-22000.csv")
Member commented:
Do we need to read the test data from a file instead of using spark.createDataFrame(...)?

HeartSaVioR (Contributor Author) replied:

I guess not. Let me change it to not use a file.

Member replied:

Thanks!

.as(encoder);

List<RecordSpark22000> records = dataset.collectAsList();

Assert.assertEquals(records, RECORDS_SPARK_22000);
}
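The boxing difference described in the comment at the top of testSpark22000 can be illustrated outside Spark. This is a hypothetical stand-alone sketch, not the PR's actual generated code:

```java
// Hypothetical sketch of the conversion the fix performs: String.valueOf
// has overloads for every primitive type, so no boxing is needed at the
// call site, while a direct toString() call requires a reference type.
public class ValueOfDemo {
    static String convert(int ref) {
        // return ref.toString();    // does not compile: int cannot be dereferenced
        return String.valueOf(ref);  // works for primitives and, via
                                     // String.valueOf(Object), for boxed values too
    }

    public static void main(String[] args) {
        System.out.println(convert(42));
    }
}
```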

public static class ArrayRecord {

private int id;
@@ -252,4 +283,73 @@ public String toString() {
return String.format("[%d,%d]", startTime, endTime);
}
}

public static class RecordSpark22000 {
Member commented:
final just to be tidy?

private String ref;
private String userId;
private int x;
private int y;

public RecordSpark22000() { }

RecordSpark22000(String ref, String userId, int x, int y) {
this.ref = ref;
this.userId = userId;
this.x = x;
this.y = y;
}

public String getRef() {
return ref;
}

public void setRef(String ref) {
this.ref = ref;
}

public String getUserId() {
return userId;
}

public void setUserId(String userId) {
this.userId = userId;
}

public int getX() {
return x;
}

public void setX(int x) {
this.x = x;
}

public int getY() {
return y;
}

public void setY(int y) {
this.y = y;
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
RecordSpark22000 that = (RecordSpark22000) o;
return x == that.x &&
y == that.y &&
Objects.equals(ref, that.ref) &&
Objects.equals(userId, that.userId);
}

@Override
public int hashCode() {
return Objects.hash(ref, userId, x, y);
}

@Override
public String toString() {
Member commented:
Does this need toString()? I understand hashCode and equals.

HeartSaVioR (Contributor Author) replied:

It helps to compare expected and actual values when a test fails. Otherwise they would be shown as Object.toString() renders them, which doesn't provide any information about why they are not equal.
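As a sketch of that point, using a hypothetical Point class rather than the PR's bean, compare the default Object rendering ("ClassName@hexHash") with an overridden toString():

```java
import java.util.Objects;

// Hypothetical example: a bean-like class whose overridden toString() makes
// assertion failure messages readable, in contrast to Object's default
// rendering, which says nothing about field values.
public class ToStringDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        @Override
        public int hashCode() { return Objects.hash(x, y); }

        @Override
        public String toString() { return String.format("Point(x=%d, y=%d)", x, y); }
    }

    public static void main(String[] args) {
        // A failing assertEquals can now print "Point(x=1, y=2)" for each side.
        System.out.println(new Point(1, 2));
    }
}
```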

return String.format("ref='%s', userId='%s', x=%d, y=%d", ref, userId, x, y);
}
}
}
sql/core/src/test/resources/test-data/spark-22000.csv (5 additions, 0 deletions)
@@ -0,0 +1,5 @@
ref,userId,x,y
1,[email protected],2,11
2,[email protected],3,12
3,[email protected],4,13
4,[email protected],5,14