Hi guys just wanted to give some more feedback about my favourite spark package!
I encounter an error reading a fits file with an "exotic header". I assume the issue is due to the columns which contain data arrays. I would expect spark-fits to load multivalued columns as vectors but I think it might be causing bigger problems as I cannot view any columns.
For example when I read the data:
path = 'photoObj-001000-1-0027.fits'
df = sqlc.read.format("fits").option("hdu", 1).load(path)
The following error is thrown when calling:
df.select('OBJID').show()
The header is shown here example.txt and the file itself is zipped here photoObj-001000-1-0027.fits.zip
Before the error the schema is inferred:

Despite this the multivalued columns (e.g. code 5E with shape 5, such as 'MODELMAG') are teated as floats. I would expect them to be them to be treated as vectors. Is this possible?


The error itself occurs after selecting any column (even if it is a regular non-multivalue column) and then applying the .take(n) or .show(n) method:
com.github.astrolabsoftware#spark-fits_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-3714d087-ba08-4b46-bb49-9693f86131bb;1.0
confs: [default]
found com.github.astrolabsoftware#spark-fits_2.11;0.7.3 in central
:: resolution report :: resolve 183ms :: artifacts dl 3ms
:: modules in use:
com.github.astrolabsoftware#spark-fits_2.11;0.7.3 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 1 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-3714d087-ba08-4b46-bb49-9693f86131bb
confs: [default]
0 artifacts copied, 1 already retrieved (0kB/5ms)
2019-05-06 19:07:42 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-05-06 19:08:28 WARN Utils:66 - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
[Stage 0:> (0 + 1) / 1]2019-05-06 19:08:30 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArithmeticException: / by zero
at com.astrolabsoftware.sparkfits.FitsRecordReader.nextKeyValue(FitsRecordReader.scala:318)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-05-06 19:08:30 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.ArithmeticException: / by zero
at com.astrolabsoftware.sparkfits.FitsRecordReader.nextKeyValue(FitsRecordReader.scala:318)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Please let me know if you require any additional information or have any questions,
Cheers,
Jacob
Hi guys just wanted to give some more feedback about my favourite spark package!
I encounter an error reading a fits file with an "exotic header". I assume the issue is due to the columns which contain data arrays. I would expect spark-fits to load multivalued columns as vectors but I think it might be causing bigger problems as I cannot view any columns.
For example when I read the data:
The following error is thrown when calling:
The header is shown here example.txt and the file itself is zipped here photoObj-001000-1-0027.fits.zip
Before the error the schema is inferred:
Despite this the multivalued columns (e.g. code 5E with shape 5, such as 'MODELMAG') are teated as floats. I would expect them to be them to be treated as vectors. Is this possible?
The error itself occurs after selecting any column (even if it is a regular non-multivalue column) and then applying the .take(n) or .show(n) method:
Please let me know if you require any additional information or have any questions,
Cheers,
Jacob