Have been trying to set it up for hours now. Nothing works.
- Latest version does not seem to have winutils support, and using it causes errors when using some important methods. (EDIT: this is likely wrong, and the winutils stuff that I have should probably be fine.)
- Older versions need to be built with Maven. However, that just gives me a PluginExecutionException.
I need to do this ASAP, preferably within the next 3 hours.
I have nowhere else to ask for help, it seems, especially considering that an account I set up specifically for asking questions was suspended after I edited a relevant post.
Highly doubt that anybody will be able to help me.
EDIT2: the issue has, thankfully, been resolved. I was using Python 3.12, and switched to 3.11.8. That made the problem go away.
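For anyone landing here with the same symptom: a small guard at the top of the script can flag the interpreter version before Spark starts. This is only a sketch based on what happened in this thread (3.12 failed, 3.11.8 worked). The helper name and the 3.11 cutoff are my assumptions, not an official compatibility rule, so check the documentation for your PySpark release.

```python
import sys

# Assumption taken from this thread only: 3.11.x worked, 3.12 did not.
# Adjust to whatever your PySpark release documents as supported.
HIGHEST_KNOWN_GOOD = (3, 11)


def python_version_ok(version_info=None, highest=HIGHEST_KNOWN_GOOD):
    """Return True if the (major, minor) version is at or below `highest`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) <= tuple(highest)


if __name__ == "__main__":
    if not python_version_ok():
        major, minor = sys.version_info[:2]
        print(
            f"Warning: running Python {major}.{minor}; "
            f"this thread only confirmed {HIGHEST_KNOWN_GOOD[0]}.{HIGHEST_KNOWN_GOOD[1]}."
        )
```

Calling python_version_ok() before building the SparkSession gives a readable warning instead of a long Java stack trace.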
I have never used PySpark, but I do know some Python.
How are you installing pyspark? Are there any errors?
1. pip install pyspark and installing the latest version of Apache Spark leads to errors when calling pyspark.sql.DataFrame.show() methods of DataFrame objects.
2. pip install pyspark and installing an older version of Apache Spark, i.e. having a version mismatch between PySpark and Apache Spark, leads to errors even when instantiating a SparkSession.
3. pip install pyspark==3.3.4 previously led to an error - the system was unable to build wheels for the package. Now, it seems to install that way, but behaves the same as in the previous case.
4. ./build/mvn using Bash from the appropriate directory led to Caused by: org.apache.maven.plugin.PluginExecutionException: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile failed.
Running this code after having installed this stuff as in case 3:
leads to this:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "[python file path]", line 6, in <module>
    spark = SparkSession.builder.getOrCreate()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[python file path]", line 269, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[python file path]", line 483, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "[python file path]", line 197, in __init__
    self._do_init(
  File "[python file path]", line 282, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[python file path]", line 402, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[python file path]", line 1585, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "[python file path]", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.ExceptionInInitializerError
    at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:56)
    at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
    at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
    at org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.memory.MemoryManager.<init>(MemoryManager.scala:273)
    at org.apache.spark.memory.UnifiedMemoryManager.<init>(UnifiedMemoryManager.scala:58)
    at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.IllegalStateException: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
    at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:113)
    ... 25 more
Caused by: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
    at java.base/java.lang.Class.getConstructor0(Class.java:3784)
    at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2955)
    at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:71)
    ... 25 more
SUCCESS: The process with PID 21224 (child process of PID 9020) has been terminated.
SUCCESS: The process with PID 9020 (child process of PID 15684) has been terminated.
SUCCESS: The process with PID 15684 (child process of PID 4980) has been terminated.
Process finished with exit code 1
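The innermost cause in the trace above is java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int), raised from Spark's own Platform class. That kind of error is commonly reported when an older Spark release runs on a newer JDK than it supports, so it may be worth checking which Java is actually on the path. That diagnosis is an assumption on my part, not something confirmed in this thread. A minimal sketch (the helper names are hypothetical):

```python
import re
import subprocess


def parse_java_major(version_text):
    """Extract the major Java version from `java -version` output.

    Handles both the legacy "1.8.0_391" scheme (major is 8) and the
    newer "17.0.9" / "22" scheme. Returns None if nothing matches.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    if first == 1 and second is not None:
        # Legacy numbering: "1.8" means Java 8.
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and parse it; None if java is not on the path."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # `java -version` normally prints to stderr, so both streams are checked.
    return parse_java_major(result.stderr + result.stdout)
```

Printing installed_java_major() and comparing it with the Java versions listed in the documentation for the installed Spark release should show whether the JDK is the mismatch.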
The system environment variables JAVA_HOME, HADOOP_HOME, and SPARK_HOME are configured. The relevant binary directories are included in the Path system environment variable. PYTHON_SPARK is set to python.
EDIT: Great, and now Maven can't even attempt to build the package and throws the error
Error occurred during initialization of VM Could not reserve enough space for 2097152KB object heap
Just great.
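Since a lot of this depends on environment variables, here is a quick way to see what the Python process actually receives. It is a sketch only: the helper name is hypothetical. Note that the variable PySpark reads for the worker interpreter is PYSPARK_PYTHON; the post above says PYTHON_SPARK, which may just be a typo in the post, so the sketch checks the PYSPARK_PYTHON name.

```python
import os

# Variables named in the post above, plus PYSPARK_PYTHON, which is the
# name PySpark reads for the worker interpreter.
EXPECTED_VARS = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PYSPARK_PYTHON")


def missing_spark_vars(env=None, expected=EXPECTED_VARS):
    """Return the names from `expected` that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in expected if not env.get(name)]


if __name__ == "__main__":
    for name in EXPECTED_VARS:
        print(f"{name} = {os.environ.get(name, '<not set>')}")
    print("Missing:", missing_spark_vars())
```

Running this from the same IDE or terminal that launches the failing script matters, because a variable set system-wide is not always visible to a process that was started before the change.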
Just in case, if I install the library the first way, for the same piece of code the logs start with this:
One Stack Overflow thread mentions running Python 3.12. Are you running 3.12? Does it help if you use Python 3.11?
Else, an interesting thing might be the "Caused by: java.io.EOFException", an End Of File Exception.
That actually worked. Thank you.
I have got to say that I have a special hatred for arcane programming errors like in this case.
I am running 3.12. I have not tried running 3.11. Highly doubt that that will change anything, but I guess I'll try it when I'm able to.
I'm not sure what I have to glean from the EOF exception.