Summary
Logging is turned off by default. In Java, call GatewayServer.turnLoggingOn() or GatewayServer.turnLoggingOff(). Py4J-java uses the java.util.logging framework. For fine-grained control over the logging behavior, obtain a Logger instance by calling Logger.getLogger("py4j"). See the Java Logging Overview for more information on this framework.
For example, in Java, you can do:
GatewayServer.turnLoggingOn();
Logger logger = Logger.getLogger("py4j");
logger.setLevel(Level.ALL);
In Python, logging can be enabled this way:
import logging
logger = logging.getLogger("py4j")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
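If the bare StreamHandler output is too terse, the same "py4j" logger can be given a formatter. The following is a minimal sketch that relies only on the standard logging module; the helper name enable_py4j_logging, the marker attribute and the format string are illustrative choices, not part of Py4J.

```python
import logging


def enable_py4j_logging(level=logging.DEBUG, stream=None):
    """Attach one formatted stream handler to the "py4j" logger.

    Hypothetical helper for illustration; Py4J does not ship it.
    """
    logger = logging.getLogger("py4j")
    logger.setLevel(level)
    # Do not stack a second handler if the helper is called again.
    already_added = any(
        getattr(h, "_faq_sketch_handler", False) for h in logger.handlers
    )
    if not already_added:
        # StreamHandler(None) falls back to sys.stderr.
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        handler._faq_sketch_handler = True  # marker used by the check above
        logger.addHandler(handler)
    return logger
```

Calling enable_py4j_logging(logging.INFO) once at program start should be enough; later calls only change the level.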
Hint: you can enable/disable Java logging from Python too:
gateway.jvm.py4j.GatewayServer.turnLoggingOn()
Use the jvm member of a gateway followed by the class’s fully qualified name. See JVM Views to learn how to import packages and avoid typing the fully qualified name of classes:
>>> gateway = JavaGateway()
>>> java_list = gateway.jvm.java.util.ArrayList()
Use the jvm member of a gateway followed by the fully qualified name of the method’s class. Do the same for static fields.
>>> gateway = JavaGateway()
>>> timestamp = gateway.jvm.System.currentTimeMillis() # equivalent to jvm.java.lang.System...
Use the get_field function:
>>> field_value = py4j.java_gateway.get_field(object,'public_field')
Alternatively, set the auto_field parameter to True when you create the gateway:
>>> gateway = JavaGateway(auto_field=True)
>>> object = gateway.entry_point.getObject()
>>> field_value = object.public_field
As in Java, you can always access any class using its fully qualified name, but you can also import a class or package so that you can refer to classes by their simple name afterwards:
>>> from py4j.java_gateway import JavaGateway
>>> from py4j.java_gateway import java_import
>>> gateway = JavaGateway()
>>> jList1 = gateway.jvm.java.util.ArrayList()
>>> java_import(gateway.jvm,'java.util.*')
>>> jList2 = gateway.jvm.ArrayList()
>>> jMap = gateway.jvm.HashMap()
>>> gateway.jvm.java.lang.String("a")
u'a'
>>> gateway.jvm.String("a")
u'a'
Read how to use jvm views to make sure that an import statement only affects the current Python module.
Use the new_array function:
>>> gateway = JavaGateway()
>>> string_class = gateway.jvm.String
>>> string_array = gateway.new_array(string_class, 3, 5)
>>> string_array[2][4] = 'Hello World'
>>> string_array[2][4]
u'Hello World'
>>> string_array[2][3] is None
True
>>> string_array[3][1]
Traceback (most recent call last):
...
IndexError: list index out of range
Py4J by default uses the TCP port 25333 to communicate from Python to Java and TCP port 25334 to communicate from Java to Python. It also uses TCP port 25332 for a test echo server (only used by unit tests).
These ports can be customized when creating a JavaGateway on the Python side and a GatewayServer on the Java side.
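When it is unclear whether a gateway is actually listening on the expected ports, a plain TCP probe can help. The sketch below uses only the standard socket module and assumes the default port numbers quoted above; is_port_open is a hypothetical helper name. Note that the probe opens and immediately closes a real connection to whatever is listening.

```python
import socket

# Default ports as stated above; change them if the gateway was customized.
DEFAULT_JAVA_PORT = 25333    # Python -> Java
DEFAULT_PYTHON_PORT = 25334  # Java -> Python


def is_port_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in [("Java side", DEFAULT_JAVA_PORT),
                       ("Python side", DEFAULT_PYTHON_PORT)]:
        state = "listening" if is_port_open(port) else "not reachable"
        print(f"{name} port {port}: {state}")
```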
The Java component of Py4J is thread-safe, but multiple threads could access the same entry point. Each gateway connection is executed in its own thread (e.g., each time a Python thread calls a Java method), so if multiple Python threads/processes/programs are connected to the same gateway, i.e., the same address and the same port, multiple threads may call the entry point’s methods concurrently.
In the following example, two threads are accessing the same entry point. The same holds if gateway1 and gateway2 were created in separate processes: method1 could still be called concurrently.
# ... in Thread One
gateway1 = JavaGateway() # Thread One is accessing the JVM.
gateway1.entry_point.method1() # Thread One is calling method1
# ... in Thread Two
gateway2 = JavaGateway() # Thread Two is accessing the JVM.
gateway2.entry_point.method1() # Thread Two is calling method1
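The practical consequence is that the entry point's own methods must tolerate concurrent calls. The sketch below is not Py4J code: it uses a pure-Python stand-in class (EntryPointStandIn, a hypothetical name) and plain threads to mimic several connections calling method1 at once. The lock plays the role that a synchronized block or a concurrent collection would play in a real Java entry point.

```python
import threading


class EntryPointStandIn:
    """Pure-Python stand-in for a shared entry point (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counter = 0

    def method1(self):
        # Guard the read-modify-write so concurrent callers cannot interleave.
        with self._lock:
            self.counter += 1


def call_concurrently(entry_point, n_threads=8, calls_per_thread=1000):
    """Call method1 from several threads and return the final counter."""

    def worker():
        for _ in range(calls_per_thread):
            entry_point.method1()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return entry_point.counter
```

With the lock in place, the final counter equals n_threads * calls_per_thread; without it, updates could be lost.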
The Python component of Py4J is also thread-safe, except for the close function of a CommChannelFactory, which must not be called concurrently with other methods if all communication channels are to be closed reliably. This is a trade-off that avoids acquiring a lock every time a Java method is called on the Python side. It is only a problem if you attempt to shut down or close a JavaGateway while Java methods are being called from the Python side.
See Py4J Threading Model for more details.
Because each Eclipse plug-in has its own class loader, a GatewayServer instance started in one plug-in won’t have access to the other plug-ins by default. You can work around this limitation by adding this line to the manifest of the plug-in where the GatewayServer resides:
Eclipse-BuddyPolicy: global
You can also use the Py4J Eclipse feature, which starts a default GatewayServer and allows Python clients to refer to any class declared in any plug-in.
See Py4J and Eclipse for more details.
Yes, thanks to a generous contributor, Py4J now works with Python 3.
Running a Py4J gateway on a JVM exposes the JVM over the network, which is a major security concern.
By default, Py4J only listens on the IPv4 localhost address (127.0.0.1), so if you trust all users who have access to the local machine, the security risks are minimal: on most systems, external programs and users cannot reach the localhost address by default.
If you use Py4J to make a JVM available over the network, you are responsible for ensuring that only trusted connections can communicate with the JVM. This is usually achieved with a proper firewall configuration.
You can view Py4J as a dangerous equivalent of a redis or memcached server: no protection by default, with access to system commands and the filesystem. Still, redis and memcached are used by lots of organizations; they are just usually not exposed outside of a private and trusted network.
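One inexpensive safeguard is to check the address a gateway is about to bind to before starting it. The sketch below uses only the standard ipaddress module; require_loopback is a hypothetical helper, and how the address is then passed to Py4J is deliberately left out. It is no substitute for a firewall.

```python
import ipaddress


def require_loopback(address):
    """Return the address unchanged if it is a loopback address.

    Raises ValueError otherwise, so that exposing the JVM on another
    interface has to be an explicit decision.
    """
    if not ipaddress.ip_address(address).is_loopback:
        raise ValueError(
            f"{address} is not a loopback address; the JVM would be "
            "reachable from other hosts"
        )
    return address
```

For example, require_loopback("127.0.0.1") returns the address, while require_loopback("0.0.0.0") raises ValueError because binding to all interfaces exposes the gateway beyond the local machine.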
Please report bugs on our issue tracker.
There are many ways to contribute to Py4J:
In case of doubt, do not hesitate to contact the founder of the project, Barthelemy.