Warning

Before calling any PyLucene API that requires the Java VM, start it by calling initVM(classpath, ...). See JCC's runtime API functions below for more about this function.

Installing JCC

JCC is a Python extension written in Python and C++. It requires a Java Runtime Environment (JRE) to operate, as it uses Java's reflection APIs to do its work. It is built and installed via distutils or setuptools.

See the installation instructions for more information and operating-system-specific notes.

Invoking JCC

JCC is installed as a package, and how to invoke it depends on the Python version used:

  • python 2.7: python -m jcc
  • python 2.6: python -m jcc.__main__
  • python 2.5: python -m jcc
  • python 2.4:
    • no setuptools: python site-packages/jcc/__init__.py
    • with setuptools: python site-packages/<jcc egg directory>/jcc/__init__.py
  • python 2.3: python site-packages/<jcc egg directory>/jcc/__init__.py

Generating C++ and Python wrappers with JCC

JCC started as a C++ code generator for hiding the gory details of accessing methods and fields on Java classes via the Java Native Interface (JNI). These C++ wrappers make it possible to access a Java object as if it were a regular C++ object, very much like GCJ's CNI interface.

It then became apparent that JCC could also generate the C++ wrappers for making these classes available to Python. Every class that gets thus wrapped becomes a CPython type.

JCC generates wrappers for all public classes that are requested by name on the command line or via the --jar command line argument. It generates wrapper methods for all public methods and fields on these classes whose return type and parameter types are found in one of the following ways:

  • the type is one of the requested classes

  • the type is a superclass or an implemented interface of one of the requested classes

  • the type is available from one of the packages listed via the --package command line argument

Overloaded methods are supported and are selected at runtime on the basis of the type and number of arguments passed in.

JCC does not generate wrappers for methods or fields which don't satisfy these requirements. Thus, JCC can avoid generating code for runaway transitive closures of type dependencies.
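The three selection rules can be modelled in a few lines of plain Python. This is a hypothetical sketch, not JCC's code: `is_wrappable_type` and `wrappable_methods` are illustrative names. The sketch also makes two assumptions. Primitive types need no check, and a --package entry matches only classes directly in that package.

```python
# Hypothetical model of the selection rules above; not JCC's implementation.
def is_wrappable_type(type_name, requested, supertypes, packages):
    """Return True if a Java type satisfies one of the three rules.

    type_name  -- fully qualified Java class name, e.g. 'java.lang.String'
    requested  -- set of classes requested by name or via --jar
    supertypes -- dict: requested class -> set of its superclasses/interfaces
    packages   -- set of package names given with --package
    """
    if type_name in requested:
        return True
    if any(type_name in supertypes.get(cls, ()) for cls in requested):
        return True
    # assumption: --package covers classes directly in that package only
    package = type_name.rpartition('.')[0]
    return package in packages


def wrappable_methods(methods, requested, supertypes, packages):
    """Return the names of (name, return_type, param_types) entries that qualify."""
    kept = []
    for name, return_type, param_types in methods:
        # assumption: primitive types such as 'int' or 'float' need no check
        class_types = [t for t in [return_type] + list(param_types) if '.' in t]
        if all(is_wrappable_type(t, requested, supertypes, packages)
               for t in class_types):
            kept.append(name)
    return kept
```

Under this model, a method returning java.util.List is skipped unless java.util is passed via --package or List is otherwise reachable.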

JCC generates property accessors for a property called field when it finds Java methods named setField(value), getField() or isField().
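As a rough illustration, the naming rule can be expressed as a pattern match over method names. The helper `property_names` below is hypothetical and looks only at names. The rule as stated above also involves argument counts: a one-argument setter and no-argument getters.

```python
import re

# Hypothetical illustration: method names -> property names (names only).
_ACCESSOR = re.compile(r'^(set|get|is)([A-Z]\w*)$')

def property_names(method_names):
    """Map each property name to the accessor prefixes found for it."""
    props = {}
    for name in method_names:
        match = _ACCESSOR.match(name)
        if match:
            prefix, rest = match.groups()
            prop = rest[0].lower() + rest[1:]   # 'Boost' -> 'boost'
            props.setdefault(prop, set()).add(prefix)
    return props
```

For example, getBoost() and setBoost(value) together yield a single boost property.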

The C++ wrappers are declared in a C++ namespace structure that mirrors the Java classes' Java packages. The Python types are declared in a flat namespace at the top level of the resulting Python extension module.
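The mapping from a Java class name to its two wrapper names can be sketched as follows. `wrapper_names` is a hypothetical helper. For simplicity it ignores nested classes, whose binary names contain `$`.

```python
def wrapper_names(java_class):
    """Return (C++ qualified name, Python type name) for a dotted Java class name."""
    parts = java_class.split('.')
    cpp_name = '::'.join(parts)  # packages become nested C++ namespaces
    python_name = parts[-1]      # Python types live in one flat module namespace
    return cpp_name, python_name
```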

JCC's command-line arguments are best illustrated via the PyLucene example:


$ python -m jcc           # run JCC to wrap
    --jar lucene.jar      # all public classes in the lucene jar file
    --jar analyzers.jar   # and the lucene analyzers contrib package
    --jar snowball.jar    # and the snowball contrib package
    --jar highlighter.jar # and the highlighter contrib package
    --jar regex.jar       # and the regex search contrib package
    --jar queries.jar     # and the queries contrib package
    --jar extensions.jar  # and the Python extensions package
    --package java.lang   # including all dependencies found in the 
                          # java.lang package
    --package java.util   # and the java.util package
    --package java.io     # and the java.io package
      java.lang.System    # and to explicitly wrap java.lang.System
      java.lang.Runtime   # as well as java.lang.Runtime
      java.lang.Boolean   # and java.lang.Boolean
      java.lang.Byte      # and java.lang.Byte
      java.lang.Character # and java.lang.Character
      java.lang.Integer   # and java.lang.Integer
      java.lang.Short     # and java.lang.Short
      java.lang.Long      # and java.lang.Long
      java.lang.Double    # and java.lang.Double
      java.lang.Float     # and java.lang.Float
      java.text.SimpleDateFormat
                          # and java.text.SimpleDateFormat
      java.io.StringReader
                          # and java.io.StringReader
      java.io.InputStreamReader
                          # and java.io.InputStreamReader
      java.io.FileInputStream
                          # and java.io.FileInputStream
      java.util.Arrays    # and java.util.Arrays
    --exclude org.apache.lucene.queryParser.Token
                          # while explicitly not wrapping
                          # org.apache.lucene.queryParser.Token
    --exclude org.apache.lucene.queryParser.TokenMgrError
                          # nor org.apache.lucene.queryParser.TokenMgrError
    --exclude org.apache.lucene.queryParser.ParseException
                          # nor org.apache.lucene.queryParser.ParseException
    --python lucene       # generating Python wrappers into a module
                          # called lucene
    --version 2.4.0       # giving the Python extension egg version 2.4.0
    --mapping org.apache.lucene.document.Document 
              'get:(Ljava/lang/String;)Ljava/lang/String;' 
                          # asking for a Python mapping protocol wrapper
                          # for get access on the Document class by
                          # calling its get method
    --mapping java.util.Properties 
              'getProperty:(Ljava/lang/String;)Ljava/lang/String;'
                          # asking for a Python mapping protocol wrapper
                          # for get access on the Properties class by
                          # calling its getProperty method
    --sequence org.apache.lucene.search.Hits
               'length:()I' 
               'doc:(I)Lorg/apache/lucene/document/Document;'
                          # asking for a Python sequence protocol wrapper
                          # for length and get access on the Hits class by
                          # calling its length and doc methods
    --files 2             # generating all C++ classes into about 2 .cpp
                          # files
    --build               # and finally compiling the generated C++ code
                          # into a Python egg via setuptools - when
                          # installed - or a regular Python extension via
                          # distutils or setuptools otherwise 
    --module collections.py
                          # copying the collections.py module into the egg
    --install             # installing it into Python's site-packages
                          # directory.

There are limits to both how many files can fit on the command line and how large a C++ file the C++ compiler can handle. By default, JCC generates one large C++ file containing the source code for all wrapper classes.

Using the --files command line argument, this behaviour can be tuned to work around various limits, for example:

  • to break up the large wrapper class file into about 2 files:
    --files 2

  • to break up the large wrapper class file into about 10 files:
    --files 10

  • to generate one C++ file per Java class wrapped:
    --files separate

The --prefix and --root arguments are passed through to distutils' setup().

Classpath considerations

When generating wrappers for Python, the JAR files passed to JCC via --jar are copied into the resulting Python extension egg as resources and added to the extension module's CLASSPATH variable. Classes or JAR files that are required by the classes contained in the argument JAR files need to be made findable via JCC's --classpath command line argument. At runtime, these need to be appended to the extension's CLASSPATH variable before starting the VM with initVM(CLASSPATH).
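The following is a minimal sketch of extending the classpath before starting the VM. It assumes that the extension module is `lucene` and that you supply the paths of any extra dependency jars. `extended_classpath` is a hypothetical helper. It uses `os.pathsep` because that is the platform's classpath separator.

```python
import os

def extended_classpath(base_classpath, extra_entries):
    """Append extra directories or jar files to a module's CLASSPATH string."""
    entries = [base_classpath] if base_classpath else []
    entries.extend(extra_entries)
    # the JVM separates classpath entries with ':' on Unix and ';' on Windows
    return os.pathsep.join(entries)
```

For instance, `lucene.initVM(extended_classpath(lucene.CLASSPATH, ['/path/to/dependency.jar']))` appends one jar, where the path is a placeholder.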

To have such required jar files also automatically copied into the resulting Python extension egg and added to the classpath at build and runtime, use the --include option. This option works like the --jar option, except that no wrappers are generated for the classes contained in them unless they're explicitly named on the command line.

When more than one JCC-built extension module is going to be used in the same Python VM and these extension modules share Java classes, only one extension module should be generated with wrappers for these shared classes. The other extension modules must be built by importing the one with the shared classes by using the --import command line parameter. This ensures that only one copy of the wrappers for the shared classes is generated and that they are compatible among all extension modules sharing them.

Using distutils vs setuptools

By default, when building a Python extension, if setuptools is found to be installed, it is used over distutils. If you want to force the use of distutils over setuptools, use the --use-distutils command line argument.

Distributing an egg

The --bdist option can be used to ask JCC to invoke distutils with bdist or setuptools with bdist_egg. If setuptools is used, the resulting egg has to be installed with the easy_install installer, which is normally part of a Python installation that includes setuptools.

JCC's runtime API functions

JCC includes a small runtime component that is compiled into any Python extension it produces.

This runtime component makes it possible to manage the Java VM from Python. Because a Java VM can be configured with a myriad of options, it is not automatically started when the resulting Python extension module is loaded into the Python interpreter.

Instead, the initVM() function must be called from the main thread before using any of the wrapped classes. It takes the following keyword arguments:

  • classpath
    A string containing one or more directories or jar files for the Java VM to search for classes. Every Python extension produced by JCC exports a CLASSPATH variable that is hardcoded to the jar files that it was produced from. A copy of each jar file is installed as a resource file with the extension when JCC is invoked with the --install command line argument. This parameter is optional and defaults to the CLASSPATH string exported by the module initVM is imported from.

        >>> import lucene
        >>> lucene.initVM(classpath=lucene.CLASSPATH)
    
  • initialheap
    The initial amount of Java heap to start the Java VM with. This argument is a string that follows the same syntax as the similar -Xms java command line argument.

    
        >>> import lucene
        >>> lucene.initVM(initialheap='32m')
        >>> lucene.Runtime.getRuntime().totalMemory()
        33357824L
    
  • maxheap
    The maximum amount of Java heap that could become available to the Java VM. This argument is a string that follows the same syntax as the similar -Xmx java command line argument.

  • maxstack
    The maximum amount of stack space that is available to the Java VM. This argument is a string that follows the same syntax as the similar -Xss java command line argument.

  • vmargs
    A string of comma-separated additional options to pass to the VM startup routine. These are passed through as-is. For example:

    
        >>> import lucene
        >>> lucene.initVM(vmargs='-Xcheck:jni,-verbose:jni,-verbose:gc')
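The sketch below turns initVM()-style keyword arguments into the equivalent java command-line flags, which makes the correspondence with the java launcher explicit. It is for illustration only: `equivalent_java_flags` is a hypothetical helper and does not start a VM.

```python
def equivalent_java_flags(initialheap=None, maxheap=None, maxstack=None,
                          vmargs=None):
    """Return the java launcher flags that initVM()'s options correspond to."""
    flags = []
    if initialheap:
        flags.append('-Xms' + initialheap)
    if maxheap:
        flags.append('-Xmx' + maxheap)
    if maxstack:
        flags.append('-Xss' + maxstack)
    if vmargs:
        # vmargs is one comma-separated string of options passed as-is
        flags.extend(arg for arg in vmargs.split(',') if arg)
    return flags
```

Under this reading, `lucene.initVM(initialheap='32m', maxheap='512m')` asks for roughly what `java -Xms32m -Xmx512m` would.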
    

The initVM() and getVMEnv() functions return a JCCEnv object that has a few utility methods on it:

  • attachCurrentThread(name, asDaemon)
    Before a thread created in Python or elsewhere, but not in the Java VM, can be used with the Java VM, this method needs to be invoked. The two arguments it takes are optional and self-explanatory.

  • detachCurrentThread()
    The opposite of attachCurrentThread(). This method should be used with extreme caution, as Python's and the Java VM's garbage collectors may use a thread that was detached too early, causing a system crash. The utility of this method seems dubious at the moment.
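Here is a minimal sketch of using attachCurrentThread() from a Python-created thread. It assumes `vm_env` is the JCCEnv returned by initVM() or getVMEnv(). `run_in_java_thread` is a hypothetical helper. Given the caution above, it deliberately does not call detachCurrentThread().

```python
import threading

def run_in_java_thread(vm_env, target, *args):
    """Run target(*args) in a new Python thread attached to the Java VM.

    vm_env is the JCCEnv returned by initVM() or getVMEnv().
    """
    result = {}

    def worker():
        # must happen before the thread touches any wrapped Java class
        vm_env.attachCurrentThread()
        result['value'] = target(*args)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    # exceptions raised by target are not propagated in this sketch
    return result.get('value')
```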

There are several differences between JNI's findClass()and Java's Class.forName():

  • className is a '/' separated string of names

  • the class loaders are different: findClass() may find classes that Class.forName() won't.

For example:


    >>> from lucene import *
    >>> initVM(CLASSPATH)
    >>> findClass('org/apache/lucene/document/Document')
    <Class: class org.apache.lucene.document.Document>
    >>> Class.forName('org.apache.lucene.document.Document')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    lucene.JavaError: java.lang.ClassNotFoundException: org/apache/lucene/document/Document
    >>> Class.forName('java.lang.Object')
    <Class: class java.lang.Object>
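Because findClass() expects '/'-separated names, a dotted class name can be converted first. The hypothetical helper below assumes its input is a binary class name. In a binary name, nested classes already use `$`, as in `java.util.Map$Entry`.

```python
def jni_class_name(binary_name):
    """Convert e.g. 'java.lang.Object' to the '/'-separated form findClass() expects."""
    return binary_name.replace('.', '/')
```

Usage would then look like `findClass(jni_class_name('org.apache.lucene.document.Document'))`.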

Type casting and instance checks

Many Java APIs are declared to return types that are less specific than the types actually returned. In Java 1.5, this is worked around with type parameters. JCC generates code to heed type parameters unless the --no-generics command line parameter is used. See the next section for details on Java generics support.

In C++, casting the object into its actual type is supported via the regular C casting operator.

In Python, each wrapped class has a class method called cast_ that implements the same functionality.

Similarly, each wrapped class has a class method called instance_ that tests whether the wrapped Java instance is of the given type. For example:


    if BooleanQuery.instance_(query):
        # cast only once instance_() has confirmed the runtime type
        booleanQuery = BooleanQuery.cast_(query)
        print(booleanQuery.getClauses())
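Every wrapper class provides instance_() and cast_(), so the check-then-cast pattern generalizes to any of them. `as_instance_of` is a hypothetical helper name, not part of JCC.

```python
def as_instance_of(wrapper_class, obj):
    """Return obj downcast to wrapper_class, or None if it is not an instance.

    Relies only on the instance_() and cast_() class methods that every
    JCC wrapper class provides.
    """
    if wrapper_class.instance_(obj):
        return wrapper_class.cast_(obj)
    return None
```

With it, the example above becomes `booleanQuery = as_instance_of(BooleanQuery, query)` followed by a None check.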

Handling generic classes

Java 1.5 added support for parameterized types. JCC generates code to heed type parameters unless the --no-generics command line parameter is used. Java type parameterization is a compile-time feature: because of type erasure, the same class is used at runtime for all its parameterizations. Similarly, JCC wrapper objects all use the same class but store type parameterizations on instances and make them accessible as a tuple via the parameters_ property.

For example, an ArrayList<Document> instance has (<type 'Document'>,) for parameters_, and its get() method uses that type parameter to wrap its return values.

To allocate an instance of a generic Java class with specific type parameters, use the of_() method. This method accepts one or more Python wrapper classes to use as type parameters. For example, java.util.ArrayList<E> is declared to accept one type parameter. Its wrapper's of_() method hence accepts one parameter, a Python class, to use as type parameter for the return type of its get() method, among others:


    >>> a = ArrayList().of_(Document)
    >>> a
    <ArrayList: []>
    >>> a.parameters_
    (<type 'Document'>,)
    >>> a.add(Document())
    True
    >>> a.get(0)
    <Document: Document<>>

The use of type parameters is, of course, optional. A generic Java class can still be used as before, without type parameters. Downcasting from Object is then necessary:


    >>> a = ArrayList()
    >>> a
    <ArrayList: []>
    >>> a.parameters_
    (None,)
    >>> a.add(Document())
    True
    >>> a.get(0)
    <Object: Document<>>
    >>> Document.cast_(a.get(0))
    <Document: Document<>>

Handling arrays

Java arrays are wrapped with a C++ JArray template. The [] operator is available for read access. This template, JArray<T>, accommodates all Java primitive types, jstring, jobject and wrapper class arrays.

Java arrays are returned to Python in a JArray wrapper instance that implements the Python sequence protocol. It is possible to change an array's elements but not to change an array's size.

To convert a char array to a Python string, use a ''.join(array) construct.

Any Java method expecting an array can be called with the corresponding sequence object from Python.

To instantiate a Java array from Python, use one of the following forms:



    >>> array = JArray('int')(size)
    # the resulting Java int array is initialized with zeroes

    >>> array = JArray('int')(sequence)
    # the sequence must only contain ints
    # the resulting Java int array contains the ints in the sequence

Instead of 'int', you may also use one of 'object', 'string', 'bool', 'byte', 'char', 'double', 'float', 'long' and 'short' to create an array of the corresponding type.

Because there is only one wrapper class for object arrays, the JArray('object') type's constructor takes a second argument denoting the class of the object elements. This argument is optional and defaults to Object.

As with the Object types, the JArray types also include a cast_ method. This method becomes useful when the array returned to Python is wrapped as a plain Object. This is the case, for example, with nested arrays, since there is no distinct Python type for every different Java object array class: all Java object arrays are wrapped by JArray('object'). For example:


# cast obj to an array of ints
>>> JArray('int').cast_(obj)
# cast obj to an array of Document
>>> JArray('object').cast_(obj, Document)

In both cases, the java type of obj must be compatible with the array type it is being cast to.


    # using nested array:

    >>> d = JArray('object')(1, Document)
    >>> d[0] = Document()
    >>> d
    JArray<object>[<Document: Document<>>]
    >>> d[0]
    <Document: Document<>>
    >>> a = JArray('object')(2)
    >>> a[0] = d
    >>> a[1] = JArray('int')([0, 1, 2])
    >>> a
    JArray<object>[<Object: [Lorg.apache.lucene.document.Document;@694f12>, <Object: [I@234265>]
    >>> a[0]
    <Object: [Lorg.apache.lucene.document.Document;@694f12>
    >>> a[1]
    <Object: [I@234265>
    >>> JArray('object').cast_(a[0])[0]
    <Object: Document<>>
    >>> JArray('object').cast_(a[0], Document)[0]
    <Document: Document<>>
    >>> JArray('int').cast_(a[1])
    JArray<int>[0, 1, 2]
    >>> JArray('int').cast_(a[1])[0]
    0
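The `[I@234265` and `[Lorg.apache.lucene.document.Document;@694f12` strings above are Java's default Object.toString() output. Each consists of the array's class name, `@`, and a hexadecimal hash code. The hypothetical helper below maps JArray type names to those class names, which can help when reading such output. It assumes the standard JVM naming for array classes.

```python
# JVM type codes used in the names of primitive array classes
_PRIMITIVE_CODES = {'bool': 'Z', 'byte': 'B', 'char': 'C', 'double': 'D',
                    'float': 'F', 'int': 'I', 'long': 'J', 'short': 'S'}

def java_array_class_name(jarray_type, element_class='java.lang.Object'):
    """Return the Java class name (as shown by toString()) for a JArray type name."""
    if jarray_type in _PRIMITIVE_CODES:
        return '[' + _PRIMITIVE_CODES[jarray_type]
    if jarray_type == 'string':
        return '[Ljava.lang.String;'
    if jarray_type == 'object':
        return '[L' + element_class + ';'
    raise ValueError('unknown JArray type: %r' % (jarray_type,))
```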

To verify that a Java object is of a given array type, use the instance_() method available on the array type. This is not the same as verifying that it is assignable with elements of a given type. For example, using the arrays created above:


    # is d array of Object ? are d's elements of type Object ?
    >>> JArray('object').instance_(d)
    True

    # can it receive Object instances ?
    >>> JArray('object').assignable_(d)
    False

    # is it array of Document ? are d's elements of type Document ?
    >>> JArray('object').instance_(d, Document)
    True

    # is it array of Class ? are d's elements of type Class ?
    >>> JArray('object').instance_(d, Class)
    False

    # can it receive Document instances ?
    >>> JArray('object').assignable_(d, Document)
    True

Exception reporting

Exceptions that occur in the Java VM and that escape to C++ are reported as a javaError C++ exception. When using Python wrappers, the C++ exceptions are handled and reported with Python exceptions. When using C++ only, failure to handle the exception in your C++ code will cause the process to crash.

Exceptions that occur in the Java VM and that escape to the Python VM are reported with a JavaError Python exception object. The getJavaException() method can be called on JavaError objects to obtain the original Java exception object, wrapped as any other Java object. This Java object can be used, for example, to obtain a Java stack trace for the error.
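Here is a minimal sketch of that pattern. It assumes the generated module (for example `lucene`) exposes JavaError, and that java.lang.Throwable is wrapped so that printStackTrace() is callable. `call_reporting_java_errors` is a hypothetical helper.

```python
def call_reporting_java_errors(module, func, *args):
    """Call func(*args); on a Java exception, print its Java stack trace and re-raise.

    module is the JCC-generated extension module, which exposes JavaError.
    """
    try:
        return func(*args)
    except module.JavaError as error:
        java_exception = error.getJavaException()
        # Throwable.printStackTrace() writes to the JVM's System.err
        java_exception.printStackTrace()
        raise
```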

Exceptions that occur in the Python VM and that escape to the Java VM, as can happen for example in Python extensions (see topic below), are reported to the Java VM as a RuntimeException, or as a PythonException when using shared mode. See the installation instructions for more information about shared mode.

Writing Java class extensions in Python

JCC makes it relatively easy to extend a Java class from Python. This is done via an intermediary class written in Java that implements a special method called pythonExtension() and that declares a number of native methods that are to be implemented by the actual Python extension.

When JCC sees these special extension Java classes, it generates the C++ code implementing the native methods they declare. These native methods call the corresponding Python method implementations, passing in parameters and returning the result to the Java VM caller.

For example, to implement a Lucene analyzer in Python, one would first implement such an extension class in Java:


package org.apache.pylucene.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import java.io.Reader;

public class PythonAnalyzer extends Analyzer {
    private long pythonObject;

    public PythonAnalyzer()
    {
    }

    public void pythonExtension(long pythonObject)
    {
        this.pythonObject = pythonObject;
    }
    public long pythonExtension()
    {
        return this.pythonObject;
    }

    public void finalize()
        throws Throwable
    {
        pythonDecRef();
    }

    public native void pythonDecRef();
    public native TokenStream tokenStream(String fieldName, Reader reader);
}

The pythonExtension() methods are what make this class recognized as an extension class by JCC. They should be included verbatim as above, along with the declaration of the pythonObject instance variable.

The implementation of the native pythonDecRef() method is generated by JCC and is necessary because it seems that finalize() cannot itself be native. Since an extension class wraps the Python instance object it's going to be calling methods on, its ref count needs to be decremented when this Java wrapper class disappears. A declaration for pythonDecRef() and a finalize() implementation should always be included verbatim as above.

Really, the only non-boilerplate user input is the constructor of the class and the other native methods, tokenStream() in the example above.

The corresponding Python class(es) are implemented as follows:


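# Assumes PyLucene 2.9/3.x, where these classes live in the flat lucene module,
# and that lucene.initVM() has already been called.
from lucene import (PythonAnalyzer, PythonTokenStream, PositionIncrementAttribute,
                    TermAttribute, OffsetAttribute)

class _analyzer(PythonAnalyzer):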
  def tokenStream(_self, fieldName, reader):
      class _tokenStream(PythonTokenStream):
          def __init__(self_):
              super(_tokenStream, self_).__init__()
              self_.TOKENS = ["1", "2", "3", "4", "5"]
              self_.INCREMENTS = [1, 2, 1, 0, 1]
              self_.i = 0
              self_.posIncrAtt = self_.addAttribute(PositionIncrementAttribute.class_)
              self_.termAtt = self_.addAttribute(TermAttribute.class_)
              self_.offsetAtt = self_.addAttribute(OffsetAttribute.class_)
          def incrementToken(self_):
              if self_.i == len(self_.TOKENS):
                  return False
              self_.termAtt.setTermBuffer(self_.TOKENS[self_.i])
              self_.offsetAtt.setOffset(self_.i, self_.i)
              self_.posIncrAtt.setPositionIncrement(self_.INCREMENTS[self_.i])
              self_.i += 1
              return True
          def end(self_):
              pass
          def reset(self_):
              pass
          def close(self_):
              pass
      return _tokenStream()

When an __init__() is declared, super() must be called, or else the Java wrapper class will not know about the Python instance it needs to invoke.

When a Java extension class declares native methods for which there are public or protected equivalents available on the parent class, JCC generates code that makes it possible to call super() on these methods from Python as well.

There are a number of extension examples available in PyLucene's test suite and samples.

Embedding a Python VM in a Java VM

Using the same techniques used when writing a Python extension of a Java class, JCC may also be used to embed a Python VM in a Java VM. The following steps and constraints apply:

  • JCC must be built in shared mode. See the installation instructions for more information about shared mode. Note that for this use on Mac OS X, JCC must also be built with the link flags "-framework", "Python" in the LFLAGS value.

  • As described in the previous section, define one or more Java classes to be "extended" from Python to provide the implementations of the native methods declared on them. Instances of these classes implement the bridges into the Python VM from Java.

  • The org.apache.jcc.PythonVM Java class is going to be used from the Java VM's main thread to initialize the embedded Python VM. This class is installed inside the JCC egg under the jcc/classes directory, and the full path to this directory must be on the Java CLASSPATH.

  • The JCC egg directory contains the JCC shared runtime library (not the JCC Python extension shared library), a library called libjcc.dylib on Mac OS X, libjcc.so on Linux or jcc.dll on Windows. This directory must be added to the Java VM's shared library path via the -Djava.library.path command line parameter.

  • In the Java VM's main thread, initialize the Python VM by calling its static start() method, passing it a Python program name string and optional start-up arguments in a string array that will be made accessible in Python via sys.argv. Note that the program name string is purely informational and is not used by the start() code other than to initialize that Python variable. This method returns the singleton PythonVM instance to be used in this Java VM. start() may be called multiple times; it will always return the same singleton instance. This instance may also be retrieved at any later time via the static get() method defined on the org.apache.jcc.PythonVM class.

  • Any Java VM thread that is going to be calling into the Python VM should start by acquiring a reference to the Python thread state object by calling the acquireThreadState() method on the Python VM instance. It should then release the Python thread state before terminating by calling releaseThreadState(). Calling these methods is optional but strongly recommended, as it ensures that Python is not creating and throwing away a thread state every time the Python VM is entered and exited from a given Java VM thread.

  • Any Java VM thread may instantiate a Python object for which an extension class was defined in Java, as described in the previous section, by calling the instantiate() method on the PythonVM instance. This method takes two string parameters, the name of the Python module and the name of the Python class to import and instantiate from it. The __init__() constructor on this class must be callable without any parameters and, if defined, must call super() in order to initialize the Java side. The instantiate() method is declared to return java.lang.Object, but the return value is actually an instance of the Java extension class used and must be downcast to it.

Pythonic protocols

When generating wrappers for Python, JCC attempts to detect which classes can be made iterable:

  • When a class declares to implement java.lang.Iterable, JCC makes it iterable from Python.

  • When a Java class declares a method called next() with no arguments returning an object type, this class is made iterable. Its next() method is assumed to terminate iteration by returning null.
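The convention of a next() method that returns null at the end corresponds to a simple generator loop. The sketch below is a pure-Python model of that behaviour, not JCC's generated code. `iterate_until_null` is a hypothetical name. Under this convention, a Java null element ends iteration early.

```python
def iterate_until_null(java_object):
    """Yield java_object.next() results until it returns None (Java null)."""
    while True:
        item = java_object.next()
        if item is None:
            return
        yield item
```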

JCC generates a Python mapping get method for a class when requested to do so via the --mapping command line option, which takes two arguments: the class to generate the mapping get for and the Java method to use. The method is specified with its name followed by ':' and its Java signature.

For example, System.getProperties()['java.class.path'] is made possible by:


--mapping java.util.Properties 
        'getProperty:(Ljava/lang/String;)Ljava/lang/String;'
                    # asking for a Python mapping protocol wrapper
                    # for get access on the Properties class by
                    # calling its getProperty method
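The part after ':' is a standard JVM method descriptor: parameter types in parentheses, followed by the return type. Before running JCC, you could check --mapping and --sequence arguments with a helper like the hypothetical one below, which splits such a spec.

```python
def parse_method_spec(spec):
    """Split 'name:(params)return' into (name, parameter descriptors, return descriptor)."""
    name, signature = spec.split(':', 1)
    if not signature.startswith('(') or ')' not in signature:
        raise ValueError('not a JVM method descriptor: %r' % (signature,))
    close = signature.index(')')
    return name, signature[1:close], signature[close + 1:]
```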

JCC generates Python sequence length and get methods for a class when requested to do so via the --sequence command line option, which takes three arguments: the class to generate the sequence length and get for, and the two Java methods to use. The methods are specified with their name followed by ':' and their Java signature. For example:


for i in range(len(hits)):
    doc = hits[i]
    ...

is made possible by:


--sequence org.apache.lucene.search.Hits
         'length:()I' 
         'doc:(I)Lorg/apache/lucene/document/Document;'
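Conceptually, the generated wrapper lets len() call one Java method and indexing call the other. The class below is a pure-Python model of that behaviour. `SequenceView` is hypothetical. Its negative-index support and IndexError bounds check are choices of this sketch, not documented JCC behaviour.

```python
class SequenceView(object):
    """Pure-Python model of a --sequence wrapper: len() and [] delegate to Java methods."""

    def __init__(self, java_object, length_method='length', get_method='doc'):
        # look up the two Java methods named on the --sequence command line
        self._length = getattr(java_object, length_method)
        self._get = getattr(java_object, get_method)

    def __len__(self):
        return self._length()

    def __getitem__(self, index):
        size = len(self)
        if index < 0:
            index += size            # sketch's choice: Python-style negative indices
        if not 0 <= index < size:
            raise IndexError(index)  # also ends plain "for x in view" loops
        return self._get(index)
```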

The Apache Software Foundation

The Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are defined by collaborative consensus based processes, an open, pragmatic software license and a desire to create high quality software that leads the way in its field. Apache Lucene, Apache Solr, Apache PyLucene, Apache Open Relevance Project and their respective logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
