Thursday, June 23, 2016

How to translate an R wrapper around a C++ function to Python/NumPy


The R package Ckmeans.1d.dp relies on C++ code to do 99% of its work.

I want to use this functionality in Python without relying on RPy2, so I want to "translate" the R wrapper into an analogous Python wrapper that operates on NumPy arrays the way the R code operates on R vectors. Is this possible? It seems like it should be, since the C++ code itself looks (to my untrained eye) like it stands on its own.

However, the Cython documentation doesn't really cover this use case of wrapping existing C++ code with Python. It's briefly mentioned here and here, but I'm in way over my head since I've never worked with C++ before.

Here's my attempt, which fails with a slew of "Cannot assign type 'double' to 'double *'" errors:

Directory structure

.
├── Ckmeans.1d.dp  # clone of https://github.com/cran/Ckmeans.1d.dp
├── ckmeans
│   ├── __init__.py
│   └── _ckmeans.pyx
├── setup.py
└── src
    └── Ckmeans.1d.dp_pymain.cpp

src/Ckmeans.1d.dp_pymain.cpp

#include "../Ckmeans.1d.dp/src/Ckmeans.1d.dp.h"
static void Ckmeans_1d_dp(double *x, int* length, double *y, int * ylength,
                          int* minK, int *maxK, int* cluster,
                          double* centers, double* withinss, int* size)
{
    // Call the C++ version of the one-dimensional clustering algorithm
    if(*ylength != *length) { y = 0; }

    kmeans_1d_dp(x, (size_t)*length, y, (size_t)(*minK), (size_t)(*maxK),
                    cluster, centers, withinss, size);

    // Change the cluster numbering from 0-based to 1-based
    for(size_t i=0; i< *length; ++i) {
        cluster[i] ++;
    }
}

ckmeans/__init__.py

from ._ckmeans import ckmeans

ckmeans/_ckmeans.pyx

cimport numpy as np
import numpy as np
from .ckmeans import ClusterResult

cdef extern from "../src/Ckmeans.1d.dp_pymain.cpp":
    void Ckmeans_1d_dp(double *x, int* length,
                       double *y, int * ylength,
                       int* minK, int *maxK,
                       int* cluster, double* centers, double* withinss, int* size)

def ckmeans(np.ndarray[np.double_t, ndim=1] x, int* min_k, int* max_k):
    cdef int n_x = len(x)
    cdef double y = np.repeat(1, N)
    cdef int n_y = len(y)
    cdef double cluster
    cdef double centers
    cdef double within_ss
    cdef int sizes
    Ckmeans_1d_dp(x, n_x, y, n_y, min_k, max_k, cluster, centers, within_ss, sizes)
    return (np.array(cluster), np.array(centers), np.array(within_ss), np.array(sizes))
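
For what it's worth, the error message seems to point at the mismatch: the wrapper's signature takes a pointer (`double *` or `int *`) for every argument, while the `.pyx` above declares `cluster`, `centers`, `within_ss` and `sizes` as single scalar values. As a rough sketch of that calling convention, and not a fix for the Cython build, the snippet below uses ctypes (a different technique from Cython) to show what "pass a pointer" looks like from the NumPy side: output arrays are preallocated and handed over as pointers into their buffers, and scalar ints are wrapped so they can be read through a pointer. `double_ptr` and `fake_centers` are hypothetical names; `fake_centers` is a pure-Python stand-in that just averages equal-sized chunks, so nothing here calls the real Ckmeans.1d.dp code or implements its algorithm.

```python
import ctypes

import numpy as np


def double_ptr(arr):
    """Return a double* pointing at the buffer of a contiguous float64 array."""
    if arr.dtype != np.float64 or not arr.flags["C_CONTIGUOUS"]:
        raise ValueError("expected a C-contiguous float64 array")
    return arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))


def fake_centers(x, length, max_k, centers):
    """Hypothetical stand-in for compiled code (NOT the real algorithm).

    Like the C++ wrapper's signature, it reads its scalar arguments through
    int* and writes its results through double*.
    """
    n = length.contents.value
    k = max_k.contents.value
    chunk = n // k
    for j in range(k):
        total = 0.0
        for i in range(j * chunk, (j + 1) * chunk):
            total += x[i]
        centers[j] = total / chunk  # written straight into the caller's buffer


x = np.array([1.0, 2.0, 10.0, 12.0], dtype=np.float64)
max_k = 2

# Outputs are arrays allocated by the caller, not scalar doubles.
centers = np.zeros(max_k, dtype=np.float64)

fake_centers(
    double_ptr(x),
    ctypes.pointer(ctypes.c_int(len(x))),  # int* length
    ctypes.pointer(ctypes.c_int(max_k)),   # int* maxK
    double_ptr(centers),                   # double* centers
)

print(centers)  # the NumPy array now holds what was written via the pointer
```

In Cython the analogous moves would presumably be to allocate `cluster`, `centers`, `within_ss` and `sizes` as NumPy arrays of the right dtype and length, pass `&arr[0]` for each of them, and pass `&n_x`, `&min_k` and so on for the scalars. A `def` function also cannot take `int*` parameters from Python, so `min_k` and `max_k` would need to be plain `int` arguments.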
