The R package Ckmeans.1d.dp relies on C++ code to do 99% of its work.
I want to use this functionality in Python without having to rely on RPy2. Therefore I want to "translate" the R wrapper into an analogous Python wrapper that operates on NumPy arrays the way the R code operates on R vectors. Is this possible? It seems like it should be, since the C++ code itself looks (to my untrained eye) like it stands on its own.
However, the documentation for Cython doesn't really cover this use case of wrapping an existing C++ library with Python. It's briefly mentioned here and here, but I'm in way over my head since I've never worked with C++ before.
Here's my attempt, which fails with a slew of "Cannot assign type 'double' to 'double *'" errors:
Directory structure
.
├── Ckmeans.1d.dp  # clone of https://github.com/cran/Ckmeans.1d.dp
├── ckmeans
│   ├── __init__.py
│   └── _ckmeans.pyx
├── setup.py
└── src
    └── Ckmeans.1d.dp_pymain.cpp
src/Ckmeans.1d.dp_pymain.cpp
#include "../Ckmeans.1d.dp/src/Ckmeans.1d.dp.h"

static void Ckmeans_1d_dp(double *x, int *length, double *y, int *ylength,
                          int *minK, int *maxK, int *cluster,
                          double *centers, double *withinss, int *size)
{
    // Call the C++ version of the one-dimensional clustering algorithm
    if (*ylength != *length) { y = 0; }
    kmeans_1d_dp(x, (size_t)*length, y, (size_t)(*minK), (size_t)(*maxK),
                 cluster, centers, withinss, size);
    // Change the cluster numbering from 0-based to 1-based
    for (size_t i = 0; i < *length; ++i) {
        cluster[i]++;
    }
}
ckmeans/__init__.py
from ._ckmeans import ckmeans
ckmeans/_ckmeans.pyx
cimport numpy as np
import numpy as np
from .ckmeans import ClusterResult

cdef extern from "../src/Ckmeans.1d.dp_pymain.cpp":
    void Ckmeans_1d_dp(double *x, int* length,
                       double *y, int * ylength,
                       int* minK, int *maxK,
                       int* cluster, double* centers, double* withinss, int* size)

def ckmeans(np.ndarray[np.double_t, ndim=1] x, int* min_k, int* max_k):
    cdef int n_x = len(x)
    cdef double y = np.repeat(1, N)
    cdef int n_y = len(y)
    cdef double cluster
    cdef double centers
    cdef double within_ss
    cdef int sizes
    Ckmeans_1d_dp(x, n_x, y, n_y, min_k, max_k, cluster, centers, within_ss, sizes)
    return (np.array(cluster), np.array(centers), np.array(within_ss), np.array(sizes))
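A note on the error message: it seems to arise because the C++ function expects pointers to buffers (`double *`, `int *`), while the attempt above declares plain scalars (`cdef double cluster`, and so on). The plain-Python sketch below only illustrates that idea with NumPy and `ctypes`; it does not call the C++ code. The helper names `make_buffers` and `double_ptr` are hypothetical, and the output lengths (one label per point, `max_k` entries for the other arrays) are an assumption read off the signature, not something confirmed by the library.

```python
import ctypes
import numpy as np

def make_buffers(x, max_k):
    """Allocate the arrays a pointer-based C/C++ routine would read and fill.

    Assumption: cluster has one entry per point; centers, withinss and
    size have max_k entries each.
    """
    x = np.ascontiguousarray(x, dtype=np.float64)      # input values
    n = x.shape[0]
    return {
        "x": x,
        "y": np.ones(n, dtype=np.float64),             # weights, one per point
        "cluster": np.zeros(n, dtype=np.intc),         # np.intc matches C int
        "centers": np.zeros(max_k, dtype=np.float64),
        "withinss": np.zeros(max_k, dtype=np.float64),
        "size": np.zeros(max_k, dtype=np.intc),
    }

def double_ptr(arr):
    """View a C-contiguous float64 array as a C `double *` without copying."""
    if arr.dtype != np.float64 or not arr.flags.c_contiguous:
        raise TypeError("expected a C-contiguous float64 array")
    return arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

buffers = make_buffers([1.0, 2.0, 10.0, 11.0], max_k=2)
ptr = double_ptr(buffers["centers"])
ptr[0] = 1.5                      # writing through the pointer ...
print(buffers["centers"][0])      # ... is visible in the NumPy array
```

The point of the sketch is that a callee writing through such a pointer fills the NumPy array in place, which is what the wrapper's output arguments appear to need. In Cython the analogous step would presumably be to declare the outputs as typed arrays and pass the address of their first element, and to pass the integer arguments by address as well.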