Discussion:
[C++-sig] Boost.Python C++ object reference in Python: unexpected behaviour.
Christoff Kok
2015-05-28 07:29:39 UTC
Hi,

I am having an issue with Boost.Python with a very simple use case.

I am returning a reference to an object, and it seems that my Python object
loses the reference to its C++ object at some point for some reason.

Please see my *example* below reproducing this issue.

*C++ Code:*

#include <iostream>
#include <vector>
#include <string>
#include <cmath>
#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>

class Car {
public:
    Car(std::string name) : m_name(name) {}

    bool operator==(const Car &other) const {
        return m_name == other.m_name;
    }

    std::string GetName() { return m_name; }
private:
    std::string m_name;
};

class Factory {
public:
    Factory(std::string name) : m_name(name) {}

    bool operator==(const Factory &other) const {
        return m_name == other.m_name
            && m_car_list == other.m_car_list;
    }

    Car& create_car(std::string name)
    {
        m_car_list.emplace_back(Car(name));
        return m_car_list.back();
    }

    std::string GetName() { return m_name; }
    std::vector<Car>& GetCarList() { return m_car_list; }
private:
    std::string m_name;
    std::vector<Car> m_car_list;
};

class Manufacturer {
public:
    Manufacturer(std::string name) : m_name(name) {}

    bool operator==(const Manufacturer &other) const {
        return m_name == other.m_name
            && m_factory_list == other.m_factory_list;
    }

    Factory& create_factory(std::string name)
    {
        m_factory_list.emplace_back(Factory(name));
        return m_factory_list.back();
    }

    std::string GetName() { return m_name; }
    std::vector<Factory>& GetFactoryList() { return m_factory_list; }
private:
    std::string m_name;
    std::vector<Factory> m_factory_list;
};

BOOST_PYTHON_MODULE(carManufacturer)
{
    using namespace boost::python;
    class_<Manufacturer>("Manufacturer", init<std::string>())
        .add_property("factory_list",
            make_function(&Manufacturer::GetFactoryList,
                return_internal_reference<1>()))
        .add_property("name", &Manufacturer::GetName)
        .def("create_factory", &Manufacturer::create_factory,
            return_internal_reference<>());
    class_<Factory>("Factory", init<std::string>())
        .add_property("car_list", make_function(&Factory::GetCarList,
            return_internal_reference<1>()))
        .add_property("name", &Factory::GetName)
        .def("create_car", &Factory::create_car,
            return_internal_reference<>());
    class_<Car>("Car", init<std::string>())
        .add_property("name", &Car::GetName);

    class_<std::vector<Factory> >("FactoryList")
        .def(vector_indexing_suite<std::vector<Factory> >());
    // The vector of cars is registered as "CarList" so that its Python
    // name does not clash with the "Car" class registered above.
    class_<std::vector<Car> >("CarList")
        .def(vector_indexing_suite<std::vector<Car> >());
}


*Python Code:*

import sys
sys.path[:0] = [r"bin\Release"]

from carManufacturer import *

vw = Manufacturer("VW")
vw_bra_factory = vw.create_factory("Brazil Factory")
beetle = vw_bra_factory.create_car("Beetle69")

if vw_bra_factory is vw.factory_list[0]:
    print("equal.")
else:
    print("NOT EQUAL")
    print("## I expected them to be the same reference..?")


print("vw_bra_factory Car List size : " + str(len(vw_bra_factory.car_list)))
print("Actual Car List size : " +
      str(len(vw.factory_list[0].car_list)))
print("## This still works. Maybe the python objects differ, but refer to "
      "the same C++ object. I can live with that.")

vw_sa_factory = vw.create_factory("South Africa Factory")
print("vw_bra_factory Car List size : " + str(len(vw_bra_factory.car_list)))
print("Actual Car List size : " +
      str(len(vw.factory_list[0].car_list)))
print("## .. what? why? brazil py object has no cars now? I don't get it. I "
      "can't have any of that.")

print("## What will happen if I create another car in the brazil factory?")
combi = vw_bra_factory.create_car("Hippie van")
print("vw_bra_factory Car List size : " + str(len(vw_bra_factory.car_list)))
print("Actual Car List size : " +
      str(len(vw.factory_list[0].car_list)))

print("## And another.")
citi_golf = vw_bra_factory.create_car("Citi golf")
print("vw_bra_factory Car List size : " + str(len(vw_bra_factory.car_list)))
print("Actual Car List size : " +
      str(len(vw.factory_list[0].car_list)))
print("## 'vw_bra_factory' must have lost its C++ reference it had to "
      "'vw.factory_list[0]' when I created a new factory. Why?")


*Python Output:*

NOT EQUAL
*## I expected them to be the same reference..?*
vw_bra_factory Car List size : 1
Actual Car List size : 1
*## This still works. Maybe the python objects differ, but refer to the
same C++ object. I can live with that.*
vw_bra_factory Car List size : 0
Actual Car List size : 1
*## .. what? why? brazil py object has no cars now? I don't get it. I can't
have any of that.*
*## What will happen if I create another car in the brazil factory?*
vw_bra_factory Car List size : 1
Actual Car List size : 1
*## And another.*
vw_bra_factory Car List size : 2
Actual Car List size : 1
*## 'vw_bra_factory' must have lost its C++ reference it had to
'vw.factory_list[0]' when I created a new factory. Why?*

This is just an example made to reproduce my real work's problem in a
presentable way. In my real work, python crashes
after I create a second "factory" and try to add a "car" to the first
"factory".
The crash occurs in the C++ "create_car" method when trying to access the
"factory"'s "car" list.

Does anyone have insight as to what the problem is? Any useful input will
be greatly appreciated.

Greetings,
Christoff
Christoff Kok
2015-06-02 05:34:09 UTC
Hi,

This looks like a bug in Boost.Python to me.

Could anyone confirm this? I provided a minimal, full working example.

I would like to make sure it is a bug before reporting it as one.

Christoff
--
Christoff Kok
Software Engineer
Ex Mente

http://www.ex-mente.co.za
***@ex-mente.co.za
PO Box 10214
Centurion
0046
South Africa
tel: +27 12 743 6993
tel: +27 12 654 8198
fax: +27 85 150 1341
Stefan Seefeld
2015-06-02 12:35:10 UTC
Post by Christoff Kok
> Hi,
> This looks like a bug in Boost.Python to me.
> Could anyone confirm this? I provided a minimal, full working example.
> I would like to make sure it is a bug before reporting it as one.

The 'is' operator compares the identities of the two Python objects,
which differ. However, both are referencing the same C++ object.
As a test, add

bool identical(Factory &f1, Factory &f2) { return &f1 == &f2;}

to your C++ and expose that, then use that function to compare the
factory references, instead of 'is'.
Yes, it would be nice if the same Python (wrapper) object would be
returned. I'm not sure how to do that, though. I'll think about it some
more...
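
The difference between wrapper identity and object identity can be sketched in plain C++, without Boost.Python. In the sketch below, `Handle` is a hypothetical stand-in for a Python wrapper object and `same_wrapper` is an illustrative analogue of Python's `is`; neither is part of Boost.Python. Only `identical` comes from the suggestion above.

```cpp
#include <cassert>
#include <string>

struct Factory {
    std::string name;
};

// Hypothetical stand-in for a Python wrapper object: every handle is a
// distinct object, but several handles may point at the same Factory.
struct Handle {
    Factory* target;
};

// The helper suggested above: true only if both references name the
// same C++ object, i.e. the same address.
bool identical(Factory& f1, Factory& f2) { return &f1 == &f2; }

// Analogue of Python's `is`: compares the wrappers, not what they wrap.
bool same_wrapper(const Handle& a, const Handle& b) { return &a == &b; }

// Two distinct handles onto one Factory: wrapper identity differs,
// object identity does not.
void demo_identity() {
    Factory brazil{"Brazil Factory"};
    Handle first{&brazil};
    Handle second{&brazil};
    assert(!same_wrapper(first, second));
    assert(identical(*first.target, *second.target));
}
```

Note that `identical` differs from the `operator==` in the original classes, which compares names and contents rather than addresses.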

HTH,
Stefan
--
...ich hab' noch einen Koffer in Berlin...
Stefan Seefeld
2015-06-02 13:11:23 UTC
Christoff,

I just noticed I wasn't really answering the real problem you report,
which is the crash.

I believe the problem is in your code: You create two vectors of
value-types (cars and factories). Then you take references to the stored
objects, while there is no guarantee that the objects' addresses won't
change over time. In particular, there is a good chance that these objects
will be copied as the vector gets resized when new objects are added beyond
its current capacity.
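
The relocation can be observed directly in a few lines of standard C++. This is a minimal sketch with a simplified `Car` struct and a hypothetical function name; the address is recorded as an integer before the vector grows, so no dangling pointer is ever dereferenced.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Car {
    std::string name;
};

// Returns true if the first element lives at a different address after
// the vector has grown past its capacity.
bool first_element_moved_after_growth() {
    std::vector<Car> cars;
    cars.push_back(Car{"Beetle69"});
    const auto before = reinterpret_cast<std::uintptr_t>(&cars[0]);

    // Fill whatever capacity is left, so that the storage is full.
    while (cars.size() < cars.capacity()) {
        cars.push_back(Car{"filler"});
    }
    assert(cars.size() == cars.capacity());

    // One more element no longer fits: the vector allocates new storage
    // and relocates every existing Car into it.
    cars.push_back(Car{"Hippie van"});

    const auto after = reinterpret_cast<std::uintptr_t>(&cars[0]);
    return before != after;
}
```

Any reference or pointer obtained before the last `push_back` would refer to the old, released storage, which is the situation `vw_bra_factory` ends up in once a second factory is created.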

As a test, I called

vector<...>::reserve(32)

on each of the vectors right in the Factory and Manufacturer
constructors, with the effect of allocating enough storage upfront so
that in your sample code no re-allocation is required, and thus objects
won't be copied around. This prevents the crash from happening for me.
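
The effect of the `reserve` call can be sketched the same way. The function name and the simplified `Car` struct below are illustrative assumptions; the guarantee being relied on is that a `std::vector` does not reallocate while its size stays within the reserved capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Car {
    std::string name;
};

// Returns true if the first element keeps its address while `count`
// cars are appended to a vector that reserved room for `reserved` cars.
// Only meaningful for 1 <= count <= reserved.
bool first_element_stays_put(std::size_t reserved, std::size_t count) {
    assert(count >= 1 && count <= reserved);
    std::vector<Car> cars;
    cars.reserve(reserved);
    cars.push_back(Car{"Beetle69"});
    const Car* first = &cars[0];
    for (std::size_t i = 1; i < count; ++i) {
        cars.push_back(Car{"car " + std::to_string(i)});
    }
    // No reallocation happened, so `first` is still valid to use.
    return first == &cars[0] && first->name == "Beetle69";
}
```

With `reserved = 32`, the 33rd insertion would reallocate and bring the original problem back, which is why this only illustrates the cause.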

Obviously this is just to illustrate the problem; it's definitely not a
solution to your problem, which still is that you reference objects
beyond their lifetime.

HTH,
Stefan
Christoff Kok
2015-06-02 14:36:31 UTC
Hi Stefan,

Thank you very much. That makes sense and my tests prove it. The code runs
as expected when I reserve enough space for the vector.

I do not quite get why it works in C++ and not in Python. I know too little
about the C++ and Python run-times.
I guess that the C++ run-time automatically updates objects holding
references to its new address, whereas this is not the case in the Python
run-time.
Fixing that issue for me is way over my head at the moment.

Thank you very much for spending the time to think about the problem and
kudos to you for discovering the reason.


I'm not sure how to solve the problem either; my objects are uniquely
identifiable.
I might be able to override the Python objects' __getattribute__ methods
and set the Python object to that of its parent container's instance (only
if the vector has not been resized).
I don't know if this will work, and it might slow the code down a bit as
well. I'll try it nonetheless.
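
The idea of looking the object up through its parent container again can be sketched on the C++ side as a handle that stores an index instead of an address. `CarHandle` and `add_car` are hypothetical names used only for illustration, not the approach taken in this thread. The sketch assumes that the owning vector itself stays at a fixed address and that no element before the index is erased or reordered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Car {
    std::string name;
};

// Hypothetical handle: remembers the owning vector and an index rather
// than the element's address, and looks the element up on every access.
class CarHandle {
public:
    CarHandle(std::vector<Car>& owner, std::size_t index)
        : m_owner(&owner), m_index(index) {
        assert(index < owner.size());
    }

    // Re-resolves the element each time, so reallocation is harmless.
    Car& get() const { return m_owner->at(m_index); }

private:
    std::vector<Car>* m_owner;
    std::size_t m_index;
};

// Appends a car and returns a handle that survives later reallocation.
CarHandle add_car(std::vector<Car>& cars, const std::string& name) {
    cars.push_back(Car{name});
    return CarHandle(cars, cars.size() - 1);
}
```

In the nested case from the example, a handle to a car would also have to re-resolve its factory, because the factory's own vector moves when the manufacturer's vector grows.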

Thanks again Stefan.

Regards,
Christoff
_______________________________________________
Cplusplus-sig mailing list
https://mail.python.org/mailman/listinfo/cplusplus-sig
Stefan Seefeld
2015-06-02 14:55:21 UTC
Post by Christoff Kok
> Hi Stefan,
> Thank you very much. That makes sense and my tests prove it. The code
> runs as expected when I reserve enough space for the vector.
> I do not quite get it why it works in C++ and not python. I know too
> little about the C++ and python run-time.

What works in C++ and not in Python? With the "reserve" calls the
Python script you provided runs fine (for me).

Post by Christoff Kok
> I guess that the C++ run-time automatically updates objects holding
> references to its new address, whereas this is not the case in the
> Python run-time.
> Fixing that issue for me is way over my head at the moment.

But the problem really is in your C++ code, and has nothing to do with
the Python bindings. So you should start by clarifying your desired API,
then adjust the implementation. (Do you really want references to be
usable? In that case, your vectors shouldn't store cars by value. My
guess is you shouldn't be using references, unless the real types are
heavy objects, in which case you also want to prevent copy-construction
and store heap-allocated objects.)
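
A minimal sketch of that last option, assuming `std::unique_ptr` as the owning pointer (the thread does not say which pointer type is meant): the vector holds pointers, so growing it relocates only the pointers and every `Car` stays where it was allocated. The `car_count` accessor and `reference_survives_growth` are hypothetical additions for the demonstration, and how such a class is best exposed through Boost.Python is outside this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Car {
public:
    explicit Car(std::string name) : m_name(std::move(name)) {}
    // Non-copyable: a Car has a single identity and a single address.
    Car(const Car&) = delete;
    Car& operator=(const Car&) = delete;
    const std::string& GetName() const { return m_name; }
private:
    std::string m_name;
};

class Factory {
public:
    // The returned reference stays valid for as long as the Factory
    // keeps the car, no matter how often the vector reallocates.
    Car& create_car(const std::string& name) {
        std::unique_ptr<Car> car(new Car(name));
        m_car_list.push_back(std::move(car));
        return *m_car_list.back();
    }
    std::size_t car_count() const { return m_car_list.size(); }
private:
    std::vector<std::unique_ptr<Car>> m_car_list;
};

// The reference obtained first is still usable after many insertions.
bool reference_survives_growth() {
    Factory brazil;
    Car& beetle = brazil.create_car("Beetle69");
    for (int i = 0; i < 100; ++i) {
        brazil.create_car("filler");
    }
    assert(brazil.car_count() == 101);
    return beetle.GetName() == "Beetle69";
}
```

The same treatment would be needed for `Factory` inside `Manufacturer`, since a `Factory` stored by value is still relocated when the factory list grows.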

Once the C++ API is settled, you can reconsider the appropriate Python
API for it.
Post by Christoff Kok
> Thank you very much for spending the time to think about the problem
> and kudos to you for discovering the reason.

You are welcome.

Stefan
Christoff Kok
2015-06-03 07:43:44 UTC
Thank you again Stefan,
Post by Stefan Seefeld
> What works in C++ and not in Python ? With the "reserve" calls the
> Python script you provided runs fine (for me).

I tested the code in a C++ console application, using the same example as
in the Python example I posted.
I meant that, without the "reserve" calls, the C++ console application
worked as I expected, and Python didn't. Python works as I expected with
the "reserve" calls.

Post by Stefan Seefeld
> Do you really want references to be
> usable ? In that case, your vectors shouldn't store cars by-value. My
> guess is you shouldn't be using references, unless the real types are
> heavy objects, in which case you also want to prevent copy-construction,
> and store heap-allocated objects.

The types I am using are big objects and performance is a concern.
C++11's move semantics make storing large objects by value in a vector
much more viable and performant. (Very little overhead.)
I liked this approach of using containers of objects by value everywhere
(well, except where polymorphism is needed).
I am sure that using heap-allocated objects will still be faster, however.
Even though moving has little overhead, it is still overhead that
heap-allocated objects don't have to deal with.
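
Moving makes vector growth cheaper, but it still relocates every element, so it does not keep addresses stable. The sketch below counts copies and moves during a forced reallocation; `Tracked`, the counters and the function name are hypothetical, and the result assumes a standard library that moves elements whose move constructor is `noexcept`, as the common implementations do.

```cpp
#include <cassert>
#include <vector>

// Counters for how often a Tracked object is copied or moved.
static int g_copies = 0;
static int g_moves = 0;

struct Tracked {
    int id;
    explicit Tracked(int i) : id(i) {}
    Tracked(const Tracked& other) : id(other.id) { ++g_copies; }
    // noexcept lets std::vector move instead of copy on reallocation.
    Tracked(Tracked&& other) noexcept : id(other.id) { ++g_moves; }
};

// Grows a vector until it has to reallocate and reports whether the
// existing elements were moved, never copied, into the new storage.
bool growth_moves_elements() {
    g_copies = 0;
    g_moves = 0;
    std::vector<Tracked> items;
    items.emplace_back(0);
    // Fill whatever capacity is left, so that the storage is full.
    while (items.size() < items.capacity()) {
        items.emplace_back(static_cast<int>(items.size()));
    }
    assert(items.size() == items.capacity());
    items.emplace_back(-1);  // exceeds capacity: forces a reallocation
    return g_copies == 0 && g_moves >= 1;
}
```

Each counted move constructs the element at a new address, so a reference taken before the reallocation is left dangling exactly as it would be after a copy.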

Post by Stefan Seefeld
> Once the C++ API is settled, you can reconsider the appropriate Python
> API for it.

I tested the use of heap-allocated objects and it works.
I am going to change my code to rather store heap allocated objects for all
my complex types (to keep it consistent for maintenance and simplicity's
sake).

Thank you for all your assistance, I greatly appreciate it. You saved me a
lot of time.

Regards,
Christoff