C++ is a hack

Posted in code on September 12, 2010 by maxpower3141

If you are reading this, you probably knew it already.

Basically C++ is just a set of semi-clever preprocessor hacks on top of C – I seem to remember that the first C++ compiler, Cfront, was literally a translation pass that emitted C code for a C compiler to consume!

In themselves, hacks are not evil, but they normally cause some undesired side-effects that limit usability. In this case, as C++ is basically just a hack on top of C, it works by doing a lot of odd name-mangling to keep symbol names unique for the C-style toolchain underneath – and herein lies the problem.

I recently began using thrust, and with it some template programming in C++ – and while thrust is great, C++ itself is not so much.

Firstly the template syntax is kind of awkward, but that’s not so much of an issue – the issue is the following:

Using function-objects (types that just overload the “function-call”-operator ()), I can abstract my CUDA algorithms by templating them on the different function-object types.

For example, the SAXPY call using thrust (this computes result = a*X + Y, for vectors X, Y and scalar a) is as I had written earlier:

struct saxpy_functor
{
    const float a;

    saxpy_functor(float _a) : a(_a) {}

    __host__ __device__
    float operator()(const float& x, const float& y) const {
        return a * x + y;
    }
};

void saxpy_fast(float A, thrust::device_vector<float>& X, thrust::device_vector<float>& Y)
{
    // Y <- A * X + Y
    thrust::transform(X.begin(), X.end(), Y.begin(), Y.begin(), saxpy_functor(A));
}

This is OK, but I have started to like the idea of keeping code close to where it is executed (guess I’m getting old or something), so I thought I would create a small macro that makes a transformation-functor type on the fly – then I could semi-transparently embed CUDA code within my host code! That would just rock! Something like:

void saxpy_fast(float A, thrust::device_vector<float>& X, thrust::device_vector<float>& Y)
{
    struct saxpy_functor
    {
        const float a;
        saxpy_functor(float _a) : a(_a) {}
        __host__ __device__
        float operator()(const float& x, const float& y) const {
            return a * x + y;
        }
    };
    // Y <- A * X + Y
    thrust::transform(X.begin(), X.end(), Y.begin(), Y.begin(), saxpy_functor(A));
}

and hopefully wrap the silly stuff inside some macro to get:

{
    DEFINE_XFORM_CALL(saxpy, data, input1, input2)
    {
        return data.a * x + y;
    };
    CALL_XFORM(saxpy, data, in1, in2);
}

Sweet, no? Local types are supported by C++ – except not as template arguments! And so the compiler greets me with:

“error: a template argument may not reference a local type”

Nice – what’s the freaking problem now? Can’t deduce the name of the type? Oh dear… SW sucks.

There has been a proposal to fix this in the standard itself, but of course that won’t land any time soon – I guess MS supports this as a compiler extension, but NVCC does not. I’ll request the feature and hope that NVIDIA listens – they need all the user-friendliness they can get, and this would be really friendly for me!

I’ll keep you posted if I ever hear from my request again 🙂 (not filed yet, sorry – TBD tomorrow)

Also, I really think it is high time to switch to the D Programming Language – as soon as NVCC supports it, I’m there!

Here’s a sneak-peek for a reduction:
import std.algorithm, std.range, std.stdio;

int main()
{
    int[] a1 = [0,1,2,3,4,5,6,7,8,9];
    int[] a2 = [6,7,8,9];
    int sentinel = 5;

    int mysum(int a, int b)
    {
        if (b <= sentinel) // accesses an outside variable
            return a + b;
        else
            return a;
    }

    // pass in a delegate (closure)
    auto result = reduce!(mysum)( chain(a1, a2) );
    writeln("Result: ", result);

    return 0;
}

[Image: Kristofer Columbus in Barcelona, pointing]

Linux Sucks!

Posted in code on September 12, 2010 by maxpower3141

Just trying to get a snappy title.

A short post, not a rant, about some Linux silliness I constantly run into. Recently I ranted about Eclipse.

Now a couple of observations with KDE4 – the Vista of Linux. 🙂

a) When another user is logged on and I want to let him get back to his session for some reason, clicking “switch user” on the launcher doesn’t display the existing sessions at all – how convenient is that? I have to lock the screen and then select “switch user” there to do it – end result: 2–4 sessions open per user – go KDE4!

b) When another user session is running in the background, it actively hogs computer resources – like 20% of one CPU – hello? The worst offender is plasma-desktop, so my question is: why on earth aren’t the processes sleeping?? I could forgive something like Firefox doing this, but plasma is a system service! Surely the system knows nobody’s using it!

These two are quite minor rants, but running into these kinds of silly problems all the time is depressing – I mean, it’s 2010! Shouldn’t the SW start working some day soon? 🙂

[Image: zoom to an alligator]

Eclipse: Too many files open problem – possible solution!

Posted in code on August 19, 2010 by maxpower3141

In my previous post I did a quick rant on how sucky open-source code is – well, I implied it from the context.

Just today I ran into the problem again (I now have both ulimit -n and ulimit -Hn set to 19000), so I started searching the web for a real solution. The loveliest thing about this problem is that it doesn’t let you save your changes! So what do you do? Yes – copy the changes to the clipboard, close the program, open the file(s) in a text editor and apply the recent changes manually (and pray there aren’t too many of them) – BRRRRRRR!!!! WRONG! In Linux, not even copy-paste works... Of course this most probably is not the fault of the lovely kernel – the problem most probably lies somewhere in the wonderful world of gnus, gnomes and so forth, but damn.. 🙂 I miss KDE 3.5.9.. 🙂 Or maybe I should just get a Mac – OK, I’m not that desperate yet..
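For reference, those limits can be inspected and raised per shell like this (19000 is just the value I happened to pick; raising the hard limit itself needs root):

```shell
# Soft limit on open file descriptors for this shell
ulimit -n
# Hard limit (the ceiling an ordinary user may raise the soft limit to)
ulimit -Hn
# Raise the soft limit for this session, e.g.:
#   ulimit -n 19000
```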

This Eclipse.org forum post shows the most promise – yet again you get stability by switching to code produced by people who are paid to write it. 🙂 Oh well.. 🙂 Let’s see if it works – I plan to code with Eclipse open for a week now, and we’ll see. Oh yes – the proposed solution is to switch from OpenJDK to Sun’s Java implementation – it figures… 🙂

Btw, have I ever mentioned to you that GNOME sucks? It’s unfortunate that KDE4 now does too, but at least they are trying.

I will put a picture of Big Ben I took, in a vague effort to make my blog more graphical – enjoy!

About Coulombic Friction models

Posted in code, physics on August 16, 2010 by maxpower3141

So I have lately been somewhat involved in modeling Coulombic friction in my (perhaps game-) physics simulation project, as you can see from my post on writing test code and the first O’Caml rant.

Just today I spent some time augmenting the test case for static friction, and it occurred to me that you, my loyal reader and supporter, might be interested in the math and code involved in implementing this kind of model, so here we go again:

First of all, let us set the background: we are interested in friction between two bodies connected by some kind of constraint – let this constraint be realized by the equation $\phi(x_i) = 0.$ The function $\phi(x_i)$ could be almost anything involving the coordinates $x_i, i = 1, 2, 3, \ldots$ – for example, my test case has $\phi(x, y) = \sin x + y = 0$, which just says that whatever the rest of the dynamics do, the point-body described by the coordinates $(x, y)$ should always lie on the negative sine curve.

Then basic Lagrangian mechanics tells us that there exists a quantity $\lambda$, which ensures that when one applies the force(s) $F_i = \lambda \frac{\partial \phi}{\partial x_i}$ to the body (bodies), then the constraint can be satisfied. This force is the contact or normal force of the constraint and it has the special property that it doesn’t perform any physical work, because it is always orthogonal to the movement-vector of the body.
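To spell the no-work property out: for any displacement $dx_i$ that keeps the constraint satisfied we have $d\phi = 0$, so the work done by the constraint force is
$W = \sum_i F_i \, dx_i = \lambda \sum_i \frac{\partial \phi}{\partial x_i} \, dx_i = \lambda \, d\phi = 0.$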

Then the typical, and quite accurate, model for the friction force caused by the constraint $\phi$ is that the movement of the body (or the relative movement of the bodies) is slowed down by a force linearly proportional in magnitude to the normal force of the constraint $F_i = \lambda \frac{\partial \phi}{\partial x_i}$. The magnitude of the constraint force is then $\| F \| = | \lambda | \sqrt{\sum_i \left(\frac{\partial \phi}{\partial x_i}\right)^2}$, and the proportionality constant of the friction force is slightly higher when the body is not (bodies are not) moving.

Until now, I have mostly been concentrating on dynamical friction, where the friction force is directly opposite to the velocity vector $v_i$ of the bodies:
$F^{fric}_i = - \xi \frac{v_i}{\|v\|} \| F \| = - \xi \frac{v_i}{\|v\|} | \lambda | \sqrt{\sum_i \left(\frac{\partial \phi}{\partial x_i}\right)^2},$ where $\xi$ is the coefficient of friction.

Now we get to the actual predicament: how to do this in practice? The force is proportional to another force, but alas, it is proportional to its absolute value! For a linear dependence – meaning something like $F^{fric}_i = \lambda \ldots$ – we would get a perfect model; and we do have a linear dependence on the velocity – almost: the force depends linearly on the direction of the velocity.

In order to study this, I devised three models for solving it numerically:

1) Keep the linear dependence on the velocity and take all other values from the previous known state – this effectively results in:
$F^{fric}_i = - \xi \frac{v_i}{\|v_0\|} | \lambda_0 | \sqrt{\sum_i \left(\frac{\partial \phi}{\partial x_i^0}\right)^2}$, where $v_0$ is the previous velocity, $\lambda_0$ is the previous Lagrange multiplier and $x_i^0$ are the coordinates from the previous state. The nice thing about this model is that it is really simple and computationally inexpensive to implement; it has the weird property that the velocities used come from two different points in time, but according to my tests the model is quite OK (except for really small velocities).

2) Make the force linearly dependent on the Lagrange multiplier $\lambda$, $\rightarrow$
$F^{fric}_i = - \xi \frac{v_i^0}{\|v_0\|} \lambda \,\mathrm{sign}(\lambda_0)\, \sqrt{\sum_i \left(\frac{\partial \phi}{\partial x_i^0}\right)^2}$
Here of course $|\lambda| = \lambda \,\mathrm{sign}(\lambda_0)$ always holds, except when the constraint force changes direction between two time-steps – but in those situations the bodies can normally drift apart anyway (in case we have a normal contact constraint $\phi(x_i) \ge 0$). The model works nicely since it is completely up-to-date with the constraint force, but alas, it is somewhat more complex to compute.

The third option, which I haven’t tried out yet, is to throw away all quantities of the current step and move the entire force to the right-hand side of the dynamics equation, making it effectively a constant force over each time-step. This just might be completely sufficient for my purposes, but all of this still requires some work and testing, as the dynamics model I’m using is somewhat novel. Hopefully I can share more about that later on. For now, I bid you good night.

Btw, I’m planning on modeling the rest-friction case as just a conditional constraint in the tangent space of the original constraint. More on that to come. 🙂

Writing test-code

Posted in code, physics, visualization on August 11, 2010 by maxpower3141

One of the tasks that, to my knowledge, is too often neglected (well, you know I do this!) is writing test code for new functionality.

Recently in my post on an O’Caml rant, I was complaining how O’Caml made it impossible for me to do the “simplest of all computer operations” – sleep. I simply wanted to make my CPU(s) idle for a while after drawing some visualization graphics, in order to produce a fluid animation, and it seemed hopeless. I just went on and implemented (the dreaded) busy-loop. 🙂

This test case provided visual confirmation that my dynamic-friction model (well, the first option anyway – I still have two others to check) was OK, but it was lacking something important: error analysis – the final frontier for programmers, where even the uber-nerds won’t go without some serious recommending.

There are a few easy things we can check out in these sort of cases:

1) Distance from constraint manifolds (for both position and velocity variables)
2) Constraint force (I actually projected the total force vector onto the constraint-manifold normal; I subtracted gravity before the projection in order to directly measure the force felt by the particle due to the constraint)
3) Friction force (should depend linearly on the constraint force)

Computing these values was already a task in itself, but visualizing them over time turned out to be an even larger one – here’s the GIF animation:

[Animation: particle sliding on the sine curve]

Of course the classical “O’Caml list-reverse” happened, so the plot looks like a polygraph, lie-detector type of curve. (For me this happens all the time with O’Caml lists – of course any naive iteration through the list reverses it! :))

But beyond that the results seem good, except for one tiny thing: the friction force seems to diverge when the particle’s velocity goes through zero – how very odd, no? I already spent some time verifying the math – it all matches, so what’s wrong? I know my friction model in this case is not perfect, but the discrepancy should not be this noticeable.

Well, the (yet again too obvious) answer was that this model had only implemented dynamical friction!

Rest friction not yet implemented!!

D’Oh! 🙂 The classic – without rest friction the plot is supposed to look something like that!!!

So, well, moving on to implement rest friction and then to try the two other friction models! I’ll be back with more cool things! … Like more code… And physics.. and math.. and.. Uh.. Yeah – I know – super cool! Well, I am having a glass of wine while coding now, so I guess I’m not completely hopeless yet? OK? And if I play a tad on my guitar?

🙂

Make your CUDA-development fly

Posted in code on August 5, 2010 by maxpower3141

The other day I came across Thrust, and I immediately became a fan – without even trying it out first!

The idea of this library is to provide CUDA developers with the most important set of tools for handling those simple-looking but tricky parallel operations. The point of CUDA is that the modern GPU is actually more capable than the modern CPU at computations involving large data sets – in practically all cases that matter – and it is even quite easy to use, until you have to run some operator over the data set with global dependencies between the data elements.

For example, finding the maximum of a large data set is surprisingly difficult to do efficiently in CUDA – Mark Harris of NVIDIA has written a nice paper explaining how to get good performance on the same algorithm with the maximum operator replaced by the sum operator (the algorithm itself is exactly the same, and therefore lends itself to being abstracted over the binary operator using templates).

Even more difficult is sorting a data set efficiently on the GPU – here is a paper about it by Nadathur Satish, Michael Garland and Mark Harris. (Btw, in this paper you can see that the GPU is not super great at this kind of task: for some fairly short lists (less than a million entries) the CPU can be a little faster than the GPU – but keep in mind that sorting is normally just a small piece of a bigger problem, and that this benchmark was done on quite old hardware (an Intel 4-core CPU against a GTX 280).)

The point I’m trying to make is that implementing these auxiliary functions for the “real work” can sometimes be notoriously difficult, and doing it wrong can bottleneck your computation due to silliness – so what to do? Code them from scratch every time, then benchmark, test and improve? That can take days, if not weeks, even for good programmers!

And TADAA – in comes the Thrust library to the rescue! The idea of this library is to give you versatile versions of all these algorithms, in the same way the STL does, with good performance. And as far as I can tell, it just works!

Here is a blog post about a benchmark which shows thrust::sort() beating std::sort() with ten times better perf! And this even includes copying the data from main system memory to the GPU and back, which is just silliness: when you do GPU compute, you keep the data on the GPU!

The great thing about Thrust is that the algorithms are completely general: you can use any data types, and the operators are of course customizable via standard template techniques.

This example is from the Thrust quickstart guide, and computes y = a*x + y for vectors – the SAXPY operation from BLAS:

struct saxpy_functor
{
    const float a;

    saxpy_functor(float _a) : a(_a) {}

    __host__ __device__
    float operator()(const float& x, const float& y) const {
        return a * x + y;
    }
};

void saxpy_fast(float A, thrust::device_vector<float>& X, thrust::device_vector<float>& Y)
{
    // Y <- A * X + Y
    thrust::transform(X.begin(), X.end(), Y.begin(), Y.begin(), saxpy_functor(A));
}


All I need to do now, is to put it to good use!

Nice particle-based physics engine

Posted in code, physics on August 4, 2010 by maxpower3141

Geeks3d has an article on the Lagoa physics engine, which seems quite impressive.

Direct quote from site:

Lagoa Multiphysics is a new physics engine created by Thiago Costa. This physics engine is focused on particles.

I have been thinking about this kind of idea – approximating continuous matter as a set of constrained point-bodies (particles) – and here Mr. Costa brilliantly shows what impressive things you can create with such a system. Hopefully later on he will add more rigid bodies as well, and destruction thingies.

Also, this kind of solver should be just great for GPU-accelerated physics!