Friday, October 8, 2010

3D Picking in Android

Introduction

In this short tutorial I’m presenting something that made me lose weeks of work: how to implement picking with a perspective camera on the Android platform using OpenGL ES 1.0.

The process of picking basically involves the user touching a point on their device screen; we take that point, apply the inverse of the transforms OpenGL applies to its 3D scene, and obtain the point in the world coordinate system (WCS) where the player actually meant to click. For the sake of simplicity, we will work on a simple 2D map, instead of having to cast a ray and intersect it with multiple objects.

Usually, in OpenGL we would use the function gluUnProject to un-project the point and so get the equivalent WCS point, but that function is plagued by errors on the Android platform, and it’s very difficult to get hold of the gl projection and modelview matrices.
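For comparison, here is a minimal sketch of what that standard call would look like (it assumes the matrix helpers defined later in this post and the camera's screen size; this is the path that proved unreliable):

Code Snippet
// Sketch of the standard GLU un-projection, shown only for reference.
float[] objPos = new float[4];  // output position in the WCS
int[] viewport = { 0, 0, (int) screenW, (int) screenH };
GLU.gluUnProject(
    touch.X(), screenH - touch.Y(), 0.0f,  // window coords, winZ in [0,1]
    getCurrentModelView(gl), 0,
    getCurrentProjection(gl), 0,
    viewport, 0,
    objPos, 0);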

Algorithm

So here is my solution. It might not be perfect, but it actually works.

Code Snippet
/**
 * Calculates the transform from screen coordinate
 * system to world coordinate system coordinates
 * for a specific point, given a camera position.
 *
 * @param touch Vec2 point of the screen touch, the
 *        actual position on the physical screen (e.g. 160, 240)
 * @param cam camera object with x,y,z of the
 *        camera and screenWidth and screenHeight of
 *        the device.
 * @return position in WCS.
 */
public Vec2 GetWorldCoords(Vec2 touch, Camera cam)
{
    // Initialize auxiliary variables.
    Vec2 worldPos = new Vec2();

    // SCREEN height & width (e.g. 320 x 480)
    float screenW = cam.GetScreenWidth();
    float screenH = cam.GetScreenHeight();

    // Auxiliary matrix and vectors
    // to deal with ogl.
    float[] invertedMatrix, transformMatrix,
        normalizedInPoint, outPoint;
    invertedMatrix = new float[16];
    transformMatrix = new float[16];
    normalizedInPoint = new float[4];
    outPoint = new float[4];

    // Invert y coordinate, as android uses
    // top-left, and ogl bottom-left.
    int oglTouchY = (int) (screenH - touch.Y());

    /* Transform the screen point to clip
       space in ogl (-1,1) */
    normalizedInPoint[0] =
        (float) (touch.X() * 2.0f / screenW - 1.0);
    normalizedInPoint[1] =
        (float) (oglTouchY * 2.0f / screenH - 1.0);
    normalizedInPoint[2] = -1.0f;
    normalizedInPoint[3] = 1.0f;

    /* Obtain the transform matrix and
       then the inverse. */
    Print("Proj", getCurrentProjection(gl));
    Print("Model", getCurrentModelView(gl));
    Matrix.multiplyMM(
        transformMatrix, 0,
        getCurrentProjection(gl), 0,
        getCurrentModelView(gl), 0);
    Matrix.invertM(invertedMatrix, 0,
        transformMatrix, 0);

    /* Apply the inverse to the point
       in clip space */
    Matrix.multiplyMV(
        outPoint, 0,
        invertedMatrix, 0,
        normalizedInPoint, 0);

    if (outPoint[3] == 0.0)
    {
        // Avoid a division by zero.
        Log.e("World coords", "ERROR!");
        return worldPos;
    }

    // Divide by the w (4th) component to find
    // out the real position.
    worldPos.Set(
        outPoint[0] / outPoint[3],
        outPoint[1] / outPoint[3]);

    return worldPos;
}

In my case, I’ve got a render thread, a logic thread and an application thread. This function is a service provided by the render thread, because it needs the gl projection and modelview matrices.

What happens is that the logic thread sends a touch (x,y) position and the current camera (x, y, z, screenH, screenW) to the GetWorldCoords function, and expects back the world position of that point, taking into account the camera position (x,y,z) and the view frustum (represented by the projection and modelview matrices).
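A minimal sketch of that hand-off (only Vec2, its Set method and GetWorldCoords come from this post; the view, renderer and camera fields are hypothetical placeholders):

Code Snippet
// Hypothetical touch handler on the logic/UI side.
@Override
public boolean onTouchEvent(MotionEvent event)
{
    if (event.getAction() == MotionEvent.ACTION_DOWN)
    {
        Vec2 touch = new Vec2();
        touch.Set(event.getX(), event.getY());
        // The renderer owns the gl context, so it does the un-projection.
        Vec2 worldPos = mRenderer.GetWorldCoords(touch, mCamera);
        // ... hand worldPos to the game logic (e.g. select a map tile).
    }
    return true;
}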

The first lines of GetWorldCoords get the data ready: they create the auxiliary matrices and access the camera data.

One important point is this line:

int oglTouchY = (int) (screenH - touch.Y());

This inversion is needed because Android screen coordinates assume a top-left origin, while OpenGL expects a bottom-left one, so we flip it. With that done, we can start the picking algorithm:

  1. Transform the point from screen coordinates (e.g. 120, 330) to clip space (for a 320 x 480 Android screen this would be −0.25, −0.375; see the worked arithmetic after this list)
  2. Get the transformation matrix (projection * modelView), and invert it.
  3. Multiply the clip-space point by the inverse transformation.
  4. Divide the x, y, z coordinates (positions 0, 1, 2) by the w (position 3)
  5. You’ve got the world coordinates.
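As a quick sanity check of step 1, here is the arithmetic for that example written out (not part of the game code):

Code Snippet
// Worked example for step 1: touch (120, 330) on a 320 x 480 screen.
float screenW = 320.0f, screenH = 480.0f;
float oglTouchY = screenH - 330.0f;               // 480 - 330 = 150
float clipX = 120.0f * 2.0f / screenW - 1.0f;     // 240 / 320 - 1 = -0.25
float clipY = oglTouchY * 2.0f / screenH - 1.0f;  // 300 / 480 - 1 = -0.375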

Notes:

The z doesn’t appear because I have no need for it, but you can get it easily (outPoint[2] / outPoint[3]).
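If you do need it, the final step could return all three components; a sketch, assuming a Vec3 class analogous to Vec2 (there is none in the original code):

Code Snippet
// Hypothetical 3D version of the final step, assuming a Vec3 class.
Vec3 worldPos3 = new Vec3();
worldPos3.Set(
    outPoint[0] / outPoint[3],
    outPoint[1] / outPoint[3],
    outPoint[2] / outPoint[3]);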

The situation I’m working on is the following: the red and blue lines are the frustum limits, the green one is the world map at an arbitrary point in space, and the camera is at the tip of the view frustum.

[Figure: side view of the frustum (red and blue) with the world map plane (green) inside it]

There is one very special complication when doing this picking algorithm on the Android platform, and that is accessing the projection and modelview matrices OpenGL uses. We manage it with the following code.

Code Snippet
/**
 * Records the current modelView matrix
 * state. Has the side effect of
 * setting the current matrix mode
 * to GL_MODELVIEW.
 * @param gl context
 */
public float[] getCurrentModelView(GL10 gl)
{
    float[] mModelView = new float[16];
    getMatrix(gl, GL10.GL_MODELVIEW, mModelView);
    return mModelView;
}

/**
 * Records the current projection matrix
 * state. Has the side effect of
 * setting the current matrix mode
 * to GL_PROJECTION.
 * @param gl context
 */
public float[] getCurrentProjection(GL10 gl)
{
    float[] mProjection = new float[16];
    getMatrix(gl, GL10.GL_PROJECTION, mProjection);
    return mProjection;
}

/**
 * Fetches a specific matrix from opengl.
 * @param gl context
 * @param mode of the matrix
 * @param mat initialized float[16] array
 *        to fill with the matrix
 */
private void getMatrix(GL10 gl, int mode, float[] mat)
{
    MatrixTrackingGL gl2 = (MatrixTrackingGL) gl;
    gl2.glMatrixMode(mode);
    gl2.getMatrix(mat, 0);
}

 

The gl parameter passed to the getCurrent*(GL10 gl) functions is stored as a member variable in the class.
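In the renderer that can look roughly like this (a sketch; grabbing the context in onSurfaceCreated is one option):

Code Snippet
// Sketch: keep the wrapped gl context around for the picking service.
private GL10 gl;

@Override
public void onSurfaceCreated(GL10 gl, EGLConfig config)
{
    // Thanks to the wrapper below, this is really a MatrixTrackingGL.
    this.gl = gl;
}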

The MatrixTrackingGL class is part of the Android samples, and can be found here. Some other classes must be included for it to work (mainly MatrixStack). The MatrixTrackingGL class acts as a wrapper for the gl context, while also tracking the data we need. For it to work, our custom GLSurfaceView class must make the setGLWrapper call, something like this.

Code Snippet
public DagGLSurfaceView(Context context)
{
    super(context);

    setFocusable(true);

    // Wrapper set so the renderer can
    // access the gl transformation matrices.
    setGLWrapper(
        new GLSurfaceView.GLWrapper()
        {
            @Override
            public GL wrap(GL gl)
            {
                return new MatrixTrackingGL(gl);
            }
        });

    mRenderer = new DagRenderer();
    setRenderer(mRenderer);
}

(Where DagRenderer is my GLSurfaceView.Renderer, and DagGLSurfaceView is my GLSurfaceView.)

24 comments:

  1. Hi Jaime,
    I'm stuck with the same problem for over a week now. What you write here is exactly what I'm trying to achieve. It would be a blast if you could post some compilable code.
    cheers
    Judith

  2. Well Judith, the source code for the game I'm making, which uses this, is here.
    You want to look at the DagRenderer.java file and the Camera.java file.

    Hope it can be of use!

    1. Hi Jaime! I'm trying to implement your algorithm but it doesn't work in my project.
      I want to pick a cube or a pyramid and rotate it. After that, I need to implement the same algorithm for my thesis project, and my time is running out X.X. I'm wondering if you could help me?

  3. Hello Jaime!
    thank you for the article!!
    Your blog is the only one I found so far with a good explanation of the theory and a practical (Android) example; most others just repeat the same theory and nothing more.
    Thank you again!!!
    Alex
    p.s.
    It would be nice if you continue your blog with practical explanations of 3D picking algorithms: color picking, name picking, ray picking

  4. Thanks for the comment Alex. I try to share any algorithm or task I have special problems with, so other people don't have to waste their time like I did.
    Cheers!

  5. Hi! It really helps a lot!
    But may I know how you utilize the current camera position (x,y,z) to get the result? When I went through your code in GetWorldCoords, I couldn't see how the cam (x,y,z) was involved there.

  6. Well, you don't get to use the x,y,z of the cam directly, because you've already supplied those to OGL in the matrix transformations that move the camera each frame.
    We access that through getCurrentModelView(gl) (the frustum comes from getCurrentProjection(gl)), so we only use the camera directly for the size of the viewport in this code.

    Hope it helps :3
    If you need more help, feel free to email me directly.

  7. Hi, what if I have set the frustum like this:

    float size = .01f * (float) Math.tan(Math.toRadians(-45.0) / 2);
    float ratio = (float) w / h;
    gl.glFrustumf(-size, size, -size / ratio, size / ratio, 0.01f, 10.0f);

    ? It gives wrong values..

  8. Aleksandar, take a look at gluPerspective for that, much easier to use.
    http://hi-android.info/src/android/opengl/GLU.java.html
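    Something like this, just a sketch reusing your near/far planes:

    gl.glMatrixMode(GL10.GL_PROJECTION);
    gl.glLoadIdentity();
    GLU.gluPerspective(gl, 45.0f, (float) w / h, 0.01f, 10.0f);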

  9. Is it the same? I use gluPerspective now, and then define, say, a rect with these vertices:

    float[] vertices = new float[] {
     1, -1, 0,
     1,  1, 0,
    -1,  1, 0,
    -1, -1, 0
    };

    and it draws perfectly correctly. But then when I touch the points where the rectangle with the texture is drawn, keeping in mind that the texture is 256x256 (a power of two), getWorldCoords returns values which are close to, but not exactly at, the defined vertices. And the range of x goes from -2.2 to 2.2 and y from -1.5 to 1.5.

  10. Sorry, one more question... Currently the scene goes from x -2.0 to x 2.0 and from y -1.5 to y 1.5... will this be the same on all devices? How do you handle different screen sizes?

  11. Thanks~I get it ~ really helps a lot:)

  12. One more question:
    if we want to get the 3D coordinates, then there should be 3 parameters contained in the screen coordinates, right? ScreenX, ScreenY, and 0 or 1 for the two ending points of the ray.
    How will these be involved?

  13. Great code, but I have some questions:

    1. Why normalizedInPoint[2] = - 1.0f?
    2. Where is "an arbitrary point in space" of world map?

    I am trying to understand these transformation but not everything is clear.

  14. +1 for snakeye comment. It is curious it works with the values:
    normalizedInPoint[2] = - 1.0f;
    normalizedInPoint[3] = 1.0f;

    I tried to use the same in my Renderer but can't get results. Guess it only works in your particular OpenGL setting...

  15. Hi,

    I can get this code running: I put in the touch-screen coordinates and it gives a normalized value between -1 and 1, and I can convert it back to screen coordinates by denormalizing.

    My question is: if I want to check whether I have clicked inside a marker/object area or not, how should I proceed?

    I would really appreciate if you can reply.

  16. Hello!

    This is nice, but how can I build a model selector tool with this?

    I have several models, but on Android glPicking/colorPicking is not working.

    How can I change this code to make selections?

    1. This tutorial was useful, but there was a lot I had to figure out.

      I have implemented a full tutorial of ray picking on android here: http://android-raypick.blogspot.ca/

  17. /* Transform the screen point to clip
    space in ogl (-1,1) */
    normalizedInPoint[0] =
    (float) ((touch.X()) * 2.0f / screenW - 1.0);
    normalizedInPoint[1] =
    (float) ((oglTouchY) * 2.0f / screenH - 1.0);
    normalizedInPoint[2] = - 1.0f;
    normalizedInPoint[3] = 1.0f;

    Can you please explain these lines?

  18. http://nehe.gamedev.net/article/using_gluunproject/16013/
    In the above link they are not using "ray", but using
    glReadPixels( x, int(winY), 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &winZ );
    Will it not work with Android? I am trying to use it.
    Please help.

  19. Hi... Your solution seems to be great, but I am working on Android, so I want to do the same there. Could you provide the same code for Android, please?

  20. Hi there,
    I'm still new to this AR stuff. What software are you using? And if I have my own 3D model, how can I import it?

  21. Your link for the MatrixTrackingGL library is gone.
    This is valid:
    https://github.com/mitchellhislop/apps-for-android/blob/master/Samples/OpenGLES/SpriteText/src/com/google/android/opengles/spritetext/MatrixTrackingGL.java
