We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g., recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. At test time, we build our representation from a single RGB input image observing the scene from only one viewpoint, leveraging recent advances in Neural Radiance Fields (NeRF) that learn category-level priors on large multiview datasets and then fine-tune on novel objects from one or a few views. We extend the NeRF model with additional grasp outputs and explore ways to leverage this representation for robotics. We find that the recovered representation allows rendering from novel views, including of occluded object parts, and also predicting successful stable grasps. Grasp poses can be directly decoded from our latent representation with an implicit grasp decoder. We conducted experiments in both simulation and the real world and demonstrated the capability for robust robotic grasping using such a compact representation.