Patriot, my good friend (seriously)... it is a shame that it is so time-consuming and tedious to sort through what I know people want to be a simple answer. I greatly respect you for being one of the few willing to spend the time to see that it is not as straightforward as most people think.
On the one hand, you can get "seat of the pants" relative comparisons if you use something like a ceiling bounce test... but that assumes your light starts from the same spot, is aimed at the same spot as the light you are comparing it with, and has a relatively similar beam pattern (hotspot, corona, spill, etc.). It also assumes the light meter takes its reading from the same position, with no difference in ambient lighting, etc.
The important thing to understand about lumens is precisely how the unit is defined: a lumen measures total luminous flux, summed over all directions. Once you realize that, you begin to see that it doesn't make sense to describe an HID bulb in an adjustable reflector (of variable reflecting quality), like the one the MaxaBeam uses, in lumens, because you are dealing with a directional light output.
That is also why Osram listed the output of the new Ministar bulb (I linked the thread above) in candela plus beam angle, which together can give a ballpark lumen conversion.
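For the candela + beam angle route, one rough conversion treats the beam as a cone of uniform intensity: a full beam angle θ covers a solid angle Ω = 2π(1 − cos(θ/2)) steradians, and lumens ≈ candela × Ω. The sketch below is a minimal illustration under that assumption; the function names and the 10,000 cd / 10° values are hypothetical, not Osram's published figures. Real beams fall off gradually and have spill outside the quoted angle, so treat the result as a ballpark only.

```python
import math


def cone_solid_angle_sr(beam_angle_deg: float) -> float:
    """Solid angle (steradians) of a cone with the given full apex angle."""
    half_angle = math.radians(beam_angle_deg) / 2.0
    return 2.0 * math.pi * (1.0 - math.cos(half_angle))


def ballpark_lumens(peak_candela: float, beam_angle_deg: float) -> float:
    """Rough flux estimate, ASSUMING the peak intensity is constant across the
    whole beam angle and zero outside it (ignores falloff and spill)."""
    return peak_candela * cone_solid_angle_sr(beam_angle_deg)


# Hypothetical example values, for illustration only.
estimate = ballpark_lumens(peak_candela=10_000, beam_angle_deg=10)
print(f"~{estimate:.0f} lm")
```

With these made-up numbers the estimate comes out around 239 lm; a wider beam at the same candela rating would give proportionally more.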
The integrating sphere appears to be the correct/proper device for measuring lumens from lux readings... (I think). A light box is a rough replication of the integrating sphere, and the ceiling bounce or room illumination test is an even rougher version of the light box. Since a light meter can measure the overall amount of light bouncing around within a space, can't it show the relative differences between different lights? If it can reveal the relative differences, can't those differences be assigned values, and if so, what units should those values be measured in?
The integrating sphere is the "gold standard", but a sphere large enough to hold an entire light + reflector without the light itself interfering with the necessary even dispersion is nearly impossible, and certainly impractical... so manufacturers give ballpark claims of questionable accuracy. You could put just the HID bulb itself in the sphere, but you would still need to estimate the secondary effect of the reflector.
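As for why the sphere is the gold standard: in an idealized sphere (perfectly diffuse walls of reflectance ρ, no ports, detector baffled from the direct beam), each bounce spreads the light evenly over the whole wall area A = 4πR², so the summed reflections give a wall illuminance E = Φρ / (A(1 − ρ)). Beam pattern does not appear anywhere in that expression, which is exactly the property a box or ceiling lacks. The sketch below is a simplified model under those stated assumptions; the function names and example values are hypothetical, and it ignores the light that a real fixture inside the sphere absorbs.

```python
import math


def sphere_wall_lux(flux_lm: float, radius_m: float, reflectance: float) -> float:
    """Wall illuminance (lux) in an IDEAL integrating sphere from reflected light only.

    Assumes perfectly diffuse walls, no ports, no self-absorption by the lamp,
    and a detector baffled from the direct beam.
    """
    if not 0.0 < reflectance < 1.0:
        raise ValueError("reflectance must be between 0 and 1")
    wall_area = 4.0 * math.pi * radius_m ** 2
    # Bounces contribute rho*flux + rho^2*flux + ... = flux * rho / (1 - rho),
    # spread evenly over the whole wall.
    return flux_lm * reflectance / (wall_area * (1.0 - reflectance))


def flux_from_wall_lux(lux: float, radius_m: float, reflectance: float) -> float:
    """Invert the ideal-sphere relation to recover total lumens from a lux reading."""
    if not 0.0 < reflectance < 1.0:
        raise ValueError("reflectance must be between 0 and 1")
    wall_area = 4.0 * math.pi * radius_m ** 2
    return lux * wall_area * (1.0 - reflectance) / reflectance


# Hypothetical 1 m radius sphere with a 95% reflective coating and a 1000 lm source.
lux = sphere_wall_lux(1000.0, radius_m=1.0, reflectance=0.95)
print(f"wall reading ~{lux:.0f} lux -> {flux_from_wall_lux(lux, 1.0, 0.95):.0f} lm")
```

In this idealized model, doubling the lumens doubles the reading no matter how the beam is shaped; in a box or room the reflections are not uniform, so that proportionality only holds roughly.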
The light box and ceiling bounce methods developed here at CPF were ways for people to approach the I.S. concept without spending the huge amount those devices cost. However, light does not reflect and disperse completely and randomly in a box, off a ceiling, or in any non-spherical setup. Of course they can be used for relative ballpark comparisons, but people toss out results in lumens as if they really were absolute values.
Ideally, if you start with an LED or incan bulb that has been rigorously measured at a certain conservative voltage and bulb age (with many bulbs I tested, output dropped significantly in proportion to the degree of voltage overdrive and the run time), you could use that as a reference point in comparisons.
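The reference-bulb idea amounts to a simple ratio: if a reference light of known output gives one meter reading and the test light gives another under an identical setup, the test light's output is estimated by scaling. A minimal sketch with hypothetical names and numbers follows. It assumes the box or ceiling reading is proportional to total lumens and responds the same way to both beam patterns, which is exactly the assumption questioned in this thread, so the result is a relative ballpark rather than an absolute value.

```python
def estimate_lumens(test_lux: float, ref_lux: float, ref_lumens: float,
                    ambient_lux: float = 0.0) -> float:
    """Scale a reference light's known output by the ratio of meter readings.

    ASSUMES identical setup (light position, aim, meter position) for both
    readings and that the reading is proportional to total lumens.
    Ambient light is subtracted from both readings first.
    """
    ref_net = ref_lux - ambient_lux
    test_net = test_lux - ambient_lux
    if ref_net <= 0:
        raise ValueError("reference reading must exceed ambient")
    return ref_lumens * test_net / ref_net


# Hypothetical: a 100 lm reference reads 40 lux, the test light reads 90 lux, dark room.
print(estimate_lumens(test_lux=90, ref_lux=40, ref_lumens=100))
```

Any drift in the reference itself (from overdriving it or running it for a long time) carries straight through to every estimate made against it.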
I don't know the actual measurements of the various HID bulbs you referenced, or how they were made (maybe XeRay knows), but if they were expressed as raw HID bulb lumens (I.S. verified), it becomes a whole other ballgame once you pair the bulb with a specific regulated ballast voltage/current output and reflector/lens setup.
As you know, pointing a spotlight/flashlight at a wall and taking beam readings with a light meter at any distance is extremely problematic: the reading depends on whether the meter sits in the hotspot, corona, or spill... and it still doesn't account for all of the dispersed output.
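To illustrate why a single spot reading can't stand in for total output, here is a small model of two hypothetical lights with the same lumens but different beam angles, each treated as a uniform cone (an assumption; real beams have a hotspot, corona, and spill). Intensity is the lumens spread over the cone's solid angle, and the on-axis reading at distance d follows the inverse-square law, E = I / d². The function names and numbers are illustrative only.

```python
import math


def cone_solid_angle_sr(beam_angle_deg: float) -> float:
    """Solid angle (steradians) of a cone with the given full apex angle."""
    half_angle = math.radians(beam_angle_deg) / 2.0
    return 2.0 * math.pi * (1.0 - math.cos(half_angle))


def on_axis_lux(lumens: float, beam_angle_deg: float, distance_m: float) -> float:
    """Illuminance at the beam center, ASSUMING a uniform-intensity cone.

    candela = lumens / solid angle; lux = candela / distance^2 (inverse-square law).
    """
    candela = lumens / cone_solid_angle_sr(beam_angle_deg)
    return candela / distance_m ** 2


# Two hypothetical 200 lm lights, meter 5 m away on the beam axis.
for beam_deg in (10, 40):
    print(f"{beam_deg:>2} deg beam: {on_axis_lux(200, beam_deg, 5.0):6.1f} lux")
```

Same lumens, yet the center readings differ by a factor of roughly 16, so the meter is reporting how concentrated the beam is rather than how much light the source emits.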
I think the whole point of the integrating sphere and other indirect measuring methods is to remove the anomalous effects of reflectors. Other than different loss rates due to differences in reflector coatings, isn't the indirect method of measurement fairly reliable?
I would be very surprised to find that an entire spotlight has ever been inserted into a very large ($$$$$$) I.S. for an accurate total package measurement. There are way too many variables to call any other "indirect method" reliable. Even with a ceiling bounce, light meter placement and the uneven concentration of the bulb's light coming out of a reflector make the readings unreliable.
Shine any of your lights in a ceiling bounce setup and note the variations in hotspot, corona, spill, etc. Where can you place the light meter so that it gives an equally reliable reading from light to light, when they all have different beam patterns?