DX10
Between a driver and a hard place
Posted April 20th, 2008 by Davide
Friday I got to test a new model that I'll have to use for my current project.
The unoptimized model had over 1M polys (it's a small one 8). I went on to display it with the DX10-based engine and it crashed the (NVidia) driver.
The model is actually composed of many smaller models, so it's not like there is a huge vertex buffer. It also had no textures and I was using a simple common shader.
The only issue I can think of is the range of coordinates, which is relatively very large. It's a laughable supposition, but everything points to that. I'll have to debug to find a workaround.. but it's not easy considering that those driver crashes force me to reboot almost every time ! (Vista stays up, but I can't do 3D anymore).
This is a really bad setback. For the test I eventually displayed the model with my ultra-basic software renderer... no crashes there ;) ..and if there were, I could have fixed them myself.
Another thing that bothers me about DX10 and drivers in general, is how one has to guess performance, because the internals are obscure.
For example, using an NVidia 8800, I noticed that performance is a lot worse when using buffers flagged as "dynamic".
This whole "static" vs "dynamic" thing is apparently part of DX10 and Vista's driver model. Somebody, somewhere, probably decides to put the buffer in system memory (as opposed to GPU memory) under the assumption that the buffer needs to be touched frequently. Only, I may want to change it rarely, and I was sort of expecting the buffer to be allocated directly in GPU memory and only be mirrored in system memory if I ever tried to read from it (which I wouldn't dare to).
So, I have to be really careful and only use the "dynamic" flag for things that change frequently.. and possibly forget about building a flexible system that uploads textures and geometry on-demand.. which is otherwise theoretically very possible with no (not much ?) performance degradation.
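A minimal sketch of that rule of thumb, assuming the application can estimate how often it rewrites a buffer. The enum only mirrors the names of D3D10_USAGE_IMMUTABLE, D3D10_USAGE_DEFAULT and D3D10_USAGE_DYNAMIC; BufferUsage, chooseUsage and the thresholds are made up for illustration, and nothing here claims to know what a driver really does with each flag.

```cpp
#include <cassert>

// Hypothetical mirror of the D3D10_USAGE values that matter here.
enum class BufferUsage { Immutable, Default, Dynamic };

// Picks a usage from how often the CPU expects to rewrite the buffer.
// updatesPerFrame is an assumed, application-side estimate:
//   0      -> never rewritten after creation
//   < 1.0  -> rewritten occasionally (e.g. geometry streamed in on demand)
//   >= 1.0 -> rewritten every frame or more
inline BufferUsage chooseUsage(double updatesPerFrame)
{
    assert(updatesPerFrame >= 0.0);
    if (updatesPerFrame == 0.0) return BufferUsage::Immutable;
    if (updatesPerFrame < 1.0)  return BufferUsage::Default;
    return BufferUsage::Dynamic;
}
```

With something like this, on-demand uploads would land in the Default bucket and be refreshed with UpdateSubresource(), leaving Map() with D3D10_MAP_WRITE_DISCARD for the few buffers that really change every frame.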
As it stands, it almost seems that dynamic buffers are being uploaded per-frame, regardless of the fact that they aren't being modified per-frame.
..this is all speculation of course.. but that's what I really don't like about this: having to spend time trying to guess what those drivers do behind my back.. and hope that different drivers on different cards will behave similarly (crashes aside ;).
For this reason I hope that the time will come when game companies can write complete graphics pipelines again. Either in-house or licensing code, but staying away from closed-source drivers, so that one won't have to debug and profile in the dark.
Some are worried that they couldn't possibly do much better than what card manufacturers already do with those drivers.. I think that there is plenty to improve by just getting rid of those fat drivers that have been plaguing PCs ever since 3D cards came out.
ole'
More on software rendering and Direct3D 10
Posted March 21st, 2008 by Davide
Writing a software renderer is quickly associated with revolutionizing things, but on the more conservative side, one advantage would really be that one has the freedom to optimize things as needed rather than having to try and guess what a driver is doing behind the scenes (maybe write your own driver ?!).
Graphics drivers do a great deal of work, they can make a big difference, they can be quite smart at guessing resource usage and what to prioritize on which basis, but they can't be smarter than a whole application.
APIs such as Direct3D and OpenGL don't have a concept of object (ok, OpenGL supports "lists" at least). So, they miss potentially useful hints. For example, if you know your object bounding box, you can tell right away whether or not the object requires clipping and you can tell in advance which textures you need, and what's the maximum level of detail that is needed for textures (if the bounding box is 512x512, you can't possibly need a 1024x1024 texture 8).
Based on that knowledge, one could completely avoid having to perform per-polygon clipping tests and could also avoid loading full size textures in video RAM.. because a smaller mip-level would be sufficient.
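Both ideas can be sketched in a few lines, assuming the object's bounding box has already been projected to a screen-space rectangle and that the texture is stretched once across the object (no tiling). ScreenRect, fullyInside and finestUsefulMip are hypothetical names, not part of Direct3D or OpenGL.

```cpp
#include <algorithm>
#include <cassert>

// Screen-space bounding rectangle of an object, in pixels (hypothetical type).
struct ScreenRect { int minX, minY, maxX, maxY; };

// True if the rectangle lies fully inside the viewport, so none of the
// object's polygons can need clipping against the screen edges.
inline bool fullyInside(const ScreenRect& r, int viewW, int viewH)
{
    return r.minX >= 0 && r.minY >= 0 && r.maxX <= viewW && r.maxY <= viewH;
}

// Finest mip level worth loading: level 0 is the full-size texture and each
// further level halves the size. Keeps halving while the next level is still
// at least as large as the object's footprint on screen.
inline int finestUsefulMip(int textureSize, const ScreenRect& r)
{
    assert(textureSize > 0);
    int footprint = std::max(r.maxX - r.minX, r.maxY - r.minY);
    footprint = std::max(footprint, 1);
    int level = 0;
    while ((textureSize >> (level + 1)) >= footprint) ++level;
    return level;
}
```

For the 512x512 box and 1024x1024 texture mentioned above, finestUsefulMip() picks level 1, so the full-size level never has to reach video RAM.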
Another big problem in dealing with a separate hardware graphics system is communication. Every state change can be a big deal. Drivers will probably cache things, but not necessarily.
Recently, I wrote an immediate rendering library to draw debug primitives. I had a small pool of vertex buffers that I would rotate as I called Draw() several times per frame. It turned out to be a big slowdown, so big that I had to switch to use one vertex buffer per frame (actually 2/3 to rotate at each frame, not at each Draw()).
To do that of course I had to keep track of logical draw calls issued by the application program, so that I could finally unmap/unlock the vertex buffer at the end and call all the Draw() at once.. remembering which primitive type it was, how many vertices and which draw state was associated with each draw call.
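Roughly, the bookkeeping looks like the sketch below. It is not the actual library: DebugBatch, DrawCmd and stateId are invented names, and the GPU side is reduced to two callbacks so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; unsigned color; };
enum class Primitive { Points, Lines, Triangles };

// One logical Draw() call remembered for later.
struct DrawCmd {
    Primitive   type;
    std::size_t firstVertex;
    std::size_t vertexCount;
    int         stateId;   // hypothetical handle to the draw state in use
};

class DebugBatch {
public:
    // Called many times per frame: only copies into CPU-side storage.
    void draw(Primitive type, const Vertex* verts, std::size_t count, int stateId)
    {
        assert(verts != nullptr || count == 0);
        cmds_.push_back({type, vertices_.size(), count, stateId});
        vertices_.insert(vertices_.end(), verts, verts + count);
    }

    // Called once per frame: upload receives all vertices in a single call,
    // then submit is invoked once per recorded command.
    template <class Upload, class Submit>
    void flush(Upload upload, Submit submit)
    {
        upload(vertices_.data(), vertices_.size());
        for (const DrawCmd& c : cmds_) submit(c);
        vertices_.clear();
        cmds_.clear();
    }

    std::size_t pendingDraws() const { return cmds_.size(); }

private:
    std::vector<Vertex>  vertices_;
    std::vector<DrawCmd> cmds_;
};
```

In a real engine upload would map the frame's vertex buffer once and copy everything in, and submit would set the state and issue the actual draw with firstVertex and vertexCount.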
This all comes down to having to deal with separate architectures. I have my vertex buffer, the card has its vertex buffer.. collect here, copy there, avoid touching this buffer or that buffer.
Small things that show the cat and mouse kind of job of having to optimize rendering using an API such as Direct3D 10.
Lastly, recently I managed to crash nVidia's driver on Vista 8)
I think it has to do with.. big polygons, or polygons going way off.. possibly a clipping bug ? All I know is that my screen goes blank and boom !
The window comes back and it's all black, while I get a balloon message from the sys-tray that says that the driver's process has crashed. From then on I can't do 3D unless I reboot 8)
ehhhhhhhhhhh
Busy Man: Google Sites and points in DX 10
Posted March 10th, 2008 by Davide
I've been busy between work and not work !
Recently I tried Google Apps for my father's web site. I like the Google Sites thing which is an evolution of JotSpot, a company that Google bought over a year ago.
Very easy to set things up, yet I still have to change the HTML to do certain things. All those WYSIWYG editors seem broken to me. It's usually very hard to place a cursor where one really wants, or to get rid of some formatting. Very hard to get rid of a link being assigned as text is typed and incredibly complicated to extend a hyperlink effect to text added before an existing hyperlink.
I also never quite understood the logic by which things get selected. For example even in MS Word, when I go select a line of text, the selection will snap to include the newline character. Or sometimes I select some text from bottom to top and a whole chunk of text gets selected !
In Google Sites I can't find a way to select a whole table, and changing the hyperlink for a picture doesn't really work with what I have.
Still it's nice to setup sites quickly, though I wish Google Apps would allow Google Sites to take over the domain, rather than redirecting to some complicated URLs.
Speaking of Direct3D 10, I've been using it for a while now. Nothing too special, but the tendency to shift rendering properties, such as texture sampling and blending, from the engine to the HLSL code is interesting. On one side it's a pain to try to take those states back from the shader (see Shader Reflection). Some complex steps are needed to, for example, replace texture addressing.. but then again, maybe those things should be translated into shader permutations: even when starting from one generic shader, perhaps different shaders should be created depending on the material properties.
Recently I also tried to render a scene using points rather than triangles. Much to my surprise, points rendering (on NVIDIA GT8800 Ultra) was about 3 times slower.
I'm told that modern graphics cards are optimized for triangles but still.. rendering points should be so much faster !!
My goal was to speedup rendering of objects that are rather complex but that are being projected distant enough so that triangles are no bigger than one pixel.
Ideally then one would also take into account camera focus to use larger points in place of triangles when they are going to be smoothed by out of focus post processing.
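The switch itself could be decided per object with something like this sketch. It assumes a plain perspective projection and an average triangle edge length known per mesh; projectedPixels, shouldRenderAsPoints and maxPointPixels are illustrative names, not taken from the engine.

```cpp
#include <cassert>
#include <cmath>

// Approximate size on screen, in pixels, of something worldSize units wide
// seen at the given distance with a symmetric perspective projection.
inline double projectedPixels(double worldSize, double distance,
                              double verticalFovRadians, int viewportHeight)
{
    assert(distance > 0.0 && viewportHeight > 0);
    const double pixelsPerUnitAtUnitDistance =
        viewportHeight / (2.0 * std::tan(verticalFovRadians * 0.5));
    return worldSize * pixelsPerUnitAtUnitDistance / distance;
}

// Hypothetical policy: switch to points once a typical triangle edge
// covers no more than maxPointPixels on screen.
inline bool shouldRenderAsPoints(double avgEdgeLength, double distance,
                                 double verticalFovRadians, int viewportHeight,
                                 double maxPointPixels = 1.0)
{
    return projectedPixels(avgEdgeLength, distance,
                           verticalFovRadians, viewportHeight) <= maxPointPixels;
}
```

Raising maxPointPixels for objects that fall outside the focus range would be one way to get the larger points mentioned above.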
mumble mumble
Posting from Portland
Posted January 26th, 2008 by Davide
[Image: Portland_2]
I arrived in Portland (OR) a couple of days ago and I'm going back to Tokyo tomorrow !
It was my first time here, though I've been in Oregon before.
Compared to San Francisco, Portland looks a lot safer. But also deadly cold. It seems that most people wear hats, and I can see why. One day I had my ears freezing from the cold wind.. so hats after all aren't just a fashion thing !!
I'm here for business, but I can't really talk about it. Exciting things, but I still think I'd be more excited if I got a substantial pay raise ;) ..cough cough !
I'll be back on Sunday, and on Monday I'm already at work. I want to get busy with code, but instead there will be some management/business/financial things to talk about.
Talking about code. Just before leaving, I made some significant changes to the engine I'm currently working on.
There really isn't a proper way to do anything. One can throw random polygons at a rasterizer and get terrible performance or group geometry (and textures) in a certain hardware-friendly fashion.
Depending on the file format and on the average organization of mesh data, one has to come up with the best compromise. So, for example, in most cases having every polygon listing a different material is a bad idea.
Then one can group polygon lists using the same material, but do those polygon lists all share a common vertex pool or should every polygon list have a separate vertex pool ?
If lists are few and made of many polygons, then it may be wiser to have separate vertex pools, this is also because if every list has a different material, different materials could imply different vertex attributes (if there is no texture, then there is no need for texture coordinates, etc).
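Here is a small sketch of both options, assuming triangles come in as three indices into a shared vertex pool plus a material id. Triangle, groupByMaterial and makeLocalPool are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Triangle { std::uint32_t v[3]; int materialId; };

// Index list for one material, referring into the shared vertex pool.
using IndexList = std::vector<std::uint32_t>;

// Buckets triangles by material so each material is set once per mesh.
inline std::map<int, IndexList> groupByMaterial(const std::vector<Triangle>& tris)
{
    std::map<int, IndexList> groups;
    for (const Triangle& t : tris) {
        IndexList& list = groups[t.materialId];
        list.insert(list.end(), t.v, t.v + 3);
    }
    return groups;
}

// Optionally gives one list its own compact vertex pool: returns the
// shared-pool indices it uses (in first-use order) and rewrites the list
// to index into that local pool instead.
inline std::vector<std::uint32_t> makeLocalPool(IndexList& list)
{
    assert(list.size() % 3 == 0);
    std::map<std::uint32_t, std::uint32_t> remap;  // shared index -> local index
    std::vector<std::uint32_t> pool;
    for (std::uint32_t& i : list) {
        auto it = remap.find(i);
        if (it == remap.end()) {
            it = remap.emplace(i, static_cast<std::uint32_t>(pool.size())).first;
            pool.push_back(i);
        }
        i = it->second;
    }
    return pool;
}
```

With a local pool per list, each material can then use a vertex layout that carries only the attributes it needs.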
This also reminds me of the pain of using uber-shaders (long shaders) with DX10. Sometimes long shaders will ignore some input vertex data, but feeding null buffers gets DX10 debug mode to spit out thousands of warning messages in the output console.
I dunno.. DX10 doesn't really seem like the ultimate solution. It shifts the weight of state changes towards the application program, which is nice and not so nice at the same time.
And finally, some great news ! I'm leaving in 5 hours and I still haven't lost my digital camera 8)
woooooo
Damn Cg.. and shaders in general ?
Posted November 13th, 2007 by Davide
Today I hit a wall.. a performance issue that I was expecting sometime, but not quite as bad.
Currently I'm working with nVidia's Cg for shaders on OpenGL. One ugly thing about shaders is that one often ends up with a lot of permutations, depending on the number of inputs a shader deals with.
For example a shader may get vertex color in input while another may get texture coordinates in input.. it grows exponentially !
Each combination has traditionally been converted into a separate shader-program.. which is not nice, especially if there is no simple way to have all these programs instead share values that should be common to all of them.
For example any vertex shader will normally get one (or two) transformation matrices per object, while an object is often built of different materials, thus different shaders.
In Direct3D 9 with HLSL one can set some virtual global registers, which is a bit ugly (there is a more high level system on PC but not on XBox 360) but does a nice job of sharing those values across programs. In Cg however, one has to explicitly connect shader parameters.. which, as it turns out, can be pretty bad for performance.
My current shader base is composed of 16 total shaders which all share 2 matrix transformations (to screen space and to world space). So, every time I set those two matrices for a 3D object, it really sets 32 matrices and it possibly even stalls somewhere.. because my frame rate will drop drastically for 1000 objects !!!
Basically I'm sitting there not rendering anything and seeing the frame rate go from 90 to 15 FPS just by setting up the transformation matrices for 1000 objects.
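One possible workaround, sketched here without any real Cg calls, is to stop pushing the matrices to all 16 programs and let each program pull them only when it is actually bound. SharedParams, ShaderProgram and simulateFrame are made-up stand-ins, and the sketch assumes each object is drawn with a single program.

```cpp
#include <cassert>
#include <cstddef>

struct Matrix4 { float m[16]; };

// Values common to every shader; version bumps whenever they change.
struct SharedParams {
    Matrix4  toScreen{};
    Matrix4  toWorld{};
    unsigned version = 0;

    void set(const Matrix4& screen, const Matrix4& world)
    {
        toScreen = screen;
        toWorld  = world;
        ++version;
    }
};

// Stand-in for one shader program (hypothetical; no GPU work happens here).
class ShaderProgram {
public:
    // Copies the shared matrices only if this program has not seen them yet.
    void bind(const SharedParams& shared)
    {
        if (seenVersion_ == shared.version) return;
        toScreen_ = shared.toScreen;   // a real engine would upload here
        toWorld_  = shared.toWorld;
        seenVersion_ = shared.version;
        uploads_ += 2;
    }
    std::size_t uploads() const { return uploads_; }

private:
    Matrix4     toScreen_{};
    Matrix4     toWorld_{};
    unsigned    seenVersion_ = ~0u;  // differs from the initial version 0
    std::size_t uploads_ = 0;
};

// Counts matrix uploads for a frame where every object sets new matrices
// but is drawn with only one of the programs.
inline std::size_t simulateFrame(std::size_t objectCount,
                                 ShaderProgram* programs, std::size_t programCount)
{
    assert(programs != nullptr && programCount > 0);
    SharedParams shared;
    for (std::size_t i = 0; i < objectCount; ++i) {
        shared.set(Matrix4{}, Matrix4{});          // new matrices for this object
        programs[i % programCount].bind(shared);   // only the program in use syncs
    }
    std::size_t total = 0;
    for (std::size_t p = 0; p < programCount; ++p) total += programs[p].uploads();
    return total;
}
```

Under those assumptions 1000 objects cost 2 matrix uploads each, against the 32 per object that come from having all 16 programs connected to the same two matrices.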
Supposedly Cg 2.0 (if combined with the latest nVidia cards ?) allows having a common buffer that all shaders can share.
I'm curious to see how Direct3D 10 deals with those shaders params and state changes in general... however D3D 10 requires Vista, an OS that isn't quite as approved yet in my company.
Given the situation, I will be getting a new computer to be used to run Vista. As I picked the hardware I felt a bit guilty asking for relatively high end stuff.. but it's for work.. really !
wooooooo
P.S. Let's see if I can go to sleep before 4 AM tonight !


