• This is an archive of ASSEMblerGames.com from June 07th, 2019. All content should be here (minus threads with non-English titles, which we'll add later). You cannot log in here, as this is a clone and your personal info doesn't exist in this database.

    Brought to you by ObscureGamers.


(Homebrew) Sonic Z-Treme

jollyroger

Gutsy Member


Well done!
As far as I remember from my experiments from ages ago, this is precisely what Burning Rangers does, including a lower resolution draw for the transparent layer and the reuse of the same framebuffer for both the transparent layer and the background layer in a single frame, before swapping. The only difference may be the exact timing of the copies between the framebuffer and the VDP2 ram, and whether BR goes through work ram or elsewhere, which I don't actually recall right now.
 

XL2

Rising Member

Since you can't do SCU DMA from VDP1 to VDP2, I guess it goes to work RAM.
BR uses a 176x112 part of the screen and scales it to 2x for both x and y.
But it also covers the upper left part of the screen, so as I understand it, it sends an end command, waits until the drawing finishes, retrieves the result, clears the FB, and then starts drawing the main sprites after transferring the result to a VDP2 layer.
I could be wrong, but I guess it has an extra 16 ms delay because of that, while the technique I use is - imo - easier and maybe faster.
I draw the transparent sprites offscreen first, so when I retrieve the FB the drawing of that part of the screen will be completed for sure, so I don't need to wait.
I will still have to clip/mask the transparent sprites to allow something similar to BR.
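A minimal sketch of the "retrieve the FB" step described above, written as a plain CPU copy rather than a DMA. It assumes the framebuffer holds 16-bit pixels with a pitch of 512 pixels per line; the names `FB_PITCH` and `copy_region` are hypothetical and not taken from the actual engine.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: 16-bit pixels, fixed pitch of 512 pixels per framebuffer line. */
#define FB_PITCH 512

/* Copy a w x h rectangle starting at (x0, y0) out of the framebuffer
   into a tightly packed destination buffer of w * h pixels. */
static void copy_region(const uint16_t *fb, uint16_t *dst,
                        int x0, int y0, int w, int h)
{
    for (int y = 0; y < h; ++y) {
        const uint16_t *src = fb + (size_t)(y0 + y) * FB_PITCH + x0;
        for (int x = 0; x < w; ++x)
            dst[(size_t)y * w + x] = src[x];
    }
}
```

On hardware, the same row-by-row walk would presumably be handed to the SCU DMA instead; the point is only that an offscreen region can be pulled out independently of the visible area.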
 

jollyroger

Gutsy Member


It does send an end command between the two display lists, therefore it waits for the first one to end before moving to the second.
It's not too big of a deal in terms of wait, as the wait can overlap with several other calculations. It is only a problem if it has to clear the entire screen twice, but I would wager that they clear only the small portion when they need to draw the transparent buffer, and then the whole screen when they need to draw the background.
If you clear a larger portion of the screen to account for both background and transparent portions (and they are more or less the same size as the ones used by BR), then the fill time is approximately the same.
Also, the DMA transfer has to happen at some point too, so the bus cycles that are used for that have to be used, whether it is between the two display list renders or at vblank time...
If well synchronized the two methods are pretty much equal (in my opinion), but yours is indeed easier to implement, which is always better :)
Nicely done.

P.S. the DMA transfer between vdp1 and vdp2 could be done through the SCU DSP, so that it would only use the B bus, and the CPU could still have full speed access to work ram while that happens...
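To put a rough number on the bus cycles mentioned above, here is a small back-of-the-envelope helper. It assumes 16-bit pixels and that the region crosses the bus twice per frame (VDP1 to work RAM, then work RAM to VDP2); the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes needed to move a w x h region of 16-bit pixels once. */
static uint32_t region_bytes(uint32_t w, uint32_t h)
{
    return w * h * 2u;
}

/* Bytes per second if the region is copied `copies` times per frame
   (2 for VDP1 -> work RAM -> VDP2) at the given frame rate. */
static uint32_t bytes_per_second(uint32_t w, uint32_t h,
                                 uint32_t copies, uint32_t fps)
{
    return region_bytes(w, h) * copies * fps;
}
```

Under those assumptions, a 176x112 region is 39,424 bytes per copy, or about 2.4 MB/s for two copies per frame at 30 fps, whichever point in the frame the transfer is scheduled at.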
 

XL2

Rising Member

How hard would it be to do it over the SCU DSP?
I've noticed that the DMA slows down the CPU quite a bit since it needs to wait for the DMA transfer to be completed.
And since I can't do DMA from B bus to A bus or B to B, it leaves me with work ram as the only option.
Or did you mean simply using indirect scu dma?
 

jollyroger

Gutsy Member


You can do it two ways:
1) You can use indirect SCU DMA to alternate transfers B Bus->DSP area and DSP area->B Bus
2) You can use multiple direct SCU transfers from VDP1 to the DSP area, and then use a DSP program to initiate the transfers from the DSP area to VDP2

Either way, check Precaution 27 in the final SCU specifications to set up the DMA to the DSP area correctly.
 

XL2

Rising Member


I added gouraud shading on everything after using a bsp compiler on pc to do light raytracing.
The results look pretty nice, but I had to stop using color bank palette sprites to allow colored lighting.
On real hardware the framerate goes from 20 to 30 fps, but nothing is optimized yet.
And that's including the vdp2 BR-style transparency.
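For readers wondering why palette sprites and colored lighting don't mix, here is a hedged software model of Gouraud shading on an RGB555 pixel. It assumes each 5-bit Gouraud channel acts as a signed offset around 16 (16 = no change) with clamping to 0..31, and that red sits in the low 5 bits; `gouraud_rgb555` is a hypothetical name, and this is an illustration, not the hardware's exact pipeline.

```c
#include <stdint.h>

/* Clamp a channel value to the 5-bit range 0..31. */
static int clamp5(int v)
{
    return v < 0 ? 0 : (v > 31 ? 31 : v);
}

/* Apply one Gouraud value `g` to one RGB555 pixel.
   Assumed layout: bit 15 = flag, bits 14-10 = blue, 9-5 = green, 4-0 = red.
   Each Gouraud channel is treated as an offset of (value - 16). */
static uint16_t gouraud_rgb555(uint16_t pixel, uint16_t g)
{
    int r  = clamp5((pixel & 31) + (g & 31) - 16);
    int gr = clamp5(((pixel >> 5) & 31) + ((g >> 5) & 31) - 16);
    int b  = clamp5(((pixel >> 10) & 31) + ((g >> 10) & 31) - 16);
    return (uint16_t)((pixel & 0x8000) | (b << 10) | (gr << 5) | r);
}
```

Because the offsets act on color channels, they only make sense for RGB pixels; applied to a palette index they would shift the index rather than the color, which is consistent with having to drop color bank sprites for colored lighting.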
 

AUSTIN PEYTON

Gutsy Member

Lol, nice touch with Segata Sanshiro
You are doing coding wizardry in my eyes. Excellent work dude.
 

XL2

Rising Member

He was already in the last demo, you just have to explore a bit ;)
Hopefully my engine will again allow me to reach a stable 30 fps with much improved visuals for my future project.
20-30 isn't too bad, but a stable 30 will be way better.
 

XL2

Rising Member


Still no name nor any gameplay to show, but here is some footage from last month's version.
The textures aren't very good: they are too dark, and I didn't put in enough lights, so it's hard to see.
Anyway, so I am using a bsp tree with a compressed pvs for visibility.
While I do have portals, I haven't implemented the actual portal culling since I think it will be too slow, so I might add some "important" portals and only process them.
Anyway the pvs is quite good overall.
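As an illustration of what a compressed PVS lookup can look like, here is a sketch that assumes Quake-style run-length encoding (a zero byte followed by a count of zero bytes). The post doesn't say which compression the engine actually uses, so the encoding and the names `pvs_decompress` and `leaf_visible` are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Expand one run-length-compressed visibility row into `row_bytes` bytes.
   Assumed encoding: a non-zero byte is copied as-is; a zero byte is
   followed by a count of zero bytes to emit.
   Returns the number of input bytes consumed. */
static size_t pvs_decompress(const uint8_t *in, uint8_t *out, size_t row_bytes)
{
    const uint8_t *start = in;
    size_t written = 0;
    while (written < row_bytes) {
        if (*in != 0) {
            out[written++] = *in++;
        } else {
            size_t run = in[1];
            in += 2;
            if (run == 0)
                break; /* malformed input: stop instead of looping forever */
            while (run-- > 0 && written < row_bytes)
                out[written++] = 0;
        }
    }
    return (size_t)(in - start);
}

/* Test one bit of a decompressed row: is leaf number `leaf` visible? */
static int leaf_visible(const uint8_t *row, unsigned leaf)
{
    return (row[leaf >> 3] >> (leaf & 7)) & 1;
}
```

With a row decompressed once per frame for the camera's leaf, the renderer only needs a bit test per leaf to decide whether to draw it.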

I didn't have much time recently so I haven't made much progress, but I am currently working on a Quake .map converter, which should allow me to make maps faster and help with moving entities/switches/doors.
Anyway, the "game" runs at 30 fps, but I still have visual glitches that I will try to fix once I am done with the converter.
Video: https://www.youtube.com/embed/fsaj5g7p3fQ?wmode=opaque
 

XL2

Rising Member


Here is a small tech demo I made to test if lightmaps would be a viable option for lighting. It works with the same framebuffer trick I mentioned previously, with the VDP2 doing the final blending. The edges are rough since the offscreen display area is smaller and gets enlarged to cover the whole screen, so sometimes it doesn't cover the whole quad it is supposed to light. I guess it would be ok for a screen full of polygons. The lightmap is 64x64; I tried 16x16 and it looked a bit rough. A better choice of colors would help quite a bit to reduce the noise. And of course, using this technique means that you need to render twice as many quads, but without gouraud shading and on a small area (160x112), so it might still be fast enough to hit 30 fps. Gouraud shading could be added when there are dynamic lights, so it might still work since additive gouraud will oversaturate the sprite.

Edit : Part of the hard edges is caused by the delay between the rendering of the image and that image being passed to vdp2. I would need to do it in one pass only instead of waiting for vblank.
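One way to see where rough edges can come from when a half-resolution layer is enlarged 2x: the toy model below assumes the low-res pass simply halves coordinates and truncates, which is an assumption for illustration and not the actual rasterization rule of the hardware. The function names are hypothetical and coordinates are assumed non-negative.

```c
/* Half-resolution span drawn for a full-resolution span [x0, x1),
   assuming coordinates are halved with truncation. */
static void lowres_span(int x0, int x1, int *lx0, int *lx1)
{
    *lx0 = x0 / 2;
    *lx1 = x1 / 2;
}

/* Number of full-res pixels in [x0, x1) left uncovered once the
   low-res span is enlarged 2x back to screen size. */
static int uncovered_pixels(int x0, int x1)
{
    int lx0, lx1;
    lowres_span(x0, x1, &lx0, &lx1);
    int cx0 = lx0 * 2; /* first screen pixel covered after enlargement */
    int cx1 = lx1 * 2; /* one past the last covered screen pixel */
    int missed = 0;
    for (int x = x0; x < x1; ++x)
        if (x < cx0 || x >= cx1)
            ++missed;
    return missed;
}
```

In this model a span whose edges fall on even pixels is covered exactly, while one ending on an odd pixel loses its last column, which matches the kind of gap described above.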
 

XL2

Rising Member


New test on real hardware (using a USB capture card). I am not using SGL anymore for rendering, only my own code (no assembly, all in C), running only on the slave CPU at the moment. The main CPU will take care of AI, game logic, collision detection, PVS decompression, etc. It might help the slave too, so I'll see once I get the other things done.

Video: https://www.youtube.com/embed/YT72GFbHkX0?wmode=opaque
 