1. 25 Oct, 2017 1 commit
  2. 05 Jul, 2017 1 commit
    • Update to v103r07 release. · d4876a83
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - gba/cpu: massive code cleanup effort
        - gba/cpu: DMA can run in between active instructions¹
        - gba/cpu: added two-cycle startup delay between DMA activation and
          DMA transfers²
        - processor/spc700: BBC, BBS, CBNE cycle 4 is an idle cycle
        - processor/spc700: ADDW, SUBW, MOVW (read) cycle 4 is an idle cycle
      
      ¹: unfortunately, this causes yet another performance penalty for the
      poor GBA core =( Also, I think I may have missed disabling DMAs while
      the CPU is stopped. I'll fix that in the next WIP.
      
      ²: I put the waiting counter decrement at the wrong place, so this
      doesn't actually work. It needs to be more like this:
      
          auto CPU::step(uint clocks) -> void {
            for(auto _ : range(clocks)) {
              for(auto& timer : this->timer) timer.run();
              for(auto& dma : this->dma) if(dma.active && dma.waiting) dma.waiting--;
              context.clock++;
            }
            ...
      
          auto CPU::DMA::run() -> bool {
            if(cpu.stopped() || !active || waiting) return false;
      
            transfer();
            if(irq) cpu.irq.flag |= CPU::Interrupt::DMA0 << id;
            if(drq && id == 3) cpu.irq.flag |= CPU::Interrupt::Cartridge;
            return true;
          }
      
      Of course, the real fix will be restructuring how DMA works, so that
      it's always running in parallel with the CPU instead of this weird
      design where it tries to run all channels in some kind of loop until no
      channels are active anymore whenever one channel is activated.
      
      Not really sure how to design that yet, however.
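
      The corrected counter placement can be sketched in isolation. This is
      only an illustration of the gating logic quoted above: `DmaChannel`,
      `activate()`, and `tick()` are hypothetical names, and the transfer
      itself is reduced to a counter. The two-clock delay comes from the
      changelog entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one GBA DMA channel; not higan's actual class.
struct DmaChannel {
  bool active = false;
  unsigned waiting = 0;    // clocks remaining before the first transfer may run
  unsigned transfers = 0;  // stand-in for the real transfer work

  // Arm the channel with the two-clock startup delay from the changelog.
  void activate() { active = true; waiting = 2; }

  // Called once per elapsed clock, mirroring the loop inside CPU::step().
  void tick() { if(active && waiting) waiting--; }

  // Mirrors DMA::run(): refuses to transfer while stopped, idle, or waiting.
  bool run(bool cpuStopped) {
    if(cpuStopped || !active || waiting) return false;
    transfers++;
    return true;
  }
};
```

      Because the decrement lives in the per-clock loop, the delay elapses
      with emulated time rather than with the number of calls to `run()`.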
  3. 03 Jul, 2017 1 commit
    • Update to v103r06 release. · 16f73630
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - processor/spc700: restored fetch/load/store/pull/push shorthand
          functions
        - processor/spc700: split functions that tested the algorithm used (`op
          != &SPC700:...`) to separate instructions
            - mostly for code clarity over code size: it was awkward having
              cycle counts change based on a function parameter
        - processor/spc700: implemented Overload's new findings on which
          cycles are truly internal (no bus reads)
        - sfc/smp: TEST register emulation has been vastly improved¹
      
      ¹: it turns out that TEST.d4,d5 is the external clock divider (used
      when accessing RAM through the DSP), and TEST.d6,d7 is the internal
      clock divider (used when accessing IPLROM, IO registers, or during idle
      cycles.)
      
      The DSP (24576kHz) feeds its clock / 12 through to the SMP (2048kHz).
      The clock divider setting further divides the clock by 2, 4, 8, or 16.
      Since 12 is not cleanly divisible by 8 or 16, the SMP cycle count
      glitches out and seems to take 10 and 20 clocks instead of 8 and 16. On
      real hardware, this can either cause the SMP to run very slowly or, more
      likely, crash the SMP completely until reset.
      
      What's even stranger is the timers aren't affected by this. They still
      clock by 2, 4, 8, or 16.
      
      Note that although I could technically divide my own clock counters by
      24 and reduce these to {1,2,5,10} and {1,2,4,8}, I instead chose to
      divide by 12 to better illustrate this hardware issue and better model
      that the SMP clock runs at 2048kHz and not 1024kHz.
      
      Further, note that things aren't 100% perfect yet. This seems to throw
      off some tests, such as blargg's `test_timer_speed`. I can't tell how
      far off I am because blargg's test tragically doesn't print out fail
      values. But you can see the improvements in that higan is now passing
      all of Revenant's tests that were obviously completely wrong before.
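
      As a rough sketch of the divider tables described above (not higan's
      actual code): the lookup below assumes the {1,2,5,10} and {1,2,4,8}
      sets are doubled because the counters are divided by 12 rather than
      24, and that selector values 0-3 map to the entries in order. All
      names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned dspFrequency = 24'576'000;       // Hz, from the text
constexpr unsigned smpBase = dspFrequency / 12;     // 2'048'000 Hz

// Assumed tables, in units of one smpBase clock.
constexpr unsigned cycleLength[4] = {2, 4, 10, 20};  // SMP cycle length (glitchy for 8, 16)
constexpr unsigned timerStep[4]   = {2, 4,  8, 16};  // timers still divide cleanly

// TEST.d4,d5: external divider, used for RAM accesses through the DSP.
constexpr unsigned externalWait(uint8_t test) { return cycleLength[(test >> 4) & 3]; }

// TEST.d6,d7: internal divider, used for IPLROM, I/O registers, and idle cycles.
constexpr unsigned internalWait(uint8_t test) { return cycleLength[(test >> 6) & 3]; }

// Cycles per second achieved at a given wait setting.
constexpr unsigned cyclesPerSecond(unsigned wait) { return smpBase / wait; }
```

      With TEST = 0x00, both dividers select 2, which gives 1,024,000 cycles
      per second; the slowest setting selects 20 rather than the 16 that the
      timers use.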
  4. 01 Jul, 2017 1 commit
    • Update to v103r05 release. · 40802b0b
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - fc/controller: added ControllerPort class; removed Peripherals class
        - md/controller/gamepad: removed X,Y,Z buttons since this isn't a
          6-button controller
        - ms/controller: added ControllerPort class (not used in Game Gear
          mode); removed Peripherals class
        - pce/controller: added ControllerPort class; removed Peripherals
          class
        - processor/spc700: idle(address) is part of SMP class again, contains
          flag to detect mov (x)+ edge case
        - sfc/controller/super-scope,justifier: use CPU frequency instead of
          hard-coding NTSC frequency
        - sfc/cpu: move 4x8-bit SMP ports to SMP class
        - sfc/smp: move APU RAM to DSP class
        - sfc/smp: improved emulation of TEST registers bits 4-7 [information
          from nocash]
            - d4,d5 is RAM wait states (1,2,5,10)
            - d6,d7 is ROM/IO wait states (1,2,5,10)
        - sfc/smp: code cleanup to new style (order from lowest to highest
          bits; use .bit(s) functions)
        - sfc/smp: $00f8,$00f9 are P4/P5 auxiliary ports; named the registers
          better
  5. 30 Jun, 2017 1 commit
    • Update to v103r04 release. · ff3750de
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - fc/apu: $4003,$4007 writes initialize duty counter to 0 instead of 7
        - fc/apu: corrected duty table entries for use with decrementing duty
          counter
        - processor/spc700: emulated the behavior of cycle 3 of (x)+
          instructions to not read I/O registers
            - specifically, this prevents reads from $fd-ff from resetting the
              timers, as observed on real hardware
        - sfc/controller: added ControllerPort class to match Mega Drive
          design
        - sfc/expansion: added ExpansionPort class to match Mega Drive design
        - sfc/system: removed Peripherals class
        - sfc/system: changed `colorburst()` to `cpuFrequency()`; added
          `apuFrequency()`
        - sfc: replaced calls to `system.region == System::Region::*` with
          `Region::*()`
        - sfc/expansion: remove thread from scheduler when device is destroyed
        - sfc/smp: `{read,write}Port` now use a separate 4x8-bit buffer instead
          of underlying APU RAM [hex\_usr]
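
      The (x)+ change above can be illustrated with a small model. This is a
      hypothetical sketch, not the SPC700 core's real interface: `SmpSketch`,
      its members, and the `skipIO` flag are made-up names, only the first
      page of memory is modeled, and the timer outputs at $fd-$ff are assumed
      to clear when read.

```cpp
#include <cassert>
#include <cstdint>

struct SmpSketch {
  uint8_t ram[0x100] = {};      // only page zero is modeled here
  uint8_t timerOutput[3] = {};  // stand-ins for the $fd-$ff timer outputs

  // Ordinary read: reading $fd-$ff returns the timer output and clears it.
  uint8_t read(uint8_t address) {
    if(address >= 0xfd) {
      uint8_t value = timerOutput[address - 0xfd];
      timerOutput[address - 0xfd] = 0;
      return value;
    }
    return ram[address];
  }

  // Idle cycle: normally a dummy read, but with skipIO set it does not
  // touch the $f0-$ff register range, so the timers are left alone.
  void idle(uint8_t address, bool skipIO = false) {
    if(skipIO && address >= 0xf0) return;
    read(address);
  }
};
```

      Calling `idle(0xfd, true)` leaves the first timer output untouched,
      while a plain `idle(0xfd)` clears it as a side effect of the dummy read.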
  6. 28 Jun, 2017 1 commit
    • Update to v103r03 release. · 78f34148
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - md/psg: fixed output frequency rate regression from v103r02
        - processor/m68k: fixed calculations for ABCD, NBCD, SBCD [hex\_usr,
          SuperMikeMan]
        - processor/spc700: renamed abbreviated instructions to functional
          descriptions (eg `XCN` → `ExchangeNibble`)
        - processor/spc700: removed memory.cpp shorthand functions (fetch,
          load, store, pull, push)
        - processor/spc700: updated all instructions to follow cycle behavior
          as documented by Overload with a logic analyzer
      
      Once again, the changes to the SPC700 core are really quite massive. And
      this time it's not just cosmetic: the idle cycles have been updated to
      pull from various memory addresses. This is why I removed the shorthand
      functions -- so that I could handle the at-times very bizarre addresses
      the SPC700 has on its address bus during its idle cycles.
      
      There is one behavior Overload mentioned that I don't emulate ... one of
      the cycles of the (X) transfer functions seems to not actually access
      the $f0-ff internal SMP registers? I don't fully understand what
      Overload is getting at, so I haven't tried to support it just yet.
      
      Also, there are limits to logic analyzers. In many cases the same
      address is read from twice consecutively. It is unclear which of the two
      reads the SPC700 actually utilizes. I tried to choose the most logical
      values (usually the first one), but ... I don't know that we'll be able
      to figure this one out. It's going to be virtually impossible to test
      this through software, because the PC can't really execute out of
      registers that have side effects on reads.
  7. 27 Jun, 2017 1 commit
    • Update to v103r02 release. · 3517d5c4
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - fc/apu: improved phase duty cycle emulation (mode 3 is 25% phase
          inverted; counter decrements)
        - md/apu: power/reset do not cancel 68K bus requests
        - md/apu: 68K is not granted bus access on Z80 power/reset
        - md/controller: replaced System::Peripherals with ControllerPort
          concept
        - md/controller: CTRL port is now read-write, maintains value across
          controller changes (and soon, soft resets)
        - md/psg: PSG sampling rate unintentionally modified¹
        - processor/spc700: improve cycle timing of (indirect),y instructions
          [Overload]
        - processor/spc700: idle() cycles actually read from the program
          counter; much like the 6502 [Overload]
            - some of the idle() cycles should read from other addresses; this
              still needs to be supported
        - processor/spc700: various cleanups to instruction function naming
        - processor/z80: prefix state (HL→IX,IY override) can now be
          serialized
        - icarus: fix install rule for certain platforms (it wasn't buggy on
          FreeBSD, but was on Linux?)
      
      ¹: the clock speed of the PSG is oscillator/15. But I was setting the
      sampling rate to oscillator/15/16, which was around 223KHz. I am not
      sure whether the PSG should be outputting at 3MHz or 223KHz. Amazingly
      ... I don't really hear a difference either way `o_O` I didn't actually
      mean to make this change; I just noticed it after comparing the diff
      between r01 and r02. If this turns out to be wrong, set
      
          stream = Emulator::audio.createStream(1, frequency() / 16.0);
      
      in md/psg.cpp to revert this change.
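
      For reference, the two candidate rates work out as follows. The
      oscillator value is an assumption (the NTSC Mega Drive master clock of
      53,693,175Hz is not stated in the message), and the function names are
      made up for this sketch.

```cpp
#include <cassert>

// Assumption: NTSC Mega Drive master oscillator, in Hz.
constexpr double ntscOscillator = 53'693'175.0;

// PSG clock speed: oscillator / 15.
constexpr double psgClock() { return ntscOscillator / 15.0; }

// The previous sampling rate: oscillator / 15 / 16.
constexpr double psgSampleRate() { return psgClock() / 16.0; }
```

      Under that assumption `psgClock()` is about 3.58MHz and
      `psgSampleRate()` is about 223.7kHz, matching the "3MHz or 223KHz"
      figures above.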
  8. 19 Jun, 2017 1 commit
    • Update to v102r27 release. · e7806dd6
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - processor/gsu: minor code cleanup
        - processor/hg51b: renamed reg(Read,Write) to register(Read,Write)
        - processor/lr35902: minor code cleanup
        - processor/spc700: completed code cleanup (sans disassembler)
            - no longer uses internal global state inside instructions
        - processor/spc700: will no longer hang the emulator if stuck in a WAI
          (SLEEP) or STP (STOP) instruction
        - processor/spc700: fixed bug in handling of OR1 and AND1 instructions
        - processor/z80: minor code cleanup
        - sfc/dsp: revert to initializing registers to 0x00; save for
          ENDX=random(), FLG=0xe0 [Jonas Quinn]
      
      Major testing of the SNES game library would be appreciated, now that
      its CPU cores have all been revised.
      
      We know the DSP registers read back as randomized data ... mostly, but
      there are apparently internal latches, which we can't emulate with the
      current DSP design. So until we know which registers have separate
      internal state that actually *is* initialized, I'm going to play it safe
      and not break more games.
      
      Thanks again to Jonas Quinn for the continued research into this issue.
      
      EDIT: that said ... `MD works if((ENDX&0x30) > 0)` is only a 3:4 chance
      that the game will work. It seems pretty unlikely that the odds of it
      working are that low, given hardware testing by others in the past :/ I
      thought it worked if `PITCH != 0` before, which would have been way more
      likely.
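
      The 3:4 figure can be checked by enumeration, assuming ENDX powers on
      as a uniformly random byte (that distribution is an assumption of this
      sketch, as is the function name):

```cpp
#include <cassert>

// Counts how many of the 256 possible ENDX values satisfy (ENDX & 0x30) > 0.
constexpr unsigned countPassing() {
  unsigned count = 0;
  for(unsigned endx = 0; endx < 256; endx++) {
    if((endx & 0x30) > 0) count++;
  }
  return count;
}
```

      Only the 64 values with both bits 4 and 5 clear fail the condition, so
      192 of 256 values pass.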
      
      The two remaining CPU cores that need major cleanup efforts are the
      LR35902 and ARM cores. Both are very large, complicated, annoying cores
      that will probably be better off as full rewrites from scratch. I don't
      think I want to delay v103 in trying to accomplish that, however.
      
      So I think it'll be best to focus on allowing the Mega Drive core to not
      lock when processors are frozen waiting on a response from other
      processors during a save state operation. Then we should be good for a
      new release.
  9. 16 Jun, 2017 1 commit
    • Update to v102r26 release. · 50411a17
      Screwtape authored
      byuu says:
      
      Changelog:
      
        - md/ym2612: initialize DAC sample to center volume [Cydrak]
        - processor/arm: add accumulate mode extra cycle to mlal [Jonas
          Quinn]
        - processor/huc6280: split off algorithms, improve naming of functions
        - processor/mos6502: split off algorithms
        - processor/spc700: major revamp of entire core (~50% completed)
        - processor/wdc65816: fixed several bugs introduced by rewrite
      
      For the SPC700, this turns out to be very old code as well, with global
      object state variables, those annoying `{Boolean,Natural}BitField` types,
      `under_case` naming conventions, heavily abbreviated function names, etc.
      I'm working to get the code to be in the same design as the MOS6502,
      HuC6280, WDC65816 cores, since they're all extremely similar in terms of
      architectural design (the SPC700 is more of an off-label
      reimplementation of a 6502 core, but still.)
      
      The main thing left is that about 90% of the actual instructions still
      need to be adapted to not use the internal state (`aa`, `rd`, `dp`,
      `sp`, `bit` variables.) I wanted to finish this today, but ran out of
      time before work.
      
      I wouldn't suggest too much testing just yet. We should wait until the
      SPC700 core is finished for that. However, if someone does want to and
      spots regressions, please let me know.
  10. 31 Jul, 2016 1 commit
    • Update to v100r15 release. · c50723ef
      Screwtape authored
      byuu wrote:
      
      Aforementioned scheduler changes added. Longer explanation of why here:
      http://hastebin.com/raw/toxedenece
      
      Again, we really need to test this as thoroughly as possible for
      regressions :/
      This is a really major change that affects absolutely everything: all
      emulation cores, all coprocessors, etc.
      
      Also added ADDX and SUB to the 68K core, which brings us just barely
      above 50% of the instruction encoding space completed.
      
      [Editor's note: The "aforementioned scheduler changes" were described in
      a previous forum post:
      
          Unfortunately, 64-bits just wasn't enough precision (we were
          getting misalignments ~230 times a second on 21/24MHz clocks), so
          I had to move to 128-bit counters. This of course doesn't exist on
          32-bit architectures (and probably not on all 64-bit ones either),
          so for now ... higan's only going to compile on 64-bit machines
          until we figure something out. Maybe we offer a "lower precision"
          fallback for machines that lack uint128_t or something. Using the
          Booth algorithm would be way too slow.
      
          Anyway, the precision is now 2^-96, which is roughly 10^-29. That
          puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
          referring to it as the byuusecond. The other 32-bits of precision
          allows a 1Hz clock to run up to one full second before all clocks
          need to be normalized to prevent overflow.
      
          I fixed a serious wobbling issue where I was using clock > other.clock
          for synchronization instead of clock >= other.clock; and also another
          aliasing issue when two threads share a common frequency, but don't
          run in lock-step. The latter I don't even fully understand, but I
          did observe it in testing.
      
          nall/serialization.hpp has been extended to support 128-bit integers,
          but without explicitly naming them (yay generic code), so nall will
          still compile on 32-bit platforms for all other applications.
      
          Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
      
      The "longer explanation" in the linked hastebin is:
      
          Okay, so the idea is that we can have an arbitrary number of
          oscillators. Take the SNES:
      
          - CPU/PPU clock = 21477272.727272hz
          - SMP/DSP clock = 24576000hz
          - Cartridge DSP1 clock = 8000000hz
          - Cartridge MSU1 clock = 44100hz
          - Controller Port 1 modem controller clock = 57600hz
          - Controller Port 2 barcode battler clock = 115200hz
          - Expansion Port exercise bike clock = 192000hz
      
          Is this a pathological case? Of course it is, but it's possible. The
          first four do exist in the wild already: see Rockman X2 MSU1
          patch. Manifest files with higan let you specify any frequency you
          want for any component.
      
          The old trick higan used was to hold an int64 counter for each
          thread:thread synchronization, and adjust it like so:
      
          - if thread A steps X clocks; then clock += X * threadB.frequency
            - if clock >= 0; switch to threadB
          - if thread B steps X clocks; then clock -= X * threadA.frequency
            - if clock <  0; switch to threadA
      
          But there are also system configurations where one processor has to
          synchronize with more than one other processor. Take the Genesis:
      
          - the 68K has to sync with the Z80 and PSG and YM2612 and VDP
          - the Z80 has to sync with the 68K and PSG and YM2612
          - the PSG has to sync with the 68K and Z80 and YM2612
      
          Now I could do this by having an int64 clock value for every
          association. But these clock values would have to be outside the
          individual Thread class objects, and we would have to update every
          relationship's clock value. So the 68K would have to update the Z80,
          PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
          per clock step event instead of one.
      
          As such, we have to account for both possibilities. The only way to
          do this is with a single time base. We do this like so:
      
          - setup: scalar = timeBase / frequency
          - step: clock += scalar * clocks
      
          Once per second, we look at every thread, find the smallest clock
          value. Then subtract that value from all threads. This prevents the
          clock counters from overflowing.
      
          Unfortunately, these oscillator values are psychotic, unpredictable,
          and often times repeating fractions. Even with a timeBase of
          1,000,000,000,000,000,000 (one attosecond); we get rounding errors
          every ~16,300 synchronizations. Specifically, this happens with a CPU
          running at 21477273hz (rounded) and SMP running at 24576000hz. That
          may be good enough for most emulators, but ... you know how I am.
      
          Plus, even at the attosecond level, we're really pushing against the
          limits of 64-bit integers. Given the reciprocal inverse, a frequency
          of 1Hz (which does exist in higan!) would have a scalar that consumes
          1/18th of the entire range of a uint64 on every single step. Yes, I
          could raise the frequency, and then step by that amount, I know. But
          I don't want to have weird gotchas like that in the scheduler core.
      
          Until I increase the accuracy to about 100 times greater than a
          yoctosecond, the rounding errors are too great. And since the only
          choice above 64-bit values is 128-bit values; we might as well use
          all the extra headroom. 2^-96 as a timebase gives me the ability to
          have both a 1Hz and 4GHz clock; and run them both for a full second;
          before an overflow event would occur.
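
      A few of the figures quoted above can be reproduced with integer
      arithmetic. This is a sketch under assumptions: `u128` relies on the
      GCC/Clang `unsigned __int128` extension as a stand-in for uint128_t,
      and every function name here is hypothetical rather than taken from
      higan.

```cpp
#include <cassert>
#include <cstdint>

using u128 = unsigned __int128;  // assumption: GCC/Clang extension

// Number of 1Hz steps that fit in a uint64 on an attosecond (10^18) time base.
constexpr uint64_t stepsUntilOverflow64() {
  return UINT64_MAX / 1'000'000'000'000'000'000ULL;
}

// 2^96 time units per second, as described in the text.
constexpr u128 timeBase() { return u128(1) << 96; }

// How many 2^-96 second units fit in one yoctosecond (10^-24 s), rounded down.
constexpr uint64_t unitsPerYoctosecond() {
  u128 yoctosecondsPerSecond = 1;
  for(int n = 0; n < 24; n++) yoctosecondsPerSecond *= 10;
  return uint64_t(timeBase() / yoctosecondsPerSecond);
}

// Per-clock increment for an oscillator; the integer division truncates,
// and that truncation is the source of the rounding error.
inline u128 scalar(uint64_t frequency) {
  assert(frequency > 0);
  return timeBase() / frequency;
}

// Time units lost to truncation over one emulated second at this frequency.
inline uint64_t truncationPerSecond(uint64_t frequency) {
  return uint64_t(timeBase() - scalar(frequency) * frequency);
}
```

      `stepsUntilOverflow64()` gives 18, in line with the "1/18th" remark, and
      `unitsPerYoctosecond()` gives 79228. The truncation per second is always
      smaller than the frequency itself, so a finer time base shrinks the
      error relative to one second.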
      
      Another hastebin includes demonstration code:
      
          #include <libco/libco.h>
      
          #include <nall/nall.hpp>
          using namespace nall;
      
          //
      
          cothread_t mainThread = nullptr;
          const uint iterations = 100'000'000;
          const uint cpuFreq = 21477272.727272 + 0.5;
          const uint smpFreq = 24576000.000000 + 0.5;
          const uint cpuStep = 4;
          const uint smpStep = 5;
      
          //
      
          struct ThreadA {
            cothread_t handle = nullptr;
            uint64 frequency = 0;
            int64 clock = 0;
      
            auto create(auto (*entrypoint)() -> void, uint frequency) {
              this->handle = co_create(65536, entrypoint);
              this->frequency = frequency;
              this->clock = 0;
            }
          };
      
          struct CPUA : ThreadA {
            static auto Enter() -> void;
            auto main() -> void;
            CPUA() { create(&CPUA::Enter, cpuFreq); }
          } cpuA;
      
          struct SMPA : ThreadA {
            static auto Enter() -> void;
            auto main() -> void;
            SMPA() { create(&SMPA::Enter, smpFreq); }
          } smpA;
      
          uint8 queueA[iterations];
          uint offsetA;
          cothread_t resumeA = cpuA.handle;
      
          auto EnterA() -> void {
            offsetA = 0;
            co_switch(resumeA);
          }
      
          auto QueueA(uint value) -> void {
            queueA[offsetA++] = value;
            if(offsetA >= iterations) {
              resumeA = co_active();
              co_switch(mainThread);
            }
          }
      
          auto CPUA::Enter() -> void { while(true) cpuA.main(); }
      
          auto CPUA::main() -> void {
            QueueA(1);
            smpA.clock -= cpuStep * smpA.frequency;
            if(smpA.clock < 0) co_switch(smpA.handle);
          }
      
          auto SMPA::Enter() -> void { while(true) smpA.main(); }
      
          auto SMPA::main() -> void {
            QueueA(2);
            smpA.clock += smpStep * cpuA.frequency;
            if(smpA.clock >= 0) co_switch(cpuA.handle);
          }
      
          //
      
          struct ThreadB {
            cothread_t handle = nullptr;
            uint128_t scalar = 0;
            uint128_t clock = 0;
      
            auto print128(uint128_t value) {
              string s;
              while(value) {
                s.append((char)('0' + value % 10));
                value /= 10;
              }
              s.reverse();
              print(s, "\n");
            }
      
            //femtosecond (10^15) =    16306
            //attosecond  (10^18) =   688838
            //zeptosecond (10^21) = 13712691
            //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
            //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
      
            auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
              this->handle = co_create(65536, entrypoint);
      
              uint128_t unitOfTime = 1;
            //for(uint n : range(29)) unitOfTime *= 10;
              unitOfTime <<= 96;  //2^96 time units ...
      
              this->scalar = unitOfTime / frequency;
              print128(this->scalar);
              this->clock = 0;
            }
      
            auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
            auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
          };
      
          struct CPUB : ThreadB {
            static auto Enter() -> void;
            auto main() -> void;
            CPUB() { create(&CPUB::Enter, cpuFreq); }
          } cpuB;
      
          struct SMPB : ThreadB {
            static auto Enter() -> void;
            auto main() -> void;
            SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
          } smpB;
      
          auto correct() -> void {
            auto minimum = min(cpuB.clock, smpB.clock);
            cpuB.clock -= minimum;
            smpB.clock -= minimum;
          }
      
          uint8 queueB[iterations];
          uint offsetB;
          cothread_t resumeB = cpuB.handle;
      
          auto EnterB() -> void {
            correct();
            offsetB = 0;
            co_switch(resumeB);
          }
      
          auto QueueB(uint value) -> void {
            queueB[offsetB++] = value;
            if(offsetB >= iterations) {
              resumeB = co_active();
              co_switch(mainThread);
            }
          }
      
          auto CPUB::Enter() -> void { while(true) cpuB.main(); }
      
          auto CPUB::main() -> void {
            QueueB(1);
            step(cpuStep);
            synchronize(smpB);
          }
      
          auto SMPB::Enter() -> void { while(true) smpB.main(); }
      
          auto SMPB::main() -> void {
            QueueB(2);
            step(smpStep);
            synchronize(cpuB);
          }
      
          //
      
          #include <nall/main.hpp>
          auto nall::main(string_vector) -> void {
            mainThread = co_active();
      
            uint masterCounter = 0;
            while(true) {
              print(masterCounter++, " ...\n");
      
              auto A = clock();
              EnterA();
              auto B = clock();
              print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
      
              auto C = clock();
              EnterB();
              auto D = clock();
              print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
      
              for(uint n : range(iterations)) {
                if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
              }
            }
          }
      
      ...and that's everything.]
  11. 01 Jul, 2016 1 commit
    • Update to v099r14 release. · 82293c95
      Screwtape authored
      byuu says:
      
      Changelog:
      - (u)int(max,ptr) abbreviations removed; use _t suffix now [didn't feel
        like they were contributing enough to be worth it]
      - cleaned up nall::integer,natural,real functionality
        - toInteger, toNatural, toReal for parsing strings to numbers
        - fromInteger, fromNatural, fromReal for creating strings from numbers
        - (string,Markup::Node,SQL-based-classes)::(integer,natural,real)
          left unchanged
        - template<typename T> numeral(T value, long padding, char padchar)
          -> string for print() formatting
          - deduces integer,natural,real based on T ... cast the value if you
            want to override
          - there still exists binary,octal,hex,pointer for explicit print()
            formatting
      - lstring -> string_vector [but using lstring = string_vector; is
        declared]
        - would be nice to remove the using lstring eventually ... but that'd
          probably require 10,000 lines of changes >_>
      - format -> string_format [no using here; format was too ambiguous]
      - using integer = Integer<sizeof(int)*8>; and using natural =
        Natural<sizeof(uint)*8>; declared
        - for consistency with boolean. These three are meant for creating
          zero-initialized values implicitly (various uses)
      - R65816::io() -> idle() and SPC700::io() -> idle() [more clear; frees
        up struct IO {} io; naming]
      - SFC CPU, PPU, SMP use struct IO {} io; over struct (Status,Registers) {}
        (status,registers); now
        - still some CPU::Status status values ... they didn't really fit into
          IO functionality ... will have to think about this more
      - SFC CPU, PPU, SMP now use step() exclusively instead of addClocks()
        calling into step()
      - SFC CPU joypad1_bits, joypad2_bits were unused; killed them
      - SFC PPU CGRAM moved into PPU::Screen; since nothing else uses it
      - SFC PPU OAM moved into PPU::Object; since nothing else uses it
        - the raw uint8[544] array is gone. OAM::read() constructs values from
          the OAM::Object[512] table now
        - this avoids having to determine how we want to sub-divide the two
          OAM memory sections
        - this also eliminates the OAM::synchronize() functionality
      - probably more I'm forgetting
      
      The FPS fluctuations are driving me insane. This WIP went from 128fps to
      137fps. Settled on 133.5fps for the final build. But nothing I changed
      should have affected performance at all. This level of fluctuation makes
      it damn near impossible to know whether I'm speeding things up or slowing
      things down with changes.
  12. 28 Jun, 2016 1 commit
    • Update to v099r12 release. · 7a68059f
      Screwtape authored
      byuu says:
      
      Changelog:
      - fixed FC AxROM / VRC7 regression
      - BitField split to BooleanBitField/NaturalBitField (in preparation
        for IntegerBitField)
      - BitFieldReference removed
      - GB CPU cleaned up
      - GB Cartridge + Mappers cleaned up
      - SFC CGRAM is now emulated as uint15[256] instead of uint[512]
      - sfc/ppu/memory.cpp no longer needed; removed
      - purged SFC Debugger hooks for now (some of the operator[] calls were
        bypassing them anyway)
      
      Unfortunately, for reasons that defy all semblance of logic, the CGRAM
      change caused a slight speed hit. As have the last few changes. We're
      now down to around 129.5fps compared to 123.fps for v099 and 134.5fps
      at our peak (v099r01-r02).
      
      I really like the style I came up with for the Game Boy mappers to settle
      the purpose(ROM,RAM) vs (rom,ram)Purpose naming convention. If I ever get
      around to redoing the NES mappers, that's likely the approach I'll take.
  13. 08 Jun, 2016 1 commit
    • Update to v098r19 release. · 50420e3d
      Screwtape authored
      byuu says:
      
      Changelog:
      - added nall/bit-field.hpp
      - updated all CPU cores (sans LR35902 due to some complexities) to use
        BitFields instead of bools
      - updated as many CPU cores as I could to use BitFields instead of union {
        struct { uint8_t ... }; }; pairs
      
      The speed changes are mostly a wash for this. In some instances,
      I noticed a ~2-3% speedup (eg SNES emulation), and in others a 2-3%
      slowdown (eg Famicom emulation.) It's within the margin of error, so
      it's safe to say it has no impact.
      
      This does give us a lot of new useful things, however:
      
      - no more manual reconstruction of flag values from lots of left shifts
        and ORs
      - no more manual deconstruction of flag values from lots of ANDs
      - ability to get completely free aliases to flag groups (eg GSU can
        provide alt2, alt1 and also alt (which is alt2,alt1 combined))
      - removes the need for the nasty order_lsbN macro hack (eventually will
        make higan 100% endian independent)
      - saves us from insane compilers that try and do nasty things with
        alignment on union-structs
      - saves us from insane compilers that try to store bit-field bits in
        reverse order
      - will allow some really novel new use cases (I'm planning an
        instant-decode ARM opcode function, for instance.)
      - reduces code size (we can serialize flag registers in one line instead
        of one for each flag)
      
      However, I probably won't use it for super critical code that's constantly
      reading out register values (eg PPU MMIO registers.) I think there we
      would end up with a performance penalty.
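
      The flag-group aliasing described above can be sketched roughly as
      follows. This is only an illustration, not the real nall/bit-field.hpp:
      the names BitRange and Flags and the bit positions chosen for alt1 and
      alt2 are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; NOT the actual nall/bit-field.hpp implementation.
// A view onto bits [Lo, Hi] of a backing byte.
template<unsigned Lo, unsigned Hi = Lo>
struct BitRange {
  static_assert(Lo <= Hi && Hi < 8, "range must fit inside one byte");
  static constexpr uint8_t mask =
    static_cast<uint8_t>(((1u << (Hi - Lo + 1)) - 1u) << Lo);

  explicit BitRange(uint8_t& target) : data(target) {}

  // Reading extracts the field; no manual ANDs or shifts at the call site.
  operator unsigned() const { return (data & mask) >> Lo; }

  // Writing replaces only the bits covered by this range.
  BitRange& operator=(unsigned value) {
    data = static_cast<uint8_t>((data & ~mask) | ((value << Lo) & mask));
    return *this;
  }

private:
  uint8_t& data;
};

// Bit positions here are invented for the example.
struct Flags {
  uint8_t data = 0;
  BitRange<0>    carry() { return BitRange<0>{data}; }
  BitRange<1>    alt1()  { return BitRange<1>{data}; }
  BitRange<2>    alt2()  { return BitRange<2>{data}; }
  BitRange<1, 2> alt()   { return BitRange<1, 2>{data}; }  // alt2,alt1 combined
};

inline void bitRangeDemo() {
  Flags flags;
  flags.alt() = 3;                  // sets alt1 and alt2 in one assignment
  assert(flags.alt1() == 1 && flags.alt2() == 1);
  uint8_t saved = flags.data;       // the whole register serializes as one value
  assert(saved == 0x06);
}
```

      Because every view reads and writes the same backing byte, the alias
      costs no extra storage, and the bit order is fixed by the shifts rather
      than by the compiler's bit-field layout.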
      50420e3d
  14. 05 Jun, 2016 3 commits
    • Update to v098r17 release. · 9b452c9f
      Screwtape authored
      byuu says:
      
      Changelog:
      - fixed Super Game Boy regression from v096r04 with bottom tile row
        flickering
      - fixed GB STAT IRQ regression from previous WIP
        - Altered Space is now playable
        - GBVideoPlayer isn't; but nobody seems to know exactly what weird
          hardware quirk that one relies on to work
      - ~3-4% speed improvement in SuperFX games by eliminating function<>
        callback on register assignments
        - most noticeable in Doom in-game; least noticeable on Yoshi's Island
          title screen (darn)
      - finished GSU core and SuperFX coprocessor code cleanups
      - did some more work cleaning up the LR35902 core and GB CPU code
      
      Just a fair warning: don't get your hopes up on these GB
      fixes. Cliffhanger now hangs completely (har har), and none of the
      other bugs are fixed. We pretty much did all this work just for Altered
      Space. So, I hope you like playing Altered Space.
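
      As a rough illustration of the SuperFX register change mentioned in
      the changelog, the sketch below contrasts a register that invokes a
      callback on every write with one that only records that it was
      written. Both types are hypothetical, std::function stands in for
      nall's function<>, and the actual GSU code may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical "before": every assignment pays for an indirect call
// through a type-erased callback.
struct CallbackRegister {
  uint16_t data = 0;
  std::function<void(uint16_t)> modified;

  CallbackRegister& operator=(uint16_t value) {
    data = value;
    if(modified) modified(data);
    return *this;
  }
};

// Hypothetical "after": the write only sets a flag, and the caller
// checks the flag at the one place where the side effect matters.
struct FlaggedRegister {
  uint16_t data = 0;
  bool modified = false;

  FlaggedRegister& operator=(uint16_t value) {
    data = value;
    modified = true;
    return *this;
  }
};

inline void registerDemo() {
  int calls = 0;
  CallbackRegister before;
  before.modified = [&](uint16_t) { calls++; };
  before = 0x1234;
  assert(calls == 1);

  FlaggedRegister after;
  after = 0x1234;
  if(after.modified) {
    after.modified = false;  // the side effect is handled inline here
  }
  assert(after.data == 0x1234 && !after.modified);
}
```

      The flag version trades a little bookkeeping at the call site for
      assignments that the compiler can inline completely.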
      9b452c9f
    • Update to v098r16 release. · 3681961c
      Screwtape authored
      byuu says:
      
      Changelog:
      - GNUmakefile: reverted $(call unique,) to $(strip)
      - processor/r6502: removed templates; reduces object size from 146.5kb
        to 107.6kb
      - processor/lr35902: removed templates; reduces object size from 386.2kb
        to 197.4kb
      - processor/spc700: merged op macros for switch table declarations
      - sfc/coprocessor/sa1: partial cleanups; flattened directory structure
      - sfc/coprocessor/superfx: partial cleanups; flattened directory structure
      - sfc/coprocessor/icd2: flattened directory structure
      - gb/ppu: changed behavior of STAT IRQs
      
      Major caveat! The GB/GBC STAT IRQ changes have a major bug in them somewhere
      that's seriously breaking most games. I'm pushing the WIP anyway, because
      I believe the changes to be mostly correct. I'd like to get more people
      looking at these changes, and also try more heavy-handed hacking and
      diff comparison logging between the previous WIP and this one.
      3681961c
    • Update to v098r15 release. · 20ac95ee
      Screwtape authored
      byuu says:
      
      Changelog:
      - removed template usage from processor/spc700; cleaned up many function
        names and the switch table
        - object size: 176.8kb => 127.3kb
        - source code size: 43.5kb => 37.0kb
      - fixed processor/r65816 BRK/COP vector regression [hex_usr]
      - corrected HuC3 unmapped RAM read value; fixes Robopon [endrift]
      - cosmetic: simplified the butterworth constant calculation
        [Wolfram|Alpha]
      
      The SPC700 core changes took forever, about three hours of work.
      
      Only the LR35902 and R6502 still need their template functions
      removed. The point of this is that it doesn't cause any speed penalty
      to do so, and it results in smaller binary sizes and faster compilation
      times.
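
      A minimal sketch of the kind of change involved, assuming the
      templated opcode handlers took the ALU operation as a template
      parameter. The Core type and its member names are invented for the
      example and are not taken from the SPC700, LR35902 or R6502 sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU core, for illustration only.
struct Core {
  uint8_t a = 0;

  using Op = uint8_t (Core::*)(uint8_t, uint8_t);

  uint8_t opOr (uint8_t x, uint8_t y) { return x | y; }
  uint8_t opAnd(uint8_t x, uint8_t y) { return x & y; }

  // Template style: the compiler emits a separate copy of this function
  // for every operation it is instantiated with.
  template<Op op> void immediateT(uint8_t operand) {
    a = (this->*op)(a, operand);
  }

  // Runtime style: one copy in the object file; the operation is passed
  // as an ordinary argument from the opcode switch table.
  void immediate(Op op, uint8_t operand) {
    a = (this->*op)(a, operand);
  }
};

inline void coreDemo() {
  Core core;
  core.a = 0xf0;
  core.immediateT<&Core::opOr>(0x0f);     // template instantiation
  assert(core.a == 0xff);
  core.immediate(&Core::opAnd, 0x3c);     // same logic, no new instantiation
  assert(core.a == 0x3c);
}
```

      With dozens of addressing modes and operations, the template style
      multiplies the number of generated functions, which is consistent with
      the object size reductions reported above.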
      20ac95ee
  15. 16 Feb, 2016 1 commit
    • Update to 20160215 release. · ef65bb86
      Screwtape authored
      byuu says:
      
      Got it. Wow, that didn't hurt nearly as much as I thought it was going
      to.
      
      Dropped from 127.5fps to 123.5fps to use Natural/Integer for
      (u)int(8,16,32,64).
      
      That's totally worth the cost.
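
      For readers unfamiliar with the idea, here is a minimal sketch of
      arbitrary-width integer wrappers in the spirit of Natural/Integer.
      It is an assumption about the general technique, not nall's actual
      classes: the members, the operator set and the clipping strategy are
      all chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of an unsigned integer that is exactly Bits wide.
template<unsigned Bits>
struct Natural {
  static_assert(Bits >= 1 && Bits <= 64, "width must be 1..64 bits");
  static constexpr uint64_t mask = ~0ull >> (64 - Bits);

  Natural(uint64_t value = 0) : data(value & mask) {}
  operator uint64_t() const { return data; }

  // Every mutation re-applies the mask, so the value wraps at 2^Bits.
  Natural& operator+=(uint64_t value) {
    data = (data + value) & mask;
    return *this;
  }

private:
  uint64_t data;
};

// Hypothetical sketch of a signed integer that is exactly Bits wide.
template<unsigned Bits>
struct Integer {
  static_assert(Bits >= 1 && Bits <= 64, "width must be 1..64 bits");

  Integer(int64_t value = 0) : data(clip(value)) {}
  operator int64_t() const { return data; }

private:
  // Keep the low Bits bits, then sign-extend from bit (Bits - 1).
  static int64_t clip(int64_t value) {
    const uint64_t mask = ~0ull >> (64 - Bits);
    const uint64_t sign = 1ull << (Bits - 1);
    const uint64_t low = static_cast<uint64_t>(value) & mask;
    return static_cast<int64_t>((low ^ sign) - sign);
  }

  int64_t data;
};

inline void naturalDemo() {
  Natural<8> counter = 255;
  counter += 1;                 // wraps like an 8-bit register
  assert(counter == 0);
  Integer<8> offset = 200;      // 200 does not fit in a signed 8-bit value
  assert(offset == -56);
}
```

      The extra masking on each operation is the likely source of the small
      frame rate cost mentioned above; the benefit is that odd widths such
      as Natural<4> or Natural<24> behave like the hardware registers they
      model.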
      ef65bb86
  16. 02 Feb, 2016 1 commit
    • Update to v097r07 release. · ad51f147
      Screwtape authored
      byuu says:
      
      26 hours in, 173 instructions implemented. Although the four segment
      prefix opcodes don't actually do anything yet. There are fewer than 256
      actual instructions on the 80186, not sure of the exact count.
      
      Gunpey gets around 8,200 instructions in before hitting an unsupported
      opcode (loop). Riviera goes off the rails on a retf and ends up
      executing an endless stream of bad opcodes in RAM =( Both games hammer
      the living shit out of the in/out ports pretty much immediately.
      ad51f147
  17. 30 Dec, 2015 1 commit
    • Update to v096r01 release. · 47d4bd4d
      Screwtape authored
      byuu says:
      
      Changelog:
      
      - restructured the project and removed a whole bunch of old/dead
        directives from higan/GNUmakefile
      - huge amounts of work on hiro/cocoa (compiles but ~70% of the
        functionality is commented out)
      - fixed a masking error in my ARM CPU disassembler [Lioncash]
      - SFC: decided to change board cic=(411,413) back to board
        region=(ntsc,pal) ... the former was too obtuse
      
      If you rename Boolean (it's a problem with an include from ruby, not
      from hiro) and disable all the ruby drivers, you can compile an
      OS X binary, but obviously it's not going to do anything.
      
      It's a boring WIP, I just wanted to push out the project structure
      change now at the start of this WIP cycle.
      47d4bd4d
  18. 18 Aug, 2013 1 commit
    • Update to v093 release. · 4e2eb238
      Screwtape authored
      byuu says:
      
      Changelog:
      - added Cocoa target: higan can now be compiled for OS X Lion
        [Cydrak, byuu]
      - SNES/accuracy profile hires color blending improvements - fixes
        Marvelous text [AWJ]
      - fixed a slight bug in SNES/SA-1 VBR support caused by a typo
      - added support for multi-pass shaders that can load external textures
        (requires OpenGL 3.2+)
      - added game library path (used by ananke->Import Game) to
        Settings->Advanced
      - system profiles, shaders and cheats database can be stored in "all
        users" shared folders now (eg /usr/share on Linux)
      - all configuration files are in BML format now, instead of XML (much
        easier to read and edit this way)
      - main window supports drag-and-drop of game folders (but not game files
        / ZIP archives)
      - audio buffer clears when entering a modal loop on Windows (prevents
        audio repetition with DirectSound driver)
      - a substantial amount of code clean-up (probably the biggest
        refactoring to date)
      
      One highly desired target for this release was to default to the optimal
      drivers instead of the safest drivers, but because AMD drivers don't
      seem to like my OpenGL 3.2 driver, I've decided to postpone that. AMD
      has too big a market share. Hopefully with v093 officially released, we
      can get some public input on what AMD doesn't like.
      4e2eb238
  19. 05 May, 2013 1 commit
    • Update to v092r09 release. · 29ea5bd5
      Screwtape authored
      byuu says:
      
      This will be another massive diff from the previous version.
      
      All of higan was updated to use the new foo& bar syntax, and I also
      updated switch statements to be consistent as well (but not in the
      disassemblers, was starting to get an RSI just from what I already did.)
      
      phoenix/{windows, cocoa, qt} need to be updated to use "string foo"
      instead of "const string& foo", and after that, the major diffs should
      be finished.
      
      This archive is the first time I'm posting my copy-on-write,
      size+capacity nall::string class, so any feedback on that is welcome as
      well.
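
      To make the copy-on-write, size+capacity idea concrete, here is a
      minimal sketch. It is not nall::string: the class name CowString, its
      members and its growth policy are assumptions, and it ignores thread
      safety and most of a real string API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical copy-on-write string that tracks size and capacity.
class CowString {
public:
  CowString(const char* text = "")
  : size_(std::strlen(text)), capacity_(size_) {
    buffer_ = allocate(capacity_);
    std::memcpy(buffer_.get(), text, size_ + 1);
  }

  // The implicit copy constructor shares the buffer, so copying is O(1).

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  const char* data() const { return buffer_.get(); }

  // `text` must not point into this string's own buffer.
  void append(const char* text) {
    const std::size_t extra = std::strlen(text);
    reserve(size_ + extra);  // detaches from other owners if shared
    std::memcpy(buffer_.get() + size_, text, extra + 1);
    size_ += extra;
  }

private:
  static std::shared_ptr<char> allocate(std::size_t capacity) {
    return std::shared_ptr<char>(new char[capacity + 1],
                                 std::default_delete<char[]>());
  }

  // Ensure this object solely owns a buffer with room for `wanted` chars.
  void reserve(std::size_t wanted) {
    if(buffer_.use_count() == 1 && wanted <= capacity_) return;
    const std::size_t grown = wanted > capacity_ ? wanted * 2 : capacity_;
    std::shared_ptr<char> fresh = allocate(grown);
    std::memcpy(fresh.get(), buffer_.get(), size_ + 1);
    buffer_ = fresh;
    capacity_ = grown;
  }

  std::size_t size_;
  std::size_t capacity_;
  std::shared_ptr<char> buffer_;
};

inline void cowDemo() {
  CowString a("abc");
  CowString b = a;                  // shares a's buffer
  assert(a.data() == b.data());
  b.append("def");                  // first write triggers the private copy
  assert(a.data() != b.data());
  assert(std::strcmp(a.data(), "abc") == 0);
  assert(std::strcmp(b.data(), "abcdef") == 0);
}
```

      Copy-on-write is what makes passing "string foo" by value cheap: the
      copy only shares a pointer until one side actually modifies the text.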
      29ea5bd5
  20. 26 Dec, 2012 1 commit
    • Update to higan v091 release. · 94b2538a
      Screwtape authored
      byuu says:
      
      Basically just a project rename, with s/bsnes/higan and the new icon
      from lowkee added in.
      
      It won't compile on Windows because I forgot to update the resource.rc
      file, and a path transform command isn't working on Windows.
      It was really just meant as a starting point, so that v091 WIPs can flow
      starting from .00 with the new name (it overshadows bsnes v091, so
      publicly speaking this "shouldn't exist" and will probably be deleted
      from Google Code when v092 is ready.)
      94b2538a