Category: Arduino Signal Processing

(8)Arduino IDE+Due/Uno->Audio: What next?

Given standard AM transmission at 10kHz and FM at 15kHz, the 7.5kHz result produced is clearly not enough.  Further optimization would require specialized data coding schemes that I have never worked with before.

One possible approach, and probably the one which requires the least tweaking of the current setup, would be something along the lines of DPSK (Differential Phase Shift Keying).  For analog signals, this way of encoding information transmits the change in voltage: a phase shift of the carrier indicates how much the output voltage has changed since the last sample.

However, because the data I am attempting to transmit is digital, I will directly transmit the change in voltage as an integer.  Because each transmission can only carry a 3 bit change (up to 8 values), the range of deviation per step is small, so an equally high or even higher baud rate will be required.  A probable methodology is as follows.

  1. Transmit the starting data (full 10 bits) using the scheme provided before
  2. Transmit the change in integer in the next serial transmission.  One possible way to code the change will be in 1 byte, 4 bits for channel A and 4 bits for channel B.  In order to discern which 4 bits are for A/B, the MSB and the 5th bit (the leading bit of each 4 bit half) will mark the channel, 1 being A and 0 being B.  This leaves 3 bits per channel for the change itself
  3. Every X transmissions reset and transmit a full 10 bit set of data again
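The byte layout described in step 2 could be sketched as follows.  This is only a minimal illustration, not tested on the boards: the function names are hypothetical, and it assumes the 3 bit change is stored as a signed two's-complement value (-4 to 3), which the scheme above does not specify.

```cpp
#include <cstdint>

// Clamp a sample-to-sample change to the assumed 3 bit signed range [-4, 3].
inline int clampDelta(int d) { return d < -4 ? -4 : (d > 3 ? 3 : d); }

// Assumed byte layout:
//   bit 7     = 1 (channel A marker), bits 6..4 = change for A
//   bit 3     = 0 (channel B marker), bits 2..0 = change for B
inline uint8_t packDeltas(int dA, int dB) {
    const uint8_t a = static_cast<uint8_t>(clampDelta(dA)) & 0x07;
    const uint8_t b = static_cast<uint8_t>(clampDelta(dB)) & 0x07;
    return static_cast<uint8_t>(0x80 | (a << 4) | b);
}

// Sign-extend a 3 bit two's-complement field back to an int.
inline int signExtend3(uint8_t v) { return (v & 0x04) ? int(v) - 8 : int(v); }

inline int unpackDeltaA(uint8_t byte) { return signExtend3((byte >> 4) & 0x07); }
inline int unpackDeltaB(uint8_t byte) { return signExtend3(byte & 0x07); }
```

The receiver would add each unpacked change to its last known 10 bit value, which is why a periodic full reset (step 3) is needed to recover from a lost or corrupted byte.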

If the receiver were also allowed to transmit data back to the transmitter, it would be possible to reset the transmission only when required.  This means a new full set of data would only need to be sent after the data has been disrupted.

Implementing the idea above will most likely allow the max discernible frequency of data to be around 10kHz and, if optimized well, maybe reach a desirable 15kHz.

(8)Arduino IDE+Due/Uno->Audio: Bit Manipulation

Following the revelation discussed in the previous post, this meant I had to figure out the maximum baud rate which can be used with the code required to process, send, and receive data.  After discussing with a few individuals I decided that direct bit manipulation of the memory allocated to variables would be the most efficient.

This is because when the compiler converts high level code into assembly code, bit operations such as AND and NOT can be done in a single instruction.  In contrast to direct bit manipulation, constructs such as if, while, and for require considerably more code to execute.

For example.  Turning R1=128 into R1=512 would only require one line of code.

ADD R1, R1, R2 where R2 is a bitmask.

However a simple for loop to add 128 to itself until 512 is reached could look something like the following.

for(i=0; i<n ; i++){
x=x+x;
}
ADD R1, R1, R1 ;add R1 to itself
ADD R2, R2, -1 ;decrement counter
BRz R2 #R2 contains a counter
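For comparison, here is a small sketch of the same idea in C++, using a left shift as the bit operation.  The function names are hypothetical, and an optimizing compiler may well reduce both versions to similar machine code, so this only illustrates the source-level difference.

```cpp
#include <cstdint>

// Doubling x n times with a loop, as in the for loop example above.
inline uint32_t scaleByLoop(uint32_t x, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        x = x + x;
    }
    return x;
}

// The same result with a single left shift (valid for n < 32).
inline uint32_t scaleByShift(uint32_t x, unsigned n) {
    return x << n;
}
```

With x = 128 and n = 2, both functions return 512.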

But how do I encode the data I read into a form in which it can be easily transmitted and understood?  Recalling that the data read from an Analog In port on an Arduino measures a value from 0~1023 (10 bit) and that serial transmission transmits 1 byte words, I can turn a 10 bit word into two 5 bit words, each with a 3 bit header which marks where the data came from.

bool interruptCtr = true;

void setup(){
Serial.begin(5200000);
Serial2.begin(5200000);
}

unsigned int in_A=0;
unsigned int in_B=0;
uint8_t A_1=0;
uint8_t A_2=0;
uint8_t B_1=0;
uint8_t B_2=0;

void loop(){
if( interruptCtr++ >= 1 ){//check whether to interrupt
interruptCtr = 1;//update interrupt value

//parse 0-1023 (10 bit) into 2 five bit words
//serial transmit transmits with 8 bit words
//split each 10 bit word into 2 bytes with 3 bit headers
//if from chA mark in[0-2] with 1
//if from chB mark in[0-2] with 0
//use integer comparison to determine the ch.
//if in>=2^7+2^6 (192) then it must be from chA
//chA has 111xxxxx or 110xxxxx in[2]=1 first 5 bit in[2]=0 second 5 bit
//chB has 001xxxxx or 000xxxxx

in_A=analogRead(7);
//save 1st five bits in A1/B1
A_1=(in_A>>5)|224;//xxxxxyyyyy->00000xxxxx
//00000xxxxx|0011100000->00111xxxxx
Serial2.write(A_1);

A_2=(in_A&31)|192;//xxxxxyyyyy&0000011111->00000yyyyy
//00000yyyyy|0011000000->00110yyyyy
Serial2.write(A_2);

in_B=analogRead(8);

B_1=(in_B>>5)|32;//xxxxxyyyyy->00000xxxxx
//00000xxxxx|0000100000->00001xxxxx
Serial2.write(B_1);

B_2=in_B&31;//xxxxxyyyyy&0000011111->00000yyyyy
//no header bits are set, so the byte stays 00000yyyyy
Serial2.write(B_2);
}
}

 

The code above for the transmitter reads from two ports, splits each port’s data into two 5 bit words, and marks the header with 111 for the first part of port A, 110 for the second part of port A, 001 for the first part of port B, and 000 for the second part of port B.
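To check the header scheme without a board attached, the packing can be pulled into a plain function.  This is a minimal sketch under that assumption; `Channel`, `Frame`, and `encodeSample` are hypothetical names that do not appear in the sketch above.

```cpp
#include <cstdint>

enum Channel { CH_A, CH_B };

// The two bytes sent for one 10 bit sample.
struct Frame {
    uint8_t first;   // header + upper 5 bits
    uint8_t second;  // header + lower 5 bits
};

// Split a 10 bit sample into two 5 bit words and attach the 3 bit headers:
// chA -> 111xxxxx / 110yyyyy, chB -> 001xxxxx / 000yyyyy.
inline Frame encodeSample(uint16_t sample, Channel ch) {
    const uint8_t firstHeader  = (ch == CH_A) ? 0xE0 : 0x20;
    const uint8_t secondHeader = (ch == CH_A) ? 0xC0 : 0x00;
    sample &= 0x03FF;  // keep only the 10 data bits
    Frame f;
    f.first  = static_cast<uint8_t>((sample >> 5) | firstHeader);
    f.second = static_cast<uint8_t>((sample & 0x1F) | secondHeader);
    return f;
}
```

For example, a full-scale reading of 1023 on channel A becomes the bytes 0xFF and 0xDF, while 677 on channel B becomes 0x35 and 0x05.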

One thing to note is the optimization done by condensing all bit operations into one line of code.  The compiler will try to do everything written on one line in as little assembly as possible.  If the same operations were split across different lines (marked with ;) there would be a much longer list of assembly instructions required.

The second thing to note is the variable types used.  For most programs that do not require heavy optimization, a simple “int” (which allows both + and – integers and is 32 bits wide on the Due) is fine.  However, due to the inefficient manner of the Arduino IDE, using something such as uint8_t (unsigned 8 bit integer) will save memory and require less processing.

For the receiver a similar thing was done to optimize the code.

void setup() {
Serial.begin(5200000);
Serial1.begin(5200000);
analogWriteResolution(10);//set resolution to 10 bit (0-1023)
}

//used to mark the difference between first and second parts
unsigned int A_1=0;
unsigned int A_2=0;
unsigned int B_1=0;
unsigned int B_2=0;

//flags are used to mark whether a certain part was received
bool flagA1=false;
bool flagA2=false;
bool flagB1=false;
bool flagB2=false;

bool two=false;//mark whether both parts are complete
bool two1=false;

bool switch_flag=false;

unsigned int serial_in=0;

void loop() {
//determine whether its chA or chB
//chA has 111xxxxx or 110xxxxx
//chB has 001xxxxx or 000xxxxx
//11100000=224 11000000=192
flagA1=flagA2=flagB1=flagB2=two=two1=false;

//check A/B and 1st 2nd byte conditions
while(two==false && Serial1.available()){
serial_in=Serial1.read();

switch(serial_in & 224){
case 224: //111xxxxx
flagA1=true;
A_1=(serial_in&31)<<5;
break;
case 192: //110xxxxx
flagA2=true;
A_1=A_1|(serial_in&31);
break;
case 32: //001xxxxx
flagB1=true;
B_1=(serial_in&31)<<5;
break;
case 0: //000xxxxx
flagB2=true;
B_1=B_1|(serial_in&31);
break;
}
two=flagA1 && flagA2 && flagB1 && flagB2;

}
analogWrite(DAC1, A_1);
analogWrite(DAC0, B_1);

}

The code here uses flags to mark whether each of the first and second parts were received and once all 4 parts are received both pieces of data will be written to both DAC pins of the Rx Due at the same time.
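The flag logic can also be exercised off-board by feeding bytes into a small decoder object.  This is a sketch only: `FrameDecoder` and `feed` are hypothetical names, and like the receiver above it assumes the first part of a channel arrives before the second part.

```cpp
#include <cstdint>

struct FrameDecoder {
    uint16_t chA = 0;
    uint16_t chB = 0;
    bool gotA1 = false, gotA2 = false, gotB1 = false, gotB2 = false;

    // Feed one received byte.  Returns true once all four parts have arrived,
    // at which point chA and chB hold complete 10 bit samples.
    bool feed(uint8_t byte) {
        const uint8_t payload = byte & 0x1F;  // lower 5 bits carry data
        switch (byte & 0xE0) {                // upper 3 bits are the header
            case 0xE0: chA = static_cast<uint16_t>(payload << 5); gotA1 = true; break;
            case 0xC0: chA = static_cast<uint16_t>(chA | payload); gotA2 = true; break;
            case 0x20: chB = static_cast<uint16_t>(payload << 5); gotB1 = true; break;
            case 0x00: chB = static_cast<uint16_t>(chB | payload); gotB2 = true; break;
            default: break;                   // unknown header: ignore the byte
        }
        if (gotA1 && gotA2 && gotB1 && gotB2) {
            gotA1 = gotA2 = gotB1 = gotB2 = false;  // ready for the next set
            return true;
        }
        return false;
    }
};
```

On the Due, the point where `feed` returns true would correspond to the moment both values are written to the DAC pins.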

Originally I had planned to use if and else chains such as the following to determine whether each part had been received.

while(!(flagA1 && flagA2 && flagB1 && flagB2) && Serial1.available()){
serial_in=Serial1.read();
if(serial_in>=224){ //111xxxxx
flagA1=true;
A_1=serial_in;
A_1=A_1&31;
//analogWrite(DAC1, A_1);
A_1=A_1<<5;

}
else if((serial_in>=192) && (serial_in<224)){ //110xxxxx
flagA2=true;
A_2=serial_in;
A_2=A_2&31;

}
else if((serial_in>=32) && (serial_in<64)){ //001xxxxx
flagB1=true;
B_1=serial_in;
B_1=(B_1&31)<<5;
}
else if((serial_in>=0) && (serial_in<32)){ //000xxxxx
flagB2=true;
B_2=serial_in;
B_2=B_2&31;
}
}

However, upon further research on optimization, I found that the switch method is much more efficient than a chain of “if and else” conditions.  This is because the compiler can convert a switch statement into a jump (JMP), often through a jump table.

This means that in fewer instructions the processor can decide which block of code, stored somewhere else in memory, to jump to.  Given that each block does the same work as whatever was inside the if else braces, the switch statement should be more efficient.

Even with all the developments and optimizations, the maximum baud rate the processor can handle is 5.25 million bits per second, which with the code provided above yields a measly 7.5kHz.  That is lackluster to say the least.

(7)Arduino IDE+Due/Uno->Audio: Porting? Or new idea?

Upon completing my ventures attempting to use Simulink and the Arduino together with the hardware support package, I am now trying to port this to the Arduino IDE in C code in order to create something with actual application in the real world.  To begin with I aimed to create a transmitter and receiver to send audio data with LEDs.

However, this proved to be a much larger hurdle than I first expected.  In order to “emulate” sampling in code, one would need to use timer interrupts.  This meant directly manipulating the registers inside the MPU.  The Uno has a lot of documentation on how to do this, with sites such as the following.

Atmega168 Timer interrupts

The Due on the other hand was not as well documented.  Thus I decided to try and implement something with the Uno as a test run before I learned to properly write AVR code.  After reading up on the Arduino forums I discovered the following scheme to properly do timer interrupts on the Uno.

//http://www.protostack.com/blog/2010/09/timer-interrupts-on-an-atmega168/
//boolean trigger
volatile boolean toggle0 = 0;//shared with the ISR, so it is declared volatile

void setup() {
// put your setup code here, to run once:
//observation pin 8
pinMode(8, OUTPUT);

//stop interrupts for safety
cli();

//setup interrupt vectors for 15khz
TCCR0A=0;//set timer registers to all 0
TCCR0B=0;//
TCNT0=0;//set timer count to 0

//set the frequency for ~15khz increments
//compare match register = [ 16,000,000Hz/ (prescaler * desired interrupt frequency) ] - 1
//OCR0A=7;
OCR0A=250;//when the counter will reset

//set prescaler
//CS02 CS01 CS00
// 1 0 1 (1024 prescaler, which is what the two lines below select; 0 1 1 would select the 64 prescaler)
TCCR0B|=(1<<CS02);
TCCR0B|=(1<<CS00);

//MODE2 for wgm 02, 01, 00 will enable ctc (clear timer on compare)
//WGM02 WGM 01 WGM 00
//0 1 0
TCCR0A|=(1<<WGM01);

//enable timer/counter interrupt mask register
TIMSK0|=(1<<OCIE0A);//OCIE0A enables the compare match A interrupt

//reenable interrupts
sei();
}

ISR(TIMER0_COMPA_vect){//timer0 compare match interrupt toggles pin 8
//generates pulse wave at half the interrupt frequency, e.g. 2khz/2=1kHz (takes two cycles for full wave- toggle high then toggle low)
if (toggle0){
digitalWrite(8,HIGH);
toggle0 = 0;
}
else{
digitalWrite(8,LOW);
toggle0 = 1;
}
}

void loop() {
// put your main code here, to run repeatedly:
//if toggle0==1 then take a sample from A0
//serial send and receive by yourself

}

The code has been commented for ease of understanding.  However I was not able to transmit a square wave of anything higher than around 2kHz on the Uno and thus had to keep looking for working code for the Due.  After doing so, I stumbled on the following webpage which details how timer interrupts can be done on the Due.
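The compare match formula quoted in the sketch's comments can be checked with a few lines of arithmetic.  The helpers below are hypothetical and only assume the 16 MHz clock stated in that comment and CTC mode, where the counter runs from 0 up to the compare value.

```cpp
#include <cstdint>

// Compare match value = f_clk / (prescaler * f_interrupt) - 1 (integer division).
constexpr uint32_t compareMatch(uint32_t clockHz, uint32_t prescaler, uint32_t interruptHz) {
    return clockHz / (prescaler * interruptHz) - 1;
}

// Interrupt rate that a given compare value actually produces in CTC mode.
constexpr uint32_t interruptRate(uint32_t clockHz, uint32_t prescaler, uint32_t ocr) {
    return clockHz / (prescaler * (ocr + 1));
}

// Timer0 is an 8 bit timer, so the compare value has to fit in 0..255.
constexpr bool fitsTimer0(uint32_t ocr) { return ocr <= 255; }
```

Under these assumptions, a compare value of 250 with a 64 prescaler gives roughly 996 interrupts per second, and a pin toggled once per interrupt produces a square wave at half that rate.  A 15 kHz interrupt rate with the same prescaler would need a compare value of about 15.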

Timer Interrupts

Even after further testing using the same sort of square wave testing methodology I was only able to create a square wave/trigger that was around 5 kHz.  After that I concluded that in order to fully maximize the processing power of the Due, I would have to completely forget about timer interrupts.  This is because I need to take advantage of every single instruction that the MPU can process and even doing so I may not be able to transmit and receive data at the desired rate.

With this new scheme, it meant that I would be walking away from the idea of “sampling” data and transmitting data at a certain rate and just work with the max data transfer rate.

(6)Simulink+Arduino->Audio: Serial Receiver

Now that I know I can use serial transmission in Simulink with the Arduino without any hassle, I proceeded to use the same methodology as before to send data.  In retrospect, now that I know the Serial Transmit block actually sends uint8 values, there was no real reason to turn the 10 bit analogRead values into binary.  What I would have needed to do instead is normalize them to 8 bit unsigned values.  Regardless, below is the receiver.

Capture

The methodology has already been explained previously in part (3) except now the end is replaced with a Serial Transmit block.

Implementation (MATLAB R2015B)

https://mega.nz/#!DM0UEQjA

The receiver is what has changed.  The methodology is as follows

  1. Receive data/status from serial input
  2. Use the change from 0 to 1 of the status to trigger the start of the clock (to match the incoming data)
  3. Use the previous decomposition scheme to decipher data
  4. Scale output such that it runs between 0~4095

Capture.PNG

capture1

Inside the Subsystem: Enable uses the status as a trigger to generate a synchronized clock

The reason this works even with interruptions is that the receiver-generated clock stops along with the data, so it cannot drift out of sync by running on its own.  It only resets when the data stream resumes.

There is an unaddressed issue though.  This scheme does not help one identify the L/R channels, so the L/R data might be flipped after a reset.

Regardless, the highest possible sampling rate was around 250 Hz (a sample period of 1/250 sec), which was impractical for any sort of use.  In order to eliminate the overhead that MATLAB imposes upon the Arduino Due, my next task was to port this to the Arduino IDE in C code.

(5)Simulink+Arduino->Audio: Asynchronous Transmission

Upon the utter failure of the previous attempt, I decided to try and look into something that was “premade” and optimized more than anything I could have done with my own blocks.  This led me to explore asynchronous transmission.

Source

Asynchronous transmission allows a receiver to decode the transmitter’s data without needing the help of clock recovery or some sort of external clock line like I2C or SPI.  This made it a very attractive alternative to other transmission schemes as I would only need to use one LED for the transmitter.  The reason this can be done is that both sides agree on the baud rate (bits per second transmitted) in advance, which can be hard coded or chosen by a person.

While this is an attractive solution, by incorporating both start,  stop, and gap bits, the bandwidth required to transmit the same amount of data is much higher.  However, I was willing to make that trade off for the advantage of using only one data line.
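The cost of the extra framing bits can be estimated with a quick calculation.  The sketch below assumes the common 8N1 frame (1 start bit, 8 data bits, no parity, 1 stop bit), which is the Arduino serial default, and ignores any idle gap between frames; the function names are hypothetical.

```cpp
#include <cstdint>

// Bits on the wire for one frame: start bit + data bits + parity bits + stop bits.
constexpr uint32_t bitsPerFrame(uint32_t dataBits = 8, uint32_t parityBits = 0,
                                uint32_t stopBits = 1) {
    return 1 + dataBits + parityBits + stopBits;
}

// Upper bound on payload bytes per second at a given baud rate with 8N1 framing.
constexpr uint32_t payloadBytesPerSecond(uint32_t baud) {
    return baud / bitsPerFrame();
}
```

With 8N1 framing every byte costs 10 bit times, so at 9600 baud at most 960 bytes per second carry data, which is 20% of the line rate spent on framing.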

In addition to this one of the reasons I chose to use asynchronous transmission is because the Arduino already has a serial transmission library built in.  This also applied for Arduino’s Simulink library.  The blocks included are as follows.

The advantage of these two blocks is that I do not have to design my own scheme for the number of gap bits, nor do I have to define what a stop/start bit is.  Serial Transmit receives an 8 bit unsigned integer (0-255) and transmits it.  Serial Receive outputs the received data (0-1023) together with a status, which is high when data is received and 0 when the block fails to receive data.  One thing of note (and the thing that took me a while to learn to abuse) is that if your receiver is working correctly, it will only produce a high upon the first received piece of data.

This will serve as the trigger to activate the receiver allowing us to start and stop the receiver whenever we desire without any data loss.  This also accounts for when transmission randomly stops due to interference.  As a proof of concept I returned to my original design for the transmitter and receiver using the serial transmit and receive blocks.

(4)Simulink+Arduino->Audio: Increasing Efficiency and Stability

To improve efficiency and stability there were 2 things I needed to do.

  1. Send data in larger packets to conserve bandwidth
  2. Implement a header periodically to allow the two signals to be periodically synced

To implement part one first we would need to implement a system to create a data stream of … A0A1A2A3B0B1B2B3 and so on.  Furthermore, to implement the second part regarding the header we would need to create a data stream of H0H1H2H3A0A1A2A3B0B1B2B3.

However, to do this we need to consider the fact that the incoming data stream will be smaller than the amount of data to be output.  This requires rate transitions and registers to store the data.  The methodology is therefore as follows.

  1. Read data
  2. Convert A, B channel data into [4:1] vectors
  3. Push both to separate registers at the same time
  4. Pop A, B registers (such that 4 bits of each are released at a time to create A0A1A2A3B0B1B2B3)
  5. Attach the header (H0H1H2H3A0A1A2A3B0B1B2B3)
  6. Output data to receiver
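The framing in steps 4 and 5 can be sketched in code as well.  This is an illustration rather than the Simulink model itself: the header pattern `kHeader` is a made-up value, the function names are hypothetical, and it assumes bit 0 of each 4 bit value is sent first (so A0 is the least significant bit).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr uint8_t kHeader = 0xA;  // hypothetical 4 bit sync pattern

// Write the low 4 bits of value into out[offset..offset+3], bit 0 first.
inline void putNibble(std::array<uint8_t, 12>& out, std::size_t offset, uint8_t value) {
    for (std::size_t i = 0; i < 4; i++) {
        out[offset + i] = (value >> i) & 1;
    }
}

// Build one frame of the form H0H1H2H3 A0A1A2A3 B0B1B2B3.
inline std::array<uint8_t, 12> buildFrame(uint8_t a, uint8_t b) {
    std::array<uint8_t, 12> frame{};
    putNibble(frame, 0, kHeader);
    putNibble(frame, 4, a);
    putNibble(frame, 8, b);
    return frame;
}
```

Each element of the returned array is one bit to place on the data line, in transmission order.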

Steps 1 and 2 are the same as before.  However steps 3 and 4 were a bit of a pain to implement.

Capture

The push is done continuously by a clock and is essentially the “sampling” rate of the signal.  However the pop logic is more interesting.

To properly control when to pop several things are required.  1. A way to select a channel, 2. A way to decide when to pop.  A counter will be used to select a channel and then the logic that is used to pop is shown.

Finally after the data is properly popped a header is attached.

Capture.PNG

Here is the implementation.  Again it is done in MATLAB R2015B.

https://mega.nz/#!2IsVwICb

Originally, the idea was to implement a pilot signal and have the receiver decode the message and use a header to determine where the beginning of the signal lay.  This was a relatively good idea given that the boards themselves only had to be synced up every minute or so.

However, as I learned after discussing with a professor, while that should work in theory, generating an accurate clock (through clock recovery) based only on a trigger would be nearly impossible.

Thus the idea evolved and changed such that training data will be sent first to the receiver to train the clock and then begin transmitting data with a periodic header to help recalculate the deviation in the clock.  This would be implemented with the help of a phase locked loop.

The PLL block in Simulink outputs the frequency of the current input waveform and the deviation in the frequency over time.  Thus one can use these two values to adjust and tune the frequency to their liking.  Given this,  I created a variable frequency oscillator (square wave)  which would respond to the changes in frequency.  This was taken from a Matlab FAQ post and adapted such that it would automatically  ceil/floor given a certain threshold.

capture1

Implementation

https://mega.nz/#!vV0VXaRZ

However, this was ineffective, and approximating with thresholds did not work well.  Given a constant frequency of 1kHz with deviations from the random number generator of only 0.2Hz, the following result was produced.

Capture.PNG

In addition to this, the Simulink implementation of the PLL is utterly horrendous.  Running the PLL with any sort of input, even at a frequency of only 1kHz, did not run in real time.

This led me to abandon clock recovery and explore other transmission schemes, such as synchronous and asynchronous transmission, which will be discussed in the next post.

(3)Simulink+Arduino->Audio: Receiver

Capture.PNG

Above is an image of the receiver which was much more complicated to design than I originally thought.  The methodology is as follows

  1. Receive data through a data line, receive clock through another data line.
  2. Use received clock to trigger switches to sample data from data line
  3. Downsample data to remove interlaced zero values
  4. Rebuild [4:1] vector
  5. Input vector into Bit to Integer
  6. Normalize and output through Analog Output block

Since the image above is a bit convoluted, the following images will discuss each component of the system.

Firstly, separating data.

Capture.PNG

As shown above there are 2 input lines, data split between two 1 sample reset switches and the clock between two regular switches.  Again the 1 sample reset switches are reset/triggered on opposite edges to help separate each channel.

The following switch is used to help select between the data line and ground.  This is done so that the data comprised of A0B0A1B1A2B2 can be converted into two separate data lines of:

  1. A0 0 A1 0 A2 0
  2. 0 B0 0 B1 0 B2

Following this a delay is applied to the second channel in order to “sync” the two channels again.  I say sync, however in reality given a high enough sample rate delaying by 7 samples is essentially negligible.  This is only done so that the downsample by 2 works.

Downsampling will remove the extra zeros thus creating vectors of the form A0A1A2A3…. and so on.
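In code, the combined effect of the switches, the delay, and the downsample by 2 amounts to a deinterleave.  The sketch below is an assumption-laden equivalent, not a translation of the Simulink blocks; `Channels` and `deinterleave` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Channels {
    std::vector<uint8_t> a;
    std::vector<uint8_t> b;
};

// Split an interleaved stream A0 B0 A1 B1 ... into A0 A1 ... and B0 B1 ...
inline Channels deinterleave(const std::vector<uint8_t>& stream) {
    Channels out;
    for (std::size_t i = 0; i < stream.size(); i++) {
        if (i % 2 == 0) {
            out.a.push_back(stream[i]);  // even positions belong to channel A
        } else {
            out.b.push_back(stream[i]);  // odd positions belong to channel B
        }
    }
    return out;
}
```

Picking every other sample directly means no zero padding is created, so there is nothing to downsample afterwards.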

Secondly is the part of the system which converts the data from [1:1] to [4:1].

Capture.PNG

This is where the bulk of the processing was used and the reason why this model was so inefficient.  The 8 rate transitions (4 bits * 2 channels) and the 8 sample and holds really killed the processor.  The methodology is as follows.

  1. Given 1 cycle of data, over 4 samples the inputs are A0A1A2A3
  2. Using counters that start counting from 0 to 3 and each triggering on a different value respectively we obtain 4 data lines of the following form
    A0  0   0   0
    0   A1  0   0
    0   0   A2  0
    0   0   0   A3
  3. In order to choose between reading a zero and a value from the data line, we use the switch which only triggers when “Hit” is observed from the counter
  4. The sample and holds then hold each respective value so that all 4 data lines can be sampled simultaneously to obtain a vector of [A0; A1; A2; A3].  At the output of the mux the 4 data lines will appear as follows
    A0  A0  A0  A0
    0   A1  A1  A1
    0   0   A2  A2
    0   0   0   A3
  5. Following that using the same setup a counter and a switch is used to read the vector passing it through a ZOH and finally the bit to integer converter which converts this value back into an unsigned 4 bit integer
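The final bit to integer step could look like this in code.  It is a minimal sketch with a hypothetical function name, and it assumes A0 is the least significant bit, which the list above does not state.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Combine four held bits [A0, A1, A2, A3] into an unsigned 4 bit integer.
// Assumes bits[0] (A0) is the least significant bit.
inline uint8_t bitsToInteger(const std::array<uint8_t, 4>& bits) {
    uint8_t value = 0;
    for (std::size_t i = 0; i < 4; i++) {
        value = static_cast<uint8_t>(value | ((bits[i] & 1) << i));
    }
    return value;
}
```

Under that bit order, the vector [1; 0; 1; 0] converts to 5.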

Lastly we have the analog output.

Capture.PNG

Luckily we do not have to do the block diagram for the DAC.  The only thing done here is to scale the input, 0-15, to values of 0-4050.  Although this doesn’t increase the resolution of the output by any means, it allows one to use nearly the full output range of the Due’s DAC, which reads uint16 numbers.  Although it reads uint16 numbers, only 12 of the 16 bits are used, meaning that the max value is 4095.
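As a quick check of the scaling, the mapping from 0-15 to 0-4050 corresponds to a constant gain.  The gain of 270 below is inferred from those two ranges (15 * 270 = 4050) rather than taken from the model, and the names are hypothetical.

```cpp
#include <cstdint>

constexpr uint16_t kGain = 270;  // inferred from 0-15 -> 0-4050, not from the model

// Scale a 4 bit value (0-15) up to the DAC input range used in the model.
constexpr uint16_t scaleToDac(uint8_t nibble) {
    return static_cast<uint16_t>((nibble & 0x0F) * kGain);
}
```

The result still only takes 16 distinct levels; the scaling spreads them across the DAC range without adding resolution.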

Thus concludes the first “working” model of a dual channel transmitter and receiver which needs a clock line and digital data line.

When it came to the actual real world implementation of this, there were problems, to say the least.  It was practically impossible to get the clock/data lines to be read properly due to the desynced internal clocks of the two separate boards.  Data could only be read and decomposed correctly after multiple hard resets of the receiver board, and it only remained in sync for several minutes at a time.

Thus from here on out, I began to study clock recovery and various signal transmission methods to resolve this issue.  In addition to that I hoped to improve the sampling rate by using a method that sends maybe 4 or 5 bits at a time instead of a single bit.

Below are the receiver and transmitter Simulink files.  Keep in mind that I used the R2015B version of MATLAB and the same version is required to open and further edit these files.  I apologize for this inconvenience.

File for 1 Arduino board

https://mega.nz/#!DB8xiRBB

To split this into two boards and try for yourself, split the above file at the 1 sample switch and use 2 digital outputs and inputs for the data stream and the clock.