Chapter 2: Application Layer
2.1 Principles of Network Applications
- Applications run on end systems: you do not write software for network-core devices, only the higher-level application code on the end hosts
2.1.1 Network Application Architectures
-
Client-Server Architecture
- Server: Always-on host, services requests from many other hosts, waits to be contacted
- Client: End user that sends requests to the Server, the initiator
- Clients do not communicate with each other
- Server has fixed location/IP address
-
P2P Architecture
- Little to no reliance on dedicated servers in data centers
- Direct communication between pairs of intermittently connected hosts, peers
- Self-Scalability: A peer both generates workload (request file) and adds to service capacity by distributing files to other peers
2.1.2 Processes Communicating
- Processes, not programs, communicate with each other; a process is a program in execution, created and managed through system calls such as fork
- Processes on different end systems communicate with each other by exchanging messages across the computer network
-
CLIENT SERVER PROCESSES
-
In a client-server infrastructure
- For example a Web browser: client process, Web server: server process
-
In a P2P scheme
- Peer is both client and server
-
-
THE INTERFACE BETWEEN THE PROCESS AND THE COMPUTER NETWORK
-
Socket: Process sends and receives messages on a socket
- Interface between application and transport layer
- Process → House, Socket → Door
-
-
ADDRESSING PROCESSES
-
To identify a receiving process the following are required
- Address of host
- Identifiers specifying receiving process in the destination host
-
Host is identified by its IP address; a process on the host is identified by a port number
- IP address: 32-bit quantity (IPv4)
- Port number: 16-bit quantity
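- For example, 128.119.245.12:80 would identify the process listening on port 80 (conventionally a Web server) at the host with IP address 128.119.245.12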
-
2.1.3 Transport Services Available to Applications
- Many networks provide more than one transport layer protocol
-
RELIABLE DATA TRANSFER
- Guaranteed data delivery with no errors by protocol
- There are also unreliable data transfer protocols
- The transport layer handles the reliable data transfer and the application layer does not need to worry about it
-
THROUGHPUT
- Throughput: Rate at which the sending process can deliver bits to the receiving process
- Throughput can fluctuate depending on the bandwidth used by other processes or users
- Transport layer could provide guaranteed throughput
- Application can request specific speed especially for bandwidth sensitive applications
- Elastic applications can accommodate inconsistent throughput
-
TIMING
- Transport layer protocol can also provide timing guarantee
- Guaranteed Timing: A bit will arrive at destination in a specified amount of time
-
SECURITY
- Can provide encryption
2.1.4 Transport Services Provided by the Internet
- Two transport protocols, UDP and TCP
-
TCP SERVICES
-
Connection oriented and reliable data transfer service
-
Connection Oriented Service
- Handshaking: Client and server exchange transport layer control information before application level messages begin to flow
- After handshake TCP connection is established
- Full duplex, 2 way
- Connection is closed after transfer finishes
-
Reliable Data Transfer Service
- TCP can deliver all data without error and in proper order
-
- Also has congestion control mechanism which throttles sending process
-
-
UDP SERVICES
- Lightweight transport protocol providing minimal services
- Connectionless, no handshaking before beginning of communication
- No guaranteed delivery
- No congestion control
-
SERVICES NOT PROVIDED BY INTERNET TRANSPORT PROTOCOLS
- Today's Internet transport protocols do not provide throughput or timing guarantees
- Clever application-level design can still achieve relatively good time-sensitive performance or throughput
- However, no guarantees can be made for these two things
2.1.5 Application-Layer Protocols
-
Application layer protocols define how application processes running on different end systems pass messages to each other
- Protocols define… types of messages, syntax, semantics (meaning of fields), rules determining when and how to respond
-
Applications vs application-layer protocols
- An application-layer protocol is not the same thing as the application itself; it is only one piece of it, specifying how the client and server processes communicate with each other
2.2 The Web and HTTP
2.2.1 Overview of HTTP
-
HTTP: Web’s application-layer protocol
- Implemented on 2 programs client and server
- HTTP uses TCP as its underlying transport protocol thus will establish a TCP connection first before transferring data over the socket interface
- "Browser" and "client" are used interchangeably below, since the client is often a Web browser
-
Server sends requested files to clients without storing any state information about the client
- This makes HTTP a stateless protocol: repeated requests for the same object are simply fulfilled again; the server never says "I just gave you that", it just serves the same object again
2.2.2 Non-Persistent and Persistent Connections
- Non-Persistent Connection: Each request/response is sent over a separate TCP connection
- Persistent Connection: All requests and corresponding responses should be sent over the same TCP connection
-
HTTP WITH NON-PERSISTENT CONNECTIONS
-
For example a request to a URL http://www.someSchool.edu/someDepartment/home.index
- HTTP client initiates TCP connection on port 80
- HTTP client sends HTTP request for object
- HTTP server process receives request and sends a response message with the file as part of it
- HTTP server closes TCP connection
- HTTP client receives response with the HTML file and finds references to JPEGS on HTML file
- HTTP client repeats process on each image
- If there are 10 images on the HTML file then 11 TCP connections are created
-
RTT: Round Trip Time
- The time for a small packet to travel from client to server and then back to client
- Contains packet-propagation delays, packet queuing delays, and packet processing delays
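- Putting it together: fetching one object over a non-persistent connection costs about 2 RTTs plus the transmission time of the file (one RTT for the TCP handshake, one for the request/response)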
-
-
HTTP WITH PERSISTENT CONNECTIONS
- Leave TCP connection open after sending response
- In the previous example HTML and 10 images will be sent on the same TCP connection
- This removes the RTTs which are contributed by the multiple TCP connection handshakes
2.2.3 HTTP Message Format
- HTTP REQUEST MESSAGE
-
-
Method: GET, POST, HEAD, PUT, DELETE
- GET: Browser requests object
-
Header can contain something like
-
Host: www.someschool.edu
- Note: This is required for a web proxy cache
-
Connection: Close
- This means use non-persistent connections
-
-
User-agent: Mozilla/5.0
- Type of browser
- Accept-language: Preferred language of the webpage if available
-
Body is filled with different things based on the type of method
- GET: Empty
-
POST: Whatever you want to enter into the form on the page
- Can be emulated with a GET request and having arguments in the URL
- /animalsearch?monkeys&bananas
- HEAD: Similar to GET but tells server to leave out the actual file in response
- PUT: Upload objects to server
- DELETE: Delete from web server
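- Putting the pieces together, a typical request message looks like this (path is illustrative, host from the example above):
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: fr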
-
-
HTTP RESPONSE MESSAGE
-
Status Line (1 line)
- Protocol version: HTTP/1.1
-
Status code
-
200 OK
- Request succeeded and information is returned as response
-
301 Moved Permanently
- Object been permanently moved and new URL is specified in Location header line
-
304 Not Modified
- Used to tell a cache that the cached object is up to date
-
400 Bad Request
- Generic error saying request was not understood
-
404 Not Found
- The requested document does not exist on this server
- 505 HTTP Version Not Supported
-
-
Status message
- The English part after the number
-
Header lines (6 lines)
- Connection: Close
-
Date: Tue, 09 Aug 2011 15:44:04 GMT
- Time when the server created and sent the response (not when the object last changed; that is Last-Modified)
-
Server: Apache/2.2.3 (CentOS)
- Info about server
-
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
- Last time object was changed
-
Content-Length: 6821
- Number of bytes
-
Content-Type: text/html
- Type of file
- Data Content
- Each object gets its own HTTP response message
- Header lines are generated as a function of browser type and version
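- Assembled from the header lines above, a complete response looks like:
HTTP/1.1 200 OK
Connection: close
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(blank line, then the 6821 bytes of data)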
2.2.4 User-Server Interaction: Cookies
- Recall HTTP is stateless, cookies are used to keep track of users
-
Cookies contain…
- Cookie header line in HTTP response message
- Cookie header line in HTTP request message
- Cookie file kept on the user’s end system and managed by user browser
- Back end database at the website
- This is how websites can provide a “logged in state” or provide you with the one click shopping like Amazon
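- Illustrative header exchange (the identification number 1678 is made up): the server's first response carries "Set-cookie: 1678"; every later request from that browser to the site carries "Cookie: 1678", which the server uses to index its back-end database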
- Cookies are controversial due to the privacy concerns they bring up
2.2.5 Web Caching/Proxy Server
- Keep Copies of recently requested objects of a website in storage
- Goal of this is to reduce traffic on a link to the internet
- When a browser requests an object, it first contacts the proxy server; the Web cache checks whether it has a copy of the object stored locally and, if so, returns it to the browser in an HTTP response message
- If the proxy server does not have a copy, it requests the object from the origin server, saves the new file locally, and then sends a copy to the browser that requested the file in the first place
- The cache is both server and client! It is a server to the requesting browser and a client to the origin server
- Internet delay: the time for the access router to send a request to the origin server and receive the response
-
Suppose the hit rate of a cache is 0.4, the response time of the cache is 0.01 sec, and the average Internet delay is 2.01 sec
- Average time to respond to a request = 0.4 · 0.01 sec + 0.6 · 2.01 sec = 1.21 sec
- CDN (Content Distribution Network): a company that specializes in caching on behalf of content providers
2.2.6 The Conditional GET
- What if the data in the cache is stale?
- Conditional GET lets us make sure objects are up to date
-
It is a GET request with an additional header line
- If-Modified-Since: Wed, 7 Sep 2011 09:23:24
- It is sent by the cache to the origin server
-
If the If-Modified-Since time is the same as the Last-Modified time, the origin server sends an HTTP response with an empty body
- The response carries status 304 Not Modified
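- An illustrative exchange (hostname and path made up, date from above):
GET /fruit/kiwi.gif HTTP/1.1
Host: www.exotiquecuisine.com
If-Modified-Since: Wed, 7 Sep 2011 09:23:24

HTTP/1.1 304 Not Modified
(empty body)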
2.3 File Transfer: FTP
- Allows a user to transfer files to or from a remote host
- The user provides the hostname of the remote host; the FTP client establishes a TCP connection, the user authenticates, and can then copy files to and from the FTP server
-
- TCP Control Connection uses port 21: Remains open during session
- TCP Data Connection uses port 20: a new connection is created every time a file is transferred
- As Control and Data do not use the same TCP connection (parallel connections) FTP is considered out-of-band
- HTTP sends control information in-band
- FTP must maintain user state (logged-in status, current directory); this limits the number of simultaneous sessions
2.3.1 FTP Commands and Replies
- Probably doesn’t matter
2.4 Electronic Mail in the Internet
-
3 major components
- User agents: Software that allows a user to send mail to another user
-
Mail servers: Contains various users mailboxes and converses with other mail servers using SMTP to send mail to other users inboxes
- User agent sends email → sender's mail server → queued if it cannot be delivered immediately → recipient's mail server receives the mail → new mail is placed in the recipient's inbox
-
Simple Mail Transfer Protocol (SMTP)
- Application-Layer protocol for internet electronic mail
- Uses TCP
- SMTP has two sides, client and server; any mail server runs both in order to properly send and receive mail
2.4.1 SMTP
- Much older than HTTP; one legacy limitation is that message bodies must be 7-bit ASCII data
- Binary multimedia data must be encoded into ASCII before sent over SMTP
- SMTP uses TCP over port 25
-
SMTP has its own handshaking phase for sending mail
- SMTP client indicates the email address of sender and email address of the recipient
- Once instructions are done, client sends message
- Repeat until queue is empty
- Once complete, close the TCP connection
2.4.2 Comparison with HTTP
- HTTP mainly pull protocol (people load information from web server)
- SMTP mainly a push protocol (sends information)
- SMTP requires 7 bit ASCII format, HTTP data does not impose this restriction
- HTTP encapsulates each object into its own HTTP response message, SMTP puts everything into one message
2.4.3 Mail Message Formats
- From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for meaning of life.
<-blank line->
<-Content of the message->
2.4.4 Mail Access Protocols
- The mail server usually does not run on a person's PC; that would require the PC to be on all the time just to receive messages
- Instead it runs elsewhere. Sending email uses SMTP (a push protocol) to talk to the mail server, but how do you pull mail from the server?
- POP3, IMAP, HTTP
POP3
- Simple and limited functionality
-
3 Phases: Authorization, transaction, update
- Authorization: Send user login info
- Transaction: Pull messages from the mail server
- Update: Issue a quit command and disconnect, mail server now deletes emails
- State is maintained during a session (messages marked deleted)
- Does not maintain state across POP3 sessions
IMAP
- Allows saving of state between sessions and allows one to access your mail consistently across multiple machines
- Also allows for sorting of mail by folder
- Allows the user to obtain components of a message (e.g., just the header) rather than the whole thing, to save bandwidth
Web Based E-Mail
- User agent is now an ordinary Web browser, user communicates with its remote mailbox via HTTP
- Recipient accesses mailbox via HTTP in browser
- Someone sending a message uses HTTP via the Web interface, but the mail server itself still uses SMTP to forward the mail to other mail servers
2.5 DNS-The Internet’s Directory Service
-
Used to translate human readable hostnames into IP addresses which are used by routers and such
- DNS is a distributed database implemented in a hierarchy of DNS servers
- Has an application-layer protocol that allows hosts to query the database
- DNS uses UDP and port 53
- Employed by other application-layer protocols, HTTP, SMTP, FTP
-
The steps done by DNS
- The user's machine runs the client side of DNS
- The browser extracts the hostname from the URL and passes it to the client side of the DNS application
- DNS client queries DNS server
- DNS client receives reply, which is the IP address of the host name
- Initiate TCP connection on port 80
-
DNS also provides the following services
-
Host aliasing
- This allows a website to have multiple mnemonics while the true resolved canonical hostname is much more complicated
- google.com or www.google.com resolve to relay1.west-coast.google.com
-
Mail server aliasing
- The same applies to the mail server
- For bob@hotmail.com, the hotmail.com part resolves to a more complicated canonical hostname; DNS is invoked by the mail server to obtain it
- MX records allow a company's mail server and Web server to have the same (aliased) hostname
-
Load Distribution
- Busy websites are replicated over multiple servers to split the load where each end system has a different IP address
- A set of IP addresses will have 1 canonical hostname
- DNS database will contain a SET of IP addresses
- HTTP requests from a client often go to the first IP address in the set returned by DNS; the database rotates the ordering of the list in each reply to spread the load across the replicas
- This applies to both web-servers and mail-servers
-
- DNS resolution can take a significant amount of time and add considerable delay; however, resolved mappings are often cached at a nearby DNS server, which mitigates this
2.5.2 Overview of How DNS Works
- All DNS query and reply messages are sent as UDP datagrams on port 53
- DNS host receives a DNS reply with the proper mapping which is passed to the invoking application
- DNS appears to be a black box to the invoking application
-
A simple implementation is a centralized DNS server with all the mappings, this is unwise and not scalable
- Single point of failure
- Traffic volume
- Distant centralized database: You can’t be close to everything
- Maintenance: Needs to be updated frequently and keep EVERY record for the entire internet
-
A DISTRIBUTED, HIERARCHICAL DATABASE
- There are 3 types of DNS servers, root, top level domain (TLD), authoritative
-
When a client queries amazon.com…
- Query a root server, which returns the TLD server IP → query the TLD server, which returns the IP of the authoritative amazon.com server → query the authoritative amazon.com server, which returns the IP for hostname amazon.com
- Local DNS server: located at an institution or local ISP; this is the server your host actually sends its queries to
-
For example host cis.poly.edu desires IP address of gaia.cs.umass.edu
- Assume TLD server knows authoritative DNS server for hostname (not always true)
- For example if there is dns.umass.edu and dns.cs.umass.edu, the TLD server probably only knows dns.umass.edu
- If we want to access cs.umass.edu we will receive the IP of dns.cs.umass.edu (the new authoritative server) from dns.umass.edu
-
- In the textbook's example of 8 messages, all but one of the queries are iterative, because each reply goes directly back to the local DNS server
- The one recursive query is the first (from the host to the local DNS server), where we ask the server to obtain the mapping on our behalf rather than chasing referrals ourselves
-
- Purely recursive version of the same DNS query
-
DNS CACHING
- Used to reduce delay and number of DNS messages traveling around
- When a DNS server receives a reply for a request it made, it caches the mapping; when the same hostname is requested again, it can reply with the resolved IP itself, even though it is not an authoritative server
- Cached mappings are typically discarded after a period such as two days, governed by the record's TTL
2.5.3 DNS Records and Messages
- DNS servers also store RRs (resource records)
-
Resource records (RRs) are 4-tuples: (Name, Value, Type, TTL)
- The meaning of Name and Value depends on the Type
-
Type
-
Type=A, Name=hostname, Value=IP address of hostname
- (relay1.bar.foo.com, 145.37.93.126, A)
- A Type A record like this is held by the DNS server that is authoritative for the hostname (a TLD server would instead hold an NS record pointing at that authoritative server, plus an A record for it)
-
Type=NS, Name=domain, Value=hostname of authoritative DNS server that knows how to obtain IP address for hosts in the domain
- (foo.com, dns.foo.com, NS)
-
Type=CNAME, Value=Canonical hostname for alias hostname Name
- (foo.com, relay1.bar.foo.com, CNAME)
-
Type=MX, Name=alias hostname, Value=canonical hostname of the mail server with that alias
- (foo.com, mail.bar.foo.com, MX)
- Allows company to have same aliased name for mail server and other servers
- To obtain canonical name for the mail server DNS client would query for MX record
- To obtain canonical name for the web server DNS client would query for the CNAME record
-
- TTL: Time to live, determine how long before being removed from cache
-
DNS MESSAGES
-
DNS has two types of messages, query and reply both share the same format
-
Header (12 bytes)
- 16 bit ID
- 1 bit query (0) reply (1)
- 1 bit authoritative flag set in reply
- 1 bit recursion flag for client
- 1 bit recursion for reply if DNS server supports it
- Four "number-of" fields, giving the number of entries in each of the four data sections that follow
-
Question Section
- Information about query being made
- Name field identifying what is being queried
- Type field about what type of question is made
-
Answer Section
- Resource records for the name that was originally queried
- Multiple may be in a reply as a hostname may resolve to multiple IP addresses
-
Authority section
- Contains records of other authoritative servers
-
Additional Section
- If an MX request was made, this section contains the Type A record for the mail server's canonical hostname
- nslookup can be used to manually query a DNS server
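- Example invocations (the server name in the second command is illustrative):
nslookup -type=MX hotmail.com        query the default DNS server for MX records
nslookup www.mit.edu dns.poly.edu    send the query to a specific DNS server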
-
-
INSERTING RECORDS INTO THE DNS DATABASE
-
A registrar is a company that verifies uniqueness of a domain name and enters it into the DNS database
- Need to supply primary and secondary DNS servers (name and IP)
-
Make sure Type NS and Type A entries are entered into the TLD com/org/etc. servers
- (networkutopia.com, dns1.networkutopia.com, NS)
- (dns1.networkutopia.com, 212.212.212.1, A)
- You also need to make sure your authoritative DNS server contains an MX record for mail.networkutopia.com and a Type A resource record for the Web server www.networkutopia.com
-
2.6 Peer-to-Peer Applications
- In P2P architecture there is minimal (or no) reliance on always-on infrastructure servers
- Peers are desktop and laptops controlled by users
- Discuss P2P file distribution and database distributed over large community of peers in the case of a Distributed Hash Table (DHT)
2.6.1 P2P File Distribution
-
SCALABILITY OF P2P ARCHITECTURES
- Compare the time required to distribute a file of F bits to N peers, where the server's upload rate is u_s and d_i is the download rate of peer i
- Below are the minimum distribution times for each architecture
-
In the case of a client-server architecture
- D_cs >= max{ N·F/u_s , F/d_min }: the server must push out N·F bits at rate u_s, and the slowest peer needs F/d_min seconds to download the file
-
For the P2P architecture, assume that at the beginning only the server has the file; the total upload capacity is the sum of the upload rates of the server and all of the users
- D_P2P >= max{ F/u_s , F/d_min , N·F/(u_s + Σ u_i) }
- The first term: the server must upload at least one full copy of the file
- The second term: the minimum distribution time is limited by the slowest user's download rate
- The third term: we are limited by the total upload capacity of the P2P distribution network
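- A minimal sketch in Python computing both lower bounds (all values are made-up examples; F in bits, rates in bits/sec):
# Minimum distribution times, client-server vs. P2P
F = 15e9                # file size in bits
N = 30                  # number of peers
u_s = 30e6              # server upload rate
d_min = 2e6             # download rate of the slowest peer
u = [1e6] * N           # upload rates of the peers

D_cs = max(N * F / u_s, F / d_min)
D_p2p = max(F / u_s, F / d_min, N * F / (u_s + sum(u)))
print("D_cs  = %.0f sec" % D_cs)    # 15000 sec
print("D_p2p = %.0f sec" % D_p2p)   # 7500 sec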
-
BITTORRENT
- Collection of peers participating in a distribution of a file is called a torrent
- Peers download equal-sized chunks (typically 256 KB) from each other
- A peer has a choice to leave the swarm once done downloading or can continue to contribute
-
Each torrent has an infrastructure node called a tracker
- A peer joining the torrent registers itself with the tracker and periodically informs it that it is still participating
- When Alice joins, the tracker randomly selects a subset of peers and sends their IPs to Alice; these are Alice's initial "neighboring peers"
- Alice will periodically ask for the list of chunks her peers have over TCP and will issue requests for chunks she does not have
-
1. Which chunks should she request?
- Use rarest first: Among the chunks Alice does not have request the one that is rarest among her neighbors
- In this way the chunks will then even out and become more common
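- A minimal sketch of rarest-first selection (illustrative, not BitTorrent's actual code):
from collections import Counter

def rarest_first(have, neighbor_chunks):
    # count how many neighbors hold each chunk
    counts = Counter()
    for chunks in neighbor_chunks:
        counts.update(chunks)
    # among the chunks we are missing, request the least common one
    missing = [c for c in counts if c not in have]
    return min(missing, key=lambda c: counts[c]) if missing else None

print(rarest_first({0, 1}, [{0, 2}, {0, 1, 2}, {0, 3}]))   # -> 3, held by only one neighbor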
-
2. Which neighbors should she send chunks to?
- Give priority to the 4 peers feeding her bits at the highest rate
- Reciprocate by sending chunks to these 4 peers as well
- Recalculate periodically (10 sec) and change the set of these 4 (unchoked peers) if needed
-
Periodically (30 sec) begin sending chunks to a random peer, this peer is said to be optimistically unchoked
- It may be possible that this new peer can become Alice’s top 4 and vice versa
- This system allows new peers to get chunks and begin the circle of life
- This system usually ends up allowing peers of similar internet rates find each other and transfer data between each other
- Known as tit-for-tat
2.6.2 Distributed Hash Tables (DHTs)
- Imagine a database with a (key, value) pair system for example, key=filename, value=IP
-
Naïve implementation
- Randomly scatter the key-value pairs across all peers; a requesting peer queries every peer, and whichever peer holds the pair replies with the information
- Infeasible: it floods the network with requests and requires each peer to keep track of every other peer
- Any peer must also be allowed to insert new key-value pairs
-
A better implementation
- Let each peer have an identifier in range [0, 2^n-1]
- Let keys require the same range of integers
- Use a hash function to map a string (file name) to an integer
- Thus key=hash of filename, value=IP/whoever
-
Assign each key-value pair to the peer whose identifier is closest to the key
- We can define "closest" as the key's immediate successor in the identifier space, wrapping around
- However, the naive version is infeasible, since it requires each peer to keep track of the complete list of peers to decide which is closest (see the sketch below)
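- A minimal sketch of the hashing and "closest peer" assignment (here closest is taken to be the key's immediate successor on the identifier circle, a common convention; names are illustrative):
import hashlib

n = 16
SPACE = 2 ** n                       # identifier space [0, 2^n - 1]

def key_of(name):
    # hash a file name into the identifier space
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % SPACE

def closest_peer(key, peer_ids):
    # successor distance, wrapping around the circle
    return min(peer_ids, key=lambda p: (p - key) % SPACE)

print(closest_peer(key_of("song.mp3"), [1000, 20000, 45000]))   # the key's successor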
-
CIRCULAR DHT
- In a circular DHT arrange peers in a circle with increasing identifier values
- Each peer only keeps track of its immediate successor and immediate predecessor
- This is known as an overlay network: the virtual links between pairs of peers overlay the physical network
- When a peer requests a key, the request is passed around the circle until it reaches the closest peer, which sends the reply back to the originator
- However, this can require O(N) messages (N/2 on average) before receiving a reply
- One solution is to add shortcuts so a peer not only keeps track of its immediate predecessor and successor but also a small number of shortcut peers
- The process is the same when requesting a key value pair
- How many shortcuts should there be? This can be tuned so that each peer tracks only O(log N) neighbors and a query takes O(log N) messages
-
PEER CHURN
- What if a peer decides to leave suddenly?
- In order to deal with this each peer will keep track of two immediate predecessors and successors
- Then each peer will intermittently ping each of its immediate successor and immediate predecessor to make sure they are there
- If one disappears, the remaining peers update their predecessor/successor pointers and then ask their new immediate successor (or predecessor) for the identifier of its own successor (or predecessor), restoring two of each
-
What if someone wants to join?
- The new joiner sends a request to any known peer (e.g., the one with the lowest identifier); the request is forwarded around the circle until it reaches the peer that will be its predecessor, which replies with the required predecessor and successor information
- The new peer then inserts itself and notifies its two new immediate predecessors and successors
2.7 Socket Programming: Creating Network Applications
- There are both programs that use proprietary and open standards for network programming
2.7.1 Socket Programming with UDP
- Recall machines communicate with each other by sending packets
-
Below is code for UDPClient.py
- # UDPClient.py
from socket import *
serverName = 'hostname'          # replace with the server's hostname or IP
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_DGRAM)   # IPv4, UDP
message = raw_input('Input lowercase sentence:')
clientSocket.sendto(message, (serverName, serverPort))
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print modifiedMessage
clientSocket.close()
- 'hostname' stands for the server's hostname or IP address
- AF_INET indicates IPv4
- SOCK_DGRAM indicates a UDP socket
- clientSocket.sendto() attaches the destination address (server IP and port number) to the message and sends it into the socket; the client's own port number is chosen automatically by the OS
- modifiedMessage contains the server's reply, and serverAddress contains the server's IP and port number (which we already know, but recvfrom returns them anyway)
-
UDPServer.py
- # UDPServer.py
from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))
print "The server is ready to receive"
while 1:
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.upper()
    serverSocket.sendto(modifiedMessage, clientAddress)
- bind() assigns the port number 12000 to serverSocket
- When the server sends a reply out of serverSocket, the source address (server IP and port) is attached automatically, even though we never did so explicitly
2.7.2 Socket Programming with TCP
- TCP is connection-oriented protocol
- Client and server need to handshake before establishing a TCP connection
- A client socket and server socket are paired (IP address and port number)
- Once established you do not need to specify the destination or source address in the reply from the client or server
- To receive the handshake, the server must have special welcoming socket to wait for the request from the client
- After the handshake completed the application sees the new client socket and server connection socket as directly connected by a pipe
- TCP guarantees that the server receives every byte sent, and in order
- TCP is two way server and client can have a conversation
-
TCPClient.py
- from socket import *
serverName = 'servername'        # replace with the server's hostname or IP
serverPort = 12000
clientSocket = socket(AF_INET, SOCK_STREAM)   # IPv4, TCP
clientSocket.connect((serverName, serverPort))
sentence = raw_input('Input lowercase sentence:')
clientSocket.send(sentence)
modifiedSentence = clientSocket.recv(1024)
print 'From Server:', modifiedSentence
clientSocket.close()
-
TCP differs significantly from UDP
-
clientSocket = socket(AF_INET, SOCK_STREAM)
- First off we use SOCK_STREAM, this means it’s a TCP socket instead
- Once again we do not specify the client port and let the OS choose for us
-
clientSocket.connect((serverName, serverPort))
- This is the line that actually connects to the server and does the 3 way handshake
-
-
TCPServer.py
- from socket import *
serverPort = 12000
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)
print 'The server is ready to receive'
while 1:
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024)
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence)
    connectionSocket.close()
serverSocket=socket(AF_INET, SOCK_STREAM)
- Once again we use SOCK_STREAM to specify a TCP socket
-
serverSocket.bind(('', serverPort))
- This assigns the server port number to the welcoming socket that clients will contact
-
serverSocket.listen(1)
- This turns the socket into a listening (welcoming) socket; the parameter specifies the maximum number of queued connection requests (at least 1)
-
connectionSocket, addr = serverSocket.accept()
- This will create a new socket for the client to connect to on the server
- Note we only close connectionSocket, the socket that actually carries the data; the welcoming serverSocket stays open, so when another client sends a connection request we can still accept it and reply with the modified data
Chapter 3: Transport Layer
3.1 Introduction and Transport-Layer Services
-
Transport layer provides logical communication between application processes running on different hosts
- From application perspective, as if hosts were directly connected
-
Sending side: transport layer turns application-layer messages into transport-layer segments
- Break the message into smaller chunks and add a transport-layer header to each chunk to create a transport-layer segment
- Receiving side: the network layer extracts the transport-layer segment from the datagram and passes the segment up to the transport layer
-
Multiple transport layer protocols are available to network applications
- TCP and UDP
3.1.1 Relationship Between Transport and Network Layers
- Transport layer protocol provides logical communication between processes running on different hosts, a network-layer protocol provides logical communication between hosts
-
Imagine a situation where two households send mail to each other, and in each household one specific member collects and distributes the mail
- Application messages = letters in envelopes
- Processes = recipients
- Hosts (end systems) = houses
- Transport-layer protocol = the person in charge of collecting and distributing mail
- Network-layer protocol = postal service (including mail carriers)
- In this example the processes (recipients) appear to be talking directly to each other
- And the hosts (houses) appear to be directly connected to each other
- The transport layer cannot provide delay or bandwidth guarantees, because it is constrained by the service of the network layer beneath it
- However, what the transport layer can guarantee is reliable data transfer service even if the network protocol is unreliable
- Another thing that can be offered is encryption
3.1.2 Overview of the Transport Layer in the Internet
- Internet (TCP/IP) network makes two transport layer protocols available to the application layer, TCP UDP
- TCP or UDP is specified when a application layer programmer creates a socket
-
Transport-layer packets are known as segments
- Often transport layer TCP packet is known as segment while UDP is known as datagram
- Book uses segment for transport-layer and datagram for network-layer always
-
Network-layer is IP, internet protocol
- Best-effort delivery service/unreliable service like UDP
- Each host has at least one IP (network-layer address)
-
UDP/TCP is responsible for extending IP’s delivery service between two systems to two processes running on those systems
- This extending host-to-host to process-to-process is known as transport-layer multiplexing and demultiplexing
- UDP/TCP offer integrity checking as well by using error-detection fields
- UDP only offers error-detection and transport layer multiplexing
- TCP also offers reliable data transfer and congestion control
3.2 Multiplexing and Demultiplexing
- If we have multiple sockets and applications running on a client that need to send data out we need to multiplex them together
- When we have incoming data for multiple applications running on different sockets we need to demultiplex them
This happens at every transition between two layers in the model; at the transport layer, segments arriving from the network layer are delivered up to the correct application socket
-
Each socket has a unique identifier, and each segment has special fields that indicate the socket to which the segment is to be delivered
- These are the source port number and destination port number fields
- Each port number is a 16-bit number; ports 0-1023 are well-known port numbers and are restricted
-
In theory, all we would need to do is assign each socket a port number and compare destination port numbers
- That is essentially what UDP does; TCP is a bit more subtle
-
CONNECTIONLESS MULTIPLEXING AND DEMULTIPLEXING
- Transport layer attaches the source port number and destination port number onto the UDP segment and then sends it to the network-layer
- The network-layer will then encapsulate it into a IP datagram which also contains the source IP
- When arriving at the destination it will compare the destination port and then deliver it
- Note: a UDP socket is fully identified by a two-tuple: destination IP and destination port
- If two UDP segments have different source IP and source port but same destination IP and destination port they will be delivered to the same socket on the destination
-
The source port number serves as a “return address” if required
- Recall UDPServer.py
-
CONNECTION-ORIENTED MULTIPLEXING AND DEMULTIPLEXING
- TCP uses a 4 tuple to identify a source
- This means that either different source IP or different source port will be directed to two different sockets
- The 4 tuple contains, source port, dest port, source IP, dest IP
-
- In this example, although Host A and Host C both use source port 26145, the server opens two different sockets for them: the source IPs differ, so the 4-tuples differ and the server can tell them apart
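- A sketch of how a host could key its demultiplexing tables (illustrative, not a real kernel structure):
udp_socks = {}   # (dest IP, dest port) -> socket; source address is ignored
tcp_socks = {}   # (source IP, source port, dest IP, dest port) -> socket
# two TCP segments from different source IPs but the same source port map to
# different 4-tuple keys, hence different sockets, as in the example above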
-
WEB SERVERS AND TCP
- A new process is spawned for each connection in an Apache web server
- Nowadays one process multiple threads instead for each connection
- All traffic is directed at destination port 80 on the web server
3.3 Connectionless Transport: UDP
- UDP does almost nothing beyond multiplexing and demultiplexing: it attaches the source and destination port number fields plus two other small fields (length and checksum)
- No hand shaking and thus connectionless
- DNS typically uses UDP, largely because it avoids handshake delay; if the query or reply is lost, the DNS application can resend or report to the invoking application that no reply was received
-
Why would one want to use UDP?
-
Finer application-level control over what data is sent and when
- Real-time applications in particular do not want transmission delayed and often must sustain a minimum sending rate
- TCP’s congestion control and resending when a packet is dropped will break this real-time ability
-
No connection establishment
- No handshake means less delay; the handshake is a non-trivial part of why loading things over HTTP is initially slow
-
No connection state
- State in TCP is used for congestion control and reliable data transfer
- Keeping track of state will limit the number of active clients
-
Small packet overhead
- 8 bytes for UDP vs 20 bytes for TCP
-
- UDP can have reliable data transfer if the application-level programmer incorporates it there
3.3.1 UDP Segment Structure
- RFC 768
-
The header contains only 8 bytes, four fields of 2 bytes each: source port, destination port, length, checksum
- Length: number of bytes in the UDP segment (header + data)
- Checksum: used to check for errors introduced in delivery
3.3.2 UDP Checksum
- The checksum is computed by taking the 1's complement of the sum of all the 16-bit words in the segment, with any overflow encountered during the sum being wrapped around (see the sketch at the end of this section)
- While a checksum is provided, UDP has no mechanism to recover from a detected error
- At the receiver, all the 16-bit words, including the checksum, are added together
- If no errors were introduced, the resulting sum at the receiver will be all 1s
- UDP provides a checksum even though many link-layer protocols (ethernet) provide error checking because not ALL link-layer protocols provide it
- Even if the network-layer introduces no error when transferring to the router there may be a memory error at the router that induces a bit error
- End-end principle: It is far easier to obtain reliability beyond a certain margin by mechanisms in the end hosts of a network rather than in the intermediary nodes. Therefore it is prudent to employ this before the data is sent out onto the network-layer
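- A minimal sketch of this computation (the three 16-bit words are made-up examples):
def checksum16(words):
    s = 0
    for w in words:
        s += w
        s = (s & 0xFFFF) + (s >> 16)    # wrap any overflow back into the sum
    return ~s & 0xFFFF                  # 1's complement of the sum

words = [0x4500, 0x1234, 0xABCD]
csum = checksum16(words)
# receiver: the sum of all words plus the checksum is all 1s, so its complement is 0
print(checksum16(words + [csum]) == 0x0000)   # True if no error was detected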
3.4 Principles of Reliable Data Transfer
- Reliable data transfer protocol implements service abstraction
- We assume that the layer below the reliable data transfer is unreliable, in this case IP (this assumption can always be made for any layer below a reliable transfer protocol)
- Our reliable data transfer always delivers data in order; packets may be lost in the lower layer, but we assume the underlying channel does not reorder packets
-
We only discuss unidirectional data transfer however TCP is a full bidirectional data transfer
- It is no more difficult conceptually but considerably more tedious to explain
- Even though we say unidirectional data transfer they must also exchange control packets back and forth
3.4.1 Building a Reliable Data Transfer Protocol
-
RELIABLE DATA TRANSFER OVER A LOSSY CHANNEL WITH BIT ERRORS: rdt3.0
-
Features:
- ACK to tell sender which packet was last received, if receiver receives out of order packet/corrupted packet it will send an ACK for the last received packet
- Uses sequence number to determine current packet and previous packet, this is a 1 bit value due to the stop-and-wait protocol
- A timer is used to check for a timeout on whether or not we received an ACK for the last packet we sent
- If sender receives a corrupted ACK or incorrect ACK it will wait until timeout before resending the last packet, this is done to avoid duplicate packets
- The timer value has to be at least the RTT plus the processing time at the receiver; choosing this value well is hard
-
- rdt2.2 is the same as rdt3.0 but does not incorporate the countdown timer on the sender
- The receiver FSM used in 3.0 is the same as in rdt2.2. The sender handles packet loss by retransmitting after a timeout. These retransmissions can introduce duplicates in the channel if the packet wasn’t actually lost, but the rdt2.2 receiver FSM can already deal with such duplicates since they could have been caused by corrupted acks. Note that rdt2.2 and rdt3.0 are also identical in that the receiver will resend the last ack if it receives a corrupted packet: this will give the sender early indication to retransmit the packet: it would be inefficient to remove this provision from rdt3.0, even though the sender would eventually retransmit anyway.
-
3.4.2 Pipelined Reliable Data Transfer Protocols
- rdt3.0 is functionally correct but it has very bad performance due to its stop-and-wait protocol
- Let us take an example with RTT = 30 ms, link rate R = 1 Gbps (10^9 bits per sec), and packet size L = 1000 bytes (8000 bits)
-
The amount of time to send the packet onto the link is
- d_trans = L/R = 8000 bits / 10^9 bits/sec = 8 microseconds
-
The total time until the packet is acknowledged and the acknowledgement arrives back at the sender is RTT + L/R = 30.008 ms
- Assume the ACK is small and takes negligible transmission time
- Sender utilization is U = (L/R) / (RTT + L/R) = 0.008/30.008 ≈ 0.00027, an effective throughput of approximately 267 kbps out of 1 Gbps
-
To make use of the capabilities of the 1Gbps link we will pipeline the data being put out
-
In order to do that…
- Sequence number range must be significantly increased
- Sender or receiver must buffer more than one packet, specifically those that have been sent but not yet ACKed in case retransmission needs to occur
- Both of the requirements of sequence numbers and buffering must match the protocol such that lost, corrupted, and overly delayed packets can be detected
3.4.3 Go-Back-N (GBN)
- Transmits multiple packets, when available, without waiting for an ACK as long as it lies within the current window N
- We use a window because it imposes a limit on the sender, which is the basis of congestion control (3.7) and flow control (3.5)
-
Let us define “base” as the sequence number of the oldest unacknowledged packet, “nextseqnum” as the smallest unused sequence number
- There are 4 intervals for the sequences numbers
- [0, base-1]: Already acknowledged packets
- [base, nextseq-1]: Sent but not yet acknowledged
- [nextseqnum, base+N-1]: Available to be sent if application layer provides data
- [base+N, :]: Outside window range, ignore if application layer sends these
-
GBN uses a sliding window protocol, when the base packet has been acknowledged it will slide the window along and update the base value
- GBN uses ACKs only and no NACKs
-
- In a conventional finite state machine, the transition is associated with a set of input Boolean conditions and a set of output Boolean functions. In an extended finite state machine (EFSM) model, the transition can be expressed by an “if statement” consisting of a set of trigger conditions.
-
GBN sender responds to the following events
- Invocation from above: rdt_send() sends if the window is not full; if the window is full, the data is returned to the application layer
- Receipt of an ACK: ACKs are cumulative; an ACK for sequence number n indicates that all packets up to and including n have been correctly received
- Timeout: resend all packets that have been sent but not yet acknowledged. Only one timer is needed, for the oldest transmitted-but-unACKed packet (base); if an ACK is received and unACKed packets remain, the timer is restarted, and if there are no outstanding packets the timer is stopped (see the sketch after this list)
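- A minimal sketch of these three sender events (make_pkt, udt_send, and the timer calls are stubbed in the style of the book's pseudocode helpers, not a real API):
def make_pkt(seq, data):
    return (seq, data)                 # stub: a packet is just (seq, data)

def udt_send(pkt):
    pass                               # stub: stand-in for the unreliable channel

def start_timer():
    pass                               # stub timer controls

def stop_timer():
    pass

class GBNSender:
    def __init__(self, N):
        self.N = N                     # window size
        self.base = 0                  # oldest unACKed sequence number
        self.nextseqnum = 0            # smallest unused sequence number
        self.sent = {}                 # seqnum -> packet, kept for retransmission

    def rdt_send(self, data):          # event 1: invocation from above
        if self.nextseqnum >= self.base + self.N:
            return False               # window full: refuse the data
        pkt = make_pkt(self.nextseqnum, data)
        self.sent[self.nextseqnum] = pkt
        udt_send(pkt)
        if self.base == self.nextseqnum:
            start_timer()              # timer covers the oldest unACKed packet
        self.nextseqnum += 1
        return True

    def ack_received(self, n):         # event 2: cumulative ACK for packet n
        if n < self.base:
            return                     # old or duplicate ACK: ignore
        self.base = n + 1
        if self.base == self.nextseqnum:
            stop_timer()               # nothing outstanding
        else:
            start_timer()              # restart for remaining unACKed packets

    def timeout(self):                 # event 3: resend all unACKed packets
        start_timer()
        for seq in range(self.base, self.nextseqnum):
            udt_send(self.sent[seq])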
-
GBN receiver actions
- If a packet with sequence number n is received correctly and in order (the last packet delivered to the application layer was n-1), deliver its data and send an ACK for n
- In all other cases, discard the packet and resend an ACK for the most recently received in-order packet
-
Receiver discards out of order packets
- Justified? Yes!
- It is justified because, while the receiver could buffer out-of-order packets, the way GBN is set up, if packet n is dropped (and we buffered n+1), both n and n+1 will be retransmitted anyway
- This way the receiver only needs to maintain the expected sequence number, while the sender keeps track of its window
3.4.4 Selective Repeat (SR)
- GBN is still restricted in terms of performance: when the window size and bandwidth-delay product are both large, a single packet error can cause a LARGE number of packet retransmissions
- SR aims to avoid unnecessary transmissions by having sender only retransmit packets that it suspects were received in error
- The receiver then will be required to individually ACK all packets meaning that it is possible that some ACKs do not have to be received in order
- SR receiver will ACK a packet whether or not it is in order, out of order packets are buffered until the void is filled and a batch of packets can be delivered to the upper layer
- The SR receiver will also acknowledge received packets with sequence numbers below the current window base (as in the image below); this is needed to avoid endless retransmissions when those ACKs were lost
- Without these re-ACKs, the sender's timers would keep expiring, send_base would never advance, and no new data would ever be sent
-
- I’m assuming the send_base and send_base+1 packets have not yet received their ACKs yet or have been lost in transit
- Note the SR receiver will re-acknowledge already received packets below current window base for this reason
-
SR sender and actions
- Data received from above: If in window send otherwise buffer or return
- Timeout: Each packet has its own timer and we will retransmit for each individual packet if they timeout
- ACK received: if the packet's sequence number equals send_base, advance send_base to the unacknowledged packet with the smallest sequence number
-
SR receiver and actions
- Packet with sequence number in [rcv_base, rcv_base+N-1] is correctly received: send a selective ACK and buffer the packet; if its number is rcv_base, deliver it and all consecutively buffered packets to the application layer and advance rcv_base accordingly
- Packet with sequence number in [rcv_base-N, rcv_base-1] is correctly received: an ACK must be generated even though the packet has already been acknowledged
- Otherwise: ignore the packet
- There is a lack of synchronization between sender and receiver; take the following example
- Both cases appear identical to the receiver, which cannot tell whether a received packet is new or a retransmission
-
This brings up the question how much smaller should the window size be compared to the sequence number?
- The window size should be at most half the size of the sequence number space for SR protocols
- So far we have assumed that packets cannot be reordered in the underlying channel between sender and receiver and only be dropped
-
In a network with multiple nodes it is possible reordering can occur as not all packets will go through the same number of nodes to reach the receiver
- This means that an ACK or sequence number of x not in the current window may appear
- As sequence numbers are reused the above problem may occur if the sequence numbers have looped around
- One way to mitigate this is to give each packet a maximum lifetime, so a segment that has looped around can be recognized as too old; TCP assumes a maximum segment lifetime of approximately 3 minutes
- This can be completely avoided? [Sunshine, 1978]
3.5 Connection-Oriented Transport: TCP
- TCP is defined in RFC 793, 1122, 1323, 2018, 2581
3.5.1 The TCP Connection
- Connection-Oriented: Must use a handshake
- Not TDM or FDM (time division, frequency division) and not a virtual circuit
- TCP runs on the end systems, not on intermediate network elements
- Full duplex: A → B and B → A data transfer can happen at the same time
- Point-to-point connections: Single sender and receiver
-
Three way handshake: Connection establishment procedure
- 1: client → server, 2: server → client, 3: client → server; note the 3rd segment may carry a payload
-
The buffers in the TCP are used to store the data passed down from the application layer (sender) and also to store data before passing it to the application layer (receiver)
-
Segment size of a TCP segment?
- MSS: maximum segment size, the largest amount of application data that can be placed in one segment
- MTU: maximum transmission unit, the largest link-layer frame the host can send
-
The MSS is chosen so that the TCP segment (application data + TCP header), once inside an IP datagram, fits into a single MTU
- That is, MSS + TCP/IP header bytes <= MTU (e.g., MSS = 1460 for a 1500-byte MTU with 40 bytes of headers)
- A TCP segment contains both header and data
3.5.2 TCP Segment Structure
- For large files TCP splits it into chunks of size MSS (or less for last chunk)
-
TCP header is typically 20 bytes (12 more than UDP)
- Still contains source and destination port numbers and checksum calculated the same as UDP (sum and 1’s complement also wrap around carry)
-
Other fields in the TCP segment
- 32 bit sequence number and acknowledgement number
- 16 bit receive window used for flow control (refer to later) used to indicate the number of bytes receiver is willing to accept
- 4 bit header length specifies length of TCP header in units of 32 bit words, exists because options is a variable length
- Options field used when sender and receiver negotiate MSS or window scaling factor (high speed networks) also time-stamping option
-
Flag field
- ACK: Indicate value in ACK field is valid
- RST, SYN, FIN: Connection setup/teardown
- PSH: Tell receiver to immediately push data to upper layer
- URG: Urgent, location of last byte of urgent data is indicated by 16 bit urgent data pointer field
- NOTE: URG and PSH are not used
-
SEQUENCE NUMBERS AND ACKNOWLEDGMENT NUMBERS
- TCP views data as unstructured but ordered stream of bytes
- Sequence number: the byte-stream number of the first byte in the segment
-
Full duplex (Host A = client, Host B = server)
- The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B
- TCP uses cumulative ACK
-
Example 1 of Sequence Number
- Host A received 0-535 from B and about to send segment to Host B
- Host A is now waiting for byte 536 and beyond
- Host A puts 536 in the acknowledgment number field of the segment it sends to B
-
Example 2 of Sequence Number
- Host A has received 0-535 and 900-1000 from Host B
- Host A is still waiting from 536-899 from Host B and is about to send a segment to Host B
- Next Host A segment will contain 536 in the acknowledgment number field
-
TCP standard does not impose any rules for out-of-order segments there are 2 main ways to deal with it (GBN SR kinda)
- Receiver discards out-of-order segments
- Receiver buffers out-of-order bytes and waits for missing bytes to fill the gaps
-
TCP chooses random initial sequence numbers
- Minimizes chance some segment still in network from an earlier terminated connection between two hosts is mistaken for a valid segment
-
TELNET A CASE STUDY FOR SEQUENCE AND ACKNOWLEDGEMENT NUMBERS
- Host A initiates Telnet and becomes the client; Host B receives Host A's request and becomes the server
- Telnet server echoes whatever the client sent back to the client to ensure characters seen by the Telnet user have already been received and processed
-
Take the following example
- Let the starting sequence numbers be 42 and 79 for client and server which are the sequence numbers for the first byte in data field
- Let the TCP connection be already initiated
- Segment 1: Client sends ‘C’ to the server with the initial sequence numbers
- Segment 2: Server receives ‘C’ and echoes back with the ‘C’ with the ACK 43 meaning received everything up through byte 42 and now waiting for bytes 43 and onward
- Segment 3: Client acknowledges data has been received from the server and has an empty data field. ACK 80 as it is now expecting 80 and beyond
- Note: Seq and ACK values are reversed on the client/server side, seq always represents what that side is sending and ACK always represents what the other side is sending
3.5.3 Round-Trip Time Estimation and Timeout
- Recall timeout is required for recovery from lost packets
- Timeout should be larger than RTT, how much?
-
ESTIMATING THE ROUND-TRIP TIME
-
Let SampleRTT be the RTT measured for one of the transmitted but currently unacknowledged segments
- SampleRTT is NEVER computed from a retransmitted segment
-
We estimate the current RTT from the SampleRTT values
- EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT, with α typically 1/8
- This is a weighted average of SampleRTT values in which newer samples carry more weight
- An exponential weighted moving average (EWMA)
-
We can measure the variability in RTT with DevRTT
- DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|, with β typically 0.25
-
-
SETTING AND MANAGING THE RETRANSMISSION TIMEOUT INTERVAL
- We are given values of EstimatedRTT and DevRTT
-
Use the following equation to determine TimeoutInterval
- TimeoutInterval = EstimatedRTT + 4 · DevRTT
- The initial TimeoutInterval is 1 second, as recommended by RFC 6298
- When a timeout occurs, the TimeoutInterval is doubled to avoid premature timeouts for the segments that follow
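- A minimal sketch of the estimators above (the values in the example call are made up, in seconds):
ALPHA = 0.125    # weight on the newest SampleRTT
BETA = 0.25      # weight on the newest deviation sample

def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval

print(update_rtt(0.10, 0.05, 0.20))   # ~ (0.1125, 0.059375, 0.35)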
3.5.4 Reliable Data Transfer
- Recall IP is unreliable best effort service therefore it falls on TCP to provide reliable data service
- This is done with timers and retransmission on timeout events; because timer management carries considerable overhead, a single retransmission timer is used even when there are multiple unACKed segments
- Below is a SIMPLIFIED VERSION NOT FULL VERSION
-
-
3 cases as follows
- Data received from above, if timer is not running start the timer otherwise pass it to IP and update sequence number
- On timer timeout retransmit oldest unACKed packet and restart timer
-
ACK received, if received ACK number is larger than current SendBase (sequence number of oldest unACKed) update SendBase
- Because TCP uses cumulative ACKs if the current received ACK > SendBase we can assume everything before “received ACK” is correctly received.
- Also restart timer
-
-
A FEW INTERESTING SCENARIOS
- Refer to page 246 of the textbook this should be obvious by now
-
DOUBLING THE TIMEOUT INTERVAL
- If a timeout occurs double the timeout interval otherwise if it is a normal ACK (case 3 in the code above) we adjust using the EstimatedRTT equation
- This can be seen as a limited example of congestion control as we severely cut the transmission rate after we detect a timeout
-
FAST RETRANSMIT
-
How to detect lost packet before timeout? Use duplicate ACKs
- A duplicate ACK re-acknowledges a segment the receiver has already ACKed
- The receiver issues a duplicate ACK when it receives a segment whose sequence number is larger than the next expected in-order sequence number, i.e., it detects a gap
- It immediately re-acknowledges the last in-order byte of data it has received
-
Due to the large number of back to back segments we wait for 3 duplicate ACKs to fast retransmit the missing segment before timer expires
- Why 3 Duplicate ACKs?
- Since TCP does not know whether a duplicate ACK is caused by a lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to be received. It is assumed that if there is just a reordering of the segments, there will be only one or two duplicate ACKs before the reordered segment is processed, which will then generate a new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost.
-
The change to the ACK received condition to code above
- Refer to Table 3.2 for TCP ACK Generation Recommendation
-
3.5.5 Flow Control
- Arriving data is placed in the receive buffer, but it is not always read from the buffer immediately
- TCP offers flow-control service to make sure the receiver buffer does not overflow speed matching the sender and receiver
- Flow and congestion control are similar but done for different reasons
- For now let us assume TCP receiver discards out of order segments
- A receive window must be maintained at the sender representing the buffer space available at the receiver
-
At the receiver (Host B) the following must be defined
- LastByteRead: Last byte in data stream read from buffer by application
- LastByteRcvd: Last byte in data stream that has arrived from network and placed in buffer at B
-
The following condition MUST hold: LastByteRcvd - LastByteRead <= RcvBuffer
- The receive window is the spare room: rwnd = RcvBuffer - (LastByteRcvd - LastByteRead)
- To notify Host A of the amount of space left, Host B puts rwnd into every segment it sends to A
-
Host A then keeps track of two variables to make sure it does not overflow the receive buffer
- LastByteSent
- LastByteAcked
-
Host A ensures LastByteSent - LastByteAcked <= rwnd to avoid overflowing the receive buffer
- LastByteSent - LastByteAcked is exactly the amount of unacknowledged data in flight (see the sketch at the end of this section)
-
Possible Problem: rwnd=0 at B, B has nothing to send to A, A only knows rwnd=0
- Host B's application may then read everything, emptying the buffer
- Host A would never learn that space has opened up, so it would never send again
- To solve this, TCP requires Host A to keep sending segments with one data byte while B's receive window is zero; Host B acknowledges these segments, and sooner or later one of the ACKs will advertise a nonzero rwnd
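- A tiny numeric sketch of this bookkeeping (numbers are made up):
RcvBuffer = 65536
LastByteRcvd, LastByteRead = 12000, 8000
# receiver invariant: LastByteRcvd - LastByteRead <= RcvBuffer
rwnd = RcvBuffer - (LastByteRcvd - LastByteRead)
print(rwnd)   # 61536 bytes of spare room, advertised to the sender
# sender-side constraint using the advertised window:
# LastByteSent - LastByteAcked <= rwnd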
3.5.6 TCP Connection Management
- TCP connection establishment can add significantly long delays
- Often this handshake can actually be exploited like a SYN flood
-
Initiating Connection Steps
- 1. Client sends special TCP segment with SYN bit set to 1, client also chooses an initial sequence number (client_isn) this is known as the SYN segment
- 2. The TCP SYN segment arrives at the server, which allocates the connection's variables and buffers and replies with a segment that has the SYN bit set to 1, acknowledgment number client_isn+1, and its own initial sequence number server_isn; this is the SYNACK segment
- 3. The client receives the SYNACK, allocates its own buffers and variables, and sends a segment with acknowledgment number server_isn+1 and SYN=0, completing the handshake; this segment can carry application data, and all future segments also have SYN=0
- After connection ends, resources at hosts are deallocated
-
Ending Connection Steps
- 1. TCP client sends a segment with FIN=1
- 2. Server sends an ACK to client
- 3. Server sends its own segment with FIN=1
- 4. Client sends an ACK
-
TCP States, the lifespan of a TCP connection
-
- TIME_WAIT is often 30 seconds to 1 minute it exists to allow TCP client to resend final ACK in case the ACK is lost
-
-
Three possible outcomes from sending a SYN packet
- 1. Source host receives TCP SYNACK segment from target host, port is open and connection is established
- 2. Source host receives a TCP RST segment (RST=1) from the target host: the port is closed, and the RST means "please don't resend this segment"
- 3. Source host receives nothing, firewall has blocked port
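A minimal sketch of how these three outcomes show up to a program; this performs a full connect() rather than sending a raw SYN, and the host/port below are placeholders:

```python
import socket

def probe(host, port, timeout=2.0):
    """Classify a TCP port by attempting a full connection."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))       # SYNACK came back: connection made
        return "open"
    except ConnectionRefusedError:    # RST came back: port closed
        return "closed"
    except socket.timeout:            # nothing came back: likely firewalled
        return "no response (filtered?)"
    finally:
        s.close()

print(probe("127.0.0.1", 22))
```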
3.6 Principles of Congestion Control
3.6.1 The Causes and Costs of Congestion
-
SCENARIO 1: TWO SENDERS, A ROUTER WITH INFINITE BUFFERS
- Assume Hosts A and B each send lambda_in bytes/sec, and their traffic is routed through a router with outgoing link capacity R and an infinite buffer
- Assume data is sent into the socket only once, with A → Host C and B → Host D on the other side of the router
- As Hosts A and B together saturate the router (lambda_in → R/2 each), the average queueing delay grows without bound, even though per-connection throughput reaches R/2 (full link utilization, which by itself sounds good)
- If each sending rate exceeds R/2 (aggregate exceeds the link capacity R), the average number of queued packets grows without bound
- Per-connection throughput: Number of bytes per second at the receiver
- Large queueing delays are experienced as the packet arrival rate nears link capacity
-
- lambda_in = rate at which Host A sends data into the network (toward the router)
- lambda_out = rate at which data is delivered at the receiver
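To illustrate the delay growth, here is a toy calculation assuming a standard single-queue model in which average delay scales like 1/(R - aggregate arrival rate); this formula is an illustrative assumption, not the book's exact model:

```python
# Delay blows up as each sender's rate approaches R/2 (aggregate -> R).
R = 1.0  # router link capacity (normalized)

for lam in [0.10, 0.20, 0.30, 0.40, 0.45, 0.49]:  # per-sender lambda_in
    aggregate = 2 * lam                  # Host A + Host B
    delay = 1.0 / (R - aggregate)        # grows without bound near capacity
    print(f"lambda_in={lam:.2f}  throughput/conn={lam:.2f}  delay~{delay:6.1f}")
```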
-
SCENARIO 2: TWO SENDERS AND A ROUTER WITH FINITE BUFFERS
- Similar to Scenario 1, but the router buffer is now finite, and the connections are reliable (lost packets are retransmitted)
- lambda_in = rate at which the Host A application passes data to its transport layer
- lambda'_in = rate at which Host A's transport layer sends segments (original data plus retransmissions) into the network, known as the offered load
- lambda_out = rate delivered at the receiver, the per-connection throughput
- a) Host A magically knows when the router buffer is free and transmits only then, so no packets are lost and throughput equals the offered load (up to R/2)
-
b) Host A retransmits only when packet is lost (with proper timeout)
- Assume offered load = R/2
- Assume data delivered to receiver is R/3
- Thus, of the offered load R/2, R/3 (~0.333R) is original data and R/6 (~0.166R) is retransmitted data
- Note: Sender must perform retransmission for dropped packets due to buffer overflow as congestion increases
-
c) Host A retransmits prematurely
- Assume that on average each packet is forwarded twice by the router; throughput then approaches the asymptotic value R/4 as the offered load approaches R/2
- Note: Unneeded transmissions by sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet
-
SCENARIO 3: FOUR SENDERS, ROUTERS WITH FINITE BUFFERS, AND MULTIHOP PATHS
-
- The model: four hosts sending over multihop paths through routers R1 and R2 (see figure)
- Assume lambda_in is same for all hosts, all routers have capacity R bytes/sec and all hosts use a proper timeout/retransmission mechanism
- Consider the connection Host A → Host C through R1 and R2; A-C shares router R1 with the D-B connection and shares router R2 with the B-D connection
- For small values of lambda_in, lambda'_in is also small, so retransmissions are rare, buffer overflow is unlikely, and there is no problem
- For large lambda_in for both A-C and B-D in our specified routes we will have large lambda’_in for all connections
- B-D traffic at R2 will be much larger than A-C at R2
-
Because A-C traffic must first pass through R1, its arrival rate at R2 is limited by R1's output and is much smaller than B-D's arrival rate at R2; in the limit of heavy offered load, R2's buffer fills with B-D packets and the A-C throughput through R2 goes to 0
- And thus A-C end-to-end throughput goes to zero
- In this case we can see that we have wasted the work done by the first-hop router in forwarding a packet to the second-hop router
- Note: When a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted
-
3.6.2 Approaches to Congestion Control
- Two main approaches, end-to-end congestion control (no network assistance) and network-assisted congestion control (network-layer assistance)
-
End-to-end congestion control: Used by TCP as no network-layer assistance
- Presence of congestion in network is inferred by the end systems
- TCP segment loss (a timeout or triple duplicate ACK) is taken as an indication of network congestion, and TCP decreases its window size accordingly
-
Network-Assisted Congestion Control: routers provide explicit congestion feedback to the end systems
- Congestion information is typically fed back from the network to the sender in one of two ways
-
- Direct feedback from a network router to the sender, in the form of a choke packet
- A router marks/updates a field in packets flowing from sender → receiver; the receiver then notifies the sender of the congestion indication (this takes at least one RTT)
3.6.3 Network-Assisted Congestion-Control Example: ATM ABR Congestion Control
3.7 TCP Congestion Control
- TCP has each sender limit the rate at which it sends traffic as a function of perceived network congestion
-
Questions
- How does a TCP sender limit the rate at which it sends traffic into connection?
- How does TCP sender perceive congestion?
- What algorithm should sender use to change send rate as a function of perceived end-to-end congestion?
-
Recall that the sender tracks LastByteSent, LastByteAcked, and rwnd; we introduce a new variable cwnd (congestion window) and constrain the amount of unACKed data to
- LastByteSent - LastByteAcked <= min(cwnd, rwnd)
For now, assume the sender is limited solely by cwnd (rwnd is very large)
- Thus the sender's send rate is roughly cwnd/RTT bytes/sec
- Adjusting cwnd therefore changes the rate at which the sender pushes data into its connection
-
But how does TCP perceive congestion?
- Recall that losses occur on a timeout or triple duplicate ACKs
- When the network is congestion-free, arriving ACKs tell TCP its segments are being delivered, and TCP increases its congestion window; a loss event (timeout or triple duplicate ACK) is taken as a sign of congestion
- There is no explicit signaling of congestion by network
- TCP uses ACKs to trigger/clock its congestion window size and thus is called self-clocking
-
How do TCP senders determine their sending rates? Are TCP senders explicitly coordinated, or is there a distributed approach in which each sender sets its rate based only on local information?
- A lost segment implies congestion and hence the TCP sender’s rate should be decreased when a segment is lost
- An ACKed segment indicates that the network is delivering the sender’s segments to the receiver and hence, the sender’s rate can be increased when an ACK arrives for a previously unacknowledged segment
- Bandwidth Probing: the sender increases its rate as ACKs arrive until a loss event occurs, then backs off, and then begins increasing again to probe for whether the rate at which congestion onset begins has changed
- TCP congestion algorithm has 3 parts, slow start, congestion avoidance, and fast recovery
-
SLOW START
-
cwnd: Initialized to 1 MSS
- If MSS = 500 bytes and RTT = 200 msec, the initial sending rate is 500*8 bits / 200 msec = 20 kbps
-
In slow start, cwnd starts at 1 MSS and increases by 1 MSS for each ACK, doubling every RTT, as long as no loss occurs
- On a timeout: set ssthresh = cwnd/2 (the slow start threshold), reset cwnd = 1 MSS, and restart the doubling
- On a triple duplicate ACK: move into the fast recovery state
- When cwnd reaches ssthresh: move to the congestion avoidance state
-
-
CONGESTION AVOIDANCE
- Entered when cwnd ~ ssthresh
- Increase cwnd by MSS*(MSS/cwnd) bytes per ACK, i.e., roughly 1 MSS per RTT
- On a timeout: set ssthresh = cwnd/2, reset cwnd = 1 MSS, and return to the slow start state
- On a triple duplicate ACK: set ssthresh = cwnd/2 and cwnd = ssthresh + 3*MSS, then move to the fast recovery state (a less drastic reaction than the timeout case)
-
FAST RECOVERY
- Increase cwnd by 1 MSS for every duplicate ACK received for the missing segment that caused TCP to enter fast recovery, and continue sending new segments
- When the ACK for the missing segment arrives: set cwnd = ssthresh and move to the congestion avoidance state
- On a timeout: set ssthresh = cwnd/2, reset cwnd = 1 MSS, and move to the slow start state (see the state-machine sketch below)
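The three states above can be sketched as a small event-driven class; this is a simplification that updates cwnd once per RTT in whole-MSS units (real TCP updates per ACK), and the initial ssthresh here is arbitrary:

```python
MSS = 1  # work in units of one maximum segment size

class TcpCc:
    def __init__(self):
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * MSS        # arbitrary initial threshold
        self.state = "slow_start"

    def on_rtt_all_acked(self):
        """One RTT passed and all outstanding segments were ACKed."""
        if self.state == "slow_start":
            self.cwnd *= 2               # doubles every RTT
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += 1 * MSS         # ~1 MSS per RTT
        elif self.state == "fast_recovery":
            self.cwnd = self.ssthresh    # ACK for missing segment: deflate
            self.state = "congestion_avoidance"

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.ssthresh + 3 * MSS
        self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = 1 * MSS
        self.state = "slow_start"

cc = TcpCc()
for _ in range(8):
    cc.on_rtt_all_acked()
cc.on_triple_dup_ack()
print(cc.state, cc.cwnd)                 # fast_recovery, ssthresh + 3
```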
-
TCP CONGESTION CONTROL: RETROSPECTIVE
-
TCP congestion control is additive-increase, multiplicative-decrease (AIMD)
- Additive increase of 1 MSS per RTT; multiplicative decrease halves cwnd on a triple-duplicate-ACK loss event
-
This gives TCP throughput its characteristic sawtooth pattern over time
-
-
MACROSCOPIC DESCRIPTION OF TCP THROUGHPUT
- Let us ignore slow-start in our simplified study
- Let w be the window size (in bytes), RTT the round-trip time, and W the value of w when a loss event occurs
- If there are no losses, the transmission rate is roughly w/RTT
-
If there are losses then transmission rate ranges from W/(2*RTT) to W/RTT
- Assuming RTT and W are approximately constant
-
Assuming the rate increases linearly between these two values (spending equal time across the range), the average throughput of a connection is 0.75*W/RTT
-
TCP OVER HIGH-BANDWIDTH PATHS
- Using the loss-rate form of the throughput formula, average throughput = 1.22*MSS/(RTT*sqrt(L)), sustaining 10 Gbps with 1,500-byte segments and a 100 ms RTT requires a loss rate L of about 2*10^-10, i.e., TCP can tolerate only about one loss per 5 billion segments (see the check below)
- New versions of TCP are being developed for such high-speed environments
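A quick check of those numbers, solving the loss-rate form of the throughput formula for L, with MSS = 1,500 bytes and RTT = 100 ms as the example values:

```python
# average throughput ~ 1.22 * MSS / (RTT * sqrt(L))  =>  solve for L
MSS = 1500 * 8      # segment size in bits
RTT = 0.1           # round-trip time in seconds (100 ms)
target = 10e9       # desired throughput: 10 Gbps

L = (1.22 * MSS / (RTT * target)) ** 2
print(f"required loss rate L ~ {L:.2e}")        # ~ 2.1e-10
print(f"~ one loss per {1/L:.1e} segments")     # ~ 5 billion segments
```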
3.7.1 Fairness
- Fair: for K TCP connections sharing a bottleneck link of rate R, a congestion-control mechanism is fair if each connection gets an average rate of R/K
-
Suppose at some point in time two connections with the same MSS and RTT share a single bottleneck router of capacity R and are at point A in the following graph
-
- The goal is for the achieved throughputs to fall near the intersection of the equal-bandwidth-share line and the full-bandwidth-utilization line
-
At A
- The sum of the two connection rates is < R, so each connection increases its cwnd by 1 MSS per RTT
- The joint throughput eventually exceeds R (point B); both connections then see a loss event and halve their windows, moving to C
- This cycle repeats and converges toward the intersection point, even with multiple hosts sharing the link, so TCP should (!) be fair; see the toy simulation below
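A toy simulation of this additive-increase, multiplicative-decrease cycle; the starting rates are arbitrary and the link capacity is normalized:

```python
R = 100.0                    # bottleneck capacity
x1, x2 = 10.0, 60.0          # unequal starting throughputs (point A)

for _ in range(200):
    if x1 + x2 < R:
        x1 += 1.0            # additive increase: +1 MSS per RTT each
        x2 += 1.0
    else:
        x1 /= 2.0            # shared loss event: both halve their windows
        x2 /= 2.0

# The gap halves on every loss event, so the flows converge to equal shares.
print(round(x1, 1), round(x2, 1))
```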
-
-
TCP FAIRNESS AND UDP
- TCP can be unfair when connections have different RTTs: the connection with the smaller RTT grabs available bandwidth more quickly as it becomes free and thus gets higher throughput
-
TCP can also be unfair across parallel connections: an application that opens multiple parallel connections gets a larger fraction of the bandwidth on a congested link
- Suppose 9 TCP connections currently share a link of rate R; if application A joins with one connection, each connection gets R/10
- If instead application A opens 11 parallel connections, each connection gets R/20, but application A gets 11R/20 overall, since it owns 11 of the 20 connections
- UDP does not limit its send rate at all, so while TCP lowers its rate when it senses losses, UDP traffic can crowd out TCP traffic
Chapter 4: The Network Layer
4.1 Introduction
4.1.1 Forwarding and Routing
- Forwarding: The router-local action of moving a packet from an input link interface to the appropriate output link interface
-
Routing: The network-wide process of determining the path packets take from sender to receiver, computed by routing algorithms
- Centralized routing algorithm: Algorithm on central site which downloads routing tables to router
- Decentralized: Routing algorithm is run on each individual router
-
Routers use forwarding tables to determine where to forward a packet
- A value in an arriving packet's header indexes into the router's forwarding table, which indicates the outgoing link interface for that packet
- The header value can be either the packet's destination address or an indication of the connection to which the packet belongs
- Packet switch: General term for a device that forwards packets from input link interface to output link interface depending on header value
- Link-layer switches: Base forwarding decision on values in the fields of the link-layer frame (layer 2)
- Router: base forwarding decision on the value in the network-layer (layer 3)
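A toy destination-based lookup with made-up binary prefixes and interface numbers; real routers match on destination-address prefixes using longest-prefix matching:

```python
# Hypothetical forwarding table: address prefix -> outgoing link interface.
forwarding_table = {
    "11001000000101110001": 0,
    "1100100000010111000110": 1,
    "110010000001011100011": 2,
}

def lookup(dest_bits: str) -> int:
    """Return the interface for the longest matching prefix, else a default."""
    matches = [p for p in forwarding_table if dest_bits.startswith(p)]
    return forwarding_table[max(matches, key=len)] if matches else 3

# The 22-bit prefix is the longest match for this address -> interface 1.
print(lookup("11001000000101110001100010101010"))
```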
-
CONNECTION SETUP
- The network layer serves 2 purposes: forwarding and routing
- Some network architectures also have a third function, connection setup, analogous to the TCP 3-way handshake but involving the routers along the path
- ATM, frame relay, and MPLS require the routers along a path to handshake with each other to set up the connection
4.1.2 Network Service Models
- Network service model: Defines characteristics of end-to-end transport of packets between sender and receiver end systems
-
Possible services network layer may provide
- Guaranteed delivery: Guarantee packets will eventually arrive at destination
- Guaranteed delivery with bounded delay: Guarantee packets will arrive at the destination within a specified host-to-host delay bound (e.g., within 100 msec)
-
Services that can be provided for a flow of packets
- In-order packet delivery
- Guaranteed minimal bandwidth
- Guaranteed maximum jitter: Guarantee amount of time between the transmission of two successive packets at the sender is equal to the amount of time between their receipt at the destination
- Security services
4.2 Virtual Circuit and Datagram Networks