Week 1, Day 1
- Let’s start with an “old school” model of the web, back when everything was static data.
- Demonstrate visiting http://example.com or some other web page. Perhaps https://web.archive.org/web/19970212132224/http://www.gvsu.edu/
- At a high level, “using the web” was fundamentally
- Your computer asking another computer for some information
- The other computer sending that information
- Your computer displaying a “pretty picture” of that information
-
So, let’s start by going through what it takes for your computer to “ask for” and “receive” information from another computer.
- Actually, before we do that, let’s review the separation of implementation and interface.
- Alarm clock example.
- Recall linked lists and dynamic arrays (lists, vectors) from CIS 163.
- Both store items in a sequence (there is a first item, a second item, a third item, etc.)
- Both have the same fundamental operations (
get
,put
,add_at_end
,remove_from_end
) - If you implement both a linked list and a vector using the interface (i.e., the methods have identical signatures — names and parameters), then you can swap one for another without modifying your code (other than to specify which type of list to use). Your code should still run, just with different performance characteristics.
- This use of a common interface to different implementations is common in application design.
- Quite common in web development also (e.g., middleware).
- Another common use is the serial I/O interface.
- The sending of data from one place to another is often conceptualized as a stream.
- Think of it as writing 1s and 0s on ping pong balls and shoving them into a pipe and having them pop out the other end in the same order.
- Think about printing to the terminal. In some sense you are “shoving” letters into a pipe only to have them “pop out” on the screen.
- In python the
print
function is (more or less) a short-cut forsys.stdout.write
sys.stdout.write('Hello, World!\n')
- When you write to a file, you do something like this:
f = open('...')
f.write('Hello, World\n')
- (Notice the use of
write
in both cases.)
- You can write a function that is agnostic to where the data is being sent:
def send_message(output):
output.write('Hello, World!\n')
send_message(sys.stdout)
send_message(open('new_file.txt', 'w'))
- Key idea: The same code can write to either the screen or a file without needing any changes (other than the one line that sets up where the output goes).
- The same is true for reading data. If you use
sys.stdin.readline()
instead ofinput()
, code that reads from the keyboard is nearly identical to code that reads from a file. - When you run the
open()
function in Python, the library that implementsopen
returns to you an object withreadline
andwrite
methods that are somehow “connected” to the desired file. - There are similar built-in functions in Python that return to you an object with
readline
andwrite
methods that is “magically” connected to a program running on another computer.
import socket
# Connect to the other machine and set up the stream
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.example.com", 80))
stream = s.makefile('rw') # At this point, 'stream' is similar to what is returned from 'open'
# "Talk" to the other machine.
stream.write("GET / HTTP/1.0\n")
stream.write("host: www.example.com\n")
stream.write("\n")
stream.flush()
# Listen to the reply
headers = {}
for line in stream:
if len(line.strip()) == 0:
break
parts = line.strip().split(':')
if len(parts) != 2:
print(f"WTF: {parts}")
else:
key, value = parts
headers[key] = value
for line in stream:
print(line.strip())
Networking Basics
- To explain this code, we need a little more networking background.
- (Many of you already know this; but, let’s make sure everybody is on the same page.)
- We need to identify the machine we want to talk to.
- Machines are assigned IP addresses (e.g.,
93.184.215.14
)- If you have a machine you want to connect to the Internet, you need to get an IP from your ISP.
- The Internet Assigned Numbers Authority (IANA) manages the distribution of IP addresses.
- Humans prefer names to numbers (e.g.,
example.com
) - A web service called
DNS
(Domain Name System) takes a name and returns the corresponding number. - The networking library does this lookup for us, so we can ask for a connection to
example.com
, and as part of theconnect
call, the name gets converted to the number used by the underlying network.
- Machines are assigned IP addresses (e.g.,
- Also need to identify which program on the machine.
- Many different programs can run on one machine.
- We use a port to specify a program.
- Think of it like an old-school “exchange” on a telephone number.
- When a server program starts up, it needs to register with the operating system saying, effectively: “If somebody knocks on the door and asks for ‘port 80’, send the traffic to me”.
- For convenience and consistency, certain programs/services have a conventional port
- HTTP: 80
- HTTPS: 443
- SMTP (email): 25
- Secure email: 587
- SSH: 22
- https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
- Using a convention means we can remember the name only, and we don’t have to remember a random number along with each name.
- Programs must have administrator access to request ports below 1024.
- This prevents regular users from “spoofing” services (e.g. launching their own, unofficial web server on port 80).
- So, in the example above,
connect(('example.com', 80))
asks the networking system to open a connection to machine registered asexample.com
– specifically to the program that has registered as “listening” to port 80. - At this point, the two programs “talk” to each other by reading and writing to the established stream.
SSL
- In the “old days”, data was sent over the Internet using clear text.
- Anybody with access to any of the routers along the message’s path could read the message.
- The solution was to encrypt data before putting on the network.
- This encryption is often done using SSL: _S_ecure _S_ockets _L_ayer.
- Using SSL means adding handlers on each side of the stream to encrypt/decrypt data.
- Also need to add extra steps at the beginning to choose and exchange the secret key used for encryption.
- Secure connections typically use a different port.
- For example, 80 for unencrypted web traffic, and 443 for encrypted web traffic.
- 25 for unencrypted email and 587 for encrypted email.
- (Using a different port is how the client knows whether to open an encrypted or unencrypted connection.)
- Switch to Browsers and Servers notes