Douglas Comer defines a protocol as "a formal description of message formats and the rules two or more machines must follow to exchange those messages."
Protocols usually exist in two forms. First, they exist in a textual form for humans to understand. The majority of Internet protocols are distributed as RFCs, which can (and should) be read to understand the protocols' design and operation. Second, they exist as programming code for computers to understand. Both forms should ultimately specify the precise interpretation of every bit of every message exchanged across a network.
Protocols should exist at every point where logical program flow crosses between hosts or programs. In other words, we need protocols every time two different computers or programs need to agree on how they will communicate information between them. Every time we want to print something on a network printer we need protocols, otherwise there will be no agreement on how to pause the sending computer's output if the printer falls behind. Every time we want to download a file we need protocols, otherwise the computers will be unable to agree on which file should be downloaded. Every time we want to save our work on disk, we don't need protocols - unless the disk is on a network file server.
Usually multiple protocols will be in use simultaneously. For one thing, computers usually do several things at once, and often for several people at once. Therefore, most protocols support multitasking. Also, one operation can involve several protocols. For example, consider the NFS (Network File System) protocol. A write to a file is done with an NFS operation, which uses another protocol (RPC) to perform a function call on a remote host, which uses another protocol (UDP) to deliver a datagram to a port on a remote host, which uses another protocol to deliver a datagram on an Ethernet, and so on. Along the way we may need to look up host names (using the DNS protocol), convert data to a network standard form (using the XDR protocol), and find a routing path to the host (using one or more of numerous routing protocols) - I think you get the idea.
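The layering described above can be sketched in a few lines: each layer treats the data handed down from the layer above as an opaque payload and prepends its own header. The RPC header format below is purely illustrative (a single procedure number), not the real ONC RPC encoding, and the IP layer is omitted for brevity; the UDP and Ethernet header layouts follow the real field order.

```python
import struct

def rpc_wrap(payload: bytes, procedure: int) -> bytes:
    # Hypothetical RPC header: a 4-byte procedure number (illustrative only).
    return struct.pack("!I", procedure) + payload

def udp_wrap(payload: bytes, src_port: int, dst_port: int) -> bytes:
    # UDP header fields: source port, destination port, length, checksum
    # (checksum left as 0 here for simplicity).
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload

def ethernet_wrap(payload: bytes, dst: bytes, src: bytes) -> bytes:
    # Ethernet II frame header: destination MAC, source MAC, EtherType.
    return dst + src + struct.pack("!H", 0x0800) + payload

nfs_write = b"WRITE /home/user/file ..."   # pretend NFS request bytes
frame = ethernet_wrap(
    udp_wrap(rpc_wrap(nfs_write, procedure=7), src_port=1023, dst_port=2049),
    dst=b"\x00" * 6, src=b"\xff" * 6)

# The original NFS bytes survive intact inside the nested headers:
assert nfs_write in frame
```

Notice that no layer needs to understand the layers above it; that independence is exactly what lets one operation involve several protocols at once.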
Initially, protocols were specified using an explicit description of how every bit in a binary message should be interpreted. For example, RFC 791 Section 3.1, part of the IP protocol, specifies the exact interpretation of every bit in the IP packet header. In more recent years, it has become popular to specify protocols using a higher-level description that avoids such tedious bit-level detail while remaining unambiguous. Two popular means of doing this are ASCII request/reply and ASN.1.
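To see what a bit-level specification like RFC 791 Section 3.1 actually pins down, here is a sketch that decodes some fields of the fixed 20-byte IP header. The offsets and field widths follow RFC 791; the example header is hand-built for illustration.

```python
import struct

def parse_ip_header(data: bytes) -> dict:
    # RFC 791 fixed header: each field sits at an exact bit offset.
    ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum, src, dst = \
        struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,        # high 4 bits of the first byte
        "ihl": ver_ihl & 0x0F,          # header length in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,              # e.g. 6 = TCP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hand-built header: version 4, IHL 5, TTL 64, protocol 17 (UDP),
# checksum left as 0 for simplicity.
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 1, 0, 64, 17, 0,
                  bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
fields = parse_ip_header(hdr)
```

Every sender and receiver must agree on this layout down to the bit, which is exactly why the explicit style is unambiguous but tedious to write and read.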
In addition to specifying message formats, a protocol may also specify when certain messages are allowed to occur. For example, a file transfer protocol may not allow a READ message until after an OPEN message has been successfully transferred. State diagrams are the most popular way to do this (see RFC 793 Section 3.2 for an example), though ITU-T standards use a formal graphical syntax called SDL.
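A rule like "no READ before a successful OPEN" can be enforced with a small state machine. The states and message names below are illustrative, standing in for the hypothetical file transfer protocol above rather than any real one.

```python
# Each (state, message) pair maps to the next state; anything not listed
# is a protocol violation.
TRANSITIONS = {
    ("CLOSED", "OPEN"):  "OPENED",
    ("OPENED", "READ"):  "OPENED",   # reads may repeat once open
    ("OPENED", "CLOSE"): "CLOSED",
}

def handle(state: str, message: str) -> str:
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"{message} not allowed in state {state}")

state = "CLOSED"
state = handle(state, "OPEN")    # legal: CLOSED -> OPENED
state = handle(state, "READ")    # legal: stays OPENED
# handle("CLOSED", "READ") would raise ValueError: READ before OPEN.
```

A state diagram in a specification is just this table drawn as circles and arrows; the SDL diagrams in ITU-T standards formalize the same idea.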
One of the challenges facing network designers is to construct protocols that are as specific as possible to one function. For example, I consider NFS a good protocol design because one protocol does file transport (NFS), one protocol does procedure calls (RPC), etc. If you need to make a remote procedure call to print a file, you already have the RPC protocol, which does almost everything you need. Add one piece to the puzzle - a printing protocol, defined in terms of the RPC protocol - and your job is done.
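The layering argument above can be sketched as follows. The RPC transport here is a stand-in (a local table of handlers), not a real RPC implementation, and the `print_file` procedure is a hypothetical printing protocol; the point is that the printing protocol is nothing more than a set of named procedures registered on an existing RPC layer.

```python
class RpcServer:
    """Stand-in for a generic RPC layer: register and call named procedures."""

    def __init__(self):
        self.procedures = {}

    def register(self, name, func):
        self.procedures[name] = func

    def call(self, name, *args):
        # A real RPC layer would marshal the arguments, send them across
        # the network, and unmarshal the reply; the interface is the same.
        return self.procedures[name](*args)

# The "printing protocol" is just the procedures it registers:
spool = []

def print_file(name, data):
    spool.append((name, data))
    return len(spool)          # job id

server = RpcServer()
server.register("print_file", print_file)

job_id = server.call("print_file", "report.txt", b"hello")
```

All the hard problems (marshalling, transport, error handling) stay in the RPC layer; the printing protocol only defines its procedures and their meanings.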
On the other hand, I do not consider TCP a very good protocol, because it mixes two functions: reliable data delivery and connection-oriented streams. Consequently, the Internet lacks a good, reliable datagram delivery mechanism, because TCP's reliable delivery techniques, while effective, are specific to stream connections.