You should have user honeypots. Once in a while, present a fake certificate. If the user ignores the wrong fingerprint and types in the correct password, reset the account password.
That is an interesting idea. It is easy to MITM our SSH client connections. But, this control comes with a large expense. Because it is easy for our clients to see Security's actions, and it is hard for them to see the actions of attackers, they will conclude that Security is being evil for no good reason. This will greatly reduce our effectiveness by isolating Security from our community. Other controls may mitigate this problem with less expense.
For example, we are currently pushing our people to adopt widespread 2-factor authentication. Our people are ready to accept 2-factor. They understand its value. They are familiar with its use. We have multiple cheap 2-factor solutions. 2-factor somewhat mitigates MITM and also helps with other issues.
That said, I think we really need a simpler form of SSH for trusted point-to-point communications. It should exclusively use pre-distributed one-time pads for its authentication and encryption. We can now generate and distribute 100+ gigabyte files of true-random data. This data can be used to authenticate. It can be used to generate secure symmetric encryption keys. We can handle millions of secure connections before we need to redistribute pads again.
Since I am not a cryptographer, this idea has many problems. But I believe that securely using these huge one-time pads could be as easy as:
- Ask Schneier for a good, symmetric encryption algorithm :)
- Select a key size that is twice as long as Schneier thinks we need :) So, if Schneier thinks 512 bits are fine, we use 1024-bit keys. This is only 128 bytes.
- Generate about 128 gigabytes of random data from a truly random noise source. Use 64 gigabytes of it for connection keys. At 128 bytes per key, that will allow about 512 million connections. This may be excessive and need to be adjusted.
- Use the rest of the random data 2 gigabytes at a time. This gives you 32 records. The server always gets the first copy/install of the file. The server always uses the first record. Each subsequent client copy/install uses the data in its record for install identification and session identification. This may not be enough records. It may need to be adjusted. But it probably should not increase to hundreds. If there are too many copies, it is impossible to protect confidentiality.
- Throw away the first key record. You can spare some. Use that space to write down the GMT time-stamp when this file was created and the number of times the file has been copied.
- Use the next key record as the FileID for this file.
- The server only tries to use 1 pad file at a time.
- When the server starts up, it skips down the number of keys indicated by its current key index or the number of minutes since pad creation, whichever is greater. If the server detects that GMT time is running backwards, it should terminate with a descriptive error message.
- Every minute, it switches to the next key in the list. Don't worry, this will only use up about 10.5 million of your possible keys in 20 years. The server should not attempt to respond to more than one connection attempt per second.
- Whenever the server has authenticated a successful connection, it switches to the next key in the list.
- When something pokes its port, the server assembles a message that says something like: Number of non-padding bytes in message. Message Type 0. Server Message #1. I have received 0 of your messages. I am copy 1 of the file with the ID of #FileID. My Copy ID is (the first field in my Copy ID Record). The local time is (current time). The number of times I have incremented keys is: (CurrentKeyIndex). The number of successful connections is (ConnectionNumber). The authentication number for this connection is (use ConnectionNumber to index into the Copy ID Record and retrieve the value). Optional padding. End of Server Message #1.
- Then the server encrypts all that info using the current encryption key and sends it out to the client. It should all fit in a standard ethernet/IP/TCP packet. All messages must be padded to the same length. A good starting message length is probably 1400 bytes.
- The client uses the current time as a guess at a starting index into the key data. It should probably start 1 before to allow for sloppy timekeeping. It sequentially tries each key until it manages to decrypt the server's message. It should probably give up and fail with an error if it tries more than 20 keys. This number may need adjusting. When it fails, the client drops the connection without saying anything.
- If the client decodes the server message, it then checks its own expected and calculated information against the info provided by the server. If it doesn't check out, it drops the connection and sends an urgent error message that somebody is attempting to mimic the server using a replay attack. If it checks out, it uses the key to encrypt its response. It also updates its CurrentKeyIndex.
- The response of the client looks like: Number of non-padding bytes in message. Message Type 1. The latest message I have decoded from you is (LastServerMessageNumber). This is my Message #1. Nice to meet you. I am copy (whatever) of the file with the ID of #FileID. My Copy ID is (the first field in my Copy ID Record). My local time is (timestamp). I have now updated and crossed off (CurrentKeyIndex) number of keys. My number of successful connections is (ConnectionNumber). The authentication number for this connection is (use ConnectionNumber to index into the Copy ID Record and retrieve the value). Optional padding. End of Client Message #1.
- Then the server checks the client's supplied info for inconsistencies. If it fails, the server crosses off the key, drops the connection, and sends an urgent error message that somebody is attempting to mimic the client via a replay attack. If it checks out, the server sends an encrypted acknowledgement and updates its status information on that copy of the file.
- Once the client receives the acknowledgement, it updates its info on the server. Then both sides continue the encrypted conversation. The conversation looks like a sequence of encrypted messages.
- Most messages have the same format: Number of non-padding bytes in message. Message Type 2. Timestamp. From (Copy #) To (Copy #). Your latest message was (whatever). This is my message (whatever). [MESSAGE CONTENTS] Optional padding. End of message (whatever).
- You will also need some utility messages. A NAK may look like: Number of non-padding bytes in message. Message Type 3. Timestamp. From (Copy #) To (Copy #). Please re-transmit everything after message (whatever). This is my message (whatever). Optional padding. End of message (whatever).
- A FIN may look like: Number of non-padding bytes in message. Message Type 4. Timestamp. From (Copy #) To (Copy #). Your latest message was (whatever). Time to say goodbye. Optional padding. End of message (whatever).
- A Change Key may look like: Number of non-padding bytes in message. Message Type 5. Timestamp. From (Copy #) To (Copy #). Your latest message was (whatever). I'm feeling paranoid. Let's change to the next key. Optional padding. End of message (whatever).
- An Oh Shit may look like: Number of non-padding bytes in message. Message Type 6. Timestamp. From (Copy #) To (Copy #). Somebody just showed up with an NSL. I'm wiping my key-files/one-time pads. You should wipe this key-file/pad. Send lawyers, money, guns. So long and thanks for all the fish. Optional padding. End of message (whatever).
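The pad arithmetic and the key-selection rules in the list above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the constant and function names (KEY_BYTES, server_start_index, client_candidates, and so on) are made up for illustration, the layout assumes the key area comes first in the file, the reserved timestamp and FileID records are not modeled, and nothing here touches a real pad or does any encryption.

```python
from datetime import datetime, timedelta, timezone

# Figures taken from the list above; the names are hypothetical.
KEY_BYTES = 128                   # 1024-bit keys
PAD_BYTES = 128 * 2**30           # whole pad file
KEY_AREA_BYTES = 64 * 2**30       # assumed to be the first half of the file
RECORD_BYTES = 2 * 2**30          # one copy record per install

NUM_KEYS = KEY_AREA_BYTES // KEY_BYTES
NUM_RECORDS = (PAD_BYTES - KEY_AREA_BYTES) // RECORD_BYTES


def key_offset(index: int) -> int:
    """Byte offset of key number `index` inside the pad file."""
    if not 0 <= index < NUM_KEYS:
        raise IndexError("pad exhausted or bad key index")
    return index * KEY_BYTES


def minutes_since(created: datetime, now: datetime) -> int:
    """Whole minutes elapsed since the pad was created."""
    return (now - created) // timedelta(minutes=1)


def server_start_index(saved_index: int, created: datetime,
                       last_seen: datetime, now: datetime) -> int:
    """Startup rule: the greater of the saved index and minutes since creation."""
    if now < last_seen:
        raise RuntimeError("GMT clock is running backwards; refusing to start")
    return max(saved_index, minutes_since(created, now))


def client_candidates(created: datetime, now: datetime,
                      slack: int = 1, max_tries: int = 20) -> range:
    """Key indexes the client tries: start one early, give up after 20."""
    start = max(0, minutes_since(created, now) - slack)
    return range(start, min(start + max_tries, NUM_KEYS))


if __name__ == "__main__":
    created = datetime(2020, 1, 1, tzinfo=timezone.utc)
    now = created + timedelta(minutes=100)
    print(NUM_KEYS, NUM_RECORDS)
    print(server_start_index(5, created, created, now))
    print(list(client_candidates(created, now))[:3])
```

With these figures the key area holds 536,870,912 keys, which is where "about 512 million" comes from, and one key per minute for 20 years uses roughly 10.5 million of them. Note that the client window here follows the list literally and guesses from the clock alone; once the server has also advanced its index for successful connections, the client would need to take its own recorded key index into account as well.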
As you can see, this system is very simple, crude, and inefficient. We are just re-implementing the old concepts of secure phones using 1-time pads. None of this is new. We can use simple logic because we don't want or need complexity. It allows for 1 server and multiple clients. You have to redo this logic to have more than one server per pad/keyfile. It only solves one problem, but it is so simple that it should eliminate almost all opportunity for logic and programming flaws. Remember, complexity is the enemy. We don't care about efficiency. We want security. The NSA has used feature creep to corrupt many forms of existing crypto.
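To give a feel for how little logic the framing needs, here is a rough Python sketch of the layout that all the message types share: a count of non-padding bytes, a message type, the contents, then padding up to a fixed 1400 bytes. The 3-byte header encoding, the MsgType names, and the zero-byte padding are assumptions made for illustration; the proposal does not specify them. Encryption with the current pad key would be applied to the whole frame afterwards and is not shown.

```python
import struct
from enum import IntEnum

MESSAGE_BYTES = 1400               # fixed size suggested in the list
HEADER = struct.Struct(">HB")      # hypothetical: 2-byte length, 1-byte type


class MsgType(IntEnum):
    # Type numbers come from the list; the names are made up here.
    SERVER_HELLO = 0
    CLIENT_HELLO = 1
    DATA = 2
    NAK = 3
    FIN = 4
    CHANGE_KEY = 5
    WIPE = 6                       # the "Oh Shit" message


def pack_message(msg_type: MsgType, body: bytes) -> bytes:
    """Build one fixed-length plaintext frame, padded with zero bytes."""
    used = HEADER.size + len(body)          # number of non-padding bytes
    if used > MESSAGE_BYTES:
        raise ValueError("body too long for one fixed-size message")
    padding = bytes(MESSAGE_BYTES - used)
    return HEADER.pack(used, int(msg_type)) + body + padding


def unpack_message(frame: bytes) -> tuple[MsgType, bytes]:
    """Split a decrypted frame back into its type and contents."""
    if len(frame) != MESSAGE_BYTES:
        raise ValueError("frame has the wrong length")
    used, raw_type = HEADER.unpack_from(frame)
    if not HEADER.size <= used <= MESSAGE_BYTES:
        raise ValueError("bad non-padding byte count")
    return MsgType(raw_type), frame[HEADER.size:used]


if __name__ == "__main__":
    frame = pack_message(MsgType.DATA, b"hello")
    print(len(frame), unpack_message(frame))
```

Every frame comes out exactly MESSAGE_BYTES long regardless of the contents, so an observer learns nothing from message sizes.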
This proposal is connection oriented, but it can run on TCP or UDP or ICMP. You probably want to use TCP to reduce spoofing, DoS opportunities and sort out some of the low level attacks. If you do, you have to remember that you can't trust TCP to eliminate spoofing or verify message delivery.
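Since TCP cannot be trusted to verify delivery, the message numbers carried in every message have to do that job. A minimal sketch, assuming messages are numbered from 1 in each direction and that a gap triggers the NAK described earlier; the Receiver class and its return values are hypothetical.

```python
class Receiver:
    """Tracks the peer's message numbers and decides when to send a NAK."""

    def __init__(self) -> None:
        self.last_in_order = 0     # highest message number accepted so far

    def accept(self, number: int) -> tuple[str, int]:
        """Classify an incoming, already decrypted message by its number."""
        if number == self.last_in_order + 1:
            self.last_in_order = number
            return ("deliver", number)
        if number <= self.last_in_order:
            # Already seen: a retransmission, or possibly a replay.
            return ("duplicate", number)
        # Gap: ask for everything after the last message we accepted.
        return ("nak", self.last_in_order)


if __name__ == "__main__":
    r = Receiver()
    for n in (1, 3, 2, 2):
        print(n, r.accept(n))
```

The ("nak", n) result maps onto the NAK message's "Please re-transmit everything after message (whatever)" field.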