author | Erik Nordmark <Erik.Nordmark@Sun.COM> | 2009-11-11 11:49:49 -0800 |
---|---|---|
committer | Erik Nordmark <Erik.Nordmark@Sun.COM> | 2009-11-11 11:49:49 -0800 |
commit | bd670b35a010421b6e1a5536c34453a827007c81 (patch) | |
tree | 97c2057b6771dd40411a12eb89d2db2e2b2cce31 /usr/src/uts/common/inet/ip.h | |
parent | b3388e4fc5f5c24c8a39fbe132a00b02dae5b717 (diff) | |
PSARC/2009/331 IP Datapath Refactoring
PSARC/2008/522 EOF of 2001/070 IPsec HW Acceleration support
PSARC/2009/495 netstat -r flags for blackhole and reject routes
PSARC/2009/496 EOF of XRESOLV
PSARC/2009/494 IP_DONTFRAG socket option
PSARC/2009/515 fragmentation controls for ping and traceroute
6798716 ip_newroute delenda est
6798739 ARP and IP are too separate
6807265 IPv4 ip2mac() support
6756382 Please remove Venus IPsec HWACCEL code
6880632 sendto/sendmsg never returns EHOSTUNREACH in Solaris
6748582 sendmsg() return OK, but doesn't send message using IPv4-mapped x IPv6 addr
1119790 TCP and path mtu discovery
4637227 should support equal-cost multi-path (ECMP)
5078568 getsockopt() for IPV6_PATHMTU on a non-connected socket should not succeed
6419648 "AR* contract private note" should be removed as part of ATM SW EOL
6274715 Arp could keep the old entry in the cache while it waits for an arp response
6605615 Remove duplicated TCP/IP opt_set/opt_get code; use conn_t
6874677 IP_TTL can be used to send with ttl zero
4034090 arp should not let you delete your own entry
6882140 Implement IP_DONTFRAG socket option
6883858 Implement ping -D option; traceroute -F should work for IPv6 and shared-IP zones
1119792 TCP/IP black hole detection is broken on receiver
4078796 Directed broadcast forwarding code has problems
4104337 restrict the IPPROTO_IP and IPPROTO_IPV6 options based on the socket family
4203747 Source address selection for source routed packets
4230259 pmtu is increased every ip_ire_pathmtu_interval timer value.
4300533 When sticky option ipv6_pktinfo set to bogus address subsequent connect time out
4471035 ire_delete_cache_gw is called through ire_walk unnecessarily
4514572 SO_DONTROUTE socket option doesn't work with IPv6
4524980 tcp_lookup_ipv4() should compare the ifindex against tcpb->tcpb_bound_if
4532714 machine fails to switch quickly among failed default routes
4634219 IPv6 path mtu discovery is broken when using routing header
4691581 udp broadcast handling causes too many replicas
4708405 mcast is broken on machines when all interfaces are IFF_POINTOPOINT
4770457 netstat/route: source address of interface routes pretends to be gateway address
4786974 use routing table to determine routes/interface for multicast
4792619 An ip_fanout_udp_ipc_v6() routine might lead to some simpler code
4816115 Nuke ipsec_out_use_global_policy
4862844 ipsec offload corner case
4867533 tcp_rq and tcp_wq are redundant
4868589 NCEs should be shared across an IPMP group
4872093 unplumbing an improper virtual interface panics in ip_newroute_get_dst_ill()
4901671 FireEngine needs some cleanup
4907617 IPsec identity latching should be done before sending SYN-ACK
4941461 scopeid and IPV6_PKTINFO with UDP/ICMP connect() does not work properly
4944981 ip does nothing with IP6I_NEXTHOP
4963353 IPv4 and IPv6 proto fanout codes could be brought closer
4963360 consider passing zoneid using ip6i_t instead of ipsec_out_t in NDP
4963734 new ip6_asp locking is used incorrectly in ip_newroute_v6()
5008315 IPv6 code passes ip6i_t to IPsec code instead of ip6_t
5009636 memory leak in ip_fanout_proto_v6()
5092337 tcp/udp option handling can use some cleanup
5035841 Solaris can fail to create a valid broadcast ire
5043747 ar_query_xmit: Could not find the ace
5051574 tcp_check_policy is missing some checks
6305037 full hardware checksum is discarded when there're more than 2 mblks in the chain
6311149 ip.c needs to be put through a woodchipper
4708860 Unable to reassemble CGTP fragmented multicast packets
6224628 Large IPv6 packets with IPsec protection sometimes have length mismatch.
6213243 Solaris does not currently support Dead Gateway Detection
5029091 duplicate code in IP's input path for TCP/UDP/SCTP
4674643 through IPv6 CGTP routes, the very first packet is sent only after a while
6207318 Multiple default routes do not round robin connections to routers.
4823410 IP has an inconsistent view of link mtu
5105520 adding interface route to down interface causes ifconfig hang
5105707 advanced sockets API introduced some dead code
6318399 IP option handling for icmp and udp is too complicated
6321434 Every dropped packet in IP should use ip_drop_packet()
6341693 ifconfig mtu should operate on the physical interface, not individual ipif's
6352430 The credentials attached to an mblk are not particularly useful
6357894 uninitialised ipp_hoplimit needs to be cleaned up.
6363568 ip_xmit_v6() may be missing IRE releases in error cases
6364828 ip_rput_forward needs a makeover
6384416 System panics when running as multicast forwarder using multicast tunnels
6402382 TX: UDP v6 slowpath is not modified to handle mac_exempt conns
6418413 assertion failed ipha->ipha_ident == 0||ipha->ipha_ident == 0xFFFF
6420916 assertion failures in ipv6 wput path
6430851 use of b_prev to store ifindex is not 100% safe
6446106 IPv6 packets stored in nce->nce_qd_mp will be sent with incorrect tcp/udp checksums
6453711 SCTP OOTB sent as if generated by global zone
6465212 ARP/IP merge should remove ire_freemblk.esballoc
6490163 ip_input() could misbehave if the first mblk's size is not big enough
6496664 missing ipif_refrele leads to reference leak and deferred crash in ip_wput_ipsec_out_v6
6504856 memory leak in ip_fanout_proto_v6() when using link local outer tunnel addresses
6507765 IRE cache hash function performs badly
6510186 IP_FORWARD_PROG bit is easily overlooked
6514727 cgtp ipv6 failure on snv54
6528286 MULTIRT (CGTP) should offload checksum to hardware
6533904 SCTP: doesn't support traffic class for IPv6
6539415 TX: ipif source selection is flawed for unlabeled gateways
6539851 plumbed unworking nic blocks sending broadcast packets
6564468 non-solaris SCTP stack over rawip socket: netstat command counts rawipInData not rawipOutDatagrams
6568511 ipIfStatsOutDiscards not bumped when discarding an ipsec packet on the wrong NIC
6584162 tcp_g_q_inactive() makes incorrect use of taskq_dispatch()
6603974 round-robin default with many interfaces causes infinite temporary IRE thrashing
6611750 ilm_lookup_ill_index_v4 was born an orphan
6618423 ip_wput_frag_mdt sends out packets that void pfhooks
6620964 IRE max bucket count calculations performed in ip_ire_init() are flawed
6626266 various _broadcasts seem redundant
6638182 IP_PKTINFO + SO_DONTROUTE + CIPSO IP option == panic
6647710 IPv6 possible DoS vulnerability
6657357 nce should be kmem_cache alloc'ed from an nce_cache.
6685131 ilg_add -> conn_ilg_alloc interacting with conn_ilg[] walkers can cause panic.
6730298 adding 0.0.0.0 key with mask != 0 causes 'route delete default' to fail
6730976 vni and ipv6 doesn't quite work.
6740956 assertion failed: mp->b_next == 0L && mp->b_prev == 0L in nce_queue_mp_common()
6748515 BUMP_MIB() is occasionally done on the wrong ill
6753250 ip_output_v6() `notv6' error path has an errant ill_refrele()
6756411 NULL-pointer dereference in ip_wput_local()
6769582 IP must forward packet returned from FW-HOOK
6781525 bogus usesrc usage leads directly to panic
6422839 System panicked in ip_multicast_loopback due to NULL pointer dereference
6785521 initial IPv6 DAD solicitation is dropped in ip_newroute_ipif_v6()
6787370 ipnet devices not seeing forwarded IP packets on outgoing interface
6791187 ip*dbg() calls in ip_output_options() claim to originate from ip_wput()
6794047 nce_fp_mp prevents sharing of NCEs across an IPMP group
6797926 many unnecessary ip0dbg() in ip_rput_data_v6
6846919 Packet queued for ND gets sent in the clear.
6856591 ping doesn't send packets with DF set
6861113 arp module has incorrect dependency path for hook module
6865664 IPV6_NEXTHOP does not work with TCP socket
6874681 No ICMP time exceeded when a router receives packet with ttl = 0
6880977 ip_wput_ire() uses over 1k of stack
6595433 IPsec performance could be significantly better when calling hw crypto provider synchronously
6848397 ifconfig down of an interface can hang.
6849602 IPV6_PATHMTU size issue for UDP
6885359 Add compile-time option for testing pure IPsec overhead
6889268 Odd loopback source address selection with IPMP
6895420 assertion failed: connp->conn_helper_info == NULL
6851189 Routing-related panic occurred during reboot on T2000 system running snv_117
6896174 Post-async-encryption, AH+ESP packets may have misinitialized ipha/ip6
6896687 iptun presents IPv6 with an MTU < 1280
6897006 assertion failed: ipif->ipif_id != 0 in ip_sioctl_slifzone_restart
Diffstat (limited to 'usr/src/uts/common/inet/ip.h')
-rw-r--r-- | usr/src/uts/common/inet/ip.h | 2395 |
1 files changed, 1163 insertions, 1232 deletions
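The byte-order specific `IPH_*_HTONS` constants that this change adds to ip.h are the host-order flag values already swapped into network byte order, presumably so that a fragment field can be tested as it sits in the header without a per-packet `ntohs()`. The sketch below is a user-level illustration under stated assumptions, not code from the commit: the constant values are copied from the diff, but `frag_has_df()` and `iph_htons_constants_ok()` are hypothetical helpers, and the `__BYTE_ORDER__` test is a GCC/Clang stand-in for the header's `_BIG_ENDIAN` check (little-endian is assumed when it is undefined).

```c
#include <arpa/inet.h>	/* htons() */
#include <stdint.h>

/* Host-order values, as defined in ip.h. */
#define	IPH_DF		0x4000	/* Don't fragment */
#define	IPH_MF		0x2000	/* More fragments to come */
#define	IPH_OFFSET	0x1FFF	/* Where the offset lives */

/*
 * Byte-order specific values, copied from the diff. The kernel header
 * keys on _BIG_ENDIAN; the compiler-provided __BYTE_ORDER__ used here
 * is an assumption made so the sketch builds outside the kernel.
 */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define	IPH_DF_HTONS		0x4000
#define	IPH_MF_HTONS		0x2000
#define	IPH_OFFSET_HTONS	0x1FFF
#else
#define	IPH_DF_HTONS		0x0040
#define	IPH_MF_HTONS		0x0020
#define	IPH_OFFSET_HTONS	0xFF1F
#endif

/*
 * Hypothetical helper: returns nonzero if the DF bit is set in a
 * fragment-offset-and-flags field that is still in network byte order.
 */
int
frag_has_df(uint16_t fragoff_netorder)
{
	return ((fragoff_netorder & IPH_DF_HTONS) != 0);
}

/*
 * Hypothetical self-check: returns nonzero if every *_HTONS constant
 * equals htons() applied to its host-order counterpart.
 */
int
iph_htons_constants_ok(void)
{
	return (htons(IPH_DF) == IPH_DF_HTONS &&
	    htons(IPH_MF) == IPH_MF_HTONS &&
	    htons(IPH_OFFSET) == IPH_OFFSET_HTONS);
}
```

Comparing against a precomputed constant keeps the byte swap at compile time; the self-check is the kind of assertion one might run once to confirm that the two sets of constants agree on the build platform.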
diff --git a/usr/src/uts/common/inet/ip.h b/usr/src/uts/common/inet/ip.h
index 5a7e05b210..88a14068bb 100644
--- a/usr/src/uts/common/inet/ip.h
+++ b/usr/src/uts/common/inet/ip.h
@@ -55,8 +55,6 @@ extern "C" {
 #include <sys/squeue.h>
 #include <net/route.h>
 #include <sys/systm.h>
-#include <sys/multidata.h>
-#include <sys/list.h>
 #include <net/radix.h>
 #include <sys/modhash.h>
 
@@ -94,6 +92,7 @@ typedef uint32_t ipaddr_t;
 
 /* Number of bits in an address */
 #define IP_ABITS 32
+#define IPV4_ABITS IP_ABITS
 #define IPV6_ABITS 128
 
 #define IP_HOST_MASK (ipaddr_t)0xffffffffU
@@ -101,14 +100,6 @@ typedef uint32_t ipaddr_t;
 #define IP_CSUM(mp, off, sum) (~ip_cksum(mp, off, sum) & 0xFFFF)
 #define IP_CSUM_PARTIAL(mp, off, sum) ip_cksum(mp, off, sum)
 #define IP_BCSUM_PARTIAL(bp, len, sum) bcksum(bp, len, sum)
-#define IP_MD_CSUM(pd, off, sum) (~ip_md_cksum(pd, off, sum) & 0xffff)
-#define IP_MD_CSUM_PARTIAL(pd, off, sum) ip_md_cksum(pd, off, sum)
-
-/*
- * Flag to IP write side to indicate that the appln has sent in a pre-built
- * IP header. Stored in ipha_ident (which is otherwise zero).
- */
-#define IP_HDR_INCLUDED 0xFFFF
 
 #define ILL_FRAG_HASH_TBL_COUNT ((unsigned int)64)
 #define ILL_FRAG_HASH_TBL_SIZE (ILL_FRAG_HASH_TBL_COUNT * sizeof (ipfb_t))
@@ -137,17 +128,12 @@ typedef uint32_t ipaddr_t;
 
 #define UDPH_SIZE 8
 
-/* Leave room for ip_newroute to tack on the src and target addresses */
-#define OK_RESOLVER_MP(mp) \
-	((mp) && ((mp)->b_wptr - (mp)->b_rptr) >= (2 * IP_ADDR_LEN))
-
 /*
  * Constants and type definitions to support IP IOCTL commands
  */
 #define IP_IOCTL (('i'<<8)|'p')
 #define IP_IOC_IRE_DELETE 4
 #define IP_IOC_IRE_DELETE_NO_REPLY 5
-#define IP_IOC_IRE_ADVISE_NO_REPLY 6
 #define IP_IOC_RTS_REQUEST 7
 
 /* Common definitions used by IP IOCTL data structures */
@@ -157,31 +143,6 @@ typedef struct ipllcmd_s {
 	uint_t ipllc_name_length;
 } ipllc_t;
 
-/* IP IRE Change Command Structure. */
-typedef struct ipic_s {
-	ipllc_t ipic_ipllc;
-	uint_t ipic_ire_type;
-	uint_t ipic_max_frag;
-	uint_t ipic_addr_offset;
-	uint_t ipic_addr_length;
-	uint_t ipic_mask_offset;
-	uint_t ipic_mask_length;
-	uint_t ipic_src_addr_offset;
-	uint_t ipic_src_addr_length;
-	uint_t ipic_ll_hdr_offset;
-	uint_t ipic_ll_hdr_length;
-	uint_t ipic_gateway_addr_offset;
-	uint_t ipic_gateway_addr_length;
-	clock_t ipic_rtt;
-	uint32_t ipic_ssthresh;
-	clock_t ipic_rtt_sd;
-	uchar_t ipic_ire_marks;
-} ipic_t;
-
-#define ipic_cmd ipic_ipllc.ipllc_cmd
-#define ipic_ll_name_length ipic_ipllc.ipllc_name_length
-#define ipic_ll_name_offset ipic_ipllc.ipllc_name_offset
-
 /* IP IRE Delete Command Structure. */
 typedef struct ipid_s {
 	ipllc_t ipid_ipllc;
@@ -257,16 +218,8 @@ typedef struct ipoptp_s
 #define Q_TO_ICMP(q) (Q_TO_CONN((q))->conn_icmp)
 #define Q_TO_RTS(q) (Q_TO_CONN((q))->conn_rts)
 
-/*
- * The following two macros are used by IP to get the appropriate
- * wq and rq for a conn. If it is a TCP conn, then we need
- * tcp_wq/tcp_rq else, conn_wq/conn_rq. IP can use conn_wq and conn_rq
- * from a conn directly if it knows that the conn is not TCP.
- */
-#define CONNP_TO_WQ(connp) \
-	(IPCL_IS_TCP(connp) ? (connp)->conn_tcp->tcp_wq : (connp)->conn_wq)
-
-#define CONNP_TO_RQ(connp) RD(CONNP_TO_WQ(connp))
+#define CONNP_TO_WQ(connp) ((connp)->conn_wq)
+#define CONNP_TO_RQ(connp) ((connp)->conn_rq)
 
 #define GRAB_CONN_LOCK(q) { \
 	if (q != NULL && CONN_Q(q)) \
@@ -278,9 +231,6 @@ typedef struct ipoptp_s
 		mutex_exit(&(Q_TO_CONN(q))->conn_lock); \
 }
 
-/* "Congestion controlled" protocol */
-#define IP_FLOW_CONTROLLED_ULP(p) ((p) == IPPROTO_TCP || (p) == IPPROTO_SCTP)
-
 /*
  * Complete the pending operation. Usually an ioctl. Can also
  * be a bind or option management request that got enqueued
@@ -295,63 +245,13 @@ typedef struct ipoptp_s
 }
 
 /*
- * Flags for the various ip_fanout_* routines.
- */
-#define IP_FF_SEND_ICMP 0x01 /* Send an ICMP error */
-#define IP_FF_HDR_COMPLETE 0x02 /* Call ip_hdr_complete if error */
-#define IP_FF_CKSUM 0x04 /* Recompute ipha_cksum if error */
-#define IP_FF_RAWIP 0x08 /* Use rawip mib variable */
-#define IP_FF_SRC_QUENCH 0x10 /* OK to send ICMP_SOURCE_QUENCH */
-#define IP_FF_SYN_ADDIRE 0x20 /* Add IRE if TCP syn packet */
-#define IP_FF_IPINFO 0x80 /* Used for both V4 and V6 */
-#define IP_FF_SEND_SLLA 0x100 /* Send source link layer info ? */
-#define IPV6_REACHABILITY_CONFIRMATION 0x200 /* Flags for ip_xmit_v6 */
-#define IP_FF_NO_MCAST_LOOP 0x400 /* No multicasts for sending zone */
-
-/*
- * Following flags are used by IPQoS to determine if policy processing is
- * required.
- */
-#define IP6_NO_IPPOLICY 0x800 /* Don't do IPQoS processing */
-#define IP6_IN_LLMCAST 0x1000 /* Multicast */
-
-#define IP_FF_LOOPBACK 0x2000 /* Loopback fanout */
-#define IP_FF_SCTP_CSUM_ERR 0x4000 /* sctp pkt has failed chksum */
-
-#ifndef IRE_DB_TYPE
-#define IRE_DB_TYPE M_SIG
-#endif
-
-#ifndef IRE_DB_REQ_TYPE
-#define IRE_DB_REQ_TYPE M_PCSIG
-#endif
-
-#ifndef IRE_ARPRESOLVE_TYPE
-#define IRE_ARPRESOLVE_TYPE M_EVENT
-#endif
-
-/*
  * Values for squeue switch:
  */
-
 #define IP_SQUEUE_ENTER_NODRAIN 1
 #define IP_SQUEUE_ENTER 2
-/*
- * This is part of the interface between Transport provider and
- * IP which can be used to set policy information. This is usually
- * accompanied with O_T_BIND_REQ/T_BIND_REQ.ip_bind assumes that
- * only IPSEC_POLICY_SET is there when it is found in the chain.
- * The information contained is an struct ipsec_req_t. On success
- * or failure, either the T_BIND_ACK or the T_ERROR_ACK is returned.
- * IPSEC_POLICY_SET is never returned.
- */
-#define IPSEC_POLICY_SET M_SETOPTS
+#define IP_SQUEUE_FILL 3
 
-#define IRE_IS_LOCAL(ire) ((ire != NULL) && \
-	((ire)->ire_type & (IRE_LOCAL | IRE_LOOPBACK)))
-
-#define IRE_IS_TARGET(ire) ((ire != NULL) && \
-	((ire)->ire_type != IRE_BROADCAST))
+extern int ip_squeue_flag;
 
 /* IP Fragmentation Reassembly Header */
 typedef struct ipf_s {
@@ -387,71 +287,6 @@ typedef struct ipf_s {
 #define ipf_src V4_PART_OF_V6(ipf_v6src)
 #define ipf_dst V4_PART_OF_V6(ipf_v6dst)
 
-typedef enum {
-	IB_PKT = 0x01,
-	OB_PKT = 0x02
-} ip_pkt_t;
-
-#define UPDATE_IB_PKT_COUNT(ire)\
-	{ \
-	(ire)->ire_ib_pkt_count++; \
-	if ((ire)->ire_ipif != NULL) { \
-		/* \
-		 * forwarding packet \
-		 */ \
-		if ((ire)->ire_type & (IRE_LOCAL|IRE_BROADCAST)) \
-			atomic_add_32(&(ire)->ire_ipif->ipif_ib_pkt_count, 1);\
-		else \
-			atomic_add_32(&(ire)->ire_ipif->ipif_fo_pkt_count, 1);\
-	} \
-	}
-
-#define UPDATE_OB_PKT_COUNT(ire)\
-	{ \
-	(ire)->ire_ob_pkt_count++;\
-	if ((ire)->ire_ipif != NULL) { \
-		atomic_add_32(&(ire)->ire_ipif->ipif_ob_pkt_count, 1); \
-	} \
-	}
-
-#define IP_RPUT_LOCAL(q, mp, ipha, ire, recv_ill) \
-{ \
-	switch (ipha->ipha_protocol) { \
-		case IPPROTO_UDP: \
-			ip_udp_input(q, mp, ipha, ire, recv_ill); \
-			break; \
-		default: \
-			ip_proto_input(q, mp, ipha, ire, recv_ill, 0); \
-			break; \
-	} \
-}
-
-/*
- * NCE_EXPIRED is TRUE when we have a non-permanent nce that was
- * found to be REACHABLE more than ip_ire_arp_interval ms ago.
- * This macro is used to age existing nce_t entries. The
- * nce's will get cleaned up in the following circumstances:
- * - ip_ire_trash_reclaim will free nce's using ndp_cache_reclaim
- * when memory is low,
- * - ip_arp_news, when updates are received.
- * - if the nce is NCE_EXPIRED(), it will deleted, so that a new
- * arp request will need to be triggered from an ND_INITIAL nce.
- *
- * Note that the nce state transition follows the pattern:
- * ND_INITIAL -> ND_INCOMPLETE -> ND_REACHABLE
- * after which the nce is deleted when it has expired.
- *
- * nce_last is the timestamp that indicates when the nce_res_mp in the
- * nce_t was last updated to a valid link-layer address. nce_last gets
- * modified/updated :
- * - when the nce is created
- * - every time we get a sane arp response for the nce.
- */
-#define NCE_EXPIRED(nce, ipst) (nce->nce_last > 0 && \
-	((nce->nce_flags & NCE_F_PERMANENT) == 0) && \
-	((TICK_TO_MSEC(lbolt64) - nce->nce_last) > \
-	(ipst)->ips_ip_ire_arp_interval))
-
 #endif /* _KERNEL */
 
 /* ICMP types */
@@ -560,7 +395,17 @@ typedef struct ipha_s {
 #define IPH_DF 0x4000 /* Don't fragment */
 #define IPH_MF 0x2000 /* More fragments to come */
 #define IPH_OFFSET 0x1FFF /* Where the offset lives */
-#define IPH_FRAG_HDR 0x8000 /* IPv6 don't fragment bit */
+
+/* Byte-order specific values */
+#ifdef _BIG_ENDIAN
+#define IPH_DF_HTONS 0x4000 /* Don't fragment */
+#define IPH_MF_HTONS 0x2000 /* More fragments to come */
+#define IPH_OFFSET_HTONS 0x1FFF /* Where the offset lives */
+#else
+#define IPH_DF_HTONS 0x0040 /* Don't fragment */
+#define IPH_MF_HTONS 0x0020 /* More fragments to come */
+#define IPH_OFFSET_HTONS 0xFF1F /* Where the offset lives */
+#endif
 
 /* ECN code points for IPv4 TOS byte and IPv6 traffic class octet. */
 #define IPH_ECN_NECT 0x0 /* Not ECN-Capable Transport */
@@ -571,10 +416,8 @@ typedef struct ipha_s {
 struct ill_s;
 
 typedef void ip_v6intfid_func_t(struct ill_s *, in6_addr_t *);
-typedef boolean_t ip_v6mapinfo_func_t(uint_t, uint8_t *, uint8_t *, uint32_t *,
-	in6_addr_t *);
-typedef boolean_t ip_v4mapinfo_func_t(uint_t, uint8_t *, uint8_t *, uint32_t *,
-	ipaddr_t *);
+typedef void ip_v6mapinfo_func_t(struct ill_s *, uchar_t *, uchar_t *);
+typedef void ip_v4mapinfo_func_t(struct ill_s *, uchar_t *, uchar_t *);
 
 /* IP Mac info structure */
 typedef struct ip_m_s {
@@ -582,8 +425,8 @@ typedef struct ip_m_s {
 	int ip_m_type; /* From <net/if_types.h> */
 	t_uscalar_t ip_m_ipv4sap;
 	t_uscalar_t ip_m_ipv6sap;
-	ip_v4mapinfo_func_t *ip_m_v4mapinfo;
-	ip_v6mapinfo_func_t *ip_m_v6mapinfo;
+	ip_v4mapinfo_func_t *ip_m_v4mapping;
+	ip_v6mapinfo_func_t *ip_m_v6mapping;
 	ip_v6intfid_func_t *ip_m_v6intfid;
 	ip_v6intfid_func_t *ip_m_v6destintfid;
 } ip_m_t;
@@ -591,20 +434,14 @@ typedef struct ip_m_s {
 /*
  * The following functions attempt to reduce the link layer dependency
  * of the IP stack. The current set of link specific operations are:
- * a. map from IPv4 class D (224.0/4) multicast address range to the link
- * layer multicast address range.
- * b. map from IPv6 multicast address range (ff00::/8) to the link
- * layer multicast address range.
- * c. derive the default IPv6 interface identifier from the interface.
- * d. derive the default IPv6 destination interface identifier from
+ * a. map from IPv4 class D (224.0/4) multicast address range or the
+ * IPv6 multicast address range (ff00::/8) to the link layer multicast
+ * address.
+ * b. derive the default IPv6 interface identifier from the interface.
+ * c. derive the default IPv6 destination interface identifier from
  * the interface (point-to-point only).
  */
-#define MEDIA_V4MINFO(ip_m, plen, bphys, maddr, hwxp, v4ptr) \
-	(((ip_m)->ip_m_v4mapinfo != NULL) && \
-	(*(ip_m)->ip_m_v4mapinfo)(plen, bphys, maddr, hwxp, v4ptr))
-#define MEDIA_V6MINFO(ip_m, plen, bphys, maddr, hwxp, v6ptr) \
-	(((ip_m)->ip_m_v6mapinfo != NULL) && \
-	(*(ip_m)->ip_m_v6mapinfo)(plen, bphys, maddr, hwxp, v6ptr))
+extern void ip_mcast_mapping(struct ill_s *, uchar_t *, uchar_t *);
 /* ip_m_v6*intfid return void and are never NULL */
 #define MEDIA_V6INTFID(ip_m, ill, v6ptr) (ip_m)->ip_m_v6intfid(ill, v6ptr)
 #define MEDIA_V6DESTINTFID(ip_m, ill, v6ptr) \
@@ -616,107 +453,38 @@ typedef struct ip_m_s {
 #define IRE_LOCAL 0x0004 /* Route entry for local address */
 #define IRE_LOOPBACK 0x0008 /* Route entry for loopback address */
 #define IRE_PREFIX 0x0010 /* Route entry for prefix routes */
+#ifndef _KERNEL
+/* Keep so user-level still compiles */
 #define IRE_CACHE 0x0020 /* Cached Route entry */
+#endif
 #define IRE_IF_NORESOLVER 0x0040 /* Route entry for local interface */
 				/* net without any address mapping. */
 #define IRE_IF_RESOLVER 0x0080 /* Route entry for local interface */
 				/* net with resolver. */
 #define IRE_HOST 0x0100 /* Host route entry */
+/* Keep so user-level still compiles */
 #define IRE_HOST_REDIRECT 0x0200 /* only used for T_SVR4_OPTMGMT_REQ */
+#define IRE_IF_CLONE 0x0400 /* Per host clone of IRE_IF */
+#define IRE_MULTICAST 0x0800 /* Special - not in table */
+#define IRE_NOROUTE 0x1000 /* Special - not in table */
 
 #define IRE_INTERFACE (IRE_IF_NORESOLVER | IRE_IF_RESOLVER)
-#define IRE_OFFSUBNET (IRE_DEFAULT | IRE_PREFIX | IRE_HOST)
-#define IRE_CACHETABLE (IRE_CACHE | IRE_BROADCAST | IRE_LOCAL | \
-	IRE_LOOPBACK)
-#define IRE_FORWARDTABLE (IRE_INTERFACE | IRE_OFFSUBNET)
-
-/*
- * If an IRE is marked with IRE_MARK_CONDEMNED, the last walker of
- * the bucket should delete this IRE from this bucket.
- */
-#define IRE_MARK_CONDEMNED 0x0001
-
-/*
- * An IRE with IRE_MARK_PMTU has ire_max_frag set from an ICMP error.
- */
-#define IRE_MARK_PMTU 0x0002
-
-/*
- * An IRE with IRE_MARK_TESTHIDDEN is used by in.mpathd for test traffic. It
- * can only be looked up by requesting MATCH_IRE_MARK_TESTHIDDEN.
- */
-#define IRE_MARK_TESTHIDDEN 0x0004
-
-/*
- * An IRE with IRE_MARK_NOADD is created in ip_newroute_ipif when the outgoing
- * interface is specified by e.g. IP_PKTINFO. The IRE is not added to the IRE
- * cache table.
- */
-#define IRE_MARK_NOADD 0x0008 /* Mark not to add ire in cache */
-
-/*
- * IRE marked with IRE_MARK_TEMPORARY means that this IRE has been used
- * either for forwarding a packet or has not been used for sending
- * traffic on TCP connections terminated on this system. In both
- * cases, this IRE is the first to go when IRE is being cleaned up.
- */
-#define IRE_MARK_TEMPORARY 0x0010
-
-/*
- * IRE marked with IRE_MARK_USESRC_CHECK means that while adding an IRE with
- * this mark, additional atomic checks need to be performed. For eg: by the
- * time an IRE_CACHE is created, sent up to ARP and then comes back to IP; the
- * usesrc grouping could have changed in which case we want to fail adding
- * the IRE_CACHE entry
- */
-#define IRE_MARK_USESRC_CHECK 0x0020
-
-/*
- * IRE_MARK_PRIVATE_ADDR is used for IP_NEXTHOP. When IP_NEXTHOP is set, the
- * routing table lookup for the destination is bypassed and the packet is
- * sent directly to the specified nexthop. The associated IRE_CACHE entries
- * should be marked with IRE_MARK_PRIVATE_ADDR flag so that they don't show up
- * in regular ire cache lookups.
- */
-#define IRE_MARK_PRIVATE_ADDR 0x0040
+#define IRE_IF_ALL (IRE_IF_NORESOLVER | IRE_IF_RESOLVER | \
+	IRE_IF_CLONE)
+#define IRE_OFFSUBNET (IRE_DEFAULT | IRE_PREFIX | IRE_HOST)
+#define IRE_OFFLINK IRE_OFFSUBNET
 
 /*
- * When we send an ARP resolution query for the nexthop gateway's ire,
- * we use esballoc to create the ire_t in the AR_ENTRY_QUERY mblk
- * chain, and mark its ire_marks with IRE_MARK_UNCACHED. This flag
- * indicates that information from ARP has not been transferred to a
- * permanent IRE_CACHE entry. The flag is reset only when the
- * information is successfully transferred to an ire_cache entry (in
- * ire_add()). Attempting to free the AR_ENTRY_QUERY mblk chain prior
- * to ire_add (e.g., from arp, or from ip`ip_wput_nondata) will
- * require that the resources (incomplete ire_cache and/or nce) must
- * be cleaned up. The free callback routine (ire_freemblk()) checks
- * for IRE_MARK_UNCACHED to see if any resources that are pinned down
- * will need to be cleaned up or not.
+ * Note that we view IRE_NOROUTE as ONLINK since we can "send" to them without
+ * going through a router; the result of sending will be an error/icmp error.
  */
-
-#define IRE_MARK_UNCACHED 0x0080
-
-/*
- * The comment below (and for other netstack_t references) refers
- * to the fact that we only do netstack_hold in particular cases,
- * such as the references from open streams (ill_t and conn_t's
- * pointers). Internally within IP we rely on IP's ability to cleanup e.g.
- * ire_t's when an ill goes away.
- */
-typedef struct ire_expire_arg_s {
-	int iea_flush_flag;
-	ip_stack_t *iea_ipst; /* Does not have a netstack_hold */
-} ire_expire_arg_t;
-
-/* Flags with ire_expire routine */
-#define FLUSH_ARP_TIME 0x0001 /* ARP info potentially stale timer */
-#define FLUSH_REDIRECT_TIME 0x0002 /* Redirects potentially stale */
-#define FLUSH_MTU_TIME 0x0004 /* Include path MTU per RFC 1191 */
+#define IRE_ONLINK (IRE_IF_ALL|IRE_LOCAL|IRE_LOOPBACK| \
+	IRE_BROADCAST|IRE_MULTICAST|IRE_NOROUTE)
 
 /* Arguments to ire_flush_cache() */
 #define IRE_FLUSH_DELETE 0
 #define IRE_FLUSH_ADD 1
+#define IRE_FLUSH_GWCHANGE 2
 
 /*
  * Open/close synchronization flags.
@@ -724,31 +492,21 @@ typedef struct ire_expire_arg_s {
  * depends on the atomic 32 bit access to that field.
  */
 #define CONN_CLOSING 0x01 /* ip_close waiting for ip_wsrv */
-#define CONN_IPSEC_LOAD_WAIT 0x02 /* waiting for load */
-#define CONN_CONDEMNED 0x04 /* conn is closing, no more refs */
-#define CONN_INCIPIENT 0x08 /* conn not yet visible, no refs */
-#define CONN_QUIESCED 0x10 /* conn is now quiescent */
-
-/* Used to check connection state flags before caching the IRE */
-#define CONN_CACHE_IRE(connp) \
-	(!((connp)->conn_state_flags & (CONN_CLOSING|CONN_CONDEMNED)))
-
-/*
- * Parameter to ip_output giving the identity of the caller.
- * IP_WSRV means the packet was enqueued in the STREAMS queue
- * due to flow control and is now being reprocessed in the context of
- * the STREAMS service procedure, consequent to flow control relief.
- * IRE_SEND means the packet is being reprocessed consequent to an
- * ire cache creation and addition and this may or may not be happening
- * in the service procedure context. Anything other than the above 2
- * cases is identified as IP_WPUT. Most commonly this is the case of
- * packets coming down from the application.
+#define CONN_CONDEMNED 0x02 /* conn is closing, no more refs */
+#define CONN_INCIPIENT 0x04 /* conn not yet visible, no refs */
+#define CONN_QUIESCED 0x08 /* conn is now quiescent */
+#define CONN_UPDATE_ILL 0x10 /* conn_update_ill in progress */
+
+/*
+ * Flags for dce_flags field. Specifies which information has been set.
+ * dce_ident is always present, but the other ones are identified by the flags.
  */
-#ifdef _KERNEL
-#define IP_WSRV 1 /* Called from ip_wsrv */
-#define IP_WPUT 2 /* Called from ip_wput */
-#define IRE_SEND 3 /* Called from ire_send */
+#define DCEF_DEFAULT 0x0001 /* Default DCE - no pmtu or uinfo */
+#define DCEF_PMTU 0x0002 /* Different than interface MTU */
+#define DCEF_UINFO 0x0004 /* dce_uinfo set */
+#define DCEF_TOO_SMALL_PMTU 0x0008 /* Smaller than IPv4/IPv6 MIN */
 
+#ifdef _KERNEL
 /*
  * Extra structures need for per-src-addr filtering (IGMPv3/MLDv2)
  */
@@ -786,90 +544,80 @@ typedef struct mrec_s {
 } mrec_t;
 
 /* Group membership list per upper conn */
+
 /*
- * XXX add ilg info for ifaddr/ifindex.
- * XXX can we make ilg survive an ifconfig unplumb + plumb
- * by setting the ipif/ill to NULL and recover that later?
+ * We record the multicast information from the socket option in
+ * ilg_ifaddr/ilg_ifindex. This allows rejoining the group in the case when
+ * the ifaddr (or ifindex) disappears and later reappears, potentially on
+ * a different ill. The IPv6 multicast socket options and ioctls all specify
+ * the interface using an ifindex. For IPv4 some socket options/ioctls use
+ * the interface address and others use the index. We record here the method
+ * that was actually used (and leave the other of ilg_ifaddr or ilg_ifindex)
+ * at zero so that we can rejoin the way the application intended.
  *
- * ilg_ipif is used by IPv4 as multicast groups are joined using an interface
- * address (ipif).
- * ilg_ill is used by IPv6 as multicast groups are joined using an interface
- * index (phyint->phyint_ifindex).
- * ilg_ill is NULL for IPv4 and ilg_ipif is NULL for IPv6.
+ * We track the ill on which we will or already have joined an ilm using
+ * ilg_ill. When we have succeeded joining the ilm and have a refhold on it
+ * then we set ilg_ilm. Thus intentionally there is a window where ilg_ill is
+ * set and ilg_ilm is not set. This allows clearing ilg_ill as a signal that
+ * the ill is being unplumbed and the ilm should be discarded.
  *
  * ilg records the state of multicast memberships of a socket end point.
  * ilm records the state of multicast memberships with the driver and is
  * maintained per interface.
  *
- * There is no direct link between a given ilg and ilm. If the
- * application has joined a group G with ifindex I, we will have
- * an ilg with ilg_v6group and ilg_ill. There will be a corresponding
- * ilm with ilm_ill/ilm_v6addr recording the multicast membership.
- * To delete the membership:
- *
- * a) Search for ilg matching on G and I with ilg_v6group
- * and ilg_ill. Delete ilg_ill.
- * b) Search the corresponding ilm matching on G and I with
- * ilm_v6addr and ilm_ill. Delete ilm.
- *
- * For IPv4 the only difference is that we look using ipifs, not ills.
+ * The ilg state is protected by conn_ilg_lock.
+ * The ilg will not be freed until ilg_refcnt drops to zero.
  */
-
-/*
- * The ilg_t and ilm_t members are protected by ipsq. They can be changed only
- * by a thread executing in the ipsq. In other words add/delete of a
- * multicast group has to execute in the ipsq.
- */
-#define ILG_DELETED 0x1 /* ilg_flags */
 typedef struct ilg_s {
+	struct ilg_s *ilg_next;
+	struct ilg_s **ilg_ptpn;
+	struct conn_s *ilg_connp; /* Back pointer to get lock */
 	in6_addr_t ilg_v6group;
-	struct ipif_s *ilg_ipif; /* Logical interface we are member on */
-	struct ill_s *ilg_ill; /* Used by IPv6 */
-	uint_t ilg_flags;
+	ipaddr_t ilg_ifaddr; /* For some IPv4 cases */
+	uint_t ilg_ifindex; /* IPv6 and some other IPv4 cases */
+	struct ill_s *ilg_ill; /* Where ilm is joined. No refhold */
+	struct ilm_s *ilg_ilm; /* With ilm_refhold */
+	uint_t ilg_refcnt;
 	mcast_record_t ilg_fmode; /* MODE_IS_INCLUDE/MODE_IS_EXCLUDE */
 	slist_t *ilg_filter;
+	boolean_t ilg_condemned; /* Conceptually deleted */
 } ilg_t;
 
 /*
  * Multicast address list entry for ill.
- * ilm_ipif is used by IPv4 as multicast groups are joined using ipif.
- * ilm_ill is used by IPv6 as multicast groups are joined using ill.
- * ilm_ill is NULL for IPv4 and ilm_ipif is NULL for IPv6.
+ * ilm_ill is used by IPv4 and IPv6
+ *
+ * The ilm state (and other multicast state on the ill) is protected by
+ * ill_mcast_lock. Operations that change state on both an ilg and ilm
+ * in addition use ill_mcast_serializer to ensure that we can't have
+ * interleaving between e.g., add and delete operations for the same conn_t,
+ * group, and ill.
  *
  * The comment below (and for other netstack_t references) refers
  * to the fact that we only do netstack_hold in particular cases,
- * such as the references from open streams (ill_t and conn_t's
+ * such as the references from open endpoints (ill_t and conn_t's
  * pointers). Internally within IP we rely on IP's ability to cleanup e.g.
  * ire_t's when an ill goes away.
  */
-#define ILM_DELETED 0x1 /* ilm_flags */
 typedef struct ilm_s {
 	in6_addr_t ilm_v6addr;
 	int ilm_refcnt;
 	uint_t ilm_timer; /* IGMP/MLD query resp timer, in msec */
-	struct ipif_s *ilm_ipif; /* Back pointer to ipif for IPv4 */
 	struct ilm_s *ilm_next; /* Linked list for each ill */
 	uint_t ilm_state; /* state of the membership */
-	struct ill_s *ilm_ill; /* Back pointer to ill for IPv6 */
-	uint_t ilm_flags;
-	boolean_t ilm_notify_driver; /* Need to notify the driver */
+	struct ill_s *ilm_ill; /* Back pointer to ill - ill_ilm_cnt */
 	zoneid_t ilm_zoneid;
 	int ilm_no_ilg_cnt; /* number of joins w/ no ilg */
 	mcast_record_t ilm_fmode; /* MODE_IS_INCLUDE/MODE_IS_EXCLUDE */
 	slist_t *ilm_filter; /* source filter list */
 	slist_t *ilm_pendsrcs; /* relevant src addrs for pending req */
 	rtx_state_t ilm_rtx; /* SCR retransmission state */
+	ipaddr_t ilm_ifaddr; /* For IPv4 netstat */
 	ip_stack_t *ilm_ipst; /* Does not have a netstack_hold */
 } ilm_t;
 
 #define ilm_addr V4_PART_OF_V6(ilm_v6addr)
 
-typedef struct ilm_walker {
-	struct ill_s *ilw_ill; /* associated ill */
-	struct ill_s *ilw_ipmp_ill; /* associated ipmp ill (if any) */
-	struct ill_s *ilw_walk_ill; /* current ill being walked */
-} ilm_walker_t;
-
 /*
  * Soft reference to an IPsec SA.
  *
@@ -898,40 +646,28 @@ typedef struct ipsa_ref_s
  * In the presence of IPsec policy, fully-bound conn's bind a connection
  * to more than just the 5-tuple, but also a specific IPsec action and
  * identity-pair.
- *
- * As an optimization, we also cache soft references to IPsec SA's
- * here so that we can fast-path around most of the work needed for
+ * The identity pair is accessed from both the receive and transmit side
+ * hence it is maintained in the ipsec_latch_t structure. conn_latch and
+ * ixa_ipsec_latch points to it.
+ * The policy and actions are stored in conn_latch_in_policy and
+ * conn_latch_in_action for the inbound side, and in ixa_ipsec_policy and
+ * ixa_ipsec_action for the transmit side.
+ *
+ * As an optimization, we also cache soft references to IPsec SA's in
+ * ip_xmit_attr_t so that we can fast-path around most of the work needed for
  * outbound IPsec SA selection.
- *
- * Were it not for TCP's detached connections, this state would be
- * in-line in conn_t; instead, this is in a separate structure so it
- * can be handed off to TCP when a connection is detached.
  */
 typedef struct ipsec_latch_s
 {
 	kmutex_t ipl_lock;
 	uint32_t ipl_refcnt;
 
-	uint64_t ipl_unique;
-	struct ipsec_policy_s *ipl_in_policy; /* latched policy (in) */
-	struct ipsec_policy_s *ipl_out_policy; /* latched policy (out) */
-	struct ipsec_action_s *ipl_in_action; /* latched action (in) */
-	struct ipsec_action_s *ipl_out_action; /* latched action (out) */
-	cred_t *ipl_local_id;
 	struct ipsid_s *ipl_local_cid;
 	struct ipsid_s *ipl_remote_cid;
 	unsigned int
-		ipl_out_action_latched : 1,
-		ipl_in_action_latched : 1,
-		ipl_out_policy_latched : 1,
-		ipl_in_policy_latched : 1,
-		ipl_ids_latched : 1,
-		ipl_pad_to_bit_31 : 27;
-
-	ipsa_ref_t ipl_ref[2]; /* 0: ESP, 1: AH */
-
+		ipl_pad_to_bit_31 : 31;
 } ipsec_latch_t;
 
 #define IPLATCH_REFHOLD(ipl) { \
@@ -939,97 +675,19 @@ typedef struct ipsec_latch_s
 	ASSERT((ipl)->ipl_refcnt != 0); \
 }
 
-#define IPLATCH_REFRELE(ipl, ns) { \
+#define IPLATCH_REFRELE(ipl) { \
 	ASSERT((ipl)->ipl_refcnt != 0); \
 	membar_exit(); \
 	if (atomic_add_32_nv(&(ipl)->ipl_refcnt, -1) == 0) \
-		iplatch_free(ipl, ns); \
+		iplatch_free(ipl); \
 }
 
 /*
  * peer identity structure.
  */
-
 typedef struct conn_s conn_t;
 
 /*
- * The old IP client structure "ipc_t" is gone. All the data is stored in the
- * connection structure "conn_t" now.
The mapping of old and new fields looks - * like this: - * - * ipc_ulp conn_ulp - * ipc_rq conn_rq - * ipc_wq conn_wq - * - * ipc_laddr conn_src - * ipc_faddr conn_rem - * ipc_v6laddr conn_srcv6 - * ipc_v6faddr conn_remv6 - * - * ipc_lport conn_lport - * ipc_fport conn_fport - * ipc_ports conn_ports - * - * ipc_policy conn_policy - * ipc_latch conn_latch - * - * ipc_irc_lock conn_lock - * ipc_ire_cache conn_ire_cache - * - * ipc_state_flags conn_state_flags - * ipc_outgoing_ill conn_outgoing_ill - * - * ipc_dontroute conn_dontroute - * ipc_loopback conn_loopback - * ipc_broadcast conn_broadcast - * ipc_reuseaddr conn_reuseaddr - * - * ipc_multicast_loop conn_multicast_loop - * ipc_multi_router conn_multi_router - * ipc_draining conn_draining - * - * ipc_did_putbq conn_did_putbq - * ipc_unspec_src conn_unspec_src - * ipc_policy_cached conn_policy_cached - * - * ipc_in_enforce_policy conn_in_enforce_policy - * ipc_out_enforce_policy conn_out_enforce_policy - * ipc_af_isv6 conn_af_isv6 - * ipc_pkt_isv6 conn_pkt_isv6 - * - * ipc_ipv6_recvpktinfo conn_ipv6_recvpktinfo - * - * ipc_ipv6_recvhoplimit conn_ipv6_recvhoplimit - * ipc_ipv6_recvhopopts conn_ipv6_recvhopopts - * ipc_ipv6_recvdstopts conn_ipv6_recvdstopts - * - * ipc_ipv6_recvrthdr conn_ipv6_recvrthdr - * ipc_ipv6_recvrtdstopts conn_ipv6_recvrtdstopts - * ipc_fully_bound conn_fully_bound - * - * ipc_recvif conn_recvif - * - * ipc_recvslla conn_recvslla - * ipc_acking_unbind conn_acking_unbind - * ipc_pad_to_bit_31 conn_pad_to_bit_31 - * - * ipc_proto conn_proto - * ipc_incoming_ill conn_incoming_ill - * ipc_pending_ill conn_pending_ill - * ipc_unbind_mp conn_unbind_mp - * ipc_ilg conn_ilg - * ipc_ilg_allocated conn_ilg_allocated - * ipc_ilg_inuse conn_ilg_inuse - * ipc_ilg_walker_cnt conn_ilg_walker_cnt - * ipc_refcv conn_refcv - * ipc_multicast_ipif conn_multicast_ipif - * ipc_multicast_ill conn_multicast_ill - * ipc_drain_next conn_drain_next - * ipc_drain_prev conn_drain_prev - * ipc_idl conn_idl - */ - -/* * 
This is used to match an inbound/outbound datagram with policy. */ typedef struct ipsec_selector { @@ -1069,22 +727,6 @@ typedef struct ipsec_selector { #define IPSEC_POLICY_MAX 5 /* Always max + 1. */ /* - * Folowing macro is used whenever the code does not know whether there - * is a M_CTL present in the front and it needs to examine the actual mp - * i.e the IP header. As a M_CTL message could be in the front, this - * extracts the packet into mp and the M_CTL mp into first_mp. If M_CTL - * mp is not present, both first_mp and mp point to the same message. - */ -#define EXTRACT_PKT_MP(mp, first_mp, mctl_present) \ - (first_mp) = (mp); \ - if ((mp)->b_datap->db_type == M_CTL) { \ - (mp) = (mp)->b_cont; \ - (mctl_present) = B_TRUE; \ - } else { \ - (mctl_present) = B_FALSE; \ - } - -/* * Check with IPSEC inbound policy if * * 1) per-socket policy is present - indicated by conn_in_enforce_policy. @@ -1113,11 +755,6 @@ typedef struct ipsec_selector { /* * Information cached in IRE for upper layer protocol (ULP). - * - * Notice that ire_max_frag is not included in the iulp_t structure, which - * it may seem that it should. But ire_max_frag cannot really be cached. It - * is fixed for each interface. For MTU found by PMTUd, we may want to cache - * it. But currently, we do not do that. */ typedef struct iulp_s { boolean_t iulp_set; /* Is any metric set? */ @@ -1128,17 +765,21 @@ typedef struct iulp_s { uint32_t iulp_rpipe; /* Receive pipe size. */ uint32_t iulp_rtomax; /* Max round trip timeout. */ uint32_t iulp_sack; /* Use SACK option (TCP)? */ + uint32_t iulp_mtu; /* Setable with routing sockets */ + uint32_t iulp_tstamp_ok : 1, /* Use timestamp option (TCP)? */ iulp_wscale_ok : 1, /* Use window scale option (TCP)? */ iulp_ecn_ok : 1, /* Enable ECN (for TCP)? */ iulp_pmtud_ok : 1, /* Enable PMTUd? 
*/ - iulp_not_used : 28; -} iulp_t; + /* These three are passed out by ip_set_destination */ + iulp_localnet: 1, /* IRE_ONLINK */ + iulp_loopback: 1, /* IRE_LOOPBACK */ + iulp_local: 1, /* IRE_LOCAL */ -/* Zero iulp_t. */ -extern const iulp_t ire_uinfo_null; + iulp_not_used : 25; +} iulp_t; /* * The conn drain list structure (idl_t). @@ -1173,7 +814,6 @@ struct idl_tx_list_s { struct idl_s { conn_t *idl_conn; /* Head of drain list */ kmutex_t idl_lock; /* Lock for this list */ - conn_t *idl_conn_draining; /* conn that is draining */ uint32_t idl_repeat : 1, /* Last conn must re-enable */ /* drain list again */ @@ -1182,36 +822,38 @@ struct idl_s { }; #define CONN_DRAIN_LIST_LOCK(connp) (&((connp)->conn_idl->idl_lock)) + /* * Interface route structure which holds the necessary information to recreate - * routes that are tied to an interface (namely where ire_ipif != NULL). + * routes that are tied to an interface i.e. have ire_ill set. + * * These routes which were initially created via a routing socket or via the * SIOCADDRT ioctl may be gateway routes (RTF_GATEWAY being set) or may be - * traditional interface routes. When an interface comes back up after being - * marked down, this information will be used to recreate the routes. These - * are part of an mblk_t chain that hangs off of the IPIF (ipif_saved_ire_mp). + * traditional interface routes. When an ill comes back up after being + * down, this information will be used to recreate the routes. These + * are part of an mblk_t chain that hangs off of the ILL (ill_saved_ire_mp). */ typedef struct ifrt_s { ushort_t ifrt_type; /* Type of IRE */ in6_addr_t ifrt_v6addr; /* Address IRE represents. */ - in6_addr_t ifrt_v6gateway_addr; /* Gateway if IRE_OFFSUBNET */ - in6_addr_t ifrt_v6src_addr; /* Src addr if RTF_SETSRC */ + in6_addr_t ifrt_v6gateway_addr; /* Gateway if IRE_OFFLINK */ + in6_addr_t ifrt_v6setsrc_addr; /* Src addr if RTF_SETSRC */ in6_addr_t ifrt_v6mask; /* Mask for matching IRE. 
*/ uint32_t ifrt_flags; /* flags related to route */ - uint_t ifrt_max_frag; /* MTU (next hop or path). */ - iulp_t ifrt_iulp_info; /* Cached IRE ULP info. */ + iulp_t ifrt_metrics; /* Routing socket metrics */ + zoneid_t ifrt_zoneid; /* zoneid for route */ } ifrt_t; #define ifrt_addr V4_PART_OF_V6(ifrt_v6addr) #define ifrt_gateway_addr V4_PART_OF_V6(ifrt_v6gateway_addr) -#define ifrt_src_addr V4_PART_OF_V6(ifrt_v6src_addr) #define ifrt_mask V4_PART_OF_V6(ifrt_v6mask) +#define ifrt_setsrc_addr V4_PART_OF_V6(ifrt_v6setsrc_addr) /* Number of IP addresses that can be hosted on a physical interface */ #define MAX_ADDRS_PER_IF 8192 /* * Number of Source addresses to be considered for source address - * selection. Used by ipif_select_source[_v6]. + * selection. Used by ipif_select_source_v4/v6. */ #define MAX_IPIF_SELECT_SOURCE 50 @@ -1245,16 +887,13 @@ typedef struct th_hash_s { #define IPIF_CONDEMNED 0x1 /* The ipif is being removed */ #define IPIF_CHANGING 0x2 /* A critcal ipif field is changing */ #define IPIF_SET_LINKLOCAL 0x10 /* transient flag during bringup */ -#define IPIF_ZERO_SOURCE 0x20 /* transient flag during bringup */ /* IP interface structure, one per local address */ typedef struct ipif_s { struct ipif_s *ipif_next; struct ill_s *ipif_ill; /* Back pointer to our ill */ int ipif_id; /* Logical unit number */ - uint_t ipif_mtu; /* Starts at ipif_ill->ill_max_frag */ in6_addr_t ipif_v6lcl_addr; /* Local IP address for this if. */ - in6_addr_t ipif_v6src_addr; /* Source IP address for this if. */ in6_addr_t ipif_v6subnet; /* Subnet prefix for this if. */ in6_addr_t ipif_v6net_mask; /* Net mask for this interface. */ in6_addr_t ipif_v6brd_addr; /* Broadcast addr for this interface. */ @@ -1262,47 +901,29 @@ typedef struct ipif_s { uint64_t ipif_flags; /* Interface flags. */ uint_t ipif_metric; /* BSD if metric, for compatibility. 
*/ uint_t ipif_ire_type; /* IRE_LOCAL or IRE_LOOPBACK */ - mblk_t *ipif_arp_del_mp; /* Allocated at time arp comes up, to */ - /* prevent awkward out of mem */ - /* condition later */ - mblk_t *ipif_saved_ire_mp; /* Allocated for each extra */ - /* IRE_IF_NORESOLVER/IRE_IF_RESOLVER */ - /* on this interface so that they */ - /* can survive ifconfig down. */ - kmutex_t ipif_saved_ire_lock; /* Protects ipif_saved_ire_mp */ - - mrec_t *ipif_igmp_rpt; /* List of group memberships which */ - /* will be reported on. Used when */ - /* handling an igmp timeout. */ /* - * The packet counts in the ipif contain the sum of the - * packet counts in dead IREs that were affiliated with - * this ipif. + * The packet count in the ipif contain the sum of the + * packet counts in dead IRE_LOCAL/LOOPBACK for this ipif. */ - uint_t ipif_fo_pkt_count; /* Forwarded thru our dead IREs */ uint_t ipif_ib_pkt_count; /* Inbound packets for our dead IREs */ - uint_t ipif_ob_pkt_count; /* Outbound packets to our dead IREs */ + /* Exclusive bit fields, protected by ipsq_t */ unsigned int - ipif_multicast_up : 1, /* ipif_multicast_up() successful */ ipif_was_up : 1, /* ipif was up before */ ipif_addr_ready : 1, /* DAD is done */ ipif_was_dup : 1, /* DAD had failed */ - - ipif_joined_allhosts : 1, /* allhosts joined */ ipif_added_nce : 1, /* nce added for local address */ - ipif_pad_to_31 : 26; + + ipif_pad_to_31 : 28; + + ilm_t *ipif_allhosts_ilm; /* For all-nodes join */ + ilm_t *ipif_solmulti_ilm; /* For IPv6 solicited multicast join */ uint_t ipif_seqid; /* unique index across all ills */ uint_t ipif_state_flags; /* See IPIF_* flag defs above */ uint_t ipif_refcnt; /* active consistent reader cnt */ - /* Number of ire's and ilm's referencing this ipif */ - uint_t ipif_ire_cnt; - uint_t ipif_ilm_cnt; - - uint_t ipif_saved_ire_cnt; zoneid_t ipif_zoneid; /* zone ID number */ timeout_id_t ipif_recovery_id; /* Timer for DAD recovery */ boolean_t ipif_trace_disable; /* True when alloc fails */ @@ 
-1313,40 +934,12 @@ typedef struct ipif_s { * part of a group will be pointed to, and an ill cannot disappear * while it's in a group. */ - struct ill_s *ipif_bound_ill; - struct ipif_s *ipif_bound_next; /* bound ipif chain */ - boolean_t ipif_bound; /* B_TRUE if we successfully bound */ -} ipif_t; + struct ill_s *ipif_bound_ill; + struct ipif_s *ipif_bound_next; /* bound ipif chain */ + boolean_t ipif_bound; /* B_TRUE if we successfully bound */ -/* - * IPIF_FREE_OK() means that there are no incoming references - * to the ipif. Incoming refs would prevent the ipif from being freed. - */ -#define IPIF_FREE_OK(ipif) \ - ((ipif)->ipif_ire_cnt == 0 && (ipif)->ipif_ilm_cnt == 0) -/* - * IPIF_DOWN_OK() determines whether the incoming pointer reference counts - * would permit the ipif to be considered quiescent. In order for - * an ipif or ill to be considered quiescent, the ire and nce references - * to that ipif/ill must be zero. - * - * We do not require the ilm references to go to zero for quiescence - * because the quiescence checks are done to ensure that - * outgoing packets do not use addresses from the ipif/ill after it - * has been marked down, and incoming packets to addresses on a - * queiscent interface are rejected. This implies that all the - * ire/nce's using that source address need to be deleted and future - * creation of any ires using that source address must be prevented. - * Similarly incoming unicast packets destined to the 'down' address - * will not be accepted once that ire is gone. However incoming - * multicast packets are not destined to the downed address. - * They are only related to the ill in question. Furthermore - * the current API behavior allows applications to join or leave - * multicast groups, i.e., IP_ADD_MEMBERSHIP / LEAVE_MEMBERSHIP, using a - * down address. Therefore the ilm references are not included in - * the _DOWN_OK macros. 
- */ -#define IPIF_DOWN_OK(ipif) ((ipif)->ipif_ire_cnt == 0) + struct ire_s *ipif_ire_local; /* Our IRE_LOCAL or LOOPBACK */ +} ipif_t; /* * The following table lists the protection levels of the various members @@ -1371,9 +964,7 @@ typedef struct ipif_s { * ill_g_lock ill_g_lock * ipif_ill ipsq + down ipif write once * ipif_id ipsq + down ipif write once - * ipif_mtu ipsq * ipif_v6lcl_addr ipsq + down ipif up ipif - * ipif_v6src_addr ipsq + down ipif up ipif * ipif_v6subnet ipsq + down ipif up ipif * ipif_v6net_mask ipsq + down ipif up ipif * @@ -1383,28 +974,30 @@ typedef struct ipif_s { * ipif_metric * ipif_ire_type ipsq + down ill up ill * - * ipif_arp_del_mp ipsq ipsq - * ipif_saved_ire_mp ipif_saved_ire_lock ipif_saved_ire_lock - * ipif_igmp_rpt ipsq ipsq - * - * ipif_fo_pkt_count Approx * ipif_ib_pkt_count Approx - * ipif_ob_pkt_count Approx * * bit fields ill_lock ill_lock * + * ipif_allhosts_ilm ipsq ipsq + * ipif_solmulti_ilm ipsq ipsq + * * ipif_seqid ipsq Write once * * ipif_state_flags ill_lock ill_lock * ipif_refcnt ill_lock ill_lock - * ipif_ire_cnt ill_lock ill_lock - * ipif_ilm_cnt ill_lock ill_lock - * ipif_saved_ire_cnt - * * ipif_bound_ill ipsq + ipmp_lock ipsq OR ipmp_lock * ipif_bound_next ipsq ipsq * ipif_bound ipsq ipsq + * + * ipif_ire_local ipsq + ips_ill_g_lock ipsq OR ips_ill_g_lock + */ + +/* + * Return values from ip_laddr_verify_{v4,v6} */ +typedef enum { IPVL_UNICAST_UP, IPVL_UNICAST_DOWN, IPVL_MCAST, IPVL_BCAST, + IPVL_BAD} ip_laddr_t; + #define IP_TR_HASH(tid) ((((uintptr_t)tid) >> 6) & (IP_TR_HASH_MAX - 1)) @@ -1422,18 +1015,12 @@ typedef struct ipif_s { /* IPv4 compatibility macros */ #define ipif_lcl_addr V4_PART_OF_V6(ipif_v6lcl_addr) -#define ipif_src_addr V4_PART_OF_V6(ipif_v6src_addr) #define ipif_subnet V4_PART_OF_V6(ipif_v6subnet) #define ipif_net_mask V4_PART_OF_V6(ipif_v6net_mask) #define ipif_brd_addr V4_PART_OF_V6(ipif_v6brd_addr) #define ipif_pp_dst_addr V4_PART_OF_V6(ipif_v6pp_dst_addr) /* Macros for easy 
backreferences to the ill. */ -#define ipif_wq ipif_ill->ill_wq -#define ipif_rq ipif_ill->ill_rq -#define ipif_net_type ipif_ill->ill_net_type -#define ipif_ipif_up_count ipif_ill->ill_ipif_up_count -#define ipif_type ipif_ill->ill_type #define ipif_isv6 ipif_ill->ill_isv6 #define SIOCLIFADDR_NDX 112 /* ndx of SIOCLIFADDR in the ndx ioctl table */ @@ -1524,7 +1111,7 @@ typedef struct ipxop_s { boolean_t ipx_current_done; /* is the current operation done? */ int ipx_current_ioctl; /* current ioctl, or 0 if no ioctl */ ipif_t *ipx_current_ipif; /* ipif for current op */ - ipif_t *ipx_pending_ipif; /* ipif for ipsq_pending_mp */ + ipif_t *ipx_pending_ipif; /* ipif for ipx_pending_mp */ mblk_t *ipx_pending_mp; /* current ioctl mp while waiting */ boolean_t ipx_forced; /* debugging aid */ #ifdef DEBUG @@ -1642,24 +1229,62 @@ typedef struct irb { krwlock_t irb_lock; /* Protect this bucket */ uint_t irb_refcnt; /* Protected by irb_lock */ uchar_t irb_marks; /* CONDEMNED ires in this bucket ? */ -#define IRB_MARK_CONDEMNED 0x0001 -#define IRB_MARK_FTABLE 0x0002 +#define IRB_MARK_CONDEMNED 0x0001 /* Contains some IRE_IS_CONDEMNED */ +#define IRB_MARK_DYNAMIC 0x0002 /* Dynamically allocated */ + /* Once IPv6 uses radix then IRB_MARK_DYNAMIC will be always be set */ uint_t irb_ire_cnt; /* Num of active IRE in this bucket */ - uint_t irb_tmp_ire_cnt; /* Num of temporary IRE */ - struct ire_s *irb_rr_origin; /* origin for round-robin */ int irb_nire; /* Num of ftable ire's that ref irb */ ip_stack_t *irb_ipst; /* Does not have a netstack_hold */ } irb_t; #define IRB2RT(irb) (rt_t *)((caddr_t)(irb) - offsetof(rt_t, rt_irb)) -/* The following are return values of ip_xmit_v4() */ -typedef enum { - SEND_PASSED = 0, /* sent packet out on wire */ - SEND_FAILED, /* sending of packet failed */ - LOOKUP_IN_PROGRESS, /* ire cache found, ARP resolution in progress */ - LLHDR_RESLV_FAILED /* macaddr resl of onlink dst or nexthop failed */ -} ipxmit_state_t; +/* Forward declarations */ 
+struct dce_s; +typedef struct dce_s dce_t; +struct ire_s; +typedef struct ire_s ire_t; +struct ncec_s; +typedef struct ncec_s ncec_t; +struct nce_s; +typedef struct nce_s nce_t; +struct ip_recv_attr_s; +typedef struct ip_recv_attr_s ip_recv_attr_t; +struct ip_xmit_attr_s; +typedef struct ip_xmit_attr_s ip_xmit_attr_t; + +struct tsol_ire_gw_secattr_s; +typedef struct tsol_ire_gw_secattr_s tsol_ire_gw_secattr_t; + +/* + * This is a structure for a one-element route cache that is passed + * by reference between ip_input and ill_inputfn. + */ +typedef struct { + ire_t *rtc_ire; + ipaddr_t rtc_ipaddr; + in6_addr_t rtc_ip6addr; +} rtc_t; + +/* + * Note: Temporarily use 64 bits, and will probably go back to 32 bits after + * more cleanup work is done. + */ +typedef uint64_t iaflags_t; + +/* The ill input function pointer type */ +typedef void (*pfillinput_t)(mblk_t *, void *, void *, ip_recv_attr_t *, + rtc_t *); + +/* The ire receive function pointer type */ +typedef void (*pfirerecv_t)(ire_t *, mblk_t *, void *, ip_recv_attr_t *); + +/* The ire send and postfrag function pointer types */ +typedef int (*pfiresend_t)(ire_t *, mblk_t *, void *, + ip_xmit_attr_t *, uint32_t *); +typedef int (*pfirepostfrag_t)(mblk_t *, nce_t *, iaflags_t, uint_t, uint32_t, + zoneid_t, zoneid_t, uintptr_t *); + #define IP_V4_G_HEAD 0 #define IP_V6_G_HEAD 1 @@ -1733,26 +1358,12 @@ typedef union ill_g_head_u { /* * Capabilities, possible flags for ill_capabilities. 
*/ - -#define ILL_CAPAB_AH 0x01 /* IPsec AH acceleration */ -#define ILL_CAPAB_ESP 0x02 /* IPsec ESP acceleration */ -#define ILL_CAPAB_MDT 0x04 /* Multidata Transmit */ +#define ILL_CAPAB_LSO 0x04 /* Large Send Offload */ #define ILL_CAPAB_HCKSUM 0x08 /* Hardware checksumming */ #define ILL_CAPAB_ZEROCOPY 0x10 /* Zero-copy */ #define ILL_CAPAB_DLD 0x20 /* DLD capabilities */ #define ILL_CAPAB_DLD_POLL 0x40 /* Polling */ #define ILL_CAPAB_DLD_DIRECT 0x80 /* Direct function call */ -#define ILL_CAPAB_DLD_LSO 0x100 /* Large Segment Offload */ - -/* - * Per-ill Multidata Transmit capabilities. - */ -typedef struct ill_mdt_capab_s ill_mdt_capab_t; - -/* - * Per-ill IPsec capabilities. - */ -typedef struct ill_ipsec_capab_s ill_ipsec_capab_t; /* * Per-ill Hardware Checksumming capbilities. @@ -1775,15 +1386,18 @@ typedef struct ill_dld_capab_s ill_dld_capab_t; typedef struct ill_rx_ring ill_rx_ring_t; /* - * Per-ill Large Segment Offload capabilities. + * Per-ill Large Send Offload capabilities. */ typedef struct ill_lso_capab_s ill_lso_capab_t; /* The following are ill_state_flags */ #define ILL_LL_SUBNET_PENDING 0x01 /* Waiting for DL_INFO_ACK from drv */ #define ILL_CONDEMNED 0x02 /* No more new ref's to the ILL */ -#define ILL_CHANGING 0x04 /* ILL not globally visible */ -#define ILL_DL_UNBIND_IN_PROGRESS 0x08 /* UNBIND_REQ is sent */ +#define ILL_DL_UNBIND_IN_PROGRESS 0x04 /* UNBIND_REQ is sent */ +#define ILL_DOWN_IN_PROGRESS 0x08 /* ILL is going down - no new nce's */ +#define ILL_LL_BIND_PENDING 0x0020 /* XXX Reuse ILL_LL_SUBNET_PENDING ? */ +#define ILL_LL_UP 0x0040 +#define ILL_LL_DOWN 0x0080 /* Is this an ILL whose source address is used by other ILL's ? */ #define IS_USESRC_ILL(ill) \ @@ -1796,10 +1410,9 @@ typedef struct ill_lso_capab_s ill_lso_capab_t; ((ill)->ill_usesrc_grp_next != NULL)) /* Is this an virtual network interface (vni) ILL ? 
*/ -#define IS_VNI(ill) \ - (((ill) != NULL) && \ +#define IS_VNI(ill) \ (((ill)->ill_phyint->phyint_flags & (PHYI_LOOPBACK|PHYI_VIRTUAL)) == \ - PHYI_VIRTUAL)) + PHYI_VIRTUAL) /* Is this a loopback ILL? */ #define IS_LOOPBACK(ill) \ @@ -1900,18 +1513,41 @@ typedef struct ipmp_grp_s { * ARP up-to-date as the active set of interfaces in the group changes. */ typedef struct ipmp_arpent_s { - mblk_t *ia_area_mp; /* AR_ENTRY_ADD pointer */ ipaddr_t ia_ipaddr; /* IP address for this entry */ boolean_t ia_proxyarp; /* proxy ARP entry? */ boolean_t ia_notified; /* ARP notified about this entry? */ list_node_t ia_node; /* next ARP entry in list */ + uint16_t ia_flags; /* nce_flags for the address */ + size_t ia_lladdr_len; + uchar_t *ia_lladdr; } ipmp_arpent_t; +struct arl_s; + +/* + * Per-ill capabilities. + */ +struct ill_hcksum_capab_s { + uint_t ill_hcksum_version; /* interface version */ + uint_t ill_hcksum_txflags; /* capabilities on transmit */ +}; + +struct ill_zerocopy_capab_s { + uint_t ill_zerocopy_version; /* interface version */ + uint_t ill_zerocopy_flags; /* capabilities */ +}; + +struct ill_lso_capab_s { + uint_t ill_lso_flags; /* capabilities */ + uint_t ill_lso_max; /* maximum size of payload */ +}; + /* * IP Lower level Structure. * Instance data structure in ip_open when there is a device below us. */ typedef struct ill_s { + pfillinput_t ill_inputfn; /* Fast input function selector */ ill_if_t *ill_ifptr; /* pointer to interface type */ queue_t *ill_rq; /* Read queue. */ queue_t *ill_wq; /* Write queue. */ @@ -1922,6 +1558,8 @@ typedef struct ill_s { uint_t ill_ipif_up_count; /* Number of IPIFs currently up. */ uint_t ill_max_frag; /* Max IDU from DLPI. */ + uint_t ill_current_frag; /* Current IDU from DLPI. */ + uint_t ill_mtu; /* User-specified MTU; SIOCSLIFMTU */ char *ill_name; /* Our name. */ uint_t ill_ipif_dup_count; /* Number of duplicate addresses. */ uint_t ill_name_length; /* Name length, incl. terminator. 
*/ @@ -1941,8 +1579,9 @@ typedef struct ill_s { uint8_t *ill_frag_ptr; /* Reassembly state. */ timeout_id_t ill_frag_timer_id; /* timeout id for the frag timer */ ipfb_t *ill_frag_hash_tbl; /* Fragment hash list head. */ - ipif_t *ill_pending_ipif; /* IPIF waiting for DL operation. */ + krwlock_t ill_mcast_lock; /* Protects multicast state */ + kmutex_t ill_mcast_serializer; /* Serialize across ilg and ilm state */ ilm_t *ill_ilm; /* Multicast membership for ill */ uint_t ill_global_timer; /* for IGMPv3/MLDv2 general queries */ int ill_mcast_type; /* type of router which is querier */ @@ -1955,22 +1594,20 @@ typedef struct ill_s { uint8_t ill_mcast_rv; /* IGMPv3/MLDv2 robustness variable */ int ill_mcast_qi; /* IGMPv3/MLDv2 query interval var */ - mblk_t *ill_pending_mp; /* IOCTL/DLPI awaiting completion. */ /* * All non-NULL cells between 'ill_first_mp_to_free' and * 'ill_last_mp_to_free' are freed in ill_delete. */ #define ill_first_mp_to_free ill_bcast_mp mblk_t *ill_bcast_mp; /* DLPI header for broadcasts. */ - mblk_t *ill_resolver_mp; /* Resolver template. 
*/ mblk_t *ill_unbind_mp; /* unbind mp from ill_dl_up() */ mblk_t *ill_promiscoff_mp; /* for ill_leave_allmulti() */ mblk_t *ill_dlpi_deferred; /* b_next chain of control messages */ - mblk_t *ill_ardeact_mp; /* deact mp from ipmp_ill_activate() */ mblk_t *ill_dest_addr_mp; /* mblk which holds ill_dest_addr */ mblk_t *ill_replumb_mp; /* replumb mp from ill_replumb() */ mblk_t *ill_phys_addr_mp; /* mblk which holds ill_phys_addr */ -#define ill_last_mp_to_free ill_phys_addr_mp + mblk_t *ill_mcast_deferred; /* b_next chain of IGMP/MLD packets */ +#define ill_last_mp_to_free ill_mcast_deferred cred_t *ill_credp; /* opener's credentials */ uint8_t *ill_phys_addr; /* ill_phys_addr_mp->b_rptr + off */ @@ -1986,37 +1623,33 @@ typedef struct ill_s { ill_dlpi_style_set : 1, ill_ifname_pending : 1, - ill_join_allmulti : 1, ill_logical_down : 1, ill_dl_up : 1, - ill_up_ipifs : 1, + ill_note_link : 1, /* supports link-up notification */ ill_capab_reneg : 1, /* capability renegotiation to be done */ ill_dld_capab_inprog : 1, /* direct dld capab call in prog */ - ill_need_recover_multicast : 1, - ill_pad_to_bit_31 : 19; + + ill_replumbing : 1, + ill_arl_dlpi_pending : 1, + + ill_pad_to_bit_31 : 18; /* Following bit fields protected by ill_lock */ uint_t ill_fragtimer_executing : 1, ill_fragtimer_needrestart : 1, - ill_ilm_cleanup_reqd : 1, - ill_arp_closing : 1, - - ill_arp_bringup_pending : 1, - ill_arp_extend : 1, /* ARP has DAD extensions */ ill_manual_token : 1, /* system won't override ill_token */ ill_manual_linklocal : 1, /* system won't auto-conf linklocal */ - ill_pad_bit_31 : 24; + ill_pad_bit_31 : 28; /* * Used in SIOCSIFMUXID and SIOCGIFMUXID for 'ifconfig unplumb'. */ - int ill_arp_muxid; /* muxid returned from plink for arp */ - int ill_ip_muxid; /* muxid returned from plink for ip */ + int ill_muxid; /* muxid returned from plink */ /* Used for IP frag reassembly throttling on a per ILL basis. 
*/ uint_t ill_ipf_gen; /* Generation of next fragment queue */ @@ -2033,20 +1666,13 @@ typedef struct ill_s { uint_t ill_dlpi_capab_state; /* State of capability query, IDCS_* */ uint_t ill_capab_pending_cnt; uint64_t ill_capabilities; /* Enabled capabilities, ILL_CAPAB_* */ - ill_mdt_capab_t *ill_mdt_capab; /* Multidata Transmit capabilities */ - ill_ipsec_capab_t *ill_ipsec_capab_ah; /* IPsec AH capabilities */ - ill_ipsec_capab_t *ill_ipsec_capab_esp; /* IPsec ESP capabilities */ ill_hcksum_capab_t *ill_hcksum_capab; /* H/W cksumming capabilities */ ill_zerocopy_capab_t *ill_zerocopy_capab; /* Zero-copy capabilities */ ill_dld_capab_t *ill_dld_capab; /* DLD capabilities */ ill_lso_capab_t *ill_lso_capab; /* Large Segment Offload capabilities */ mblk_t *ill_capab_reset_mp; /* Preallocated mblk for capab reset */ - /* - * Fields for IPv6 - */ uint8_t ill_max_hops; /* Maximum hops for any logical interface */ - uint_t ill_max_mtu; /* Maximum MTU for any logical interface */ uint_t ill_user_mtu; /* User-specified MTU via SIOCSLIFLNKINFO */ uint32_t ill_reachable_time; /* Value for ND algorithm in msec */ uint32_t ill_reachable_retrans_time; /* Value for ND algorithm msec */ @@ -2057,20 +1683,6 @@ typedef struct ill_s { uint32_t ill_xmit_count; /* ndp max multicast xmits */ mib2_ipIfStatsEntry_t *ill_ip_mib; /* ver indep. interface mib */ mib2_ipv6IfIcmpEntry_t *ill_icmp6_mib; /* Per interface mib */ - /* - * Following two mblks are allocated common to all - * the ipifs when the first interface is coming up. - * It is sent up to arp when the last ipif is coming - * down. - */ - mblk_t *ill_arp_down_mp; - mblk_t *ill_arp_del_mapping_mp; - /* - * Used for implementing IFF_NOARP. As IFF_NOARP is used - * to turn off for all the logicals, it is here instead - * of the ipif. 
- */ - mblk_t *ill_arp_on_mp; phyint_t *ill_phyint; uint64_t ill_flags; @@ -2094,11 +1706,11 @@ typedef struct ill_s { */ uint_t ill_ifname_pending_err; avl_node_t ill_avl_byppa; /* avl node based on ppa */ - void *ill_fastpath_list; /* both ire and nce hang off this */ + list_t ill_nce; /* pointer to nce_s list */ uint_t ill_refcnt; /* active refcnt by threads */ uint_t ill_ire_cnt; /* ires associated with this ill */ kcondvar_t ill_cv; - uint_t ill_ilm_walker_cnt; /* snmp ilm walkers */ + uint_t ill_ncec_cnt; /* ncecs associated with this ill */ uint_t ill_nce_cnt; /* nces associated with this ill */ uint_t ill_waiters; /* threads waiting in ipsq_enter */ /* @@ -2119,6 +1731,17 @@ typedef struct ill_s { void *ill_flownotify_mh; /* Tx flow ctl, mac cb handle */ uint_t ill_ilm_cnt; /* ilms referencing this ill */ uint_t ill_ipallmulti_cnt; /* ip_join_allmulti() calls */ + ilm_t *ill_ipallmulti_ilm; + + mblk_t *ill_saved_ire_mp; /* Allocated for each extra IRE */ + /* with ire_ill set so they can */ + /* survive the ill going down and up. */ + kmutex_t ill_saved_ire_lock; /* Protects ill_saved_ire_mp, cnt */ + uint_t ill_saved_ire_cnt; /* # entries */ + struct arl_ill_common_s *ill_common; + ire_t *ill_ire_multicast; /* IRE_MULTICAST for ill */ + clock_t ill_defend_start; /* start of 1 hour period */ + uint_t ill_defend_count; /* # of announce/defends per ill */ /* * IPMP fields. 
*/ @@ -2131,6 +1754,8 @@ typedef struct ill_s { uint_t ill_bound_cnt; /* # of data addresses bound to ill */ ipif_t *ill_bound_ipif; /* ipif chain bound to ill */ timeout_id_t ill_refresh_tid; /* ill refresh retry timeout id */ + + uint32_t ill_mrouter_cnt; /* mrouter allmulti joins */ } ill_t; /* @@ -2139,15 +1764,17 @@ typedef struct ill_s { */ #define ILL_FREE_OK(ill) \ ((ill)->ill_ire_cnt == 0 && (ill)->ill_ilm_cnt == 0 && \ - (ill)->ill_nce_cnt == 0) + (ill)->ill_ncec_cnt == 0 && (ill)->ill_nce_cnt == 0) /* - * An ipif/ill can be marked down only when the ire and nce references + * An ipif/ill can be marked down only when the ire and ncec references * to that ipif/ill goes to zero. ILL_DOWN_OK() is a necessary condition * quiescence checks. See comments above IPIF_DOWN_OK for details * on why ires and nces are selectively considered for this macro. */ -#define ILL_DOWN_OK(ill) (ill->ill_ire_cnt == 0 && ill->ill_nce_cnt == 0) +#define ILL_DOWN_OK(ill) \ + (ill->ill_ire_cnt == 0 && ill->ill_ncec_cnt == 0 && \ + ill->ill_nce_cnt == 0) /* * The following table lists the protection levels of the various members @@ -2162,7 +1789,8 @@ typedef struct ill_s { * ill_error ipsq None * ill_ipif ill_g_lock + ipsq ill_g_lock OR ipsq * ill_ipif_up_count ill_lock + ipsq ill_lock OR ipsq - * ill_max_frag ipsq Write once + * ill_max_frag ill_lock ill_lock + * ill_current_frag ill_lock ill_lock * * ill_name ill_g_lock + ipsq Write once * ill_name_length ill_g_lock + ipsq Write once @@ -2179,23 +1807,22 @@ typedef struct ill_s { * * ill_frag_timer_id ill_lock ill_lock * ill_frag_hash_tbl ipsq up ill - * ill_ilm ipsq + ill_lock ill_lock - * ill_mcast_type ill_lock ill_lock - * ill_mcast_v1_time ill_lock ill_lock - * ill_mcast_v2_time ill_lock ill_lock - * ill_mcast_v1_tset ill_lock ill_lock - * ill_mcast_v2_tset ill_lock ill_lock - * ill_mcast_rv ill_lock ill_lock - * ill_mcast_qi ill_lock ill_lock - * ill_pending_mp ill_lock ill_lock - * - * ill_bcast_mp ipsq ipsq - * 
ill_resolver_mp ipsq only when ill is up + * ill_ilm ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_global_timer ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_mcast_type ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_mcast_v1_time ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_mcast_v2_time ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_mcast_v1_tset ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_mcast_v2_tset ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_mcast_rv ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * ill_mcast_qi ill_mcast_lock(WRITER) ill_mcast_lock(READER) + * * ill_down_mp ipsq ipsq * ill_dlpi_deferred ill_lock ill_lock * ill_dlpi_pending ipsq + ill_lock ipsq or ill_lock or * absence of ipsq writer. * ill_phys_addr_mp ipsq + down ill only when ill is up + * ill_mcast_deferred ill_lock ill_lock * ill_phys_addr ipsq + down ill only when ill is up * ill_dest_addr_mp ipsq + down ill only when ill is up * ill_dest_addr ipsq + down ill only when ill is up @@ -2204,8 +1831,7 @@ typedef struct ill_s { * exclusive bit flags ipsq_t ipsq_t * shared bit flags ill_lock ill_lock * - * ill_arp_muxid ipsq Not atomic - * ill_ip_muxid ipsq Not atomic + * ill_muxid ipsq Not atomic * * ill_ipf_gen Not atomic * ill_frag_count atomics atomics @@ -2215,7 +1841,7 @@ typedef struct ill_s { * ill_dlpi_capab_state ipsq ipsq * ill_max_hops ipsq Not atomic * - * ill_max_mtu + * ill_mtu ill_lock None * * ill_user_mtu ipsq + ill_lock ill_lock * ill_reachable_time ipsq + ill_lock ill_lock @@ -2230,9 +1856,6 @@ typedef struct ill_s { * ill_xmit_count ipsq + down ill write once * ill_ip6_mib ipsq + down ill only when ill is up * ill_icmp6_mib ipsq + down ill only when ill is up - * ill_arp_down_mp ipsq ipsq - * ill_arp_del_mapping_mp ipsq ipsq - * ill_arp_on_mp ipsq ipsq * * ill_phyint ipsq, ill_g_lock, ill_lock Any of them * ill_flags ill_lock ill_lock @@ -2247,7 +1870,7 @@ typedef struct ill_s { * ill_refcnt ill_lock ill_lock * 
ill_ire_cnt ill_lock ill_lock * ill_cv ill_lock ill_lock - * ill_ilm_walker_cnt ill_lock ill_lock + * ill_ncec_cnt ill_lock ill_lock * ill_nce_cnt ill_lock ill_lock * ill_ilm_cnt ill_lock ill_lock * ill_src_ipif ill_g_lock ill_g_lock @@ -2256,8 +1879,12 @@ typedef struct ill_s { * ill_dhcpinit atomics atomics * ill_flownotify_mh write once write once * ill_capab_pending_cnt ipsq ipsq - * - * ill_bound_cnt ipsq ipsq + * ill_ipallmulti_cnt ill_lock ill_lock + * ill_ipallmulti_ilm ill_lock ill_lock + * ill_saved_ire_mp ill_saved_ire_lock ill_saved_ire_lock + * ill_saved_ire_cnt ill_saved_ire_lock ill_saved_ire_lock + * ill_arl ??? ??? + * ill_ire_multicast ipsq + quiescent none * ill_bound_ipif ipsq ipsq * ill_actnode ipsq + ipmp_lock ipsq OR ipmp_lock * ill_grpnode ipsq + ill_g_lock ipsq OR ill_g_lock @@ -2267,6 +1894,7 @@ typedef struct ill_s { * ill_refresh_tid ill_lock ill_lock * ill_grp (for IPMP ill) write once write once * ill_grp (for underlying ill) ipsq + ill_g_lock ipsq OR ill_g_lock + * ill_mrouter_cnt atomics atomics * * NOTE: It's OK to make heuristic decisions on an underlying interface * by using IS_UNDER_IPMP() or comparing ill_grp's raw pointer value. 
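The ILL_FREE_OK() and ILL_DOWN_OK() changes in the hunks above add ill_ncec_cnt to the set of counters that must reach zero before an ill can be freed or marked down. A minimal sketch of that gating follows, assuming a cut-down stand-in structure (`sketch_ill_t` and the `SKETCH_` macro names are hypothetical, not part of ip.h). Unlike the ILL_DOWN_OK() in the diff, the sketch parenthesizes the macro argument everywhere, and it ignores the ill_lock protection that the table above requires for these counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for ill_t; only the four counters matter. */
typedef struct sketch_ill {
	uint32_t ill_ire_cnt;	/* IRE references */
	uint32_t ill_ilm_cnt;	/* multicast membership references */
	uint32_t ill_ncec_cnt;	/* ncec references (added by this change) */
	uint32_t ill_nce_cnt;	/* nce references */
} sketch_ill_t;

/* Freeing needs every counter at zero, including ill_ilm_cnt. */
#define	SKETCH_ILL_FREE_OK(ill) \
	((ill)->ill_ire_cnt == 0 && (ill)->ill_ilm_cnt == 0 && \
	(ill)->ill_ncec_cnt == 0 && (ill)->ill_nce_cnt == 0)

/* Marking down does not look at ill_ilm_cnt. */
#define	SKETCH_ILL_DOWN_OK(ill) \
	((ill)->ill_ire_cnt == 0 && (ill)->ill_ncec_cnt == 0 && \
	(ill)->ill_nce_cnt == 0)
```

With this shape, an ill that still has a multicast membership can pass the down check while failing the free check, whereas a single outstanding ncec reference blocks both.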
@@ -2311,7 +1939,6 @@ enum { IF_CMD = 1, LIF_CMD, ARP_CMD, XARP_CMD, MSFILT_CMD, MISC_CMD }; #define IPI_GET_CMD 0x8 /* branch to mi_copyout on success */ /* unused 0x10 */ #define IPI_NULL_BCONT 0x20 /* ioctl has not data and hence no b_cont */ -#define IPI_PASS_DOWN 0x40 /* pass this ioctl down when a module only */ extern ip_ioctl_cmd_t ip_ndx_ioctl_table[]; extern ip_ioctl_cmd_t ip_misc_ioctl_table[]; @@ -2362,6 +1989,430 @@ typedef struct ipndp_s { char *ip_ndp_name; } ipndp_t; +/* IXA Notification types */ +typedef enum { + IXAN_LSO, /* LSO capability change */ + IXAN_PMTU, /* PMTU change */ + IXAN_ZCOPY /* ZEROCOPY capability change */ +} ixa_notify_type_t; + +typedef uint_t ixa_notify_arg_t; + +typedef void (*ixa_notify_t)(void *, ip_xmit_attr_t *ixa, ixa_notify_type_t, + ixa_notify_arg_t); + +/* + * Attribute flags that are common to the transmit and receive attributes + */ +#define IAF_IS_IPV4 0x80000000 /* ipsec_*_v4 */ +#define IAF_TRUSTED_ICMP 0x40000000 /* ipsec_*_icmp_loopback */ +#define IAF_NO_LOOP_ZONEID_SET 0x20000000 /* Zone that shouldn't have */ + /* a copy */ +#define IAF_LOOPBACK_COPY 0x10000000 /* For multi and broadcast */ + +#define IAF_MASK 0xf0000000 /* Flags that are common */ + +/* + * Transmit side attributes used between the transport protocols and IP as + * well as inside IP. It is also used to cache information in the conn_t i.e. + * replaces conn_ire and the IPsec caching in the conn_t. + */ +struct ip_xmit_attr_s { + iaflags_t ixa_flags; /* IXAF_*. See below */ + + uint32_t ixa_free_flags; /* IXA_FREE_*. See below */ + uint32_t ixa_refcnt; /* Using atomics */ + + /* + * Always initialized independently of ixa_flags settings. + * Used by ip_xmit so we keep them up front for cache locality. + */ + uint32_t ixa_xmit_hint; /* For ECMP and GLD TX ring fanout */ + uint_t ixa_pktlen; /* Always set. 
For frag and stats */ + zoneid_t ixa_zoneid; /* Assumed always set */ + + /* Always set for conn_ip_output(); might be stale */ + /* + * Since TCP keeps the conn_t around past the process going away + * we need to use the "notr" (e.g, ire_refhold_notr) for ixa_ire, + * ixa_nce, and ixa_dce. + */ + ire_t *ixa_ire; /* Forwarding table entry */ + uint_t ixa_ire_generation; + nce_t *ixa_nce; /* Neighbor cache entry */ + dce_t *ixa_dce; /* Destination cache entry */ + uint_t ixa_dce_generation; + uint_t ixa_src_generation; /* If IXAF_VERIFY_SOURCE */ + + uint32_t ixa_src_preferences; /* prefs for src addr select */ + uint32_t ixa_pmtu; /* IXAF_VERIFY_PMTU */ + + /* Set by ULP if IXAF_VERIFY_PMTU; otherwise set by IP */ + uint32_t ixa_fragsize; + + int8_t ixa_use_min_mtu; /* IXAF_USE_MIN_MTU values */ + + pfirepostfrag_t ixa_postfragfn; /* Set internally in IP */ + + in6_addr_t ixa_nexthop_v6; /* IXAF_NEXTHOP_SET */ +#define ixa_nexthop_v4 V4_PART_OF_V6(ixa_nexthop_v6) + + zoneid_t ixa_no_loop_zoneid; /* IXAF_NO_LOOP_ZONEID_SET */ + + uint_t ixa_scopeid; /* For IPv6 link-locals */ + + uint_t ixa_broadcast_ttl; /* IXAF_BROACAST_TTL_SET */ + + uint_t ixa_multicast_ttl; /* Assumed set for multicast */ + uint_t ixa_multicast_ifindex; /* Assumed set for multicast */ + ipaddr_t ixa_multicast_ifaddr; /* Assumed set for multicast */ + + int ixa_raw_cksum_offset; /* If IXAF_SET_RAW_CKSUM */ + + uint32_t ixa_ident; /* For IPv6 fragment header */ + + /* + * Cached LSO information. + */ + ill_lso_capab_t ixa_lso_capab; /* Valid when IXAF_LSO_CAPAB */ + + uint64_t ixa_ipsec_policy_gen; /* Generation from iph_gen */ + /* + * The following IPsec fields are only initialized when + * IXAF_IPSEC_SECURE is set. Otherwise they contain garbage. 
+ */ + ipsec_latch_t *ixa_ipsec_latch; /* Just the ids */ + struct ipsa_s *ixa_ipsec_ah_sa; /* Hard reference SA for AH */ + struct ipsa_s *ixa_ipsec_esp_sa; /* Hard reference SA for ESP */ + struct ipsec_policy_s *ixa_ipsec_policy; /* why are we here? */ + struct ipsec_action_s *ixa_ipsec_action; /* For reflected packets */ + ipsa_ref_t ixa_ipsec_ref[2]; /* Soft reference to SA */ + /* 0: ESP, 1: AH */ + + /* + * The selectors here are potentially different than the SPD rule's + * selectors, and we need to have both available for IKEv2. + * + * NOTE: "Source" and "Dest" are w.r.t. outbound datagrams. Ports can + * be zero, and the protocol number is needed to make the ports + * significant. + */ + uint16_t ixa_ipsec_src_port; /* Source port number of d-gram. */ + uint16_t ixa_ipsec_dst_port; /* Destination port number of d-gram. */ + uint8_t ixa_ipsec_icmp_type; /* ICMP type of d-gram */ + uint8_t ixa_ipsec_icmp_code; /* ICMP code of d-gram */ + + sa_family_t ixa_ipsec_inaf; /* Inner address family */ +#define IXA_MAX_ADDRLEN 4 /* Max addr len. (in 32-bit words) */ + uint32_t ixa_ipsec_insrc[IXA_MAX_ADDRLEN]; /* Inner src address */ + uint32_t ixa_ipsec_indst[IXA_MAX_ADDRLEN]; /* Inner dest address */ + uint8_t ixa_ipsec_insrcpfx; /* Inner source prefix */ + uint8_t ixa_ipsec_indstpfx; /* Inner destination prefix */ + + uint8_t ixa_ipsec_proto; /* IP protocol number for d-gram. */ + + /* Always initialized independently of ixa_flags settings */ + uint_t ixa_ifindex; /* Assumed always set */ + uint16_t ixa_ip_hdr_length; /* Points to ULP header */ + uint8_t ixa_protocol; /* Protocol number for ULP cksum */ + ts_label_t *ixa_tsl; /* Always set. 
NULL if not TX */ + ip_stack_t *ixa_ipst; /* Always set */ + uint32_t ixa_extra_ident; /* Set if LSO */ + cred_t *ixa_cred; /* For getpeerucred */ + pid_t ixa_cpid; /* For getpeerucred */ + +#ifdef DEBUG + kthread_t *ixa_curthread; /* For serialization assert */ +#endif + squeue_t *ixa_sqp; /* Set from conn_sqp as a hint */ + uintptr_t ixa_cookie; /* cookie to use for tx flow control */ + + /* + * Must be set by ULP if any of IXAF_VERIFY_LSO, IXAF_VERIFY_PMTU, + * or IXAF_VERIFY_ZCOPY is set. + */ + ixa_notify_t ixa_notify; /* Registered upcall notify function */ + void *ixa_notify_cookie; /* ULP cookie for ixa_notify */ +}; + +/* + * Flags to indicate which transmit attributes are set. + * Split into "xxx_SET" ones which indicate that the "xxx" field it set, and + * single flags. + */ +#define IXAF_REACH_CONF 0x00000001 /* Reachability confirmation */ +#define IXAF_BROADCAST_TTL_SET 0x00000002 /* ixa_broadcast_ttl valid */ +#define IXAF_SET_SOURCE 0x00000004 /* Replace if broadcast */ +#define IXAF_USE_MIN_MTU 0x00000008 /* IPV6_USE_MIN_MTU */ + +#define IXAF_DONTFRAG 0x00000010 /* IP*_DONTFRAG */ +#define IXAF_VERIFY_PMTU 0x00000020 /* ixa_pmtu/ixa_fragsize set */ +#define IXAF_PMTU_DISCOVERY 0x00000040 /* Create/use PMTU state */ +#define IXAF_MULTICAST_LOOP 0x00000080 /* IP_MULTICAST_LOOP */ + +#define IXAF_IPSEC_SECURE 0x00000100 /* Need IPsec processing */ +#define IXAF_UCRED_TSL 0x00000200 /* ixa_tsl from SCM_UCRED */ +#define IXAF_DONTROUTE 0x00000400 /* SO_DONTROUTE */ +#define IXAF_NO_IPSEC 0x00000800 /* Ignore policy */ + +#define IXAF_PMTU_TOO_SMALL 0x00001000 /* PMTU too small */ +#define IXAF_SET_ULP_CKSUM 0x00002000 /* Calculate ULP checksum */ +#define IXAF_VERIFY_SOURCE 0x00004000 /* Check that source is ok */ +#define IXAF_NEXTHOP_SET 0x00008000 /* ixa_nexthop set */ + +#define IXAF_PMTU_IPV4_DF 0x00010000 /* Set IPv4 DF */ +#define IXAF_NO_DEV_FLOW_CTL 0x00020000 /* Protocol needs no flow ctl */ +#define IXAF_NO_TTL_CHANGE 0x00040000 /* Internal 
to IP */ +#define IXAF_IPV6_ADD_FRAGHDR 0x00080000 /* Add fragment header */ + +#define IXAF_IPSEC_TUNNEL 0x00100000 /* Tunnel mode */ +#define IXAF_NO_PFHOOK 0x00200000 /* Skip xmit pfhook */ +#define IXAF_NO_TRACE 0x00400000 /* When back from ARP/ND */ +#define IXAF_SCOPEID_SET 0x00800000 /* ixa_scopeid set */ + +#define IXAF_MULTIRT_MULTICAST 0x01000000 /* MULTIRT for multicast */ +#define IXAF_NO_HW_CKSUM 0x02000000 /* Force software cksum */ +#define IXAF_SET_RAW_CKSUM 0x04000000 /* Use ixa_raw_cksum_offset */ +#define IXAF_IPSEC_GLOBAL_POLICY 0x08000000 /* Policy came from global */ + +/* Note the following uses bits 0x10000000 through 0x80000000 */ +#define IXAF_IS_IPV4 IAF_IS_IPV4 +#define IXAF_TRUSTED_ICMP IAF_TRUSTED_ICMP +#define IXAF_NO_LOOP_ZONEID_SET IAF_NO_LOOP_ZONEID_SET +#define IXAF_LOOPBACK_COPY IAF_LOOPBACK_COPY + +/* Note: use the upper 32 bits */ +#define IXAF_VERIFY_LSO 0x100000000 /* Check LSO capability */ +#define IXAF_LSO_CAPAB 0x200000000 /* Capable of LSO */ +#define IXAF_VERIFY_ZCOPY 0x400000000 /* Check Zero Copy capability */ +#define IXAF_ZCOPY_CAPAB 0x800000000 /* Capable of ZEROCOPY */ + +/* + * The normal flags for sending packets e.g., icmp errors + */ +#define IXAF_BASIC_SIMPLE_V4 (IXAF_SET_ULP_CKSUM | IXAF_IS_IPV4) +#define IXAF_BASIC_SIMPLE_V6 (IXAF_SET_ULP_CKSUM) + +/* + * Normally these fields do not have a hold. But in some cases they do, for + * instance when we've gone through ip_*_attr_to/from_mblk. + * We use ixa_free_flags to indicate that they have a hold and need to be + * released on cleanup. + */ +#define IXA_FREE_CRED 0x00000001 /* ixa_cred needs to be rele */ +#define IXA_FREE_TSL 0x00000002 /* ixa_tsl needs to be rele */ + +/* + * Simplistic way to set the ixa_xmit_hint for locally generated traffic + * and forwarded traffic. 
The shift amount are based on the size of the + * structs to discard the low order bits which don't have much if any variation + * (coloring in kmem_cache_alloc might provide some variation). + * + * Basing the locally generated hint on the address of the conn_t means that + * the packets from the same socket/connection do not get reordered. + * Basing the hint for forwarded traffic on the ill_ring_t means that + * packets from the same NIC+ring are likely to use the same outbound ring + * hence we get low contention on the ring in the transmitting driver. + */ +#define CONN_TO_XMIT_HINT(connp) ((uint32_t)(((uintptr_t)connp) >> 11)) +#define ILL_RING_TO_XMIT_HINT(ring) ((uint32_t)(((uintptr_t)ring) >> 7)) + +/* + * IP set Destination Flags used by function ip_set_destination, + * ip_attr_connect, and conn_connect. + */ +#define IPDF_ALLOW_MCBC 0x1 /* Allow multi/broadcast */ +#define IPDF_VERIFY_DST 0x2 /* Verify destination addr */ +#define IPDF_SELECT_SRC 0x4 /* Select source address */ +#define IPDF_LSO 0x8 /* Try LSO */ +#define IPDF_IPSEC 0x10 /* Set IPsec policy */ +#define IPDF_ZONE_IS_GLOBAL 0x20 /* From conn_zone_is_global */ +#define IPDF_ZCOPY 0x40 /* Try ZEROCOPY */ +#define IPDF_UNIQUE_DCE 0x80 /* Get a per-destination DCE */ + +/* + * Receive side attributes used between the transport protocols and IP as + * well as inside IP. + */ +struct ip_recv_attr_s { + iaflags_t ira_flags; /* See below */ + + uint32_t ira_free_flags; /* IRA_FREE_*. See below */ + + /* + * This is a hint for TCP SYN packets. 
+ * Always initialized independently of ira_flags settings + */ + squeue_t *ira_sqp; + ill_rx_ring_t *ira_ring; /* Internal to IP */ + + /* For ip_accept_tcp when IRAF_TARGET_SQP is set */ + squeue_t *ira_target_sqp; + mblk_t *ira_target_sqp_mp; + + /* Always initialized independently of ira_flags settings */ + uint32_t ira_xmit_hint; /* For ECMP and GLD TX ring fanout */ + zoneid_t ira_zoneid; /* ALL_ZONES unless local delivery */ + uint_t ira_pktlen; /* Always set. For frag and stats */ + uint16_t ira_ip_hdr_length; /* Points to ULP header */ + uint8_t ira_protocol; /* Protocol number for ULP cksum */ + uint_t ira_rifindex; /* Received ifindex */ + uint_t ira_ruifindex; /* Received upper ifindex */ + ts_label_t *ira_tsl; /* Always set. NULL if not TX */ + /* + * ira_rill and ira_ill is set inside IP, but not when conn_recv is + * called; ULPs should use ira_ruifindex instead. + */ + ill_t *ira_rill; /* ill where packet came */ + ill_t *ira_ill; /* ill where IP address hosted */ + cred_t *ira_cred; /* For getpeerucred */ + pid_t ira_cpid; /* For getpeerucred */ + + /* Used when IRAF_VERIFIED_SRC is set; this source was ok */ + ipaddr_t ira_verified_src; + + /* + * The following IPsec fields are only initialized when + * IRAF_IPSEC_SECURE is set. Otherwise they contain garbage. + */ + struct ipsec_action_s *ira_ipsec_action; /* how we made it in.. */ + struct ipsa_s *ira_ipsec_ah_sa; /* SA for AH */ + struct ipsa_s *ira_ipsec_esp_sa; /* SA for ESP */ + + ipaddr_t ira_mroute_tunnel; /* IRAF_MROUTE_TUNNEL_SET */ + + zoneid_t ira_no_loop_zoneid; /* IRAF_NO_LOOP_ZONEID_SET */ + + uint32_t ira_esp_udp_ports; /* IRAF_ESP_UDP_PORTS */ + + /* + * For IP_RECVSLLA and ip_ndp_conflict/find_solicitation. + * Same size as max for sockaddr_dl + */ +#define IRA_L2SRC_SIZE 244 + uint8_t ira_l2src[IRA_L2SRC_SIZE]; /* If IRAF_L2SRC_SET */ + + /* + * Local handle that we use to do lazy setting of ira_l2src. 
+ * We defer setting l2src until needed but we do before any + * ip_input pullupmsg or copymsg. + */ + struct mac_header_info_s *ira_mhip; /* Could be NULL */ +}; + +/* + * Flags to indicate which receive attributes are set. + */ +#define IRAF_SYSTEM_LABELED 0x00000001 /* is_system_labeled() */ +#define IRAF_IPV4_OPTIONS 0x00000002 /* Performance */ +#define IRAF_MULTICAST 0x00000004 /* Was multicast at L3 */ +#define IRAF_BROADCAST 0x00000008 /* Was broadcast at L3 */ +#define IRAF_MULTIBROADCAST (IRAF_MULTICAST|IRAF_BROADCAST) + +#define IRAF_LOOPBACK 0x00000010 /* Looped back by IP */ +#define IRAF_VERIFY_IP_CKSUM 0x00000020 /* Need to verify IP */ +#define IRAF_VERIFY_ULP_CKSUM 0x00000040 /* Need to verify TCP,UDP,etc */ +#define IRAF_SCTP_CSUM_ERR 0x00000080 /* sctp pkt has failed chksum */ + +#define IRAF_IPSEC_SECURE 0x00000100 /* Passed AH and/or ESP */ +#define IRAF_DHCP_UNICAST 0x00000200 +#define IRAF_IPSEC_DECAPS 0x00000400 /* Was packet decapsulated */ + /* from a matching inner packet? 
*/ +#define IRAF_TARGET_SQP 0x00000800 /* ira_target_sqp is set */ +#define IRAF_VERIFIED_SRC 0x00001000 /* ira_verified_src set */ +#define IRAF_RSVP 0x00002000 /* RSVP packet for rsvpd */ +#define IRAF_MROUTE_TUNNEL_SET 0x00004000 /* From ip_mroute_decap */ +#define IRAF_PIM_REGISTER 0x00008000 /* From register_mforward */ + +#define IRAF_TX_MAC_EXEMPTABLE 0x00010000 /* Allow MAC_EXEMPT readdown */ +#define IRAF_TX_SHARED_ADDR 0x00020000 /* Arrived on ALL_ZONES addr */ +#define IRAF_ESP_UDP_PORTS 0x00040000 /* NAT-traversal packet */ +#define IRAF_NO_HW_CKSUM 0x00080000 /* Force software cksum */ + +#define IRAF_ICMP_ERROR 0x00100000 /* Send to conn_recvicmp */ +#define IRAF_ROUTER_ALERT 0x00200000 /* IPv6 router alert */ +#define IRAF_L2SRC_SET 0x00400000 /* ira_l2src has been set */ +#define IRAF_L2SRC_LOOPBACK 0x00800000 /* Came from us */ + +#define IRAF_L2DST_MULTICAST 0x01000000 /* Multicast at L2 */ +#define IRAF_L2DST_BROADCAST 0x02000000 /* Broadcast at L2 */ +/* Unused 0x04000000 */ +/* Unused 0x08000000 */ + +/* Below starts with 0x10000000 */ +#define IRAF_IS_IPV4 IAF_IS_IPV4 +#define IRAF_TRUSTED_ICMP IAF_TRUSTED_ICMP +#define IRAF_NO_LOOP_ZONEID_SET IAF_NO_LOOP_ZONEID_SET +#define IRAF_LOOPBACK_COPY IAF_LOOPBACK_COPY + +/* + * Normally these fields do not have a hold. But in some cases they do, for + * instance when we've gone through ip_*_attr_to/from_mblk. + * We use ira_free_flags to indicate that they have a hold and need to be + * released on cleanup. + */ +#define IRA_FREE_CRED 0x00000001 /* ira_cred needs to be rele */ +#define IRA_FREE_TSL 0x00000002 /* ira_tsl needs to be rele */ + +/* + * Optional destination cache entry for path MTU information, + * and ULP metrics. + */ +struct dce_s { + uint_t dce_generation; /* Changed since cached? */ + uint_t dce_flags; /* See below */ + uint_t dce_ipversion; /* IPv4/IPv6 version */ + uint32_t dce_pmtu; /* Path MTU if DCEF_PMTU */ + uint32_t dce_ident; /* Per destination IP ident. 
*/ + iulp_t dce_uinfo; /* Metrics if DCEF_UINFO */ + + struct dce_s *dce_next; + struct dce_s **dce_ptpn; + struct dcb_s *dce_bucket; + + union { + in6_addr_t dceu_v6addr; + ipaddr_t dceu_v4addr; + } dce_u; +#define dce_v4addr dce_u.dceu_v4addr +#define dce_v6addr dce_u.dceu_v6addr + /* Note that for IPv6+IPMP we use the ifindex for the upper interface */ + uint_t dce_ifindex; /* For IPv6 link-locals */ + + kmutex_t dce_lock; + uint_t dce_refcnt; + uint64_t dce_last_change_time; /* Path MTU. In seconds */ + + ip_stack_t *dce_ipst; /* Does not have a netstack_hold */ +}; + +/* + * Values for dce_generation. + * + * If a DCE has DCE_GENERATION_CONDEMNED, the last dce_refrele should delete + * it. + * + * DCE_GENERATION_VERIFY is never stored in dce_generation but it is + * stored in places that cache DCE (such as ixa_dce_generation). + * It is used as a signal that the cache is stale and needs to be reverified. + */ +#define DCE_GENERATION_CONDEMNED 0 +#define DCE_GENERATION_VERIFY 1 +#define DCE_GENERATION_INITIAL 2 +#define DCE_IS_CONDEMNED(dce) \ + ((dce)->dce_generation == DCE_GENERATION_CONDEMNED) + + +/* + * Values for ips_src_generation. + * + * SRC_GENERATION_VERIFY is never stored in ips_src_generation but it is + * stored in places that cache IREs (ixa_src_generation). It is used as a + * signal that the cache is stale and needs to be reverified. + */ +#define SRC_GENERATION_VERIFY 0 +#define SRC_GENERATION_INITIAL 1 + /* * The kernel stores security attributes of all gateways in a database made * up of one or more tsol_gcdb_t elements. 
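The DCE_GENERATION_* comment in the hunk above describes a generation-number scheme: a cached pointer is trusted only while the generation saved with it still equals the entry's current generation, and the VERIFY value is reserved so that a cache can be forced stale. The sketch below illustrates that idea under stated assumptions. The `sketch_` types and functions are hypothetical simplifications, not the real dce_t, ip_xmit_attr_t, or any function in this commit, and the sketch leaves out the locking and reference counting the real code needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the hunk above. */
#define	DCE_GENERATION_CONDEMNED	0
#define	DCE_GENERATION_VERIFY		1
#define	DCE_GENERATION_INITIAL		2

/* Hypothetical stand-in for the destination cache entry. */
typedef struct sketch_dce {
	unsigned int	dce_generation;
	uint32_t	dce_pmtu;
} sketch_dce_t;

/* Hypothetical stand-in for a consumer that caches a DCE. */
typedef struct sketch_cache {
	sketch_dce_t	*sc_dce;
	unsigned int	sc_dce_generation;
	uint32_t	sc_pmtu;	/* copy taken at refresh time */
} sketch_cache_t;

/* Assumed helper: signal a change, skipping reserved values on wrap. */
static void
sketch_dce_bump(sketch_dce_t *dce)
{
	if (dce->dce_generation == DCE_GENERATION_CONDEMNED)
		return;		/* a condemned entry stays condemned */
	dce->dce_generation++;
	if (dce->dce_generation < DCE_GENERATION_INITIAL)
		dce->dce_generation = DCE_GENERATION_INITIAL;
}

/* The cache is usable only if the saved generation still matches. */
static bool
sketch_cache_is_fresh(const sketch_cache_t *sc)
{
	return (sc->sc_dce != NULL &&
	    sc->sc_dce_generation == sc->sc_dce->dce_generation);
}

/* Re-read the entry; refuse to cache a condemned one. */
static bool
sketch_cache_refresh(sketch_cache_t *sc, sketch_dce_t *dce)
{
	if (dce->dce_generation == DCE_GENERATION_CONDEMNED) {
		sc->sc_dce = NULL;
		sc->sc_dce_generation = DCE_GENERATION_VERIFY;
		return (false);
	}
	sc->sc_dce = dce;
	sc->sc_dce_generation = dce->dce_generation;
	sc->sc_pmtu = dce->dce_pmtu;
	return (true);
}
```

Because an entry's own generation is never DCE_GENERATION_VERIFY, a cache initialized or reset to that value always fails the freshness check and is reverified on next use. The same pattern would presumably apply to the SRC_GENERATION_* values, which reserve 0 for VERIFY instead.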
Each tsol_gcdb_t contains the @@ -2453,183 +2504,28 @@ extern kmutex_t gcgrp_lock; */ struct tsol_tnrhc; -typedef struct tsol_ire_gw_secattr_s { +struct tsol_ire_gw_secattr_s { kmutex_t igsa_lock; /* lock to protect following */ struct tsol_tnrhc *igsa_rhc; /* host entry for gateway */ tsol_gc_t *igsa_gc; /* for prefix IREs */ - tsol_gcgrp_t *igsa_gcgrp; /* for cache IREs */ -} tsol_ire_gw_secattr_t; - -/* - * Following are the macros to increment/decrement the reference - * count of the IREs and IRBs (ire bucket). - * - * 1) We bump up the reference count of an IRE to make sure that - * it does not get deleted and freed while we are using it. - * Typically all the lookup functions hold the bucket lock, - * and look for the IRE. If it finds an IRE, it bumps up the - * reference count before dropping the lock. Sometimes we *may* want - * to bump up the reference count after we *looked* up i.e without - * holding the bucket lock. So, the IRE_REFHOLD macro does not assert - * on the bucket lock being held. Any thread trying to delete from - * the hash bucket can still do so but cannot free the IRE if - * ire_refcnt is not 0. - * - * 2) We bump up the reference count on the bucket where the IRE resides - * (IRB), when we want to prevent the IREs getting deleted from a given - * hash bucket. This makes life easier for ire_walk type functions which - * wants to walk the IRE list, call a function, but needs to drop - * the bucket lock to prevent recursive rw_enters. While the - * lock is dropped, the list could be changed by other threads or - * the same thread could end up deleting the ire or the ire pointed by - * ire_next. IRE_REFHOLDing the ire or ire_next is not sufficient as - * a delete will still remove the ire from the bucket while we have - * dropped the lock and hence the ire_next would be NULL. Thus, we - * need a mechanism to prevent deletions from a given bucket. - * - * To prevent deletions, we bump up the reference count on the - * bucket. 
If the bucket is held, ire_delete just marks IRE_MARK_CONDEMNED - * both on the ire's ire_marks and the bucket's irb_marks. When the - * reference count on the bucket drops to zero, all the CONDEMNED ires - * are deleted. We don't have to bump up the reference count on the - * bucket if we are walking the bucket and never have to drop the bucket - * lock. Note that IRB_REFHOLD does not prevent addition of new ires - * in the list. It is okay because addition of new ires will not cause - * ire_next to point to freed memory. We do IRB_REFHOLD only when - * all of the 3 conditions are true : - * - * 1) The code needs to walk the IRE bucket from start to end. - * 2) It may have to drop the bucket lock sometimes while doing (1) - * 3) It does not want any ires to be deleted meanwhile. - */ - -/* - * Bump up the reference count on the IRE. We cannot assert that the - * bucket lock is being held as it is legal to bump up the reference - * count after the first lookup has returned the IRE without - * holding the lock. Currently ip_wput does this for caching IRE_CACHEs. - */ - -#ifdef DEBUG -#define IRE_UNTRACE_REF(ire) ire_untrace_ref(ire); -#define IRE_TRACE_REF(ire) ire_trace_ref(ire); -#else -#define IRE_UNTRACE_REF(ire) -#define IRE_TRACE_REF(ire) -#endif - -#define IRE_REFHOLD_NOTR(ire) { \ - atomic_add_32(&(ire)->ire_refcnt, 1); \ - ASSERT((ire)->ire_refcnt != 0); \ -} - -#define IRE_REFHOLD(ire) { \ - IRE_REFHOLD_NOTR(ire); \ - IRE_TRACE_REF(ire); \ -} - -#define IRE_REFHOLD_LOCKED(ire) { \ - IRE_TRACE_REF(ire); \ - (ire)->ire_refcnt++; \ -} - -/* - * Decrement the reference count on the IRE. - * In architectures e.g sun4u, where atomic_add_32_nv is just - * a cas, we need to maintain the right memory barrier semantics - * as that of mutex_exit i.e all the loads and stores should complete - * before the cas is executed. membar_exit() does that here. - * - * NOTE : This macro is used only in places where we want performance. 
- * To avoid bloating the code, we use the function "ire_refrele" - * which essentially calls the macro. - */ -#define IRE_REFRELE_NOTR(ire) { \ - ASSERT((ire)->ire_refcnt != 0); \ - membar_exit(); \ - if (atomic_add_32_nv(&(ire)->ire_refcnt, -1) == 0) \ - ire_inactive(ire); \ -} - -#define IRE_REFRELE(ire) { \ - if (ire->ire_bucket != NULL) { \ - IRE_UNTRACE_REF(ire); \ - } \ - IRE_REFRELE_NOTR(ire); \ -} - -/* - * Bump up the reference count on the hash bucket - IRB to - * prevent ires from being deleted in this bucket. - */ -#define IRB_REFHOLD(irb) { \ - rw_enter(&(irb)->irb_lock, RW_WRITER); \ - (irb)->irb_refcnt++; \ - ASSERT((irb)->irb_refcnt != 0); \ - rw_exit(&(irb)->irb_lock); \ -} -#define IRB_REFHOLD_LOCKED(irb) { \ - ASSERT(RW_WRITE_HELD(&(irb)->irb_lock)); \ - (irb)->irb_refcnt++; \ - ASSERT((irb)->irb_refcnt != 0); \ -} +}; void irb_refrele_ftable(irb_t *); -/* - * Note: when IRB_MARK_FTABLE (i.e., IRE_CACHETABLE entry), the irb_t - * is statically allocated, so that when the irb_refcnt goes to 0, - * we simply clean up the ire list and continue. 
- */ -#define IRB_REFRELE(irb) { \ - if ((irb)->irb_marks & IRB_MARK_FTABLE) { \ - irb_refrele_ftable((irb)); \ - } else { \ - rw_enter(&(irb)->irb_lock, RW_WRITER); \ - ASSERT((irb)->irb_refcnt != 0); \ - if (--(irb)->irb_refcnt == 0 && \ - ((irb)->irb_marks & IRE_MARK_CONDEMNED)) { \ - ire_t *ire_list; \ - \ - ire_list = ire_unlink(irb); \ - rw_exit(&(irb)->irb_lock); \ - ASSERT(ire_list != NULL); \ - ire_cleanup(ire_list); \ - } else { \ - rw_exit(&(irb)->irb_lock); \ - } \ - } \ -} extern struct kmem_cache *rt_entry_cache; -/* - * Lock the fast path mp for access, since the fp_mp can be deleted - * due a DL_NOTE_FASTPATH_FLUSH in the case of IRE_BROADCAST - */ - -#define LOCK_IRE_FP_MP(ire) { \ - if ((ire)->ire_type == IRE_BROADCAST) \ - mutex_enter(&ire->ire_nce->nce_lock); \ - } -#define UNLOCK_IRE_FP_MP(ire) { \ - if ((ire)->ire_type == IRE_BROADCAST) \ - mutex_exit(&ire->ire_nce->nce_lock); \ - } - typedef struct ire4 { - ipaddr_t ire4_src_addr; /* Source address to use. */ ipaddr_t ire4_mask; /* Mask for matching this IRE. */ ipaddr_t ire4_addr; /* Address this IRE represents. */ - ipaddr_t ire4_gateway_addr; /* Gateway if IRE_CACHE/IRE_OFFSUBNET */ - ipaddr_t ire4_cmask; /* Mask from parent prefix route */ + ipaddr_t ire4_gateway_addr; /* Gateway including for IRE_ONLINK */ + ipaddr_t ire4_setsrc_addr; /* RTF_SETSRC */ } ire4_t; typedef struct ire6 { - in6_addr_t ire6_src_addr; /* Source address to use. */ in6_addr_t ire6_mask; /* Mask for matching this IRE. */ in6_addr_t ire6_addr; /* Address this IRE represents. 
*/ - in6_addr_t ire6_gateway_addr; /* Gateway if IRE_CACHE/IRE_OFFSUBNET */ - in6_addr_t ire6_cmask; /* Mask from parent prefix route */ + in6_addr_t ire6_gateway_addr; /* Gateway including for IRE_ONLINK */ + in6_addr_t ire6_setsrc_addr; /* RTF_SETSRC */ } ire6_t; typedef union ire_addr { @@ -2637,115 +2533,131 @@ typedef union ire_addr { ire4_t ire4_u; } ire_addr_u_t; -/* Internet Routing Entry */ -typedef struct ire_s { +/* + * Internet Routing Entry + * When we have multiple identical IREs we logically add them by manipulating + * ire_identical_ref and ire_delete first decrements + * that and when it reaches 1 we know it is the last IRE. + * "identical" is defined as being the same for: + * ire_addr, ire_netmask, ire_gateway, ire_ill, ire_zoneid, and ire_type + * For instance, multiple IRE_BROADCASTs for the same subnet number are + * viewed as identical, and so are the IRE_INTERFACEs when there are + * multiple logical interfaces (on the same ill) with the same subnet prefix. + */ +struct ire_s { struct ire_s *ire_next; /* The hash chain must be first. */ struct ire_s **ire_ptpn; /* Pointer to previous next. */ uint32_t ire_refcnt; /* Number of references */ - mblk_t *ire_mp; /* Non-null if allocated as mblk */ - queue_t *ire_rfq; /* recv from this queue */ - queue_t *ire_stq; /* send to this queue */ - union { - uint_t *max_fragp; /* Used only during ire creation */ - uint_t max_frag; /* MTU (next hop or path). */ - } imf_u; -#define ire_max_frag imf_u.max_frag -#define ire_max_fragp imf_u.max_fragp - uint32_t ire_frag_flag; /* IPH_DF or zero. */ - uint32_t ire_ident; /* Per IRE IP ident. */ - uint32_t ire_tire_mark; /* Used for reclaim of unused. */ + ill_t *ire_ill; + uint32_t ire_identical_ref; /* IRE_INTERFACE, IRE_BROADCAST */ uchar_t ire_ipversion; /* IPv4/IPv6 version */ - uchar_t ire_marks; /* IRE_MARK_CONDEMNED etc. 
*/ ushort_t ire_type; /* Type of IRE */ + uint_t ire_generation; /* Generation including CONDEMNED */ uint_t ire_ib_pkt_count; /* Inbound packets for ire_addr */ uint_t ire_ob_pkt_count; /* Outbound packets to ire_addr */ - uint_t ire_ll_hdr_length; /* Non-zero if we do M_DATA prepends */ time_t ire_create_time; /* Time (in secs) IRE was created. */ - uint32_t ire_phandle; /* Associate prefix IREs to cache */ - uint32_t ire_ihandle; /* Associate interface IREs to cache */ - ipif_t *ire_ipif; /* the interface that this ire uses */ uint32_t ire_flags; /* flags related to route (RTF_*) */ /* - * Neighbor Cache Entry for IPv6; arp info for IPv4 + * ire_testhidden is TRUE for INTERFACE IREs of IS_UNDER_IPMP(ill) + * interfaces */ - struct nce_s *ire_nce; + boolean_t ire_testhidden; + pfirerecv_t ire_recvfn; /* Receive side handling */ + pfiresend_t ire_sendfn; /* Send side handling */ + pfirepostfrag_t ire_postfragfn; /* Bottom end of send handling */ + uint_t ire_masklen; /* # bits in ire_mask{,_v6} */ ire_addr_u_t ire_u; /* IPv4/IPv6 address info. */ irb_t *ire_bucket; /* Hash bucket when ire_ptphn is set */ - iulp_t ire_uinfo; /* Upper layer protocol info. */ - /* - * Protects ire_uinfo, ire_max_frag, and ire_frag_flag. - */ kmutex_t ire_lock; - uint_t ire_ipif_seqid; /* ipif_seqid of ire_ipif */ - uint_t ire_ipif_ifindex; /* ifindex associated with ipif */ - clock_t ire_last_used_time; /* Last used time */ + clock_t ire_last_used_time; /* For IRE_LOCAL reception */ tsol_ire_gw_secattr_t *ire_gw_secattr; /* gateway security attributes */ - zoneid_t ire_zoneid; /* for local address discrimination */ + zoneid_t ire_zoneid; + + /* + * Cached information of where to send packets that match this route. + * The ire_dep_* information is used to determine when ire_nce_cache + * needs to be updated. 
+ * ire_nce_cache is the fastpath for the Neighbor Cache Entry + * for IPv6; arp info for IPv4 + * Since this is a cache setup and torn down independently of + * applications we need to use nce_ref{rele,hold}_notr for it. + */ + nce_t *ire_nce_cache; + + /* + * Quick check whether the ire_type and ire_masklen indicates + * that the IRE can have ire_nce_cache set i.e., whether it is + * IRE_ONLINK and for a single destination. + */ + boolean_t ire_nce_capable; + /* - * ire's that are embedded inside mblk_t and sent to the external - * resolver use the ire_stq_ifindex to track the ifindex of the - * ire_stq, so that the ill (if it exists) can be correctly recovered - * for cleanup in the esbfree routine when arp failure occurs. - * Similarly, the ire_stackid is used to recover the ip_stack_t. + * Dependency tracking so we can safely cache IRE and NCE pointers + * in offlink and onlink IREs. + * These are locked under the ips_ire_dep_lock rwlock. Write held + * when modifying the linkage. + * ire_dep_parent (Also chain towards IRE for nexthop) + * ire_dep_parent_generation: ire_generation of ire_dep_parent + * ire_dep_children (From parent to first child) + * ire_dep_sib_next (linked list of siblings) + * ire_dep_sib_ptpn (linked list of siblings) + * + * The parent has a ire_refhold on each child, and each child has + * an ire_refhold on its parent. + * Since ire_dep_parent is a cache setup and torn down independently of + * applications we need to use ire_ref{rele,hold}_notr for it. 
*/ - uint_t ire_stq_ifindex; - netstackid_t ire_stackid; + ire_t *ire_dep_parent; + ire_t *ire_dep_children; + ire_t *ire_dep_sib_next; + ire_t **ire_dep_sib_ptpn; /* Pointer to previous next */ + uint_t ire_dep_parent_generation; + + uint_t ire_badcnt; /* Number of times ND_UNREACHABLE */ + uint64_t ire_last_badcnt; /* In seconds */ + + /* ire_defense* and ire_last_used_time are only used on IRE_LOCALs */ uint_t ire_defense_count; /* number of ARP conflicts */ uint_t ire_defense_time; /* last time defended (secs) */ + boolean_t ire_trace_disable; /* True when alloc fails */ ip_stack_t *ire_ipst; /* Does not have a netstack_hold */ -} ire_t; + iulp_t ire_metrics; +}; /* IPv4 compatibility macros */ -#define ire_src_addr ire_u.ire4_u.ire4_src_addr #define ire_mask ire_u.ire4_u.ire4_mask #define ire_addr ire_u.ire4_u.ire4_addr #define ire_gateway_addr ire_u.ire4_u.ire4_gateway_addr -#define ire_cmask ire_u.ire4_u.ire4_cmask +#define ire_setsrc_addr ire_u.ire4_u.ire4_setsrc_addr -#define ire_src_addr_v6 ire_u.ire6_u.ire6_src_addr #define ire_mask_v6 ire_u.ire6_u.ire6_mask #define ire_addr_v6 ire_u.ire6_u.ire6_addr #define ire_gateway_addr_v6 ire_u.ire6_u.ire6_gateway_addr -#define ire_cmask_v6 ire_u.ire6_u.ire6_cmask - -/* Convenient typedefs for sockaddrs */ -typedef struct sockaddr_in sin_t; -typedef struct sockaddr_in6 sin6_t; - -/* Address structure used for internal bind with IP */ -typedef struct ipa_conn_s { - ipaddr_t ac_laddr; - ipaddr_t ac_faddr; - uint16_t ac_fport; - uint16_t ac_lport; -} ipa_conn_t; - -typedef struct ipa6_conn_s { - in6_addr_t ac6_laddr; - in6_addr_t ac6_faddr; - uint16_t ac6_fport; - uint16_t ac6_lport; -} ipa6_conn_t; +#define ire_setsrc_addr_v6 ire_u.ire6_u.ire6_setsrc_addr /* - * Using ipa_conn_x_t or ipa6_conn_x_t allows us to modify the behavior of IP's - * bind handler. + * Values for ire_generation. + * + * If an IRE is marked with IRE_IS_CONDEMNED, the last walker of + * the bucket should delete this IRE from this bucket. 
+ *
+ * IRE_GENERATION_VERIFY is never stored in ire_generation but it is
+ * stored in places that cache IREs (such as ixa_ire_generation and
+ * ire_dep_parent_generation). It is used as a signal that the cache is
+ * stale and needs to be reverified.
 */
-typedef struct ipa_conn_extended_s {
- uint64_t acx_flags;
- ipa_conn_t acx_conn;
-} ipa_conn_x_t;
+#define IRE_GENERATION_CONDEMNED 0
+#define IRE_GENERATION_VERIFY 1
+#define IRE_GENERATION_INITIAL 2
+#define IRE_IS_CONDEMNED(ire) \
+ ((ire)->ire_generation == IRE_GENERATION_CONDEMNED)
-typedef struct ipa6_conn_extended_s {
- uint64_t ac6x_flags;
- ipa6_conn_t ac6x_conn;
-} ipa6_conn_x_t;
-
-/* flag values for ipa_conn_x_t and ipa6_conn_x_t. */
-#define ACX_VERIFY_DST 0x1ULL /* verify destination address is reachable */
+/* Convenient typedefs for sockaddrs */
+typedef struct sockaddr_in sin_t;
+typedef struct sockaddr_in6 sin6_t;
 /* Name/Value Descriptor. */
 typedef struct nv_s {
@@ -2784,110 +2696,83 @@ extern uint_t ip_max_frag_dups;
 * to support the needs of such tools and private definitions moved to
 * private headers.
 */
-struct ip6_pkt_s {
+struct ip_pkt_s {
 uint_t ipp_fields; /* Which fields are valid */
- uint_t ipp_sticky_ignored; /* sticky fields to ignore */
- uint_t ipp_ifindex; /* pktinfo ifindex */
 in6_addr_t ipp_addr; /* pktinfo src/dst addr */
- uint_t ipp_unicast_hops; /* IPV6_UNICAST_HOPS */
- uint_t ipp_multicast_hops; /* IPV6_MULTICAST_HOPS */
+#define ipp_addr_v4 V4_PART_OF_V6(ipp_addr)
+ uint_t ipp_unicast_hops; /* IPV6_UNICAST_HOPS, IP_TTL */
 uint_t ipp_hoplimit; /* IPV6_HOPLIMIT */
 uint_t ipp_hopoptslen;
- uint_t ipp_rtdstoptslen;
+ uint_t ipp_rthdrdstoptslen;
 uint_t ipp_rthdrlen;
 uint_t ipp_dstoptslen;
- uint_t ipp_pathmtulen;
 uint_t ipp_fraghdrlen;
 ip6_hbh_t *ipp_hopopts;
- ip6_dest_t *ipp_rtdstopts;
+ ip6_dest_t *ipp_rthdrdstopts;
 ip6_rthdr_t *ipp_rthdr;
 ip6_dest_t *ipp_dstopts;
 ip6_frag_t *ipp_fraghdr;
- struct ip6_mtuinfo *ipp_pathmtu;
- in6_addr_t ipp_nexthop; /* Transmit only */
- uint8_t ipp_tclass;
- int8_t ipp_use_min_mtu;
+ uint8_t ipp_tclass; /* IPV6_TCLASS */
+ uint8_t ipp_type_of_service; /* IP_TOS */
+ uint_t ipp_ipv4_options_len; /* Len of IPv4 options */
+ uint8_t *ipp_ipv4_options; /* Ptr to IPv4 options */
+ uint_t ipp_label_len_v4; /* Len of TX label for IPv4 */
+ uint8_t *ipp_label_v4; /* TX label for IPv4 */
+ uint_t ipp_label_len_v6; /* Len of TX label for IPv6 */
+ uint8_t *ipp_label_v6; /* TX label for IPv6 */
 };
-typedef struct ip6_pkt_s ip6_pkt_t;
-
-extern void ip6_pkt_free(ip6_pkt_t *); /* free storage inside ip6_pkt_t */
-
-/*
- * This struct is used by ULP_opt_set() functions to return value of IPv4
- * ancillary options. Currently this is only used by udp and icmp and only
- * IP_PKTINFO option is supported.
- */
-typedef struct ip4_pkt_s {
- uint_t ip4_ill_index; /* interface index */
- ipaddr_t ip4_addr; /* source address */
-} ip4_pkt_t;
-
-/*
- * Used by ULP's to pass options info to ip_output
- * currently only IP_PKTINFO is supported.
- */
-typedef struct ip_opt_info_s {
- uint_t ip_opt_ill_index;
- uint_t ip_opt_flags;
-} ip_opt_info_t;
-
-/*
- * value for ip_opt_flags
- */
-#define IP_VERIFY_SRC 0x1
+typedef struct ip_pkt_s ip_pkt_t;
-/*
- * This structure is used to convey information from IP and the ULP.
- * Currently used for the IP_RECVSLLA, IP_RECVIF and IP_RECVPKTINFO options.
- * The type of information field is set to IN_PKTINFO (i.e inbound pkt info)
- */
-typedef struct ip_pktinfo {
- uint32_t ip_pkt_ulp_type; /* type of info sent */
- uint32_t ip_pkt_flags; /* what is sent up by IP */
- uint32_t ip_pkt_ifindex; /* inbound interface index */
- struct sockaddr_dl ip_pkt_slla; /* has source link layer addr */
- struct in_addr ip_pkt_match_addr; /* matched address */
-} ip_pktinfo_t;
-
-/*
- * flags to tell UDP what IP is sending; in_pkt_flags
- */
-#define IPF_RECVIF 0x01 /* inbound interface index */
-#define IPF_RECVSLLA 0x02 /* source link layer address */
-/*
- * Inbound interface index + matched address.
- * Used only by IPV4.
- */
-#define IPF_RECVADDR 0x04
+extern void ip_pkt_free(ip_pkt_t *); /* free storage inside ip_pkt_t */
+extern ipaddr_t ip_pkt_source_route_v4(const ip_pkt_t *);
+extern in6_addr_t *ip_pkt_source_route_v6(const ip_pkt_t *);
+extern int ip_pkt_copy(ip_pkt_t *, ip_pkt_t *, int);
+extern void ip_pkt_source_route_reverse_v4(ip_pkt_t *);
 /* ipp_fields values */
-#define IPPF_IFINDEX 0x0001 /* Part of in6_pktinfo: ifindex */
-#define IPPF_ADDR 0x0002 /* Part of in6_pktinfo: src/dst addr */
-#define IPPF_SCOPE_ID 0x0004 /* Add xmit ip6i_t for sin6_scope_id */
-#define IPPF_NO_CKSUM 0x0008 /* Add xmit ip6i_t for IP6I_NO_*_CKSUM */
-
-#define IPPF_RAW_CKSUM 0x0010 /* Add xmit ip6i_t for IP6I_RAW_CHECKSUM */
-#define IPPF_HOPLIMIT 0x0020
-#define IPPF_HOPOPTS 0x0040
-#define IPPF_RTHDR 0x0080
-
-#define IPPF_RTDSTOPTS 0x0100
-#define IPPF_DSTOPTS 0x0200
-#define IPPF_NEXTHOP 0x0400
-#define IPPF_PATHMTU 0x0800
-
-#define IPPF_TCLASS 0x1000
-#define IPPF_DONTFRAG 0x2000
-#define IPPF_USE_MIN_MTU 0x04000
-#define IPPF_MULTICAST_HOPS 0x08000
-
-#define IPPF_UNICAST_HOPS 0x10000
-#define IPPF_FRAGHDR 0x20000
-
-#define IPPF_HAS_IP6I \
- (IPPF_IFINDEX|IPPF_ADDR|IPPF_NEXTHOP|IPPF_SCOPE_ID| \
- IPPF_NO_CKSUM|IPPF_RAW_CKSUM|IPPF_HOPLIMIT|IPPF_DONTFRAG| \
- IPPF_USE_MIN_MTU|IPPF_MULTICAST_HOPS|IPPF_UNICAST_HOPS)
+#define IPPF_ADDR 0x0001 /* Part of in6_pktinfo: src/dst addr */
+#define IPPF_HOPLIMIT 0x0002 /* Overrides unicast and multicast */
+#define IPPF_TCLASS 0x0004 /* Overrides class in sin6_flowinfo */
+
+#define IPPF_HOPOPTS 0x0010 /* ipp_hopopts set */
+#define IPPF_RTHDR 0x0020 /* ipp_rthdr set */
+#define IPPF_RTHDRDSTOPTS 0x0040 /* ipp_rthdrdstopts set */
+#define IPPF_DSTOPTS 0x0080 /* ipp_dstopts set */
+
+#define IPPF_IPV4_OPTIONS 0x0100 /* ipp_ipv4_options set */
+#define IPPF_LABEL_V4 0x0200 /* ipp_label_v4 set */
+#define IPPF_LABEL_V6 0x0400 /* ipp_label_v6 set */
+
+#define IPPF_FRAGHDR 0x0800 /* Used for IPsec receive side */
+
+/*
+ * Data structure which is passed to conn_opt_get/set.
+ * The conn_t is included even though it can be inferred from queue_t.
+ * setsockopt and getsockopt use conn_ixa and conn_xmit_ipp. However,
+ * when handling ancillary data we use separate ixa and ipps.
+ */
+typedef struct conn_opt_arg_s {
+ conn_t *coa_connp;
+ ip_xmit_attr_t *coa_ixa;
+ ip_pkt_t *coa_ipp;
+ boolean_t coa_ancillary; /* Ancillary data and not setsockopt */
+ uint_t coa_changed; /* See below */
+} conn_opt_arg_t;
+
+/*
+ * Flags for what changed.
+ * If we want to be more efficient in the future we can have more fine
+ * grained flags e.g., a flag for just IP_TOS changing.
+ * For now we either call ip_set_destination (for "route changed")
+ * and/or conn_build_hdr_template/conn_prepend_hdr (for "header changed").
+ */
+#define COA_HEADER_CHANGED 0x0001
+#define COA_ROUTE_CHANGED 0x0002
+#define COA_RCVBUF_CHANGED 0x0004 /* SO_RCVBUF */
+#define COA_SNDBUF_CHANGED 0x0008 /* SO_SNDBUF */
+#define COA_WROFF_CHANGED 0x0010 /* Header size changed */
+#define COA_ICMP_BIND_NEEDED 0x0020
+#define COA_OOBINLINE_CHANGED 0x0040
 #define TCP_PORTS_OFFSET 0
 #define UDP_PORTS_OFFSET 0
@@ -2902,32 +2787,21 @@ typedef struct ip_pktinfo {
 #define IPIF_LOOKUP_FAILED 2 /* Used as error code */
 #define ILL_CAN_LOOKUP(ill) \
- (!((ill)->ill_state_flags & (ILL_CONDEMNED | ILL_CHANGING)) || \
+ (!((ill)->ill_state_flags & ILL_CONDEMNED) || \
 IAM_WRITER_ILL(ill))
-#define ILL_CAN_WAIT(ill, q) \
- (((q) != NULL) && !((ill)->ill_state_flags & (ILL_CONDEMNED)))
+#define ILL_IS_CONDEMNED(ill) \
+ ((ill)->ill_state_flags & ILL_CONDEMNED)
 #define IPIF_CAN_LOOKUP(ipif) \
- (!((ipif)->ipif_state_flags & (IPIF_CONDEMNED | IPIF_CHANGING)) || \
+ (!((ipif)->ipif_state_flags & IPIF_CONDEMNED) || \
 IAM_WRITER_IPIF(ipif))
-/*
- * If the parameter 'q' is NULL, the caller is not interested in wait and
- * restart of the operation if the ILL or IPIF cannot be looked up when it is
- * marked as 'CHANGING'. Typically a thread that tries to send out data will
- * end up passing NULLs as the last 4 parameters to ill_lookup_on_ifindex and
- * in this case 'q' is NULL
- */
-#define IPIF_CAN_WAIT(ipif, q) \
- (((q) != NULL) && !((ipif)->ipif_state_flags & (IPIF_CONDEMNED)))
-
-#define IPIF_CAN_LOOKUP_WALKER(ipif) \
- (!((ipif)->ipif_state_flags & (IPIF_CONDEMNED)) || \
- IAM_WRITER_IPIF(ipif))
+#define IPIF_IS_CONDEMNED(ipif) \
+ ((ipif)->ipif_state_flags & IPIF_CONDEMNED)
-#define ILL_UNMARK_CHANGING(ill) \
- (ill)->ill_state_flags &= ~ILL_CHANGING;
+#define IPIF_IS_CHANGING(ipif) \
+ ((ipif)->ipif_state_flags & IPIF_CHANGING)
 /* Macros used to assert that this thread is a writer */
 #define IAM_WRITER_IPSQ(ipsq) ((ipsq)->ipsq_xop->ipx_writer == curthread)
@@ -2956,9 +2830,9 @@ typedef struct ip_pktinfo {
 #define RELEASE_ILL_LOCKS(ill_1, ill_2) \
 { \
 if (ill_1 != NULL) \
- mutex_exit(&(ill_1)->ill_lock); \
+ mutex_exit(&(ill_1)->ill_lock); \
 if (ill_2 != NULL && ill_2 != ill_1) \
- mutex_exit(&(ill_2)->ill_lock); \
+ mutex_exit(&(ill_2)->ill_lock); \
 }
 /* Get the other protocol instance ill */
@@ -2975,20 +2849,13 @@ typedef struct cmd_info_s
 struct lifreq *ci_lifr; /* the lifreq struct passed down */
 } cmd_info_t;
-/*
- * List of AH and ESP IPsec acceleration capable ills
- */
-typedef struct ipsec_capab_ill_s {
- uint_t ill_index;
- boolean_t ill_isv6;
- struct ipsec_capab_ill_s *next;
-} ipsec_capab_ill_t;
-
 extern struct kmem_cache *ire_cache;
 extern ipaddr_t ip_g_all_ones;
-extern uint_t ip_loopback_mtu; /* /etc/system */
+extern uint_t ip_loopback_mtu; /* /etc/system */
+extern uint_t ip_loopback_mtuplus;
+extern uint_t ip_loopback_mtu_v6plus;
 extern vmem_t *ip_minor_arena_sa;
 extern vmem_t *ip_minor_arena_la;
@@ -3014,18 +2881,18 @@ extern vmem_t *ip_minor_arena_la;
 #define ips_ip_g_send_redirects ips_param_arr[5].ip_param_value
 #define ips_ip_g_forward_directed_bcast ips_param_arr[6].ip_param_value
 #define ips_ip_mrtdebug ips_param_arr[7].ip_param_value
-#define ips_ip_timer_interval ips_param_arr[8].ip_param_value
-#define ips_ip_ire_arp_interval ips_param_arr[9].ip_param_value
-#define ips_ip_ire_redir_interval ips_param_arr[10].ip_param_value
+#define ips_ip_ire_reclaim_fraction ips_param_arr[8].ip_param_value
+#define ips_ip_nce_reclaim_fraction ips_param_arr[9].ip_param_value
+#define ips_ip_dce_reclaim_fraction ips_param_arr[10].ip_param_value
 #define ips_ip_def_ttl ips_param_arr[11].ip_param_value
 #define ips_ip_forward_src_routed ips_param_arr[12].ip_param_value
 #define ips_ip_wroff_extra ips_param_arr[13].ip_param_value
-#define ips_ip_ire_pathmtu_interval ips_param_arr[14].ip_param_value
+#define ips_ip_pathmtu_interval ips_param_arr[14].ip_param_value
 #define ips_ip_icmp_return ips_param_arr[15].ip_param_value
 #define ips_ip_path_mtu_discovery ips_param_arr[16].ip_param_value
-#define ips_ip_ignore_delete_time ips_param_arr[17].ip_param_value
+#define ips_ip_pmtu_min ips_param_arr[17].ip_param_value
 #define ips_ip_ignore_redirect ips_param_arr[18].ip_param_value
-#define ips_ip_output_queue ips_param_arr[19].ip_param_value
+#define ips_ip_arp_icmp_error ips_param_arr[19].ip_param_value
 #define ips_ip_broadcast_ttl ips_param_arr[20].ip_param_value
 #define ips_ip_icmp_err_interval ips_param_arr[21].ip_param_value
 #define ips_ip_icmp_err_burst ips_param_arr[22].ip_param_value
@@ -3046,7 +2913,7 @@ extern vmem_t *ip_minor_arena_la;
 #define ips_ipv6_send_redirects ips_param_arr[35].ip_param_value
 #define ips_ipv6_ignore_redirect ips_param_arr[36].ip_param_value
 #define ips_ipv6_strict_dst_multihoming ips_param_arr[37].ip_param_value
-#define ips_ip_ire_reclaim_fraction ips_param_arr[38].ip_param_value
+#define ips_src_check ips_param_arr[38].ip_param_value
 #define ips_ipsec_policy_log_interval ips_param_arr[39].ip_param_value
 #define ips_pim_accept_clear_messages ips_param_arr[40].ip_param_value
 #define ips_ip_ndp_unsolicit_interval ips_param_arr[41].ip_param_value
@@ -3055,21 +2922,37 @@ extern vmem_t *ip_minor_arena_la;
 /* Misc IP configuration knobs */
 #define ips_ip_policy_mask ips_param_arr[44].ip_param_value
-#define ips_ip_multirt_resolution_interval ips_param_arr[45].ip_param_value
+#define ips_ip_ecmp_behavior ips_param_arr[45].ip_param_value
 #define ips_ip_multirt_ttl ips_param_arr[46].ip_param_value
-#define ips_ip_multidata_outbound ips_param_arr[47].ip_param_value
-#define ips_ip_ndp_defense_interval ips_param_arr[48].ip_param_value
-#define ips_ip_max_temp_idle ips_param_arr[49].ip_param_value
-#define ips_ip_max_temp_defend ips_param_arr[50].ip_param_value
-#define ips_ip_max_defend ips_param_arr[51].ip_param_value
-#define ips_ip_defend_interval ips_param_arr[52].ip_param_value
-#define ips_ip_dup_recovery ips_param_arr[53].ip_param_value
-#define ips_ip_restrict_interzone_loopback ips_param_arr[54].ip_param_value
-#define ips_ip_lso_outbound ips_param_arr[55].ip_param_value
-#define ips_igmp_max_version ips_param_arr[56].ip_param_value
-#define ips_mld_max_version ips_param_arr[57].ip_param_value
-#define ips_ip_pmtu_min ips_param_arr[58].ip_param_value
-#define ips_ipv6_drop_inbound_icmpv6 ips_param_arr[59].ip_param_value
+#define ips_ip_ire_badcnt_lifetime ips_param_arr[47].ip_param_value
+#define ips_ip_max_temp_idle ips_param_arr[48].ip_param_value
+#define ips_ip_max_temp_defend ips_param_arr[49].ip_param_value
+#define ips_ip_max_defend ips_param_arr[50].ip_param_value
+#define ips_ip_defend_interval ips_param_arr[51].ip_param_value
+#define ips_ip_dup_recovery ips_param_arr[52].ip_param_value
+#define ips_ip_restrict_interzone_loopback ips_param_arr[53].ip_param_value
+#define ips_ip_lso_outbound ips_param_arr[54].ip_param_value
+#define ips_igmp_max_version ips_param_arr[55].ip_param_value
+#define ips_mld_max_version ips_param_arr[56].ip_param_value
+#define ips_ipv6_drop_inbound_icmpv6 ips_param_arr[57].ip_param_value
+#define ips_arp_probe_delay ips_param_arr[58].ip_param_value
+#define ips_arp_fastprobe_delay ips_param_arr[59].ip_param_value
+#define ips_arp_probe_interval ips_param_arr[60].ip_param_value
+#define ips_arp_fastprobe_interval ips_param_arr[61].ip_param_value
+#define ips_arp_probe_count ips_param_arr[62].ip_param_value
+#define ips_arp_fastprobe_count ips_param_arr[63].ip_param_value
+#define ips_ipv4_dad_announce_interval ips_param_arr[64].ip_param_value
+#define ips_ipv6_dad_announce_interval ips_param_arr[65].ip_param_value
+#define ips_arp_defend_interval ips_param_arr[66].ip_param_value
+#define ips_arp_defend_rate ips_param_arr[67].ip_param_value
+#define ips_ndp_defend_interval ips_param_arr[68].ip_param_value
+#define ips_ndp_defend_rate ips_param_arr[69].ip_param_value
+#define ips_arp_defend_period ips_param_arr[70].ip_param_value
+#define ips_ndp_defend_period ips_param_arr[71].ip_param_value
+#define ips_ipv4_icmp_return_pmtu ips_param_arr[72].ip_param_value
+#define ips_ipv6_icmp_return_pmtu ips_param_arr[73].ip_param_value
+#define ips_ip_arp_publish_count ips_param_arr[74].ip_param_value
+#define ips_ip_arp_publish_interval ips_param_arr[75].ip_param_value
 extern int dohwcksum; /* use h/w cksum if supported by the h/w */
 #ifdef ZC_TEST
@@ -3102,13 +2985,13 @@ extern struct module_info ip_mod_info;
 ((ipst)->ips_ip4_loopback_out_event.he_interested)
 #define HOOKS6_INTERESTED_LOOPBACK_OUT(ipst) \
 ((ipst)->ips_ip6_loopback_out_event.he_interested)
-
 /*
- * Hooks macros used inside of ip
+ * Hooks marcos used inside of ip
+ * The callers use the above INTERESTED macros first, hence
+ * the he_interested check is superflous.
 */
-#define FW_HOOKS(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst) \
- \
- if ((_hook).he_interested) { \
+#define FW_HOOKS(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst, _err) \
+ if ((_hook).he_interested) { \
 hook_pkt_event_t info; \
 \
 _NOTE(CONSTCOND) \
@@ -3121,12 +3004,15 @@ extern struct module_info ip_mod_info;
 info.hpe_mp = &(_fm); \
 info.hpe_mb = _m; \
 info.hpe_flags = _llm; \
- if (hook_run(ipst->ips_ipv4_net_data->netd_hooks, \
- _event, (hook_data_t)&info) != 0) { \
+ _err = hook_run(ipst->ips_ipv4_net_data->netd_hooks, \
+ _event, (hook_data_t)&info); \
+ if (_err != 0) { \
 ip2dbg(("%s hook dropped mblk chain %p hdr %p\n",\
 (_hook).he_name, (void *)_fm, (void *)_m)); \
- freemsg(_fm); \
- _fm = NULL; \
+ if (_fm != NULL) { \
+ freemsg(_fm); \
+ _fm = NULL; \
+ } \
 _iph = NULL; \
 _m = NULL; \
 } else { \
@@ -3135,9 +3021,8 @@ extern struct module_info ip_mod_info;
 } \
 }
-#define FW_HOOKS6(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst) \
- \
- if ((_hook).he_interested) { \
+#define FW_HOOKS6(_hook, _event, _ilp, _olp, _iph, _fm, _m, _llm, ipst, _err) \
+ if ((_hook).he_interested) { \
 hook_pkt_event_t info; \
 \
 _NOTE(CONSTCOND) \
@@ -3150,12 +3035,15 @@ extern struct module_info ip_mod_info;
 info.hpe_mp = &(_fm); \
 info.hpe_mb = _m; \
 info.hpe_flags = _llm; \
- if (hook_run(ipst->ips_ipv6_net_data->netd_hooks, \
- _event, (hook_data_t)&info) != 0) { \
+ _err = hook_run(ipst->ips_ipv6_net_data->netd_hooks, \
+ _event, (hook_data_t)&info); \
+ if (_err != 0) { \
 ip2dbg(("%s hook dropped mblk chain %p hdr %p\n",\
 (_hook).he_name, (void *)_fm, (void *)_m)); \
- freemsg(_fm); \
- _fm = NULL; \
+ if (_fm != NULL) { \
+ freemsg(_fm); \
+ _fm = NULL; \
+ } \
 _iph = NULL; \
 _m = NULL; \
 } else { \
@@ -3194,24 +3082,6 @@ extern struct module_info ip_mod_info;
 #define IP_LOOPBACK_ADDR(addr) \
 (((addr) & N_IN_CLASSA_NET == N_IN_LOOPBACK_NET))
-#ifdef DEBUG
-/* IPsec HW acceleration debugging support */
-
-#define IPSECHW_CAPAB 0x0001 /* capability negotiation */
-#define IPSECHW_SADB 0x0002 /* SADB exchange */
-#define IPSECHW_PKT 0x0004 /* general packet flow */
-#define IPSECHW_PKTIN 0x0008 /* driver in pkt processing details */
-#define IPSECHW_PKTOUT 0x0010 /* driver out pkt processing details */
-
-#define IPSECHW_DEBUG(f, x) if (ipsechw_debug & (f)) { (void) printf x; }
-#define IPSECHW_CALL(f, r, x) if (ipsechw_debug & (f)) { (void) r x; }
-
-extern uint32_t ipsechw_debug;
-#else
-#define IPSECHW_DEBUG(f, x) {}
-#define IPSECHW_CALL(f, r, x) {}
-#endif
-
 extern int ip_debug;
 extern uint_t ip_thread_data;
 extern krwlock_t ip_thread_rwlock;
@@ -3235,8 +3105,6 @@ extern list_t ip_thread_list;
 /* Default MAC-layer address string length for mac_colon_addr */
 #define MAC_STR_LEN 128
-struct ipsec_out_s;
-
 struct mac_header_info_s;
 extern void ill_frag_timer(void *);
@@ -3252,86 +3120,173 @@ extern char *ip_dot_addr(ipaddr_t, char *);
 extern const char *mac_colon_addr(const uint8_t *, size_t, char *, size_t);
 extern void ip_lwput(queue_t *, mblk_t *);
 extern boolean_t icmp_err_rate_limit(ip_stack_t *);
-extern void icmp_time_exceeded(queue_t *, mblk_t *, uint8_t, zoneid_t,
- ip_stack_t *);
-extern void icmp_unreachable(queue_t *, mblk_t *, uint8_t, zoneid_t,
- ip_stack_t *);
-extern mblk_t *ip_add_info(mblk_t *, ill_t *, uint_t, zoneid_t, ip_stack_t *);
-cred_t *ip_best_cred(mblk_t *, conn_t *, pid_t *);
-extern mblk_t *ip_bind_v4(queue_t *, mblk_t *, conn_t *);
-extern boolean_t ip_bind_ipsec_policy_set(conn_t *, mblk_t *);
-extern int ip_bind_laddr_v4(conn_t *, mblk_t **, uint8_t, ipaddr_t,
- uint16_t, boolean_t);
-extern int ip_proto_bind_laddr_v4(conn_t *, mblk_t **, uint8_t, ipaddr_t,
- uint16_t, boolean_t);
-extern int ip_proto_bind_connected_v4(conn_t *, mblk_t **,
- uint8_t, ipaddr_t *, uint16_t, ipaddr_t, uint16_t, boolean_t, boolean_t,
- cred_t *);
-extern int ip_bind_connected_v4(conn_t *, mblk_t **, uint8_t, ipaddr_t *,
- uint16_t, ipaddr_t, uint16_t, boolean_t, boolean_t, cred_t *);
+extern void icmp_frag_needed(mblk_t *, int, ip_recv_attr_t *);
+extern mblk_t *icmp_inbound_v4(mblk_t *, ip_recv_attr_t *);
+extern void icmp_time_exceeded(mblk_t *, uint8_t, ip_recv_attr_t *);
+extern void icmp_unreachable(mblk_t *, uint8_t, ip_recv_attr_t *);
+extern boolean_t ip_ipsec_policy_inherit(conn_t *, conn_t *, ip_recv_attr_t *);
+extern void *ip_pullup(mblk_t *, ssize_t, ip_recv_attr_t *);
+extern void ip_setl2src(mblk_t *, ip_recv_attr_t *, ill_t *);
+extern mblk_t *ip_check_and_align_header(mblk_t *, uint_t, ip_recv_attr_t *);
+extern mblk_t *ip_check_length(mblk_t *, uchar_t *, ssize_t, uint_t, uint_t,
+ ip_recv_attr_t *);
+extern mblk_t *ip_check_optlen(mblk_t *, ipha_t *, uint_t, uint_t,
+ ip_recv_attr_t *);
+extern mblk_t *ip_fix_dbref(mblk_t *, ip_recv_attr_t *);
 extern uint_t ip_cksum(mblk_t *, int, uint32_t);
 extern int ip_close(queue_t *, int);
 extern uint16_t ip_csum_hdr(ipha_t *);
-extern void ip_proto_not_sup(queue_t *, mblk_t *, uint_t, zoneid_t,
- ip_stack_t *);
+extern void ip_forward_xmit_v4(nce_t *, ill_t *, mblk_t *, ipha_t *,
+ ip_recv_attr_t *, uint32_t, uint32_t);
+extern boolean_t ip_forward_options(mblk_t *, ipha_t *, ill_t *,
+ ip_recv_attr_t *);
+extern int ip_fragment_v4(mblk_t *, nce_t *, iaflags_t, uint_t, uint32_t,
+ uint32_t, zoneid_t, zoneid_t, pfirepostfrag_t postfragfn,
+ uintptr_t *cookie);
+extern void ip_proto_not_sup(mblk_t *, ip_recv_attr_t *);
 extern void ip_ire_g_fini(void);
 extern void ip_ire_g_init(void);
 extern void ip_ire_fini(ip_stack_t *);
 extern void ip_ire_init(ip_stack_t *);
+extern void ip_mdata_to_mhi(ill_t *, mblk_t *, struct mac_header_info_s *);
 extern int ip_openv4(queue_t *q, dev_t *devp, int flag, int sflag,
 cred_t *credp);
 extern int ip_openv6(queue_t *q, dev_t *devp, int flag, int sflag,
 cred_t *credp);
 extern int ip_reassemble(mblk_t *, ipf_t *, uint_t, boolean_t, ill_t *,
 size_t);
-extern int ip_opt_set_ill(conn_t *, int, boolean_t, boolean_t,
- int, int, mblk_t *);
 extern void ip_rput(queue_t *, mblk_t *);
 extern void ip_input(ill_t *, ill_rx_ring_t *, mblk_t *,
 struct mac_header_info_s *);
+extern void ip_input_v6(ill_t *, ill_rx_ring_t *, mblk_t *,
+ struct mac_header_info_s *);
+extern mblk_t *ip_input_common_v4(ill_t *, ill_rx_ring_t *, mblk_t *,
+ struct mac_header_info_s *, squeue_t *, mblk_t **, uint_t *);
+extern mblk_t *ip_input_common_v6(ill_t *, ill_rx_ring_t *, mblk_t *,
+ struct mac_header_info_s *, squeue_t *, mblk_t **, uint_t *);
+extern void ill_input_full_v4(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern void ill_input_short_v4(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern void ill_input_full_v6(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern void ill_input_short_v6(mblk_t *, void *, void *,
+ ip_recv_attr_t *, rtc_t *);
+extern ipaddr_t ip_input_options(ipha_t *, ipaddr_t, mblk_t *,
+ ip_recv_attr_t *, int *);
+extern boolean_t ip_input_local_options(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern mblk_t *ip_input_fragment(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern mblk_t *ip_input_fragment_v6(mblk_t *, ip6_t *, ip6_frag_t *, uint_t,
+ ip_recv_attr_t *);
+extern void ip_input_post_ipsec(mblk_t *, ip_recv_attr_t *);
+extern void ip_fanout_v4(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern void ip_fanout_v6(mblk_t *, ip6_t *, ip_recv_attr_t *);
+extern void ip_fanout_proto_conn(conn_t *, mblk_t *, ipha_t *, ip6_t *,
+ ip_recv_attr_t *);
+extern void ip_fanout_proto_v4(mblk_t *, ipha_t *, ip_recv_attr_t *);
+extern void ip_fanout_send_icmp_v4(mblk_t *, uint_t, uint_t,
+ ip_recv_attr_t *);
+extern void ip_fanout_udp_conn(conn_t *, mblk_t *, ipha_t *, ip6_t *,
+ ip_recv_attr_t *);
+extern void ip_fanout_udp_multi_v4(mblk_t *, ipha_t *, uint16_t, uint16_t,
+ ip_recv_attr_t *);
+extern mblk_t *zero_spi_check(mblk_t *, ip_recv_attr_t *);
+extern void ip_build_hdrs_v4(uchar_t *, uint_t, const ip_pkt_t *, uint8_t);
+extern int ip_find_hdr_v4(ipha_t *, ip_pkt_t *, boolean_t);
+extern int ip_total_hdrs_len_v4(const ip_pkt_t *); + extern mblk_t *ip_accept_tcp(ill_t *, ill_rx_ring_t *, squeue_t *, mblk_t *, mblk_t **, uint_t *cnt); -extern void ip_rput_dlpi(queue_t *, mblk_t *); -extern void ip_rput_forward(ire_t *, ipha_t *, mblk_t *, ill_t *); -extern void ip_rput_forward_multicast(ipaddr_t, mblk_t *, ipif_t *); +extern void ip_rput_dlpi(ill_t *, mblk_t *); +extern void ip_rput_notdata(ill_t *, mblk_t *); extern void ip_mib2_add_ip_stats(mib2_ipIfStatsEntry_t *, mib2_ipIfStatsEntry_t *); extern void ip_mib2_add_icmp6_stats(mib2_ipv6IfIcmpEntry_t *, mib2_ipv6IfIcmpEntry_t *); -extern void ip_udp_input(queue_t *, mblk_t *, ipha_t *, ire_t *, ill_t *); -extern void ip_proto_input(queue_t *, mblk_t *, ipha_t *, ire_t *, ill_t *, - uint32_t); extern void ip_rput_other(ipsq_t *, queue_t *, mblk_t *, void *); extern ire_t *ip_check_multihome(void *, ire_t *, ill_t *); -extern void ip_setpktversion(conn_t *, boolean_t, boolean_t, ip_stack_t *); -extern void ip_trash_ire_reclaim(void *); -extern void ip_trash_timer_expire(void *); -extern void ip_wput(queue_t *, mblk_t *); -extern void ip_output(void *, mblk_t *, void *, int); -extern void ip_output_options(void *, mblk_t *, void *, int, - ip_opt_info_t *); - -extern void ip_wput_ire(queue_t *, mblk_t *, ire_t *, conn_t *, int, - zoneid_t); -extern void ip_wput_local(queue_t *, ill_t *, ipha_t *, mblk_t *, ire_t *, - int, zoneid_t); -extern void ip_wput_multicast(queue_t *, mblk_t *, ipif_t *, zoneid_t); -extern void ip_wput_nondata(ipsq_t *, queue_t *, mblk_t *, void *); +extern void ip_send_potential_redirect_v4(mblk_t *, ipha_t *, ire_t *, + ip_recv_attr_t *); +extern int ip_set_destination_v4(ipaddr_t *, ipaddr_t, ipaddr_t, + ip_xmit_attr_t *, iulp_t *, uint32_t, uint_t); +extern int ip_set_destination_v6(in6_addr_t *, const in6_addr_t *, + const in6_addr_t *, ip_xmit_attr_t *, iulp_t *, uint32_t, uint_t); + +extern int ip_output_simple(mblk_t *, ip_xmit_attr_t *); +extern int 
ip_output_simple_v4(mblk_t *, ip_xmit_attr_t *); +extern int ip_output_simple_v6(mblk_t *, ip_xmit_attr_t *); +extern int ip_output_options(mblk_t *, ipha_t *, ip_xmit_attr_t *, + ill_t *); +extern void ip_output_local_options(ipha_t *, ip_stack_t *); + +extern ip_xmit_attr_t *conn_get_ixa(conn_t *, boolean_t); +extern ip_xmit_attr_t *conn_get_ixa_tryhard(conn_t *, boolean_t); +extern ip_xmit_attr_t *conn_replace_ixa(conn_t *, ip_xmit_attr_t *); +extern ip_xmit_attr_t *conn_get_ixa_exclusive(conn_t *); +extern ip_xmit_attr_t *ip_xmit_attr_duplicate(ip_xmit_attr_t *); +extern void ip_xmit_attr_replace_tsl(ip_xmit_attr_t *, ts_label_t *); +extern void ip_xmit_attr_restore_tsl(ip_xmit_attr_t *, cred_t *); +boolean_t ip_recv_attr_replace_label(ip_recv_attr_t *, ts_label_t *); +extern void ixa_inactive(ip_xmit_attr_t *); +extern void ixa_refrele(ip_xmit_attr_t *); +extern boolean_t ixa_check_drain_insert(conn_t *, ip_xmit_attr_t *); +extern void ixa_cleanup(ip_xmit_attr_t *); +extern void ira_cleanup(ip_recv_attr_t *, boolean_t); +extern void ixa_safe_copy(ip_xmit_attr_t *, ip_xmit_attr_t *); + +extern int conn_ip_output(mblk_t *, ip_xmit_attr_t *); +extern boolean_t ip_output_verify_local(ip_xmit_attr_t *); +extern mblk_t *ip_output_process_local(mblk_t *, ip_xmit_attr_t *, boolean_t, + boolean_t, conn_t *); + +extern int conn_opt_get(conn_opt_arg_t *, t_scalar_t, t_scalar_t, + uchar_t *); +extern int conn_opt_set(conn_opt_arg_t *, t_scalar_t, t_scalar_t, uint_t, + uchar_t *, boolean_t, cred_t *); +extern boolean_t conn_same_as_last_v4(conn_t *, sin_t *); +extern boolean_t conn_same_as_last_v6(conn_t *, sin6_t *); +extern int conn_update_label(const conn_t *, const ip_xmit_attr_t *, + const in6_addr_t *, ip_pkt_t *); + +extern int ip_opt_set_multicast_group(conn_t *, t_scalar_t, + uchar_t *, boolean_t, boolean_t); +extern int ip_opt_set_multicast_sources(conn_t *, t_scalar_t, + uchar_t *, boolean_t, boolean_t); +extern int conn_getsockname(conn_t *, struct sockaddr *, 
uint_t *); +extern int conn_getpeername(conn_t *, struct sockaddr *, uint_t *); + +extern int conn_build_hdr_template(conn_t *, uint_t, uint_t, + const in6_addr_t *, const in6_addr_t *, uint32_t); +extern mblk_t *conn_prepend_hdr(ip_xmit_attr_t *, const ip_pkt_t *, + const in6_addr_t *, const in6_addr_t *, uint8_t, uint32_t, uint_t, + mblk_t *, uint_t, uint_t, uint32_t *, int *); +extern void ip_attr_newdst(ip_xmit_attr_t *); +extern void ip_attr_nexthop(const ip_pkt_t *, const ip_xmit_attr_t *, + const in6_addr_t *, in6_addr_t *); +extern int conn_connect(conn_t *, iulp_t *, uint32_t); +extern int ip_attr_connect(const conn_t *, ip_xmit_attr_t *, + const in6_addr_t *, const in6_addr_t *, const in6_addr_t *, in_port_t, + in6_addr_t *, iulp_t *, uint32_t); +extern int conn_inherit_parent(conn_t *, conn_t *); + +extern void conn_ixa_cleanup(conn_t *connp, void *arg); + +extern boolean_t conn_wantpacket(conn_t *, ip_recv_attr_t *, ipha_t *); +extern uint_t ip_type_v4(ipaddr_t, ip_stack_t *); +extern uint_t ip_type_v6(const in6_addr_t *, ip_stack_t *); + +extern void ip_wput_nondata(queue_t *, mblk_t *); extern void ip_wsrv(queue_t *); extern char *ip_nv_lookup(nv_t *, int); extern boolean_t ip_local_addr_ok_v6(const in6_addr_t *, const in6_addr_t *); extern boolean_t ip_remote_addr_ok_v6(const in6_addr_t *, const in6_addr_t *); extern ipaddr_t ip_massage_options(ipha_t *, netstack_t *); extern ipaddr_t ip_net_mask(ipaddr_t); -extern void ip_newroute(queue_t *, mblk_t *, ipaddr_t, conn_t *, zoneid_t, - ip_stack_t *); -extern ipxmit_state_t ip_xmit_v4(mblk_t *, ire_t *, struct ipsec_out_s *, - boolean_t, conn_t *); -extern int ip_hdr_complete(ipha_t *, zoneid_t, ip_stack_t *); +extern void arp_bringup_done(ill_t *, int); +extern void arp_replumb_done(ill_t *, int); extern struct qinit iprinitv6; -extern struct qinit ipwinitv6; extern void ipmp_init(ip_stack_t *); extern void ipmp_destroy(ip_stack_t *); @@ -3347,12 +3302,11 @@ extern ill_t 
*ipmp_illgrp_add_ipif(ipmp_illgrp_t *, ipif_t *); extern void ipmp_illgrp_del_ipif(ipmp_illgrp_t *, ipif_t *); extern ill_t *ipmp_illgrp_next_ill(ipmp_illgrp_t *); extern ill_t *ipmp_illgrp_hold_next_ill(ipmp_illgrp_t *); -extern ill_t *ipmp_illgrp_cast_ill(ipmp_illgrp_t *); extern ill_t *ipmp_illgrp_hold_cast_ill(ipmp_illgrp_t *); extern ill_t *ipmp_illgrp_ipmp_ill(ipmp_illgrp_t *); extern void ipmp_illgrp_refresh_mtu(ipmp_illgrp_t *); -extern ipmp_arpent_t *ipmp_illgrp_create_arpent(ipmp_illgrp_t *, mblk_t *, - boolean_t); +extern ipmp_arpent_t *ipmp_illgrp_create_arpent(ipmp_illgrp_t *, + boolean_t, ipaddr_t, uchar_t *, size_t, uint16_t); extern void ipmp_illgrp_destroy_arpent(ipmp_illgrp_t *, ipmp_arpent_t *); extern ipmp_arpent_t *ipmp_illgrp_lookup_arpent(ipmp_illgrp_t *, ipaddr_t *); extern void ipmp_illgrp_refresh_arpent(ipmp_illgrp_t *); @@ -3373,19 +3327,25 @@ extern ill_t *ipmp_ipif_bound_ill(const ipif_t *); extern ill_t *ipmp_ipif_hold_bound_ill(const ipif_t *); extern boolean_t ipmp_ipif_is_dataaddr(const ipif_t *); extern boolean_t ipmp_ipif_is_stubaddr(const ipif_t *); +extern boolean_t ipmp_packet_is_probe(mblk_t *, ill_t *); +extern ill_t *ipmp_ill_get_xmit_ill(ill_t *, boolean_t); +extern void ipmp_ncec_flush_nce(ncec_t *); +extern void ipmp_ncec_fastpath(ncec_t *, ill_t *); extern void conn_drain_insert(conn_t *, idl_tx_list_t *); +extern void conn_setqfull(conn_t *, boolean_t *); +extern void conn_clrqfull(conn_t *, boolean_t *); extern int conn_ipsec_length(conn_t *); -extern void ip_wput_ipsec_out(queue_t *, mblk_t *, ipha_t *, ill_t *, - ire_t *); extern ipaddr_t ip_get_dst(ipha_t *); -extern int ipsec_out_extra_length(mblk_t *); -extern int ipsec_in_extra_length(mblk_t *); -extern mblk_t *ipsec_in_alloc(boolean_t, netstack_t *); -extern boolean_t ipsec_in_is_secure(mblk_t *); -extern void ipsec_out_process(queue_t *, mblk_t *, ire_t *, uint_t); -extern void ipsec_out_to_in(mblk_t *); -extern void ip_fanout_proto_again(mblk_t *, ill_t *, 
ill_t *, ire_t *); +extern uint_t ip_get_pmtu(ip_xmit_attr_t *); +extern uint_t ip_get_base_mtu(ill_t *, ire_t *); +extern mblk_t *ip_output_attach_policy(mblk_t *, ipha_t *, ip6_t *, + const conn_t *, ip_xmit_attr_t *); +extern int ipsec_out_extra_length(ip_xmit_attr_t *); +extern int ipsec_out_process(mblk_t *, ip_xmit_attr_t *); +extern int ip_output_post_ipsec(mblk_t *, ip_xmit_attr_t *); +extern void ipsec_out_to_in(ip_xmit_attr_t *, ill_t *ill, + ip_recv_attr_t *); extern void ire_cleanup(ire_t *); extern void ire_inactive(ire_t *); @@ -3407,14 +3367,13 @@ extern uint_t ip_srcid_find_addr(const in6_addr_t *, zoneid_t, netstack_t *); extern uint8_t ipoptp_next(ipoptp_t *); extern uint8_t ipoptp_first(ipoptp_t *, ipha_t *); -extern int ip_opt_get_user(const ipha_t *, uchar_t *); +extern int ip_opt_get_user(conn_t *, uchar_t *); extern int ipsec_req_from_conn(conn_t *, ipsec_req_t *, int); extern int ip_snmp_get(queue_t *q, mblk_t *mctl, int level); extern int ip_snmp_set(queue_t *q, int, int, uchar_t *, int); extern void ip_process_ioctl(ipsq_t *, queue_t *, mblk_t *, void *); extern void ip_quiesce_conn(conn_t *); extern void ip_reprocess_ioctl(ipsq_t *, queue_t *, mblk_t *, void *); -extern void ip_restart_optmgmt(ipsq_t *, queue_t *, mblk_t *, void *); extern void ip_ioctl_finish(queue_t *, mblk_t *, int, int, ipsq_t *); extern boolean_t ip_cmpbuf(const void *, uint_t, boolean_t, const void *, @@ -3425,32 +3384,36 @@ extern void ip_savebuf(void **, uint_t *, boolean_t, const void *, uint_t); extern boolean_t ipsq_pending_mp_cleanup(ill_t *, conn_t *); extern void conn_ioctl_cleanup(conn_t *); -extern ill_t *conn_get_held_ill(conn_t *, ill_t **, int *); - -struct tcp_stack; -extern void ip_xmit_reset_serialize(mblk_t *, int, zoneid_t, struct tcp_stack *, - conn_t *); - -struct multidata_s; -struct pdesc_s; - -extern mblk_t *ip_mdinfo_alloc(ill_mdt_capab_t *); -extern mblk_t *ip_mdinfo_return(ire_t *, conn_t *, char *, ill_mdt_capab_t *); -extern mblk_t 
*ip_lsoinfo_alloc(ill_lso_capab_t *);
-extern mblk_t *ip_lsoinfo_return(ire_t *, conn_t *, char *,
-    ill_lso_capab_t *);
-extern uint_t ip_md_cksum(struct pdesc_s *, int, uint_t);
-extern boolean_t ip_md_addr_attr(struct multidata_s *, struct pdesc_s *,
-    const mblk_t *);
-extern boolean_t ip_md_hcksum_attr(struct multidata_s *, struct pdesc_s *,
-    uint32_t, uint32_t, uint32_t, uint32_t);
-extern boolean_t ip_md_zcopy_attr(struct multidata_s *, struct pdesc_s *,
-    uint_t);
+
 extern void ip_unbind(conn_t *);
 extern void tnet_init(void);
 extern void tnet_fini(void);
+/*
+ * Hook functions to enable cluster networking
+ * On non-clustered systems these vectors must always be NULL.
+ */
+extern int (*cl_inet_isclusterwide)(netstackid_t stack_id, uint8_t protocol,
+    sa_family_t addr_family, uint8_t *laddrp, void *args);
+extern uint32_t (*cl_inet_ipident)(netstackid_t stack_id, uint8_t protocol,
+    sa_family_t addr_family, uint8_t *laddrp, uint8_t *faddrp,
+    void *args);
+extern int (*cl_inet_connect2)(netstackid_t stack_id, uint8_t protocol,
+    boolean_t is_outgoing, sa_family_t addr_family, uint8_t *laddrp,
+    in_port_t lport, uint8_t *faddrp, in_port_t fport, void *args);
+extern void (*cl_inet_getspi)(netstackid_t, uint8_t, uint8_t *, size_t,
+    void *);
+extern void (*cl_inet_getspi)(netstackid_t stack_id, uint8_t protocol,
+    uint8_t *ptr, size_t len, void *args);
+extern int (*cl_inet_checkspi)(netstackid_t stack_id, uint8_t protocol,
+    uint32_t spi, void *args);
+extern void (*cl_inet_deletespi)(netstackid_t stack_id, uint8_t protocol,
+    uint32_t spi, void *args);
+extern void (*cl_inet_idlesa)(netstackid_t, uint8_t, uint32_t,
+    sa_family_t, in6_addr_t, in6_addr_t, void *);
+
+
 /* Hooks for CGTP (multirt routes) filtering module */
 #define CGTP_FILTER_REV_1 1
 #define CGTP_FILTER_REV_2 2
@@ -3491,73 +3454,6 @@ extern int ip_cgtp_filter_register(netstackid_t, cgtp_filter_ops_t *);
 extern int ip_cgtp_filter_unregister(netstackid_t);
 extern int
ip_cgtp_filter_is_registered(netstackid_t);
-/* Flags for ire_multirt_lookup() */
-
-#define MULTIRT_USESTAMP 0x0001
-#define MULTIRT_SETSTAMP 0x0002
-#define MULTIRT_CACHEGW 0x0004
-
-/* Debug stuff for multirt route resolution. */
-#if defined(DEBUG) && !defined(__lint)
-/* Our "don't send, rather drop" flag. */
-#define MULTIRT_DEBUG_FLAG 0x8000
-
-#define MULTIRT_TRACE(x) ip2dbg(x)
-
-#define MULTIRT_DEBUG_TAG(mblk) \
-    do { \
-        ASSERT(mblk != NULL); \
-        MULTIRT_TRACE(("%s[%d]: tagging mblk %p, tag was %d\n", \
-            __FILE__, __LINE__, \
-            (void *)(mblk), (mblk)->b_flag & MULTIRT_DEBUG_FLAG)); \
-        (mblk)->b_flag |= MULTIRT_DEBUG_FLAG; \
-    } while (0)
-
-#define MULTIRT_DEBUG_UNTAG(mblk) \
-    do { \
-        ASSERT(mblk != NULL); \
-        MULTIRT_TRACE(("%s[%d]: untagging mblk %p, tag was %d\n", \
-            __FILE__, __LINE__, \
-            (void *)(mblk), (mblk)->b_flag & MULTIRT_DEBUG_FLAG)); \
-        (mblk)->b_flag &= ~MULTIRT_DEBUG_FLAG; \
-    } while (0)
-
-#define MULTIRT_DEBUG_TAGGED(mblk) \
-    (((mblk)->b_flag & MULTIRT_DEBUG_FLAG) ? B_TRUE : B_FALSE)
-#else
-#define MULTIRT_DEBUG_TAG(mblk) ASSERT(mblk != NULL)
-#define MULTIRT_DEBUG_UNTAG(mblk) ASSERT(mblk != NULL)
-#define MULTIRT_DEBUG_TAGGED(mblk) B_FALSE
-#endif
-
-/*
- * Per-ILL Multidata Transmit capabilities.
- */
-struct ill_mdt_capab_s {
-    uint_t ill_mdt_version; /* interface version */
-    uint_t ill_mdt_on; /* on/off switch for MDT on this ILL */
-    uint_t ill_mdt_hdr_head; /* leading header fragment extra space */
-    uint_t ill_mdt_hdr_tail; /* trailing header fragment extra space */
-    uint_t ill_mdt_max_pld; /* maximum payload buffers per Multidata */
-    uint_t ill_mdt_span_limit; /* maximum payload span per packet */
-};
-
-struct ill_hcksum_capab_s {
-    uint_t ill_hcksum_version; /* interface version */
-    uint_t ill_hcksum_txflags; /* capabilities on transmit */
-};
-
-struct ill_zerocopy_capab_s {
-    uint_t ill_zerocopy_version; /* interface version */
-    uint_t ill_zerocopy_flags; /* capabilities */
-};
-
-struct ill_lso_capab_s {
-    uint_t ill_lso_on; /* on/off switch for LSO on this ILL */
-    uint_t ill_lso_flags; /* capabilities */
-    uint_t ill_lso_max; /* maximum size of payload */
-};
-
 /*
  * rr_ring_state cycles in the order shown below from RR_FREE through
  * RR_FREE_IN_PROG and back to RR_FREE.
@@ -3669,18 +3565,61 @@ extern void ip_squeue_clean_ring(ill_t *, ill_rx_ring_t *);
 extern void ip_squeue_quiesce_ring(ill_t *, ill_rx_ring_t *);
 extern void ip_squeue_restart_ring(ill_t *, ill_rx_ring_t *);
 extern void ip_squeue_clean_all(ill_t *);
+extern boolean_t ip_source_routed(ipha_t *, ip_stack_t *);
 extern void tcp_wput(queue_t *, mblk_t *);
-extern int ip_fill_mtuinfo(struct in6_addr *, in_port_t,
-    struct ip6_mtuinfo *, netstack_t *);
-extern ipif_t *conn_get_held_ipif(conn_t *, ipif_t **, int *);
+extern int ip_fill_mtuinfo(conn_t *, ip_xmit_attr_t *,
+    struct ip6_mtuinfo *);
 extern hook_t *ipobs_register_hook(netstack_t *, pfv_t);
 extern void ipobs_unregister_hook(netstack_t *, hook_t *);
 extern void ipobs_hook(mblk_t *, int, zoneid_t, zoneid_t, const ill_t *, ip_stack_t *);
 typedef void (*ipsq_func_t)(ipsq_t *, queue_t *, mblk_t *, void *);
+extern void dce_g_init(void);
+extern void dce_g_destroy(void);
+extern void dce_stack_init(ip_stack_t *);
+extern void dce_stack_destroy(ip_stack_t *);
+extern void dce_cleanup(uint_t, ip_stack_t *);
+extern dce_t *dce_get_default(ip_stack_t *);
+extern dce_t *dce_lookup_pkt(mblk_t *, ip_xmit_attr_t *, uint_t *);
+extern dce_t *dce_lookup_v4(ipaddr_t, ip_stack_t *, uint_t *);
+extern dce_t *dce_lookup_v6(const in6_addr_t *, uint_t, ip_stack_t *,
+    uint_t *);
+extern dce_t *dce_lookup_and_add_v4(ipaddr_t, ip_stack_t *);
+extern dce_t *dce_lookup_and_add_v6(const in6_addr_t *, uint_t,
+    ip_stack_t *);
+extern int dce_update_uinfo_v4(ipaddr_t, iulp_t *, ip_stack_t *);
+extern int dce_update_uinfo_v6(const in6_addr_t *, uint_t, iulp_t *,
+    ip_stack_t *);
+extern int dce_update_uinfo(const in6_addr_t *, uint_t, iulp_t *,
+    ip_stack_t *);
+extern void dce_increment_generation(dce_t *);
+extern void dce_increment_all_generations(boolean_t, ip_stack_t *);
+extern void dce_refrele(dce_t *);
+extern void dce_refhold(dce_t *);
+extern void dce_refrele_notr(dce_t *);
+extern void dce_refhold_notr(dce_t *);
+mblk_t
*ip_snmp_get_mib2_ip_dce(queue_t *, mblk_t *, ip_stack_t *ipst);
+
+extern ip_laddr_t ip_laddr_verify_v4(ipaddr_t, zoneid_t,
+    ip_stack_t *, boolean_t);
+extern ip_laddr_t ip_laddr_verify_v6(const in6_addr_t *, zoneid_t,
+    ip_stack_t *, boolean_t, uint_t);
+extern int ip_laddr_fanout_insert(conn_t *);
+
+extern boolean_t ip_verify_src(mblk_t *, ip_xmit_attr_t *, uint_t *);
+extern int ip_verify_ire(mblk_t *, ip_xmit_attr_t *);
+
+extern mblk_t *ip_xmit_attr_to_mblk(ip_xmit_attr_t *);
+extern boolean_t ip_xmit_attr_from_mblk(mblk_t *, ip_xmit_attr_t *);
+extern mblk_t *ip_xmit_attr_free_mblk(mblk_t *);
+extern mblk_t *ip_recv_attr_to_mblk(ip_recv_attr_t *);
+extern boolean_t ip_recv_attr_from_mblk(mblk_t *, ip_recv_attr_t *);
+extern mblk_t *ip_recv_attr_free_mblk(mblk_t *);
+extern boolean_t ip_recv_attr_is_mblk(mblk_t *);
+
 /*
  * Squeue tags. Tags only need to be unique when the callback function is the
  * same to distinguish between different calls, but we use unique tags for
@@ -3729,16 +3668,8 @@ typedef void (*ipsq_func_t)(ipsq_t *, queue_t *, mblk_t *, void *);
 #define SQTAG_CONNECT_FINISH 41
 #define SQTAG_SYNCHRONOUS_OP 42
 #define SQTAG_TCP_SHUTDOWN_OUTPUT 43
-#define SQTAG_XMIT_EARLY_RESET 44
-
-#define NOT_OVER_IP(ip_wq) \
-    (ip_wq->q_next != NULL || \
-    (ip_wq->q_qinfo->qi_minfo->mi_idname) == NULL || \
-    strcmp(ip_wq->q_qinfo->qi_minfo->mi_idname, \
-    IP_MOD_NAME) != 0 || \
-    ip_wq->q_qinfo->qi_minfo->mi_idnum != IP_MOD_ID)
+#define SQTAG_TCP_IXA_CLEANUP 44
-#define PROTO_FLOW_CNTRLD(connp) (connp->conn_flow_cntrld)
 #endif /* _KERNEL */
 #ifdef __cplusplus