---
c: Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
Title: curl_url_get
Section: 3
Source: libcurl
See-also:
  - CURLOPT_CURLU (3)
  - curl_url (3)
  - curl_url_cleanup (3)
  - curl_url_dup (3)
  - curl_url_set (3)
  - curl_url_strerror (3)
Protocol:
  - All
Added-in: 7.62.0
---

# NAME

curl_url_get - extract a part from a URL

# SYNOPSIS

~~~c
#include <curl/curl.h>

CURLUcode curl_url_get(const CURLU *url,
                       CURLUPart part,
                       char **content,
                       unsigned int flags);
~~~

# DESCRIPTION

Given a *url* handle of a URL object, this function extracts an individual
piece or the full URL from it.

The *part* argument specifies which part to extract (see list below) and
*content* points to a 'char *' to get updated to point to a newly
allocated string with the contents.

The *flags* argument is a bitmask with individual features.

The returned content pointer must be freed with curl_free(3) after use.

# FLAGS

The flags argument is zero, one or more bits set in a bitmask.

## CURLU_DEFAULT_PORT

If the handle has no port stored, this option makes curl_url_get(3)
return the default port for the used scheme.

## CURLU_DEFAULT_SCHEME

If the handle has no scheme stored, this option makes curl_url_get(3)
return the default scheme instead of error.

## CURLU_NO_DEFAULT_PORT

Instructs curl_url_get(3) to not return a port number if it matches the
default port for the scheme.

## CURLU_URLDECODE

Asks curl_url_get(3) to URL decode the contents before returning it. It
does not decode the scheme, the port number or the full URL.

The query component also gets plus-to-space conversion as a bonus when this
bit is set.

Note that this URL decoding is charset unaware and you get a null-terminated
string back with data that could be intended for a particular encoding.

If there are byte values lower than 32 in the decoded string, the get
operation returns an error instead.

## CURLU_URLENCODE

If set, curl_url_get(3) URL encodes the hostname part when a full URL is
retrieved. If not set (default), libcurl returns the URL with the hostname raw
so that IDN names appear as-is. IDN hostnames typically use non-ASCII bytes
that would otherwise get percent-encoded.

Note that even when not asking for URL encoding, the '%' (byte 37) is URL
encoded to make sure the hostname remains valid.

## CURLU_PUNYCODE

If set and *CURLU_URLENCODE* is not set, and asked to retrieve the
**CURLUPART_HOST** or **CURLUPART_URL** parts, libcurl returns the hostname
in its punycode version if it contains any non-ASCII octets (and is an
IDN name).

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname contains
anything outside the ASCII range.

(Added in curl 7.88.0)

## CURLU_PUNY2IDN

If set and asked to retrieve the **CURLUPART_HOST** or **CURLUPART_URL**
parts, libcurl returns the hostname in its IDN (International Domain Name)
UTF-8 version if it otherwise is a punycode version. If the punycode name
cannot be converted to IDN correctly, libcurl returns
*CURLUE_BAD_HOSTNAME*.

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname is using
punycode.

(Added in curl 8.3.0)

## CURLU_GET_EMPTY

When this flag is used in curl_url_get(), it makes the function return empty
query and fragment parts, both when they are extracted individually and when
they are part of the full URL. By default, libcurl considers empty parts
non-existing.

An empty query part is one where there is nothing following the question mark
(before the possible fragment).
An empty fragment part is one where there is nothing following the hash sign.

(Added in curl 8.8.0)

## CURLU_NO_GUESS_SCHEME

When this flag is used in curl_url_get(), it treats the scheme as non-existing
if it was set as a result of a previous guess, that is, when
CURLU_GUESS_SCHEME was used when parsing the URL.

Using this flag when getting CURLUPART_SCHEME if the scheme was set as the
result of a guess makes curl_url_get() return CURLUE_NO_SCHEME.

Using this flag when getting CURLUPART_URL if the scheme was set as the result
of a guess makes curl_url_get() return the full URL without the scheme
component. Such a URL can then only be parsed with curl_url_set() if
CURLU_GUESS_SCHEME is used.

(Added in curl 8.9.0)

# PARTS

## CURLUPART_URL

When asked to return the full URL, curl_url_get(3) returns a slightly cleaned
up version of the complete URL using all available parts.

We advise using the *CURLU_PUNYCODE* option to get the URL as "normalized" as
possible since IDN allows hostnames to be written in many different ways that
still end up the same punycode version.

Zero-length queries and fragments are excluded from the URL unless
CURLU_GET_EMPTY is set.

## CURLUPART_SCHEME

Scheme cannot be URL decoded on get.

## CURLUPART_USER

## CURLUPART_PASSWORD

## CURLUPART_OPTIONS

The options field is an optional field that might follow the password in the
userinfo part. It is only recognized/used when parsing URLs for the following
schemes: pop3, smtp and imap. The URL API still allows users to set and get
this field independently of scheme when not parsing full URLs.

## CURLUPART_HOST

The hostname. If it is an IPv6 numeric address, the zone id is not part of it
but is provided separately in *CURLUPART_ZONEID*. IPv6 numerical addresses
are returned within brackets ([]).

IPv6 names are normalized when set, which should make them as short as
possible while maintaining correct syntax.

## CURLUPART_ZONEID

If the hostname is a numeric IPv6 address, this field might also be set.

## CURLUPART_PORT

A port cannot be URL decoded on get. This number is returned in a string just
like all other parts. That string is guaranteed to hold a valid port number in
ASCII using base 10.

## CURLUPART_PATH

The path is always at least a slash ('/') even if no path was supplied
in the URL. A URL path always starts with a slash.

## CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a
delimiter only. It is not part of the query contents.

A not-present query returns *content* set to NULL.

A zero-length query returns *content* as NULL unless CURLU_GET_EMPTY is set.

The query part gets pluses converted to space when asked to URL decode on get
with the CURLU_URLDECODE bit.

## CURLUPART_FRAGMENT

The initial hash sign that denotes the beginning of the fragment is a
delimiter only. It is not part of the fragment contents.

A not-present fragment returns *content* set to NULL.

A zero-length fragment returns *content* as NULL unless CURLU_GET_EMPTY is set.

# %PROTOCOLS%

# EXAMPLE

~~~c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURLUcode rc;
  CURLU *url = curl_url();
  if(!url)
    return 1; /* could not allocate a URL handle */
  rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0);
  if(!rc) {
    char *scheme;
    rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0);
    if(!rc) {
      printf("the scheme is %s\n", scheme);
      curl_free(scheme);
    }
  }
  /* free the handle even if parsing the URL failed */
  curl_url_cleanup(url);
  return rc ? 1 : 0;
}
~~~

# %AVAILABILITY%

# RETURN VALUE

Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went
fine. See the libcurl-errors(3) man page for the full list with descriptions.

If this function returns an error, no URL part is returned.