First, we need to load the data into a Vaex dataframe. We'll be working with the listings file AB_NYC_2019.csv, which contains information on the Airbnb listings in New York City. Just as pandas is usually imported with import pandas as pd, you can import Vaex with import vaex as vx, or simply with import vaex, since the name is already short.
import vaex
# Load the data into a Vaex dataframe
df = vaex.from_csv('./data/AB_NYC_2019.csv')
df.head()
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.6475 | -73.9724 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.7536 | -73.9838 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.809 | -73.9419 | Private room | 150 | 3 | 0 | -- | nan | 1 | 365 |
3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.6851 | -73.9598 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.7985 | -73.944 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |
5 | 5099 | Large Cozy 1 BR Apartment In Midtown East | 7322 | Chris | Manhattan | Murray Hill | 40.7477 | -73.975 | Entire home/apt | 200 | 3 | 74 | 2019-06-22 | 0.59 | 1 | 129 |
6 | 5121 | BlissArtsSpace! | 7356 | Garon | Brooklyn | Bedford-Stuyvesant | 40.6869 | -73.956 | Private room | 60 | 45 | 49 | 2017-10-05 | 0.4 | 1 | 0 |
7 | 5178 | Large Furnished Room Near B'way | 8967 | Shunichi | Manhattan | Hell's Kitchen | 40.7649 | -73.9849 | Private room | 79 | 2 | 430 | 2019-06-24 | 3.47 | 1 | 220 |
8 | 5203 | Cozy Clean Guest Room - Family Apt | 7490 | MaryEllen | Manhattan | Upper West Side | 40.8018 | -73.9672 | Private room | 79 | 2 | 118 | 2017-07-21 | 0.99 | 1 | 0 |
9 | 5238 | Cute & Cozy Lower East Side 1 bdrm | 7549 | Ben | Manhattan | Chinatown | 40.7134 | -73.9904 | Entire home/apt | 150 | 1 | 160 | 2019-06-09 | 1.33 | 4 | 188 |
Unlike pandas, where head() shows the first 5 rows by default, head() in Vaex displays the first 10 rows. As in pandas, we can pass the number of rows to display, for example the first 5 rows.
df.head(5)
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.6475 | -73.9724 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.7536 | -73.9838 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.809 | -73.9419 | Private room | 150 | 3 | 0 | -- | nan | 1 | 365 |
3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.6851 | -73.9598 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.7985 | -73.944 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |
Similarly, we can display the last few rows with tail(). By default, it shows the last 10 rows.
df.tail()
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 36482809 | Stunning Bedroom NYC! Walking to Central Park!! | 131529729 | Kendall | Manhattan | East Harlem | 40.7963 | -73.936 | Private room | 75 | 2 | 0 | -- | nan | 2 | 353 |
1 | 36483010 | Comfy 1 Bedroom in Midtown East | 274311461 | Scott | Manhattan | Midtown | 40.7556 | -73.9672 | Entire home/apt | 200 | 6 | 0 | -- | nan | 1 | 176 |
2 | 36483152 | Garden Jewel Apartment in Williamsburg New York | 208514239 | Melki | Brooklyn | Williamsburg | 40.7123 | -73.9422 | Entire home/apt | 170 | 1 | 0 | -- | nan | 3 | 365 |
3 | 36484087 | Spacious Room w/ Private Rooftop, Central location | 274321313 | Kat | Manhattan | Hell's Kitchen | 40.7639 | -73.9918 | Private room | 125 | 4 | 0 | -- | nan | 1 | 31 |
4 | 36484363 | QUIT PRIVATE HOUSE | 107716952 | Michael | Queens | Jamaica | 40.6914 | -73.8084 | Private room | 65 | 1 | 0 | -- | nan | 2 | 163 |
5 | 36484665 | Charming one bedroom - newly renovated rowhouse | 8232441 | Sabrina | Brooklyn | Bedford-Stuyvesant | 40.6785 | -73.95 | Private room | 70 | 2 | 0 | -- | nan | 2 | 9 |
6 | 36485057 | Affordable room in Bushwick/East Williamsburg | 6570630 | Marisol | Brooklyn | Bushwick | 40.7018 | -73.9332 | Private room | 40 | 4 | 0 | -- | nan | 2 | 36 |
7 | 36485431 | Sunny Studio at Historical Neighborhood | 23492952 | Ilgar & Aysel | Manhattan | Harlem | 40.8147 | -73.9487 | Entire home/apt | 115 | 10 | 0 | -- | nan | 1 | 27 |
8 | 36485609 | 43rd St. Time Square-cozy single bed | 30985759 | Taz | Manhattan | Hell's Kitchen | 40.7575 | -73.9911 | Shared room | 55 | 1 | 0 | -- | nan | 6 | 2 |
9 | 36487245 | Trendy duplex in the very heart of Hell's Kitchen | 68119814 | Christophe | Manhattan | Hell's Kitchen | 40.764 | -73.9893 | Private room | 90 | 7 | 0 | -- | nan | 1 | 23 |
# display last 5 rows
df.tail(5)
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 36484665 | Charming one bedroom - newly renovated rowhouse | 8232441 | Sabrina | Brooklyn | Bedford-Stuyvesant | 40.6785 | -73.95 | Private room | 70 | 2 | 0 | -- | nan | 2 | 9 |
1 | 36485057 | Affordable room in Bushwick/East Williamsburg | 6570630 | Marisol | Brooklyn | Bushwick | 40.7018 | -73.9332 | Private room | 40 | 4 | 0 | -- | nan | 2 | 36 |
2 | 36485431 | Sunny Studio at Historical Neighborhood | 23492952 | Ilgar & Aysel | Manhattan | Harlem | 40.8147 | -73.9487 | Entire home/apt | 115 | 10 | 0 | -- | nan | 1 | 27 |
3 | 36485609 | 43rd St. Time Square-cozy single bed | 30985759 | Taz | Manhattan | Hell's Kitchen | 40.7575 | -73.9911 | Shared room | 55 | 1 | 0 | -- | nan | 6 | 2 |
4 | 36487245 | Trendy duplex in the very heart of Hell's Kitchen | 68119814 | Christophe | Manhattan | Hell's Kitchen | 40.764 | -73.9893 | Private room | 90 | 7 | 0 | -- | nan | 1 | 23 |
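We can also print the first and the last rows of a DataFrame in a single table with head_and_tail_print(). In the output below, n=5 yields five rows in total: three from the start and two from the end.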
df.head_and_tail_print(5)
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | -- | nan | 1 | 365 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48,893 | 36485609 | 43rd St. Time Square-cozy single bed | 30985759 | Taz | Manhattan | Hell's Kitchen | 40.75751 | -73.99112 | Shared room | 55 | 1 | 0 | -- | nan | 6 | 2 |
48,894 | 36487245 | Trendy duplex in the very heart of Hell's Kitchen | 68119814 | Christophe | Manhattan | Hell's Kitchen | 40.76404 | -73.98933 | Private room | 90 | 7 | 0 | -- | nan | 1 | 23 |
The sample() method returns randomly chosen rows; here we draw 7 rows.
df.sample(7)
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9051314 | Private Room in Fab East Village | 22007415 | Hannah | Manhattan | East Village | 40.7225 | -73.9791 | Private room | 110 | 3 | 2 | 2016-06-12 | 0.05 | 1 | 0 |
1 | 23882240 | Large Studio in South Williamsburg | 10132406 | Michael | Brooklyn | Williamsburg | 40.7101 | -73.9659 | Entire home/apt | 150 | 3 | 2 | 2018-08-27 | 0.14 | 1 | 0 |
2 | 11803893 | Home 4 Medical Professionals-LIU | 26377263 | Stat | Brooklyn | Fort Greene | 40.6901 | -73.9806 | Private room | 54 | 30 | 0 | -- | nan | 43 | 361 |
3 | 23542115 | Cute private room in friendly shared living space | 4265419 | Mary | Brooklyn | Bedford-Stuyvesant | 40.6875 | -73.9465 | Private room | 50 | 1 | 0 | -- | nan | 1 | 0 |
4 | 4629359 | ENTIRE home - Modern, huge, sunny 2BD | 23974215 | Alex | Manhattan | Harlem | 40.808 | -73.9518 | Entire home/apt | 105 | 3 | 29 | 2019-03-20 | 0.52 | 2 | 0 |
5 | 19297819 | Clean&Simple (45 minutes to Manhattan) | 132341923 | Cynthia | Queens | Jamaica | 40.6887 | -73.7879 | Entire home/apt | 57 | 1 | 120 | 2019-07-03 | 5.37 | 2 | 56 |
6 | 1762852 | Charming and Cozy Bedroom in Artists Colony | 173997 | Beth | Brooklyn | Williamsburg | 40.7094 | -73.953 | Private room | 35 | 2 | 3 | 2017-06-24 | 0.12 | 2 | 0 |
We can display the shape of the DataFrame.
df.shape
(48895, 16)
Next, let's see what columns are in the dataframe using the column_names attribute:
df.column_names
['id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude', 'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']
We can also get some summary statistics for the dataframe using the describe() method:
df.describe()
| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
data_type | int64 | string | int64 | string | string | string | float64 | float64 | string | int64 | int64 | int64 | string | float64 | int64 | int64 |
count | 48895 | 48879 | 48895 | 48874 | 48895 | 48895 | 48895 | 48895 | 48895 | 48895 | 48895 | 48895 | 38843 | 38843 | 48895 | 48895 |
NA | 0 | 16 | 0 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10052 | 10052 | 0 | 0 |
mean | 19017143.236179568 | -- | 67620010.64661008 | -- | -- | -- | 40.72894888066265 | -73.95216961468472 | -- | 152.7206871868289 | 7.029962163820431 | 23.274465691788528 | -- | 1.373221429858667 | 7.143982002249719 | 112.78132733408324 |
std | 10982996.07183 | -- | 78610163.153242 | -- | -- | -- | 0.05453 | 0.046156 | -- | 240.151714 | 20.51034 | 44.550127 | -- | 1.68042 | 32.952182 | 131.620943 |
min | 2539 | -- | 2438 | -- | -- | -- | 40.49979 | -74.24442 | -- | 0 | 1 | 0 | -- | 0.01 | 1 | 0 |
max | 36487245 | -- | 274321313 | -- | -- | -- | 40.91306 | -73.71299 | -- | 10000 | 1250 | 629 | -- | 58.5 | 327 | 365 |
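This displays the data type, the number of non-missing values (count), and the number of missing values (NA) for every column, plus the mean, standard deviation, minimum, and maximum for each numerical column.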
Now that we have an idea of what our data looks like, let's filter and select some specific data.
df[df.price < 200]
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
1 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | -- | nan | 1 | 365 |
2 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
3 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |
4 | 5121 | BlissArtsSpace! | 7356 | Garon | Brooklyn | Bedford-Stuyvesant | 40.68688 | -73.95596 | Private room | 60 | 45 | 49 | 2017-10-05 | 0.4 | 1 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
39,105 | 36484665 | Charming one bedroom - newly renovated rowhouse | 8232441 | Sabrina | Brooklyn | Bedford-Stuyvesant | 40.67853 | -73.94995 | Private room | 70 | 2 | 0 | -- | nan | 2 | 9 |
39,106 | 36485057 | Affordable room in Bushwick/East Williamsburg | 6570630 | Marisol | Brooklyn | Bushwick | 40.70184 | -73.93317 | Private room | 40 | 4 | 0 | -- | nan | 2 | 36 |
39,107 | 36485431 | Sunny Studio at Historical Neighborhood | 23492952 | Ilgar & Aysel | Manhattan | Harlem | 40.81475 | -73.94867 | Entire home/apt | 115 | 10 | 0 | -- | nan | 1 | 27 |
39,108 | 36485609 | 43rd St. Time Square-cozy single bed | 30985759 | Taz | Manhattan | Hell's Kitchen | 40.75751 | -73.99112 | Shared room | 55 | 1 | 0 | -- | nan | 6 | 2 |
39,109 | 36487245 | Trendy duplex in the very heart of Hell's Kitchen | 68119814 | Christophe | Manhattan | Hell's Kitchen | 40.76404 | -73.98933 | Private room | 90 | 7 | 0 | -- | nan | 1 | 23 |
This will display only the rows where the price column is less than 200.
We can also filter rows on two or more conditions. Here, the | operator keeps the rows that satisfy either condition.
df[(df.price > 5000)|(df.price < 30)]
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 375249 | Enjoy Staten Island Hospitality | 1887999 | Rimma & Jim | Staten Island | Graniteville | 40.62109 | -74.16534 | Private room | 20 | 3 | 80 | 2019-05-26 | 0.92 | 1 | 226 |
1 | 1428154 | Central, Peaceful Semi-Private Room | 5912572 | Tangier | Brooklyn | Flatbush | 40.63899 | -73.95177 | Shared room | 29 | 2 | 5 | 2014-10-20 | 0.07 | 1 | 321 |
2 | 1620248 | Large furnished 2 bedrooms- - 30 days Minimum | 2196224 | Sally | Manhattan | East Village | 40.73051 | -73.9814 | Entire home/apt | 10 | 30 | 0 | -- | nan | 4 | 137 |
3 | 1767037 | Small Cozy Room Wifi & AC near JFK | 9284163 | Antonio | Queens | Woodhaven | 40.68968 | -73.85219 | Private room | 29 | 2 | 386 | 2019-06-19 | 5.53 | 3 | 50 |
4 | 2110145 | UWS 1BR w/backyard + block from CP | 2151325 | Jay And Liz | Manhattan | Upper West Side | 40.77782 | -73.97848 | Entire home/apt | 6000 | 14 | 17 | 2015-02-17 | 0.27 | 1 | 359 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
419 | 36280646 | Cable and wfi, L/G included. | 272872092 | Chris | Queens | Forest Hills | 40.73657 | -73.85088 | Entire home/apt | 16 | 9 | 1 | 2019-07-07 | 1.0 | 1 | 322 |
420 | 36354776 | Cozy bedroom in diverse neighborhood near JFK | 273393150 | Liza | Queens | Richmond Hill | 40.68639 | -73.81847 | Private room | 28 | 2 | 0 | -- | nan | 1 | 24 |
421 | 36450814 | FLATBUSH HANG OUT AND GO | 267223765 | Jarmel | Brooklyn | Flatbush | 40.64922 | -73.96078 | Shared room | 20 | 1 | 0 | -- | nan | 3 | 363 |
422 | 36473044 | The place you were dreaming for.(only for guys) | 261338177 | Diana | Brooklyn | Gravesend | 40.5908 | -73.97116 | Shared room | 25 | 1 | 0 | -- | nan | 6 | 338 |
423 | 36473253 | Heaven for you(only for guy) | 261338177 | Diana | Brooklyn | Gravesend | 40.59118 | -73.97119 | Shared room | 25 | 7 | 0 | -- | nan | 6 | 365 |
The same rows can be selected with the filter() method:
df_filter = df.filter((df.price > 5000)|(df.price < 30))
df_filter
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 375249 | Enjoy Staten Island Hospitality | 1887999 | Rimma & Jim | Staten Island | Graniteville | 40.62109 | -74.16534 | Private room | 20 | 3 | 80 | 2019-05-26 | 0.92 | 1 | 226 |
1 | 1428154 | Central, Peaceful Semi-Private Room | 5912572 | Tangier | Brooklyn | Flatbush | 40.63899 | -73.95177 | Shared room | 29 | 2 | 5 | 2014-10-20 | 0.07 | 1 | 321 |
2 | 1620248 | Large furnished 2 bedrooms- - 30 days Minimum | 2196224 | Sally | Manhattan | East Village | 40.73051 | -73.9814 | Entire home/apt | 10 | 30 | 0 | -- | nan | 4 | 137 |
3 | 1767037 | Small Cozy Room Wifi & AC near JFK | 9284163 | Antonio | Queens | Woodhaven | 40.68968 | -73.85219 | Private room | 29 | 2 | 386 | 2019-06-19 | 5.53 | 3 | 50 |
4 | 2110145 | UWS 1BR w/backyard + block from CP | 2151325 | Jay And Liz | Manhattan | Upper West Side | 40.77782 | -73.97848 | Entire home/apt | 6000 | 14 | 17 | 2015-02-17 | 0.27 | 1 | 359 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
419 | 36280646 | Cable and wfi, L/G included. | 272872092 | Chris | Queens | Forest Hills | 40.73657 | -73.85088 | Entire home/apt | 16 | 9 | 1 | 2019-07-07 | 1.0 | 1 | 322 |
420 | 36354776 | Cozy bedroom in diverse neighborhood near JFK | 273393150 | Liza | Queens | Richmond Hill | 40.68639 | -73.81847 | Private room | 28 | 2 | 0 | -- | nan | 1 | 24 |
421 | 36450814 | FLATBUSH HANG OUT AND GO | 267223765 | Jarmel | Brooklyn | Flatbush | 40.64922 | -73.96078 | Shared room | 20 | 1 | 0 | -- | nan | 3 | 363 |
422 | 36473044 | The place you were dreaming for.(only for guys) | 261338177 | Diana | Brooklyn | Gravesend | 40.5908 | -73.97116 | Shared room | 25 | 1 | 0 | -- | nan | 6 | 338 |
423 | 36473253 | Heaven for you(only for guy) | 261338177 | Diana | Brooklyn | Gravesend | 40.59118 | -73.97119 | Shared room | 25 | 7 | 0 | -- | nan | 6 | 365 |
A single column can be accessed as an attribute, by key, or through df.col. Each form returns a Vaex expression:
df.host_name
Expression = host_name
Length: 48,895 dtype: string (column)
-------------------------------------
    0           John
    1       Jennifer
    2      Elisabeth
    3    LisaRoxanne
    4          Laura
       ...
48890        Sabrina
48891        Marisol
48892  Ilgar & Aysel
48893            Taz
48894     Christophe
df['host_name']
Expression = host_name
Length: 48,895 dtype: string (column)
-------------------------------------
    0           John
    1       Jennifer
    2      Elisabeth
    3    LisaRoxanne
    4          Laura
       ...
48890        Sabrina
48891        Marisol
48892  Ilgar & Aysel
48893            Taz
48894     Christophe
df.col.host_name
Expression = host_name
Length: 48,895 dtype: string (column)
-------------------------------------
    0           John
    1       Jennifer
    2      Elisabeth
    3    LisaRoxanne
    4          Laura
       ...
48890        Sabrina
48891        Marisol
48892  Ilgar & Aysel
48893            Taz
48894     Christophe
All three forms return the same expression.
We can also select specific columns using the df[['column1', 'column2']] syntax:
df[['id', 'name', 'host_name']]
# | id | name | host_name |
---|---|---|---|
0 | 2539 | Clean & quiet apt home by the park | John |
1 | 2595 | Skylit Midtown Castle | Jennifer |
2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | Elisabeth |
3 | 3831 | Cozy Entire Floor of Brownstone | LisaRoxanne |
4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | Laura |
... | ... | ... | ... |
48,890 | 36484665 | Charming one bedroom - newly renovated rowhouse | Sabrina |
48,891 | 36485057 | Affordable room in Bushwick/East Williamsburg | Marisol |
48,892 | 36485431 | Sunny Studio at Historical Neighborhood | Ilgar & Aysel |
48,893 | 36485609 | 43rd St. Time Square-cozy single bed | Taz |
48,894 | 36487245 | Trendy duplex in the very heart of Hell's Kitchen | Christophe |
This will display only the 'id', 'name', and 'host_name' columns of the DataFrame.
We can also select columns by dropping the ones we do not need.
df_drop = df.drop(['name','host_name'])
df_drop
# | id | host_id | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2539 | 2787 | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
1 | 2595 | 2845 | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
2 | 3647 | 4632 | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | -- | nan | 1 | 365 |
3 | 3831 | 4869 | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
4 | 5022 | 7192 | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
48,890 | 36484665 | 8232441 | Brooklyn | Bedford-Stuyvesant | 40.67853 | -73.94995 | Private room | 70 | 2 | 0 | -- | nan | 2 | 9 |
48,891 | 36485057 | 6570630 | Brooklyn | Bushwick | 40.70184 | -73.93317 | Private room | 40 | 4 | 0 | -- | nan | 2 | 36 |
48,892 | 36485431 | 23492952 | Manhattan | Harlem | 40.81475 | -73.94867 | Entire home/apt | 115 | 10 | 0 | -- | nan | 1 | 27 |
48,893 | 36485609 | 30985759 | Manhattan | Hell's Kitchen | 40.75751 | -73.99112 | Shared room | 55 | 1 | 0 | -- | nan | 6 | 2 |
48,894 | 36487245 | 68119814 | Manhattan | Hell's Kitchen | 40.76404 | -73.98933 | Private room | 90 | 7 | 0 | -- | nan | 1 | 23 |
Besides the head, tail, and filter methods, we can select rows by row index. For example, df[3:10] selects the rows from index 3 up to, but not including, index 10.
df[3:10]
# | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.6851 | -73.9598 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
1 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.7985 | -73.944 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |
2 | 5099 | Large Cozy 1 BR Apartment In Midtown East | 7322 | Chris | Manhattan | Murray Hill | 40.7477 | -73.975 | Entire home/apt | 200 | 3 | 74 | 2019-06-22 | 0.59 | 1 | 129 |
3 | 5121 | BlissArtsSpace! | 7356 | Garon | Brooklyn | Bedford-Stuyvesant | 40.6869 | -73.956 | Private room | 60 | 45 | 49 | 2017-10-05 | 0.4 | 1 | 0 |
4 | 5178 | Large Furnished Room Near B'way | 8967 | Shunichi | Manhattan | Hell's Kitchen | 40.7649 | -73.9849 | Private room | 79 | 2 | 430 | 2019-06-24 | 3.47 | 1 | 220 |
5 | 5203 | Cozy Clean Guest Room - Family Apt | 7490 | MaryEllen | Manhattan | Upper West Side | 40.8018 | -73.9672 | Private room | 79 | 2 | 118 | 2017-07-21 | 0.99 | 1 | 0 |
6 | 5238 | Cute & Cozy Lower East Side 1 bdrm | 7549 | Ben | Manhattan | Chinatown | 40.7134 | -73.9904 | Entire home/apt | 150 | 1 | 160 | 2019-06-09 | 1.33 | 4 | 188 |
We can also slice a DataFrame into a sub-DataFrame by row and column indexes. Here we take the first 15 rows and the columns at positions 1 and 2.
df_slice = df[0:15,1:3]
df_slice
# | name | host_id |
---|---|---|
0 | Clean & quiet apt home by the park | 2787 |
1 | Skylit Midtown Castle | 2845 |
2 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 |
3 | Cozy Entire Floor of Brownstone | 4869 |
4 | Entire Apt: Spacious Studio/Loft by central park | 7192 |
... | ... | ... |
10 | Beautiful 1br on Upper West Side | 7702 |
11 | Central Manhattan/near Broadway | 7989 |
12 | Lovely Room 1, Garden, Best Area, Legal rental | 9744 |
13 | Wonderful Guest Bedroom in Manhattan for SINGLES | 11528 |
14 | West Village Nest - Superhost | 11975 |
We can also aggregate our data to get some summary statistics. For example, let's say we want to find the average price of a listing by neighborhood. We can use the groupby() method to group our data by neighborhood and then calculate the mean of the price column.
grouped_data = df.groupby(by='neighbourhood').agg({'price':'mean'})
grouped_data
# | neighbourhood | price |
---|---|---|
0 | Rosebank | 111.85714285714286 |
1 | Rossville | 75.0 |
2 | Rego Park | 83.87735849056604 |
3 | Melrose | 83.3 |
4 | St. Albans | 100.82894736842105 |
... | ... | ... |
216 | Washington Heights | 89.6106785317019 |
217 | Wakefield | 85.58 |
218 | Edgemere | 94.72727272727273 |
219 | Unionport | 137.14285714285714 |
220 | Fort Wadsworth | 800.0 |
This will display the average price of a listing for each neighborhood in the dataframe.
We can also compute several summary statistics in one call, including more than one statistic for the same column.
df.groupby(by='neighbourhood').agg({'price': 'mean',
'number_of_reviews': ['sum', 'std']})
# | neighbourhood | price | number_of_reviews_sum | number_of_reviews_std |
---|---|---|---|---|
0 | Rossville | 75.0 | 21 | 0.0 |
1 | Rosebank | 111.85714285714286 | 215 | 30.183790757454503 |
2 | Rego Park | 83.87735849056604 | 2754 | 35.24950986734733 |
3 | Melrose | 83.3 | 90 | 16.852299546352718 |
4 | St. Albans | 100.82894736842105 | 2584 | 37.90570016848598 |
... | ... | ... | ... | ... |
216 | Washington Heights | 89.6106785317019 | 17161 | 36.70231510883223 |
217 | Wakefield | 85.58 | 1279 | 42.909248420358054 |
218 | Unionport | 137.14285714285714 | 104 | 22.86874705436952 |
219 | Edgemere | 94.72727272727273 | 113 | 21.447995918631182 |
220 | Fort Wadsworth | 800.0 | 0 | 0.0 |
Finally, we can also visualize our data using Matplotlib and Vaex's built-in plotting functions.
We can use Matplotlib to create different plots. In this example, we select the 'price' column and create a simple line plot with Matplotlib.
import matplotlib.pyplot as plt
plt.plot(df.price)
plt.show()
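As another example, let's plot the latitude and longitude columns to see the locations of the listings. Instead of a scatter plot with one marker per listing, we can use viz.heatmap, which colors each cell of a grid by the number of listings that fall in it.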
df.viz.heatmap(df.latitude, df.longitude, what=vaex.stat.count(), f='log1p', colormap='plasma')
<matplotlib.image.AxesImage at 0x19411c7a050>
In this Vaex tutorial, we explored the New York City Airbnb Open Data to demonstrate how Vaex can be used to analyze and visualize real-world data. We loaded the data into a Vaex dataframe, explored the data using various methods such as head(), describe(), and column_names, filtered and selected specific data using the [] and [['column1', 'column2']] syntax, aggregated data using groupby(), and visualized the data using Matplotlib and df.viz.heatmap().
Vaex provides a fast and memory-efficient way to work with large datasets, making it an excellent tool for data analysis and visualization. With its simple syntax and built-in plotting functions, Vaex is a great choice for both beginners and experienced data scientists.